Getting gradient descent to work in Octave (Andrew Ng's machine learning course, exercise 1)

#machine-learning #octave #gradient-descent

Question:

I am trying to implement the first programming exercise from Andrew Ng's machine learning course on Coursera. I am having trouble implementing gradient descent for linear regression with one variable in Octave. I do not get the same parameter values as the solution, although my parameters move in the same direction (at least I think so). So there is probably a bug somewhere in my code. Maybe someone with more experience than me can point it out.

 function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

theta1 = theta(1);
theta2 = theta(2);

temp0 = 0;
temp1 = 0;

h = X * theta;
for iter = 1:(num_iters)

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
    temp0 = 0;
    temp1 = 0;
    for i=1:m
        error = (h(i) - y(i));
        temp0 = temp0 + (error * X(i, 1));
        temp1 = temp1 + (error * X(i, 2));
    end
    theta1 = theta1 - ((alpha/m) * temp0);
    theta2 = theta2 - ((alpha/m) * temp1);
    theta = [theta1;theta2];

    % ============================================================

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);

end
end

My expected results for exercise 1, with theta initialized to [0;0], are theta1 = -3.6303 and theta2 = 1.1664.

But the output I get is theta1 = 0.095420 and theta2 = 0.51890.

Here is a picture where you can see my function: [image not preserved]

This is the formula I am using for linear gradient descent: [image not preserved]
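The formula image did not survive, so what follows is an assumption rather than the poster's exact picture. The code above appears to implement the standard batch update rule for linear regression with one variable, which can be written as follows. Here the indices j = 0, 1 correspond to theta1, theta2 and to the columns X(:, 1), X(:, 2) in the code, and both parameters are updated simultaneously on every iteration.

```latex
h_\theta(x) = \theta_0 + \theta_1 x

\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
    \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)},
\qquad j = 0, 1
```

Note that h_theta depends on the current theta, so the predictions have to be re-evaluated before each update.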

EDIT 1: I edited the code. Now I get for theta1:

87.587

And for theta2:

979.93

1. In the inner for loop you overwrite temp0 and temp1 m times and then just use the last value.

2. Thanks, I think that may be the mistake. I did not realize I was being so stupid. Thank you very much.

Answer #1:

Now I know what my problem was. I will describe it briefly for anyone who is interested. I accidentally computed the variable h outside of my loop, so every iteration of the loop used the same h, computed from the initial theta.
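To make the effect visible, here is a minimal sketch on a tiny made-up data set (not the course data; the names theta_stale and theta_fresh are only illustrative). With h computed once, the gradient never changes, so theta grows by the same amount on every step, which would be consistent with the large values reported in EDIT 1. With h recomputed inside the loop, theta converges.

```octave
% Hypothetical toy data generated from y = 1 + 2*x (not the course data set)
X = [1 1; 1 2; 1 3];
y = [3; 5; 7];
m = length(y);
alpha = 0.1;
num_iters = 1000;

% Stale h: predictions computed once, before the loop (the bug)
theta_stale = [0; 0];
h = X * theta_stale;
for iter = 1:num_iters
    grad = [sum((h - y) .* X(:, 1)); sum((h - y) .* X(:, 2))];
    theta_stale = theta_stale - (alpha / m) * grad;
end

% Fresh h: predictions recomputed from the current theta on every step
theta_fresh = [0; 0];
for iter = 1:num_iters
    h = X * theta_fresh;
    grad = [sum((h - y) .* X(:, 1)); sum((h - y) .* X(:, 2))];
    theta_fresh = theta_fresh - (alpha / m) * grad;
end

disp(theta_stale');  % keeps growing with the number of iterations
disp(theta_fresh');  % approaches [1 2], the parameters used to generate y
```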

Here is the corrected code:

 function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

theta1 = theta(1);
theta2 = theta(2);

temp0 = 0;
temp1 = 0;
error = 0;

for iter = 1:(num_iters)
    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

    h = X * theta; % recompute the predictions from the current theta (moved into the loop)

    temp0 = 0;
    temp1 = 0;
    for i=1:m
        error = (h(i) - y(i));
        temp0 = temp0 + (error * X(i, 1));
        temp1 = temp1 + (error * X(i, 2));
        %disp(error);
    end
    theta1 = theta1 - ((alpha/m) * temp0);
    theta2 = theta2 - ((alpha/m) * temp1);
    theta = [theta1;theta2];

    % ============================================================

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);

end
end
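The inner loop over the training examples can also be expressed as a single matrix product, because X' * (X * theta - y) yields the same two sums as temp0 and temp1. The following is only a sketch: gradientDescentVec and computeCostSketch are hypothetical names, the cost function is an assumed stand-in for the course's computeCost (half the mean squared error), and the data is made up.

```octave
1;  % marks this file as a script so that functions can be defined inline

% Assumed form of the cost: sum of squared errors divided by 2*m
function J = computeCostSketch(X, y, theta)
    m = length(y);
    err = X * theta - y;
    J = (err' * err) / (2 * m);
end

% Vectorized variant of the corrected loop (hypothetical name)
function [theta, J_history] = gradientDescentVec(X, y, theta, alpha, num_iters)
    m = length(y);
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        err = X * theta - y;                       % recomputed from the current theta
        theta = theta - (alpha / m) * (X' * err);  % X' * err equals [temp0; temp1]
        J_history(iter) = computeCostSketch(X, y, theta);
    end
end

% Hypothetical toy data generated from y = 1 + 2*x (not the course data set)
X = [1 1; 1 2; 1 3];
y = [3; 5; 7];
[theta, J_history] = gradientDescentVec(X, y, [0; 0], 0.1, 1000);
disp(theta');  % approaches [1 2]
```

Because the whole theta vector is updated in one assignment, this form also avoids tracking theta1 and theta2 separately and works unchanged for more than one feature.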

1. Is the sum() command redundant in the example above, since the value of theta already reflects the summation over the prediction error?

2. @AbhishekChoudhery I do not understand what you mean.