30. Gradient descent has been run for 15 iterations with learning rate a = 0.3, and the corresponding loss function J(theta) is computed after each iteration. You find that the value of J(theta) decreases quickly and then levels off. Based on this observation, which one of the following conclusions seems most plausible?
(A) Rather than using the current value of a, use a larger value of a (say a=1.0)
(B) Rather than using the current value of a, use a smaller value of a (say a=0.1)
(C) a=0.3 is an effective choice of learning rate
(D) Overfitting is occurring. Rather than using the current definition of J, a better loss function should be chosen.
(E) None of the above
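The scenario in the question can be reproduced with a small sketch. The setup below is an assumption for illustration only (the original exam does not specify the model or data): gradient descent on the simple quadratic loss J(theta) = theta^2 with the stated learning rate a = 0.3. Running it shows J(theta) dropping rapidly at first and then leveling off near its minimum.

```python
# Minimal sketch (hypothetical setup, not the exam's actual model):
# gradient descent on J(theta) = theta^2 with learning rate a = 0.3.
def gradient_descent(alpha=0.3, theta0=1.0, iters=15):
    theta = theta0
    losses = []
    for _ in range(iters):
        grad = 2 * theta            # dJ/dtheta for J = theta^2
        theta -= alpha * grad       # gradient descent update
        losses.append(theta ** 2)   # record J(theta) after this iteration
    return losses

losses = gradient_descent()
# The recorded losses decrease quickly and then flatten out near zero,
# the behavior described in the question.
```

With this step size each update scales theta by a constant factor less than one, so the loss shrinks geometrically: fast early progress followed by a plateau, which is the signature of a well-behaved (neither divergent nor sluggish) learning rate.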

Reference Answer

No reference answer provided.
