This second module focuses on the concept of model scores, including the test score and the train score. These scores are then used to define overfitting and underfitting, as well as the concepts of bias and variance.
We'll also see how to inspect a model's performance with respect to its complexity and the number of input samples.
All images by author.
In case you missed it, I strongly recommend my first post of this series; it'll be way easier to follow along:
The first concepts I want to discuss are the train score and the test score. A score is a way to numerically express the performance of a model. To compute it, we use a score function that aggregates the "distance" or "error" between what the model predicted and what the ground truth is. For example:
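from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X_train, y_train)  # fit on the training set first
y_predicted = model.predict(X_test)
# some_score_function is a placeholder; sklearn metrics take (y_true, y_pred)
test_score = some_score_function(y_test, y_predicted)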
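To make the idea of a score function more concrete, here is a minimal sketch of one possible choice, the mean absolute error, in plain Python. The function name `mean_absolute_error` and the toy values are assumptions made up for this illustration; unlike most scores, a lower value is better here.

```python
def mean_absolute_error(y_predicted, y_true):
    """Average absolute gap between predictions and ground truth (lower is better)."""
    if len(y_predicted) != len(y_true):
        raise ValueError("both sequences must have the same length")
    gaps = [abs(p - t) for p, t in zip(y_predicted, y_true)]
    return sum(gaps) / len(gaps)

# Toy values, made up for illustration
y_true = [3.0, 5.0, 7.0]
y_predicted = [2.5, 5.0, 8.0]
toy_error = mean_absolute_error(y_predicted, y_true)
print(toy_error)  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```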
In sklearn, all models (also called estimators) provide an even quicker way to compute a score using the model:
# the model will compute the predicted y-values from X_test,
# and compare them to y_test with a score function
test_score = model.score(X_test, y_test)
train_score = model.score(X_train, y_train)
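For readers who want to run this end to end, here is a small sketch on synthetic data. The dataset parameters (sample count, noise level, split ratio, random seeds) are arbitrary assumptions chosen only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data, made up for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

train_score = model.score(X_train, y_train)  # score on data the model was fitted on
test_score = model.score(X_test, y_test)     # score on held-out data
print(f"train score: {train_score:.3f}, test score: {test_score:.3f}")
```

The train score tells us how well the model fits the data it has already seen, while the test score estimates how it behaves on new data.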
The actual score function depends on the model and the kind of problem it's designed to solve. For example, a linear regressor (numerical regression) uses the R² coefficient, while a support-vector classifier (classification) uses the accuracy, which is basically the proportion of correct class predictions.
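As a quick sanity check, the sketch below compares the default `score` of `LinearRegression` and `SVC` with the matching functions from `sklearn.metrics`. The synthetic datasets and their parameters are assumptions for illustration, and the scores are computed on the training data only to keep the example short.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.svm import SVC

# Regression: the default score of LinearRegression is the R² coefficient
X_reg, y_reg = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)
regressor = LinearRegression().fit(X_reg, y_reg)
r2_from_score = regressor.score(X_reg, y_reg)
r2_from_metric = r2_score(y_reg, regressor.predict(X_reg))

# Classification: the default score of SVC is the accuracy
X_clf, y_clf = make_classification(n_samples=100, n_features=4, random_state=0)
classifier = SVC().fit(X_clf, y_clf)
acc_from_score = classifier.score(X_clf, y_clf)
acc_from_metric = accuracy_score(y_clf, classifier.predict(X_clf))

print(r2_from_score, r2_from_metric)
print(acc_from_score, acc_from_metric)
```

Each pair of printed values should match, since `score` is a shortcut for predicting and then applying the default metric.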