Fit Nonlinear Data with a Linear Model!
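Fitting nonlinear data with a linear model is possible with a technique called polynomial regression. The intuition is that adding powers of each feature as new features gives the model more degrees of freedom to fit the data, while the model itself stays linear in its parameters.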
First, we generate the data (note that y is a quadratic function of X):
import numpy as np

m = 100  # number of samples
X = 9 * np.random.rand(m, 1) - 7  # uniform in [-7, 2)
y = X**2 + 3*X + 5 + np.random.randn(m, 1)  # quadratic in X plus Gaussian noise
The linear regression model (without polynomial features):
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
Adding polynomial features (X → X, X**2):
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)  # columns: X, X**2
Samples from X:
[-4.63090189],
[ 0.23522634],
The same samples from X_poly (each row is X, X**2):
[-4.63090189, 21.4452523 ],
[ 0.23522634, 0.05533143],
The linear regression model (with polynomial features):
reg.fit(X_poly, y)
reg.intercept_, reg.coef_ #--> 4.84, 3.04, 1.01
The fitted intercept and coefficients (4.84, 3.04, 1.01) are close to the true values used to generate y (5, 3, 1).
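As a rough check of how much the polynomial features help, the sketch below regenerates data the same way and compares R² on the training data for a plain linear fit and a fit on the expanded features. The fixed seed, the use of np.random.default_rng, and the variable names lin and quad are assumptions added here for reproducibility and clarity, so the printed numbers will differ from the ones quoted above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only
m = 100
X = 9 * rng.random((m, 1)) - 7
y = X**2 + 3 * X + 5 + rng.standard_normal((m, 1))

# Plain linear model: a straight line through quadratic data.
lin = LinearRegression().fit(X, y)
r2_linear = lin.score(X, y)

# Same estimator, but trained on the expanded features [X, X**2].
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
quad = LinearRegression().fit(X_poly, y)
r2_poly = quad.score(X_poly, y)

print("R^2 without polynomial features:", r2_linear)
print("R^2 with polynomial features:", r2_poly)
print("intercept and coefficients:", quad.intercept_, quad.coef_)
```

Training-set R² only shows that the quadratic model fits this sample better; it says nothing about overfitting, which needs held-out data or cross-validation.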
This trick has many applications in machine learning (such as Support Vector Machines). However, polynomial features can cause overfitting, especially at high degrees. One solution is to use grid search with cross-validation to pick the degree passed to PolynomialFeatures.
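One possible way to set up such a search is sketched below. It is not the implementation this post links to. It is a minimal example that assumes a scikit-learn Pipeline, 5-fold cross-validation, mean squared error as the score, and a candidate range of degrees 1 to 6, all of which are choices made here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only
m = 100
X = 9 * rng.random((m, 1)) - 7
y = (X**2 + 3 * X + 5 + rng.standard_normal((m, 1))).ravel()

# Chain the feature expansion and the regressor so the degree can be tuned.
pipeline = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("reg", LinearRegression()),
])

# Candidate degrees (illustrative range); step name + "__" + parameter name.
param_grid = {"poly__degree": [1, 2, 3, 4, 5, 6]}

search = GridSearchCV(pipeline, param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

best_degree = search.best_params_["poly__degree"]
print("best degree:", best_degree)
print("cross-validated MSE:", -search.best_score_)
```

Because the score is computed on held-out folds, degrees that only fit the noise are penalized, which is what keeps the search from always choosing the largest degree.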
Bonus: The grid search implementation is in the link below: