# Fit Nonlinear Data with a Linear Model!

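Fitting nonlinear data with a linear model is a technique called Polynomial Regression. The idea is to add powers of each feature as new features and then train a linear model on the extended feature set. The model is still linear in its coefficients, but the extra features give it enough flexibility to follow a curve.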
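A model can be quadratic in its input and still linear in its coefficients. The sketch below illustrates this with plain NumPy least squares. The toy data, the seed, and the variable names are assumptions made for illustration, not part of the walkthrough that follows.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Hypothetical toy data: a noisy quadratic.
x = 9 * rng.random(100) - 7
y = x**2 + 3 * x + 5 + rng.standard_normal(100)

# Design matrix with columns [1, x, x^2]. The model y ~ A @ w is linear in w,
# even though it is quadratic in x.
A = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares solves for the three coefficients directly.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)  # roughly [5, 3, 1], up to noise
```

Because the squared column is just another feature, any linear solver can fit the curve without modification.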

First, we generate the data (note that **y** is a quadratic function of **X**):

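`import numpy as np`

`from sklearn.linear_model import LinearRegression`

`from sklearn.preprocessing import PolynomialFeatures`

`m = 100`

`X = 9 * np.random.rand(m, 1) - 7`

`y = X**2 + 3*X + 5 + np.random.randn(m, 1)`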

The linear regression model (*without* Polynomial features):

`reg = LinearRegression()`

`reg.fit(X, y)`

Adding polynomial features (**X** → **X**, **X²**):

`poly = PolynomialFeatures(degree=2, include_bias=False)`

`X_poly = poly.fit_transform(X)`

The first five samples from **X**:

`>>> X[:5]`

`array([[-0.63502308],`

`[-6.87887923],`

`[-4.63090189],`

`[ 0.23522634],`

`[-5.11050991]])`

The first five samples from **X_poly**:

`>>> X_poly[:5]`

`array([[-0.63502308, 0.40325431],`

`[-6.87887923, 47.31897949],`

`[-4.63090189, 21.4452523 ],`

`[ 0.23522634, 0.05533143],`

`[-5.11050991, 26.11731159]])`

The linear regression model (*with* Polynomial features):

`reg.fit(X_poly, y)`

`reg.intercept_, reg.coef_  # --> approximately 4.84 (intercept), 3.04 and 1.01 (coefficients)`

The fitted intercept and coefficients are close to the values used to generate **y** (5, 3, and 1).
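One way to see how much the squared feature helps is to compare the R² score of the two models on the training data. The sketch below regenerates data following the same recipe, but with a fixed seed (an assumption added for reproducibility), so its numbers will differ from the ones shown in this post. The names `plain` and `quadratic` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)  # assumed seed, only for reproducibility
m = 100
X = 9 * rng.random((m, 1)) - 7
y = X**2 + 3 * X + 5 + rng.standard_normal((m, 1))

# Model 1: straight line fitted on the raw feature.
plain = LinearRegression().fit(X, y)

# Model 2: same estimator, fitted on [X, X^2].
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
quadratic = LinearRegression().fit(X_poly, y)

# score() returns the coefficient of determination (R^2).
r2_plain = plain.score(X, y)
r2_quadratic = quadratic.score(X_poly, y)
print(f"R^2 without polynomial features: {r2_plain:.3f}")
print(f"R^2 with polynomial features:    {r2_quadratic:.3f}")
```

Training-set R² can only go up when features are added, so it shows the improvement in fit but says nothing about overfitting.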

This trick has many applications in machine learning (such as Support Vector Machines). However, high-degree polynomial features can cause overfitting. One remedy is to use grid search with cross-validation to pick the degree passed to `PolynomialFeatures`.
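A minimal sketch of such a search is shown here. It is not necessarily the implementation the author refers to. The pipeline step names (`poly`, `reg`), the candidate degrees, the seed, and the choice of five folds are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility
m = 100
X = 9 * rng.random((m, 1)) - 7
y = (X**2 + 3 * X + 5 + rng.standard_normal((m, 1))).ravel()

# Chain the feature expansion and the regressor so that the degree
# becomes a tunable parameter of a single estimator.
pipeline = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("reg", LinearRegression()),
])

# Candidate degrees to try; each is scored with 5-fold cross-validation.
param_grid = {"poly__degree": [1, 2, 3, 4, 5, 6]}
search = GridSearchCV(pipeline, param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

best_degree = search.best_params_["poly__degree"]
print("Best degree:", best_degree)
```

Scoring on held-out folds rather than on the training data is what lets the search penalize degrees that merely fit the noise.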

**Bonus**: The grid search implementation is in the link below: