Build Your Own Scikit-learn Transformer

Aug 15, 2021

Wrapping a preprocessing step in a pipeline lets you treat that step as a tunable hyperparameter during grid search. To do this, we write the step as a custom transformer class.
The custom transformer class should inherit from both BaseEstimator and TransformerMixin. BaseEstimator supplies get_params and set_params, which grid search relies on, and TransformerMixin supplies fit_transform.
In this example, we use the Titanic data set. We aim to create a new feature, family size, by adding the siblings/spouses column (sibsp) to the parents/children column (parch).

Let’s look at the following class:

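import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class CustomTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, combine_sibsp_parch=True):
        self.combine_sibsp_parch = combine_sibsp_parch

    def fit(self, X, y=None):
        # Nothing to learn from the data, so fit is a no-op.
        return self

    def transform(self, X):
        if self.combine_sibsp_parch:
            # Append family size (sibsp + parch) as a new last column.
            return np.c_[X, X.sibsp + X.parch]
        else:
            return X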

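def __init__()
The initialization method stores the parameter combine_sibsp_parch, which switches the new feature on or off.
def fit()
The fit method has nothing to learn from the data, so it simply returns the instance.
def transform()
The transform method is where the transformation happens. In this case, we add the two columns and append the sum as a new column after the existing ones. Because np.c_ returns a NumPy array, the result is indexed by position rather than by column name. However, if combine_sibsp_parch = False, we return the data unaltered.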

np.c_[X, X.sibsp + X.parch]
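To make the two branches of transform concrete, here is a minimal sketch that runs the transformer on a tiny hand-made DataFrame. The toy values and the variable names toy, with_family, and unchanged are assumptions made for illustration; they are not taken from the Titanic data or from the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class CustomTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, combine_sibsp_parch=True):
        self.combine_sibsp_parch = combine_sibsp_parch

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if self.combine_sibsp_parch:
            return np.c_[X, X.sibsp + X.parch]
        return X


# Hypothetical toy rows, not the real Titanic data.
toy = pd.DataFrame({"age": [22, 38, 26], "sibsp": [1, 1, 0], "parch": [0, 2, 0]})

# fit_transform is inherited from TransformerMixin.
with_family = CustomTransformer(combine_sibsp_parch=True).fit_transform(toy)
unchanged = CustomTransformer(combine_sibsp_parch=False).fit_transform(toy)

print(with_family.shape)   # (3, 4): one extra column
print(with_family[:, -1])  # [1 3 0]: sibsp + parch per row
print(unchanged.shape)     # (3, 3): data returned as-is

# get_params is inherited from BaseEstimator.
print(CustomTransformer().get_params())  # {'combine_sibsp_parch': True}
```

With the flag on, the result is a NumPy array with the sum in the last column; with the flag off, the original DataFrame comes back untouched.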

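Now, we create a pipeline and call fit_transform on X_train. Fitting is a no-op for this transformer, but newer scikit-learn versions may complain when a pipeline is used before it has been fitted. We can see that the number of columns went from 13 to 14.

from sklearn.pipeline import make_pipeline

pipe = make_pipeline(CustomTransformer(combine_sibsp_parch=True))
titanic_tr = pipe.fit_transform(X_train)
X_train.shape       # => (1047, 13)
titanic_tr.shape    # => (1047, 14)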

Finally, we verify the result: the sum of columns 4 and 5 (sibsp and parch in this data) should equal the new last column.

check_res = titanic_tr[:, 4] + titanic_tr[:, 5] == titanic_tr[:, -1]

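The length of check_res is 1047. Using alien technology (functools.reduce), we can reduce the vector to a single value that is True only if every comparison is True. The lambda uses and; comparing with == would not work, because False == False is True, so an even number of mismatches would go unnoticed. Calling check_res.all() gives the same answer.

from functools import reduce

reduce(lambda a, b: a and b, check_res)    # => True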
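The opening claim, that the preprocessing step can be tuned during grid search, can be sketched as follows. This is an illustration under stated assumptions rather than the author's code: the data is synthetic with random labels, and the LogisticRegression step, the names rng, param_grid, and search, and the cv=3 setting are all choices made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline


class CustomTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, combine_sibsp_parch=True):
        self.combine_sibsp_parch = combine_sibsp_parch

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if self.combine_sibsp_parch:
            return np.c_[X, X.sibsp + X.parch]
        return X


# Synthetic stand-in for the Titanic features (hypothetical values).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(1, 80, size=200),
    "sibsp": rng.integers(0, 5, size=200),
    "parch": rng.integers(0, 4, size=200),
})
y = rng.integers(0, 2, size=200)  # random labels, for illustration only

pipe = make_pipeline(CustomTransformer(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name,
# so the flag is addressed as <step name>__<parameter name>.
param_grid = {"customtransformer__combine_sibsp_parch": [True, False]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

# Both settings of the flag were evaluated; best_params_ holds the winner.
print(search.cv_results_["params"])
print(search.best_params_)
```

Because the labels here are random, the winning setting carries no meaning; the point is only that grid search can switch the feature on and off like any other hyperparameter.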

You can find the complete code here:

https://github.com/booletic/medium/blob/89db8b01af267255f1ffad125892ff076c871152/mixin.ipynb

Written by Mansoor Aldosari