Goal

The goal of this document is to show how to get an Individual Recipe from an experiment and why you would want to use it.


Overview


Since Driverless AI 1.10.2, you can generate an Individual Recipe from an experiment.  This produces auto-generated, editable Python code for the final model selected in the experiment.


To create the Individual Recipe, click on Tune Experiment -> Create Individual Recipe -> Upload As Custom Recipe


You can see these steps being done below:




Once the recipe is uploaded, you can edit the Python code and click "Save as New Recipe and Activate".  This tests the recipe for soundness and makes it available in the Recipes menu.


If you build a new experiment with this recipe used as the Individual Recipe, no experiment evolution will occur.  Instead the final model will be rebuilt (with any customizations done by editing the code).  Note: If the original experiment generated a MOJO, the experiment using the recipe will also generate a MOJO.


Here is an example of using an Individual Recipe in an experiment:



Note: In 1.10.3, there will be a button from the Recipe view to Create an Experiment from that Recipe.


Uses


In this section, we will cover why someone would want to have this auto-generated Python code.


Transparency

The auto-generated Python code gives a much greater level of transparency into the final model.  The user will see the final model and all the model parameters of that model.  


See this by looking for the set_model function:




They will also see each feature that is in the model and information on how it was transformed.


See this by looking for the set_genes function:




Model Control


The set_model function provides information on the algorithm and parameters of the final model.  To make small modifications to the parameters, the end user can edit the self.model_params dictionary.  This is helpful for checking whether slight changes produce a more robust or more accurate model, or when the model parameters must be changed for business or regulatory reasons.
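The kind of edit described above can be sketched as follows.  This is only an illustration: the class IndividualSketch is a hypothetical stand-in for the generated recipe class, and the parameter names and values are assumptions, not the contents of a real generated recipe.

```python
class IndividualSketch:
    """Hypothetical stand-in for the generated recipe class (not the Driverless AI API)."""

    def set_model(self):
        # In a generated recipe this dictionary is filled in by Driverless AI.
        # The keys and values below are illustrative assumptions only.
        self.model_params = {
            "learning_rate": 0.05,
            "max_depth": 8,
            "n_estimators": 500,
        }
        # Manual edit: request shallower trees for a simpler model.
        self.model_params["max_depth"] = 4


individual = IndividualSketch()
individual.set_model()
print(individual.model_params)
```

In a real recipe, the edit is made directly on the values that Driverless AI wrote into the dictionary, and the recipe is then saved and activated so that a new experiment rebuilds the model with the changed parameters.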



Feature Control

The set_genes function provides information on the features that went into the final model.  Each feature used in the model is listed in the function, starting with the features that were not engineered, followed by the engineered features.


Original features are denoted by the OriginalTransformer as shown below.  These will show up first.

# Gene Normalized Importance: 0.62291
# Transformed Feature Names and Importances: {'0_Age': 0.6229130029678345}
# Valid parameters: ['num_cols', 'random_state', 'output_features_to_drop', 'labels']
params = {'num_cols': ['Age'], 'random_state': 730763716}
self.add_transformer('OriginalTransformer', col_type='numeric', gene_index=0, forced=False, mono=False, **params)


Engineered features will be below the original features.  Here is an example of an engineered feature where Target Encoding was applied to the columns: NumCompaniesWorked and YearsWithCurrManager.


# Gene Normalized Importance: 0.17195
# Transformed Feature Names and Importances: {'31_NumToCatTE:NumCompaniesWorked:YearsWithCurrManager.0': 0.17194800078868866}
# Valid parameters: ['num_cols', 'bins', 'num_folds', 'cv_type', 'inflection_point', 'steepness', 'min_rows', 'multi_class', 'random_state', 'output_features_to_drop', 'labels']
# Allowed parameters and mutations (first mutation in list is default): {'bins': [25, 10, 100, 250], 'num_folds': [5], 'random_state': [42], 'cv_type': ['KFold'], 'inflection_point': [10, 20, 100], 'steepness': [3, 1, 5, 10], 'min_rows': [10, None, 20, 100], 'multi_class': [False]}

params = {'bins': 100,
          'cv_type': 'KFold',
          'inflection_point': 20,
          'min_rows': 10,
          'multi_class': False,
          'num_cols': ['NumCompaniesWorked', 'YearsWithCurrManager'],
          'num_folds': 5,
          'random_state': 3582695529,
          'steepness': 1}
self.add_transformer('NumToCatTETransformer', col_type='numeric', gene_index=31, forced=False, mono=False, **params)

The end user may want to delete a feature, add a new feature, or modify an existing feature.  The reasons for these steps and how to carry them out are discussed below.


Deleting Features


There is a Drop Columns setting in the Experiment setup which allows you to drop columns from an experiment so they are not used by any model.  A user, however, may want to use a column but may not like how it was engineered by Driverless AI.


For example, I may want Driverless AI to use the column: JobLevel but I do not like the feature JobLevel divided by YearsWithCurrManager.  This may not make business sense, it may not be approved from a regulatory perspective, or it may be viewed as unnecessarily complex.


In this case, I can simply delete the feature I do not like from the editable Python code.


Adding Features


During the experiment, Driverless AI uses a Genetic Algorithm to determine which features should be dropped from the model.  A user, however, may want to force a column to be used by the model for business reasons.  


Here are instructions for how to do this:


If you want to force in a numeric column that was dropped by Driverless AI:

  1. Copy an OriginalTransformer feature that is already in the code and paste it below
  2. Change the 'num_cols' to the column you want to force in.  
    • In the example below, Driverless AI dropped 'YearsSinceLastPromotion' so I copied an OriginalTransformer example that was already there and edited the 'num_cols' value.
  3. Change the forced parameter to True - this means the model has to use the feature.
  4. Change the gene_index to some high number.
    • The gene_index of each feature needs to be unique.  The gene_index of the features in the code starts at 0 and goes up sequentially.  You need to make sure that your new feature has a gene_index that is not the same as any other gene in the code.

Here is an example of what my final code will look like:

params = {'num_cols': ['YearsSinceLastPromotion'], 'random_state': 730763716}
self.add_transformer('OriginalTransformer', col_type='numeric', gene_index=100, forced=True, mono=False, **params)
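Because every gene_index must be unique, it can help to check the recipe text before picking a new value.  The helpers below are a minimal sketch and not part of Driverless AI: the function names are hypothetical, and they assume the recipe code writes each index literally as gene_index=<number>, as in the examples in this document.

```python
import re
from collections import Counter

# Matches literal assignments such as "gene_index=31" in the recipe text.
GENE_INDEX_PATTERN = re.compile(r"gene_index\s*=\s*(\d+)")


def gene_indices(recipe_source):
    """Return every gene_index value found in the recipe source, in order."""
    return [int(value) for value in GENE_INDEX_PATTERN.findall(recipe_source)]


def next_free_gene_index(recipe_source):
    """Return one more than the largest gene_index in use (0 if none are found)."""
    indices = gene_indices(recipe_source)
    return max(indices) + 1 if indices else 0


def duplicate_gene_indices(recipe_source):
    """Return the sorted gene_index values that appear more than once."""
    counts = Counter(gene_indices(recipe_source))
    return sorted(index for index, count in counts.items() if count > 1)


sample = """
self.add_transformer('OriginalTransformer', col_type='numeric', gene_index=0, forced=False, mono=False, **params)
self.add_transformer('NumToCatTETransformer', col_type='numeric', gene_index=31, forced=False, mono=False, **params)
"""

print(next_free_gene_index(sample))
print(duplicate_gene_indices(sample))
```

Any unused number works, so choosing a high value such as 100 by hand is equally valid.  The check is mainly useful after several manual edits, when a copied gene may still carry the index of the gene it was copied from.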


Modifying Features


Driverless AI automatically creates engineered features.  Each engineered feature has a list of parameters that are specific to its transformer.  The end user can modify these, but they are internal parameters, so H2O guidance is needed on how and why to change them.


There is an option to enforce monotonicity with the mono parameter, which constrains the direction of the relationship between a feature and the prediction.  For example, mono=1 means the prediction must be monotonically increasing in that feature: as the feature value rises, the prediction never decreases.
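As a sketch of where the mono value is set, the Age gene shown earlier can be written with mono changed from False to 1.  The class RecipeSketch below is a hypothetical recorder used only so that the snippet runs on its own; in a real recipe, add_transformer is provided by Driverless AI and only the mono argument would be edited.

```python
class RecipeSketch:
    """Hypothetical recorder standing in for the generated recipe class."""

    def __init__(self):
        self.genes = []

    def add_transformer(self, name, **kwargs):
        # Only records the call; the real method is supplied by Driverless AI.
        self.genes.append((name, kwargs))

    def set_genes(self):
        # Same Age gene as in this document, with mono changed from False to 1.
        params = {'num_cols': ['Age'], 'random_state': 730763716}
        self.add_transformer('OriginalTransformer', col_type='numeric',
                             gene_index=0, forced=False, mono=1, **params)


recipe = RecipeSketch()
recipe.set_genes()
name, kwargs = recipe.genes[0]
print(name, kwargs['num_cols'], kwargs['mono'])
```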