Text Binary Count Transformer Recipe

This recipe creates simple text features and flags records when a specific token exists in the text.  The goal is to use Driverless AI to build an interpretable model.

To use the recipe

Step 1: Go to Expert Settings when setting up the experiment and click on the Recipes tab.  Then click on the "Include Specific Transformers" drop down menu.

Step 2: Select the custom recipe transformer called "TextBinaryCountTransformer" and turn off all other Text transformers (deselect anything that starts with the word "Text").

Step 3: Save the settings.  The Experiment preview on the left-hand side should update, and the only Text transformer it lists should be the "TextBinaryCountTransformer".

Step 4: Run the experiment.

This will result in an experiment whose text features are simple: each one records whether a word exists in the text or not.  If you choose to interpret this model, you will be able to use tools such as Shapley values to determine which words are important and how they affect the prediction.
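The kind of feature described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the recipe's actual implementation: the `tokenize` and `binary_token_features` helpers, the tokenization rule, and the example texts are all assumptions made for this sketch.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def binary_token_features(texts, vocabulary):
    """Return one dict per text with a 0/1 flag for each vocabulary token."""
    rows = []
    for text in texts:
        present = set(tokenize(text))
        rows.append({token: int(token in present) for token in vocabulary})
    return rows

# Made-up example texts, for illustration only.
reviews = [
    "Great coffee, great price",
    "The packaging was smashed",
]
features = binary_token_features(reviews, ["great", "smashed", "coffee"])
print(features)
```

Each flag is 1 if the token appears at least once and 0 otherwise, so a word repeated in the same text still yields 1. That is what keeps the features easy to read when interpreting the model.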



The `NLP LOCO means per Column` recipe is an interpretability recipe used to interpret the results of a Driverless AI model. The recipe calculates the average effect a token has on the prediction across all the records that contain that token.  For example: if the word "great" were removed from every row where "great" appears in the text, what would be the average effect on the prediction? Note: this is the signed average, not the average of absolute values, so each result has a sign that indicates the direction of the effect.
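The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the recipe's code: `loco_mean`, `normalize`, and `toy_predict` are hypothetical names, the toy model with fixed word weights stands in for a trained Driverless AI model, and the difference is taken as "prediction with the token minus prediction without it" so that a positive value means the token raises the prediction.

```python
def normalize(word):
    """Lowercase a word and strip common trailing punctuation."""
    return word.lower().strip(".,!?")

def loco_mean(texts, token, predict):
    """Signed average change in prediction when `token` is removed.

    Only rows that contain the token contribute to the average.
    """
    diffs = []
    for text in texts:
        words = text.split()
        kept = [w for w in words if normalize(w) != token]
        if len(kept) == len(words):
            continue  # token not present in this row
        diffs.append(predict(text) - predict(" ".join(kept)))
    return sum(diffs) / len(diffs) if diffs else 0.0

# Hypothetical stand-in for a trained model: a fixed weight per word,
# counted once per text regardless of how often the word repeats.
WEIGHTS = {"great": 0.5, "smashed": -0.75}

def toy_predict(text):
    words = {normalize(w) for w in text.split()}
    return sum(WEIGHTS.get(w, 0.0) for w in words)

# Made-up example texts, for illustration only.
texts = [
    "Great coffee, great price",
    "Great taste but smashed box",
    "Smashed packaging",
]
great_effect = loco_mean(texts, "great", toy_predict)
smashed_effect = loco_mean(texts, "smashed", toy_predict)
print(great_effect, smashed_effect)
```

Because the differences are averaged with their signs intact, a token that pushes predictions up in some rows and down in others can end up with an average near zero.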

To use the recipe

Step 1: Click on the MLI button at the top left of the screen.

Step 2: Click the "New Interpretation" button.

Step 3: Select the model and dataset used for your text experiment.

Step 4: Click on the recipes button and select only `NLP LOCO means per Column`.

Step 5: Click on the wheel button at the right to modify any expert settings.  This is where you can choose how many n-grams to analyze.

Step 6: Click Done and Launch MLI.

This will generate a bar chart that shows the average LOCO effect of each token.  Tokens with positive values indicate that the existence of the word in the text increases the prediction. Tokens with negative values indicate that the existence of the word in the text decreases the prediction.

The image below shows the results with bigrams and trigrams enabled.  We can see that the phrase `best variety` has the greatest positive impact on the prediction on average.

If we go to the last page of results, we will see the words and phrases that had the largest negative impact on the prediction.  In this case, the phrase `smashed packaging` has the greatest negative impact on the prediction on average.
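Phrases such as `best variety` and `smashed packaging` are bigrams, meaning two consecutive words treated as one token; trigrams are the three-word equivalent. The sketch below shows one simple way such n-grams can be extracted. The `ngrams` helper and the example sentence are assumptions for illustration, and the recipe may tokenize text differently.

```python
def ngrams(text, n):
    """Return all runs of n consecutive lowercase words in the text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Made-up example sentence, for illustration only.
sentence = "the best variety of tea"
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)
print(bigrams)
print(trigrams)
```

A text with fewer than n words simply yields no n-grams of that size.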