Driverless AI uses an inherently random genetic algorithm during feature evolution to find the best set of features and model parameters. In addition, early stopping is enabled by default, which can add to the difference in behavior across iterations.

If no validation dataset is provided, Driverless AI splits the training data internally to create a validation holdout. For accuracy <= 7, a single holdout split is used, and a “lucky” or “unlucky” split can bias the performance estimates for small datasets or datasets with high variance. If a validation dataset is provided, all performance estimates are based solely on that entire validation dataset, independent of the accuracy setting.
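
The effect of a "lucky" or "unlucky" split can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library, not Driverless AI code: the function names, the 20% holdout fraction, and the 0.8 "true" accuracy are assumptions chosen for illustration. It scores a hypothetical fixed model on many random holdout splits and reports how widely the holdout estimate varies for a small and a large dataset.

```python
import random
import statistics


def holdout_scores(correct, holdout_fraction, n_splits, seed):
    """Accuracy measured on a random holdout, repeated for n_splits different splits."""
    rng = random.Random(seed)
    n_holdout = max(1, int(len(correct) * holdout_fraction))
    scores = []
    for _ in range(n_splits):
        # Draw one random holdout (without replacement) and score it.
        holdout = rng.sample(correct, n_holdout)
        scores.append(sum(holdout) / n_holdout)
    return scores


def simulate(n_rows, true_accuracy=0.8, seed=0):
    """Return (min, max, standard deviation) of single-holdout accuracy estimates."""
    rng = random.Random(seed)
    # 1 = the hypothetical model predicted this row correctly, 0 = it did not.
    correct = [1 if rng.random() < true_accuracy else 0 for _ in range(n_rows)]
    scores = holdout_scores(correct, holdout_fraction=0.2, n_splits=200, seed=seed + 1)
    return min(scores), max(scores), statistics.pstdev(scores)


for n_rows in (100, 10_000):
    lowest, highest, spread = simulate(n_rows)
    print(f"rows={n_rows:>6}  min={lowest:.3f}  max={highest:.3f}  stdev={spread:.4f}")
```

In this sketch the model never changes, so all of the variation in the printed scores comes from which rows happen to land in the holdout. The spread is much wider for the small dataset, which is why a single split can make one run look better or worse than another.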

To reduce the likelihood that the final model's performance appears worse than in previous iterations, consider the following recommendations:

  • Increase accuracy settings
  • Provide a validation dataset
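
One way to prepare a validation dataset is to split it off from your data once, with a fixed seed, and reuse the same split for every run so that scores are measured on the same rows each time. The following is a minimal sketch in plain Python, not part of Driverless AI: the function name `split_train_validation`, the 20% fraction, and the seed are hypothetical choices. A random split like this assumes the rows are independent and is generally not appropriate for time-ordered data.

```python
import random


def split_train_validation(rows, validation_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and split off a validation set."""
    shuffled = list(rows)
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# Illustrative rows; in practice these would be read from your own data file.
rows = [{"id": i, "target": i % 2} for i in range(10)]
train, valid = split_train_validation(rows)
print(len(train), "training rows,", len(valid), "validation rows")
```

The two resulting sets can then be saved as separate files and supplied as the training and validation datasets.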

If you think your model performance is still poor and very different from your previous iterations, please reach out to the support team and provide experiment logs for the two runs, so that we can help address the issue.