Goal:


This article is currently an internal document describing what the multi-node feature of Driverless AI 1.10.4 is and how it can be used. Driverless AI 1.10.4 is capable of running in multi-node mode. The feature is, however, undergoing internal testing and will not be marketed externally until that testing is complete. Once testing has officially been completed, the pertinent points from this document will be moved to the public-facing documentation.


This document is therefore to be used for:

  • help with internal testing
  • help for selected customers who need Driverless AI multi-node and are aware that it is still being tested internally
    • these customers must be approved by Product Management/Engineering before being told how to use Driverless AI multi-node

Single User Multi-Node Clusters

What:

Driverless AI 1.10.4 adds the ability to launch a single-user, multi-node Driverless AI cluster. At a high level, the cluster consists of a master node and worker nodes. The master node is a small, always-on instance. Worker nodes are created when experiments are run, and the work is delegated to them.


Why:

Spinning up workers only to do the "work" of running experiments has several benefits:

  • Reviewing insights from the leaderboard, MLI, diagnostics, etc. is done on the master node, which is much less expensive to run than a worker node
    • This means you do not pay the hourly cost of a large AWS instance when you are simply reviewing your MLI results
  • The user does not need to wait for an instance of Driverless AI to start up before seeing their results, because the server is always on
  • When it is time to run experiments or perform other intensive steps, the work is done by more expensive worker nodes, and once the work is done, those nodes are shut down
    • The results are sent to the master node, where the user can then review them
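
The cost argument above can be illustrated with a rough sketch. Everything in it is an assumption made for illustration: the function name, the hourly rates, and the hours are hypothetical placeholders, not real AWS prices or measured usage. It also assumes the alternative being compared is a single large instance that is left on all month.

```python
# Rough cost comparison. All rates and hours are hypothetical placeholders,
# not real AWS prices or measured Driverless AI usage.

def monthly_cost(master_rate, worker_rate, hours_in_month, experiment_hours):
    """Return (single_node_cost, multi_node_cost) for one month.

    single node: one large instance stays on for the whole month.
    multi-node:  a small master stays on all month, and a large worker
                 runs only while experiments are running.
    """
    single_node = worker_rate * hours_in_month
    multi_node = master_rate * hours_in_month + worker_rate * experiment_hours
    return single_node, multi_node


single, multi = monthly_cost(
    master_rate=0.25,      # hypothetical $/hour for the small master node
    worker_rate=3.0,       # hypothetical $/hour for a large worker node
    hours_in_month=720,    # 30 days * 24 hours
    experiment_hours=40,   # hypothetical hours spent running experiments
)
print(f"single node: ${single:.2f}, multi-node: ${multi:.2f}")
```

With these placeholder numbers the multi-node setup is cheaper because the expensive instance is billed only for experiment time. The gap narrows as experiment hours approach the full month.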


How:


You can turn on multi-node Driverless AI in Steam by doing the following:


TBD