Setting up CAM for distributed computing


CAM supports distributed computing for simulation experiments. This makes it possible to run large, computationally expensive experiments that would otherwise be impractical. (Note that this feature currently supports only simulation experiments, not standard Monte-Carlo simulations.)

It is quite easy to set up:

How to set up a simulation cluster

To set up a simulation cluster, you must decide which computers will be 'servers' (i.e., will run simulations) and which will be the client (i.e., the computer that you will use to configure and manage simulation jobs). Simulation jobs are coordinated via files in a shared folder, so all participating computers must have read and write access to the same shared folder. This mechanism is not intended for use with remote shared folders, because the protocol requires frequent file read and write operations.
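Because every participating computer needs read and write access to the shared folder, it can help to verify that access on each machine before starting a server. The following is a minimal sketch and is not part of CAM: the function name check_shared_folder and the probe file prefix are hypothetical, and the script only confirms that a small file can be created, read back, and deleted in the folder.

```python
import os
import tempfile


def check_shared_folder(path):
    """Return True if `path` is a folder this machine can write to and read from.

    Hypothetical helper, not part of CAM.
    """
    if not os.path.isdir(path):
        return False
    try:
        # Create a uniquely named probe file inside the shared folder.
        fd, probe = tempfile.mkstemp(prefix="cam_probe_", dir=path)
        try:
            with os.fdopen(fd, "w") as handle:
                handle.write("ok")
            # Read the probe back to confirm read access.
            with open(probe) as handle:
                return handle.read() == "ok"
        finally:
            # Always remove the probe so the folder is left unchanged.
            os.remove(probe)
    except OSError:
        return False
```

Run the check on each server and on the client, passing the folder path as that particular computer sees it, since the same folder may be visible under a different path on each machine.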

On each computer that you wish to be a server:

  • Make sure that the toolboxes and palettes for the simulation you wish to distribute are installed.
  • Start CAM normally.
  • Under the 'Tools' menu, select the option 'Start simulation server'.
  • Enter the name of the shared folder (as visible to that computer).
  • Press the 'start server' button.

How to distribute simulation experiments over a cluster

  • On the client machine, click the 'distribute over cluster' button, which appears after you configure and start a simulation experiment.
  • A dialog will appear, allowing you to enter the path to the shared folder (as visible to the client machine) and discover available servers.
  • After you click the 'Distribute job' button, the progress dialog will change to allow the progress of all machines in the cluster to be monitored. Note that it may take some seconds for the distributed job to start up.

Notes:

  • All computers must run the same version of CAM, including the same palettes and plugins.
  • Distributed computing carries a significant setup overhead, so it is only worthwhile to distribute an experiment that takes several minutes or more.
  • All computers will use almost all their processor power to run simulations, and cannot be used for other tasks at the same time.
  • All simulation data will be lost if any server or the client is interrupted, runs out of memory, or otherwise fails while running a distributed job.
  • The CAM server applications will occasionally need restarting between jobs, for instance if a job is interrupted. Restart a server when it no longer appears to respond when viewed from the client machine.
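The note on setup overhead can be illustrated with a rough calculation. The sketch below assumes an idealized model that is not taken from CAM: the work divides evenly across the servers and the setup overhead is a fixed number of seconds. Both function names and the overhead figure used in the example are hypothetical, and the real overhead would have to be measured on your own cluster.

```python
def distributed_time(single_machine_seconds, servers, overhead_seconds):
    """Idealized distributed run time: fixed overhead plus evenly divided work.

    Assumed model for illustration only, not CAM's actual behaviour.
    """
    if servers < 1:
        raise ValueError("at least one server is required")
    return overhead_seconds + single_machine_seconds / servers


def worth_distributing(single_machine_seconds, servers, overhead_seconds):
    """Return True if the idealized distributed run beats a single machine."""
    estimate = distributed_time(single_machine_seconds, servers, overhead_seconds)
    return estimate < single_machine_seconds
```

Under these assumptions, with an illustrative overhead of 60 seconds and 4 servers, a 600-second experiment would take about 210 seconds when distributed, whereas a 30-second experiment would take about 67.5 seconds and is better run on a single machine.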