Running parallel simulations

Where the underlying simulator supports distributed simulations, in which the computations are spread over multiple processors using MPI (as is the case for NEURON, NEST and PCSIM), PyNN supports this too. To run a distributed simulation on eight nodes:

$ mpirun -np 8 -machinefile ~/mpi_hosts python myscript.py

Depending on which MPI implementation you have, mpirun may be replaced by mpiexec or another command, and the options may also differ somewhat.
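
For example, with an MPICH-style launcher the equivalent command might look something like the following (the name of the host-file option varies between implementations; check the output of mpiexec --help):

$ mpiexec -n 8 -f ~/mpi_hosts python myscript.py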

For NEURON only, you can also run distributed simulations using nrniv instead of the python executable:

$ mpirun -np 8 -machinefile ~/mpi_hosts nrniv -python -mpi myscript.py

Additional requirements

First, make sure you have compiled the simulators you wish to use with MPI enabled. There is usually a configure flag called something like “--with-mpi” to do this, but see the installation documentation for each simulator for details.
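
As a rough sketch (the exact flag name and build system are simulator-specific; for example, autotools-based NEST builds have used --with-mpi, while NEURON has used --with-paranrn), a source build with MPI support looks something like:

$ ./configure --with-mpi
$ make
$ make install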

If you wish to use the default “gather” feature (see below), which automatically gathers output data from all the nodes to the master node (the one on which you launched the simulation), you will need to install the mpi4py module (see http://mpi4py.scipy.org/ for downloads and documentation). Installation is usually straightforward; however, if you have more than one MPI implementation installed on your system (e.g. Open MPI and MPICH2), be sure to build mpi4py with the same MPI implementation that you used to build the simulator.
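
One quick way to check that mpi4py is installed and linked against the same MPI implementation as your launcher is to print the MPI rank from a couple of processes; each process should report a different rank:

$ mpirun -np 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"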

Code modifications

In most cases, no modifications to your code should be necessary to run in parallel. PyNN and the simulator take care of distributing the computations between nodes. Furthermore, the default settings should give results that are independent of the number of processors used, even when using random numbers.

Gathering data to the master node

The various methods of the Population and Projection classes that deal with accessing recorded data or writing it to disk, such as printSpikes(), getSpikes(), print_v(), etc., have an optional argument gather, which is True by default.

If gather is True, then data generated on other nodes is sent to the master node: printSpikes(), for example, will then create a single file, on the filesystem of the master node. If gather is False, each node writes a file on its local filesystem. This is often desirable if you wish to do distributed post-processing of the data. (If you are using a shared filesystem such as NFS, don’t worry: when gather is False the node ID is appended to the filename, so there is no risk of conflict between the different nodes.)
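
For example (a sketch, assuming a Population called pop that has been set up to record spikes):

>>> pop.printSpikes("spikes.dat")                 # gather=True by default: one file, on the master node
>>> pop.printSpikes("spikes.dat", gather=False)   # one file per node, node ID appended to the filename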

Random number generators

In general, we expect that our results should not depend on the number of processors used to produce them. If our simulations use random numbers in setting up or running the network, this means that each object that uses random numbers should receive the same sequence regardless of which node it is on or how many nodes there are. PyNN achieves this by ensuring the generator seed is the same on all nodes, then generating as many random numbers as would be used in the single-processor case and throwing away those that are not needed.

This obviously has a potential impact on performance, so it is possible to turn it off by passing parallel_safe=False as an argument when creating the random number generator, e.g.:

>>> import pyNN.neuron as sim
>>> np = sim.num_processes()
>>> node = sim.rank()
>>> from pyNN.random import NumpyRNG
>>> rng = NumpyRNG(seed=249856, parallel_safe=False,
...                rank=node, num_processes=np)

Now, PyNN will ensure the seed is different on each node, and will generate only as many numbers as are actually needed on each node.
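
Either way, the RNG is then used in the normal way. For example (a sketch, reusing the rng and sim objects from above; the distribution and connector parameters are arbitrary), to draw randomised connection delays:

>>> from pyNN.random import RandomDistribution
>>> delay_distr = RandomDistribution('uniform', [0.2, 2.0], rng=rng)
>>> connector = sim.FixedProbabilityConnector(0.1, delays=delay_distr)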

Note that the above applies only to the random number generators provided by the pyNN.random module, not to the native RNGs used internally by each simulator. This means that, if you care about your results being independent of the number of processors, you should for example prefer SpikeSourceArray (for which you can generate Poisson spike times using a parallel-safe RNG) to SpikeSourcePoisson, which uses the simulator’s internal RNG.
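
For example, one way to build a processor-count-independent Poisson source is to draw exponentially distributed inter-spike intervals from a parallel-safe RNG and pass the cumulative sums to a SpikeSourceArray (a sketch, reusing the sim and NumpyRNG names from above; the rate and duration are arbitrary):

>>> import numpy
>>> rate, t_stop = 20.0, 1000.0                  # spikes/s, duration in ms
>>> safe_rng = NumpyRNG(seed=249856)             # parallel_safe=True by default
>>> n_spikes = int(rate*t_stop/1000.0)
>>> isis = safe_rng.next(n_spikes, 'exponential', [1000.0/rate])  # mean ISI in ms
>>> spike_times = numpy.cumsum(isis)
>>> poisson_source = sim.Population(1, sim.SpikeSourceArray,
...                                 {'spike_times': spike_times})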