sssched is a program that launches N tasks on M computers through a tunnel. It is designed to work in a Unix environment. It is very useful for launching many number-crunching experiments on a cluster of machines, assuming that those machines are accessible through ssh. This program was written in Python by Christian Gagne and myself.
Download the script here
To use the script, first give it execution rights with the following command:
chmod 744 sssched
You need a file listing the machines you want to use; let's call that file machines.lst. It contains one machine name or IP address per line, and empty lines are skipped. Here is an example of such a file:
uber-computer-01
uber-computer-02
uber-computer-03
18.104.22.168
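The file format described above (one entry per line, empty lines skipped) is simple enough to read in a few lines of Python. The following is only a minimal sketch of such a reader, not the code sssched actually uses; the function name `read_list_file` is a hypothetical choice made for this illustration.

```python
def read_list_file(path):
    """Return the non-empty lines of a list file, without surrounding whitespace.

    Hypothetical helper for illustration; sssched's own parser may differ.
    """
    with open(path) as handle:
        # Lines that are empty (or contain only whitespace) are skipped.
        return [line.strip() for line in handle if line.strip()]
```

Calling `read_list_file("machines.lst")` on the example file would give a list of four entries, in the order in which they appear in the file.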
You also need a file listing the commands you want to execute; let's call that file tasks.lst. It contains one command per line, and empty lines are skipped. For example:
launchxp.sh mutation=0.1 out=run.1.out
launchxp.sh mutation=0.1 out=run.2.out
launchxp.sh mutation=0.1 out=run.3.out
launchxp.sh mutation=0.1 out=run.4.out
launchxp.sh mutation=0.9 out=run.5.out
launchxp.sh mutation=0.9 out=run.6.out
launchxp.sh mutation=0.9 out=run.7.out
launchxp.sh mutation=0.9 out=run.8.out
Here, the commands are calls to a launch script with parameters, the typical case for some experiments.
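Since a task list like this one follows a regular pattern, it can be generated rather than typed by hand. The sketch below is one possible way to do it in Python; `make_tasks` and its parameters are hypothetical names introduced for this example, and only `launchxp.sh`, `mutation=` and `out=` come from the example file.

```python
def make_tasks(mutation_rates, runs_per_rate, script="launchxp.sh"):
    """Build one command line per run, numbering the output files from 1.

    Hypothetical helper: the parameter names are assumptions made for
    this sketch, not part of sssched.
    """
    tasks = []
    run_id = 1
    for rate in mutation_rates:
        for _ in range(runs_per_rate):
            tasks.append(f"{script} mutation={rate} out=run.{run_id}.out")
            run_id += 1
    return tasks


# Four runs for each of the two mutation rates used in the example.
tasks = make_tasks([0.1, 0.9], runs_per_rate=4)
print("\n".join(tasks))
```

Redirecting the output of this script to tasks.lst gives a file in the expected format, with one command per line.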
To launch all those commands by scheduling them on the set of machines you specified, just run the following command:
./sssched -m machines.lst -c tasks.lst
or, in a more verbose fashion
./sssched --machines=machines.lst --commands=tasks.lst
If you have a momentary lapse of memory about the command line, ask for some help
- The tasks are scheduled on the machines from the list. As soon as a machine is free, a new task is launched on it, so no time is wasted when the execution time of your experiments varies.
- sssched always runs with a very low process priority, so it is unlikely to slow down your computer.
- sssched detects when a machine is not reachable.
- To use a machine N times, just put its name or IP address N times in the machines list.
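The scheduling behaviour described in these notes can be sketched with a shared queue of tasks and one worker thread per entry of the machines list. This is only an illustration of the idea under stated assumptions, not the actual implementation of sssched: the names `schedule` and `run_over_ssh` are hypothetical, the plain `ssh machine command` form is assumed, and process priority and recovery from unreachable machines are left out.

```python
import queue
import subprocess
import threading


def run_over_ssh(machine, command):
    """Run one command on a machine through ssh and return its exit status.

    Assumption: the plain `ssh host command` form is enough. ssh itself
    exits with status 255 when an error occurred on its side, for
    instance when the host cannot be reached.
    """
    return subprocess.call(["ssh", machine, command])


def schedule(machines, tasks, run=run_over_ssh):
    """Dispatch tasks to machines and return (machine, command, status) tuples."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    lock = threading.Lock()

    def worker(machine):
        # A worker takes the next pending task as soon as its machine is free.
        while True:
            try:
                command = pending.get_nowait()
            except queue.Empty:
                return
            status = run(machine, command)
            with lock:
                results.append((machine, command, status))

    # A machine listed N times gets N workers, hence up to N concurrent tasks.
    threads = [threading.Thread(target=worker, args=(m,)) for m in machines]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because every worker pulls from the same queue, a machine that finishes early simply picks up the next task, which is what keeps all machines busy when run times differ. The `run` parameter makes it possible to try the scheduler with a stand-in function instead of real ssh calls.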