A workflow handler can easily become a monster. We had to give ourselves some principles and limits:
Principles of lemmings
A) Lemmings must behave like any human user
It must do what a human user does, only faster. Indeed, each time we tried to do more, it felt like a layer of “black magic” to some of our beta testers. We will stick to “simple and stupid”.
B) Lemmings must stay non-intrusive
The whole package must be pip-installable by a user without root access. As a consequence, it must comply with the usual constraints for non-root users. In particular, it must not create any extra work for the Computer Support Group of the cluster hosting the simulations.
C) Lemmings must not replace pre-existing services
There are many useful services, like advanced accounting or monitoring, already made available by the computer support group. Lemmings must not become an alternative to them.
Limits of lemmings
Lemmings no longer allows restarts without resets
A soft workflow restart is, for example, when your workflow fails at step 4, you fix it and relaunch, and the logs restart from step 4. However, we found out the hard way that making a workflow compatible with these restarts was horrible, at best. As a side effect, some of our support sessions were hellish.
We eventually found that workflows relying on disk-based information, instead of a database and loop numbers, were much more resilient. In case of a crash, simply launch a new Lemmings run. This is the current best practice.
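As an illustration of what "disk-based information" can mean, here is a minimal sketch in which a workflow works out where it stands from marker files on disk rather than from a stored loop counter. The step names, the `.done` marker convention and both helper functions are hypothetical and are not part of the Lemmings API.

```python
from pathlib import Path

# Hypothetical step names, for illustration only.
STEPS = ["prepare", "run", "postprocess"]


def mark_done(run_dir: Path, step: str) -> None:
    """Record on disk that a step has finished by creating a marker file."""
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / f"{step}.done").touch()


def next_step(run_dir: Path, steps=STEPS):
    """Return the first step that has no marker file, or None if all are done."""
    for step in steps:
        if not (run_dir / f"{step}.done").exists():
            return step
    return None
```

With this kind of design, a freshly launched run only needs to look at the folder to decide what to do next, so nothing has to survive the crash except the files themselves.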
Lemmings will not tidy up your folders for you
When you use Lemmings, a lot of files accumulate in your folders. It is tempting to make Lemmings able to tidy up the folders, moving all these ugly files away from your line of sight … but:
HPC jobs are bound to crash. At those times, the support team needs your log files in the place where they expect to find them. If Lemmings moves or removes things on its own, only a Lemmings expert will know where to search.
It is therefore better practice to write your own “cleaner script”, and run it manually when needed.
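A possible shape for such a cleaner script is sketched below. The file patterns, the `archive` folder name and the `clean` function are assumptions chosen for illustration; adapt them to the files your own workflow produces. The sketch defaults to a dry run, so nothing moves until you explicitly ask for it.

```python
import shutil
from pathlib import Path

# Hypothetical patterns: replace them with the files your workflow leaves behind.
PATTERNS = ("*.log", "*.out", "*.err")


def clean(run_dir: Path, archive_name: str = "archive",
          patterns=PATTERNS, dry_run: bool = True) -> list:
    """Move files matching `patterns` from run_dir into run_dir/archive_name.

    Returns the sorted list of selected files (their original paths).
    With dry_run=True, files are only listed and nothing is moved.
    """
    # Only the top level of run_dir is scanned; sub-folders are left alone.
    selected = sorted(
        {p for pattern in patterns for p in run_dir.glob(pattern) if p.is_file()}
    )
    if not dry_run:
        archive = run_dir / archive_name
        archive.mkdir(exist_ok=True)
        for path in selected:
            shutil.move(str(path), str(archive / path.name))
    return selected
```

Calling `clean(Path("my_run"))` first lets you check the list of files, and `clean(Path("my_run"), dry_run=False)` then moves them. Because you run it by hand, the log files stay where the support team expects them until you decide otherwise.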
Lemmings is not a simulation monitoring system
A simulation monitoring system needs a persistent database, able to cope with errors and failures. To our knowledge, there is no way to ensure a persistent database without a close collaboration with the HPC resource support group, which would go against Principles B and C.
Lemmings is no longer a CPU-consumption limiter
Initially, before each launch, a Lemmings job asked the user how many hours they were ready to spend. We observed a rebound effect among some of our users: having this feature made them careless. The CPU limit was replaced by a systematic, nominative disclaimer at each command.
While Lemmings tries to provide you with some accounting information, these are indicative values only. All users have an official accounting protocol, and need to know how to monitor their allocation.