Note to self: ticket #1351, #1310, suite u-bs602.
The JULES model can be run in serial or in parallel on a computer platform (by which I mean a UNIX/Linux computer - see my UNIX Basics doc if you are not familiar with UNIX/Linux - for example one of the compute machines listed below). My tutorial JULES From Scratch is wholly focused on serial runs. In the world of parallel programming, there are currently three ways to do it:
(1) Multi-core (= using MPI)
(2) Multi-thread (= using OMP)
(3) Multi-core and multi-thread at the same time (see Fig. 1 right)
(n.b. interrupt-driven multi-tasking is not a form of parallelisation.)
From the user's point of view, many things change when moving from serial to parallel. Serial running is available on every platform, but not all platforms support parallel running, so here is a quick checklist to go through to make sure that moving to parallel is possible for you on the platform you are using:
CHECKLIST:
(i) Check whether your server has more than one core (these days the answer is almost certainly "yes", but single-core servers do still exist, so it is worth checking). If your server has only one core, then specifying MPI in the JULES setup will either have no effect or cause an error.
(ii) Check whether your server supports multi-thread runs (if it can't, then specifying OMP in the JULES setup will either have no effect or cause an error). Note that OMP threads run one per hardware thread, and hyperthreading provides more than one hardware thread per core, so a multi-core machine can run multi-threaded even without hyperthreading. To find out how many cores you have access to, type:
lscpu | grep -E '^Thread|^Core|^Socket\('
and (Cores per socket) * (No. of sockets) gives the number of cores on your machine, e.g. on Jasmin Cylc I get:
[tmarthews@cylc1 ~]$ lscpu | grep -E '^Thread|^Core|^Socket\('
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
which tells me that I have 8 sockets (= CPUs), 1*8 = 8 cores (and 1 thread on each of those cores).
(iii) For a multi-core run, you will need NetCDF libraries built for parallel I/O (multi-thread runs can always use the same libraries as a non-parallel run).
See here.
(iv) You will need your driving data, ancillary file location(s) and output location(s) to be on partitions that support parallel reading & writing (e.g. on JASMIN, the scratch partitions scratch-nopw and scratch-pw3 are named according to whether they support parallel writing (pw) or not (nopw)).
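The cores calculation in step (ii) can be scripted. A minimal sketch, using the JASMIN Cylc lscpu output above as sample input via a heredoc so the arithmetic is easy to check (replace the sample with a live lscpu call on your own machine):

```shell
# Sample lscpu output (the JASMIN Cylc example from the text above).
lscpu_sample=$(cat <<'EOF'
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           8
EOF
)
# Pull out the two numbers we need; "$2+0" strips surrounding whitespace.
cores_per_socket=$(printf '%s\n' "$lscpu_sample" | awk -F: '/^Core/ {print $2+0}')
sockets=$(printf '%s\n' "$lscpu_sample" | awk -F: '/^Socket/ {print $2+0}')
# (Cores per socket) * (No. of sockets) = total cores on the machine.
echo "Total cores: $((cores_per_socket * sockets))"
```

On the sample input this prints "Total cores: 8", matching the 1*8 = 8 worked out above.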
Figure 1: Schematic representation of a hierarchical mixed mode programming model for a two-dimensional grid array.
Compute resource terminology:
NODE = Server/computer
CORE = Processor (modern server nodes commonly have 64 or 128 cores). Not the same as a CPU: a CPU (socket) usually contains multiple cores.
CLUSTER = A collection of interconnected nodes
MPI = Message Passing Interface, the standard for producing message passing libraries (message passing = exchanging information between processes, frequently on separate nodes).
MPICH = A common MPI implementation, originally built on the Chameleon portability layer (the "CH" in the name)
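To make the multi-core/multi-thread distinction above concrete, here is a sketch of how a parallel executable is typically launched, assuming an MPI implementation such as MPICH is installed and provides mpirun (jules.exe is a placeholder name here; your built executable may differ):

```shell
# Multi-core (MPI) run: 8 processes, e.g. one per core on the
# 8-core machine in the checklist example.
mpirun -n 8 ./jules.exe

# Multi-thread (OMP) run: a single process with 4 threads.
export OMP_NUM_THREADS=4
./jules.exe

# Mixed mode (MPI + OMP, as in Fig. 1): 2 processes x 4 threads = 8 cores.
export OMP_NUM_THREADS=4
mpirun -n 2 ./jules.exe
```

The mpirun -n flag and the OMP_NUM_THREADS environment variable are the standard knobs for process count and thread count respectively; batch schedulers on the clusters below usually set these for you from your job script.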
Compute machines available* for running a model like JULES:
JASMIN = A UK superdata cluster for environmental science research https://jasmin.ac.uk/
- I personally use JASMIN a lot for JULES runs: see my page JULES on JASMIN.
ARCHER2 = A UK national supercomputing service https://www.archer2.ac.uk/
MONSOON2 = UK Met Office & NERC joint supercomputer system https://www.metoffice.gov.uk/research/approach/collaboration/jwcrp/monsoon-hpc
* n.b. not necessarily restricted to use by UK nationals: please check the platform's terms & conditions.