Hydrology- and Soils-focused Ancillary files for JULES:
My work focuses increasingly on Hydrology and Soils (see Marthews et al. 2014), specifically the representation of both in Land Surface Models like JULES. For this I need the best ancillary files that I can source: I need a good characterisation of soil parameters, and I also need control over options such as the land mask and land_ice coverage so that I can ensure all hydrological pathways are as realistic as possible.
Over 2016-23 I developed a bespoke script to produce my ancillary files. The latest iteration is called GENHSA and it produces what I call 'hydrology- and soils-focused ancillary files'.
With GENHSA I can do several things that I do not believe are currently possible using ANTS (the standard ancillary-generation program of the UK Met Office) or Iris (a Python library for manipulating spatial data files). Specifically:
(1) LAND-OCEAN MASK: It is critical to identify the world coastline as accurately as possible for hydrological consistency. I want to be able to generate sub-1 km resolution ancillary files, which means the resolution of my source land-ocean mask must be finer than this. I have generated a 250 m global land-ocean mask based on WorldBoundaries_ESRI from ArcGIS Online. This is finer than the 1 km AVHRR-based land-ocean mask qrparm.mask that ANTS uses and that forms the basis of the WATCH land masks (the AVHRR mask seems to miss a lot of small islands, especially in the Pacific, and also a lot of coastal points including the estuaries of major rivers).
I do understand that there are meteorological issues here (small islands introduce perturbations to the atmospheric physics that become too large if the land mass of that island is 'rounded up' to a 50 km x 50 km gridcell - see ATDP01). However, from a land surface perspective these are less important and I prefer to ensure that the land mass is there so that JULES can at least simulate something on it (e.g. if a client is from that small island!).
(2) RIVER DIRECTIONS: For my work I always need hydrological and river-routing ancillaries (topographic index values, logn_ values, river directions), but ANTS currently does not produce these. I have coded up a way to do this using Wu & Kimball's Global Dominant River Tracing (DRT) layers. This involved gap-filling coastal points where DRT lacks data, for which I used approximate flow directions derived from aspect.
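As an illustration only (this is not the GENHSA code), the idea of filling missing flow directions from aspect can be sketched as below. The sketch assumes aspect is given in degrees clockwise from north and points downslope, and it uses a hypothetical 0-7 direction coding (0=N, 1=NE, ... 7=NW) that would need translating to whatever coding the river-routing scheme expects. The function names are invented for the example.

```python
import math

# Hypothetical direction coding used only in this sketch: index 0=N, 1=NE, ... 7=NW.
OCTANTS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")


def aspect_to_octant(aspect_deg):
    """Map an aspect (degrees clockwise from north, assumed to point downslope)
    to the index of the nearest of the eight compass octants."""
    # Shift by half an octant so that e.g. 337.5..22.5 degrees all map to N (0).
    return int(((aspect_deg + 22.5) % 360.0) // 45.0)


def fill_missing_directions(directions, aspects, missing=None):
    """Return a copy of `directions` in which missing entries are replaced by
    aspect-derived octants. Cells with no direction and no usable aspect
    are left as missing."""
    filled = []
    for direction, aspect in zip(directions, aspects):
        has_aspect = aspect is not None and not math.isnan(aspect)
        if direction is missing and has_aspect:
            filled.append(aspect_to_octant(aspect))
        else:
            filled.append(direction)
    return filled


if __name__ == "__main__":
    dirs = [2, None, None]
    asps = [10.0, 200.0, float("nan")]
    print(fill_missing_directions(dirs, asps))
```

Only cells that lack a direction are touched, so existing river directions are never overwritten by the cruder aspect-based estimate.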
(3) SOIL PARAMETER LAYERS: I need to have a wider selection of soil parameter options than provided by ANTS, so I have coded up a wide selection of pedotransfer functions (currently, Cosby et al., Saxton & Rawls, Hodnett & Tomasella and Tóth et al. pedotransfer equation sets). I have also implemented some important automatic data checks (e.g. applying reasonable upper and lower limits to soil parameters). Base soil data for this is all taken from SoilGrids250m.
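A minimal sketch of one pedotransfer set plus a limit check is given below, for illustration only (it is not the GENHSA code). The coefficients are those commonly quoted for the multiple-regression equations of Cosby et al. for the Clapp-Hornberger exponent b and the saturated water content, and should be verified against the original paper before use. The limits in LIMITS are hypothetical placeholders, not recommended values, and all function names are invented for the example.

```python
# Hypothetical plausibility limits, for illustration only.
LIMITS = {"b": (1.0, 20.0), "theta_sat": (0.1, 0.7)}


def cosby_b(sand_pct, clay_pct):
    """Clapp-Hornberger exponent b from % sand and % clay
    (coefficients as commonly quoted for Cosby et al.; verify before use)."""
    return 3.10 + 0.157 * clay_pct - 0.003 * sand_pct


def cosby_theta_sat(sand_pct, clay_pct):
    """Volumetric water content at saturation (m3 m-3) from % sand and % clay
    (coefficients as commonly quoted for Cosby et al.; verify before use)."""
    return 0.505 - 0.00142 * sand_pct - 0.00037 * clay_pct


def clamp(value, lower, upper):
    """Constrain value to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def soil_params(sand_pct, clay_pct):
    """Compute b and theta_sat for one gridcell and apply the limit checks."""
    if sand_pct < 0 or clay_pct < 0 or sand_pct + clay_pct > 100:
        raise ValueError("sand and clay must be non-negative and sum to <= 100 %")
    raw = {
        "b": cosby_b(sand_pct, clay_pct),
        "theta_sat": cosby_theta_sat(sand_pct, clay_pct),
    }
    return {name: clamp(value, *LIMITS[name]) for name, value in raw.items()}


if __name__ == "__main__":
    print(soil_params(40.0, 20.0))
```

Keeping the limits in one table makes it easy to see (and report) exactly which bounds were applied to each parameter.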
(4) BETTER LAKES: As resolution moves from 0.50° to 0.25° and finer, many more lakes appear on the land surface. For better hydrological realism, I need these to be actually simulated, but unfortunately even the SoilGrids250m layers lack soil data for many lake points. I have used gap-filling with approximate values to ensure that all lakes are included in the simulation.
Also, large lakes: I need all lakes to be included in the land mask, but seas and oceans to be excluded (i.e. including Lake Victoria/Nnalubaale and all the US/Canadian Great Lakes, but excluding the Caspian Sea and the Black Sea). Some land-ocean masks do not follow this convention (see e.g. Lake Victoria/Nnalubaale and the Great Lakes here and also slide 6 of here), but I impose it in my scripts. I am aware that this approach puts me at odds with ANTS, e.g. from the ANTS documentation ATDP01 "Lakes can be a problem in the final mask, especially with high resolution grids. Now, it may be argued that since the lakes are real they should remain in the mask to maintain reality. However, doing so may result in unacceptable noise in the lower boundary physics and the model becoming unstable and subsequently aborting ... Therefore, in most situations it is probably best to remove small lakes and only retain significant large lakes." Although I can understand why this is important from an atmospheric point of view, for my runs I need lakes!
(5) GLACIERS AND ICE SHEETS: For high-latitude and many other applications I need a high-quality land_ice layer, which makes a huge difference in many parts of the world (e.g. the Himalayas). For my MOCABORS project in Norway, for example, I needed glacier coverage and I could not see any way to use ANTS for this (the land_ice data I have seen from ANTS-generated frac ancillaries seems to be very approximate, and in some cases all glaciers appear to have been simply removed as described here). To sort this out, I generated a 48 GB raster layer at 250 m resolution formed from the union of three land_ice layers: GLIMSv6, the Greenland ice sheet and Antarctica.
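The union of several ice layers can be sketched as below, for illustration only (this is not the GENHSA code). The sketch assumes the layers have already been rasterised onto the same grid as arrays in which nonzero means ice; the function name and the tiny example arrays are invented.

```python
import numpy as np


def union_ice_mask(*layers):
    """Return a boolean mask that is True wherever any input layer has ice.

    All layers are assumed to share the same shape and grid."""
    masks = [np.asarray(layer, dtype=bool) for layer in layers]
    if len({m.shape for m in masks}) != 1:
        raise ValueError("all layers must have the same shape")
    return np.logical_or.reduce(masks)


if __name__ == "__main__":
    # Tiny stand-ins for three co-registered ice layers.
    glaciers = np.array([[1, 0], [0, 0]])
    greenland = np.array([[0, 1], [0, 0]])
    antarctica = np.array([[0, 0], [0, 1]])
    print(union_ice_mask(glaciers, greenland, antarctica))
```

A global 250 m raster is far too large to hold in memory as a single array, so in practice the same operation would have to be applied tile by tile.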
(6) ROBUST PARAMETER AVERAGING: I need to avoid a particular hard-wired default in ANTS: I need the soil properties of each gridcell to be a calculated average of the parameter values within it, rather than the values for the dominant soil type within it. (Search for "The most straightforward method to aggregate" in Montzka et al. (2017) to find a short paragraph describing the 3 main ways to average soil properties in ancillary files; it is clear from the code here that ANTS can use only the dominant soil type approach.)
There are also other issues, e.g. ANTS tries to gap-fill gridcells with no soil data "by spiral searching to the nearest adjacent land". For my runs this is undesirable because it puts in potentially erroneous soil data on many gridcells, so I prefer to leave these as no-data cells.
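The difference between the two aggregation approaches, and the choice to leave no-data cells empty, can be sketched as below. This is an illustration only (not the GENHSA or ANTS code): it assumes a fine grid whose dimensions are exact multiples of the coarsening factor, uses NaN as the no-data marker, and uses a plain arithmetic mean. The function names are invented for the example.

```python
import numpy as np


def _check(shape, factor):
    if shape[0] % factor or shape[1] % factor:
        raise ValueError("grid dimensions must be multiples of factor")


def block_mean(fine, factor):
    """Average a fine 2-D parameter grid onto a grid coarser by `factor`.

    NaN marks no-data. Coarse cells containing no valid fine cells stay NaN
    rather than being gap-filled from neighbours."""
    fine = np.asarray(fine, dtype=float)
    _check(fine.shape, factor)
    ny, nx = fine.shape
    blocks = fine.reshape(ny // factor, factor, nx // factor, factor)
    valid = ~np.isnan(blocks)
    count = valid.sum(axis=(1, 3))
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    out = np.full(count.shape, np.nan)
    np.divide(total, count, out=out, where=count > 0)
    return out


def block_dominant(classes, factor):
    """Most frequent class in each coarse cell (the 'dominant type' approach).

    Ties are resolved in favour of the smallest class value."""
    classes = np.asarray(classes)
    _check(classes.shape, factor)
    ny, nx = classes.shape
    out = np.empty((ny // factor, nx // factor), dtype=classes.dtype)
    for j in range(ny // factor):
        for i in range(nx // factor):
            block = classes[j * factor:(j + 1) * factor,
                            i * factor:(i + 1) * factor]
            values, counts = np.unique(block, return_counts=True)
            out[j, i] = values[np.argmax(counts)]
    return out


if __name__ == "__main__":
    nan = np.nan
    fine = np.array([[1.0, 1.0, 2.0, 3.0],
                     [1.0, 5.0, nan, nan],
                     [nan, nan, 4.0, 4.0],
                     [nan, nan, 4.0, 8.0]])
    print(block_mean(fine, 2))
```

In the top-left coarse cell of this example the average of the four fine values (1, 1, 1, 5) is 2, whereas the dominant value is 1: the single contrasting fine cell influences the average but is ignored entirely by the dominant-type approach.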
(7) SOIL THERMAL PROPERTIES: I need to use equations for soil thermal properties (heat capacity and heat conductivity) that are updated from what is available in ANTS. Basically, ANTS uses equations for soil heat capacity (hcap) and soil heat conductivity (hcon) from Jones (2008) that I believe are actually completely incorrect. I have rederived these equations myself from first principles and checked the sources quoted in Jones (2008) and I now have corrected versions, but I have not yet published these myself (although they are implemented in GENHSA). Since I became aware of this situation in early 2022, I have been testing my new versions of these equations to see how much difference it makes to use them.
(8) ROTATED AND VARIABLE-RESOLUTION GRIDS: Finally, because I use a variety of grids in my current projects, I needed a script that could calculate ancillary files whatever my required grid resolution and extent, and that could handle graticular grids (e.g. 0.25° resolution, N96 and Nxxx resolution), rotated grids (e.g. UTM, OSGB36) and variable-resolution grids (e.g. UKCP). To be fair, I think ANTS can do all of this, although I believe it is a bit tricky (involving preparing some files 'by hand' using Iris first). My script has been generalised so that these options can simply be chosen from a menu, which means I can avoid having to use Iris.
** A note about terminology: "regular grid" is a much-abused term. It can mean either that the grid has constant resolution in terms of degrees (i.e. not variable resolution like UKCP or UKV) or that the grid is graticular (i.e. the X,Y gridlines are lines of equal longitude and latitude; the grid is unrotated).
(9) PROVENANCE: The most important reason I use GENHSA rather than ANTS, however, is that with GENHSA I know exactly where all the data come from (and can therefore justify it in a write-up). ANTS still pulls in many files with rather cryptic filenames and no provenance information in the NetCDF headers, which is a significant problem.
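The first sense of "regular" (constant resolution) is easy to test from a grid's 1-D coordinate values, as in the sketch below. This is an illustration only: the function name and tolerance are invented, and the check says nothing about the second sense (whether the grid is graticular or rotated), which depends on the coordinate reference system rather than on the spacing.

```python
import math


def is_constant_resolution(coords, rel_tol=1e-6):
    """Return True if consecutive coordinate values are evenly spaced.

    `coords` is a 1-D sequence of gridcell-centre coordinates along one axis
    (e.g. longitudes in degrees). At least two values are required."""
    if len(coords) < 2:
        raise ValueError("need at least two coordinate values")
    steps = [b - a for a, b in zip(coords, coords[1:])]
    return all(math.isclose(step, steps[0], rel_tol=rel_tol) for step in steps)


if __name__ == "__main__":
    quarter_degree = [0.125, 0.375, 0.625, 0.875]   # constant 0.25 spacing
    stretched = [0.0, 1.0, 1.5, 1.75]               # spacing shrinks along the axis
    print(is_constant_resolution(quarter_degree))
    print(is_constant_resolution(stretched))
```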
I want to say that, at the end of the day, I can see that ANTS is a very impressive system and, despite these points above, I don't really want to knock it too much. Having tried to code an ancil-generator myself, I can appreciate that there are a LOT of issues and the developers of ANTS have had to take quick decisions in order to produce something workable.
However, with the move to higher spatial resolution and a wider variety of use-needs, I believe that a lot of those quick decisions need to be revisited and reconsidered. In particular, without those features above I believe currently that ANTS is not fit for purpose for the projects I am currently involved with.
I am not currently engaged with the development of ANTS, but I do have a long-term hope that some of the scripts I have coded up here can be used to improve ANTS and give it some of the functionality I feel is missing above, which I believe would make the ancillaries it creates more suitable for land surface (rather than atmospheric) simulation. My scripts are still being validated, but once that is complete I would be very open to working with a Python developer to achieve this.
*** ANTS v1.0.0 was officially released in October 2022 (see here) and I have not yet had a chance to assess it (my comments about ANTS on this page refer to pre-release versions up to v0.18), so it may now have some of the capability I describe as missing ***
Also note that in late 2016 a lot of documentation about the precursors of ANTS was uploaded to this ticket.
*** Iris also needs a mention. I went through the tutorial here and my impression is that Iris is useful in a few ways (e.g. probably the easiest way to convert GRIB and PP files to NetCDF), but it doesn't replace my GENHSA script. Essentially, it loads spatial data into a 'cube' in memory (in Python) and allows you to do a lot of the same manipulations that are possible with NCO tools, with slightly less of the difficult syntax of NCO or GDAL commands. I personally have reservations about this (despite the lazy loading, I think there will be memory limitations using Iris that don't apply with NCO and GDAL), although for small grids I think it looks great. ***
Jones CP (1996). Specification of ancillary fields. Unified Model Documentation Paper 70. Please note this document states “This document has not been published. Permission to quote from it must be obtained from the Head of Numerical Modelling [at the UK Met Office]”.
Jones CP (2008). Ancillary file data sources (v.10). Unified Model Documentation Paper 70 [updated version]. Please note this document states “This document has not been published. Permission to quote from it must be obtained from the Head of Numerical Modelling [at the UK Met Office]”.
Bovis K (2012). Ancillary review - position paper. Internal position paper for the TIAN project.