# Tutorial
This tutorial will walk you through a simple experiment using WRF-Ensembly. It will cover the basic steps of setting up an experiment, running it, and postprocessing the results.
## Prerequisites
You will need:

- A working installation of WRF-Ensembly. If you haven't installed it yet, follow the Installation guide.
- A working installation of WRF and WPS. You can find more information on how to install WRF here. If you need it, you can use WRF-Chem.
- A working installation of DART. You can find more information on how to install DART here.
## Setting up the experiment
WRF-Ensembly keeps experiments inside a directory structure, which contains the model, any input and output files, and the configuration files. You interact with an experiment by passing its path as the first argument to the `wrf-ensembly` command:

```bash
wrf-ensembly /path/to/experiment group command [options]
```
All actions/commands are categorised into groups. You can read about all commands in Usage. To begin, we will use the `experiment create` command to create a new experiment. This will create the directory structure and the configuration files needed to run the experiment.

```bash
wrf-ensembly /path/to/experiment experiment create iridium_chem_4.6.0
```
This will create a new experiment at `/path/to/experiment` using the config template `iridium_chem_4.6.0`. The templates are included in the source code.
Your first course of action should be to inspect and edit the `config.toml` file. This file contains all the configuration options for the experiment. You can find more information about the configuration options in the Configuration section. The most important options to edit are:
- In the `[metadata]` section, set the experiment `name` and a small `description`.
- In the `[directories]` section, you must set the paths to `wrf_root`, `wps_root` and `dart_root`. For `wrf_root`, you should point to the directory that contains the `run/` directory of your WRF installation. For `wps_root`, you should point to the directory that contains the `geogrid.exe`, `ungrib.exe`, and `metgrid.exe` executables. For `dart_root`, you should point to the repository root (that contains `models/`, `obs/`, etc.).
- In the `[domain_control]` and `[time_control]` sections you can set up the relevant options from WRF. These options are passed to WRF and WPS when they are run.
- In the `[data]` section, you should set the paths to `wps_geog` and `meteorology` (the directory that contains the GRIB files), as well as which `meteorology_vtable` you want to use (only the filename, not a full path).
- In the `[assimilation]` section, set how many members your ensemble will have in `n_members`.
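For orientation, a minimal sketch of these edits might look like the following; all paths and values here are illustrative placeholders, so check the exact key names against your template and the Configuration section:

```toml
[metadata]
name = "tutorial_run"                  # experiment name (placeholder)
description = "WRF-Ensembly tutorial experiment"

[directories]
wrf_root = "/opt/models/WRF"           # contains run/
wps_root = "/opt/models/WPS"           # contains geogrid.exe, ungrib.exe, metgrid.exe
dart_root = "/opt/models/DART"         # DART repository root (models/, obs/, ...)

[data]
wps_geog = "/data/WPS_GEOG"
meteorology = "/data/meteorology"      # directory with the GRIB files
meteorology_vtable = "Vtable.GFS"      # filename only, not a full path

[assimilation]
n_members = 24
```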
WRF-Chem users who want initial conditions via interpolator-for-wrfchem should also set the following variables in `[data]`:

```toml
manage_chem_ic = true
chemistry = { path = '/home/thgeorgiou/data/Modelling/AIRSENSE/CAMS_FC', model_name = 'cams_global_forecasts' }
```

Also read the WRF-CHEM page later for WRF-Chem-specific information.
After setting up your configuration, use the `experiment cycle-info` command to see the cycles of your experiment. This will show which cycles will run given your time control settings.

```bash
wrf-ensembly /path/to/experiment experiment cycle-info
# You can also use `--to-csv=` to save the table in a CSV file.
wrf-ensembly /path/to/experiment experiment cycle-info --to-csv=cycles
```
If everything looks good, you can use `experiment copy-model` to make a copy of WRF and WPS inside the experiment directory. This is required and is done to ensure you can run the experiment even if there are changes in the WRF or WPS installations.

```bash
wrf-ensembly /path/to/experiment experiment copy-model
```
## Preprocessing
At this stage, you should have a working experiment with the configuration set up. The next step is to preprocess the input data. This is done using the `preprocess` group of commands, which will run the necessary WPS steps to prepare the input data for WRF. The steps are:

```bash
# Prepare the preprocessing directory
wrf-ensembly /path/to/experiment preprocess setup

# Run the three main WPS steps
wrf-ensembly /path/to/experiment preprocess geogrid
wrf-ensembly /path/to/experiment preprocess ungrib
wrf-ensembly /path/to/experiment preprocess metgrid
```

The namelists for WPS are generated automatically based on the configuration file. All preprocessing takes place inside the `work/preprocess` subdirectory of the experiment.
The geogrid table is configurable in the `geogrid.table` configuration field, while the ungrib variable table is set in the `data.meteorology_vtable` configuration field.
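In `config.toml`, these two fields would look something like the following (both values are illustrative):

```toml
[geogrid]
table = "GEOGRID.TBL.ARW"          # which geogrid table to use

[data]
meteorology_vtable = "Vtable.GFS"  # filename only, not a full path
```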
At this point, you will be able to find the `met_em*` files inside the `work/preprocess/WPS` directory. To run real, you can use the `preprocess real <CYCLE>` command:

```bash
wrf-ensembly /path/to/experiment preprocess real 0  # run for cycle 0
```
The above command will take care of generating the namelist for `real.exe`, running `real.exe`, and copying the final `wrfinput_d01` and `wrfbdy_d01` files to the `data/initial_conditions` directory. You must run real for every cycle in your experiment.
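If you want to run all cycles by hand in one go, a simple shell loop works; the cycle count below (10) is a placeholder, so check `experiment cycle-info` for your actual number of cycles:

```bash
# Run real.exe for every cycle (0 through 9 in this example)
for cycle in $(seq 0 9); do
    wrf-ensembly /path/to/experiment preprocess real "$cycle"
done
```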
To make this whole process easier, you can generate a SLURM jobfile for preprocessing using the `slurm preprocessing` command:

```bash
wrf-ensembly /path/to/experiment slurm preprocessing
sbatch /path/to/experiment/jobfiles/preprocessing.sh
```
You can set up which `#SBATCH` directives you want to include in the jobfile by editing the `[slurm]` section of the config file.
Successful execution of all preprocessing steps will result in the `data/initial_conditions` directory being populated with the `wrfinput_d01` and `wrfbdy_d01` files for each cycle. You can check the status of the preprocessing using the `status` command.
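Assuming the usual invocation pattern (check `wrf-ensembly --help` if `status` lives under a command group), that would be:

```bash
wrf-ensembly /path/to/experiment status
```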
If you are using WRF-Chem and you want to use interpolator-for-wrfchem to generate initial conditions, you can use the `preprocess interpolate-chem` command:

```bash
wrf-ensembly /path/to/experiment preprocess interpolate-chem
```
## Observations
Every data assimilation experiment needs observations to assimilate. DART stores observations in the `obs_seq` format and provides Fortran routines to read and write them. WRF-Ensembly mainly contains a set of helper commands that make it easier to work with observations.

At the end of each cycle, we can assimilate ONE `obs_seq` file. The workflow to generate this file is shown in the diagram below, for cycle 000. We assume there is one input file per observation type:
```mermaid
flowchart TD
    %% Input files, one per observation type
    A1[EarthCARE EBD<br>ec_atl_ebd.nc] --> B1[EarthCARE EBD<br>obs_converter]
    A2[Aeolus L2B Winds<br>aeolus_l2b.DBL] --> B2[Aeolus L2B Winds<br>obs_converter]
    A3[MODIS AOD<br>modis.h4] --> B3[MODIS AOD<br>obs_converter]
    A4[Radiosonde<br>obs_radiosonde.csv] --> B4[Radiosonde<br>obs_converter]

    %% Individual obs_seq files
    B1 --> C1[ec_ebd.obs_seq]
    B2 --> C2[aeolus_l2b.obs_seq]
    B3 --> C3[modis_aod.obs_seq]
    B4 --> C4[radiosonde.obs_seq]

    %% Combine all obs_seq files
    C1 --> D[obs_sequence_tool]
    C2 --> D
    C3 --> D
    C4 --> D

    %% Combined obs_seq file
    D --> E[cycle_000.obs_seq]

    %% Domain filtering
    E --> F[wrf_dart_obs_preprocess]

    %% Final output
    F --> G[cycle_000.obs_seq<br>Domain-filtered observations]

    %% Styling
    classDef inputFiles fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef converters fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef obsSeq fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    classDef tools fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
    classDef final fill:#ffebee,stroke:#c62828,stroke-width:2px
    class A1,A2,A3,A4 inputFiles
    class B1,B2,B3,B4 converters
    class C1,C2,C3,C4,E obsSeq
    class D,F tools
    class G final
```
In WRF-Ensembly, we have the concept of the "observation group" file, which is a TOML file that contains the paths to a set of observation files, as well as their start and end dates. For example:
```toml
kind = "LIDAR_EXTINCTION"
converter = "/dart/observations/obs_converters/earthcare/work/convert_ec_atl_ebd"
cwd = "/dart/observations/obs_converters/earthcare/work/"

[[files]]
path = "/data/ECA_EXAE_ATL_EBD_2A_20250401T000437Z_20250401T180155Z_04777G.ds.h5"
start_date = "20250401T000430"
end_date = "20250401T001558"

[[files]]
path = "/data/ECA_EXAE_ATL_EBD_2A_20250401T001544Z_20250401T180206Z_04777H.ds.h5"
start_date = "20250401T001537"
end_date = "20250401T002749"

[[files]]
path = "/data/ECA_EXAE_ATL_EBD_2A_20250401T002733Z_20250401T180115Z_04778A.ds.h5"
start_date = "20250401T002726"
end_date = "20250401T003922"

[[files]]
path = "/data/ECA_EXAE_ATL_EBD_2A_20250401T003908Z_20250401T194300Z_04778B.ds.h5"
start_date = "20250401T003901"
end_date = "20250401T005109"

[[files]]
path = "/data/ECA_EXAE_ATL_EBD_2A_20250401T005055Z_20250401T194301Z_04778C.ds.h5"
start_date = "20250401T005048"
end_date = "20250401T010212"

[[files]]
path = "/data/ECA_EXAE_ATL_EBD_2A_20250401T010159Z_20250401T194207Z_04778D.ds.h5"
start_date = "20250401T010152"
end_date = "20250401T011400"

[[files]]
path = "/data/ECA_EXAE_ATL_EBD_2A_20250401T011345Z_20250401T194206Z_04778E.ds.h5"
start_date = "20250401T011338"
end_date = "20250401T012534"
```
This way, WRF-Ensembly can determine which files have data during an assimilation window and only convert those.
These files are created using some helper scripts we provide in the repository, under the `obs_scripts` directory. Since they are specific to one set of observations, you are expected to write your own scripts to generate these files. The scripts should output the TOML file in the obs group format, as shown above.
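As a rough illustration only (the converter path and filename pattern come from the example above; the date handling is a stub you must adapt to your own data), such a script could be as simple as:

```bash
#!/usr/bin/env bash
# Hypothetical generator sketch: writes an observation group file for a
# directory of EarthCARE files. Adapt the glob and the date extraction.
out=obs/earthcare_ebd.toml

cat > "$out" <<'EOF'
kind = "LIDAR_EXTINCTION"
converter = "/dart/observations/obs_converters/earthcare/work/convert_ec_atl_ebd"
cwd = "/dart/observations/obs_converters/earthcare/work/"
EOF

for f in /data/ECA_EXAE_ATL_EBD_2A_*.h5; do
    printf '\n[[files]]\npath = "%s"\n' "$f" >> "$out"
    # start_date/end_date must come from your data (e.g. parsed from the
    # filename or read from the file's metadata); placeholders here:
    printf 'start_date = "20250401T000430"\nend_date = "20250401T001558"\n' >> "$out"
done
```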
Assuming you have your `.toml` files inside the `obs/` subdirectory of your experiment, you can use the `observations convert-obs` command to convert the observation files to the `obs_seq` format and the `observations combine-obs` command to merge them:

```bash
# Use --jobs to convert many files in parallel. Watch out for memory usage!
wrf-ensembly /path/to/experiment observations convert-obs --jobs 32

# Combine all obs_seq files into one per cycle
wrf-ensembly /path/to/experiment observations combine-obs --jobs 32
```
Finally, you can (optionally) filter the observations to the domain of the WRF simulation with the `observations preprocess-for-wrf` command, which runs DART's `wrf_dart_obs_preprocess` tool (see the DART documentation for details):

```bash
wrf-ensembly /path/to/experiment observations preprocess-for-wrf
```
Following the steps above will result in the `obs/` directory containing a set of `cycle_ABC.obs_seq` files, one for each cycle. These files are ready to be assimilated in the next step. If a cycle doesn't have a file, your observations did not cover the assimilation window of that cycle, so no observations will be assimilated for it.
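A quick sanity check (the filenames follow the `cycle_ABC.obs_seq` pattern described above):

```bash
ls /path/to/experiment/obs/cycle_*.obs_seq
```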
## Preparing the ensemble
Now that you have the initial conditions and the observations ready, you can run the ensemble. This is done using the `ensemble` group of commands, which handle preparing the ensemble, advancing the members, running the assimilation filter and, finally, cycling.
We begin with the `ensemble setup` command, which prepares the ensemble by copying the initial conditions and setting up the directories for each member. This command should be run only once at the beginning of the experiment.

```bash
wrf-ensembly /path/to/experiment ensemble setup
```
Now is the time to handle perturbations, if you are using them. Perturbations are used to generate an ensemble of members that are slightly different from each other, starting from one set of initial conditions. First, you must add the appropriate perturbation configuration in the `config.toml` file, under the `[perturbations]` section. You can read more about perturbations in the Configuration section. For example, if we want to perturb the U and V fields:
```toml
[perturbations.variables.V]
operation = 'add'
mean = 0
sd = 8
rounds = 8
boundary = 10

[perturbations.variables.U]
operation = 'add'
mean = 0
sd = 8
rounds = 8
boundary = 10
```
After setting up the perturbations, you can generate them using the `ensemble generate-perturbations` command:

```bash
wrf-ensembly /path/to/experiment ensemble generate-perturbations --jobs 8
```

This will generate the perturbation files inside `data/diagnostics/perturbations/`. You can inspect these files to see what the perturbations look like. If you want to apply the perturbations to the initial conditions, you can use the `ensemble apply-perturbations` command:

```bash
wrf-ensembly /path/to/experiment ensemble apply-perturbations --jobs 8
```
You can repeat the `setup` -> perturbations process as many times as you want to tune your perturbations. The `setup` command will always copy the original `wrfinput_d01` and `wrfbdy_d01` files from the `data/initial_conditions/` directory, so you can always start fresh.
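In practice, one tuning iteration looks like this (all commands as introduced above):

```bash
# Start fresh from the original initial conditions, then regenerate and
# re-apply the perturbations with the current [perturbations] settings
wrf-ensembly /path/to/experiment ensemble setup
wrf-ensembly /path/to/experiment ensemble generate-perturbations --jobs 8
wrf-ensembly /path/to/experiment ensemble apply-perturbations --jobs 8
```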
The `update-bc` step is crucial after applying perturbations or cycling. When you modify the initial condition fields, you might introduce inconsistencies with the boundary conditions if there are changes near the boundary. These inconsistencies can lead to unexpected behavior in the model. The `ensemble update-bc` command ensures that the edges of the domain are consistent with the boundary conditions, using the `update_wrf_bc` tool from DART. You should run this command after applying perturbations or cycling the ensemble:

```bash
wrf-ensembly /path/to/experiment ensemble update-bc
```
## Running WRF
It is finally time to advance the model, which is done using the `ensemble advance-member` command:

```bash
# Advance member 2 to the next cycle using 24 cores
wrf-ensembly /path/to/experiment ensemble advance-member --member 2 --cores 24
```
The forecasts are stored inside `scratch/forecasts/cycle_ABC`. Of course, you must advance all members to the next cycle before continuing; there is a SLURM helper for this, which we will cover later.
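To do it by hand, you can loop over the members (assuming zero-based member indices and a 24-member ensemble; use the `n_members` value from your config):

```bash
# Advance every member of the ensemble, one after the other
for member in $(seq 0 23); do
    wrf-ensembly /path/to/experiment ensemble advance-member --member "$member" --cores 24
done
```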
## Running the assimilation filter, generating the analysis
Running the assimilation filter involves placing the model state files in the correct place and running `filter.exe` from the DART WRF directory. Currently, WRF-Ensembly does not handle the DART namelist, so you might have to make some changes to `input.nml` in the `models/wrf/work` directory. Namely, you should set the correct number of ensemble members in `filter_nml::ens_size` and the correct observation types in `obs_kind_nml::assimilate_these_obs_types`. The state variables in `model_nml::wrf_state_variables` must also match `config.toml` and what you expect.
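As an illustrative excerpt only (the exact contents of `input.nml` depend on your DART version and experiment; the values below are placeholders), the relevant entries look like:

```fortran
&filter_nml
   ens_size = 24          ! must match n_members in config.toml
/

&obs_kind_nml
   assimilate_these_obs_types = 'LIDAR_EXTINCTION'
/

&model_nml
   wrf_state_variables = 'U', 'QTY_U_WIND_COMPONENT', 'TYPE_U', 'UPDATE', '999',
                         'V', 'QTY_V_WIND_COMPONENT', 'TYPE_V', 'UPDATE', '999'
/
```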
After adjusting `input.nml`, you can run the assimilation filter using the `ensemble filter` command:

```bash
wrf-ensembly /path/to/experiment ensemble filter
```
The DART output files are stored in `scratch/dart/cycle_ABC`. After filter is executed, this command will automatically move the `obs_seq.final` file to `data/diagnostics/cycle_ABC.obs_seq.final`, so you can later check how the assimilation went.
At this point, you can generate the analysis files, which are the final forecast `wrfout` files but with the fields corrected by the assimilation. This is done using the `ensemble analysis` command:

```bash
wrf-ensembly /path/to/experiment ensemble analysis
```

The analysis files are stored in `scratch/analysis/cycle_ABC`.
## Cycling the experiment
Finally, you can cycle the experiment using the `ensemble cycle` command. This will prepare the members for the next cycle by copying the new initial and boundary condition files, and adding the new analysis fields to them. After `cycle`, you should run `update-bc`; the members are then ready to be advanced to the next cycle. You can run the `cycle` command as follows:

```bash
wrf-ensembly /path/to/experiment ensemble cycle
wrf-ensembly /path/to/experiment ensemble update-bc
```
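Putting the whole loop together, one manual cycle therefore looks like this (repeat `advance-member` for each member):

```bash
# One full cycle, end to end
wrf-ensembly /path/to/experiment ensemble advance-member --member 0 --cores 24
# ...advance the remaining members...
wrf-ensembly /path/to/experiment ensemble filter
wrf-ensembly /path/to/experiment ensemble analysis
wrf-ensembly /path/to/experiment ensemble cycle
wrf-ensembly /path/to/experiment ensemble update-bc
```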
## Automating all this with SLURM
Running all the above steps by hand is very tedious if you have more than a toy number of cycles and members. WRF-Ensembly provides a command to queue the whole experiment with SLURM, so you can run it in the background and forget about it. The command is `slurm run-experiment`, which will generate a set of jobs to run your experiment.

```bash
wrf-ensembly /path/to/experiment slurm run-experiment
```
Specifically, you will get N+1 jobs, where N is the number of members: one job per member to advance it, and a final job that runs the assimilation filter and cycles the experiment. The jobs are automatically submitted to SLURM, and the final one uses dependencies to ensure that it runs only after all members have been advanced. A side effect of this is that you must have permission to submit N+1 jobs to the SLURM queue, which can potentially limit your ensemble size.
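Once the jobs are submitted, you can keep an eye on them with standard SLURM tooling and the `status` command shown earlier:

```bash
squeue -u "$USER"                        # SLURM's own view of the queued jobs
wrf-ensembly /path/to/experiment status  # experiment-level progress
```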