Postprocessing¶
WRF Ensembly provides a comprehensive postprocessing pipeline for transforming raw WRF output files into analysis-ready datasets. The postprocessing system consists of several stages that can be executed individually or through SLURM jobs, allowing for flexible and efficient data processing workflows.
Overview¶
The postprocessing pipeline transforms raw `wrfout` files through several stages:
- Data Processing Pipeline - Applies custom transformations and standardizes the data format
- Ensemble Statistics - Computes mean and standard deviation across ensemble members
- Concatenation - Combines temporal files into single files per cycle with compression
- Cleanup - Removes temporary files to save disk space
All postprocessing commands follow the pattern:
wrf-ensembly EXPERIMENT_PATH postprocess COMMAND [OPTIONS]
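For example, a single cycle can be postprocessed end-to-end by running the stages in order (the experiment path and cycle number below are placeholders):

wrf-ensembly ~/experiments/my_exp postprocess process-pipeline --cycle 5
wrf-ensembly ~/experiments/my_exp postprocess statistics --cycle 5
wrf-ensembly ~/experiments/my_exp postprocess concatenate --cycle 5
wrf-ensembly ~/experiments/my_exp postprocess clean --cycle 5

Each command is described in detail in the Commands section below. Alternatively, all steps can be submitted as a single SLURM job (see Integration with SLURM at the end of this page).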
File Transformations and Output Paths¶
The postprocessing pipeline transforms raw WRF output files in several key ways, making them more suitable for analysis and visualization while reducing file sizes.
Key Transformations¶
Coordinate Standardization:
- Time dimension renamed from `XTIME` to `t` with proper CF-compliant attributes
- Latitude/longitude coordinates renamed from `XLAT`/`XLONG` to `latitude`/`longitude`
- Standardized coordinate attributes (`standard_name`, `axis`, `units`)
Data Processing:
- Destaggering: WRF's staggered grids are destaggered to regular grids
- Diagnostic variables: New variables computed (e.g., `air_density`, `air_pressure`, `geopotential_height`)
- Wind transformation: Grid-relative winds (`U`, `V`) converted to earth-relative (`wind_east`, `wind_north`)
- Variable filtering: Only specified variables retained, based on the `variables_to_keep` configuration
File Structure:
- Ensemble statistics computed: `*_mean` and `*_sd` files for each time step
- Temporal concatenation: Individual time files combined into single files per cycle
- Compression applied: Lossless compression and optional quantization reduce file sizes
- Custom variables: User-defined processors can add derived variables (e.g., extinction coefficients, AOD)
Output File Paths¶
The postprocessing system creates files in several locations within your experiment directory:
Intermediate Files (scratch directory):
EXPERIMENT_PATH/scratch/
├── analysis/cycle_XXX/member_YY/
│ ├── wrfout_* # Raw WRF analysis files
│ ├── wrfout_*_post # Processed individual files
│ ├── wrfout_*_mean # Ensemble mean files
│ └── wrfout_*_sd # Ensemble std deviation files
└── forecasts/cycle_XXX/member_YY/
├── wrfout_* # Raw WRF forecast files
├── wrfout_*_post # Processed individual files
├── wrfout_*_mean # Ensemble mean files
└── wrfout_*_sd # Ensemble std deviation files
Final Output Files (data directory):
EXPERIMENT_PATH/data/
├── analysis/cycle_XXX/
│ ├── analysis_mean_cycle_XXX.nc # Concatenated analysis ensemble mean
│ └── analysis_sd_cycle_XXX.nc # Concatenated analysis ensemble std dev
└── forecasts/cycle_XXX/
├── forecast_mean_cycle_XXX.nc # Concatenated forecast ensemble mean
├── forecast_sd_cycle_XXX.nc # Concatenated forecast ensemble std dev
└── forecast_member_YY_cycle_XXX.nc # Individual members (if keep_per_member=true)
The final concatenated files in the `data/` directory are the primary analysis products, containing all time steps for a cycle in CF-compliant NetCDF format with optimized compression.
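As a quick sanity check, you can open one of these products with xarray. A minimal sketch, assuming three-digit cycle numbering as in the tree above and that the variables shown pass your `variables_to_keep` filter:

import xarray as xr

# Path is illustrative; adjust to your experiment and cycle
ds = xr.open_dataset("EXPERIMENT_PATH/data/forecasts/cycle_000/forecast_mean_cycle_000.nc")

print(ds["t"])              # CF-compliant time coordinate
print(ds["latitude"].dims)  # standardized horizontal coordinates
print(ds["wind_east"].dims) # earth-relative wind (if kept)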
Customization Options¶
Key configuration options for controlling the transformations:
- `variables_to_keep`: Regular expressions to filter which variables are retained
- `compression_filters`: Control lossless compression algorithms (e.g., `"shf|zst,3"`)
- `ppc_filter`: Precision control for lossy quantization (e.g., `"default=3#Z.*=6"`)
- `keep_per_member`: Whether to save individual member files alongside ensemble statistics
- `processors`: Custom data processors for specialized transformations
For complete configuration details, see the Configuration documentation.
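As a quick illustration, these options could look like the following in the experiment configuration. The values are examples only, and the exact defaults and full option reference are in the Configuration documentation:

[postprocess]
variables_to_keep = ["U|V", "wind_.*", "longitude|latitude", "t"]
compression_filters = "shf|zst,3"
ppc_filter = "default=3#Z.*=6"
keep_per_member = false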
Commands¶
print-variables-to-keep¶
wrf-ensembly EXPERIMENT_PATH postprocess print-variables-to-keep
Prints which variables will be kept in the `wrfout` files after postprocessing. This is useful for verifying that the variables you want are actually retained and for seeing how the regex filters are applied.
The variables are defined in the `PostprocessConfig.variables_to_keep` configuration option. This command requires that cycle 0 output for member 0 exists.
Example output:
DUST_\\d -> DUST_1: (('Time', 'bottom_top', 'south_north', 'west_east',)) (float32)
DUST_\\d -> DUST_2: (('Time', 'bottom_top', 'south_north', 'west_east',)) (float32)
U -> U: (('Time', 'bottom_top', 'south_north', 'west_east',)) (float32)
wind_.* -> wind_east: (('Time', 'bottom_top', 'south_north', 'west_east',)) (float32)
wind_.* -> wind_north: (('Time', 'bottom_top', 'south_north', 'west_east',)) (float32)
process-pipeline¶
wrf-ensembly EXPERIMENT_PATH postprocess process-pipeline [--cycle CYCLE] [--jobs JOBS]
Applies the configured data processor pipeline to output files. This command replaces the legacy `apply-scripts` approach with a more efficient plugin-based system that processes data in memory using a configurable pipeline of `DataProcessor` instances.
Options:
- `--cycle CYCLE` - Cycle to process (uses current cycle if not specified)
- `--jobs JOBS` - Number of files to process in parallel (default: determined automatically)
The pipeline processes all `wrfout` files in both the analysis and forecast directories for the specified cycle, creating `*_post` files that contain the processed data.
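For example, to (re)process cycle 3 with 8 parallel workers (path, cycle, and job count are illustrative):

wrf-ensembly ~/experiments/my_exp postprocess process-pipeline --cycle 3 --jobs 8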
statistics¶
wrf-ensembly EXPERIMENT_PATH postprocess statistics [--cycle CYCLE] [--jobs JOBS]
Calculates ensemble mean and standard deviation from the forecast/analysis files for the given cycle. This command reads the `*_post` files created by the `process-pipeline` command and computes ensemble statistics across all members.
Options:
- `--cycle CYCLE` - Cycle to compute statistics for (uses current cycle if not specified)
- `--jobs JOBS` - Number of files to process in parallel (default: determined automatically)
Output files:
- `*_mean` - Ensemble mean files
- `*_sd` - Ensemble standard deviation files
For single-member experiments, the command simply copies the files without computing statistics.
concatenate¶
wrf-ensembly EXPERIMENT_PATH postprocess concatenate [--cycle CYCLE] [--jobs JOBS]
Concatenates all output files (mean and standard deviation) into consolidated files for each cycle. Uses the `*_mean` and `*_sd` files created by the `statistics` command and combines them temporally into single NetCDF files.
Options:
- `--cycle CYCLE` - Cycle to concatenate (uses current cycle if not specified)
- `--jobs JOBS` - Number of NCO commands to execute in parallel (default: 4)
Output files:
- `forecast_mean_cycle_XXX.nc` - Concatenated forecast ensemble mean
- `forecast_sd_cycle_XXX.nc` - Concatenated forecast ensemble standard deviation
- `analysis_mean_cycle_XXX.nc` - Concatenated analysis ensemble mean
- `analysis_sd_cycle_XXX.nc` - Concatenated analysis ensemble standard deviation

If `keep_per_member` is enabled, also creates:

- `forecast_member_XX_cycle_XXX.nc` - Individual member forecasts
The concatenation process applies compression and quantization filters as specified in the configuration.
clean¶
wrf-ensembly EXPERIMENT_PATH postprocess clean [--cycle CYCLE] [--remove-wrfout]
Cleans up the scratch directory for the given cycle to save disk space. This command should be run after completing the other postprocessing steps.
Options:
- `--cycle CYCLE` - Cycle to clean (uses current cycle if not specified)
- `--remove-wrfout` - Remove raw wrfout files (default: true)
Data Processor Pipeline¶
The heart of the postprocessing system is the data processor pipeline, which applies a series of transformations to each WRF output file. The pipeline is highly configurable and extensible through a plugin system.
Built-in Processors¶
XWRFPostProcessor¶
The `XWRFPostProcessor` is automatically applied as the first step in every pipeline. It performs essential transformations including:
- Air density computation - Calculates air density from model variables
- xWRF postprocessing - Applies destaggering and diagnostic computations
- Coordinate standardization - Renames dimensions and coordinates to standard names
- Variable filtering - Removes unused variables and filters based on `variables_to_keep`
- Metadata cleanup - Fixes coordinate attributes and removes serialization artifacts
ScriptProcessor¶
The `ScriptProcessor` allows integration of external scripts for backward compatibility. Scripts must accept `{in}` and `{out}` placeholders for input and output files.
Configuration example:
[[postprocess.processors]]
processor = "script"
params = { script = "python enhance_data.py {in} {out}" }
The script can also use additional placeholders:
- `{d}` - Member number (0-based)
- `{c}` - Cycle number
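For illustration, a script plugged in this way could look like the following minimal sketch. The variable it adds is made up; the only contract with the pipeline is that the script reads the input path and writes the output path substituted for `{in}` and `{out}`:

# enhance_data.py - hypothetical ScriptProcessor script
import sys
import xarray as xr

def main():
    in_path, out_path = sys.argv[1], sys.argv[2]  # substituted from {in} and {out}

    ds = xr.open_dataset(in_path)
    # Purely illustrative derived variable
    ds["wind_speed"] = (ds["wind_east"] ** 2 + ds["wind_north"] ** 2) ** 0.5
    ds["wind_speed"].attrs["units"] = "m s-1"

    ds.to_netcdf(out_path)

if __name__ == "__main__":
    main()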
Custom Processors¶
You can create custom processors by implementing the `DataProcessor` abstract base class:
from wrf_ensembly.processors import DataProcessor, ProcessingContext
import xarray as xr
class MyCustomProcessor(DataProcessor):
    def __init__(self, custom_param="default", **kwargs):
        super().__init__(**kwargs)
        self.custom_param = custom_param

    def process(self, ds: xr.Dataset, context: ProcessingContext) -> xr.Dataset:
        """Apply your custom processing logic here"""
        # Example: Add a new variable
        ds['custom_variable'] = ds['U'] * 2.0
        ds['custom_variable'].attrs['description'] = 'Custom variable'

        # Access context information
        if context.member == 0:
            # Special processing for first member
            pass

        return ds

    @property
    def name(self) -> str:
        return "My Custom Processor"
Processor Configuration¶
Processors are configured in the `postprocess.processors` section of the configuration file:
# Built-in script processor
[[postprocess.processors]]
processor = "script"
params = { script = "python add_variable.py {in} {out}" }
# External processor from module
[[postprocess.processors]]
processor = "my_package.processors:CustomProcessor"
params = { custom_param = "value" }
# External processor from file
[[postprocess.processors]]
processor = "/path/to/custom_processor.py:MyProcessor"
params = { param1 = "value1", param2 = 42 }
Processor Loading¶
Processors can be loaded from several sources:
- Built-in processors - Use names like `"script"`
- Module paths - Format: `"module.path:ClassName"`
- File paths - Format: `"/path/to/file.py:ClassName"`
- File with auto-discovery - Format: `"/path/to/file.py"` (uses filename as class name)
Processing Context¶
The `ProcessingContext` object provides processors with metadata about the current processing operation:
@dataclass
class ProcessingContext:
    member: int        # Ensemble member number (0-based)
    cycle: int         # Cycle number
    input_file: Path   # Path to input file
    output_file: Path  # Path to output file
    config: Config     # Full experiment configuration
This context allows processors to adapt their behavior based on the member, cycle, or other experiment parameters.
Best Practices¶
Variable Filtering¶
Use regular expressions in `variables_to_keep` to efficiently filter variables:
variables_to_keep = [
"DUST_\\d", # All dust species (DUST_1, DUST_2, etc.)
"U|V|W", # Wind components
"wind_.*", # Derived wind variables
"T|P|PH|MU", # Thermodynamic variables
"QVAPOR", # Water vapor
"longitude|latitude", # Coordinates
"t" # Time
]
Keep only the variables you need for your science to reduce file sizes and processing time. Avoid using overly broad patterns that may match unintended variables.
You can check which variables will be kept using the `print-variables-to-keep` command, which prints the variables that match the regex patterns defined in `variables_to_keep`.
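Conceptually, a variable is kept if its name matches one of the patterns. The snippet below mimics that behaviour for illustration only; it is not wrf-ensembly's actual implementation, and the tool may anchor or evaluate the patterns differently:

import re

patterns = ["DUST_\\d", "U|V|W", "wind_.*", "longitude|latitude", "t"]
variables = ["DUST_1", "U", "V10", "wind_east", "XLAT", "t"]

kept = [v for v in variables if any(re.fullmatch(p, v) for p in patterns)]
print(kept)  # ['DUST_1', 'U', 'wind_east', 't']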
Compression and Quantization¶
The default compression settings provide a good balance between file size and processing time:
- Compression filters: `"shf|zst,3"` applies shuffle and Zstandard compression
- Quantization: `"default=3#Z.*=6#X.*=6"` keeps 3 significant digits for most variables, 6 for height/horizontal coordinates
These filters are passed to the NCO commands used in the concatenation step. See NCO's documentation for more details on the available compression and quantization options.
If your netCDF libraries are not built with Zstandard support, you will not see any errors but the files will simply not be compressed. Always double-check the sizes of your netCDF output files and inspect them with `ncdump -hs` to see which compression was applied. If you cannot get Zstandard to work, use `deflate` instead.
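For example, to verify the filters that ended up in a concatenated file (the path is a placeholder):

# The special per-variable attributes (_Shuffle, _DeflateLevel, _Filter, ...) reveal the applied compression
ncdump -hs data/forecasts/cycle_000/forecast_mean_cycle_000.nc | grep -E "_Shuffle|_DeflateLevel|_Filter"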
Resource Allocation¶
For large ensembles, tune the core allocation:
processor_cores = 24 # Parallel file processing
statistics_cores = 8 # Parallel statistics computation
concatenate_cores = 8 # Parallel NCO operations
Your main bottleneck here will likely be RAM, so ensure you have enough memory to run these tasks in parallel. The `processor_cores` setting controls how many files are processed in parallel, while `statistics_cores` and `concatenate_cores` control parallelism for the statistics computation and NCO concatenation, respectively.
Error Handling¶
The pipeline stops on any processor error. To debug issues:
- Check log files in the experiment's `logs/` directory
- Test processors individually with small datasets
- Use the `print-variables-to-keep` command to verify variable filtering
- Examine intermediate `*_post` files for processing artifacts
Integration with SLURM¶
The postprocessing commands integrate seamlessly with SLURM through the `slurm postprocess` command, which runs all postprocessing steps as a single job. The SLURM configuration uses the `directives_postprocess` settings for resource allocation.
For more information on SLURM integration, see the SLURM documentation.
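Assuming the same command pattern as the postprocess commands above, queueing the job looks roughly like this (a sketch; see the SLURM documentation for the exact invocation and options):

wrf-ensembly EXPERIMENT_PATH slurm postprocess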