Vlasiator output data
Why we teach this lesson
Vlasiator has several grids and types of variables. Here we introduce them and go through which ones you might want to use for diagnostics and which ones for sciencing.
Intended learning outcomes
You will be familiar with the outputs generated by Vlasiator and will understand which files and structures contain which information.
Three kinds of data
Vlasiator produces three kinds of output files during a simulation run, the contents of which vary based on simulation parameters:
1. logfile.txt, the simulation run log. This is a timestamped ASCII file providing basic diagnostic output of the run, including memory usage, time steps, counts of spatial cells at refinement levels, etc.
2. diagnostic.txt. The contents of this file can be configured by the diagnostic = options in the run config file. In general, this ASCII file contains one line per (1, 10, or so) simulation timesteps, with the columns determined by the selected data reducers. These include, for example, simple scalar values like overall plasma mass, number of velocity space blocks in the simulation, maximum time step allowed by each solver, mass loss due to sparsity, etc. (A quick loading sketch is given after this list.)
3. VLSV files are the main output data products. These files come in multiple varieties:
* Restart files. These checkpoint files contain the whole simulation state, including the full phase space density, all relevant electromagnetic fields, and metadata. Simulations can be restarted from them (hence the name), but they tend to be very heavy, easily multiple terabytes in size for production runs. They do not contain the output of data reducer operators (detailed below).
* Bulk files. In these, reduced spatial simulation data is written for further scientific analysis. Usually this includes moments of the distribution functions and electromagnetic fields, but bulk files can also contain much more complex data reducer output, as listed below. It is also possible (and common) to configure a subset (e.g. every 25th cell) of the velocity distribution functions to be written for further analysis.
N.b. saving FSgrid variables for large 3D runs can lead to significant disk space usage, because the FSgrid is uniform at the highest resolution setting. For this reason, storing FSgrid variables in bulk files should be carefully considered. It is also possible to declare several different bulk file settings: one can be defined to exclude FSgrid variables and be output more often, with the version that includes FSgrid variables output only e.g. every 10 seconds.
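As referenced in item 2 above, diagnostic.txt is a plain ASCII table, so a quick look at run health can be had with standard tools. A minimal sketch using numpy; the column layout depends on the diagnostic = options chosen, and the handling of header lines here is an assumption (adjust for your version's output):

import numpy as np

# Load diagnostic.txt: one row per written timestep, one column per selected
# diagnostic data reducer. invalid_raise=False skips lines that do not parse
# (e.g. possible header lines), which keeps this robust across versions.
diag = np.genfromtxt("diagnostic.txt", comments="#", invalid_raise=False)
print(diag.shape)  # (number of written timesteps, number of diagnostic columns)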
The VLSV file format
The VLSV library is used to write this versatile container format. Analysator can be used to load and handle these files in Python.
The file format is optimized for parallel write performance: Data is dumped to disk in the same memory structure as it is in the Vlasiator simulation, as binary blobs. Once all data is written, an XML footer that describes the data gets added to the end.
An example XML footer might look like this:
<VLSV>
<MESH arraysize="208101" datasize="8" datatype="uint" max_refinement_level="1" name="SpatialGrid" type="amr_ucd" vectorsize="1" xperiodic="no" yperiodic="no" zperiodic="no">989580</MESH>
<MESH arraysize="652800" datasize="8" datatype="uint" name="fsgrid" type="multi_ucd" vectorsize="1" xperiodic="no" yperiodic="no" zperiodic="no">4011008</MESH>
<PARAMETER arraysize="1" datasize="8" datatype="float" name="time" vectorsize="1">989488</PARAMETER>
<PARAMETER arraysize="1" datasize="8" datatype="float" name="dt" vectorsize="1">989496</PARAMETER>
<VARIABLE arraysize="123544" datasize="8" datatype="uint" mesh="SpatialGrid" name="CellID" vectorsize="1">1136</VARIABLE>
<VARIABLE arraysize="652800" datasize="8" datatype="float" mesh="fsgrid" name="fg_b" unit="T" unitConversion="1.0" unitLaTeX="$\mathrm{T}$" variableLaTeX="$B$" vectorsize="3">9558184</VARIABLE>
</VLSV>
Each XML tag describes one dataset in the file, with arraysize, datatype, datasize and vectorsize describing the array. The XML tag's content contains the byte offset in the file where this dataset's raw binary data lies.
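This is enough to read any dataset by hand. Below is a minimal sketch in Python with numpy, assuming a little-endian machine; the numbers are copied from the CellID tag in the example footer above, and the file name is illustrative:

import numpy as np

# Attributes copied from the XML footer tag of the dataset we want:
offset = 1136          # tag content: byte offset of the raw binary data
arraysize = 123544     # number of elements
vectorsize = 1         # components per element
dtype = np.uint64      # datatype="uint" with datasize="8"

with open("bulk.0001234.vlsv", "rb") as fh:
    fh.seek(offset)
    data = np.fromfile(fh, dtype=dtype, count=arraysize * vectorsize)

# Vector quantities are stored interleaved; reshape to (arraysize, vectorsize)
data = data.reshape(arraysize, vectorsize)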
The two most important tag types are PARAMETER, for single numbers describing the file as a whole, such as resolutions, timesteps etc., and VARIABLE, for spatially varying data reducer output.
Additional metadata is often added to the datasets, such as their physical units, LaTeX formatted plotting hints, etc.
It can sometimes be useful to use the command line to look directly at the XML footer, which contains information on all variables included in the file, e.g. tail -n 60 bulk.0001234.vlsv | less. You can adjust the line count until you have the information you need. Adding too many lines will result in human-unreadable binary output.
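The footer can also be inspected from Python without any extra libraries. A minimal sketch, assuming the footer fits within the last ~100 kB of the file (increase the window if needed):

# Print the XML footer of a VLSV file by scanning the end of the file
# for the <VLSV> tag that opens the footer.
with open("bulk.0001234.vlsv", "rb") as fh:
    fh.seek(0, 2)                      # jump to the end of the file
    size = fh.tell()
    fh.seek(max(0, size - 100_000))    # read only the tail of the file
    tail = fh.read()

start = tail.find(b"<VLSV>")
if start == -1:
    raise RuntimeError("Footer not found; increase the search window")
print(tail[start:].decode("ascii", errors="replace"))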
Spatial ordering: Vlasov- vs. FSGrid vs. Velocity space variables
Note that the XML tags in the file do not yet give sufficient information to describe the spatial structure of the variable arrays. The construction differs depending on the grid they are linked to (denoted by the mesh= attribute):
Vlasov grid variables, typically marked with a vg_ in their name, are stored as cell parameters in the DCCRG grid underlying the Vlasov solver. As the simulation is dynamically load balanced, their memory order changes unpredictably, so the data must be presumed completely unordered in the file. Fortunately, the CellID variable gets written into the file first; it contains the flattened spatial index of the simulation cells, in the same order as all further Vlasov grid variables. In the simplest, non-mesh-refined case, the CellID is defined as

CellID = x_index + x_size * y_index + x_size * y_size * z_index + 1
By reading both the intended target variable and the CellID, the data can thus be brought into flattened spatial order by simply sorting both arrays in the same order. In Analysator, this is typically achieved by running

c = f.read_variable("CellID")
b = f.read_variable("rho")
b = b[numpy.argsort(c)]
b = b.reshape(f.get_spatial_mesh_size())
FSGrid variables are stored on the simulation's fieldsolver grid, which is partitioned quite differently for performance reasons. The spatial domain is subdivided into equally sized rectangular domains, which are written by each compute rank in parallel. If written from a simulation with a single MPI rank, the resulting array is directly in spatial order, as per the CellID definition above. For simulations on multiple ranks, every rank writes its data in this structure, end-to-end. The num_writing_ranks and MESH_DECOMPOSITION arguments in the XML tag allow the spatial partition to be reconstructed at load time.

Ionospheric grid variables are stored on the simulation's ionosphere grid, which is a statically refined triangular mesh designed for solving ionospheric potentials.
Velocity space variables (at the moment, this is only the phase space density f for every species) follow yet another structure, due to the sparse velocity grid on which they are stored.
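In everyday analysis you rarely need to handle these layouts yourself: Analysator's VlsvReader reassembles them for you. A hedged sketch follows; the method names are given as the author recalls the Analysator API, so check your installed version:

import pytools as pt  # Analysator is conventionally imported as pytools

f = pt.vlsvfile.VlsvReader("bulk.0000122.vlsv")  # illustrative file name

# FSGrid variable, returned reassembled into spatial order
fg_b = f.read_fsgrid_variable("fg_b")

# Phase space density of one spatial cell (works only for cells whose VDF was
# written out): returns a mapping from velocity-cell ID to f in m^-6 s^3.
vdf = f.read_velocity_cells(332776, pop="proton")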
Simulation data reducers
This is a (mostly) up-to-date list of simulation output options that can be enabled in the config file. Note that older simulations may use slightly different names, as the code is in constant development.
| Variable name | config option | unit | meaning | literature ref |
|---|---|---|---|---|
| CellID | always written | cells | Spatial ordering of Vlasov grid cells | |
| fg_b | | T | Overall magnetic field (vector) | [Palmroth et al. 2018](https://link.springer.com/article/10.1007%2Fs41115-018-0003-2) |
| fg_b_background | | T | Static background magnetic field (i.e. dipole field in a magnetosphere simulation; vector) | [Palmroth et al. 2018](https://link.springer.com/article/10.1007%2Fs41115-018-0003-2) |
| fg_b_perturbed | | T | Fluctuating component of the magnetic field (vector) | [Palmroth et al. 2018](https://link.springer.com/article/10.1007%2Fs41115-018-0003-2) |
| fg_e | | V/m | Electric field from the field solver (vector) | |
| vg_rhom | | kg/m³ | Combined mass density of all simulation species | |
| fg_rhom | | kg/m³ | -''- | |
| vg_rhoq | | C/m³ | Combined charge density of all simulation species | |
| fg_rhoq | | C/m³ | -''- | |
| proton_vg_rho | | 1/m³ | Number density for each simulated particle population | |
| vg_v | | m/s | Bulk plasma velocity (velocity of the centre-of-mass frame; vector) | |
| fg_v | | m/s | -''- | |
| proton_vg_v | | m/s | Per-population bulk velocity | |
| proton_vg_rho_thermal | | 1/m³ | Number density for the thermal component of every population | |
| proton_vg_v_thermal | -''- | m/s | Velocity (vector) for the thermal component of every population | |
| proton_vg_ptensor_diagonal_thermal | -''- | Pa | Diagonal components of the pressure tensor for the thermal component of every population | |
| proton_vg_ptensor_offdiagonal_thermal | -''- | Pa | Off-diagonal components of the pressure tensor for the thermal component of every population | |
| proton_vg_rho_nonthermal | | 1/m³ | Number density for the nonthermal component of every population | |
| proton_vg_v_nonthermal | -''- | m/s | Velocity (vector) for the nonthermal component of every population | |
| proton_vg_ptensor_diagonal_nonthermal | -''- | Pa | Diagonal components of the pressure tensor for the nonthermal component of every population | |
| proton_vg_ptensor_offdiagonal_nonthermal | -''- | Pa | Off-diagonal components of the pressure tensor for the nonthermal component of every population | |
| proton_minvalue | | m⁻⁶s³ | Effective sparsity threshold for every cell | [Yann's PhD Thesis](http://urn.fi/URN:ISBN:978-952-336-001-3), page 91 |
| proton_rholossadjust | | 1/m³ | Tracks how much mass was lost in the sparse velocity space block removal | [Yann's PhD Thesis](http://urn.fi/URN:ISBN:978-952-336-001-3), page 90 |
| vg_lbweight | | arb. unit | Load balance metric, used for dynamic rebalancing of computational load between MPI tasks | |
| vg_maxdt_acceleration | | s | Maximum timestep limit of the acceleration solver | |
| proton_vg_maxdt_acceleration | | s | -''-, per population | |
| vg_maxdt_translation | | s | Maximum timestep limit of the translation solver | |
| proton_vg_maxdt_translation | | s | -''-, per population | |
VLSV data tools
A short note on the included tools, compiled by:
make vlsvextract vlsvdiff
Some older tools included in make tools are not currently supported.
vlsvextract
vlsvextract can be used to extract VDF data from VLSV files and store it as a separate VLSV file for visualization.
USAGE: ./vlsvextract_DP <file name mask> <options>
To get a list of options use --help
Options:
--help display help
--debug write debugging info to stderr
--cellid arg Set cell id
--cellidlist arg Set list of cell ids
--rotate Rotate velocities so that they face z-axis
--plasmaFrame Shift the distribution so that the bulk velocity is 0
--coordinates arg Set spatial coordinates x y z
--unit arg Sets the units. Options: re, km, m (OPTIONAL)
--point1 arg Set the starting point x y z of a line
--point2 arg Set the ending point x y z of a line
--pointamount arg Number of points along a line (OPTIONAL)
--outputdirectory arg The directory where the file is saved (default current folder) (OPTIONAL)
For example, let's pick a VDF from the foreshock of the Mercury 5D example run; see the VisIt lecture for one method of finding the cellID. Here we have a cellID pre-picked.
./vlsvextract_DP /scratch/project_465000693/example_runs/Mercury5D/bulk/bulk.0000122.vlsv --cellid 332776
This can be used to extract VDFs over lines and multiple files as well.
vlsvdiff
vlsvdiff is used for e.g. continuous integration testing. There is an included testpackage, from which one can generate reference data and compare the effects of one's code edits locally.
Another use is to extract differences between files, for example between time steps.
Other output files
If the PHIPROF profiler suite is in use, you will also see e.g. phiprof_0.txt in the run directory, providing rough ASCII tables of run-time timers, useful for rudimentary profiling of the Vlasiator code, solvers, and I/O.
Interesting questions you might get
Q: Why are the output formats so convoluted?
A: They are optimized for run-time performance, so that each MPI task can simply pour its data into one contiguous region on-disk via MPI writes.
Typical pitfalls
* Reading Vlasov grid data and forgetting to order the cells based on CellIDs.
* Reading FSGrid data and accidentally ordering that too according to CellIDs.
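A short illustration of both pitfalls, using the same Analysator calls as above (variable names are examples from the table; the file name is illustrative):

import numpy
import pytools as pt

f = pt.vlsvfile.VlsvReader("bulk.0001234.vlsv")

# Pitfall 1: vg_ data comes out in load-balance order, NOT spatial order.
rhom = f.read_variable("vg_rhom")                      # unordered
rhom = rhom[numpy.argsort(f.read_variable("CellID"))]  # now in CellID order

# Pitfall 2: fg_ data follows the per-rank fieldsolver layout; the CellID array
# describes the Vlasov grid only, so do NOT apply the same argsort to fg_ data.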