Vlasiator output data
Why we teach this lesson
Vlasiator has several grids and types of variables. Here we introduce them and go through which ones you might want to use for diagnostics and which ones for sciencing.
Intended learning outcomes
You will be familiar with the outputs generated by Vlasiator and will understand which files and structures contain which information.
Three kinds of data
Vlasiator produces three kinds of output files during a simulation run, the contents of which vary based on simulation parameters:
1. logfile.txt, the simulation run log. This is a timestamped ASCII file providing basic diagnostic output of the run, including memory usage, time steps, counts of spatial cells at refinement levels, etc.
2. diagnostic.txt. The contents of this file can be configured by the diagnostic = options in the run config file. In general, this ASCII file contains one line per (1, 10, or so) simulation timesteps, with the columns determined by the selected data reducers. These include, for example, simple scalar values like overall plasma mass, number of velocity space blocks in the simulation, maximum time step allowed by each solver, mass loss due to sparsity, etc. (A quick loading sketch is given after this list.)
3. VLSV files are the main output data products. These files come in multiple varieties:
* Restart files. These checkpoint files contain the whole simulation state, including the full phase space density, all relevant electromagnetic fields, and metadata. Simulations can be restarted from them (hence the name), but they tend to be very heavy, easily multiple terabytes in size for production runs. They do not contain the output of data reducer operators (detailed below).
* Bulk files. In these, reduced spatial simulation data is written for further scientific analysis. Usually this includes moments of the distribution functions and electromagnetic fields, but bulk files can also contain much more complex data reducer output, as listed below. It is also possible (and common) to configure a subset (e.g. every 25th cell) of the velocity distribution functions to be written for further analysis.
N.b. saving FSgrid variables for large 3D runs can lead to significant disk space usage, because the FSgrid is uniform at the highest resolution setting. For this reason, storing FSgrid variables in bulk files should be carefully considered. It is also possible to declare several different bulk file settings: one can be defined to exclude FSgrid variables and be output more often, with the version that includes FSgrid variables output only e.g. every 10 seconds.
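As referenced in item 2 above, diagnostic.txt is a plain ASCII table, so a quick look at run health can be had with standard tools. A minimal sketch using numpy; the column layout depends on the diagnostic = options chosen, and the handling of header lines here is an assumption (adjust for your version's output):

import numpy as np

# Load diagnostic.txt: one row per written timestep, one column per selected
# diagnostic data reducer. invalid_raise=False skips lines that do not parse
# (e.g. possible header lines), which keeps this robust across versions.
diag = np.genfromtxt("diagnostic.txt", comments="#", invalid_raise=False)
print(diag.shape)  # (number of written timesteps, number of diagnostic columns)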
The VLSV file format
The VLSV library is used to write this versatile container format. Analysator can be used to load and handle these files in Python.
The file format is optimized for parallel write performance: Data is dumped to disk in the same memory structure as it is in the Vlasiator simulation, as binary blobs. Once all data is written, an XML footer that describes the data gets added to the end.
An example XML footer might look like this:
<VLSV>
<MESH arraysize="208101" datasize="8" datatype="uint" max_refinement_level="1" name="SpatialGrid" type="amr_ucd" vectorsize="1" xperiodic="no" yperiodic="no" zperiodic="no">989580</MESH>
<MESH arraysize="652800" datasize="8" datatype="uint" name="fsgrid" type="multi_ucd" vectorsize="1" xperiodic="no" yperiodic="no" zperiodic="no">4011008</MESH>
<PARAMETER arraysize="1" datasize="8" datatype="float" name="time" vectorsize="1">989488</PARAMETER>
<PARAMETER arraysize="1" datasize="8" datatype="float" name="dt" vectorsize="1">989496</PARAMETER>
<VARIABLE arraysize="123544" datasize="8" datatype="uint" mesh="SpatialGrid" name="CellID" vectorsize="1">1136</VARIABLE>
<VARIABLE arraysize="652800" datasize="8" datatype="float" mesh="fsgrid" name="fg_b" unit="T" unitConversion="1.0" unitLaTeX="$\mathrm{T}$" variableLaTeX="$B$" vectorsize="3">9558184</VARIABLE>
</VLSV>
Each XML tag describes one dataset in the file, with arraysize, datatype, datasize and vectorsize describing the array. The XML tag's content contains the byte offset in the file where this dataset's raw binary data lies.
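This is enough to read any dataset by hand. Below is a minimal sketch in Python with numpy, assuming a little-endian machine; the numbers are copied from the CellID tag in the example footer above, and the file name is illustrative:

import numpy as np

# Attributes copied from the XML footer tag of the dataset we want:
offset = 1136          # tag content: byte offset of the raw binary data
arraysize = 123544     # number of elements
vectorsize = 1         # components per element
dtype = np.uint64      # datatype="uint" with datasize="8"

with open("bulk.0001234.vlsv", "rb") as fh:
    fh.seek(offset)
    data = np.fromfile(fh, dtype=dtype, count=arraysize * vectorsize)

# Vector quantities are stored interleaved; reshape to (arraysize, vectorsize)
data = data.reshape(arraysize, vectorsize)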
The two most important tag types are PARAMETER, for single numbers describing the file as a whole, such as resolutions, timesteps etc., and VARIABLE, for spatially varying data reducer output.
Additional metadata is often added to the datasets, such as their physical units, LaTeX formatted plotting hints, etc.
It can sometimes be useful to use the command line to look directly at the XML footer, which contains information on all variables included in the file, e.g. tail -n 60 bulk.0001234.vlsv | less. You can adjust the line count until you have the information you need. Adding too many lines will result in human-unreadable binary output.
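The footer can also be inspected from Python without any extra libraries. A minimal sketch, assuming the footer fits within the last ~100 kB of the file (increase the window if needed):

# Print the XML footer of a VLSV file by scanning the end of the file
# for the <VLSV> tag that opens the footer.
with open("bulk.0001234.vlsv", "rb") as fh:
    fh.seek(0, 2)                      # jump to the end of the file
    size = fh.tell()
    fh.seek(max(0, size - 100_000))    # read only the tail of the file
    tail = fh.read()

start = tail.find(b"<VLSV>")
if start == -1:
    raise RuntimeError("Footer not found; increase the search window")
print(tail[start:].decode("ascii", errors="replace"))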
Spatial ordering: Vlasov- vs. FSGrid vs. Velocity space variables
Note that the XML tags in the file do not yet give sufficient information to describe the spatial structure of the variable arrays. The construction differs depending on the grid they are linked to (denoted by the mesh= attribute):
Vlasov grid variables, typically marked with a vg_ in their name, are stored as cell parameters in the DCCRG grid underlying the Vlasov solver. As the simulation is dynamically load balanced, their memory order changes unpredictably, so the data must be presumed completely unordered in the file. Fortunately, the CellID variable gets written into the file first; it contains the flattened spatial index of the simulation cells, in the same order as all further Vlasov grid variables. In the simplest, non-mesh-refined case, the CellID is defined as

CellID = x_index + x_size * y_index + x_size * y_size * z_index + 1
By reading both the intended target variable and the CellID, the data can thus be brought into flattened spatial order by simply sorting both arrays in the same order. In Analysator, this is typically achieved by running

c = f.read_variable("CellID")
b = f.read_variable("rho")
b = b[numpy.argsort(c)]
b = b.reshape(f.get_spatial_mesh_size())
FSGrid variables are stored on the simulation's fieldsolver grid, which is partitioned quite differently for performance reasons. The spatial domain is subdivided into equally sized rectangular domains, which are written by each compute rank in parallel. If written from a simulation with a single MPI rank, the resulting array is directly in spatial order, as per the CellID definition above. For simulations on multiple ranks, every rank writes its data in this structure, end-to-end. The num_writing_ranks and MESH_DECOMPOSITION arguments in the XML tag allow the spatial partition to be reconstructed at load time.

Ionospheric grid variables are stored on the simulation's ionosphere grid, which is a statically refined triangular mesh designed for solving ionospheric potentials.
Velocity space variables (at the moment, this is only the phase space density f for every species) follow yet another structure, due to the sparse velocity grid on which they are stored.
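In everyday analysis you rarely need to handle these layouts yourself: Analysator's VlsvReader reassembles them for you. A hedged sketch follows; the method names are given as the author recalls the Analysator API, so check your installed version:

import pytools as pt  # Analysator is conventionally imported as pytools

f = pt.vlsvfile.VlsvReader("bulk.0000122.vlsv")  # illustrative file name

# FSGrid variable, returned reassembled into spatial order
fg_b = f.read_fsgrid_variable("fg_b")

# Phase space density of one spatial cell (works only for cells whose VDF was
# written out): returns a mapping from velocity-cell ID to f in m^-6 s^3.
vdf = f.read_velocity_cells(332776, pop="proton")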
Simulation data reducers
This is a (mostly) up-to-date list of simulation output options that can be enabled in the config file. Note that older simulations may use slightly different names, as the code is in constant development.
| Variable name | config option | unit | meaning | literature ref |
|---|---|---|---|---|
| CellID | always written | cells | Spatial ordering of Vlasov grid cells | |
| fg_b | | T | Overall magnetic field (vector) | [Palmroth et al. 2018](https://link.springer.com/article/10.1007%2Fs41115-018-0003-2) |
| fg_b_background | | T | Static background magnetic field (i.e. dipole field in a magnetosphere simulation; vector) | [Palmroth et al. 2018](https://link.springer.com/article/10.1007%2Fs41115-018-0003-2) |
| fg_b_perturbed | | T | Fluctuating component of the magnetic field (vector) | [Palmroth et al. 2018](https://link.springer.com/article/10.1007%2Fs41115-018-0003-2) |
| fg_e | | V/m | Electric field from the field solver (vector) | |
| vg_rhom | | kg/m³ | Combined mass density of all simulation species | |
| fg_rhom | | kg/m³ | -''- | |
| vg_rhoq | | C/m³ | Combined charge density of all simulation species | |
| fg_rhoq | | C/m³ | -''- | |
| proton_vg_rho | | 1/m³ | Number density for each simulated particle population | |
| vg_v | | m/s | Bulk plasma velocity (velocity of the centre-of-mass frame; vector) | |
| fg_v | | m/s | -''- | |
| proton_vg_v | | m/s | Per-population bulk velocity | |
| proton_vg_rho_thermal | | 1/m³ | Number density for the thermal component of every population | |
| proton_vg_v_thermal | -''- | m/s | Velocity (vector) for the thermal component of every population | |
| proton_vg_ptensor_diagonal_thermal | -''- | Pa | Diagonal components of the pressure tensor for the thermal component of every population | |
| proton_vg_ptensor_offdiagonal_thermal | -''- | Pa | Off-diagonal components of the pressure tensor for the thermal component of every population | |
| proton_vg_rho_nonthermal | | 1/m³ | Number density for the nonthermal component of every population | |
| proton_vg_v_nonthermal | -''- | m/s | Velocity (vector) for the nonthermal component of every population | |
| proton_vg_ptensor_diagonal_nonthermal | -''- | Pa | Diagonal components of the pressure tensor for the nonthermal component of every population | |
| proton_vg_ptensor_offdiagonal_nonthermal | -''- | Pa | Off-diagonal components of the pressure tensor for the nonthermal component of every population | |
| proton_minvalue | | m⁻⁶s³ | Effective sparsity threshold for every cell | [Yann's PhD Thesis](http://urn.fi/URN:ISBN:978-952-336-001-3), page 91 |
| proton_rholossadjust | | 1/m³ | Tracks how much mass was lost in the sparse velocity space block removal | [Yann's PhD Thesis](http://urn.fi/URN:ISBN:978-952-336-001-3), page 90 |
| vg_lbweight | | arb. unit | Load balance metric, used for dynamic rebalancing of computational load between MPI tasks | |
| vg_maxdt_acceleration | | s | Maximum timestep limit of the acceleration solver | |
| proton_vg_maxdt_acceleration | | s | -''-, per population | |
| vg_maxdt_translation | | s | Maximum timestep limit of the translation solver | |
| proton_vg_maxdt_translation | | s | -''-, per population | |
VLSV data tools
A short note on the included tools, compiled by:
make vlsvextract vlsvdiff
Some older tools included in make tools are not currently supported.
vlsvextract
vlsvextract can be used to extract VDF data from VLSV files and store it as a separate VLSV file for visualization.
USAGE: ./vlsvextract_DP <file name mask> <options>
To get a list of options use --help
Options:
--help display help
--debug write debugging info to stderr
--cellid arg Set cell id
--cellidlist arg Set list of cell ids
--rotate Rotate velocities so that they face z-axis
--plasmaFrame Shift the distribution so that the bulk velocity is 0
--coordinates arg Set spatial coordinates x y z
--unit arg Sets the units. Options: re, km, m (OPTIONAL)
--point1 arg Set the starting point x y z of a line
--point2 arg Set the ending point x y z of a line
--pointamount arg Number of points along a line (OPTIONAL)
--outputdirectory arg The directory where the file is saved (default current folder) (OPTIONAL)
For example, let's pick a VDF from the foreshock of the Mercury 5D example run; see the VisIt lecture for one method of finding the cellID. Here we have a cellID pre-picked.
./vlsvextract_DP /scratch/project_465000693/example_runs/Mercury5D/bulk/bulk.0000122.vlsv --cellid 332776
This can be used to extract VDFs over lines and multiple files as well.
vlsvdiff
vlsvdiff is used for e.g. continuous integration testing. There is an included testpackage, from which one can generate reference data and compare the effects of one's code edits locally.
Another use is to extract differences between files, for example between time steps.
Other output files
If the PHIPROF profiler suite is in use, you will also see e.g. phiprof_0.txt in the run directory, providing rough ASCII tables of run-time timers, useful for rudimentary profiling of the Vlasiator code, solvers, and I/O.
Interesting questions you might get
Q: Why are the output formats so convoluted?
A: They are optimized for run-time performance, so that each MPI task can simply pour its data into one contiguous region on-disk via MPI writes.
Typical pitfalls
* Reading Vlasov grid data and forgetting to order the cells based on CellIDs.
* Reading FSGrid data and accidentally ordering that too according to CellIDs.
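A short illustration of both pitfalls, using the same Analysator calls as above (variable names are examples from the table; the file name is illustrative):

import numpy
import pytools as pt

f = pt.vlsvfile.VlsvReader("bulk.0001234.vlsv")

# Pitfall 1: vg_ data comes out in load-balance order, NOT spatial order.
rhom = f.read_variable("vg_rhom")                      # unordered
rhom = rhom[numpy.argsort(f.read_variable("CellID"))]  # now in CellID order

# Pitfall 2: fg_ data follows the per-rank fieldsolver layout; the CellID array
# describes the Vlasov grid only, so do NOT apply the same argsort to fg_ data.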