IO Namespace#
The IO namespace (src/io/) handles all file-based input and output in ChemNetworks. It provides trajectory readers
for loading simulation data, output writers for saving results, graph readers for reloading previously written files,
and utility classes for tokenizing, logging, filenames, and string validation.
Trajectory Readers#
Trajectory readers load simulation frames from various file formats. Each format-specific reader (e.g., Read_XYZ,
Read_LAMMPSTRJ, Read_GRO) extends Node_Subject and follows a common pattern. See
TRAJECTORY.TYPE for the list of supported formats and the attributes each provides.
Dispatch#
Read_Trajectory (src/io/read_trajectory.hpp) acts as a dispatcher. It selects the appropriate format-specific
reader based on the trajectory type string ("xyz", "lammpstrj", "gro", etc.).
For each call to execute(timestep), it instantiates the format reader, attaches itself as an observer, executes the
read, then detaches. Read_Trajectory is itself both a Node_Observer and a Node_Subject, forwarding attribute
notifications from the format reader up to its own observers (such as Write_Node_Graph).
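The forwarding arrangement described above can be sketched as follows. This is a minimal illustration, not the ChemNetworks code: NodeObserver, NodeSubject, FormatReader, ToyXyzReader, TrajectoryDispatcher, and RecordingObserver are hypothetical stand-ins with simplified signatures, and the toy reader emits one hard-coded atom instead of parsing a file.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Attribute-major table: values[a][n] is attribute a of atom n.
using AttributeTable = std::vector<std::vector<std::string>>;

// Hypothetical stand-in for Node_Observer.
struct NodeObserver {
    virtual ~NodeObserver() = default;
    virtual void update_node_attributes(int timestep,
                                        const std::vector<std::string>& names,
                                        const AttributeTable& values) = 0;
};

// Hypothetical stand-in for Node_Subject.
class NodeSubject {
public:
    virtual ~NodeSubject() = default;
    void attach(NodeObserver* o) { observers_.push_back(o); }
    void detach(NodeObserver* o) {
        observers_.erase(std::remove(observers_.begin(), observers_.end(), o),
                         observers_.end());
    }
protected:
    void notify(int timestep, const std::vector<std::string>& names,
                const AttributeTable& values) {
        for (NodeObserver* o : observers_) o->update_node_attributes(timestep, names, values);
    }
private:
    std::vector<NodeObserver*> observers_;
};

// Common interface for format readers in this sketch.
class FormatReader : public NodeSubject {
public:
    virtual void execute(int timestep) = 0;
};

// Toy reader: emits a single hard-coded atom rather than reading a file.
class ToyXyzReader : public FormatReader {
public:
    void execute(int timestep) override {
        const std::vector<std::string> names{"atom_type", "x", "y", "z"};
        AttributeTable values(4, std::vector<std::string>(1, "0.0"));
        values[0][0] = "O";
        notify(timestep, names, values);
    }
};

// Dispatcher: observes the format reader and is a subject for downstream consumers.
class TrajectoryDispatcher : public NodeObserver, public NodeSubject {
public:
    explicit TrajectoryDispatcher(std::string type) : type_(std::move(type)) {}

    void execute(int timestep) {
        std::unique_ptr<FormatReader> reader = make_reader();
        reader->attach(this);       // listen to the format reader
        reader->execute(timestep);  // the reader notifies us during this call
        reader->detach(this);
    }

    // Forward whatever the format reader reports to our own observers.
    void update_node_attributes(int timestep, const std::vector<std::string>& names,
                                const AttributeTable& values) override {
        notify(timestep, names, values);
    }
private:
    std::unique_ptr<FormatReader> make_reader() const {
        if (type_ == "xyz") return std::unique_ptr<FormatReader>(new ToyXyzReader());
        throw std::invalid_argument("unsupported trajectory type: " + type_);
    }
    std::string type_;
};

// Downstream consumer used only to observe the forwarded notification.
struct RecordingObserver : NodeObserver {
    int calls = 0;
    int last_timestep = -1;
    std::size_t last_attribute_count = 0;
    void update_node_attributes(int timestep, const std::vector<std::string>& names,
                                const AttributeTable&) override {
        ++calls;
        last_timestep = timestep;
        last_attribute_count = names.size();
    }
};
```

Attaching a RecordingObserver to a TrajectoryDispatcher("xyz") and calling execute(3) delivers exactly one notification carrying timestep 3 and four attribute names.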
Common Read Pattern#
Each trajectory reader follows these general steps:
- The file is opened using FileTokenizer. If a TrajectoryOffset cursor has a saved position, the reader seeks to that byte offset and resumes from the recorded timestep, avoiding re-reading earlier frames.
- The reader scans forward through frames until the target timestep is reached. If the end of the file is reached before finding the target, reached_eof is set to true and the reader returns early.
- Header data is read from the trajectory. This includes the number of atoms and, depending on the format, periodic boundary condition (PBC) box dimensions and attribute column names. Some formats have fixed attribute sets (e.g., XYZ always provides atom_type, x, y, z), while others discover attributes from headers (e.g., LAMMPSTRJ reads them from the ITEM: ATOMS line).
- Atom attributes are read into a 2D vector structured as vector<vector<string>>(num_attributes, vector<string>(num_atoms)). Each inner vector holds one attribute's values across all atoms.
- The reader calls Notify_Node_Attributes(timestep, attributes, all_node_attributes) to push the attribute data to all attached observers in a single batch.
- The TrajectoryOffset cursor is updated with the current timestep and byte position so the next call can resume without re-reading the previously processed frames.
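The steps above can be sketched for an XYZ-style stream as follows. This is a simplified illustration under stated assumptions: it reads from a plain std::istream rather than FileTokenizer, FrameCursor is a hypothetical stand-in for TrajectoryOffset, read_xyz_frame is a made-up name, frames are numbered 0, 1, 2, ... by their position in the file, and the observer notification step is omitted.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical cursor: where the next unread frame starts and which frame it is.
struct FrameCursor {
    std::streamoff byte_offset = 0;
    int next_timestep = 0;
};

struct FrameResult {
    bool reached_eof = false;
    // attributes[a][n]: attribute a (atom_type, x, y, z) of atom n.
    std::vector<std::vector<std::string>> attributes;
};

// Reads frame `target`, resuming from `cursor` instead of the start of the stream.
FrameResult read_xyz_frame(std::istream& in, int target, FrameCursor& cursor) {
    FrameResult result;
    if (target < cursor.next_timestep) cursor = FrameCursor{};  // rewind for earlier frames
    in.clear();
    in.seekg(cursor.byte_offset, std::ios::beg);
    int current = cursor.next_timestep;

    std::string line;
    while (std::getline(in, line)) {            // header line 1: atom count
        const std::size_t num_atoms = std::stoul(line);
        std::getline(in, line);                 // header line 2: comment, discarded
        const bool wanted = (current == target);
        if (wanted) result.attributes.assign(4, std::vector<std::string>(num_atoms));

        for (std::size_t n = 0; n < num_atoms; ++n) {
            if (!std::getline(in, line)) {      // truncated frame
                result.attributes.clear();
                result.reached_eof = true;
                return result;
            }
            if (!wanted) continue;              // scanning past an earlier frame
            std::istringstream fields(line);
            for (std::size_t a = 0; a < 4; ++a) fields >> result.attributes[a][n];
        }

        if (wanted) {
            in.clear();  // drop a possible EOF flag so tellg() reports a position
            cursor.byte_offset = static_cast<std::streamoff>(in.tellg());
            cursor.next_timestep = current + 1;
            return result;
        }
        ++current;
    }
    result.reached_eof = true;  // ran out of frames before reaching the target
    return result;
}
```

Calling the function for consecutive timesteps with the same cursor touches each frame once; asking for a frame past the end returns with reached_eof set and no attributes.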
TrajectoryOffset#
TrajectoryOffset (src/io/trajectory_offset.hpp) stores the byte position and timestep index of the last
successfully read frame. This allows trajectory readers to resume from where they left off rather than scanning from
the beginning of the file on each timestep. Each trajectory block maintains its own cursor instance.
Format-Specific Notes#
- XYZ (src/io/read_xyz.hpp): Fixed attributes (atom_type, x, y, z). Two header lines per frame (atom count and a discarded comment line).
- LAMMPSTRJ (src/io/read_lammpstrj.hpp): Attributes discovered from the ITEM: ATOMS header. Reads PBC dimensions from ITEM: BOX BOUNDS. Currently only supports orthorhombic PBCs. Locates frames by matching ITEM: TIMESTEP values.
- GRO (src/io/read_gro.hpp): Fixed-column format parsed using FileTokenizer::parse_cols_into(). Provides mol_id, mol_type, atom_type, atom_id, x, y, z, and optionally velocity attributes. Handles the 99999 index wrap-around in GROMACS files by tracking loop counters.
- XTC (src/io/read_xtc.hpp): Binary format read through the xdrfile C library. Provides only coordinate attributes (x, y, z) and PBC dimensions.
- GROXTC (src/io/read_groxtc.hpp): Combines a GRO file for topology attributes (mol_id, mol_type, atom_type) with an XTC file for coordinates and PBC dimensions. The GRO file is read once for these attributes; coordinates come entirely from the XTC file on every frame.
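The loop-counter idea behind the GRO wrap-around handling can be sketched as below. WrapCounter is a hypothetical helper, not the class used in read_gro.hpp, and the sketch assumes the five-character index field wraps modulo 100000 (so 99999 is followed by 0).

```cpp
// Reconstructs ever-increasing indices from values that wrap in a 5-digit field.
// Assumption: the file writer wraps modulo 100000, i.e. 99999 is followed by 0.
class WrapCounter {
public:
    long unwrap(long raw) {
        // A drop relative to the previous raw value means the field wrapped once more.
        // Equal values are allowed, since several atoms can share one molecule index.
        if (raw < previous_raw_) ++loops_;
        previous_raw_ = raw;
        return loops_ * kModulus + raw;
    }
private:
    static constexpr long kModulus = 100000;
    long previous_raw_ = -1;
    long loops_ = 0;
};
```

One counter per wrapped column (for example one for atom indices and one for molecule indices) keeps the two sequences independent.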
Output Writers#
Output writers write graph data to files. ChemNetworks supports writing in its own formats (.nds, .edg, .zmt)
as well as external formats (LAMMPS data, PDB, and MOL2). See the GRAPH.WRITE
commands for configuration and file format details.
Common Write Pattern#
Node and edge writers follow a similar structure:
- Filename_Utils::create_timestamped_filename() generates the output filename by interpolating the timestep into the base filename (e.g., ./edg_files/edges.0.edg). Parent directories are created automatically if they do not exist.
- If the attribute list is "*", it is expanded to include all vertex or edge attributes present on the graph.
- The writer queries igraph for each attribute's type and stores them in a map. This determines which igraph accessor to use when writing values.
- A header line listing the attribute column names is written, using the configured file delimiter.
- Each node or edge is written as a delimiter-separated row.
Write_Node_File#
Write_Node_File (src/io/write_node_file.hpp) writes one row per node. The header consists of a TIMESTEP row
followed by the column names (i, then the attribute names). Each data row contains the node index followed by its
attribute values.
Write_Edge_File#
Write_Edge_File (src/io/write_edge_file.hpp) writes one row per edge. Each row contains the source node index i,
the source node's requested attributes, the target node index j, the target node's attributes, and any edge
attributes. The header reflects this column layout.
Write_ZMatrix_File#
Write_ZMatrix_File (src/io/write_zmatrix_file.hpp) implements ZMatrix_Observer. It accumulates per-match
Z-matrix output via update_zmatrix() during the search, then writes the collected data to a .zmt file when
execute() is called.
External Format Writers#
The LAMMPS data, PDB, and MOL2 writers export graph data into formats readable by external simulation and visualization
tools. All three share a common approach: they resolve node attributes through
fallback chains (e.g., trying atom_type, then atom_name, then
element for atom names), and read PBC dimensions from the trajectory component.
Write_LAMMPS_Data_File (src/io/write_lammps_data_file.hpp) writes LAMMPS .data files using the full atom style.
It includes header, masses, atoms, and bonds sections. Atom types are auto-mapped to sequential integer IDs. Bond types
are determined from atom type pairings, with order-independent keys (e.g., O-H and H-O receive the same type ID).
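One way to obtain order-independent bond type keys is to sort the two atom type names before looking them up. The sketch below illustrates that idea; BondTypeRegistry is a hypothetical class, and numbering the IDs from 1 in order of first appearance is an assumption rather than documented behavior.

```cpp
#include <map>
#include <string>
#include <utility>

// Assigns sequential integer IDs to unordered pairs of atom type names.
class BondTypeRegistry {
public:
    int type_id(const std::string& a, const std::string& b) {
        // Normalize the pair so that (O, H) and (H, O) share one key.
        const std::pair<std::string, std::string> key =
            (a <= b) ? std::make_pair(a, b) : std::make_pair(b, a);
        std::map<std::pair<std::string, std::string>, int>::iterator it = ids_.find(key);
        if (it == ids_.end()) {
            const int next_id = static_cast<int>(ids_.size()) + 1;  // IDs start at 1
            it = ids_.emplace(key, next_id).first;
        }
        return it->second;
    }
private:
    std::map<std::pair<std::string, std::string>, int> ids_;
};
```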
Write_PDB_File (src/io/write_pdb_file.hpp) writes PDB files with CRYST1, HETATM, and CONECT records.
Write_MOL2_File (src/io/write_mol2_file.hpp) writes Tripos MOL2 files with @<TRIPOS>MOLECULE, ATOM, BOND, and
FF_PBC records.
Graph Readers#
Read_Graph (src/io/read_graph.hpp) reads previously written .nds and .edg files back into a graph. It
dispatches to Read_nds or Read_edg based on the file type, and acts as both a node and edge observer/subject to
forward the data to graph-building commands.
The filename string used for reading must match the one used when writing. Read_Graph uses Filename_Utils to
reconstruct the per-timestep filename from the base string, so the same pattern (including any * wildcards) must be
provided. See ADD.NODES.FROM_FILE and
ADD.EDGES.FROM_FILE for usage details.
Read_nds (src/io/read_nds.hpp) parses a .nds file, determines the maximum node index (to handle sparse indices),
and calls notify_resize() followed by Notify_Node_Attributes().
Read_edg (src/io/read_edg.hpp) parses a .edg file, validates that all referenced node indices exist in the
graph, creates edges via notify_edges(), and attaches edge attributes via notify_edge_attributes().
Utilities#
Tokenizers#
ChemNetworks provides two tokenizer classes (src/io/tokenizer.hpp) for parsing structured text: LineTokenizer for
in-memory strings and FileTokenizer for reading directly from files.
LineTokenizer#
LineTokenizer splits a string into tokens by a delimiter (default: space). Consecutive separators are
merged by default, so "a  b", with two spaces in between, produces two tokens ("a" and "b") rather than three
("a", "", and "b").
Tokens are accessed sequentially through four methods:
- parse_next_into(T& value) parses the next token into a typed variable using std::istringstream. If the conversion fails, an error is raised with the attempted type name.
- next() returns the next token as a string.
- peek() returns the next token without consuming it.
- count() returns the number of remaining unconsumed tokens.
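The behavior described above can be approximated with a small stand-alone class. SimpleLineTokenizer is a hypothetical sketch, not the ChemNetworks LineTokenizer: the constructor parameters are assumptions, and it throws exceptions where the real class would report through Log::error.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative tokenizer with the four access methods described in the text.
class SimpleLineTokenizer {
public:
    explicit SimpleLineTokenizer(const std::string& line, char delim = ' ',
                                 bool merge_separators = true) {
        std::istringstream stream(line);
        std::string token;
        while (std::getline(stream, token, delim)) {
            // Consecutive separators produce empty tokens; drop them when merging.
            if (merge_separators && token.empty()) continue;
            tokens_.push_back(token);
        }
    }

    // Returns the next token and consumes it.
    std::string next() {
        require_token();
        return tokens_[pos_++];
    }

    // Returns the next token without consuming it.
    const std::string& peek() const {
        require_token();
        return tokens_[pos_];
    }

    // Number of tokens not yet consumed.
    std::size_t count() const { return tokens_.size() - pos_; }

    // Consumes the next token and converts it with a string stream.
    template <typename T>
    void parse_next_into(T& value) {
        const std::string token = next();
        std::istringstream in(token);
        if (!(in >> value)) throw std::runtime_error("cannot parse token '" + token + "'");
    }

private:
    void require_token() const {
        if (pos_ >= tokens_.size()) throw std::out_of_range("no tokens left");
    }
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};
```

With merging enabled, the line "O  1.5 2" (two spaces after O) yields three tokens; with merging disabled, "a  b" yields three tokens, the middle one empty.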
FileTokenizer#
FileTokenizer extends LineTokenizer to read from files line by line. It wraps an std::ifstream and provides:
- read_next_line() reads the next line from the file, creating a LineTokenizer for that line.
- skip_next_line() advances past the next line without tokenizing.
- has_next_line() checks whether more lines are available.
- skip_bytes(offset) seeks to a byte position in the file. Used for resuming from a saved cursor position.
- tell() returns the current byte position.
FileTokenizer also supports fixed-width column parsing via parse_cols_into(width, value), which extracts a
substring of width characters from the current line and parses it into a typed variable. This is used by the GRO
reader, which uses a fixed-column format. Token-based and column-based parsing cannot be mixed on the same line.
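Fixed-width parsing of this kind can be sketched as below. ColumnParser is a hypothetical stand-in that mirrors the parse_cols_into(width, value) call shape on a single in-memory line; it is not the FileTokenizer implementation, and it throws exceptions instead of calling Log::error. The column widths in the usage note (5, 5, 5, 5, 8, 8, 8) follow the standard GRO atom line layout.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Illustrative fixed-width column parser over one line of text.
class ColumnParser {
public:
    explicit ColumnParser(std::string line) : line_(std::move(line)) {}

    // Takes the next `width` characters and converts them into `value`.
    template <typename T>
    void parse_cols_into(std::size_t width, T& value) {
        if (pos_ + width > line_.size()) throw std::out_of_range("line too short");
        const std::string field = line_.substr(pos_, width);
        pos_ += width;
        std::istringstream in(field);  // operator>> skips the padding spaces
        if (!(in >> value)) throw std::runtime_error("cannot parse field '" + field + "'");
    }

private:
    std::string line_;
    std::size_t pos_ = 0;
};
```

For a GRO atom line, successive calls with widths 5, 5, 5, 5, 8, 8, 8 would fill the molecule index, molecule name, atom name, atom index, and the three coordinates in that order.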
Log#
Log (src/io/log.hpp) provides five static logging methods:
- root(fmt, args...) prints only on MPI rank 0, always shown.
- log(fmt, args...) queues output on all ranks, always shown.
- info(fmt, args...) queues output, hidden with -q.
- debug(fmt, args...) queues output, shown only with -v.
- error(fmt, args...) prints to stderr and exits (or throws in debug builds).
Output from log, info, and debug is buffered in a log_queue stringstream and flushed at the end of each
timestep via write_log_queue(). All methods use std::format for string formatting.
Delimiters#
Delimiters (src/io/delimiters.hpp) defines the separator characters used for output files (file, default tab)
and key-value map parsing (map, default colon). These can be configured through the
SETTINGS.DELIM sub-commands.
Filename_Utils#
Filename_Utils (src/io/filename_utils.hpp) generates timestamped output filenames. If the base filename contains
*, each * is replaced with the timestep number. Otherwise, the timestep is inserted between the base name and the
file extension (e.g., edges becomes edges.0.edg). See the
file naming convention for details.
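The naming rule can be sketched as a small function. This is a hypothetical illustration, not Filename_Utils itself: the function name and parameters are made up, the extension is passed without its leading dot, and appending the extension in the wildcard case is an assumption, since the text only specifies the non-wildcard form. Directory creation is left out.

```cpp
#include <string>

// Sketch of the timestep naming rule for output files.
// Assumption: `extension` has no leading dot and is appended in both branches.
std::string timestamped_filename(const std::string& base, long timestep,
                                 const std::string& extension) {
    const std::string step = std::to_string(timestep);

    // No wildcard: place the timestep between the base name and the extension.
    if (base.find('*') == std::string::npos) return base + "." + step + "." + extension;

    // Wildcard: replace every '*' with the timestep number.
    std::string name;
    for (char c : base) {
        if (c == '*') name += step;
        else name += c;
    }
    return name + "." + extension;
}
```

Under these assumptions, the base name edges at timestep 0 with extension edg gives edges.0.edg, matching the example in the text.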
Other Utilities#
- validate_string (src/utils/io.hpp): Iterates over every character in a string and calls Log::error if any character fails std::isgraph. Accepts an optional error_msg format string (default: "Found illegal character >>{}<< in string >>{}<<."); the first {} is replaced with the offending character and the second with the full string.
- validate_true_false (src/utils/io.hpp): Parses a boolean value from a string. Accepts true, false, 1, or 0 (case-insensitive). Calls Log::error for unrecognized non-empty values; returns false for empty input.
- parse_params (src/io/parse_params.hpp): Parses strings formatted as key:value key:value default into a map with an optional default. Used for attribute mapping in input file components.
- Count_Lines_In_File (src/io/count_lines_in_file.hpp): Counts lines in a file with configurable modes (all lines, non-empty, or non-whitespace-only).
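The key:value key:value default format handled by parse_params can be illustrated with the sketch below. ParsedParams and parse_params_sketch are hypothetical names, and the rules used here (whitespace-separated tokens, a token without the map delimiter is the default, the last such token wins) are assumptions rather than the documented behavior of parse_params.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Result of parsing a "key:value key:value default" string.
struct ParsedParams {
    std::map<std::string, std::string> values;
    std::string default_value;
    bool has_default = false;
};

// Assumption: tokens are whitespace-separated; a token without the map
// delimiter is treated as the default value.
ParsedParams parse_params_sketch(const std::string& text, char map_delim = ':') {
    ParsedParams result;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token) {
        const std::size_t split = token.find(map_delim);
        if (split == std::string::npos) {
            result.default_value = token;
            result.has_default = true;
        } else {
            result.values[token.substr(0, split)] = token.substr(split + 1);
        }
    }
    return result;
}
```

For example, the input O:oxygen H:hydrogen unknown would give a two-entry map and the default unknown under these assumptions.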