IO Namespace#

The IO namespace (src/io/) handles all file-based input and output in ChemNetworks. It provides trajectory readers for loading simulation data, output writers for saving results, graph readers for reloading previously written files, and utility classes for tokenizing, logging, filenames, and string validation.

Trajectory Readers#

Trajectory readers load simulation frames from various file formats. Each format-specific reader (e.g., Read_XYZ, Read_LAMMPSTRJ, Read_GRO) extends Node_Subject and follows a common pattern. See TRAJECTORY.TYPE for the list of supported formats and the attributes each provides.

Dispatch#

Read_Trajectory (src/io/read_trajectory.hpp) acts as a dispatcher. It selects the appropriate format-specific reader based on the trajectory type string ("xyz", "lammpstrj", "gro", etc.). For each call to execute(timestep), it instantiates the format reader, attaches itself as an observer, executes the read, then detaches. Read_Trajectory is itself both a Node_Observer and a Node_Subject, forwarding attribute notifications from the format reader up to its own observers (such as Write_Node_Graph).
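
The dispatch-and-forward flow can be illustrated with a minimal sketch. All class names here (NodeObserver, FormatReader, TrajectoryDispatcher, RecordingObserver) are hypothetical stand-ins rather than the actual ChemNetworks classes, and the readers only announce attribute names instead of parsing a file.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the observer/subject classes.
struct NodeObserver {
    virtual ~NodeObserver() = default;
    virtual void update(int timestep, const std::vector<std::string>& names) = 0;
};

struct FormatReader {
    NodeObserver* observer = nullptr;  // at most one observer in this sketch
    virtual ~FormatReader() = default;
    virtual void execute(int timestep) = 0;
};

struct XyzReader : FormatReader {
    void execute(int timestep) override {
        if (observer) observer->update(timestep, {"atom_type", "x", "y", "z"});
    }
};

struct XtcReader : FormatReader {
    void execute(int timestep) override {
        if (observer) observer->update(timestep, {"x", "y", "z"});
    }
};

// Observes the format reader and forwards notifications to its own observers.
class TrajectoryDispatcher : public NodeObserver {
public:
    explicit TrajectoryDispatcher(std::string type) : type_(std::move(type)) {}

    void attach(NodeObserver* obs) { observers_.push_back(obs); }

    void execute(int timestep) {
        std::unique_ptr<FormatReader> reader = make_reader();
        reader->observer = this;     // attach
        reader->execute(timestep);   // read; triggers update() below
        reader->observer = nullptr;  // detach
    }

    void update(int timestep, const std::vector<std::string>& names) override {
        for (NodeObserver* obs : observers_) obs->update(timestep, names);
    }

private:
    std::unique_ptr<FormatReader> make_reader() const {
        if (type_ == "xyz") return std::make_unique<XyzReader>();
        if (type_ == "xtc") return std::make_unique<XtcReader>();
        throw std::invalid_argument("unsupported trajectory type: " + type_);
    }

    std::string type_;
    std::vector<NodeObserver*> observers_;
};

// Records what it was told, standing in for a downstream consumer.
struct RecordingObserver : NodeObserver {
    int last_timestep = -1;
    std::vector<std::string> last_names;
    void update(int timestep, const std::vector<std::string>& names) override {
        last_timestep = timestep;
        last_names = names;
    }
};
```

Creating the reader inside execute() keeps the dispatcher free of per-format state; in this sketch an unknown type string is reported by throwing std::invalid_argument, which is an assumption about error handling rather than documented behavior.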

Common Read Pattern#

Each trajectory reader follows these general steps:

The file is opened using FileTokenizer. If a TrajectoryOffset cursor has a saved position, the reader seeks to that byte offset and resumes from the recorded timestep, avoiding re-reading earlier frames.
The reader scans forward through frames until the target timestep is reached. If the end of the file is reached before finding the target, reached_eof is set to true and the reader returns early.
Header data is read from the trajectory. This includes the number of atoms and, depending on the format, periodic boundary condition (PBC) box dimensions and attribute column names. Some formats have fixed attribute sets (e.g., XYZ always provides atom_type, x, y, z), while others discover attributes from headers (e.g., LAMMPSTRJ reads them from the ITEM: ATOMS line).
Atom attributes are read into a 2D vector structured as vector<vector<string>>(num_attributes, vector<string>(num_atoms)). Each inner vector holds one attribute's values across all atoms.
The reader calls Notify_Node_Attributes(timestep, attributes, all_node_attributes) to push the attribute data to all attached observers in a single batch.
The TrajectoryOffset cursor is updated with the current timestep and byte position so the next call can resume without re-reading the previously processed frames.
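
The attribute-major layout described in these steps can be sketched for an XYZ-style frame. The function name read_xyz_frame and the AttributeTable alias are assumptions made for illustration; the sketch reads from any std::istream rather than through FileTokenizer and does not notify observers.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// table[attribute][atom]: one inner vector per attribute, spanning all atoms.
using AttributeTable = std::vector<std::vector<std::string>>;

// Reads one XYZ-style frame: an atom-count line, a comment line, then one
// "type x y z" line per atom. Returns false if the stream ends before a
// full frame has been read.
bool read_xyz_frame(std::istream& in, AttributeTable& table) {
    std::string line;
    if (!std::getline(in, line)) return false;  // no more frames
    const std::size_t num_atoms = std::stoul(line);
    std::getline(in, line);                     // comment line, discarded

    const std::size_t num_attributes = 4;       // atom_type, x, y, z
    table.assign(num_attributes, std::vector<std::string>(num_atoms));

    for (std::size_t atom = 0; atom < num_atoms; ++atom) {
        if (!std::getline(in, line)) return false;  // truncated frame
        std::istringstream fields(line);
        for (std::size_t attr = 0; attr < num_attributes; ++attr) {
            fields >> table[attr][atom];        // attribute-major indexing
        }
    }
    return true;
}
```

With this layout, table[0] holds every atom_type and table[1] every x value, so a whole attribute can be handed to an observer as one contiguous vector.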

TrajectoryOffset#

TrajectoryOffset (src/io/trajectory_offset.hpp) stores the byte position and timestep index of the last successfully read frame. This allows trajectory readers to resume from where they left off rather than scanning from the beginning of the file on each timestep. Each trajectory block maintains its own cursor instance.
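
A minimal sketch of the resume idea, using a toy format in which every frame is a single line. FrameCursor and read_frame are hypothetical names, and the real class may store its position differently (for example, the offset of the last frame read rather than of the next one).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical cursor: where the next unread frame starts, and its index.
struct FrameCursor {
    std::streampos byte_offset = std::streampos(0);
    int next_timestep = 0;
};

// Seeks to the saved offset, then scans forward until `target` is reached.
// Returns false if the stream ends first.
bool read_frame(std::istream& in, FrameCursor& cursor, int target, std::string& frame) {
    assert(target >= cursor.next_timestep && "sketch only supports forward reads");
    in.clear();                     // drop any EOF/fail state from earlier calls
    in.seekg(cursor.byte_offset);   // resume instead of rescanning from byte 0
    int current = cursor.next_timestep;
    while (std::getline(in, frame)) {
        if (current == target) {
            in.clear();             // so tellg() is valid even at end of stream
            cursor.byte_offset = in.tellg();
            cursor.next_timestep = current + 1;
            return true;
        }
        ++current;
    }
    return false;                   // target lies beyond the end of the stream
}
```

Because the cursor is only advanced on success, a failed read leaves it pointing at the last known good position.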

Format-Specific Notes#

XYZ (src/io/read_xyz.hpp): Fixed attributes (atom_type, x, y, z). Two header lines per frame (atom count and a discarded comment line).
LAMMPSTRJ (src/io/read_lammpstrj.hpp): Attributes discovered from the ITEM: ATOMS header. Reads PBC dimensions from ITEM: BOX BOUNDS, supporting orthorhombic, restricted-triclinic (xy xz yz), and general-triclinic (abc origin) boxes. Locates frames by matching the value on the ITEM: TIMESTEP line (see the note on TIMESTEPS semantics in docs/input_file/settings.md).
GRO (src/io/read_gro.hpp): Fixed-column format parsed using FileTokenizer::parse_cols_into(). Provides mol_id, mol_type, atom_type, atom_id, x, y, z, and optionally velocity attributes. Handles the 99999 index wrap-around in GROMACS files by tracking loop counters.
XTC (src/io/read_xtc.hpp): Binary format read through the xdrfile C library. Provides only coordinate attributes (x, y, z) and PBC dimensions.
GROXTC (src/io/read_groxtc.hpp): Combines a GRO file for topology attributes (mol_id, mol_type, atom_type) with an XTC file for coordinates and PBC dimensions. Only the GRO file's first frame is used for topology, but note that frame 0 is currently re-parsed on every timestep. Coordinates come entirely from the XTC file on every frame.
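
The wrap-around handling mentioned for GRO can be sketched as follows. This assumes that indices are stored modulo 100000 (so 99999 is followed by 0) and that the true sequence never decreases; unwrap_indices is a hypothetical helper, not the reader's actual code.

```cpp
#include <cassert>
#include <vector>

// Recovers monotonically non-decreasing indices from five-digit wrapped ones
// by counting how many times the raw value has dropped.
std::vector<long> unwrap_indices(const std::vector<int>& raw) {
    constexpr long kWrap = 100000;  // assumption: values wrap 99999 -> 0
    std::vector<long> result;
    result.reserve(raw.size());
    long loops = 0;
    int previous = -1;
    for (int value : raw) {
        assert(value >= 0 && value < kWrap);
        if (value < previous) ++loops;  // a drop means one more wrap
        result.push_back(loops * kWrap + value);
        previous = value;
    }
    return result;
}
```

Repeated values (such as a mol_id shared by all atoms of one molecule) do not trigger the counter, since only a strict decrease is treated as a wrap.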

Output Writers#

Output writers write graph data to files. ChemNetworks supports writing in its own formats (.nds, .edg, .zmt) as well as external formats (LAMMPS data, PDB, and MOL2). See the GRAPH.WRITE commands for configuration and file format details.

Common Write Pattern#

Node and edge writers follow a similar structure:

Utils::IO::create_timestamped_filename() generates the output filename by interpolating the timestep into the base filename (e.g., ./edg_files/edges.0.edg). Parent directories are created automatically if they do not exist.
If the attribute list is "*", it is expanded to include all vertex or edge attributes present on the graph.
The writer queries igraph for each attribute's type and stores them in a map. This determines which igraph accessor to use when writing values.
A header line listing the attribute column names is written, using the configured file delimiter.
Each node or edge is written as a delimiter-separated row.
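
The last two steps (header, then delimiter-separated rows) can be sketched without igraph. The function write_rows is hypothetical and takes values that are already strings; the real writers look up each attribute's type in igraph before formatting it.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes a header of column names, then one row per node. columns[c][n]
// holds the value of attribute c for node n (attribute-major layout).
void write_rows(std::ostream& out,
                const std::vector<std::string>& names,
                const std::vector<std::vector<std::string>>& columns,
                char delimiter = '\t') {
    assert(names.size() == columns.size());
    out << "i";  // index column comes first
    for (const std::string& name : names) out << delimiter << name;
    out << '\n';

    const std::size_t num_rows = columns.empty() ? 0 : columns.front().size();
    for (std::size_t row = 0; row < num_rows; ++row) {
        out << row;
        for (const auto& column : columns) out << delimiter << column[row];
        out << '\n';
    }
}
```

Passing the delimiter as a parameter mirrors the configurable file delimiter; the tab default here matches the default stated under Delimiters.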

Write_Node_File#

Write_Node_File (src/io/write_node_file.hpp) writes one row per node. The header consists of a TIMESTEP line followed by a line of column names (i, then the attribute names). Each data row contains the node index followed by its attribute values.

Write_Edge_File#

Write_Edge_File (src/io/write_edge_file.hpp) writes one row per edge. Each row contains the source node index i, the source node's requested attributes, the target node index j, the target node's attributes, and any edge attributes. The header reflects this column layout.

Write_ZMatrix_File#

Write_ZMatrix_File (src/io/write_zmatrix_file.hpp) implements ZMatrix_Observer. It accumulates per-match Z-matrix output via update_zmatrix() during the search, then writes the collected data to a .zmt file when execute() is called.

External Format Writers#

The LAMMPS data, PDB, and MOL2 writers export graph data into formats readable by external simulation and visualization tools. All three share a common approach: they resolve node attributes through fallback chains (e.g., trying atom_type, then atom_name, then element for atom names), and read PBC dimensions from the trajectory component.
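
A fallback chain of this kind might look like the sketch below. Here a node is modeled as a plain string map and resolve_attribute is a hypothetical helper; treating an empty value as missing is also an assumption.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Returns the value of the first candidate attribute that is present and
// non-empty on the node, or std::nullopt if none of them is.
std::optional<std::string> resolve_attribute(
    const std::map<std::string, std::string>& node,
    const std::vector<std::string>& candidates) {
    for (const std::string& key : candidates) {
        const auto it = node.find(key);
        if (it != node.end() && !it->second.empty()) return it->second;
    }
    return std::nullopt;
}
```

Returning an optional leaves the choice of a final default (or an error) to the individual writer.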

Write_LAMMPS_Data_File (src/io/write_lammps_data_file.hpp) writes LAMMPS .data files using the full atom style. It includes header, masses, atoms, and bonds sections. Atom types are auto-mapped to sequential integer IDs. Bond types are determined from atom type pairings, with order-independent keys (e.g., O-H and H-O receive the same type ID).
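
One way to obtain order-independent bond keys is to sort each pair before looking it up. BondTypeRegistry is a hypothetical class for illustration, and the 1-based numbering is an assumption chosen to resemble LAMMPS type IDs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

class BondTypeRegistry {
public:
    // Returns the ID for the pair, assigning the next sequential ID the
    // first time a pair is seen.
    int type_id(std::string a, std::string b) {
        if (b < a) std::swap(a, b);  // canonical order: O-H and H-O coincide
        auto key = std::make_pair(std::move(a), std::move(b));
        const int next_id = static_cast<int>(ids_.size()) + 1;
        return ids_.try_emplace(std::move(key), next_id).first->second;
    }

private:
    std::map<std::pair<std::string, std::string>, int> ids_;
};
```

Sorting the pair once at lookup time means no second map entry is ever created for the reversed order.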

Write_PDB_File (src/io/write_pdb_file.hpp) writes PDB files with CRYST1, HETATM, and CONECT records.

Write_MOL2_File (src/io/write_mol2_file.hpp) writes Tripos MOL2 files with @<TRIPOS>MOLECULE, ATOM, BOND, and FF_PBC records.

Graph Readers#

Read_Graph (src/io/read_graph.hpp) reads previously written .nds and .edg files back into a graph. It dispatches to Read_nds or Read_edg based on the file type, and acts as both a node and edge observer/subject to forward the data to graph-building commands.

The filename string used for reading must match the one used when writing. Read_nds and Read_edg use Utils::IO::create_timestamped_filename() to reconstruct the per-timestep filename from the base string, so the same pattern (including any * wildcards) must be provided. See ADD.NODES.FROM_FILE and ADD.EDGES.FROM_FILE for usage details.

Read_nds (src/io/read_nds.hpp) parses a .nds file, determines the maximum node index (to handle sparse indices), and calls notify_resize() followed by Notify_Node_Attributes().

Read_edg (src/io/read_edg.hpp) parses a .edg file, validates that all referenced node indices exist in the graph, creates edges via notify_create_edges(), and attaches edge attributes via notify_edge_attributes().

Utilities#

Tokenizers#

ChemNetworks provides two tokenizer classes (src/io/tokenizer.hpp) for parsing structured text: LineTokenizer for in-memory strings and FileTokenizer for reading directly from files.

LineTokenizer#

LineTokenizer splits a string into tokens by a delimiter (default: space). Consecutive separators are merged by default, so "a  b" (two spaces between the letters) produces two tokens ("a" and "b") rather than three ("a", "", and "b").

Tokens are accessed sequentially through four methods:

parse_next_into(T& value) parses the next token into a typed variable using std::istringstream. If the conversion fails, an error is raised with the attempted type name.
next() returns the next token as a string.
peek() returns the next token without consuming it.
count() returns the number of remaining unconsumed tokens.
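
The behavior of these four methods can be sketched with a small stand-in class. SimpleLineTokenizer is hypothetical: it mirrors the method names above but reports errors by throwing standard exceptions, which is an assumption, and it always merges consecutive delimiters.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

class SimpleLineTokenizer {
public:
    explicit SimpleLineTokenizer(const std::string& line, char delim = ' ') {
        std::string token;
        for (char c : line) {
            if (c == delim) {
                // Empty tokens are dropped, which merges runs of delimiters.
                if (!token.empty()) tokens_.push_back(token);
                token.clear();
            } else {
                token += c;
            }
        }
        if (!token.empty()) tokens_.push_back(token);
    }

    // Number of tokens not yet consumed.
    std::size_t count() const { return tokens_.size() - pos_; }

    // Next token, without consuming it.
    const std::string& peek() const {
        if (pos_ >= tokens_.size()) throw std::out_of_range("no tokens left");
        return tokens_[pos_];
    }

    // Next token, consumed.
    std::string next() {
        std::string token = peek();
        ++pos_;
        return token;
    }

    // Consumes the next token and converts it with std::istringstream.
    template <typename T>
    void parse_next_into(T& value) {
        const std::string token = next();
        std::istringstream stream(token);
        if (!(stream >> value)) {
            throw std::invalid_argument("cannot parse token: " + token);
        }
    }

private:
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};
```

For the input "a  b 42" (two spaces after a), this sketch yields three tokens, and parse_next_into on the last one fills an int with 42.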

FileTokenizer#

FileTokenizer extends LineTokenizer to read from files line by line. It wraps a std::ifstream and provides:

read_next_line() reads the next line from the file, creating a LineTokenizer for that line.
skip_next_line() advances past the next line without tokenizing.
has_next_line() checks whether more lines are available.
skip_bytes(offset) seeks to a byte position in the file. Used for resuming from a saved cursor position.
tell() returns the current byte position.

FileTokenizer also supports fixed-width column parsing via parse_cols_into(width, value), which extracts a substring of width characters from the current line and parses it into a typed variable. This is used by the GRO reader, which uses a fixed-column format. Token-based and column-based parsing cannot be mixed on the same line.
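
Fixed-width parsing can be sketched as a cursor that advances by the requested width on each call. ColumnCursor is a hypothetical class; only the parse_cols_into(width, value) call shape is taken from the description above, and the exception-based error handling is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

class ColumnCursor {
public:
    explicit ColumnCursor(std::string line) : line_(std::move(line)) {}

    // Extracts the next `width` characters and parses them into `value`.
    // Padding spaces inside the field are skipped by operator>>.
    template <typename T>
    void parse_cols_into(std::size_t width, T& value) {
        if (pos_ >= line_.size()) throw std::out_of_range("line exhausted");
        std::istringstream field(line_.substr(pos_, width));
        pos_ += width;  // advance even if parsing fails, keeping columns aligned
        if (!(field >> value)) throw std::invalid_argument("cannot parse column");
    }

private:
    std::string line_;
    std::size_t pos_ = 0;
};
```

For a GRO atom line the widths would typically be 5, 5, 5, 5 for the residue number, residue name, atom name, and atom number, followed by 8 for each coordinate, per the GROMACS file format description.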

Log#

Log (src/io/log.hpp) provides five static logging methods:

root(fmt, args...) prints only on MPI rank 0, always shown.
log(fmt, args...) queues output on all ranks, always shown.
info(fmt, args...) queues output, hidden with -q.
debug(fmt, args...) queues output, shown only with -v.
error(fmt, args...) prints to stderr and exits (or throws in debug builds).

Output from log, info, and debug is buffered in a log_queue stringstream and flushed at the end of each timestep via write_log_queue(). All methods use std::format for string formatting.
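
The queue-then-flush behavior can be sketched with a small non-static class. QueuedLog is hypothetical; it accepts already-formatted strings instead of using std::format, and models the -q and -v flags as two constructor booleans.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class QueuedLog {
public:
    QueuedLog(bool quiet, bool verbose) : quiet_(quiet), verbose_(verbose) {}

    void log(const std::string& message) { queue_ << message << '\n'; }  // always queued
    void info(const std::string& message) {
        if (!quiet_) queue_ << message << '\n';   // hidden when quiet
    }
    void debug(const std::string& message) {
        if (verbose_) queue_ << message << '\n';  // shown only when verbose
    }

    // Writes everything queued so far to `out`, then empties the queue.
    void flush_to(std::ostream& out) {
        out << queue_.str();
        queue_.str("");
        queue_.clear();
    }

private:
    bool quiet_;
    bool verbose_;
    std::ostringstream queue_;
};
```

Buffering per timestep keeps the messages of one timestep together in the output instead of interleaving them with other writes.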

Delimiters#

Delimiters (src/io/delimiters.hpp) defines the separator characters used for output files (file, default tab) and key-value map parsing (map, default colon). These can be configured through the SETTINGS.DELIM sub-commands.

create_timestamped_filename#

create_timestamped_filename (src/utils/io.hpp) generates timestamped output filenames. If the base filename contains *, each * is replaced with the timestep number. Otherwise, the timestep is inserted between the base name and the file extension (e.g., edges.edg becomes edges.0.edg). See the file naming convention for details.
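
The two naming rules can be sketched as follows. The function timestamped_name is a hypothetical stand-in: it assumes the base filename carries its own extension, does not create directories, and appends the timestep at the end when no extension is present (that last case is a guess, not documented behavior).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string timestamped_name(const std::string& base, int timestep) {
    const std::string step = std::to_string(timestep);

    // Rule 1: every '*' in the base name becomes the timestep number.
    if (base.find('*') != std::string::npos) {
        std::string result;
        for (char c : base) {
            if (c == '*') result += step;
            else result += c;
        }
        return result;
    }

    // Rule 2: insert ".<timestep>" before the extension. A dot only counts
    // as an extension separator if it follows the last path separator.
    const std::size_t slash = base.find_last_of('/');
    const std::size_t dot = base.find_last_of('.');
    const bool has_extension =
        dot != std::string::npos && (slash == std::string::npos || dot > slash);
    if (!has_extension) return base + "." + step;  // assumption, see lead-in
    return base.substr(0, dot) + "." + step + base.substr(dot);
}
```

Under these assumptions, timestamped_name("./edg_files/edges.edg", 0) gives "./edg_files/edges.0.edg" and timestamped_name("frame_*.nds", 12) gives "frame_12.nds".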

open_output_file#

open_output_file (src/utils/io.hpp) opens a truncated output stream at a given path, creating any missing parent directories first and calling Log::error if the file cannot be opened.

Other Utilities#

validate_string (src/utils/io.hpp): Iterates over every character in a string and calls Log::error if any character fails std::isgraph. Accepts an optional error_msg format string (default: "Found illegal character >>{}<< in string >>{}<<."); the first {} is replaced with the offending character and the second with the full string.
validate_true_false (src/utils/io.hpp): Parses a boolean value from a string. Accepts true, false, 1, or 0 (case-insensitive). Calls Log::error for unrecognized non-empty values; returns false for empty input.
parse_params (src/io/parse_params.hpp): Parses strings of the form key:value key:value default into a map, together with an optional default value. Used for attribute mapping in input file components.
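
A minimal sketch of this kind of key-value parsing is given below. ParsedParams and parse_params_sketch are hypothetical names, and two points are assumptions: pairs are separated by whitespace, and a bare token without the map delimiter is taken as the default.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <sstream>
#include <string>

struct ParsedParams {
    std::map<std::string, std::string> values;
    std::optional<std::string> fallback;  // set only if a bare token was seen
};

ParsedParams parse_params_sketch(const std::string& text, char map_delim = ':') {
    ParsedParams result;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token) {  // whitespace-separated tokens
        const std::size_t split = token.find(map_delim);
        if (split == std::string::npos) {
            result.fallback = token;  // no delimiter: treat as the default
        } else {
            result.values[token.substr(0, split)] = token.substr(split + 1);
        }
    }
    return result;
}
```

The default colon matches the map delimiter described under Delimiters; taking it as a parameter lets the sketch follow a reconfigured delimiter.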