Analysis Commands

Attribute Histogram

Usage

ANALYSIS = attr_hist < node | edge > $attr_name $output_path [$bins]

Computes a histogram of a node or edge attribute and writes the result to a .hst file at $output_path (see File Naming Conventions).

When $bins is provided and $attr_name is a numeric attribute, values are grouped into bins. When $bins is omitted or $attr_name is a string attribute, unique values/categories are counted instead.

$bins may be specified as:

A positive integer N: divides the data range [min, max] into N equal-width bins.
A colon-separated list of bin edges e0:e1:...:eN: uses the provided boundaries directly.

All bins are half-open [lo, hi) except the last, which is closed [lo, hi]. Attribute values outside the specified range are ignored.

Categorical output is sorted by attribute value, ascending.

Output Format

TIMESTEP    $n
$attr_name  count
[lo hi) $count
...
[lo hi] $count

For categorical mode:

TIMESTEP    $n
$attr_name  count
$value  $count
...

Examples:

ANALYSIS = attr_hist node degree ./hist/ 10 computes 10 equal-width bins over the full range of node degree values.
ANALYSIS = attr_hist node degree ./hist/ 0:5:10:20 bins degrees into [0 5), [5 10), [10 20].
ANALYSIS = attr_hist node mol_type ./hist/ counts occurrences of each unique mol_type string value.
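The binning rules above can be sketched in a few lines of Python. This is an illustrative sketch, not ChemNetworks' implementation. The function names `bin_edges`, `histogram` and `categorical` are hypothetical. Equal-width edges are taken from the observed minimum and maximum, every bin is half-open except the closed last one, and values outside the edges are skipped.

```python
from collections import Counter

def bin_edges(values, bins):
    """Edges for an integer bin count N, or for an 'e0:e1:...:eN' string."""
    if isinstance(bins, int):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins
        # Use max(values) itself as the last edge so it is not lost to rounding.
        return [lo + k * width for k in range(bins)] + [hi]
    return [float(e) for e in bins.split(":")]

def histogram(values, bins):
    """Return (lo, hi, count) per bin: [lo, hi) except the last, [lo, hi]."""
    edges = bin_edges(values, bins)
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for x in values:
        if x < edges[0] or x > edges[-1]:
            continue  # outside the specified range: ignored
        for k in range(len(counts)):
            lo, hi = edges[k], edges[k + 1]
            if lo <= x < hi or (k == last and x == hi):
                counts[k] += 1
                break
    return list(zip(edges[:-1], edges[1:], counts))

def categorical(values):
    """Categorical mode: count each unique value, sorted ascending by value."""
    return sorted(Counter(values).items())
```

For example, `histogram(degrees, "0:5:10:20")` returns the three bins [0 5), [5 10) and [10 20] with their counts, in the same order as the .hst rows.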

Betweenness

Usage

ANALYSIS = betweenness < node | edge >

Calculates the betweenness centrality of each node or edge in the graph. The centrality is stored as a node or edge property named betweenness.
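For reference, the betweenness of a node v is the sum, over all pairs of other nodes s and t, of the fraction of shortest s-t paths that pass through v. Below is a minimal sketch of Brandes' algorithm. It assumes an unweighted, undirected graph with no parallel edges and applies no normalization. Whether ChemNetworks weights edges or normalizes the result is not stated here, so treat the values as illustrative. The function name `betweenness` is hypothetical.

```python
from collections import deque

def betweenness(n, edges):
    """Brandes' algorithm; nodes are 0..n-1, edges are undirected (u, v) pairs."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    score = [0.0] * n
    for s in range(n):
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], [[] for _ in range(n)]
        sigma, dist = [0] * n, [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph each pair (s, t) was counted from both ends.
    return [x / 2 for x in score]
```

On a three-node path 0-1-2 the middle node lies on the only shortest path between the two ends, so it scores 1 and the ends score 0.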

Connected Components

Usage

ANALYSIS = connected_components [< weak | strong >]

Default = weak

Identifies all connected components of the graph. Every node is given the connected_components node property, and nodes with equal values of this property belong to the same component.

weak (default) : Treats the graph as undirected when finding components.

strong : Respects edge direction. Two nodes are in the same component only if each is reachable from the other along directed paths.
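The two modes can be contrasted with a small sketch that puts two nodes in the same component when each can reach the other. When direction is ignored (weak), this reduces to ordinary reachability. The function names are hypothetical. The quadratic all-pairs approach is chosen for clarity rather than speed, since linear-time algorithms such as Tarjan's are the usual choice. The label numbering is not necessarily the one ChemNetworks writes.

```python
from collections import deque

def reachable(adj, start):
    """Set of nodes reachable from start by following adjacency lists."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def components(n, edges, mode="weak"):
    """Label nodes 0..n-1 by component; edges are directed (u, v) pairs."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        if mode == "weak":
            adj[v].add(u)  # weak: treat every edge as undirected
    reach = [reachable(adj, i) for i in range(n)]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            # Nodes mutually reachable with i share its component.
            for j in reach[i]:
                if i in reach[j]:
                    labels[j] = next_label
            next_label += 1
    return labels
```

With edges 0→1, 1→0 and 1→2 plus an isolated node 3, weak mode gives two components ({0, 1, 2} and {3}). Strong mode gives three ({0, 1}, {2} and {3}) because node 2 cannot reach back to node 1.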

Degree

Usage

ANALYSIS = degree [< all | in | out >]

Default = all

Calculates the degree for every node in the graph. The degree for each node is stored in that node's degree property.

all (default) : Total degree (in + out for directed graphs, or total connections for undirected graphs).

in : In-degree (incoming edges). Only meaningful for directed graphs.

out : Out-degree (outgoing edges). Only meaningful for directed graphs.

Delete Edges

Usage

ANALYSIS = delete_edges $property $operator $value

For every edge in the graph, the edge's $property is compared against $value using $operator. If the comparison holds, the edge is deleted from the graph. The operator can be any of: >=, <=, ==, !=, >, or <.

For boolean attributes, only == may be used as the operator, and $value must be true, false, 1, or 0 (case-insensitive).

For numeric (floating-point) attributes, exact == and != comparisons are unreliable because computed values often differ from the expected value by small rounding errors. Prefer inequality operators with a small tolerance around the target value.

Ex. ANALYSIS = delete_edges betweenness < 3 would delete every edge whose betweenness is strictly less than 3.
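The filter semantics and the floating-point caveat can be illustrated with a short sketch. Edges are represented here as plain dictionaries, which is a hypothetical stand-in for ChemNetworks' graph storage, and `delete_edges` is an illustrative helper name. The same logic applies to delete_nodes.

```python
import operator

# The six comparison operators accepted by delete_edges / delete_nodes.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}

def delete_edges(edges, prop, op, value):
    """Return only the edges that do NOT satisfy: edge[prop] <op> value."""
    compare = OPS[op]
    return [e for e in edges if not compare(e[prop], value)]

edges = [{"id": 0, "betweenness": 2.5},
         {"id": 1, "betweenness": 3.0},
         {"id": 2, "betweenness": 7.0}]
kept = delete_edges(edges, "betweenness", "<", 3)
print([e["id"] for e in kept])   # [1, 2]: 3.0 is not strictly less than 3

# Why exact equality is fragile for computed floating-point values:
x = 0.1 + 0.2
print(x == 0.3)                  # False
print(abs(x - 0.3) < 1e-9)       # True
```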

Delete Nodes

Usage

ANALYSIS = delete_nodes $property $operator $value

For every node in the graph, the node's $property is compared against $value using $operator. If the comparison holds, the node is deleted from the graph. The operator can be any of: >=, <=, ==, !=, >, or <.

For boolean attributes, only == may be used as the operator, and $value must be true, false, 1, or 0 (case-insensitive).

For numeric (floating-point) attributes, exact == and != comparisons are unreliable because computed values often differ from the expected value by small rounding errors. Prefer inequality operators with a small tolerance around the target value.

Ex. ANALYSIS = delete_nodes betweenness < 3 would delete every node whose betweenness is strictly less than 3.

Label Node Blocks

Usage

ANALYSIS = label_node_blocks $attribute_name $count:$size [$count:$size ...]

Assigns an incrementing integer attribute to contiguous blocks of nodes (atoms) based on known block (molecule) sizes and counts. Each $count:$size pair specifies $count molecules of $size atoms each. Atoms are assigned sequentially, with block IDs starting at 0 and incrementing across all pairs. An error is thrown if the sum of count * size over all pairs does not equal the number of nodes in the graph. The block ID is stored as the $attribute_name node property.

Ex. ANALYSIS = label_node_blocks mol_id 100:3 10:2 5:12 would assign mol_id 0 through 99 to the first 100 blocks of 3 atoms, mol_id 100 through 109 to the next 10 blocks of 2 atoms, and mol_id 110 through 114 to the next 5 blocks of 12 atoms (380 nodes in total).
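The block assignment can be sketched as a simple loop over the count:size pairs. This is an illustration rather than ChemNetworks' code, and `label_node_blocks` here is a hypothetical helper that returns one block ID per node in node-index order.

```python
def label_node_blocks(n_nodes, pairs):
    """pairs: list of 'count:size' strings. Returns one block ID per node."""
    labels = []
    block_id = 0
    for pair in pairs:
        count, size = (int(p) for p in pair.split(":"))
        for _ in range(count):
            labels.extend([block_id] * size)  # `size` consecutive atoms share an ID
            block_id += 1
    if len(labels) != n_nodes:
        raise ValueError(f"blocks cover {len(labels)} nodes, graph has {n_nodes}")
    return labels

mol_id = label_node_blocks(380, ["100:3", "10:2", "5:12"])
```

Here `mol_id[299]` is 99 (last atom of the 3-atom blocks), `mol_id[300]` is 100, and `mol_id[379]` is 114.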

Label Nodes

Usage

ANALYSIS = label_nodes $match_attr $output_attr $key_spec:$label [$key_spec:$label ...] [$default_label]

Assigns a string label to each node based on the value of a numeric node attribute. $match_attr is read from each node and compared against the provided rules in order; the first matching rule's label is written to $output_attr. If no rule matches, $default_label is assigned (an empty string if not specified).

$match_attr must be a numeric node attribute.

Each $key_spec is a selector that can be:

A single integer: 5
A comma-separated list: 0,1,2
An inclusive range: 0-10
A combination of the above: 0-10,15-20

Key specs are integers. The matched attribute value is also treated as an integer: a floating-point value is truncated toward zero (for example, 2.9 becomes 2 and -2.9 becomes -2) before it is compared against the key specs.

Examples:

ANALYSIS = label_nodes mol_id binding_site 0-100,150-200:1 101-149:2 201-300:high 0 labels molecules by binding-site membership. Nodes matching no rule (non-binding-site nodes) are given the default label 0.
ANALYSIS = label_nodes degree label 0:isolated 1-3:low 4-10:high assigns labels based on node degree. Nodes with degree outside 0-10 receive an empty string.
ANALYSIS = label_nodes mol_id group 0,2,4:even 1,3,5:odd other assigns even or odd based on molecule ID. Unmatched nodes receive the label other.
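The rule matching can be sketched as follows. The helper names are hypothetical, and the parser assumes non-negative integers in key specs (a leading minus sign would be mistaken for a range separator). Rules are tried in order, the first match wins, and Python's `int()` provides the truncation toward zero described above.

```python
def parse_key_spec(spec):
    """Parse '0-10,15-20' into inclusive (lo, hi) ranges.
    Assumes non-negative integers; a leading '-' is not handled."""
    ranges = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ranges.append((int(lo), int(hi)))
        else:
            ranges.append((int(part), int(part)))
    return ranges

def label_nodes(values, rules, default=""):
    """rules: list of 'key_spec:label' strings, tried in order."""
    parsed = []
    for rule in rules:
        spec, label = rule.split(":", 1)
        parsed.append((parse_key_spec(spec), label))
    labels = []
    for value in values:
        key = int(value)  # truncates toward zero: 3.7 -> 3, -2.9 -> -2
        for ranges, label in parsed:
            if any(lo <= key <= hi for lo, hi in ranges):
                labels.append(label)  # first matching rule wins
                break
        else:
            labels.append(default)    # no rule matched
    return labels
```

For instance, `label_nodes([0, 2, 3.7, 10, 11], ["0:isolated", "1-3:low", "4-10:high"])` mirrors the degree example: 3.7 is truncated to 3 and labeled low, and 11 receives the empty default.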

Layout

Usage

ANALYSIS = layout < random | sphere | grid | fr | kk > [$param:value ...]

Applies a graph layout algorithm to assign new coordinates to each node. The x, y, and z node attributes are overwritten with the computed layout positions, if present.

random : Assigns random 3D coordinates to each node.

sphere : Evenly distributes nodes on the surface of a sphere.

grid : Places nodes on a 3D grid. Optional parameters:

width:N (nodes along first dimension)
height:N (nodes along second dimension)

fr : Fruchterman-Reingold force-directed layout. Simulates nodes as repelling particles connected by springs. Optional parameters:

niter:N (iterations, default 500)
start_temp:T (initial temperature, default 10)

kk : Kamada-Kawai spring-based layout. Optimizes node positions based on graph-theoretic distances. Optional parameters:

maxiter:N (default 10 * node count)
epsilon:E (convergence threshold, default 0)
kkconst:K (spring constant, default node count)

Ex. ANALYSIS = layout fr niter:1000 start_temp:20 would run Fruchterman-Reingold with 1000 iterations and starting temperature of 20.

Lifetime

Usage

ANALYSIS = lifetime $output_path [$id_attribute]

Tracks how long each edge (contact) persists across the trajectory, writing one row per lifetime event to $output_path. Each frame, the current edges are compared against the contacts already being tracked. A contact still present continues its run, a contact that has disappeared is written out as an event, and a newly seen contact starts a new run. A single pair of atoms can therefore produce several events if its contact repeatedly breaks and reforms. The output spans the whole trajectory, so $output_path is the complete filename and is not timestamped.

Endpoints are identified by their node index by default. Pass $id_attribute to identify them by a different node attribute instead. Nodes are assumed to have the same id across all frames. On an undirected graph, (i, j) and (j, i) are the same contact. On a directed graph they are tracked separately (i is the source, j the target). A contact that vanishes for a single frame and returns is recorded as two events.

Lifetimes are reported as frame counts rather than times.

Pair this analysis with lifetime_distribution to convert and fit the durations.

May only be run on a single MPI process, since lifetimes depend on consecutive frames.

Output Format

i   j   start_timestep  end_timestep    lifetime
$i  $j  $start  $end    $frames
...

One row per lifetime event. i and j are the endpoint identities (node index, or the $id_attribute value if given). On an undirected graph they are canonicalized so i <= j. On a directed graph i is the source and j the target. start_timestep and end_timestep are the trajectory timesteps at which the contact was first and last seen, and lifetime is the number of consecutive frames it persisted.

Examples:

ANALYSIS = lifetime output/lifetime.dat records every contact's lifetime, identifying endpoints by node index.
ANALYSIS = lifetime output/lifetime.dat mol_id identifies endpoints by the mol_id node attribute, tracking contacts at the molecule level.
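The frame-by-frame bookkeeping described above can be sketched as follows. `track_lifetimes` is a hypothetical helper, not ChemNetworks' code. It takes frames as (timestep, edge list) pairs and returns rows in the i, j, start_timestep, end_timestep, lifetime layout. How ChemNetworks handles contacts still present at the final frame is not stated here. This sketch assumes they are written out as events at the end.

```python
def track_lifetimes(frames, directed=False):
    """frames: (timestep, edges) pairs in trajectory order; edges holds (i, j)
    endpoint IDs. Returns sorted (i, j, start, end, n_frames) rows."""
    active = {}   # contact -> [start_timestep, last_timestep, n_frames]
    events = []
    for timestep, edges in frames:
        current = set()
        for i, j in edges:
            # Undirected contacts are canonicalized so that i <= j.
            current.add((i, j) if directed else (min(i, j), max(i, j)))
        for contact in list(active):
            if contact not in current:         # contact broke: emit an event
                start, last, n = active.pop(contact)
                events.append((*contact, start, last, n))
        for contact in current:
            if contact in active:              # still present: extend the run
                active[contact][1] = timestep
                active[contact][2] += 1
            else:                              # newly seen: start a new run
                active[contact] = [timestep, timestep, 1]
    # Assumption: runs still open at the last frame are flushed as events.
    for contact, (start, last, n) in active.items():
        events.append((*contact, start, last, n))
    return sorted(events)
```

In the test data below, the (1, 2) contact breaks at timestep 20 and reforms at 30, so it is recorded as two separate events.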

Lifetime Distribution

Usage

ANALYSIS = lifetime_distribution $lifetime_file $output_file $dt $bin_size_snapshots [$n_exponentials]

Post-processes a lifetime event file written by lifetime into a probability density of contact durations and fits a sum of $n_exponentials exponentials (default 2), A_1 * exp(-t / tau_1) + ... + A_N * exp(-t / tau_N), whose terms model populations of contacts with characteristic residence times tau_k (reported in ascending order, with non-negative amplitudes A_k).

$dt is the time per frame and converts each frame-count lifetime to a duration. $bin_size_snapshots is the histogram bin width in frames (a positive integer), and the corresponding width in time is $bin_size_snapshots * $dt. Durations are binned into a normalized probability density and fit by least squares. The decay times are determined by a refining grid search, and the amplitudes are then solved in closed form, which requires at least 2 * $n_exponentials non-empty bins.
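The quantities involved are written out below. Δt stands for $dt, b for $bin_size_snapshots, N for $n_exponentials, c_m for the number of events in bin m, and t_m for that bin's center. The density normalization and the R² expression are the standard definitions. They are assumed here rather than taken from the ChemNetworks source.

```latex
\begin{aligned}
p(t) &= \sum_{k=1}^{N} A_k \, e^{-t/\tau_k}, \qquad A_k \ge 0, \quad \tau_1 \le \tau_2 \le \dots \le \tau_N \\
t &= n_{\text{frames}} \, \Delta t, \qquad w = b \, \Delta t \\
\hat{p}_m &= \frac{c_m}{N_{\text{events}} \, w} \\
R^2 &= 1 - \frac{\sum_m \bigl(\hat{p}_m - p(t_m)\bigr)^2}{\sum_m \bigl(\hat{p}_m - \bar{p}\bigr)^2}
\end{aligned}
```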

Fits with four or more exponential components ($n_exponentials >= 4) are often ill-conditioned and prone to overfitting.

Output Format

metric  value
events  $n_events
dt  $dt
bin_size_snapshots  $n_frames
bin_size_time   $bin_time
n_exponentials  $N
A1  $A_1
tau1    $tau_1
...
AN  $A_N
tauN    $tau_N
R_squared   $r_squared

lifetime    probability_density fit
$t  $density    $fit
...

The first block lists the scalar fit results: events is the number of lifetime events processed, the Ak/tauk pairs are the fitted amplitudes and decay times (k = 1 .. N, ascending in tau), and R_squared is the goodness of fit. The second block is the binned distribution: each row is a bin-center lifetime, its empirical probability density, and the value of the fitted curve at that point.

Examples:

ANALYSIS = lifetime_distribution output/lifetime.dat output/lifetime_fit.dat 0.05 9 fits a biexponential, treating each frame as 0.05 time units and grouping the lifetimes into 9-frame-wide bins.
ANALYSIS = lifetime_distribution output/lifetime.dat output/lifetime_fit.dat 0.05 9 3 fits a triexponential instead.
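The fitting step can be sketched in pure Python. This is an illustrative stand-in, not ChemNetworks' routine, and it differs in three ways. It scans a single fixed grid of candidate decay times instead of a refining one. It solves for the amplitudes of each candidate set by ordinary linear least squares (normal equations). It discards candidates that would need a negative amplitude. `solve` and `fit_exponentials` are hypothetical names. `t` and `p` are the bin centers and empirical densities.

```python
import itertools
import math

def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination (partial pivoting)."""
    n = len(b)
    A = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-14:
            return None  # singular: these decay times cannot be separated
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def fit_exponentials(t, p, tau_grid, n_exp):
    """Grid search over tau sets; amplitudes by linear least squares for each set."""
    best = None
    for taus in itertools.combinations(sorted(tau_grid), n_exp):  # ascending taus
        basis = [[math.exp(-ti / tau) for ti in t] for tau in taus]
        M = [[sum(a * b for a, b in zip(u, v)) for v in basis] for u in basis]
        rhs = [sum(a * y for a, y in zip(u, p)) for u in basis]
        amps = solve(M, rhs)
        if amps is None or min(amps) < 0:
            continue  # skip fits that need a negative amplitude
        model = [sum(A * e[m] for A, e in zip(amps, basis)) for m in range(len(t))]
        sse = sum((y - f) ** 2 for y, f in zip(p, model))
        if best is None or sse < best[0]:
            best = (sse, list(taus), amps)
    if best is None:
        raise ValueError("no admissible fit on this tau grid")
    sse, taus, amps = best
    mean = sum(p) / len(p)
    r_squared = 1 - sse / sum((y - mean) ** 2 for y in p)
    return taus, amps, r_squared
```

Because the amplitudes enter the model linearly, fixing the decay times turns each candidate into a small linear problem. Only the decay times need a search, which matches the split between grid search and closed-form amplitudes described above.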

Merge Equal Nodes

Usage

ANALYSIS = merge_equal_nodes $property [$attr:method ...] [$default_comb (ignore)]

Merges all nodes that share the same value of $property into a single node. Edges between merged nodes are retained on the new merged node.

When nodes are merged, the combination of attributes may be specified using the standard Attribute Combination syntax. If no combination is specified, node attributes are dropped ($default_comb defaults to ignore).

Examples:

ANALYSIS = merge_equal_nodes mol_type merges all nodes with the same mol_type value into a single node. All attributes are dropped.
ANALYSIS = merge_equal_nodes mol_type x:mean y:mean z:mean first merges all nodes with the same mol_type value and sets the new node's coordinates to the mean of the merged nodes' coordinates (center of geometry). The remaining attributes are taken from the first node in the merge group by graph index (first).
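The grouping and attribute combination can be sketched on node attributes alone. Edges are not modeled. The helper name and the dictionary-per-node representation are hypothetical. Only the combination methods that appear in this document (mean, sum, min, first, ignore) are included.

```python
from collections import defaultdict

COMBINE = {
    "mean": lambda xs: sum(xs) / len(xs),
    "sum": sum,
    "min": min,
    "first": lambda xs: xs[0],
}

def merge_equal_nodes(nodes, prop, methods, default="ignore"):
    """nodes: attribute dicts in graph-index order. Returns one dict per group."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node[prop]].append(node)  # groups keep first-seen order
    merged = []
    for members in groups.values():
        new = {}
        for attr in members[0]:
            method = methods.get(attr, default)
            if method != "ignore":       # "ignore" drops the attribute
                new[attr] = COMBINE[method]([m[attr] for m in members])
        merged.append(new)
    return merged
```

Calling it with `{"x": "mean", "y": "mean", "z": "mean"}` and default `"first"` reproduces the center-of-geometry example above. Calling it with no methods and the default `"ignore"` drops every attribute.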

Modularity Optimization

Usage

ANALYSIS = modularity_optimization [< fast | full >]

Default = fast

Performs modularity optimization on the graph, assigning each node a community ID stored in the modularity_optimization node property. Nodes with equal values belong to the same community.

fast (default) : Uses a fast-greedy algorithm to find a high-modularity partition. Being a greedy heuristic, it is not guaranteed to reach the optimal modularity. Requires an undirected graph.

full : Uses brute force to find the partition with the optimal modularity. Supports both directed and undirected graphs. Exceedingly slow for graphs with more than 50 nodes, and ChemNetworks prints a warning if your graph has more than 50 nodes. Use fast unless you know your system needs the brute-force algorithm.
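Both modes maximize a modularity score. Assuming the standard Newman definition for an undirected, unweighted graph, Q = Σ_c [L_c/m - (d_c/2m)²]. Here m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The hypothetical helper below only evaluates Q for a given partition. It does not search for the best one, and it does not cover the directed variant.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity of an undirected, unweighted graph.
    edges: (u, v) pairs; community: node -> community ID."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 2 * (3/7 - 1/4) = 5/14, while putting every node in one community gives Q = 0.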

Rename Attribute

Usage

ANALYSIS = rename_attr < node | edge > $old_name $new_name

Renames a node or edge attribute from $old_name to $new_name. The attribute values are preserved exactly. An error is thrown if $old_name does not exist or if $new_name already exists. The old attribute is removed, so $old_name can be reused afterwards.

Ex. ANALYSIS = rename_attr node atom_type element renames the atom_type node attribute to element.

Simplify

Usage

ANALYSIS = simplify [< both | multiple | loops >] [$attr_spec]

Default = both (attribute combination: first)

Removes multiple edges and/or loop edges (self-edges) from the graph. This operation modifies the graph structure in place by removing edges that match the specified criteria.

both (default) : Removes both multiple edges and loop edges from the graph.

multiple : Removes only multiple edges (duplicate edges between the same pair of nodes), keeping loop edges.

loops : Removes only loop edges (edges where the source and target node are the same), keeping multiple edges.

When edges are combined (merged), edge attribute combination may be specified using the standard Attribute Combination syntax.

Examples:

ANALYSIS = simplify both : Remove all loops and multiple edges, keeping the first edge's attributes.
ANALYSIS = simplify multiple dist:min : Remove multiple edges only. The merged edge takes the minimum dist among the combined edges, and its other attributes come from the first edge.
ANALYSIS = simplify both dist:min weight:sum first : Remove both types, use min for dist, sum for weight, first for every other attribute.
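The three modes and the attribute combination can be sketched for an undirected graph whose edges are (u, v, attributes) triples. This is a hypothetical representation, and `simplify` here is an illustrative helper, not ChemNetworks' implementation. Only combination methods named in this document are included.

```python
from collections import defaultdict

COMBINE = {
    "min": min,
    "sum": sum,
    "mean": lambda xs: sum(xs) / len(xs),
    "first": lambda xs: xs[0],
}

def simplify(edges, mode="both", methods=None, default="first"):
    """edges: list of (u, v, attrs) for an undirected graph."""
    methods = methods or {}
    groups = defaultdict(list)  # node pair -> attribute dicts of its edges
    kept = []
    for u, v, attrs in edges:
        if u == v and mode in ("both", "loops"):
            continue                             # drop self-loop
        if mode in ("both", "multiple"):
            groups[(min(u, v), max(u, v))].append(attrs)
        else:
            kept.append((u, v, attrs))           # "loops": multi-edges untouched
    for (u, v), members in groups.items():
        combined = {}
        for attr in members[0]:
            method = methods.get(attr, default)
            if method != "ignore":
                combined[attr] = COMBINE[method]([m[attr] for m in members])
        kept.append((u, v, combined))
    return kept
```

With `methods={"dist": "min", "weight": "sum"}`, this mirrors the last example above. Two parallel 0-1 edges with dist 2.0 and 1.5 and weight 1.0 and 2.0 collapse to one edge with dist 1.5 and weight 3.0, and every other attribute is taken from the first edge.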

Steinhardt

Usage

ANALYSIS = steinhardt $l $n $attr $v [$average]

Calculates the Steinhardt bond-order parameter q_l for each node whose $attr attribute equals $v, as defined in Steinhardt et al., PRB 28, 784 (1983). The result is stored as the steinhardt_$l node attribute. Nodes that do not match the selection condition $attr == $v are assigned -1. The neighborhood is defined by graph distance: for example, an $n value of 2 includes all nodes within 2 edges of the matching node.

Requires x, y, and z node attributes. The minimum image convention is used for position calculations if PBC.DIM has been set.

The optional $average parameter accepts true or false (default: false). When set to true, the averaged bond-order parameters are computed as defined in Lechner and Dellago, J. Chem. Phys. 129, 114707 (2008).
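The standard definitions from the two cited papers are written out below. 𝒩(i) is the neighborhood of node i (here, the nodes within $n edges), N_b(i) = |𝒩(i)|, and r̂_ij is the unit vector from i to j (minimum-imaged under PBC). In the averaged form, 𝒩̃(i) = 𝒩(i) ∪ {i}, following Lechner and Dellago's average over a particle and its neighbors. That ChemNetworks includes the node itself in the same way is an assumption.

```latex
\begin{aligned}
q_{lm}(i) &= \frac{1}{N_b(i)} \sum_{j \in \mathcal{N}(i)} Y_{lm}(\hat{\mathbf{r}}_{ij}),
&\qquad q_l(i) &= \sqrt{\frac{4\pi}{2l+1} \sum_{m=-l}^{l} \lvert q_{lm}(i) \rvert^2} \\
\bar{q}_{lm}(i) &= \frac{1}{\lvert \tilde{\mathcal{N}}(i) \rvert} \sum_{k \in \tilde{\mathcal{N}}(i)} q_{lm}(k),
&\qquad \bar{q}_l(i) &= \sqrt{\frac{4\pi}{2l+1} \sum_{m=-l}^{l} \lvert \bar{q}_{lm}(i) \rvert^2}
\end{aligned}
```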

Examples:

ANALYSIS = steinhardt 4 1 atom_type OW calculates q_4 for all nodes with atom_type = OW using their direct graph neighbors.
ANALYSIS = steinhardt 6 1 mol_type SOL true calculates the averaged q_6 for all nodes with mol_type = SOL.
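q_l can be evaluated without spherical harmonics by applying the spherical-harmonic addition theorem, which turns the sum over m into Legendre polynomials of bond angles: q_l(i)² = (1/N_b²) Σ_j Σ_k P_l(r̂_ij · r̂_ik). The sketch below applies that identity to a list of neighbor positions and uses the minimum image convention for an orthorhombic box. The function names are hypothetical. Selecting neighbors by graph distance is left to the caller, and the averaged variant is not shown.

```python
import math

def legendre(l, x):
    """Legendre polynomial P_l(x) via the three-term recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

def min_image(d, box):
    """Minimum image convention for an orthorhombic box (one length per axis)."""
    return [di - L * round(di / L) for di, L in zip(d, box)]

def steinhardt_q(l, center, neighbors, box=None):
    """q_l of one particle from its neighbors' positions (needs >= 1 neighbor)."""
    units = []
    for pos in neighbors:
        d = [b - a for a, b in zip(center, pos)]
        if box is not None:
            d = min_image(d, box)
        r = math.sqrt(sum(c * c for c in d))
        units.append([c / r for c in d])
    total = 0.0
    for u in units:
        for v in units:
            cos_g = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
            total += legendre(l, cos_g)  # addition theorem replaces the m-sum
    return math.sqrt(max(total, 0.0)) / len(units)
```

For a perfect simple-cubic environment (six neighbors along ±x, ±y and ±z), this gives q_4 = sqrt(7/12) ≈ 0.764 and q_6 = sqrt(1/8) ≈ 0.354. The result is the same when the neighbors are wrapped across a periodic boundary.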

Subgraph Census

Usage

ANALYSIS = subgraph_census $output_path $attr_name $type:$count [$type:$count ...]

Enumerates all induced connected subgraphs of the current graph whose node composition matches the specified attribute constraints. A subgraph matches if it contains exactly the requested count of nodes for each specified attribute value. Each attribute value may be listed only once, and each count must be positive.

Each matching subgraph is found and counted exactly once. Parallel edges are treated as a single connection, and self-loops are ignored. On directed graphs, subgraphs only need to be connected when ignoring edge direction, but the edge directions distinguish otherwise identical structures.

Results are written to an .sgc file at $output_path (see File Naming Conventions). Structures are sorted in descending order of occurrence count, then by their canonical key.

Output Format

TIMESTEP    $n
# $attr_name $type:$count ...
# TOTAL $n_instances_total  $n_structures

# 0
$canonical_key
$n_instances
$i  $j
...

# 1
...

TOTAL gives the total number of subgraph instances found across all classes, followed by the number of distinct structures. Each numbered block (# 0, # 1, ...) describes one distinct structure: its canonical key, its instance count, and its edge list. $canonical_key is a string encoding of the canonical edge list in the form i,j;i,j;..., uniquely identifying the isomorphism class. Node indices in the edge list are zero-indexed and assigned in type-group order, following the order of the $type:$count arguments. For example, if the attribute constraint is A:2 B:3, nodes 0-1 correspond to type A and nodes 2-4 correspond to type B.

This analysis exhaustively enumerates connected induced subgraphs. For dense graphs and larger subgraph sizes, analysis times will be significant.

Ex. ANALYSIS = subgraph_census ./subgraphs/ mol_type DGA:2 SOL:4 finds all connected induced subgraphs of size 6 where exactly 2 nodes have mol_type = DGA and 4 have mol_type = SOL. Output is written to ./subgraphs/0.sgc, ./subgraphs/1.sgc, etc.
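A brute-force sketch of the census on an undirected graph follows, handling self-loops and parallel edges as described above. It is not ChemNetworks' enumeration algorithm. It tries every combination of nodes with the requested composition, keeps the connected induced ones, and groups them by a canonical key. Taking that key as the lexicographically smallest sorted edge list over all relabelings that preserve type-group order is an assumption, so the keys may not match the strings ChemNetworks writes. All function names are hypothetical.

```python
import itertools
from collections import Counter

def is_connected(nodes, edges):
    """True if the (non-empty) node list is connected by the given edges."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == set(nodes)

def canonical_key(groups, edges):
    """Smallest sorted edge list over relabelings that keep type-group order."""
    best = None
    for perms in itertools.product(*(itertools.permutations(g) for g in groups)):
        index = {node: i for i, node in enumerate(n for p in perms for n in p)}
        key = sorted(tuple(sorted((index[u], index[v]))) for u, v in edges)
        if best is None or key < best:
            best = key
    return ";".join(f"{i},{j}" for i, j in best)

def census(node_type, edges, spec):
    """node_type: node -> type; spec: dict type -> count, in argument order."""
    simple = {tuple(sorted(e)) for e in edges if e[0] != e[1]}  # no loops/parallels
    pools = [[n for n in sorted(node_type) if node_type[n] == t] for t in spec]
    counts = Counter()
    for choice in itertools.product(*(itertools.combinations(pool, c)
                                      for pool, c in zip(pools, spec.values()))):
        nodes = [n for group in choice for n in group]
        induced = [e for e in simple if e[0] in nodes and e[1] in nodes]
        if is_connected(nodes, induced):
            counts[canonical_key(choice, induced)] += 1
    # Descending occurrence count, then canonical key.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

The combinatorial loop over every candidate node set is why exhaustive enumeration becomes expensive for dense graphs and larger subgraph sizes.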

Subgraph Enumerate

Usage

ANALYSIS = subgraph_enumerate $output_path $attr_name $type:$count [$type:$count ...]

Enumerates all induced connected subgraphs of the current graph whose node composition matches the specified attribute constraints, writing each instance individually to a file. Uses the same enumeration algorithm as subgraph_census, but writes every subgraph it finds as its own edge list, using the original graph node IDs rather than the isomorphism-class node IDs.

Results are written to an .sge file at $output_path (see File Naming Conventions).

Output Format

TIMESTEP    $n
# $attr_name $type:$count ...
# TOTAL $n_instances

# 0
$canonical_key
$i  $j
...

# 1
...

TOTAL gives the total number of subgraph instances found. Each block is a zero-indexed subgraph instance, starting with the canonical string encoding of its isomorphism class (see subgraph_census for format details). $i and $j are actual graph node IDs, allowing cross-reference with node and edge files.

This analysis exhaustively enumerates connected induced subgraphs. For dense graphs and larger subgraph sizes, analysis times will be significant.

Ex. ANALYSIS = subgraph_enumerate ./subgraphs/ mol_type DGA:2 SOL:4 finds all connected induced subgraphs of size 6 where exactly 2 nodes have mol_type = DGA and 4 have mol_type = SOL. Output is written to ./subgraphs/0.sge, ./subgraphs/1.sge, etc.

Subgraph Search

Usage

ANALYSIS = subgraph_search $output_path $edge_list $attr_name $type:$count [$type:$count ...]

Searches for all occurrences of a specific subgraph structure in the graph. The search structure is defined by an edge list and a node composition. Node indices in the edge list are zero-indexed, with the same type-group ordering as subgraph_census: nodes 0 through $count_0 - 1 belong to the first type, nodes $count_0 through $count_0 + $count_1 - 1 belong to the second, and so on. The provided edge list does not need to be in canonical form, but must form a connected graph (no isolated nodes).

A match is exact if the matched nodes are connected to each other precisely as the edge list describes. If the graph contains an extra edge between two matched nodes that the search structure does not include, that candidate is not reported as a match. For example, the four-node path 0,1;1,2;2,3 will not match a four-node ring 0,1;1,2;2,3;3,0.

The edge list may not contain self-loops ($u,$u) or repeated edges. On undirected graphs, $u,$v and $v,$u describe the same edge, so listing both is rejected. On directed graphs, $u,$v describes an edge pointing from $u to $v, and both directions between the same pair of nodes may be listed.

Results are written to an .sgs file at $output_path (see File Naming Conventions).

Output Format

TIMESTEP    $n
# $canonical_key
# $attr_name $type:$count ...
# TOTAL $n_instances

# 0
$i  $j
...

# 1
...

$canonical_key is the canonical string encoding of the search structure's isomorphism class (see subgraph_census for format details). Each numbered block corresponds to a matching subgraph instance. $i and $j are node IDs from the original graph.

This analysis exhaustively enumerates connected induced subgraphs. For dense graphs and larger subgraph sizes, analysis times will be significant.

Ex. ANALYSIS = subgraph_search ./subgraphs/ 0,2;1,2; mol_type SOL:2 Li:1 finds every occurrence of a structure in which nodes 0 and 1 are SOL, node 2 is Li, and the only edges are 0-2 and 1-2 (i.e., a Li node bonded to two SOL nodes that are not bonded to each other). Output is written to ./subgraphs/0.sgs, ./subgraphs/1.sgs, etc.
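The exactness rule can be sketched by brute force on an undirected graph. For every node set with the requested composition, the search tries each relabeling that keeps type-group order. It reports the set only if the mapped search edges equal the induced edges exactly, with nothing missing and nothing extra. This is an illustration, not ChemNetworks' search algorithm, and `search` is a hypothetical name.

```python
import itertools

def search(node_type, edges, pattern, spec):
    """Exact (induced) matches of pattern on an undirected graph.
    pattern: 'u,v;u,v' using type-grouped, zero-indexed node indices."""
    simple = {frozenset(e) for e in edges if e[0] != e[1]}
    # Empty pieces (e.g. from a trailing ';') are skipped in this sketch.
    target = [tuple(int(x) for x in pair.split(",")) for pair in pattern.split(";") if pair]
    pools = [[n for n in sorted(node_type) if node_type[n] == t] for t in spec]
    matches = set()
    for choice in itertools.product(*(itertools.combinations(pool, c)
                                      for pool, c in zip(pools, spec.values()))):
        nodes = set(n for group in choice for n in group)
        induced = {e for e in simple if e <= nodes}
        for perms in itertools.product(*(itertools.permutations(g) for g in choice)):
            order = [n for p in perms for n in p]
            mapped = {frozenset((order[u], order[v])) for u, v in target}
            if mapped == induced:  # exact: no missing and no extra edges
                matches.add(tuple(sorted(nodes)))
                break
    return sorted(matches)
```

On a four-node ring, the path pattern 0,1;1,2;2,3 finds nothing because the ring's extra edge breaks exactness. In the Li example, a Li bonded to two SOL nodes that are also bonded to each other is likewise not reported.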