Analysis Commands#
Attribute Histogram#
Usage#
ANALYSIS = attr_hist < node | edge > $attr_name $output_path [$bins]
Computes a histogram of a node or edge attribute and writes the result to a .hst file at
$output_path (see File Naming Conventions).
When $bins is provided and $attr_name is a numeric attribute, values are grouped into bins.
When $bins is omitted or $attr_name is a string attribute, unique values/categories are counted instead.
$bins may be specified as:
- A positive integer N: divides the data range [min, max] into N equal-width bins.
- A colon-separated list of bin edges e0:e1:...:eN: uses the provided boundaries directly.
All bins are half-open [lo, hi) except the last, which is closed [lo, hi]. Attribute values outside the specified
range are ignored.
Categorical output is sorted by attribute value, ascending.
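The binning rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior (half-open bins, closed last bin, out-of-range values ignored), not the ChemNetworks implementation; the function names parse_bins and histogram are hypothetical.

```python
from bisect import bisect_right

def parse_bins(spec, values):
    """Turn a $bins string into a list of bin edges (hypothetical helper).

    An integer N yields N equal-width bins over [min, max] of the data;
    a colon-separated string e0:e1:...:eN is used as the edges directly.
    """
    if ":" in spec:
        return [float(e) for e in spec.split(":")]
    n = int(spec)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    # Use the exact maximum as the final edge to avoid rounding drift.
    return [lo + i * width for i in range(n)] + [hi]

def histogram(values, edges):
    """Count values per bin: [lo, hi) for all bins except the last, which is [lo, hi]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # values outside the specified range are ignored
        if v == edges[-1]:
            counts[-1] += 1  # the last bin is closed on the right
        else:
            counts[bisect_right(edges, v) - 1] += 1
    return counts

degrees = [0, 1, 4, 5, 9, 10, 20, 25]
counts = histogram(degrees, parse_bins("0:5:10:20", degrees))
```

With the edges 0:5:10:20, the value 20 falls in the last bin and the value 25 is dropped.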
Output Format#
For categorical mode:
Examples:
- ANALYSIS = attr_hist node degree ./hist/ 10 computes 10 equal-width bins over the full range of node degree values.
- ANALYSIS = attr_hist node degree ./hist/ 0:5:10:20 bins degrees into [0, 5), [5, 10), [10, 20].
- ANALYSIS = attr_hist node mol_type ./hist/ counts occurrences of each unique mol_type string value.
Betweenness#
Usage#
ANALYSIS = betweenness < node | edge >
Calculates the node/edge betweenness centrality for each
node/edge in the graph. The centrality is added as a property to the node/edge with the property name betweenness.
Connected Components#
Usage#
ANALYSIS = connected_components [< weak | strong >]
Default = weak
Identifies all connected components within the graph. All
nodes within the graph are then given the connected_components node property, where nodes with equal values of this
property are connected by some edge path.
weak (default) : Treats the graph as undirected when finding components.
strong : Respects edge direction. Only nodes reachable via directed edges are in the same component.
Degree#
Usage#
ANALYSIS = degree [< all | in | out >]
Default = all
Calculates the degree for every node in the graph. The degree for
each node is stored in that node's degree property.
all (default) : Total degree (in + out for directed graphs, or total connections for undirected graphs).
in : In-degree (incoming edges). Only meaningful for directed graphs.
out : Out-degree (outgoing edges). Only meaningful for directed graphs.
Delete Edges#
Usage#
ANALYSIS = delete_edges $property $operator $value
For every edge in the graph, check whether the edge's property matches the operator and value combination. If the edge matches the condition, the edge is deleted from the graph. The operator can be any of: >=, <=, ==, !=, >, or <.
For boolean attributes, only == may be used as the operator, and $value must be true, false, 1, or 0
(case-insensitive).
Ex. ANALYSIS = delete_edges betweenness < 3 would delete every edge whose betweenness is strictly less than 3.
Delete Nodes#
Usage#
ANALYSIS = delete_nodes $property $operator $value
For every node in the graph, check whether the node's property matches the operator and value combination. If the node matches the condition, the node is deleted from the graph. The operator can be any of: >=, <=, ==, !=, >, or <.
For boolean attributes, only == may be used as the operator, and $value must be true, false, 1, or 0
(case-insensitive).
Ex. ANALYSIS = delete_nodes betweenness < 3 would delete every node whose betweenness is strictly less than 3.
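The condition check shared by delete_edges and delete_nodes can be illustrated with a small Python sketch. It assumes attribute values are already typed (bool, int, float, or str) and that $value arrives as a string; the names OPS and matches are hypothetical and not part of ChemNetworks.

```python
import operator

# The six documented comparison operators.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}

def matches(attr_value, op, raw_value):
    """Return True if attr_value satisfies '<op> <raw_value>' (hypothetical helper)."""
    # bool must be tested before the generic path because bool is a subclass of int.
    if isinstance(attr_value, bool):
        if op != "==":
            raise ValueError("boolean attributes only support ==")
        table = {"true": True, "1": True, "false": False, "0": False}
        return attr_value == table[raw_value.lower()]
    # Convert the textual $value to the attribute's own type before comparing.
    return OPS[op](attr_value, type(attr_value)(raw_value))

betweenness = {("a", "b"): 2.5, ("b", "c"): 3.0, ("c", "d"): 7.1}
# Equivalent in spirit to: ANALYSIS = delete_edges betweenness < 3
kept = [edge for edge, value in betweenness.items() if not matches(value, "<", "3")]
```

Here only the edge with betweenness 2.5 is removed; the edge with a value of exactly 3.0 is kept.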
Label Node Blocks#
Usage#
ANALYSIS = label_node_blocks $attribute_name $count:$size [$count:$size ...]
Assigns an incrementing integer attribute to contiguous blocks of atoms based on known molecule (block) sizes and
counts. Each count:size pair specifies count molecules of size atoms each. Atoms are assigned sequentially, with
molecule IDs incrementing across all pairs. An error will be thrown if the sum of the (count*size) pairs does not
equal the number of nodes in the graph. The molecule ID is stored as the $attribute_name node property.
Ex. ANALYSIS = label_node_blocks mol_id 100:3 10:2 5:12 would assign mol_id 0 through 99 to the first 100 blocks of 3
atoms, mol_id 100 through 109 to the next 10 blocks of 2 atoms, and mol_id 110 through 114 to the next 5 blocks of 12
atoms.
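The sequential assignment described above can be sketched as follows. This is a minimal Python illustration under the assumption that nodes are processed in graph index order; the function name and its return type (a plain list of IDs) are hypothetical.

```python
def label_node_blocks(n_nodes, pairs):
    """Return one block ID per node for a list of 'count:size' strings."""
    specs = [tuple(int(x) for x in pair.split(":")) for pair in pairs]
    expected = sum(count * size for count, size in specs)
    if expected != n_nodes:
        # Mirrors the documented error when the pairs do not cover the graph exactly.
        raise ValueError(f"pairs describe {expected} nodes, graph has {n_nodes}")
    labels = []
    block_id = 0
    for count, size in specs:
        for _ in range(count):
            labels.extend([block_id] * size)  # every atom in the block shares one ID
            block_id += 1                     # IDs keep incrementing across pairs
    return labels

# Mirrors: ANALYSIS = label_node_blocks mol_id 100:3 10:2 5:12
mol_id = label_node_blocks(380, ["100:3", "10:2", "5:12"])
```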
Label Nodes#
Usage#
ANALYSIS = label_nodes $match_attr $output_attr $key_spec:$label [$key_spec:$label ...] [$default_label]
Assigns a string label to each node based on the value of a numeric node attribute. $match_attr is read
from each node and compared against the provided rules in order; the first matching rule's label is written
to $output_attr. If no rule matches, $default_label is assigned (an empty string if not specified).
$match_attr must be a numeric node attribute.
Each $key_spec is a selector that can be:
- A single integer: 5
- A comma-separated list: 0,1,2
- An inclusive range: 0-10
- A combination of the above: 0-10,15-20
Examples:
- ANALYSIS = label_nodes mol_id binding_site 0-100,150-200:1 101-149:2 201-300:high 0 labels molecules by binding site membership. Non-binding site nodes are given a label of 0.
- ANALYSIS = label_nodes degree label 0:isolated 1-3:low 4-10:high assigns labels based on node degree. Nodes with degree outside 0-10 receive an empty string.
- ANALYSIS = label_nodes mol_id group 0,2,4:even 1,3,5:odd other assigns even or odd based on molecule ID. Unmatched nodes receive the label other.
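The first-match rule evaluation can be illustrated with a short Python sketch. It assumes non-negative integer keys (a leading minus sign would clash with the range syntax) and treats any argument without a colon as $default_label; parse_key_spec and build_labeler are hypothetical names.

```python
def parse_key_spec(spec):
    """Expand a selector such as '0-10,15-20' into a set of integers."""
    keys = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            keys.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            keys.add(int(part))
    return keys

def build_labeler(args):
    """Build a value -> label function from '$key_spec:$label' arguments."""
    rules, default = [], ""
    for arg in args:
        if ":" in arg:
            spec, label = arg.split(":", 1)
            rules.append((parse_key_spec(spec), label))
        else:
            default = arg  # assumed to be the optional $default_label
    def label(value):
        for keys, name in rules:  # rules are tried in order; first match wins
            if value in keys:
                return name
        return default
    return label

by_degree = build_labeler(["0:isolated", "1-3:low", "4-10:high"])
parity = build_labeler(["0,2,4:even", "1,3,5:odd", "other"])
```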
Layout#
Usage#
ANALYSIS = layout < random | sphere | grid | fr | kk > [$param:value ...]
Applies a graph layout algorithm to assign new coordinates to each node. The x, y, and z node attributes are
overwritten with the computed layout positions, if present.
random : Assigns random 3D coordinates to each node.
sphere : Evenly distributes nodes on the surface of a sphere.
grid : Places nodes on a 3D grid. Optional parameters:
- width:N (nodes along first dimension)
- height:N (nodes along second dimension)
fr : Fruchterman-Reingold force-directed layout. Simulates nodes as repelling particles connected by springs. Optional parameters:
- niter:N (iterations, default 500)
- start_temp:T (initial temperature, default 10)
kk : Kamada-Kawai spring-based layout. Optimizes node positions based on graph-theoretic distances. Optional parameters:
- maxiter:N (default 10 * node count)
- epsilon:E (convergence threshold, default 0)
- kkconst:K (spring constant, default node count)
Ex. ANALYSIS = layout fr niter:1000 start_temp:20 would run Fruchterman-Reingold with 1000 iterations and starting
temperature of 20.
Merge Equal Nodes#
Usage#
ANALYSIS = merge_equal_nodes $property [$attr:method ...] [$default_comb (ignore)]
Merge all nodes with an equal value of the property into a single node. Edges between merged nodes are retained on the new merged node.
When nodes are merged, the combination of attributes may be specified using the
standard Attribute Combination syntax. If not specified, $default_comb defaults to ignore and node attributes
are dropped.
Examples:
- ANALYSIS = merge_equal_nodes mol_type merges all nodes with the same mol_type value into a single node. All attributes are dropped.
- ANALYSIS = merge_equal_nodes mol_type x:mean y:mean z:mean first merges all nodes with the same mol_type value and sets the new node's coordinates to the mean of the merged nodes' coordinates (center of geometry). The remaining attributes are set to those from the first node in the merge group by graph index (first).
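As a rough illustration of the merge and attribute-combination step, the Python sketch below groups nodes by a property and combines attributes per group. It makes several assumptions that the text does not confirm: nodes are plain dicts, edges are index pairs, edges inside a merge group become self-loops on the merged node, and only the methods named in this section (mean, sum, min, first, ignore) are handled. All names are hypothetical.

```python
from statistics import mean

COMBINERS = {"mean": mean, "sum": sum, "min": min,
             "first": lambda values: values[0]}

def merge_equal_nodes(nodes, edges, key, combos=None, default="ignore"):
    """Merge nodes sharing nodes[i][key]; return (merged_nodes, remapped_edges)."""
    combos = combos or {}
    groups = {}  # property value -> original node indices, in graph order
    for index, node in enumerate(nodes):
        groups.setdefault(node[key], []).append(index)
    new_index, merged = {}, []
    for new_id, members in enumerate(groups.values()):
        attrs = {}
        for name in nodes[members[0]]:
            method = combos.get(name, default)
            if method != "ignore":  # 'ignore' drops the attribute entirely
                attrs[name] = COMBINERS[method]([nodes[i][name] for i in members])
        merged.append(attrs)
        for i in members:
            new_index[i] = new_id
    # Assumption: every edge is kept and re-pointed at the merged endpoints.
    return merged, [(new_index[u], new_index[v]) for u, v in edges]

nodes = [{"mol_type": "SOL", "x": 0.0},
         {"mol_type": "SOL", "x": 2.0},
         {"mol_type": "Li", "x": 5.0}]
edges = [(0, 1), (1, 2)]
merged, new_edges = merge_equal_nodes(nodes, edges, "mol_type", {"x": "mean"}, "first")
```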
Modularity Optimization#
Usage#
ANALYSIS = modularity_optimization [< fast | full >]
Default = fast
Perform modularity optimization on the graph, assigning each node
a community ID stored in the modularity_optimization node property. Nodes with equal values belong to the same
community.
fast (default) : Uses a fast-greedy algorithm when calculating the optimal modularity. Requires an undirected graph.
full : Uses brute force to calculate the optimal modularity. Supports both directed and undirected graphs. Exceedingly
slow for systems with more than 50 particles. ChemNetworks will print a warning if you have more than 50 nodes in your
graph. Use fast unless you know your system needs the brute force algorithm.
Rename Attribute#
Usage#
ANALYSIS = rename_attr < node | edge > $old_name $new_name
Renames a node or edge attribute from $old_name to $new_name. The attribute values are preserved exactly. An error
is thrown if $old_name does not exist or if $new_name already exists. The old attribute name is removed and can be
reused afterwards.
Ex. ANALYSIS = rename_attr node atom_type element renames the atom_type node attribute to element.
Simplify#
Usage#
ANALYSIS = simplify [< both | multiple | loops >] [$attr_spec]
Default = both first
Removes multiple edges and/or loop edges (self-edges) from the graph. This operation modifies the graph structure in place by removing edges that match the specified criteria. When multiple edges are combined, edge attributes are merged according to the attribute combination specification.
both (default) : Removes both multiple edges and loop edges from the graph.
multiple : Removes only multiple edges (duplicate edges between the same pair of nodes), keeping loop edges.
loops : Removes only loop edges (edges where the source and target node are the same), keeping multiple edges.
When edges are combined (merged), edge attribute combination may be specified using the standard Attribute Combination syntax.
Examples:
- ANALYSIS = simplify both: Remove all loops and multiple edges, keeping first edge's attributes.
- ANALYSIS = simplify multiple dist:min: Remove multiple edges only, keeping the edge with minimum distance.
- ANALYSIS = simplify both dist:min weight:sum first: Remove both types, use min for dist, sum for weight, first for every other attribute.
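The three modes can be sketched in Python as below. This is an illustration only: it assumes an undirected graph stored as (u, v, attrs) tuples, combines each attribute independently, and supports just the first, min, and sum methods used in the examples. The function name and data layout are hypothetical.

```python
COMBINERS = {"first": lambda values: values[0], "min": min, "sum": sum}

def simplify(edges, mode="both", combos=None, default="first"):
    """Remove loops and/or merge parallel edges from a list of (u, v, attrs)."""
    combos = combos or {}
    drop_loops = mode in ("both", "loops")
    merge_multi = mode in ("both", "multiple")
    result, groups = [], {}
    for u, v, attrs in edges:
        if u == v and drop_loops:
            continue  # loop edge removed
        if not merge_multi:
            result.append((u, v, dict(attrs)))  # parallel edges kept as-is
            continue
        # Undirected assumption: (u, v) and (v, u) are the same edge.
        groups.setdefault((min(u, v), max(u, v)), []).append(attrs)
    for (u, v), members in groups.items():
        merged = {name: COMBINERS[combos.get(name, default)]([m[name] for m in members])
                  for name in members[0]}
        result.append((u, v, merged))
    return result

edges = [(0, 1, {"dist": 2.0, "weight": 1}),
         (1, 0, {"dist": 1.5, "weight": 2}),
         (2, 2, {"dist": 0.0, "weight": 1})]
both = simplify(edges, "both", {"dist": "min", "weight": "sum"})
```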
Steinhardt#
Usage#
ANALYSIS = steinhardt $l $n $attr $v [$average]
Calculates the Steinhardt bond-order parameter q_l for each node whose $attr attribute equals $v, as defined in
Steinhardt et al., PRB 28, 784 (1983). The result is stored as the
steinhardt_$l node attribute. Nodes not matching the selection condition $attr == $v are assigned -1. The
neighborhood is defined by graph distance. A $n value of 2 would include all neighbors within 2 edges of the matching
node.
Requires x, y, and z node attributes. The minimum image convention is used for position calculations if
PBC.DIM has been set.
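The graph-distance neighborhood controlled by $n can be illustrated with a breadth-first search. The sketch below only shows how the neighbor set might be collected; it does not compute q_l, and the adjacency-dict representation and function name are assumptions.

```python
from collections import deque

def neighbors_within(adjacency, start, n):
    """Return all nodes reachable from start in at most n edges, excluding start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == n:
            continue  # do not expand beyond n edges
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    seen.discard(start)
    return seen

# A four-node chain: 0 - 1 - 2 - 3
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
direct = neighbors_within(chain, 0, 1)
two_hops = neighbors_within(chain, 0, 2)
```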
The optional $average parameter accepts true or false (default: false). When set to true, the averaged
bond-order parameters are computed as defined in
Lechner and Dellago, J. Chem. Phys. 129, 114707 (2008).
Examples:
- ANALYSIS = steinhardt 4 1 atom_type OW calculates q_4 for all nodes with atom_type = OW using their direct graph neighbors.
- ANALYSIS = steinhardt 6 1 mol_type SOL true calculates the averaged q_6 for all nodes with mol_type = SOL.
Subgraph Census#
Usage#
ANALYSIS = subgraph_census $output_path $attr_name $type:$count [$type:$count ...]
Enumerates all induced connected subgraphs of the current graph whose node composition matches the specified attribute constraints. A subgraph matches if it contains exactly the requested count of nodes for each specified attribute value.
Results are written to an .sgc file at $output_path (see Write File Names).
Structures are sorted in descending order of occurrence count.
Output Format#
TIMESTEP $n
# $attr_name $type:$count ...
# TOTAL $n_instances_total $n_structures
# 0
$canonical_key
$n_instances
$i $j
...
# 1
...
TOTAL gives the total number of subgraph instances found across all classes, followed by the number of distinct
structures. Each block of edges represents a (zero-indexed) unique structure. $canonical_key is a string encoding of
the canonical edge list in the form i,j;i,j;..., uniquely identifying the isomorphism class. Edge list
node indices are zero-indexed and grouped by type in the order the constraints are given. For example, if the attribute
constraint is A:2 B:3, nodes 0-1 would correspond to type A, and nodes 2-4 would correspond to type B.
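When post-processing output, the index-to-type mapping and the canonical key can be decoded with a few lines of Python. This sketch assumes the key uses exactly the i,j;i,j;... form described above, possibly with a trailing semicolon; both helper names are hypothetical.

```python
def node_types(constraints):
    """Map structure node indices to types for constraints like ['A:2', 'B:3']."""
    types = []
    for constraint in constraints:
        name, count = constraint.rsplit(":", 1)
        types.extend([name] * int(count))  # indices are grouped in constraint order
    return types

def parse_canonical_key(key):
    """Split 'i,j;i,j;...' into a list of (i, j) integer pairs."""
    return [tuple(int(index) for index in pair.split(","))
            for pair in key.split(";") if pair]

types = node_types(["A:2", "B:3"])
edge_list = parse_canonical_key("0,2;1,2;")
```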
This analysis exhaustively enumerates connected induced subgraphs. For dense graphs, analysis times for large subgraph sizes will be significant.
Ex. ANALYSIS = subgraph_census ./subgraphs/ mol_type DGA:2 SOL:4 finds all connected induced subgraphs of size 6 where
exactly 2 nodes have mol_type = DGA and 4 have mol_type = SOL. Output is written to ./subgraphs/0.sgc,
./subgraphs/1.sgc, etc.
Subgraph Enumerate#
Usage#
ANALYSIS = subgraph_enumerate $output_path $attr_name $type:$count [$type:$count ...]
Enumerates all induced connected subgraphs of the current graph whose node composition matches the specified attribute
constraints, writing each instance individually to a file. Uses the same enumeration algorithm as subgraph_census,
but outputs every found subgraph as its own edge list using the original graph node IDs rather than the isomorphism
class-based node IDs.
Results are written to an .sge file at $output_path (see File Naming Conventions).
Output Format#
TOTAL gives the total number of subgraph instances found. Each block is a zero-indexed subgraph instance. $i and
$j are actual graph node IDs, allowing cross-reference with node and edge files. $canonical_key is the canonical
string encoding of the subgraph's isomorphism class (see subgraph_census for format details).
This analysis exhaustively enumerates connected induced subgraphs. For dense graphs, analysis times for large subgraph sizes will be significant.
Ex. ANALYSIS = subgraph_enumerate ./subgraphs/ mol_type DGA:2 SOL:4 finds all connected induced subgraphs of size 6
where exactly 2 nodes have mol_type = DGA and 4 have mol_type = SOL. Output is written to ./subgraphs/0.sge,
./subgraphs/1.sge, etc.
Subgraph Search#
Usage#
ANALYSIS = subgraph_search $output_path $edge_list $attr_name $type:$count [$type:$count ...]
Searches for all occurrences of a specific subgraph structure in the graph. The search structure is defined by an edge
list and a node composition. Node indices in the edge list are zero-indexed, with the same type-group ordering as
subgraph_census: nodes 0 through $count_0 - 1 belong to the first type, nodes $count_0 through
$count_0 + $count_1 - 1 belong to the second, and so on. The provided edge list does not need to be in canonical
form, but must form a connected graph (no isolated nodes).
Results are written to an .sgs file at $output_path (see File Naming Conventions).
Output Format#
TIMESTEP $n
# $canonical_key
# $attr_name $type:$count ...
# TOTAL $n_instances
# 0
$i $j
...
# 1
...
$canonical_key is the canonical string encoding of the search structure's isomorphism class (see subgraph_census for
format details). Each numbered block corresponds to a matching subgraph instance. $i and $j are node IDs from the
original graph.
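A reader for the .sgs layout shown above might look like the following Python sketch. It assumes whitespace-separated node IDs, that an instance header is a "#" line containing only an integer, and that a file may hold one or more TIMESTEP blocks; none of this is confirmed beyond the format listing, the sample text is made up for illustration, and parse_sgs is a hypothetical name.

```python
def parse_sgs(text):
    """Parse .sgs-style text into a list of per-timestep dicts."""
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("TIMESTEP"):
            frames.append({"timestep": int(line.split()[1]),
                           "header": [], "instances": []})
        elif line.startswith("#"):
            body = line[1:].strip()
            if body.isdigit():
                frames[-1]["instances"].append([])  # start of instance block "# k"
            else:
                frames[-1]["header"].append(body)   # key, constraints, TOTAL line
        else:
            i, j = line.split()
            frames[-1]["instances"][-1].append((int(i), int(j)))
    return frames

sample = """TIMESTEP 0
# 0,2;1,2;
# mol_type SOL:2 Li:1
# TOTAL 2
# 0
4 17
9 17
# 1
12 30
21 30
"""
frames = parse_sgs(sample)
```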
This analysis exhaustively enumerates connected induced subgraphs. For dense graphs, analysis times for large subgraph sizes will be significant.
Ex. ANALYSIS = subgraph_search ./subgraphs/ 0,2;1,2; mol_type SOL:2 Li:1 finds all connected induced subgraphs
where nodes 0 and 1 are SOL and node 2 is Li, with edges between 0-2 and 1-2 (i.e., a Li node bonded to two SOL
nodes). Output is written to ./subgraphs/0.sgs, ./subgraphs/1.sgs, etc.