Analysis Commands#

Attribute Histogram#

Usage#

ANALYSIS = attr_hist < node | edge > $attr_name $output_path [$bins]

Computes a histogram of a node or edge attribute and writes the result to a .hst file at $output_path (see File Naming Conventions).

When $bins is provided and $attr_name is a numeric attribute, values are grouped into bins. When $bins is omitted or $attr_name is a string attribute, unique values/categories are counted instead.

$bins may be specified as:

  • A positive integer N: divides the data range [min, max] into N equal-width bins.
  • A colon-separated list of bin edges e0:e1:...:eN: uses the provided boundaries directly.

All bins are half-open [lo, hi) except the last, which is closed [lo, hi]. Attribute values outside the specified range are ignored.
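The binning rules above can be sketched in a few lines of Python. This is only an illustration of the stated semantics (equal-width or explicit edges, half-open bins with a closed last bin, out-of-range values ignored), not the ChemNetworks implementation; the function names `make_edges` and `histogram` are hypothetical.

```python
def make_edges(values, bins):
    """Build bin edges from an integer count N or a colon-separated edge string."""
    text = str(bins)
    if ":" in text:
        # Explicit edges e0:e1:...:eN are used directly.
        return [float(e) for e in text.split(":")]
    n = int(text)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    # N equal-width bins over [min, max]; the final edge is exactly max.
    return [lo + i * width for i in range(n)] + [hi]


def histogram(values, bins):
    """Count values per bin: [lo, hi) for every bin except the last, which is [lo, hi]."""
    edges = make_edges(values, bins)
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # values outside the specified range are ignored
        for i in range(len(counts)):
            in_half_open = edges[i] <= v < edges[i + 1]
            on_closed_end = i == last and v == edges[-1]
            if in_half_open or on_closed_end:
                counts[i] += 1
                break
    return edges, counts


degrees = [0, 1, 4, 5, 9, 10, 20, 25]
edges, counts = histogram(degrees, "0:5:10:20")
print(counts)  # [3, 2, 2]; the value 25 falls outside [0, 20] and is skipped
```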

Categorical output is sorted by attribute value, ascending.

Output Format#

TIMESTEP    $n
$attr_name  count
[lo hi) $count
...
[lo hi] $count

For categorical mode:

TIMESTEP    $n
$attr_name  count
$value  $count
...

Examples:

  • ANALYSIS = attr_hist node degree ./hist/ 10 computes 10 equal-width bins over the full range of node degree values.
  • ANALYSIS = attr_hist node degree ./hist/ 0:5:10:20 bins degrees into [0 5), [5 10), [10 20].
  • ANALYSIS = attr_hist node mol_type ./hist/ counts occurrences of each unique mol_type string value.

Betweenness#

Usage#

ANALYSIS = betweenness < node | edge >

Calculates the node/edge betweenness centrality for each node/edge in the graph. The centrality is added as a property to the node/edge with the property name betweenness.

Connected Components#

Usage#

ANALYSIS = connected_components [< weak | strong >]

Default = weak

Identifies all connected components within the graph. All nodes within the graph are then given the connected_components node property, where nodes with equal values of this property are connected by some edge path.

weak (default) : Treats the graph as undirected when finding components.

strong : Respects edge direction. Two nodes are in the same component only if each is reachable from the other along directed edges.
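The difference between the two modes can be illustrated with a small Python sketch. It is a naive reachability-based version written for clarity, not the algorithm ChemNetworks uses; the helper names `reachable` and `components` are hypothetical.

```python
def reachable(start, adjacency):
    """Return every node reachable from start by following adjacency."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def components(nodes, edges, mode="weak"):
    """Map each node to a component ID under weak or strong connectivity."""
    forward = {n: set() for n in nodes}
    backward = {n: set() for n in nodes}
    for u, v in edges:
        forward[u].add(v)
        backward[v].add(u)
    # Weak mode ignores direction, so both edge directions are merged.
    undirected = {n: forward[n] | backward[n] for n in nodes}

    labels = {}
    next_id = 0
    for n in nodes:
        if n in labels:
            continue
        if mode == "weak":
            members = reachable(n, undirected)
        else:
            # Strong mode: nodes that n can reach AND that can reach n.
            members = reachable(n, forward) & reachable(n, backward)
        for m in members:
            labels[m] = next_id
        next_id += 1
    return labels


nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 0), (1, 2)]
weak = components(nodes, edges, "weak")
strong = components(nodes, edges, "strong")
```

Here nodes 0, 1, and 2 share one weak component, while in strong mode only nodes 0 and 1 stay together because node 2 has no directed path back to them.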

Degree#

Usage#

ANALYSIS = degree [< all | in | out >]

Default = all

Calculates the degree for every node in the graph. The degree for each node is stored in that node's degree property.

all (default) : Total degree (in + out for directed graphs, or total connections for undirected graphs).

in : In-degree (incoming edges). Only meaningful for directed graphs.

out : Out-degree (outgoing edges). Only meaningful for directed graphs.

Delete Edges#

Usage#

ANALYSIS = delete_edges $property $operator $value

For every edge in the graph, check whether the edge's property matches the operator and value combination. If the edge matches the condition, the edge is deleted from the graph. The operator can be any of: >=, <=, ==, !=, >, or <.

For boolean attributes, only == may be used as the operator, and $value must be true, false, 1, or 0 (case-insensitive).

Ex. ANALYSIS = delete_edges betweenness < 3 deletes every edge whose betweenness is strictly less than 3.
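As a rough illustration of the condition check shared by delete_edges and delete_nodes, the following Python sketch evaluates a property against an operator and value. It assumes numeric and boolean attributes only; the names `OPS`, `parse_bool`, and `matches` are hypothetical and not part of ChemNetworks.

```python
import operator

# The six comparison operators listed above.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_bool(text):
    """Accept true/false/1/0, case-insensitively."""
    lowered = text.lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError(f"not a boolean value: {text}")


def matches(attr_value, op, value_text):
    """Return True if the attribute satisfies the operator/value condition."""
    if isinstance(attr_value, bool):
        if op != "==":
            raise ValueError("boolean attributes only support ==")
        return attr_value == parse_bool(value_text)
    return OPS[op](attr_value, float(value_text))


edges = [
    {"id": 0, "betweenness": 2.5},
    {"id": 1, "betweenness": 3.0},
    {"id": 2, "betweenness": 7.0},
]
# Equivalent in spirit to: ANALYSIS = delete_edges betweenness < 3
kept = [e for e in edges if not matches(e["betweenness"], "<", "3")]
```

Only the edge with betweenness 2.5 is removed; the edge with betweenness exactly 3 is kept.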

Delete Nodes#

Usage#

ANALYSIS = delete_nodes $property $operator $value

For every node in the graph, check whether the node's property matches the operator and value combination. If the node matches the condition, the node is deleted from the graph. The operator can be any of: >=, <=, ==, !=, >, or <.

For boolean attributes, only == may be used as the operator, and $value must be true, false, 1, or 0 (case-insensitive).

Ex. ANALYSIS = delete_nodes betweenness < 3 deletes every node whose betweenness is strictly less than 3.

Label Node Blocks#

Usage#

ANALYSIS = label_node_blocks $attribute_name $count:$size [$count:$size ...]

Assigns an incrementing integer attribute to contiguous blocks of atoms based on known molecule (block) sizes and counts. Each count:size pair specifies count molecules of size atoms each. Atoms are assigned sequentially, with molecule IDs incrementing across all pairs. An error will be thrown if the sum of the (count*size) pairs does not equal the number of nodes in the graph. The molecule ID is stored as the $attribute_name node property.

Ex. ANALYSIS = label_node_blocks mol_id 100:3 10:2 5:12 would assign mol_id 0 through 99 to the first 100 blocks of 3 atoms, mol_id 100 through 109 to the next 10 blocks of 2 atoms, mol_id 110 through 114 to the next 5 blocks of 12 atoms.
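The sequential assignment can be sketched in Python as follows. This is a minimal illustration of the described behavior, including the size check, and not the actual ChemNetworks code; the function name `label_node_blocks` is reused here only for readability.

```python
def label_node_blocks(n_nodes, pairs):
    """Return one block ID per node for count:size pairs such as '100:3'."""
    spec = [tuple(int(x) for x in pair.split(":")) for pair in pairs]
    expected = sum(count * size for count, size in spec)
    if expected != n_nodes:
        # Mirrors the documented error when the pairs do not cover the graph exactly.
        raise ValueError(f"pairs cover {expected} nodes, graph has {n_nodes}")

    labels = []
    block_id = 0
    for count, size in spec:
        for _ in range(count):
            labels.extend([block_id] * size)  # every node in a block shares one ID
            block_id += 1                     # IDs keep incrementing across pairs
    return labels


# 100*3 + 10*2 + 5*12 = 380 nodes
labels = label_node_blocks(380, ["100:3", "10:2", "5:12"])
```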

Label Nodes#

Usage#

ANALYSIS = label_nodes $match_attr $output_attr $key_spec:$label [$key_spec:$label ...] [$default_label]

Assigns a string label to each node based on the value of a numeric node attribute. $match_attr is read from each node and compared against the provided rules in order; the first matching rule's label is written to $output_attr. If no rule matches, $default_label is assigned (an empty string if not specified).

$match_attr must be a numeric node attribute.

Each $key_spec is a selector that can be:

  • A single integer: 5
  • A comma-separated list: 0,1,2
  • An inclusive range: 0-10
  • A combination of the above: 0-10,15-20

Examples:

  • ANALYSIS = label_nodes mol_id binding_site 0-100,150-200:1 101-149:2 201-300:high 0 labels molecules by binding site membership. Nodes that match no rule are given the label 0.
  • ANALYSIS = label_nodes degree label 0:isolated 1-3:low 4-10:high assigns labels based on node degree. Nodes with degree outside 0-10 receive an empty string.
  • ANALYSIS = label_nodes mol_id group 0,2,4:even 1,3,5:odd other assigns even or odd based on molecule ID. Unmatched nodes receive the label other.
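The rule matching described above might look like the following Python sketch. It assumes non-negative integer keys and treats any token without a colon as the default label; the helper names `parse_key_spec`, `parse_rules`, and `label_for` are hypothetical.

```python
def parse_key_spec(spec):
    """Expand a selector such as '0-10,15-20' into a set of integers (ranges inclusive)."""
    keys = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            keys.update(range(int(lo), int(hi) + 1))
        else:
            keys.add(int(part))
    return keys


def parse_rules(tokens):
    """Split command tokens into ordered (keys, label) rules and a default label."""
    rules, default = [], ""
    for token in tokens:
        if ":" in token:
            spec, label = token.split(":", 1)
            rules.append((parse_key_spec(spec), label))
        else:
            default = token  # trailing token without a colon: the default label
    return rules, default


def label_for(value, rules, default):
    """Return the label of the first rule that matches, else the default."""
    for keys, label in rules:
        if value in keys:
            return label
    return default


rules, default = parse_rules(["0:isolated", "1-3:low", "4-10:high"])
print([label_for(d, rules, default) for d in (0, 2, 7, 11)])
# ['isolated', 'low', 'high', '']
```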

Layout#

Usage#

ANALYSIS = layout < random | sphere | grid | fr | kk > [$param:value ...]

Applies a graph layout algorithm to assign new coordinates to each node. The x, y, and z node attributes are overwritten with the computed layout positions, if present.

random : Assigns random 3D coordinates to each node.

sphere : Evenly distributes nodes on the surface of a sphere.

grid : Places nodes on a 3D grid. Optional parameters:

  • width:N (nodes along first dimension)
  • height:N (nodes along second dimension).

fr : Fruchterman-Reingold force-directed layout. Simulates nodes as repelling particles connected by springs. Optional parameters:

  • niter:N (iterations, default 500)
  • start_temp:T (initial temperature, default 10).

kk : Kamada-Kawai spring-based layout. Optimizes node positions based on graph-theoretic distances. Optional parameters:

  • maxiter:N (default 10 * node count)
  • epsilon:E (convergence threshold, default 0)
  • kkconst:K (spring constant, default node count).

Ex. ANALYSIS = layout fr niter:1000 start_temp:20 would run Fruchterman-Reingold with 1000 iterations and starting temperature of 20.

Merge Equal Nodes#

Usage#

ANALYSIS = merge_equal_nodes $property [$attr:method ...] [$default_comb (ignore)]

Merge all nodes with an equal value of the property into a single node. Edges between merged nodes are retained on the new merged node.

When nodes are merged, the combination of attributes may be specified using the standard Attribute Combination syntax. If no combination is specified, node attributes are dropped ($default_comb = ignore).

Examples:

  • ANALYSIS = merge_equal_nodes mol_type merges all nodes with the same mol_type value into a single node. All attributes are dropped.
  • ANALYSIS = merge_equal_nodes mol_type x:mean y:mean z:mean first merges all nodes with the same mol_type value and sets the new node's coordinates to the mean of the merged nodes' coordinates (center of geometry). The remaining attributes are set to those from the first node in the merge group by graph index (first).

Modularity Optimization#

Usage#

ANALYSIS = modularity_optimization [< fast | full >]

Default = fast

Perform modularity optimization on the graph, assigning each node a community ID stored in the modularity_optimization node property. Nodes with equal values belong to the same community.

fast (default) : Uses a fast-greedy algorithm when calculating the optimal modularity. Requires an undirected graph.

full : Uses brute force to calculate the optimal modularity. Supports both directed and undirected graphs. Exceedingly slow for systems with more than 50 particles. ChemNetworks will print a warning if you have more than 50 nodes in your graph. Use fast unless you know your system needs the brute force algorithm.

Rename Attribute#

Usage#

ANALYSIS = rename_attr < node | edge > $old_name $new_name

Renames a node or edge attribute from $old_name to $new_name. The attribute values are preserved exactly. An error is thrown if $old_name does not exist or if $new_name already exists. After the rename, $old_name no longer exists as an attribute, so the name can be reused.

Ex. ANALYSIS = rename_attr node atom_type element renames the atom_type node attribute to element.

Simplify#

Usage#

ANALYSIS = simplify [< both | multiple | loops >] [$attr_spec]

Default = both first

Removes multiple edges and/or loop edges (self-edges) from the graph. This operation modifies the graph structure in place by removing edges that match the specified criteria. When multiple edges are combined, edge attributes are merged according to the attribute combination specification.

both (default) : Removes both multiple edges and loop edges from the graph.

multiple : Removes only multiple edges (duplicate edges between the same pair of nodes), keeping loop edges.

loops : Removes only loop edges (edges where the source and target node are the same), keeping multiple edges.

When edges are combined (merged), edge attribute combination may be specified using the standard Attribute Combination syntax.

Examples:

  • ANALYSIS = simplify both : Remove all loops and multiple edges, keeping first edge's attributes.
  • ANALYSIS = simplify multiple dist:min : Remove multiple edges only, keeping the edge with minimum distance.
  • ANALYSIS = simplify both dist:min weight:sum first : Remove both types, use min for dist, sum for weight, first for every other attribute.
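To make the attribute combination concrete, here is a hedged Python sketch of simplifying an undirected edge list. It only covers the methods that appear on this page (min, sum, mean, first, ignore) and is not the ChemNetworks implementation; `combine` and `simplify` are hypothetical names.

```python
def combine(values, method):
    """Combine the attribute values of merged edges with one method."""
    if method == "min":
        return min(values)
    if method == "sum":
        return sum(values)
    if method == "mean":
        return sum(values) / len(values)
    if method == "first":
        return values[0]
    raise ValueError(f"unsupported method: {method}")


def simplify(edges, methods, default="first"):
    """Drop loops and merge duplicate edges; edges are (u, v, attrs) tuples.

    Assumes an undirected graph, so (0, 1) and (1, 0) are the same edge.
    """
    groups = {}
    for u, v, attrs in edges:
        if u == v:
            continue  # loop edge (self-edge) is removed
        key = (min(u, v), max(u, v))
        groups.setdefault(key, []).append(attrs)

    merged = []
    for (u, v), attr_list in groups.items():
        combined = {}
        for name in attr_list[0]:
            method = methods.get(name, default)
            if method == "ignore":
                continue  # attribute is dropped
            combined[name] = combine([a[name] for a in attr_list], method)
        merged.append((u, v, combined))
    return merged


edges = [
    (0, 1, {"dist": 2.0, "weight": 1.0, "kind": "a"}),
    (1, 0, {"dist": 1.5, "weight": 2.0, "kind": "b"}),
    (2, 2, {"dist": 0.0, "weight": 1.0, "kind": "loop"}),
    (1, 2, {"dist": 4.0, "weight": 1.0, "kind": "c"}),
]
# Comparable to: ANALYSIS = simplify both dist:min weight:sum first
merged = simplify(edges, {"dist": "min", "weight": "sum"}, default="first")
```

The two edges between nodes 0 and 1 collapse into one with dist 1.5, weight 3.0, and the first edge's kind, and the loop on node 2 disappears.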

Steinhardt#

Usage#

ANALYSIS = steinhardt $l $n $attr $v [$average]

Calculates the Steinhardt bond-order parameter q_l for each node whose $attr attribute equals $v, as defined in Steinhardt et al., PRB 28, 784 (1983). The result is stored as the steinhardt_$l node attribute. Nodes not matching the selection condition $attr == $v are assigned -1. The neighborhood is defined by graph distance: a $n value of 2 includes all neighbors within 2 edges of the matching node.

Requires x, y, and z node attributes. The minimum image convention is used for position calculations if PBC.DIM has been set.

The optional $average parameter accepts true or false (default: false). When set to true, the averaged bond-order parameters are computed as defined in Lechner and Dellago, J. Chem. Phys. 129, 114707 (2008).
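For readers who want to sanity-check q_l values, the non-averaged parameter can be computed from bond vectors alone. The Python sketch below uses the spherical-harmonic addition theorem, which turns the sum over m into a double sum of Legendre polynomials over neighbor pairs, so no spherical harmonics library is needed. It is an independent illustration, not the ChemNetworks implementation: it ignores periodic boundaries and neighbor selection by graph distance and does not cover the averaged variant. The names `legendre` and `steinhardt_q` are hypothetical.

```python
import math


def legendre(l, x):
    """Legendre polynomial P_l(x) via Bonnet's recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def steinhardt_q(l, bond_vectors):
    """q_l for one node from the vectors pointing to its neighbors.

    Uses q_l^2 = (1 / N^2) * sum_{j,k} P_l(cos(angle between bonds j and k)),
    which follows from the addition theorem for spherical harmonics.
    """
    units = []
    for x, y, z in bond_vectors:
        norm = math.sqrt(x * x + y * y + z * z)
        units.append((x / norm, y / norm, z / norm))

    total = 0.0
    for a in units:
        for b in units:
            cos_gamma = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
            cos_gamma = max(-1.0, min(1.0, cos_gamma))  # guard against rounding
            total += legendre(l, cos_gamma)
    return math.sqrt(max(total, 0.0)) / len(units)


# Six neighbors of a site in a simple cubic lattice
simple_cubic = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
q4 = steinhardt_q(4, simple_cubic)
q6 = steinhardt_q(6, simple_cubic)
```

For this simple cubic environment the sketch gives q_4 of about 0.764 and q_6 of about 0.354, which match the commonly tabulated values.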

Examples:

  • ANALYSIS = steinhardt 4 1 atom_type OW calculates q_4 for all nodes with atom_type = OW using their direct graph neighbors.
  • ANALYSIS = steinhardt 6 1 mol_type SOL true calculates the averaged q_6 for all nodes with mol_type = SOL.

Subgraph Census#

Usage#

ANALYSIS = subgraph_census $output_path $attr_name $type:$count [$type:$count ...]

Enumerates all induced connected subgraphs of the current graph whose node composition matches the specified attribute constraints. A subgraph matches if it contains exactly the requested count of nodes for each specified attribute value.
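The matching criterion can be illustrated with a deliberately naive Python sketch: try every node subset of the right size, keep those whose composition matches exactly and whose induced subgraph is connected. This brute-force loop is only meant to show what counts as a match and is far slower than a dedicated enumeration algorithm; `is_connected` and `matching_subgraphs` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def is_connected(nodes, adjacency):
    """Check that the subgraph induced by nodes is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def matching_subgraphs(adjacency, node_type, wanted):
    """Return node tuples whose composition equals wanted and that are connected."""
    size = sum(wanted.values())
    target = Counter(wanted)
    hits = []
    for nodes in combinations(sorted(adjacency), size):
        composition = Counter(node_type[n] for n in nodes)
        if composition == target and is_connected(nodes, adjacency):
            hits.append(nodes)
    return hits


# One Li node (0) bonded to three SOL nodes (1, 2, 3)
adjacency = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
node_type = {0: "Li", 1: "SOL", 2: "SOL", 3: "SOL"}
hits = matching_subgraphs(adjacency, node_type, {"SOL": 2, "Li": 1})
```

In this toy graph there are three matches, one for each pair of SOL nodes attached to the Li node.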

Results are written to an .sgc file at $output_path (see Write File Names). Structures are sorted in descending order of occurrence count.

Output Format#

TIMESTEP    $n
# $attr_name $type:$count ...
# TOTAL $n_instances_total  $n_structures

# 0
$canonical_key
$n_instances
$i  $j
...

# 1
...

TOTAL gives the total number of subgraph instances found across all classes, followed by the number of distinct structures. Each block of edges represents a (zero-indexed) unique structure. $canonical_key is a string encoding of the canonical edge list in the form i,j;i,j;..., uniquely identifying the isomorphism class. Edge list node indices are zero-indexed, with node indices matching the corresponding counts. For example, if the attribute constraint is A:2 B:3, nodes 0-1 would correspond to type A, and nodes 2-4 would correspond to type B.

This analysis exhaustively enumerates connected induced subgraphs. For dense graphs, analysis times for large subgraph sizes will be significant.

Ex. ANALYSIS = subgraph_census ./subgraphs/ mol_type DGA:2 SOL:4 finds all connected induced subgraphs of size 6 where exactly 2 nodes have mol_type = DGA and 4 have mol_type = SOL. Output is written to ./subgraphs/0.sgc, ./subgraphs/1.sgc, etc.

Subgraph Enumerate#

Usage#

ANALYSIS = subgraph_enumerate $output_path $attr_name $type:$count [$type:$count ...]

Enumerates all induced connected subgraphs of the current graph whose node composition matches the specified attribute constraints, writing each instance individually to a file. Uses the same enumeration algorithm as subgraph_census, but outputs every found subgraph as its own edge list using the original graph node IDs rather than the isomorphism class-based node IDs.

Results are written to an .sge file at $output_path (see File Naming Conventions).

Output Format#

TIMESTEP    $n
# $attr_name $type:$count ...
# TOTAL $n_instances

# 0
$i  $j
...
$canonical_key

# 1
...

TOTAL gives the total number of subgraph instances found. Each block is a zero-indexed subgraph instance. $i and $j are actual graph node IDs, allowing cross-reference with node and edge files. $canonical_key is the canonical string encoding of the subgraph's isomorphism class (see subgraph_census for format details).

This analysis exhaustively enumerates connected induced subgraphs. For dense graphs, analysis times for large subgraph sizes will be significant.

Ex. ANALYSIS = subgraph_enumerate ./subgraphs/ mol_type DGA:2 SOL:4 finds all connected induced subgraphs of size 6 where exactly 2 nodes have mol_type = DGA and 4 have mol_type = SOL. Output is written to ./subgraphs/0.sge, ./subgraphs/1.sge, etc.

Subgraph Search#

Usage#

ANALYSIS = subgraph_search $output_path $edge_list $attr_name $type:$count [$type:$count ...]

Searches for all occurrences of a specific subgraph structure in the graph. The search structure is defined by an edge list and a node composition. Node indices in the edge list are zero-indexed, with the same type-group ordering as subgraph_census: nodes 0 through $count_0 - 1 belong to the first type, nodes $count_0 through $count_0 + $count_1 - 1 belong to the second, and so on. The provided edge list does not need to be in canonical form, but must form a connected graph (no isolated nodes).
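The index-to-type mapping and the edge list format can be sketched in Python as below. The parsing assumes the i,j;i,j;... form shown for $canonical_key, with an optional trailing semicolon, and is illustrative only; `parse_edge_list`, `node_types`, and `has_isolated_nodes` are hypothetical names.

```python
def parse_edge_list(text):
    """Parse 'i,j;i,j;...' into a list of (i, j) integer pairs."""
    return [
        tuple(int(index) for index in pair.split(","))
        for pair in text.split(";")
        if pair  # skip the empty piece left by a trailing ';'
    ]


def node_types(type_counts):
    """Expand type:count tokens into one type per zero-indexed node."""
    types = []
    for token in type_counts:
        name, count = token.split(":")
        types.extend([name] * int(count))
    return types


def has_isolated_nodes(edges, n_nodes):
    """Return True if some node index never appears in the edge list."""
    used = {index for edge in edges for index in edge}
    return used != set(range(n_nodes))


edges = parse_edge_list("0,2;1,2;")
types = node_types(["SOL:2", "Li:1"])
print(edges)  # [(0, 2), (1, 2)]
print(types)  # ['SOL', 'SOL', 'Li']
```

With SOL:2 Li:1, nodes 0 and 1 are SOL and node 2 is Li, so the edge list above describes a Li node bonded to two SOL nodes.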

Results are written to an .sgs file at $output_path (see File Naming Conventions).

Output Format#

TIMESTEP    $n
# $canonical_key
# $attr_name $type:$count ...
# TOTAL $n_instances

# 0
$i  $j
...

# 1
...

$canonical_key is the canonical string encoding of the search structure's isomorphism class (see subgraph_census for format details). Each numbered block corresponds to a matching subgraph instance. $i and $j are node IDs from the original graph.

This analysis exhaustively enumerates connected induced subgraphs. For dense graphs, analysis times for large subgraph sizes will be significant.

Ex. ANALYSIS = subgraph_search ./subgraphs/ 0,2;1,2; mol_type SOL:2 Li:1 finds all connected induced subgraphs where nodes 0 and 1 are SOL and node 2 is Li, with edges between 0-2 and 1-2 (i.e., a Li node bonded to two SOL nodes). Output is written to ./subgraphs/0.sgs, ./subgraphs/1.sgs, etc.