network

network contains functions to arrange and analyze glycans in the context of networks. In such a network, each node represents a glycan, and each edge represents a relationship between two glycans, for instance a biosynthetic step that connects them. Because glycowork treats glycans as molecular graphs, these networks are hierarchical graphs: the network is one graph, and each node within it is itself a graph. network contains the following modules:

biosynthesis

constructing and analyzing biosynthetic glycan networks


construct_network

 construct_network (glycans, allowed_ptms=frozenset({'3S', '3P', 'OS',
                    '1P', 'OAc', '6S', 'OP', '6P', '9Ac', '4Ac'}),
                    edge_type='monolink', permitted_roots=None,
                    abundances=[])

*Construct a glycan biosynthetic network

Arguments:
glycans (list): list of glycans in IUPAC-condensed format
allowed_ptms (set): set of PTMs to consider
edge_type (string): indicates whether edges represent monosaccharides (‘monosaccharide’), monosaccharide(linkage) (‘monolink’), or enzyme catalyzing the reaction (‘enzyme’); default:‘monolink’
permitted_roots (set): which nodes should be considered as roots; default:will be inferred
abundances (list): optional list of abundances, in the same order as glycans; default:empty list
Returns:
Returns a networkx object of the network*
glycans = ["Gal(b1-4)Glc-ol", "GlcNAc(b1-3)Gal(b1-4)Glc-ol",
           "GlcNAc6S(b1-3)Gal(b1-4)Glc-ol",
           "Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol", "Fuc(a1-2)Gal(b1-4)Glc-ol",
          "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)[Gal(b1-3)GlcNAc(b1-6)]Gal(b1-4)Glc-ol"]
network = construct_network(glycans)
network.nodes()
NodeView(('Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)[Gal(b1-3)GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol', 'GlcNAc6S(b1-3)Gal(b1-4)Glc-ol', 'GlcNAc(b1-3)Gal(b1-4)Glc-ol', 'Fuc(a1-2)Gal(b1-4)Glc-ol', 'Gal(b1-4)Glc-ol', 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)[GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'Gal(b1-3)GlcNAc(b1-6)[Gal(b1-4)GlcNAc(b1-3)]Gal(b1-4)Glc-ol', 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol', 'Gal(b1-4)GlcNAc(b1-3)[GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'Gal(b1-3)GlcNAc(b1-6)[GlcNAc(b1-3)]Gal(b1-4)Glc-ol', 'GlcNAc(b1-6)[GlcNAc(b1-3)]Gal(b1-4)Glc-ol'))
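
To make the node and edge structure more concrete, the following minimal sketch derives ‘monolink’-style edge labels for a few linear glycans from the example above by plain string comparison. It assumes that a product is simply its precursor with one monosaccharide(linkage) unit prepended, so it ignores branching, PTMs, and virtual nodes; it is an illustration only and not how construct_network works internally. The helper name monolink_edges is hypothetical.

```python
def monolink_edges(glycans):
    """Return (precursor, product, label) triples for linear glycans.

    Assumption: a product equals 'Mono(linkage)' + precursor, which only
    holds for unbranched IUPAC-condensed strings without PTM-only steps.
    """
    known = set(glycans)
    edges = []
    for product in glycans:
        # Split off the first 'Mono(linkage)' unit at the non-reducing end
        head, sep, rest = product.partition(")")
        if sep and rest in known:
            edges.append((rest, product, head + sep))
    return edges


linear = ["Gal(b1-4)Glc-ol",
          "GlcNAc(b1-3)Gal(b1-4)Glc-ol",
          "Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol",
          "Fuc(a1-2)Gal(b1-4)Glc-ol"]
for precursor, product, label in monolink_edges(linear):
    print(f"{precursor} --{label}--> {product}")
```

Each printed line corresponds to one directed edge, with the added monosaccharide(linkage) as its label.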

plot_network

 plot_network (network, plot_format='pydot2', edge_label_draw=True,
               lfc_dict=None)

*Visualizes biosynthetic network

Arguments:
network (networkx object): biosynthetic network, returned from construct_network
plot_format (string): how to lay out the network, either ‘pydot2’, ‘kamada_kawai’, or ‘spring’; default:‘pydot2’
edge_label_draw (bool): draws edge labels if True; default:True
lfc_dict (dict): dictionary of enzyme:log2-fold-change to scale edge width; default:None*
plot_network(network, plot_format = 'kamada_kawai')

infer_network

 infer_network (network, network_species, species_list, network_dic)

*Replaces virtual nodes if they are observed in other species

Arguments:
network (networkx object): biosynthetic network that should be inferred
network_species (string): species from which the network stems
species_list (list): list of species to compare network to
network_dic (dict): dictionary of form species name : biosynthetic network (gained from construct_network)
Returns:
Returns network with filled in virtual nodes*
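
The idea behind this inference can be sketched without networkx. The example below assumes each species' network has been reduced to a plain dictionary of glycan : flag, using the 0 (observed) / 1 (virtual) convention described for node_labels in update_network. The function name infer_virtual_nodes and the flat-dictionary input are assumptions made for illustration; the real function operates on networkx objects.

```python
def infer_virtual_nodes(network_species, species_list, flags_by_species):
    """Return virtual nodes of one species that are observed elsewhere.

    flags_by_species maps species -> {glycan: 0 (observed) or 1 (virtual)};
    this flat dict stands in for real networkx graphs (an assumption).
    """
    target = flags_by_species[network_species]
    inferred = set()
    for species in species_list:
        if species == network_species:
            continue  # a network cannot support its own virtual nodes
        other = flags_by_species[species]
        for glycan, is_virtual in target.items():
            if is_virtual == 1 and other.get(glycan) == 0:
                inferred.add(glycan)
    return sorted(inferred)


# Toy data: species names and flags are made up for this sketch
flags = {
    "Homo_sapiens": {"Gal(b1-4)Glc-ol": 0,
                     "GlcNAc(b1-3)Gal(b1-4)Glc-ol": 1,
                     "Fuc(a1-2)Gal(b1-4)Glc-ol": 1},
    "Bos_taurus": {"Gal(b1-4)Glc-ol": 0,
                   "GlcNAc(b1-3)Gal(b1-4)Glc-ol": 0},
}
print(infer_virtual_nodes("Homo_sapiens", list(flags), flags))
```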

retrieve_inferred_nodes

 retrieve_inferred_nodes (network, species=None)

*Returns the inferred virtual nodes of a network that has been used with infer_network

Arguments:
network (networkx object): biosynthetic network with inferred virtual nodes
species (string): species from which the network stems (only relevant if multiple species in network); default:None
Returns:
Returns inferred nodes as list or dictionary (if species argument is used)*

update_network

 update_network (network_in, edge_list, edge_labels=None,
                 node_labels=None)

*Updates a network with new edges and their labels

Arguments:
network_in (networkx object): network that should be modified
edge_list (list): list of edges as node tuples
edge_labels (list): list of edge labels as strings
node_labels (dict): dictionary of form node:0 or 1 depending on whether the node is observed or virtual
Returns:
Returns network with added edges*

trace_diamonds

 trace_diamonds (network, species_list, network_dic, threshold=0.0,
                 nb_intermediates=2, mode='presence')

*Extracts diamond-shape motifs from biosynthetic networks (A->B,A->C,B->D,C->D) and uses evolutionary information to determine which path is taken from A to D

Arguments:
network (networkx object): biosynthetic network, returned from construct_network
species_list (list): list of species to compare network to
network_dic (dict): dictionary of form species name : biosynthetic network (gained from construct_network)
threshold (float): everything below or equal to that threshold will be cut; default:0.
nb_intermediates (int): number of intermediate nodes expected in a network motif to extract; has to be a multiple of 2 (2: diamond, 4: hexagon,…); default:2
mode (string): whether to analyze for “presence” or “abundance” of intermediates; default:“presence”
Returns:
Returns dataframe of each intermediary glycan and its proportion (0-1) of how often it has been experimentally observed in this path (or average abundance if mode = abundance)*
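
As a rough illustration of the diamond motif (A->B, A->C, B->D, C->D) and of the "presence" proportion, the sketch below works on a plain dictionary of successors rather than a networkx object. The function names, the single-letter node names, and the observed-glycan sets are all hypothetical; the actual function may extract motifs and weigh evidence differently.

```python
from itertools import combinations


def find_diamonds(successors):
    """Find A->B, A->C, B->D, C->D motifs in a directed graph.

    successors maps each node to the set of nodes it points to.
    Returns tuples (A, (B, C), D) with B < C to avoid duplicates.
    """
    diamonds = []
    for a, children in successors.items():
        for b, c in combinations(sorted(children), 2):
            # D must be a direct product of both intermediates
            shared = successors.get(b, set()) & successors.get(c, set())
            for d in sorted(shared):
                diamonds.append((a, (b, c), d))
    return diamonds


def presence_fraction(intermediate, observed_by_species):
    """Fraction (0-1) of species in which an intermediate was observed."""
    hits = sum(intermediate in obs for obs in observed_by_species.values())
    return hits / len(observed_by_species)


# Toy graph with single-letter placeholders instead of glycan strings
graph = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}
observed = {"species_1": {"A", "B", "D"}, "species_2": {"A", "B", "C", "D"}}
for a, (b, c), d in find_diamonds(graph):
    print(a, d, {n: presence_fraction(n, observed) for n in (b, c)})
```

In this toy case, intermediate B is observed in both species and C in only one, which would favor the path through B.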

evoprune_network

 evoprune_network (network, network_dic=None, species_list=None,
                   node_attr='abundance', threshold=0.01,
                   nb_intermediates=2, mode='presence')

*Given a biosynthetic network, this function uses evolutionary relationships to prune impossible paths

Arguments:
network (networkx object): biosynthetic network, returned from construct_network
network_dic (dict): dictionary of form species name : biosynthetic network (gained from construct_network); default:pre-computed milk networks
species_list (list): list of species to compare network to; default:species from pre-computed milk networks
node_attr (string): which (numerical) node attribute to use for pruning; default:‘abundance’
threshold (float): everything below or equal to that threshold will be cut; default:0.01
nb_intermediates (int): number of intermediate nodes expected in a network motif to extract; has to be a multiple of 2 (2: diamond, 4: hexagon,…); default:2
mode (string): whether to analyze for “presence” or “abundance” of intermediates; default:“presence”
Returns:
Returns pruned network (with virtual node probability as a new node attribute)*
plot_network(evoprune_network(network), plot_format = 'kamada_kawai')

highlight_network

 highlight_network (network, highlight, motif=None, abundance_df=None,
                    glycan_col='glycan', intensity_col='rel_intensity',
                    conservation_df=None, network_dic=None, species=None)

*Highlights a certain attribute in the network that will be visible when using plot_network

Arguments:
network (networkx object): biosynthetic network, returned from construct_network
highlight (string): which attribute to highlight (choices are ‘motif’ for glycan motifs, ‘abundance’ for glycan abundances, ‘conservation’ for glycan conservation, ‘species’ for highlighting 1 species in multi-network)
motif (string): highlight=motif; which motif to highlight (absence/presence, in violet/green); default:None
abundance_df (dataframe): highlight=abundance; dataframe containing glycans and their relative intensity
glycan_col (string): highlight=abundance; column name of the glycans in abundance_df
intensity_col (string): highlight=abundance; column name of the relative intensities in abundance_df
conservation_df (dataframe): highlight=conservation; dataframe containing glycans from different species
network_dic (dict): highlight=conservation/species; dictionary of form species name : biosynthetic network (gained from construct_network); default:pre-computed milk networks
species (string): highlight=species; which species to highlight in a multi-species network
Returns:
Returns a network with the additional ‘origin’ (motif/species) or ‘abundance’ (abundance/conservation) node attribute storing the highlight*

export_network

 export_network (network, filepath, other_node_attributes=None)

*Converts NetworkX network into files usable, e.g., by Cytoscape or Gephi

Arguments:
network (networkx object): biosynthetic network, returned from construct_network
filepath (string): valid path plus file name prefix; a file description and file type are appended to it
other_node_attributes (list): string names of node attributes that should also be extracted; default:None
Returns:
(1) saves a .csv dataframe containing the edge list and edge labels
(2) saves a .csv dataframe containing node IDs and labels*
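
The two outputs can be pictured as an edge table and a node table. The sketch below builds both as CSV text with the standard library only; the column names (Source, Target, Label, Id) follow common Gephi conventions and are an assumption here, as are the helper names. The real function writes files to filepath instead of returning strings.

```python
import csv
import io


def edges_to_csv(edges):
    """Serialize (source, target, label) triples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Label"])  # column names are assumptions
    writer.writerows(edges)
    return buffer.getvalue()


def nodes_to_csv(node_attrs, other_node_attributes=()):
    """Serialize node IDs, labels, and any extra attributes as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Id", "Label", *other_node_attributes])
    for node, attrs in node_attrs.items():
        extras = [attrs.get(key, "") for key in other_node_attributes]
        writer.writerow([node, node, *extras])
    return buffer.getvalue()


edges = [("Gal(b1-4)Glc-ol", "Fuc(a1-2)Gal(b1-4)Glc-ol", "Fuc(a1-2)")]
nodes = {"Gal(b1-4)Glc-ol": {"virtual": 0},
         "Fuc(a1-2)Gal(b1-4)Glc-ol": {"virtual": 0}}
print(edges_to_csv(edges))
print(nodes_to_csv(nodes, other_node_attributes=["virtual"]))
```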

get_maximum_flow

 get_maximum_flow (network, source='Gal(b1-4)Glc-ol', sinks=None)

*Estimate maximum flow and flow paths between source and sinks

Arguments:
network (networkx object): biosynthetic network, returned from construct_network
source (string): usually the root node of network; default:“Gal(b1-4)Glc-ol”
sinks (list of strings): specified sinks to estimate flow for; default:all terminal nodes
Returns:
Returns a dictionary of type sink : {maximum flow value, flow path dictionary}*
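
For readers unfamiliar with maximum flow, here is a small, self-contained Edmonds-Karp sketch in pure Python. It is a textbook algorithm and not necessarily what get_maximum_flow uses internally; the node names and edge capacities are invented for illustration, and how capacities are derived for a real biosynthetic network is not covered here.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Residual capacities, including zero-capacity reverse edges
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for the shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push flow through it
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck


# Hypothetical capacities on a diamond-shaped toy network
capacity = {"root": {"x": 3, "y": 2}, "x": {"sink": 2}, "y": {"sink": 2}}
print(max_flow(capacity, "root", "sink"))
```

The flow through each branch is limited by its narrowest edge, so the two branches here contribute 2 each.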

get_max_flow_path

 get_max_flow_path (network, flow_dict, sink, source='Gal(b1-4)Glc-ol')

*Get the actual path between source and sink that gave rise to the maximum flow value

Arguments:
network (networkx object): biosynthetic network, returned from construct_network
flow_dict (dict): dictionary of type source : {sink : flow} as returned by get_maximum_flow
sink (string): specified sink to retrieve maximum flow path
source (string): usually the root node of network; default:“Gal(b1-4)Glc-ol”
Returns:
Returns a list of (source, sink) tuples describing the maximum flow path*
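
One simple way to read a path out of a flow dictionary of the form source : {sink : flow} is to start at the source and repeatedly follow the outgoing edge that carries the most flow. The sketch below does exactly that; it is an assumed strategy for illustration, and get_max_flow_path may select the path differently. The function name and the toy flow values are hypothetical.

```python
def max_flow_path(flow_dict, source, sink):
    """Follow the largest positive flow out of each node from source to sink."""
    path, current, seen = [], source, {source}
    while current != sink:
        # Only consider edges that carry flow and do not revisit a node
        candidates = {v: f for v, f in flow_dict.get(current, {}).items()
                      if f > 0 and v not in seen}
        if not candidates:
            raise ValueError(f"no positive flow leads from {current!r} to {sink!r}")
        nxt = max(candidates, key=candidates.get)
        path.append((current, nxt))
        seen.add(nxt)
        current = nxt
    return path


# Toy flow dictionary with made-up values
flow = {"root": {"x": 2, "y": 1}, "x": {"sink": 2}, "y": {"sink": 1}}
print(max_flow_path(flow, "root", "sink"))
```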

get_reaction_flow

 get_reaction_flow (network, res, aggregate=None)

*Get the aggregated flows for a type of reaction across entire network

Arguments:
network (networkx object): biosynthetic network, returned from construct_network
res (dict): dictionary of type sink : {maximum flow value, flow path dictionary} as returned by get_maximum_flow
aggregate (string): if reaction flow values should be aggregated, options are “sum” and “mean”; default:None
Returns:
Returns a dictionary of form reaction : flow(s)*
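
The aggregation step can be illustrated as grouping per-edge flow values by the reaction label of each edge. In this sketch, the inputs are two plain dictionaries keyed by (source, target) tuples, which is an assumed simplification of the network and the get_maximum_flow result; the function name and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_reaction_flow(edge_flows, edge_labels, aggregate=None):
    """Group per-edge flows by reaction label.

    edge_flows: {(source, target): flow}; edge_labels: {(source, target): label}.
    aggregate: None (keep the list of flows), 'sum', or 'mean'.
    """
    grouped = defaultdict(list)
    for edge, flow in edge_flows.items():
        grouped[edge_labels[edge]].append(flow)
    if aggregate == "sum":
        return {label: sum(flows) for label, flows in grouped.items()}
    if aggregate == "mean":
        return {label: mean(flows) for label, flows in grouped.items()}
    return dict(grouped)


# Toy values: node names are placeholders, labels mimic 'monolink' edges
edge_flows = {("A", "B"): 2, ("A", "C"): 1, ("C", "D"): 1}
edge_labels = {("A", "B"): "GlcNAc(b1-3)", ("A", "C"): "Fuc(a1-2)",
               ("C", "D"): "GlcNAc(b1-3)"}
print(aggregate_reaction_flow(edge_flows, edge_labels, aggregate="sum"))
```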

get_differential_biosynthesis

 get_differential_biosynthesis (df, group1, group2=None,
                                analysis='reaction', paired=False,
                                longitudinal=False, id_column='ID')

*Compares biosynthetic patterns between glycomes of two conditions or across multiple time points

Arguments:
df (dataframe): dataframe containing glycan sequences and relative abundances [or filepath to .csv]
group1 (list): list of column indices/names for first group of samples (or time points in longitudinal analysis)
group2 (list): list of column indices/names for second group of samples (ignored in longitudinal analysis)
analysis (string): type of analysis to perform on networks, “reaction” or “flow”; default: “reaction”
paired (bool): whether samples are paired or not; default: False
longitudinal (bool): whether to perform longitudinal analysis; default: False
id_column (str): name of the column containing sample IDs for longitudinal analysis in the ID-style of participant_time_replicate; default: “ID”
Returns:
For binary comparison: A dataframe with differential flow features and statistics
For longitudinal analysis: A dataframe with reaction changes over time*
get_differential_biosynthesis(human_skin_O_PMC5871710_BCC, [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39],
                              [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40], paired = True)
You're working with an alpha of 0.044390023979542614 that has been adjusted for your sample size of 40.
Feature       Mean abundance     Log2FC     p-val  corr p-val  significant  Effect size
Gal(b1-3)           9.367467  -0.452220  0.000503    0.003564         True    -0.935558
Neu5Ac(a2-?)        4.814383  -0.406728  0.000866    0.003564         True    -0.882400
Gal(b1-?)           7.090566  -0.448621  0.000972    0.003564         True    -0.871148
Fuc(a1-2)           2.173742  -0.665850  0.002498    0.006869         True    -0.778493
Neu5Ac(a2-6)        4.553800  -0.475070  0.003775    0.007150         True    -0.737642
Gal(b1-4)           4.813666  -0.422501  0.004550    0.007150         True    -0.719071
GlcNAc(b1-6)        4.813666  -0.422501  0.004550    0.007150         True    -0.719071
Neu5Ac(a2-3)        7.584599  -0.346187  0.005372    0.007387         True    -0.702486
Neu5Ac(a2-8)        2.304750  -0.422735  0.008033    0.009818         True    -0.662001
OS                  2.249050  -0.521844  0.019236    0.021160         True    -0.571950
6S                  2.639924  -0.273368  0.031858    0.031858         True    -0.517978

extend_network

 extend_network (network, steps=1, to_extend='all', strict_context=False)

*Given a biosynthetic network, tries to extend it in a physiological manner

Arguments:
network (networkx object): glycan biosynthetic network as returned by construct_network
steps (int): how many biosynthetic steps to extend the network
to_extend (string/dict/list): which leaves to extend (default is “all”), a glycan as a string indicates a specific leaf node to extend, a dict indicates a target composition to be reached from the best leaf
strict_context (bool): whether to infer permitted sequence contexts for extension from database (False) or only from network (True); default:False
Returns:
Returns updated network and a list of added glycans*
new_network, new_glycans = extend_network(network, strict_context = True)
len(new_glycans)
20

evolution

investigating evolutionary relationships of glycans


distance_from_embeddings

 distance_from_embeddings (df, embeddings, cut_off=10, rank='Species',
                           averaging='median')

*Calculates a cosine distance matrix from learned embeddings

Arguments:
df (dataframe): dataframe with glycans as rows and taxonomic information as columns
embeddings (dataframe): dataframe with glycans as rows and learned embeddings as columns (e.g., from glycans_to_emb)
cut_off (int): how many glycans a rank (e.g., species) needs to have at least to be included; default:10
rank (string): which taxonomic rank to use for grouping organisms; default:‘Species’
averaging (string): how to average embeddings, by ‘median’ or ‘mean’; default:‘median’
Returns:
Returns a rank x rank distance matrix*
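
The computation can be pictured as two steps: collapse all glycan embeddings of a taxonomic group into one vector (median or mean per dimension), then take pairwise cosine distances between groups. The standard-library sketch below uses tiny made-up 2-dimensional embeddings and hypothetical helper names; it omits the cut_off filter and the dataframe handling of the real function.

```python
import math
from statistics import median


def average_embedding(vectors, averaging="median"):
    """Collapse several embedding vectors into one, dimension by dimension."""
    columns = list(zip(*vectors))
    if averaging == "median":
        return [median(col) for col in columns]
    return [sum(col) / len(col) for col in columns]


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


# Made-up embeddings: one list of glycan vectors per group
embeddings = {"species_a": [[1.0, 0.0], [1.0, 0.0], [3.0, 0.0]],
              "species_b": [[0.0, 2.0], [0.0, 4.0]]}
centroids = {name: average_embedding(vecs) for name, vecs in embeddings.items()}
matrix = {a: {b: cosine_distance(centroids[a], centroids[b]) for b in centroids}
          for a in centroids}
print(matrix)
```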

distance_from_metric

 distance_from_metric (df, networks, metric='Jaccard', cut_off=10,
                       rank='Species')

*Calculates a distance matrix of generated networks based on provided metric

Arguments:
df (dataframe): dataframe with glycans as rows and taxonomic information as columns
networks (list): list of networks in networkx format
metric (string): which metric to use, available: ‘Jaccard’; default:‘Jaccard’
cut_off (int): how many glycans a rank (e.g., species) needs to have at least to be included; default:10
rank (string): which taxonomic rank to use for grouping organisms; default:‘Species’
Returns:
Returns a rank x rank distance matrix*
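
For reference, the Jaccard distance between two sets is one minus the size of their intersection divided by the size of their union. The sketch below applies it to the node sets of two networks; whether the library compares nodes, edges, or both is not stated here, so comparing node sets is an assumption, as is the helper name.

```python
def jaccard_distance(nodes_a, nodes_b):
    """Jaccard distance between two collections, treated as sets."""
    a, b = set(nodes_a), set(nodes_b)
    union = a | b
    if not union:
        return 0.0  # two empty networks are treated as identical
    return 1.0 - len(a & b) / len(union)


# Toy node sets built from glycans in the construct_network example
network_1 = {"Gal(b1-4)Glc-ol", "GlcNAc(b1-3)Gal(b1-4)Glc-ol",
             "Fuc(a1-2)Gal(b1-4)Glc-ol"}
network_2 = {"Gal(b1-4)Glc-ol", "GlcNAc(b1-3)Gal(b1-4)Glc-ol",
             "GlcNAc6S(b1-3)Gal(b1-4)Glc-ol"}
print(jaccard_distance(network_1, network_2))
```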

dendrogram_from_distance

 dendrogram_from_distance (dm, ylabel='Mammalia', filepath='')

*Plots a dendrogram from distance matrix

Arguments:
dm (dataframe): a rank x rank distance matrix (e.g., from distance_from_embeddings)
ylabel (string): how to label the y-axis of the dendrogram; default:‘Mammalia’
filepath (string): absolute path including full filename; if provided, the plot is saved there; default:empty string*

check_conservation

 check_conservation (glycan, df, network_dic=None, rank='Order',
                     threshold=5, motif=False)

*Estimates evolutionary conservation of glycans and glycan motifs via biosynthetic networks

Arguments:
glycan (string): full glycan or glycan motif in IUPAC-condensed nomenclature
df (dataframe): dataframe in the style of df_species, each row one glycan and columns are the taxonomic levels
network_dic (dict): dictionary of form species name : biosynthetic network (gained from construct_network); default:pre-computed milk networks
rank (string): at which taxonomic level to assess conservation; default:Order
threshold (int): threshold of how many glycans a species needs to have to consider the species; default:5
motif (bool): whether glycan is a motif (True) or a full sequence (False); default:False
Returns:
Returns a dictionary of taxonomic group : degree of conservation*
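
One plausible reading of "degree of conservation" is the fraction of species within a taxonomic group whose network contains the glycan, counting only species with enough glycans. The sketch below implements that reading with plain dictionaries; the exact definition used by check_conservation, including whether the threshold is inclusive, is an assumption here, and all names and data are hypothetical.

```python
from collections import defaultdict


def conservation_by_rank(glycan, species_to_rank, nodes_by_species, threshold=5):
    """Fraction of species per taxonomic group whose network contains glycan.

    Species with fewer than `threshold` glycans are skipped (assumed reading
    of the threshold argument).
    """
    present = defaultdict(int)
    total = defaultdict(int)
    for species, nodes in nodes_by_species.items():
        if len(nodes) < threshold:
            continue
        group = species_to_rank[species]
        total[group] += 1
        present[group] += glycan in nodes  # True counts as 1
    return {group: present[group] / total[group] for group in total}


# Toy data with placeholder glycan names g1..g3
species_to_rank = {"Homo_sapiens": "Primates", "Pan_troglodytes": "Primates",
                   "Bos_taurus": "Artiodactyla"}
nodes_by_species = {"Homo_sapiens": {"g1", "g2", "g3"},
                    "Pan_troglodytes": {"g2", "g3"},
                    "Bos_taurus": {"g1"}}
print(conservation_by_rank("g1", species_to_rank, nodes_by_species, threshold=2))
```

With threshold=2, the species holding a single glycan is ignored, so only the remaining group appears in the result.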

get_communities

 get_communities (network_list, label_list=None)

*Find communities for each graph in a list of graphs

Arguments:
network_list (list): list of undirected biosynthetic networks, in the form of networkx objects
label_list (list): labels to create the community names, which are running_number + _ + label_list[k] for network_list[k]; default:range(len(network_list))
Returns:
Returns a merged dictionary of community : glycans in that community*
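
The naming scheme for the merged dictionary can be sketched independently of the community detection itself. The example below assumes the communities of each graph have already been computed (by whatever algorithm) and only shows how running_number + _ + label keys could be built; starting the running number at 0 and the helper name merge_communities are assumptions.

```python
def merge_communities(partitions, label_list=None):
    """Merge per-graph communities into one dict with unique names.

    partitions[k] is a list of node collections for graph k.
    Names follow running_number + '_' + label_list[k].
    """
    if label_list is None:
        label_list = range(len(partitions))
    merged = {}
    for label, communities in zip(label_list, partitions):
        for number, members in enumerate(communities):
            merged[f"{number}_{label}"] = sorted(members)
    return merged


# Precomputed toy communities for two graphs; node names are placeholders
partitions = [[{"a", "b"}, {"c"}], [{"d"}]]
print(merge_communities(partitions, label_list=["human", "cow"]))
```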