network
contains functions to arrange and analyze glycans in the context of networks. In such a network, each node represents a glycan and each edge represents a relationship between two glycans, for instance a biosynthetic step. Because glycowork treats glycans as molecular graphs, these networks are hierarchical graphs: the network is one graph, and each node within it is itself a graph.
network
contains the following modules:
biosynthesis
contains functions to construct and analyze biosynthetic glycan networks
evolution
contains functions to compare (taxonomic) groups as to their glycan repertoires
biosynthesis
constructing and analyzing biosynthetic glycan networks
construct_network
construct_network (glycans, allowed_ptms=frozenset({'3S', '3P', 'OS',
'1P', 'OAc', '6S', 'OP', '6P', '9Ac', '4Ac'}),
edge_type='monolink', permitted_roots=None,
abundances=[])
*Construct a glycan biosynthetic network
glycans (list): list of glycans in IUPAC-condensed format
allowed_ptms (set): set of PTMs to consider
edge_type (string): indicates whether edges represent monosaccharides (‘monosaccharide’), monosaccharide(linkage) (‘monolink’), or enzyme catalyzing the reaction (‘enzyme’); default:‘monolink’
permitted_roots (set): which nodes should be considered as roots; default:will be inferred
abundances (list): optional list of abundances, in the same order as glycans; default:empty list
Returns a networkx object of the network*
from glycowork.network.biosynthesis import construct_network

glycans = ["Gal(b1-4)Glc-ol", "GlcNAc(b1-3)Gal(b1-4)Glc-ol",
           "GlcNAc6S(b1-3)Gal(b1-4)Glc-ol",
           "Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol", "Fuc(a1-2)Gal(b1-4)Glc-ol",
           "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)[Gal(b1-3)GlcNAc(b1-6)]Gal(b1-4)Glc-ol"]
network = construct_network(glycans)
network.nodes()
NodeView(('Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)[Gal(b1-3)GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol', 'GlcNAc6S(b1-3)Gal(b1-4)Glc-ol', 'GlcNAc(b1-3)Gal(b1-4)Glc-ol', 'Fuc(a1-2)Gal(b1-4)Glc-ol', 'Gal(b1-4)Glc-ol', 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)[GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'Gal(b1-3)GlcNAc(b1-6)[Gal(b1-4)GlcNAc(b1-3)]Gal(b1-4)Glc-ol', 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol', 'Gal(b1-4)GlcNAc(b1-3)[GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'Gal(b1-3)GlcNAc(b1-6)[GlcNAc(b1-3)]Gal(b1-4)Glc-ol', 'GlcNAc(b1-6)[GlcNAc(b1-3)]Gal(b1-4)Glc-ol'))
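To make the idea of a biosynthetic edge concrete, here is a minimal plain-Python sketch. It is not how construct_network works internally: it only links two glycans when the product is the precursor plus one monosaccharide(linkage) unit at the non-reducing (left) end, it ignores branches and PTMs, and it does not create virtual nodes. The helper name linear_edges and the regular expression are assumptions made for this illustration.

```python
import re

# One leading "monosaccharide(linkage)" unit, e.g. "GlcNAc(b1-3)"
MONOLINK = re.compile(r"^([A-Za-z0-9]+\([ab]\d-\d\))")

def linear_edges(glycans):
    """Return {(precursor, product): label} for glycans that differ by
    exactly one unit added at the non-reducing (left) end.
    Hypothetical helper; branches and PTMs are deliberately ignored."""
    known = set(glycans)
    edges = {}
    for product in glycans:
        match = MONOLINK.match(product)
        if match is None:
            continue
        label = match.group(1)
        precursor = product[len(label):]
        if precursor in known:
            edges[(precursor, product)] = label
    return edges

glycans = [
    "Gal(b1-4)Glc-ol",
    "GlcNAc(b1-3)Gal(b1-4)Glc-ol",
    "Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol",
    "Fuc(a1-2)Gal(b1-4)Glc-ol",
]
edges = linear_edges(glycans)
for (precursor, product), label in edges.items():
    print(f"{precursor} --{label}--> {product}")
```

The edge label here corresponds to what the edge_type='monolink' option describes: the added monosaccharide together with its linkage.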
plot_network
plot_network (network, plot_format='pydot2', edge_label_draw=True,
lfc_dict=None)
*Visualizes biosynthetic network
network (networkx object): biosynthetic network, returned from construct_network
plot_format (string): how to layout network, either ‘pydot2’, ‘kamada_kawai’, or ‘spring’; default:‘pydot2’
edge_label_draw (bool): draws edge labels if True; default:True
lfc_dict (dict): dictionary of enzyme:log2-fold-change to scale edge width; default:None*
infer_network
infer_network (network, network_species, species_list, network_dic)
*Replaces virtual nodes if they are observed in other species
network (networkx object): biosynthetic network that should be inferred
network_species (string): species from which the network stems
species_list (list): list of species to compare network to
network_dic (dict): dictionary of form species name : biosynthetic network (gained from construct_network)
Returns network with filled in virtual nodes*
retrieve_inferred_nodes
retrieve_inferred_nodes (network, species=None)
*Returns the inferred virtual nodes of a network that has been used with infer_network
network (networkx object): biosynthetic network with inferred virtual nodes
species (string): species from which the network stems (only relevant if multiple species in network); default:None
Returns inferred nodes as list or dictionary (if species argument is used)*
update_network
update_network (network_in, edge_list, edge_labels=None,
node_labels=None)
*Updates a network with new edges and their labels
network_in (networkx object): network that should be modified
edge_list (list): list of edges as node tuples
edge_labels (list): list of edge labels as strings
node_labels (dict): dictionary of form node:0 or 1 depending on whether the node is observed or virtual
Returns network with added edges*
trace_diamonds
trace_diamonds (network, species_list, network_dic, threshold=0.0,
nb_intermediates=2, mode='presence')
*Extracts diamond-shape motifs from biosynthetic networks (A->B,A->C,B->D,C->D) and uses evolutionary information to determine which path is taken from A to D
network (networkx object): biosynthetic network, returned from construct_network
species_list (list): list of species to compare network to
network_dic (dict): dictionary of form species name : biosynthetic network (gained from construct_network)
threshold (float): everything below or equal to that threshold will be cut; default:0.
nb_intermediates (int): number of intermediate nodes expected in a network motif to extract; has to be a multiple of 2 (2: diamond, 4: hexagon,…); default:2
mode (string): whether to analyze for “presence” or “abundance” of intermediates; default:“presence”
Returns dataframe of each intermediary glycan and its proportion (0-1) of how often it has been experimentally observed in this path (or average abundance if mode = abundance)*
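The diamond motif (A->B, A->C, B->D, C->D) and the "presence" proportion can be sketched in plain Python. This is an illustrative approximation under stated assumptions, not the trace_diamonds implementation: the network is a toy adjacency dictionary, and find_diamonds, presence_proportion, and the species names are hypothetical.

```python
from itertools import combinations

def find_diamonds(successors):
    """Return (A, (B, C), D) tuples where A->B, A->C, B->D and C->D."""
    diamonds = []
    for a, children in successors.items():
        for b, c in combinations(sorted(children), 2):
            shared = set(successors.get(b, ())) & set(successors.get(c, ()))
            for d in sorted(shared):
                diamonds.append((a, (b, c), d))
    return diamonds

def presence_proportion(intermediates, observed_by_species):
    """Fraction of species (0-1) in which each intermediate was observed."""
    n_species = len(observed_by_species)
    return {
        node: sum(node in observed for observed in observed_by_species.values()) / n_species
        for node in intermediates
    }

# Toy network: two alternative routes from A to D
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
# Hypothetical observations: which nodes were measured in which species
observed_by_species = {
    "species_1": {"A", "B", "D"},
    "species_2": {"A", "B", "D"},
    "species_3": {"A", "C", "D"},
    "species_4": {"A", "D"},
}

diamonds = find_diamonds(successors)
_, intermediates, _ = diamonds[0]
proportions = presence_proportion(intermediates, observed_by_species)
print(diamonds, proportions)
```

In this toy case, B is observed more often than C, which is the kind of evidence used to decide which path from A to D is more likely taken.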
evoprune_network
evoprune_network (network, network_dic=None, species_list=None,
node_attr='abundance', threshold=0.01,
nb_intermediates=2, mode='presence')
*Given a biosynthetic network, this function uses evolutionary relationships to prune impossible paths
network (networkx object): biosynthetic network, returned from construct_network
network_dic (dict): dictionary of form species name : biosynthetic network (gained from construct_network); default:pre-computed milk networks
species_list (list): list of species to compare network to; default:species from pre-computed milk networks
node_attr (string): which (numerical) node attribute to use for pruning; default:‘abundance’
threshold (float): everything below or equal to that threshold will be cut; default:0.01
nb_intermediates (int): number of intermediate nodes expected in a network motif to extract; has to be a multiple of 2 (2: diamond, 4: hexagon,…); default:2
mode (string): whether to analyze for “presence” or “abundance” of intermediates; default:“presence”
Returns pruned network (with virtual node probability as a new node attribute)*
plot_network(evoprune_network(network), plot_format='kamada_kawai')
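The threshold rule described above ("everything below or equal to that threshold will be cut") can be illustrated with a small sketch. It assumes that only virtual nodes are candidates for pruning and that each node already carries a numeric score; prune_virtual_nodes and the scores are hypothetical and do not reproduce how evoprune_network derives probabilities.

```python
def prune_virtual_nodes(node_scores, virtual, threshold=0.01):
    """Keep observed nodes, and virtual nodes scoring above the threshold.
    Hypothetical helper: scores at or below the threshold are cut."""
    return {
        node: score
        for node, score in node_scores.items()
        if node not in virtual or score > threshold
    }

# Made-up scores for illustration
node_scores = {"observed_1": 0.0, "virtual_1": 0.01, "virtual_2": 0.5}
virtual = {"virtual_1", "virtual_2"}

kept = prune_virtual_nodes(node_scores, virtual)
print(sorted(kept))
```

Note that a score exactly equal to the threshold is removed, matching the "below or equal" wording.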
highlight_network
highlight_network (network, highlight, motif=None, abundance_df=None,
glycan_col='glycan', intensity_col='rel_intensity',
conservation_df=None, network_dic=None, species=None)
*Highlights a certain attribute in the network that will be visible when using plot_network
network (networkx object): biosynthetic network, returned from construct_network
highlight (string): which attribute to highlight (choices are ‘motif’ for glycan motifs, ‘abundance’ for glycan abundances, ‘conservation’ for glycan conservation, ‘species’ for highlighting 1 species in multi-network)
motif (string): highlight=motif; which motif to highlight (absence/presence, in violet/green); default:None
abundance_df (dataframe): highlight=abundance; dataframe containing glycans and their relative intensity
glycan_col (string): highlight=abundance; column name of the glycans in abundance_df
intensity_col (string): highlight=abundance; column name of the relative intensities in abundance_df
conservation_df (dataframe): highlight=conservation; dataframe containing glycans from different species
network_dic (dict): highlight=conservation/species; dictionary of form species name : biosynthetic network (gained from construct_network); default:pre-computed milk networks
species (string): highlight=species; which species to highlight in a multi-species network
Returns a network with the additional ‘origin’ (motif/species) or ‘abundance’ (abundance/conservation) node attribute storing the highlight*
export_network
export_network (network, filepath, other_node_attributes=None)
*Converts NetworkX network into files usable, e.g., by Cytoscape or Gephi
network (networkx object): biosynthetic network, returned from construct_network
filepath (string): should describe a valid path + file name prefix, will be appended by file description and type
other_node_attributes (list): string names of node attributes that should also be extracted; default:None
(1) saves a .csv dataframe containing the edge list and edge labels
(2) saves a .csv dataframe containing node IDs and labels*
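As a rough picture of the two tables, the sketch below serializes an edge list and a node list to CSV text using only the standard library. The column names (Source, Target, Label, Id) and the helper to_csv_text are assumptions for illustration; the actual files written by export_network may be laid out differently.

```python
import csv
import io

def to_csv_text(header, rows):
    """Serialize a header and rows into CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Toy network with one labeled edge
edges = {("Gal(b1-4)Glc-ol", "Fuc(a1-2)Gal(b1-4)Glc-ol"): "Fuc(a1-2)"}
nodes = sorted({node for edge in edges for node in edge})

# (1) edge list with edge labels, (2) node IDs with node labels
edge_csv = to_csv_text(["Source", "Target", "Label"],
                       [(s, t, label) for (s, t), label in edges.items()])
node_csv = to_csv_text(["Id", "Label"],
                       [(i, node) for i, node in enumerate(nodes)])
print(edge_csv)
print(node_csv)
```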
get_maximum_flow
get_maximum_flow (network, source='Gal(b1-4)Glc-ol', sinks=None)
*Estimate maximum flow and flow paths between source and sinks
network (networkx object): biosynthetic network, returned from construct_network
source (string): usually the root node of network; default:“Gal(b1-4)Glc-ol”
sinks (list of strings): specified sinks to estimate flow for; default:all terminal nodes
Returns a dictionary of type sink : {maximum flow value, flow path dictionary}*
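For readers unfamiliar with maximum flow, the following is a self-contained Edmonds-Karp sketch on a dictionary of capacities. It is a generic textbook algorithm shown for orientation, not the code behind get_maximum_flow; the node names and capacities are invented, and how glycowork assigns capacities to edges is not covered here.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map."""
    # Residual capacities, including zero-capacity reverse edges
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for the shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Walk back from the sink and collect the path edges
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        # Push the bottleneck capacity along the path
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck

# Invented capacities: two routes from the root to one terminal node
capacity = {
    "root": {"x": 3, "y": 2},
    "x": {"leaf": 2},
    "y": {"leaf": 2},
}
flow_value = max_flow(capacity, "root", "leaf")
print(flow_value)
```

The route through x is limited by the x-to-leaf edge and the route through y by both of its edges, so the total flow is bounded by the capacity arriving at the sink.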
get_max_flow_path
get_max_flow_path (network, flow_dict, sink, source='Gal(b1-4)Glc-ol')
*Get the actual path between source and sink that gave rise to the maximum flow value
network (networkx object): biosynthetic network, returned from construct_network
flow_dict (dict): dictionary of type source : {sink : flow} as returned by get_maximum_flow
sink (string): specified sink to retrieve maximum flow path
source (string): usually the root node of network; default:“Gal(b1-4)Glc-ol”
Returns a list of (source, sink) tuples describing the maximum flow path*
get_reaction_flow
get_reaction_flow (network, res, aggregate=None)
*Get the aggregated flows for a type of reaction across entire network
network (networkx object): biosynthetic network, returned from construct_network
res (dict): dictionary of type sink : {maximum flow value, flow path dictionary} as returned by get_maximum_flow
aggregate (string): if reaction flow values should be aggregated, options are “sum” and “mean”; default:None
Returns a dictionary of form reaction : flow(s)*
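The aggregation options can be pictured with a short sketch that groups per-edge flow values by reaction label and then applies "sum" or "mean". The input layout (one dictionary of edge flows, one of edge labels) and the helper name aggregate_reaction_flow are assumptions chosen for clarity; they do not mirror the structure returned by get_maximum_flow.

```python
from collections import defaultdict
from statistics import mean

def aggregate_reaction_flow(edge_flows, edge_labels, aggregate=None):
    """Group flow values by reaction label; optionally sum or average them."""
    flows = defaultdict(list)
    for edge, flow in edge_flows.items():
        flows[edge_labels[edge]].append(flow)
    if aggregate == "sum":
        return {reaction: sum(values) for reaction, values in flows.items()}
    if aggregate == "mean":
        return {reaction: mean(values) for reaction, values in flows.items()}
    return dict(flows)

# Invented flows on three edges, two of which share a reaction type
edge_flows = {("a", "b"): 2, ("a", "c"): 4, ("b", "d"): 1}
edge_labels = {("a", "b"): "Gal(b1-4)", ("a", "c"): "Gal(b1-4)", ("b", "d"): "Fuc(a1-2)"}

raw = aggregate_reaction_flow(edge_flows, edge_labels)
summed = aggregate_reaction_flow(edge_flows, edge_labels, aggregate="sum")
averaged = aggregate_reaction_flow(edge_flows, edge_labels, aggregate="mean")
print(raw, summed, averaged)
```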
get_differential_biosynthesis
get_differential_biosynthesis (df, group1, group2=None,
analysis='reaction', paired=False,
longitudinal=False, id_column='ID')
*Compares biosynthetic patterns between glycomes of two conditions or across multiple time points
df (dataframe): dataframe containing glycan sequences and relative abundances [or filepath to .csv]
group1 (list): list of column indices/names for first group of samples (or time points in longitudinal analysis)
group2 (list): list of column indices/names for second group of samples (ignored in longitudinal analysis)
analysis (string): type of analysis to perform on networks, “reaction” or “flow”; default: “reaction”
paired (bool): whether samples are paired or not; default: False
longitudinal (bool): whether to perform longitudinal analysis; default: False
id_column (str): name of the column containing sample IDs for longitudinal analysis in the ID-style of participant_time_replicate; default: “ID”
Returns: for binary comparison, a dataframe with differential flow features and statistics;
for longitudinal analysis, a dataframe with reaction changes over time*
get_differential_biosynthesis(human_skin_O_PMC5871710_BCC, [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39],
                              [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40], paired=True)
You're working with an alpha of 0.044390023979542614 that has been adjusted for your sample size of 40.
Feature
Gal(b1-3)      9.367467  -0.452220  0.000503  0.003564  True  -0.935558
Neu5Ac(a2-?)   4.814383  -0.406728  0.000866  0.003564  True  -0.882400
Gal(b1-?)      7.090566  -0.448621  0.000972  0.003564  True  -0.871148
Fuc(a1-2)      2.173742  -0.665850  0.002498  0.006869  True  -0.778493
Neu5Ac(a2-6)   4.553800  -0.475070  0.003775  0.007150  True  -0.737642
Gal(b1-4)      4.813666  -0.422501  0.004550  0.007150  True  -0.719071
GlcNAc(b1-6)   4.813666  -0.422501  0.004550  0.007150  True  -0.719071
Neu5Ac(a2-3)   7.584599  -0.346187  0.005372  0.007387  True  -0.702486
Neu5Ac(a2-8)   2.304750  -0.422735  0.008033  0.009818  True  -0.662001
OS             2.249050  -0.521844  0.019236  0.021160  True  -0.571950
6S             2.639924  -0.273368  0.031858  0.031858  True  -0.517978
extend_network
extend_network (network, steps=1, to_extend='all', strict_context=False)
*Given a biosynthetic network, tries to extend it in a physiological manner
network (networkx object): glycan biosynthetic network as returned by construct_network
steps (int): how many biosynthetic steps to extend the network
to_extend (string/dict/list): which leaves to extend (default is “all”), a glycan as a string indicates a specific leaf node to extend, a dict indicates a target composition to be reached from the best leaf
strict_context (bool): whether to infer permitted sequence contexts for extension from database (False) or only from network (True); default:False
Returns updated network and a list of added glycans*
new_network, new_glycans = extend_network(network, strict_context=True)
len(new_glycans)
evolution
investigating evolutionary relationships of glycans
distance_from_embeddings
distance_from_embeddings (df, embeddings, cut_off=10, rank='Species',
averaging='median')
*calculates a cosine distance matrix from learned embeddings
df (dataframe): dataframe with glycans as rows and taxonomic information as columns
embeddings (dataframe): dataframe with glycans as rows and learned embeddings as columns (e.g., from glycans_to_emb)
cut_off (int): how many glycans a rank (e.g., species) needs to have at least to be included; default:10
rank (string): which taxonomic rank to use for grouping organisms; default:‘Species’
averaging (string): how to average embeddings, by ‘median’ or ‘mean’; default:‘median’
Returns a rank x rank distance matrix*
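The two steps described above (average the embeddings per taxonomic group, then compute pairwise cosine distances) can be sketched with the standard library alone. This is an illustration under assumptions, not the distance_from_embeddings code: embeddings are plain lists, the groups and values are invented, and group_profile and cosine_distance are hypothetical helpers.

```python
import math
from statistics import median

def group_profile(vectors):
    """Column-wise median of a group's embedding vectors."""
    return [median(column) for column in zip(*vectors)]

def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Invented 2-dimensional embeddings, grouped by (hypothetical) species
embeddings = {
    "species_a": [[1, 0], [1, 0], [3, 0]],
    "species_b": [[0, 2], [0, 4]],
}
profiles = {name: group_profile(vecs) for name, vecs in embeddings.items()}

# Rank x rank distance matrix as a nested dictionary
matrix = {
    row: {col: cosine_distance(profiles[row], profiles[col]) for col in profiles}
    for row in profiles
}
print(matrix)
```

Using the median rather than the mean makes each group profile less sensitive to a single outlying glycan embedding, such as the third vector of species_a here.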
distance_from_metric
distance_from_metric (df, networks, metric='Jaccard', cut_off=10,
rank='Species')
*calculates a distance matrix of generated networks based on provided metric
df (dataframe): dataframe with glycans as rows and taxonomic information as columns
networks (list): list of networks in networkx format
metric (string): which metric to use, available: ‘Jaccard’; default:‘Jaccard’
cut_off (int): how many glycans a rank (e.g., species) needs to have at least to be included; default:10
rank (string): which taxonomic rank to use for grouping organisms; default:‘Species’
Returns a rank x rank distance matrix*
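The Jaccard distance itself is simple to state: one minus the size of the intersection divided by the size of the union. The sketch below applies it to the node sets of toy networks; whether distance_from_metric compares nodes, edges, or something else is not specified here, so treating each network as a set of glycans is an assumption, as are the names used.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; two empty sets have distance 0."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Invented node sets standing in for the networks of three groups
node_sets = {
    "species_a": {"A", "B", "C"},
    "species_b": {"B", "C", "D"},
    "species_c": {"A", "B", "C"},
}
matrix = {
    row: {col: jaccard_distance(node_sets[row], node_sets[col]) for col in node_sets}
    for row in node_sets
}
print(matrix["species_a"])
```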
dendrogram_from_distance
dendrogram_from_distance (dm, ylabel='Mammalia', filepath='')
*plots a dendrogram from distance matrix
dm (dataframe): a rank x rank distance matrix (e.g., from distance_from_embeddings)
ylabel (string): how to label the y-axis of the dendrogram; default:‘Mammalia’
filepath (string): absolute path including full filename, used to save the plot; default:''*
check_conservation
check_conservation (glycan, df, network_dic=None, rank='Order',
threshold=5, motif=False)
*estimates evolutionary conservation of glycans and glycan motifs via biosynthetic networks
glycan (string): full glycan or glycan motif in IUPAC-condensed nomenclature
df (dataframe): dataframe in the style of df_species, each row one glycan and columns are the taxonomic levels
network_dic (dict): dictionary of form species name : biosynthetic network (gained from construct_network); default:pre-computed milk networks
rank (string): at which taxonomic level to assess conservation; default:Order
threshold (int): threshold of how many glycans a species needs to have to consider the species; default:5
motif (bool): whether glycan is a motif (True) or a full sequence (False); default:False
Returns a dictionary of taxonomic group : degree of conservation*
get_communities
get_communities (network_list, label_list=None)
*Find communities for each graph in a list of graphs
network_list (list): list of undirected biosynthetic networks, in the form of networkx objects
label_list (list): labels to create the community names, which are running_number + _ + label[k] for network_list[k]; default:range(len(network_list))
Returns a merged dictionary of community : glycans in that community*
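The naming scheme for the merged dictionary (running_number + _ + label) can be shown without any community detection at all. In the sketch below the communities are given as input; merge_communities and the labels are hypothetical, and the detection algorithm used by get_communities is outside the scope of this example.

```python
def merge_communities(communities_per_graph, label_list=None):
    """Merge per-graph communities into one dict keyed as '<number>_<label>'."""
    if label_list is None:
        label_list = range(len(communities_per_graph))
    merged = {}
    for label, communities in zip(label_list, communities_per_graph):
        for number, members in enumerate(communities):
            merged[f"{number}_{label}"] = sorted(members)
    return merged

# Invented communities for two graphs
communities_per_graph = [
    [{"glycan_a", "glycan_b"}, {"glycan_c"}],
    [{"glycan_d"}],
]
named = merge_communities(communities_per_graph, label_list=["human", "cow"])
default_named = merge_communities(communities_per_graph)
print(named)
```

Without labels, the position of each graph in the list is used instead, which mirrors the documented default.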