network

network contains functions to arrange and analyze glycans in the context of networks. In such a network, each node represents a glycan and edges represent, for instance, their connection via a biosynthetic step. It should be noted, since glycowork treats glycans as molecular graphs, that these networks represent hierarchical graphs, with the network being one graph and each node within the network also a graph. network contains the following modules:

biosynthesis contains functions to construct and analyze biosynthetic glycan networks
evolution contains functions to compare (taxonomic) groups as to their glycan repertoires

biosynthesis

constructing and analyzing biosynthetic glycan networks

construct_network

 construct_network (glycans:List[str],
                    allowed_ptms:FrozenSet[str]=frozenset({'3P', '1P',
                    '4Ac', 'OAc', '9Ac', '3S', '6S', '6P', 'OS', 'OP'}),
                    edge_type:str='monolink',
                    permitted_roots:Optional[FrozenSet[str]]=None,
                    abundances:List[float]=[])

Construct glycan biosynthetic network

	Type	Default	Details
glycans	List		List of glycans
allowed_ptms	FrozenSet	frozenset({‘3P’, ‘1P’, ‘4Ac’, ‘OAc’, ‘9Ac’, ‘3S’, ‘6S’, ‘6P’, ‘OS’, ‘OP’})	Set of allowed PTMs
edge_type	str	monolink	Edge label type: monolink/monosaccharide/enzyme
permitted_roots	Optional	None	Allowed root nodes
abundances	List	[]	Glycan abundances in the same order as glycans; default:empty
Returns	DiGraph		Biosynthetic network

glycans = ["Gal(b1-4)Glc-ol", "GlcNAc(b1-3)Gal(b1-4)Glc-ol",
           "GlcNAc6S(b1-3)Gal(b1-4)Glc-ol",
           "Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol", "Fuc(a1-2)Gal(b1-4)Glc-ol",
          "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)[Gal(b1-3)GlcNAc(b1-6)]Gal(b1-4)Glc-ol"]
network = construct_network(glycans)
network.nodes()

NodeView(('Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)[Gal(b1-3)GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol', 'GlcNAc6S(b1-3)Gal(b1-4)Glc-ol', 'GlcNAc(b1-3)Gal(b1-4)Glc-ol', 'Fuc(a1-2)Gal(b1-4)Glc-ol', 'Gal(b1-4)Glc-ol', 'Gal(b1-4)GlcNAc(b1-3)[Gal(b1-3)GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)[GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'Gal(b1-3)GlcNAc(b1-6)[GlcNAc(b1-3)]Gal(b1-4)Glc-ol', 'Gal(b1-4)GlcNAc(b1-3)[GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc-ol', 'Gal(b1-3)GlcNAc(b1-6)Gal(b1-4)Glc-ol', 'GlcNAc(b1-3)[GlcNAc(b1-6)]Gal(b1-4)Glc-ol', 'GlcNAc(b1-6)Gal(b1-4)Glc-ol'))

plot_network

 plot_network (network:networkx.classes.digraph.DiGraph,
               plot_format:str='spring', edge_label_draw:bool=True,
               lfc_dict:Optional[Dict[str,float]]=None)

Visualize biosynthetic network

	Type	Default	Details
network	DiGraph		Biosynthetic network
plot_format	str	spring	Layout type: pydot2/kamada_kawai/spring
edge_label_draw	bool	True	Whether to draw edge labels
lfc_dict	Optional	None	Enzyme:log2FC mapping for edge width
Returns	None		Displays plot

plot_network(network)

Loading BokehJS ...

figure(

id = 'p1004', …)

infer_network

 infer_network (network:networkx.classes.digraph.DiGraph,
                network_species:str, species_list:List[str],
                network_dic:Dict[str,networkx.classes.digraph.DiGraph])

Replace virtual nodes observed in other species

	Type	Details
network	DiGraph	Network to infer
network_species	str	Source species
species_list	List	Species to compare against
network_dic	Dict	Species:network mapping
Returns	Graph	Network with inferred nodes

retrieve_inferred_nodes

 retrieve_inferred_nodes (network:networkx.classes.digraph.DiGraph,
                          species:Optional[str]=None)

Get inferred virtual nodes from network

	Type	Default	Details
network	DiGraph		Network with inferred nodes
species	Optional	None	Source species if multiple
Returns	Union		Inferred nodes list or dict

update_network

 update_network (network_in:networkx.classes.digraph.DiGraph,
                 edge_list:List[Tuple[str,str]],
                 edge_labels:Optional[List[str]]=None,
                 node_labels:Optional[Dict[str,int]]=None)

Update network with new edges and labels

	Type	Default	Details
network_in	DiGraph		Input network
edge_list	List		List of edges to add
edge_labels	Optional	None	Labels for new edges
node_labels	Optional	None	Node virtual status (0: observed, 1: virtual)
Returns	DiGraph		Updated network

trace_diamonds

 trace_diamonds (network:networkx.classes.digraph.DiGraph,
                 species_list:List[str],
                 network_dic:Dict[str,networkx.classes.digraph.DiGraph],
                 threshold:float=0.0, nb_intermediates:int=2,
                 mode:str='presence')

Analyze diamond motif (A->B,A->C,B->D,C->D) path preferences using evolutionary data

	Type	Default	Details
network	DiGraph		Biosynthetic network
species_list	List		Species to compare against
network_dic	Dict		Species:network mapping
threshold	float	0.0	Cutoff threshold
nb_intermediates	int	2	Number of intermediate nodes; has to be a multiple of 2
mode	str	presence	Analysis mode: presence/abundance
Returns	DataFrame		Path analysis results, with proportion (0-1) of how often glycan has been experimentally observed in this path (or average abundance)

evoprune_network

 evoprune_network (network:networkx.classes.digraph.DiGraph,
                   network_dic:Optional[Dict[str,networkx.classes.digraph.
                   DiGraph]]=None, species_list:Optional[List[str]]=None,
                   node_attr:str='abundance', threshold:float=0.01,
                   nb_intermediates:int=2, mode:str='presence')

Prune network using evolutionary path preferences

	Type	Default	Details
network	DiGraph		Biosynthetic network
network_dic	Optional	None	Species:network mapping
species_list	Optional	None	Species to compare against
node_attr	str	abundance	Node attribute to use for pruning
threshold	float	0.01	Cutoff threshold
nb_intermediates	int	2	Number of intermediate nodes; has to be a multiple of 2
mode	str	presence	Analysis mode: presence/abundance
Returns	DiGraph		Evolutionarily pruned network (with virtual node probability as a new node attribute)

plot_network(evoprune_network(network))

Loading BokehJS ...

figure(

id = 'p1462', …)

highlight_network

 highlight_network (network:networkx.classes.digraph.DiGraph,
                    highlight:str, motif:Optional[str]=None, abundance_df:
                    Optional[pandas.core.frame.DataFrame]=None,
                    glycan_col:str='glycan',
                    intensity_col:str='rel_intensity', conservation_df:Opt
                    ional[pandas.core.frame.DataFrame]=None, network_dic:O
                    ptional[Dict[str,networkx.classes.digraph.DiGraph]]=No
                    ne, species:Optional[str]=None)

Add visual highlighting to network nodes, to be used in plot_network

	Type	Default	Details
network	DiGraph		Biosynthetic network
highlight	str		What to highlight: motif/species/abundance/conservation
motif	Optional	None	Motif to highlight; highlight=motif
abundance_df	Optional	None	Glycan abundance data; highlight=abundance
glycan_col	str	glycan	Glycan column name; highlight=abundance
intensity_col	str	rel_intensity	Intensity column name; highlight=abundance
conservation_df	Optional	None	Species-glycan data; highlight=conservation
network_dic	Optional	None	Species:network mapping; highlight=conservation/species
species	Optional	None	Species to highlight; highlight=species
Returns	DiGraph		Network with highlight attributes (‘origin’ (motif/species) or ‘abundance’ (abundance/conservation) node attribute)

export_network

 export_network (network:networkx.classes.digraph.DiGraph, filepath:str,
                 other_node_attributes:Optional[List[str]]=None)

Export network to Cytoscape/Gephi compatible files

	Type	Default	Details
network	DiGraph		Biosynthetic network
filepath	str		Output path prefix, will be appended by file description and type
other_node_attributes	Optional	None	Additional attributes for extraction
Returns	None		Saves network files (edge list/labels + node IDs and labels)

get_maximum_flow

 get_maximum_flow (network:networkx.classes.digraph.DiGraph,
                   source:str='Gal(b1-4)Glc-ol',
                   sinks:Optional[List[str]]=None)

Estimate maximum flow and flow paths between source and sinks

	Type	Default	Details
network	DiGraph		Biosynthetic network
source	str	Gal(b1-4)Glc-ol	Source node
sinks	Optional	None	Target nodes; default:all terminal nodes
Returns	Dict		Flow results; sink: {maximum flow value, flow path dictionary}

get_max_flow_path

 get_max_flow_path (network:networkx.classes.digraph.DiGraph,
                    flow_dict:Dict[str,Dict[str,float]], sink:str,
                    source:str='Gal(b1-4)Glc-ol')

Get path giving maximum flow value

	Type	Default	Details
network	DiGraph		Biosynthetic network
flow_dict	Dict		Flow dictionary as returned by get_maximum_flow
sink	str		Target node
source	str	Gal(b1-4)Glc-ol	Source node
Returns	List		Path edge list

get_reaction_flow

 get_reaction_flow (network:networkx.classes.digraph.DiGraph,
                    res:Dict[str,Dict[str,float]],
                    aggregate:Optional[str]=None)

Get aggregated flows by reaction type

	Type	Default	Details
network	DiGraph		Biosynthetic network
res	Dict		Flow results as returned by get_maximum_flow
aggregate	Optional	None	Aggregation: sum/mean/None
Returns	Union		Reaction flows (reaction: flow)

get_differential_biosynthesis

 get_differential_biosynthesis (df:Union[pandas.core.frame.DataFrame,str],
                                group1:List[Union[str,int]], group2:Option
                                al[List[Union[str,int]]]=None,
                                analysis:str='reaction',
                                paired:bool=False,
                                longitudinal:bool=False,
                                id_column:str='ID')

Compare biosynthetic patterns between conditions/timepoints

	Type	Default	Details
df	Union		Glycan abundance data (first column: glycan sequences)
group1	List		First group column indices/names (or time points in longitudinal analysis)
group2	Optional	None	Second group column indices/names (or time points in longitudinal analysis)
analysis	str	reaction	Type: reaction/flow
paired	bool	False	Whether samples are paired
longitudinal	bool	False	Whether to do perform longitudinal analysis
id_column	str	ID	Sample ID column for longitudinal analysis in the ID-style of participant_time_replicate
Returns	DataFrame		Differential analysis results (differential flow features and statistics OR reaction changes over time

get_differential_biosynthesis(human_skin_O_PMC5871710_BCC, [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39],
                              [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40], paired = True)

You're working with an alpha of 0.044390023979542614 that has been adjusted for your sample size of 40.

	Mean abundance	Log2FC	p-val	corr p-val	significant	Effect size
Feature
Neu5Ac(a2-3)	4.542537	-0.391611	0.001702	0.003147	True	-0.816234
Gal(b1-?)	3.395793	-0.435594	0.002206	0.003147	True	-0.790748
Gal(b1-3)	5.967389	-0.460385	0.003713	0.003147	True	-0.739291
Neu5Ac(a2-?)	4.193290	-0.449044	0.004196	0.003147	True	-0.727147
Neu5Ac(a2-6)	5.143191	-0.496132	0.007521	0.004513	True	-0.668654
Neu5Ac(a2-8)	2.894141	-0.464140	0.013887	0.006944	True	-0.605967
OS	2.249050	-0.521844	0.019236	0.008244	True	-0.571950
6S	0.824198	-0.270074	0.050454	0.015136	True	-0.466992
GlcNAc(b1-6)	0.824198	-0.270074	0.050454	0.015136	True	-0.466992
Gal(b1-4)	0.824198	-0.270074	0.050454	0.015136	True	-0.466992

extend_network

 extend_network (network:networkx.classes.digraph.DiGraph, steps:int=1,
                 to_extend:Union[str,Dict[str,int],List[str]]='all',
                 strict_context:bool=False)

Extend biosynthetic network physiologically

	Type	Default	Details
network	DiGraph		Biosynthetic network
steps	int	1	Number of extension steps; default:1
to_extend	Union	all	Nodes to extend (all, specific leaf node, target composition)
strict_context	bool	False	Whether to use network only to derive allowed reaction products; default:False
Returns	Tuple		(Extended network, New glycans)

new_network, new_glycans = extend_network(network, strict_context = True)
len(new_glycans)

evolution

investigating evolutionary relationships of glycans

distance_from_embeddings

 distance_from_embeddings (df:pandas.core.frame.DataFrame,
                           embeddings:pandas.core.frame.DataFrame,
                           cut_off:int=10, rank:str='Species',
                           averaging:str='median')

Calculate cosine distance matrix from learned embeddings

	Type	Default	Details
df	DataFrame		DataFrame with glycans (rows) and taxonomic info (columns)
embeddings	DataFrame		DataFrame with glycans (rows) and embeddings (columns) (e.g., from glycans_to_emb)
cut_off	int	10	Minimum glycans per rank to be included; default:10
rank	str	Species	Taxonomic rank for grouping; default:Species
averaging	str	median	How to average embeddings: median/mean
Returns	DataFrame		Rank x rank distance matrix

distance_from_metric

 distance_from_metric (df:pandas.core.frame.DataFrame,
                       networks:List[networkx.classes.graph.Graph],
                       metric:str='Jaccard', cut_off:int=10,
                       rank:str='Species')

Calculate distance matrix between networks using provided metric

	Type	Default	Details
df	DataFrame		DataFrame with glycans (rows) and taxonomic info (columns)
networks	List		List of networkx networks
metric	str	Jaccard	Distance metric to use
cut_off	int	10	Minimum glycans per rank to be included; default:10
rank	str	Species	Taxonomic rank for grouping; default:Species
Returns	DataFrame		Rank x rank distance matrix

dendrogram_from_distance

 dendrogram_from_distance (dm:pandas.core.frame.DataFrame,
                           ylabel:str='Mammalia', filepath:str='')

Plot dendrogram from distance matrix

	Type	Default	Details
dm	DataFrame		Rank x rank distance matrix (e.g., from distance_from_embeddings)
ylabel	str	Mammalia	Y-axis label
filepath	str		Path to save plot including filename
Returns	None		Displays or saves dendrogram plot

check_conservation

 check_conservation (glycan:str, df:pandas.core.frame.DataFrame,
                     network_dic:Optional[Dict[str,networkx.classes.graph.
                     Graph]]=None, rank:str='Order', threshold:int=5,
                     motif:bool=False)

Estimate evolutionary conservation of glycans via biosynthetic networks

	Type	Default	Details
glycan	str		Glycan or motif in IUPAC-condensed format
df	DataFrame		DataFrame with glycans (rows) and taxonomic levels (columns)
network_dic	Optional	None	Species:biosynthetic network mapping
rank	str	Order	Taxonomic level to assess
threshold	int	5	Minimum glycans per species to be included
motif	bool	False	Whether glycan is a motif vs sequence
Returns	Dict		Taxonomic group-to-conservation mapping

get_communities

 get_communities (network_list:List[networkx.classes.graph.Graph],
                  label_list:Optional[List[str]]=None)

Find communities for each graph in list of graphs

	Type	Default	Details
network_list	List		List of undirected biosynthetic networks
label_list	Optional	None	Labels for community names, running_number + _ + label_list[k] for network_list[k]; default:range(len(graph_list))
Returns	Dict		Community-to-glycan list mapping