motif

motif provides functions for processing glycans and for analyzing them via curated motifs, graph features, and sequence features. It contains the following modules:

draw

drawing glycans in SNFG style


GlycoDraw

 GlycoDraw (draw_this, vertical=False, compact=False, show_linkage=True,
            dim=50, highlight_motif=None, highlight_termini_list=[],
            repeat=None, repeat_range=None, draw_method=None,
            filepath=None, suppress=False, per_residue=[])

*Draws a glycan structure based on the provided input.

Arguments:
draw_this (string): The glycan structure or motif to be drawn.
vertical (bool, optional): Set to True to draw the structure vertically. Default: False.
compact (bool, optional): Set to True to draw the structure in a compact form. Default: False.
show_linkage (bool, optional): Set to False to hide the linkage information. Default: True.
dim (int, optional): The dimension (size) of the individual sugar units in the structure. Default: 50.
highlight_motif (string, optional): Glycan motif to highlight within the parent structure.
highlight_termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
repeat (bool
repeat_range (list of 2 int): List of index integers for the first and last main-chain monosaccharide in repeating unit. Monosaccharides are numbered starting from 0 (invisible placeholder = 0 in case of structure terminating in a linkage) at the reducing end.
draw_method (string, optional): Specify ‘chem2d’ or ‘chem3d’ to draw chemical structures; default:None (SNFG figure)
filepath (string, optional): The path to the output file to save as SVG or PDF when drawing SNFG/chem2d figures or PDB when generating 3D conformers. Default: None.
suppress (bool, optional): Whether to suppress the visual display of drawings into the console; default:False
per_residue (list, optional): list of floats (order should be the same as the monosaccharides in glycan string) to quantitatively highlight monosaccharides.*
GlycoDraw("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)][GlcNAc(b1-4)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
         highlight_motif = "GlcNAc(b1-?)Man")
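Because per_residue expects one float per monosaccharide, in the order the monosaccharides appear in the glycan string, it can help to list them first. The following is a minimal sketch and not part of glycowork: `monosaccharides_in_order` is a hypothetical helper that assumes IUPAC-condensed notation with every linkage in parentheses and branches in square brackets.

```python
import re

def monosaccharides_in_order(glycan):
    """Hypothetical helper: list monosaccharides left to right as written in the string."""
    # Drop branch brackets, then split on linkages such as "(a2-3)" or "(b1-?)".
    flat = re.sub(r"[\[\]]", "", glycan)
    return [token for token in re.split(r"\([^)]*\)", flat) if token]

glycan = "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc"
residues = monosaccharides_in_order(glycan)
print(residues)

# One value per residue, in the same order, could then be passed as per_residue.
per_residue = [0.9, 0.5, 0.1, 0.3]  # illustrative values only
assert len(per_residue) == len(residues)
```

Checking the list length before drawing is a cheap way to catch a per_residue list that does not line up with the glycan string.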


annotate_figure

 annotate_figure (svg_input, scale_range=(25, 80), compact=False,
                  glycan_size='medium', filepath='', scale_by_DE_res=None,
                  x_thresh=1, y_thresh=0.05, x_metric='Log2FC')

*Modify matplotlib svg figure to replace text labels with glycan figures

Arguments:
svg_input (string): absolute path including full filename for input svg figure
scale_range (tuple): tuple of two integers defining min/max glycan dim; default:(25,80)
compact (bool): if True, draw compact glycan figures; default:False
glycan_size (string): modify glycan size; default:‘medium’; options are ‘small’, ‘medium’, ‘large’
filepath (string): absolute path including full filename allows for saving the plot
scale_by_DE_res (df): result table from motif_analysis.get_differential_expression. Include to scale glycan figure size by -10logp
x_thresh (float): absolute x metric threshold for datapoints included for scaling, set to match get_differential_expression; default:1.0
y_thresh (float): corr p threshold for datapoints included for scaling, set to match get_differential_expression; default:0.05
x_metric (string): x-axis metric; default:‘Log2FC’; options are ‘Log2FC’, ‘Effect size’
Returns:
Modified figure svg code*

plot_glycans_excel

 plot_glycans_excel (df, folder_filepath, glycan_col_num=0,
                     scaling_factor=0.2, compact=False)

*plots SNFG images of glycans into new column in df and saves df as Excel file

Arguments:
df (dataframe): dataframe containing glycan sequences [alternative: filepath to .csv or .xlsx]
folder_filepath (string): full filepath to the folder you want to save the output to
glycan_col_num (int): index of the column containing glycan sequences; default:0 (first column)
scaling_factor (float): how large the glycans should be; default:0.2
compact (bool, optional): Set to True to draw the structures in a compact form. Default: False.
Returns:
Saves the dataframe with glycan images as output.xlsx into folder_filepath*

analysis

downstream analyses of important glycan motifs


get_pvals_motifs

 get_pvals_motifs (df, glycan_col_name='glycan', label_col_name='target',
                   zscores=True, thresh=1.645, sorting=True,
                   feature_set=['exhaustive'], multiple_samples=False,
                   motifs=None, custom_motifs=[])

*returns enriched motifs based on label data or predicted data

Arguments:
df (dataframe): dataframe containing glycan sequences and labels [alternative: filepath to .csv or .xlsx]
glycan_col_name (string): column name for glycan sequences; arbitrary if multiple_samples = True; default:‘glycan’
label_col_name (string): column name for labels; arbitrary if multiple_samples = True; default:‘target’
zscores (bool): whether data are presented as z-scores or not, will be z-score transformed if False; default:True
thresh (float): threshold value to separate positive/negative; default is 1.645 for Z-scores
sorting (bool): whether p-value dataframe should be sorted ascendingly; default: True
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
multiple_samples (bool): set to True if you have multiple samples (rows) with glycan information (columns); default:False
motifs (dataframe): can be used to pass a modified motif_list to the function; default:None
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
Returns dataframe with p-values, corrected p-values, and Cohen’s d as effect size for every glycan motif*
import pandas as pd
from glycowork.motif.analysis import get_pvals_motifs

glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcOPN(b1-6)GlcOPN',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Glc(b1-3)Glc(b1-3)Glc']
# one value per glycan, treated as z-scores because zscores=True by default
label = [3.234, 2.423, 0.733, 3.102, 0.108]
test_df = pd.DataFrame({'glycan':glycans, 'binding':label})

print("Glyco-Motif enrichment p-value test")
# keep the ten most enriched motifs
out = get_pvals_motifs(test_df, 'glycan', 'binding').iloc[:10,:]
Glyco-Motif enrichment p-value test
|  | motif | pval | corr_pval | effect_size |
|---|---|---|---|---|
| 4 | GlcNAc | 0.038120 | 0.205849 | 1.530905 |
| 8 | Man | 0.054356 | 0.234990 | 1.390253 |
| 24 | Man(a1-?)Man | 0.060923 | 0.234990 | 1.308333 |
| 22 | Man(a1-3)Man | 0.034212 | 0.205849 | 1.196586 |
| 14 | GlcNAc(b1-4)GlcNAc | 0.019543 | 0.175885 | 1.168815 |
| 23 | Man(a1-6)Man | 0.019543 | 0.175885 | 1.168815 |
| 25 | Man(b1-4)GlcNAc | 0.019543 | 0.175885 | 1.168815 |
| 7 | Kdo | 0.328790 | 0.479672 | -0.811679 |
| 2 | Glc | 0.644180 | 0.668956 | -0.811679 |
| 21 | Man(a1-2)Man | 0.177461 | 0.479672 | 0.772320 |
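The effect_size column is described as Cohen's d. As a rough illustration of that statistic, here is a minimal sketch using the textbook definition with a pooled standard deviation. It is not glycowork's implementation and is not expected to reproduce the numbers above; the helper name `cohens_d` and the grouping of values are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (sample variances)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Illustrative split only: values of glycans with and without some motif.
with_motif = [3.234, 2.423, 3.102]
without_motif = [0.733, 0.108]
d = cohens_d(with_motif, without_motif)
print(round(d, 3))
```

A positive d means the first group has the higher mean, which matches the sign convention of the enriched motifs at the top of the table.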

get_representative_substructures

 get_representative_substructures (enrichment_df)

*builds minimal glycans that contain enriched motifs from get_pvals_motifs

Arguments:
enrichment_df (dataframe): output from get_pvals_motifs
Returns:
Returns up to 10 minimal glycans in a list*

get_heatmap

 get_heatmap (df, motifs=False, feature_set=['known'], transform='',
              datatype='response', rarity_filter=0.05, filepath='',
              index_col='glycan', custom_motifs=[], return_plot=False,
              show_all=False, **kwargs)

*clusters samples based on glycan data (for instance glycan binding etc.)

Arguments:
df (dataframe): dataframe with glycan data, rows are samples and columns are glycans [alternative: filepath to .csv or .xlsx]
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
transform (string): whether to transform the data before plotting, currently the only option is “CLR”, recommended for glycomics data; default: no transformation
datatype (string): whether df comes from a dataset with quantitative variable (‘response’) or from presence_to_matrix (‘presence’)
rarity_filter (float): proportion of samples that need to have a non-zero value for a variable to be included; default:0.05
filepath (string): absolute path including full filename allows for saving the plot
index_col (string): default column to convert to dataframe index; default:‘glycan’
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
return_plot (bool): whether to return the plot object for external saving; default:False
show_all (bool): whether to plot all ticklabels, no matter how many there are (this might cause visual overlaps); default:False
**kwargs: keyword arguments that are directly passed on to seaborn clustermap
Returns:
Prints clustermap*
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Glc(b1-3)Glc(b1-3)Glc']
label = [3.234, 2.423, 0.733, 3.102, 0.108]
label2 = [0.134, 0.345, 1.15, 0.233, 2.981]
label3 = [0.334, 0.245, 1.55, 0.133, 2.581]
test_df = pd.DataFrame([label, label2, label3], columns = glycans)

get_heatmap(test_df, motifs = True, feature_set = ['known', 'exhaustive'])
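The transform argument mentions CLR (centered log-ratio), which is commonly recommended for compositional data such as relative abundances. The sketch below shows the plain CLR for a single sample, assuming strictly positive values; it is a simplified illustration, and the function name `clr` is hypothetical. The library's transform may treat zeros and scale uncertainty differently.

```python
from math import log

def clr(abundances):
    """Plain centered log-ratio transform of strictly positive values."""
    logs = [log(x) for x in abundances]
    center = sum(logs) / len(logs)  # log of the geometric mean
    return [value - center for value in logs]

# First sample (row) of test_df from the example above.
sample = [3.234, 2.423, 0.733, 3.102, 0.108]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
```

By construction the transformed values of one sample sum to zero, so each value reads as "log abundance relative to the sample's geometric mean".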


plot_embeddings

 plot_embeddings (glycans, emb=None, label_list=None, shape_feature=None,
                  filepath='', alpha=0.8, palette='colorblind', **kwargs)

*plots glycan representations for a list of glycans

Arguments:
glycans (list): list of IUPAC-condensed glycan sequences as strings
emb (dictionary): stored glycan representations; default takes them from trained species-level SweetNet model
label_list (list): list of same length as glycans if coloring of the plot is desired
shape_feature (string): monosaccharide/bond used to display alternative shapes for dots on the plot
filepath (string): absolute path including full filename allows for saving the plot
alpha (float): transparency of points in plot; default:0.8
palette (string): color palette to color different classes; default:‘colorblind’
**kwargs: keyword arguments that are directly passed on to matplotlib*
df_fabales = df_species[df_species.Order == 'Fabales'].reset_index(drop = True)
plot_embeddings(df_fabales.glycan.values.tolist(), label_list = df_fabales.Family.values.tolist())


characterize_monosaccharide

 characterize_monosaccharide (sugar, df=None, mode='sugar',
                              glycan_col_name='glycan', rank=None,
                              focus=None, modifications=False,
                              filepath='', thresh=10)

*for a given monosaccharide/linkage, return typical neighboring linkage/monosaccharide

Arguments:
sugar (string): monosaccharide or linkage
df (dataframe): dataframe to use for analysis; default:df_species
mode (string): either ‘sugar’ (connected monosaccharides), ‘bond’ (monosaccharides making a provided linkage), or ‘sugarbond’ (linkages that a provided monosaccharides makes); default:‘sugar’
glycan_col_name (string): column name under which glycans can be found; default:‘glycan’
rank (string): add column name as string if you want to filter for a group
focus (string): add row value as string if you want to filter for a group
modifications (bool): set to True if you want to consider modified versions of a monosaccharide; default:False
filepath (string): absolute path including full filename allows for saving the plot
thresh (int): threshold count of when to include motifs in plot; default:10 occurrences
Returns:
Plots modification distribution and typical neighboring bond/monosaccharide*
characterize_monosaccharide('D-Rha', rank = 'Kingdom', focus = 'Bacteria', modifications = True)


get_differential_expression

 get_differential_expression (df, group1, group2, motifs=False,
                              feature_set=['exhaustive', 'known'],
                              paired=False, impute=True, sets=False,
                              set_thresh=0.9, effect_size_variance=False,
                              min_samples=0.1, grouped_BH=False,
                              custom_motifs=[], transform=None, gamma=0.1,
                              custom_scale=0, glycoproteomics=False,
                              level='peptide', monte_carlo=False)

*Calculates differentially expressed glycans or motifs from glycomics data

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
group1 (list): list of column indices or names for the first group of samples, usually the control
group2 (list): list of column indices or names for the second group of samples
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False
impute (bool): replaces zeroes with a Random Forest based model; default:True
sets (bool): whether to identify clusters of highly correlated glycans/motifs to test for differential expression; default:False
set_thresh (float): correlation value used as a threshold for clusters; only used when sets=True; default:0.9
effect_size_variance (bool): whether effect size variance should also be calculated/estimated; default:False
min_samples (float): Percent of the samples that need to have non-zero values for glycan to be kept; default: 10%
grouped_BH (bool): whether to perform two-stage adaptive Benjamini-Hochberg as a grouped multiple testing correction; will SIGNIFICANTLY increase runtime; default:False
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1
custom_scale (float or dict): Ratio of total signal in group2/group1 for an informed scale model (or group_idx: mean(group)/min(mean(groups)) signal dict for multivariate)
glycoproteomics (bool): whether the analyzed data in df comes from a glycoproteomics experiment; default:False
level (string; only relevant if glycoproteomics=True): whether to analyze glycoform differential expression at the level of ‘peptide’ or ‘protein’; default:‘peptide’
monte_carlo (bool): whether to account for technical variation via Monte Carlo simulations; will be slower and much more conservative; default:False
Returns:
Returns a dataframe with:
(i) Differentially expressed glycans/motifs/sets
(ii) Their mean abundance across all samples in group1 + group2
(iii) Log2-transformed fold change of group2 vs group1 (i.e., negative = lower in group2)
(iv) Uncorrected p-values (Welch’s t-test) for difference in mean
(v) Corrected p-values (Welch’s t-test with two-stage Benjamini-Hochberg correction) for difference in mean
(vi) Significance: True/False of whether the corrected p-value lies below the sample size-appropriate significance threshold
(vii) Corrected p-values (Levene’s test for equality of variances with Benjamini-Hochberg correction) for difference in variance
(viii) Effect size as Cohen’s d (sets=False) or Mahalanobis distance (sets=True)
(ix) Corrected p-values of equivalence test to test whether means are significantly equivalent; only done for p-values > 0.05 from (iv)
(x) [only if effect_size_variance=True] Effect size variance*
test_df = glycomics_data_loader.human_skin_O_PMC5871710_BCC

res = get_differential_expression(test_df, group1 = [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39],
                                  group2 = [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40], motifs = True, paired = True)
res
You're working with an alpha of 0.044390023979542614 that has been adjusted for your sample size of 40.
|  | Glycan | Mean abundance | Log2FC | p-val | corr p-val | significant | corr Levene p-val | Effect size | Equivalence p-val |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GalOS(b1-3)GalNAc | 0.159900 | -0.949766 | 0.000471 | 0.003768 | True | 0.935976 | -0.942071 | 1.000000 |
| 4 | Terminal_LacNAc_type2 | 2.328696 | 0.500929 | 0.001587 | 0.003801 | True | 0.690077 | 0.823108 | 1.000000 |
| 10 | Neu5Ac(a2-3)Gal | 7.956860 | 0.276936 | 0.001957 | 0.003801 | True | 0.935976 | 0.802525 | 1.000000 |
| 3 | GlcNAc6S(b1-6)GalNAc | 1.046247 | 0.922369 | 0.002160 | 0.003801 | True | 0.935976 | 0.792804 | 1.000000 |
| 2 | H_type2 | 0.247156 | -0.701400 | 0.002376 | 0.003801 | True | 0.935976 | -0.783442 | 1.000000 |
| 5 | Terminal_LacNAc_type2 | 2.440640 | -0.475144 | 0.009863 | 0.013151 | True | 0.935976 | -0.641129 | 1.000000 |
| 13 | Neu5Ac | 12.726196 | 0.194904 | 0.022240 | 0.025417 | True | 0.935976 | 0.556605 | 1.000000 |
| 0 | Neu5Ac(a2-8)Neu5Ac | 0.038663 | -0.588491 | 0.033743 | 0.033743 | True | 0.690077 | -0.511708 | 1.000000 |
| 8 | Oglycan_core1 | 3.790085 | 0.186020 | 0.042067 | 0.042067 | True | 0.935976 | 0.487392 | 1.000000 |
| 14 | Gal | 12.886096 | 0.132187 | 0.049873 | 0.049873 | False | 0.935976 | 0.468301 | 1.000000 |
| 11 | Gal(b1-3)GalNAc | 4.769337 | 0.116511 | 0.082862 | 0.082862 | False | 0.935976 | 0.409377 | 0.681761 |
| 12 | GalNAc | 12.345115 | 0.106238 | 0.106760 | 0.106760 | False | 0.935976 | 0.378600 | 0.681761 |
| 7 | Neu5Ac(a2-6)GalNAc | 2.440640 | -0.078716 | 0.502123 | 0.502123 | False | 0.935976 | -0.152987 | 0.605365 |
| 6 | Disialyl_T_antigen | 2.328696 | -0.027796 | 0.818170 | 0.818170 | False | 0.935976 | -0.052125 | 0.570158 |
| 9 | Mucin_elongated_core2 | 4.169726 | 0.010202 | 0.932748 | 0.932748 | False | 0.935976 | 0.019121 | 0.570158 |
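Two of the returned quantities, the log2 fold change and the corrected p-values, can be illustrated in isolation. The sketch below uses the plain one-stage Benjamini-Hochberg procedure, not the two-stage variant the function describes, and computes the fold change from raw group means. Both helper names are hypothetical, and the sketch is not expected to reproduce the table above.

```python
from math import log2

def log2_fold_change(mean_group1, mean_group2):
    """Log2 fold change of group2 vs group1; negative means lower in group2."""
    return log2(mean_group2 / mean_group1)

def benjamini_hochberg(pvals):
    """One-stage Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

print(log2_fold_change(4.0, 1.0))  # -2.0
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The sign convention matches item (iii): a feature that drops from a mean of 4.0 in group1 to 1.0 in group2 gets a Log2FC of -2.0.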

get_volcano

 get_volcano (df_res, y_thresh=0.05, x_thresh=0, n=None,
              label_changed=True, x_metric='Log2FC',
              annotate_volcano=False, filepath='', **kwargs)

*Plots glycan differential expression results in a volcano plot

Arguments:
df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv or .xlsx]
y_thresh (float): corr p threshold for labeling datapoints; default:0.05
x_thresh (float): absolute x metric threshold for labeling datapoints; default:0
n (float): sample size for Bayesian-Adaptive Alpha Adjustment; default = None
label_changed (bool): if True, add text labels to significantly up- and downregulated datapoints; default:True
x_metric (string): x-axis metric; default:‘Log2FC’; options are ‘Log2FC’, ‘Effect size’
annotate_volcano (bool): whether to annotate the dots in the plot with SNFG images; default: False
filepath (string): absolute path including full filename allows for saving the plot
**kwargs: keyword arguments that are directly passed on to seaborn scatterplot
Returns:
Prints volcano plot*
get_volcano(res)
You're working with a default alpha of 0.05. Set sample size (n = ...) for Bayesian-Adaptive Alpha Adjustment
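To make the roles of y_thresh and x_thresh concrete, here is a minimal sketch of a labeling rule on three rows copied from the get_differential_expression output above. The rule itself (corrected p-value below y_thresh and absolute x metric above x_thresh) is an assumption based on the argument descriptions, and `points_to_label` is a hypothetical helper, not a glycowork function.

```python
from math import log10

# (glycan/motif, Log2FC, corrected p-value), copied from the output above
rows = [
    ("GalOS(b1-3)GalNAc", -0.949766, 0.003768),
    ("Neu5Ac(a2-3)Gal", 0.276936, 0.003801),
    ("GalNAc", 0.106238, 0.106760),
]

def points_to_label(rows, y_thresh=0.05, x_thresh=0.0):
    """Assumed rule: corrected p below y_thresh AND |x metric| above x_thresh."""
    return [name for name, x, p in rows if p < y_thresh and abs(x) > x_thresh]

# Volcano plots conventionally show -log10(p) on the y axis.
y_values = {name: -log10(p) for name, _, p in rows}

default_labels = points_to_label(rows)
strict_labels = points_to_label(rows, x_thresh=0.5)
print(default_labels)
print(strict_labels)
```

Raising x_thresh keeps only features with a large change, which is useful when many points pass the p-value threshold and labels start to overlap.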


get_coverage

 get_coverage (df, filepath='')

*Plot glycan coverage across samples, ordered by average intensity

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
filepath (string): absolute path including full filename allows for saving the plot
Returns:
Prints the heatmap*
test_df = pd.concat([test_df.iloc[:, 0], test_df[test_df.columns[1:]].astype(float)], axis = 1)

get_coverage(test_df)


get_pca

 get_pca (df, groups=None, motifs=False, feature_set=['known',
          'exhaustive'], pc_x=1, pc_y=2, color=None, shape=None,
          filepath='', custom_motifs=[], transform=None,
          rarity_filter=0.05)

*PCA plot from glycomics abundance dataframe

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
groups (list): a list of group identifiers for each sample (e.g., [1,1,1,2,2,2,3,3,3]); default:None
alternatively: design dataframe with ‘id’ column of sample names and additional columns with meta information
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
pc_x (int): principal component to plot on x axis; default:1
pc_y (int): principal component to plot on y axis; default:2
color (string): if design dataframe is provided: column name for color grouping; default:None
shape (string): if design dataframe is provided: column name for shape grouping; default:None
filepath (string): absolute path including full filename allows for saving the plot
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
transform (string): whether to transform the data before plotting, options are “CLR” and “ALR”, recommended for glycomics data; default: no transformation
rarity_filter (float): proportion of samples that need to have a non-zero value for a variable to be included; default:0.05
Returns:
Prints PCA plot*
get_pca(test_df, motifs = True, groups = [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2])


get_pval_distribution

 get_pval_distribution (df_res, filepath='')

*p-value distribution plot of glycan differential expression result

Arguments:
df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv]
filepath (string): absolute path including full filename allows for saving the plot
Returns:
prints p-value distribution plot*
get_pval_distribution(res)


get_ma

 get_ma (df_res, log2fc_thresh=1, sig_thresh=0.05, filepath='')

*MA plot of glycan differential expression result

Arguments:
df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv or .xlsx]
log2fc_thresh (int): absolute Log2FC threshold for highlighting datapoints
sig_thresh (float): significance threshold for highlighting datapoints
filepath (string): absolute path including full filename allows for saving the plot
Returns:
prints MA plot*
get_ma(res)


get_glycanova

 get_glycanova (df, groups, impute=True, motifs=False,
                feature_set=['exhaustive', 'known'], min_samples=0.1,
                posthoc=True, custom_motifs=[], transform=None, gamma=0.1,
                custom_scale=0)

*Calculate an ANOVA for each glycan (or motif) in the DataFrame

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
groups (list): a list of group identifiers for each sample (e.g., [1,1,1,2,2,2,3,3,3])
impute (bool): replaces zeroes with a Random Forest based model; default:True
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
min_samples (float): Percent of the samples that need to have non-zero values for glycan to be kept; default: 10%
posthoc (bool): whether to do Tukey’s HSD test post-hoc to find out which differences were significant; default:True
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1
custom_scale (dict): dictionary of type group_idx : mean(group)/min(mean(groups)) for an informed scale model
Returns:
(i) a pandas DataFrame with an F statistic, corrected p-value, indication of its significance, and effect size (Omega squared) for each glycan.
(ii) a dictionary of type glycan : pandas DataFrame, with post-hoc results for each glycan with a significant ANOVA.*
test_df2 = glycomics_data_loader.HIV_gagtransfection_O_PMID35112714

anv, ph = get_glycanova(test_df2, [1,1,1,1,2,2,2,2,3,3,3,3], motifs = False)
anv
You're working with an alpha of 0.06364810000741428 that has been adjusted for your sample size of 12.
|  | Glycan | F statistic | p-val | corr p-val | significant | Effect size |
|---|---|---|---|---|---|---|
| 3 | Neu5Ac(a2-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]Ga... | 5.598977 | 0.026315 | 0.118419 | False | 0.304589 |
| 4 | Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc | 6.438216 | 0.018374 | 0.118419 | False | 0.341206 |
| 0 | Gal(b1-3)[Neu5Ac(a2-6)]GalNAc | 2.987765 | 0.101128 | 0.303384 | False | 0.159177 |
| 1 | Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc | 1.471480 | 0.279954 | 0.402510 | False | 0.042973 |
| 6 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Neu5Ac(a2-3)... | 1.324978 | 0.313063 | 0.402510 | False | 0.030021 |
| 5 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]Ga... | 1.500984 | 0.273814 | 0.402510 | False | 0.045540 |
| 8 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc6S(b1-6)[Neu5Ac(a2-... | 1.923255 | 0.201631 | 0.402510 | False | 0.080822 |
| 2 | Neu5Ac(a2-3)Gal(b1-3)GalNAc | 1.060368 | 0.385914 | 0.434153 | False | 0.005716 |
| 7 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-?)[GlcNAc(b1-?)... | 0.514362 | 0.614449 | 0.614449 | False | -0.048494 |
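For readers unfamiliar with the reported statistics, the sketch below computes a textbook one-way ANOVA F statistic and omega squared for a single feature. It is a simplified illustration with a hypothetical helper name. The values in the output above come from transformed data and the library's own implementation, so this sketch is not expected to reproduce them.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F statistic, omega squared) for a list of groups of observations."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    f_stat = ms_between / ms_within
    # In this textbook form, omega squared turns negative when F < 1.
    omega_sq = (ss_between - (k - 1) * ms_within) / (ss_between + ss_within + ms_within)
    return f_stat, omega_sq

# Illustrative abundances of one glycan in three groups of three samples.
f_stat, omega_sq = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, omega_sq)
```

Unlike eta squared, omega squared corrects for the within-group variance, which is why it can fall slightly below zero for features with no group effect.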

get_meta_analysis

 get_meta_analysis (effect_sizes, variances, model='fixed', filepath='',
                    study_names=[])

*Fixed-effects model or random-effects model for meta-analysis of glycan effect sizes

Arguments:
effect_sizes (array-like): Effect sizes (e.g., Cohen’s d) from each study
variances (array-like): Corresponding effect size variances from each study
model (string): Whether to use ‘fixed’ or ‘random’ effects model
filepath (string): absolute path including full filename allows for saving the Forest plot
study_names (list): list of strings indicating the name of each study
Returns:
(1) The combined effect size
(2) The p-value for the combined effect size*
get_meta_analysis([-8.759, -6.363, -5.199, -3.952],
                 [7.061, 4.041, 2.919, 1.968])
(np.float64(-5.326913553837341), np.float64(3.005077298112724e-09))
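The fixed-effects case can be sketched as a standard inverse-variance weighted average followed by a two-sided z-test. This is a minimal illustration under that assumption, with a hypothetical function name, rather than the library's code. For the inputs of the example above it gives a combined effect of about -5.327 and a p-value on the order of 3e-9, in line with the output shown.

```python
from math import erfc, sqrt

def fixed_effects_meta(effect_sizes, variances):
    """Inverse-variance weighted combined effect and two-sided z-test p-value."""
    weights = [1.0 / v for v in variances]
    combined = sum(w * e for w, e in zip(weights, effect_sizes)) / sum(weights)
    standard_error = sqrt(1.0 / sum(weights))
    z = combined / standard_error
    p_value = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return combined, p_value

combined, p_value = fixed_effects_meta([-8.759, -6.363, -5.199, -3.952],
                                       [7.061, 4.041, 2.919, 1.968])
print(combined, p_value)
```

Studies with smaller variance get larger weights, so the combined estimate sits closer to the most precise study (-3.952) than a simple average would.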

get_time_series

 get_time_series (df, impute=True, motifs=False, feature_set=['known',
                  'exhaustive'], degree=1, min_samples=0.1,
                  custom_motifs=[], transform=None, gamma=0.1,
                  custom_scale=0)

*Analyzes time series data of glycans using an OLS model

Arguments:
df (dataframe): dataframe containing sample IDs of style sampleID_UnitTimepoint_replicate (e.g., T1_h5_r1) in first column and glycan relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
impute (bool): replaces zeroes with a Random Forest based model; default:True
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
degree (int): degree of the polynomial for regression, default:1 for linear regression
min_samples (float): Percent of the samples that need to have non-zero values for glycan to be kept; default: 10%
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1
custom_scale (dict): dictionary of type timepoint : mean(timepoint)/min(mean(timepoints)) for an informed scale model
Returns:
Returns a dataframe with:
(i) Glycans/motifs potentially exhibiting significant changes over time
(ii) The slope of their expression curve over time
(iii) Uncorrected p-values (t-test) for testing whether slope is significantly different from zero
(iv) Corrected p-values (t-test with two-stage Benjamini-Hochberg correction) for testing whether slope is significantly different from zero
(v) Significance: True/False whether the corrected p-value lies below the sample size-appropriate significance threshold*
t_dic = {}
t_dic["ID"] = ["D1_h5_r1", "D1_h5_r2", "D1_h5_r3", "D1_h10_r1", "D1_h10_r2", "D1_h10_r3", "D1_h15_r1", "D1_h15_r2", "D1_h15_r3"]
t_dic["Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc"] = [0.33, 0.31, 0.35, 1.51, 1.57, 1.66, 2.11, 2.04, 2.09]
t_dic["Fuc(a1-2)Gal(b1-3)GalNAc"] = [0.78, 1.01, 0.98, 0.88, 1.11, 0.72, 1.22, 1.00, 0.54]
t_dic["Neu5Ac(a2-6)GalNAc"] = [0.11, 0.09, 0.14, 0.02, 0.07, 0.10, 0.11, 0.09, 0.08]
get_time_series(pd.DataFrame(t_dic).set_index("ID").T)
You're working with an alpha of 0.0694557066556809 that has been adjusted for your sample size of 9.
|  | Glycan | Change | p-val | corr p-val | significant |
|---|---|---|---|---|---|
| 1 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]Ga... | 0.196162 | 0.004659 | 0.004659 | True |
| 0 | Fuc(a1-2)Gal(b1-3)GalNAc | -0.083848 | 0.011912 | 0.011912 | True |
| 2 | Neu5Ac(a2-6)GalNAc | -0.109381 | 0.090226 | 0.090226 | False |
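The core idea, reading the timepoint out of each sample ID and fitting a straight line (degree=1), can be sketched without any dependencies. The ID parsing below assumes the sampleID_UnitTimepoint_replicate style described above; both helper names are hypothetical. The slope is fitted on the raw values of the first glycan in the example, so it differs from the Change column, which is presumably derived from transformed data.

```python
import re

def timepoint_from_id(sample_id):
    """Assumed ID style sampleID_UnitTimepoint_replicate, e.g. 'D1_h5_r1' -> 5.0."""
    unit_timepoint = sample_id.split("_")[1]
    return float(re.sub(r"[^0-9.]", "", unit_timepoint))

def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x."""
    x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    return s_xy / s_xx

ids = ["D1_h5_r1", "D1_h5_r2", "D1_h5_r3", "D1_h10_r1", "D1_h10_r2",
       "D1_h10_r3", "D1_h15_r1", "D1_h15_r2", "D1_h15_r3"]
raw = [0.33, 0.31, 0.35, 1.51, 1.57, 1.66, 2.11, 2.04, 2.09]
times = [timepoint_from_id(s) for s in ids]
slope = ols_slope(times, raw)
print(round(slope, 3))  # 0.175
```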

get_jtk

 get_jtk (df_in, timepoints, periods, interval, motifs=False,
          feature_set=['known', 'exhaustive', 'terminal'],
          custom_motifs=[], transform=None, gamma=0.1,
          correction_method='two-stage')

*Detecting rhythmically expressed glycans via the Jonckheere–Terpstra–Kendall (JTK) algorithm

Arguments:
df_in (pd.DataFrame): A dataframe containing data for analysis. [alternative: filepath to .csv or .xlsx]
(column 0 = molecule IDs, then arranged in groups and by ascending timepoints)
timepoints (int): number of timepoints in the experiment (each timepoint must have the same number of replicates).
periods (list): number of timepoints (as int) per cycle.
interval (int): units of time (Arbitrary units) between experimental timepoints.
motifs (bool): a flag for running structural or motif-based analysis (True = run motif analysis); default:False.
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1
correction_method (string): whether to use “two-stage” or “one-stage” Benjamini-Hochberg for correction; default:“two-stage”
Returns:
Returns a pandas dataframe containing the adjusted p-values and the most important waveform parameters for each molecule in the analysis.*
t_dic = {}
t_dic["Neu5Ac(a2-3)Gal(b1-3)GalNAc"] = [0.433138901, 0.149729209, 0.358018822, 0.537641256, 1.526963756, 1.349986672, 0.75156406, 0.736710183]
t_dic["Gal(b1-3)GalNAc"] = [0.919762334, 0.760237184, 0.725566662, 0.459945797, 0.523801515, 0.695106926, 0.627632047, 1.183511209]
t_dic["Gal(b1-3)[Neu5Ac(a2-6)]GalNAc"] = [0.533138901, 0.119729209, 0.458018822, 0.637641256, 1.726963756, 1.249986672, 0.55156406, 0.436710183]
t_dic["Fuc(a1-2)Gal(b1-3)GalNAc"] = [3.862169504, 5.455032837, 3.858163289, 5.614650335, 3.124254095, 4.189550337, 4.641831312, 4.19538484]
tps = 8  # number of timepoints in experiment
periods = [8]  # number of timepoints per cycle
interval = 3  # units of time between experimental timepoints
t_df = pd.DataFrame(t_dic).T
t_df.columns = ["T3", "T6", "T9", "T12", "T15", "T18", "T21", "T24"]
get_jtk(t_df.reset_index(), tps, periods, interval)
You're working with an alpha of 0.22004505213567527 that has been adjusted for your sample size of 1.
Significance inflation detected. The CLR/ALR transformation possibly cannot handle this dataset. Consider running again with a higher gamma value.             Proceed with caution; for now switching to Bonferroni correction to be conservative about this.
                   Molecule_Name  BH_Q_Value  Adjusted_P_value  Period_Length  Lag_Phase  Amplitude  significant
0    Neu5Ac(a2-3)Gal(b1-3)GalNAc    0.006944          0.001736           24.0       16.5   0.474136         True
1                Gal(b1-3)GalNAc    0.006944          0.001736           24.0        1.5   0.220136         True
2  Gal(b1-3)[Neu5Ac(a2-6)]GalNAc    0.056548          0.014137           24.0       13.5   0.379760        False
3       Fuc(a1-2)Gal(b1-3)GalNAc    0.434722          0.108681           24.0        4.5   0.310215        False
get_jtk(t_df.reset_index(), tps, periods, interval, motifs = True, feature_set = ['terminal'])
You're working with an alpha of 0.22004505213567527 that has been adjusted for your sample size of 1.
           Molecule_Name  BH_Q_Value  Adjusted_P_value  Period_Length  Lag_Phase     Amplitude  significant
2  Terminal_Neu5Ac(a2-?)    0.000794          0.000397            0.0        0.0  0.000000e+00         True
0  Terminal_Neu5Ac(a2-3)    0.001736          0.001736           24.0       15.0  1.110223e-16         True
1  Terminal_Neu5Ac(a2-6)    0.014137          0.014137           24.0       13.5  2.283195e-01         True
4     Terminal_Fuc(a1-2)    0.061012          0.061012           24.0        4.5  2.825447e-01         True
3     Terminal_Gal(b1-3)    0.398760          0.398760           24.0        3.0  6.938894e-18        False
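
The correction_method argument above refers to Benjamini-Hochberg correction. As a rough illustration of the "one-stage" variant only, the following pure-Python sketch computes adjusted p-values. The function name benjamini_hochberg and the input values are hypothetical, and this is not a copy of the library's implementation (the "two-stage" variant additionally estimates the share of true null hypotheses, which is omitted here).

```python
def benjamini_hochberg(pvals):
    """Return one-stage Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # made-up p-values for illustration
adj = benjamini_hochberg(raw)
print(adj)
```

Each adjusted value is the raw p-value scaled by m/rank, capped by the adjusted value of the next larger p-value, so two neighbouring p-values can end up with the same adjusted value.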

get_biodiversity

 get_biodiversity (df, group1, group2, metrics=['alpha', 'beta'],
                   motifs=False, feature_set=['exhaustive', 'known'],
                   custom_motifs=[], paired=False, permutations=999,
                   transform=None, gamma=0.1, custom_scale=0)

*Calculates diversity indices from glycomics data, analogous to alpha and beta diversity in microbiome data.

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
group1 (list): a list of column identifiers corresponding to samples in group 1
group2 (list): a list of column identifiers corresponding to samples in group 2 (note, if an empty list is provided, group 1 can be used as a list of group identifiers for each column - e.g., [1,1,2,2,3,3…])
metrics (list): which diversity metrics to calculate (alpha, beta); default:[‘alpha’, ‘beta’]
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘exhaustive’, ‘known’]; options are: ‘known’ (hand-crafted glycan features),
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False
permutations (int): number of permutations to perform in the ANOSIM and PERMANOVA statistical tests; default:999
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1
custom_scale (float or dict): Ratio of total signal in group2/group1 for an informed scale model (or group_idx: mean(group)/min(mean(groups)) signal dict for multivariate)
Returns:
Returns a dataframe with:
(i) Diversity indices/metrics
(ii) Mean value of diversity metrics in group 1 (only alpha)
(iii) Mean value of diversity metrics in group 2 (only alpha)
(iv) Uncorrected p-values (Welch’s t-test) for difference in mean
(v) Corrected p-values (Welch’s t-test with two-stage Benjamini-Hochberg correction) for difference in mean
(vi) Significance: True/False of whether the corrected p-value lies below the sample size-appropriate significance threshold
(vii) Effect size as Cohen’s d (ANOSIM R for beta; F statistics for PERMANOVA and Shannon/Simpson (ANOVA))*
res = get_biodiversity(test_df, group1 = [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39],
                       group2 = [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40], motifs = True, paired = True)
res
You're working with an alpha of 0.044390023979542614 that has been adjusted for your sample size of 40.
                       Metric  Group1 mean  Group2 mean     p-val  Effect size  corr p-val  significant
0           simpson_diversity     0.876756     0.874348  0.000443    -0.948203    0.000443         True
1           shannon_diversity     2.244523     2.225758  0.001255    -0.846077    0.001255         True
2     Beta diversity (ANOSIM)          NaN          NaN  0.002002     0.145276    0.002002         True
3  Beta diversity (PERMANOVA)          NaN          NaN  0.003003    43.547762    0.003003         True
4            species_richness    15.000000    15.000000  1.000000     0.000000    1.000000        False
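
To make the alpha diversity rows above more concrete, here is a minimal pure-Python sketch of how species richness, Shannon diversity, and Simpson diversity can be computed for a single sample. It assumes the natural logarithm for Shannon and the Gini-Simpson form (1 minus the sum of squared proportions) for Simpson; the function name alpha_diversity and the abundance values are hypothetical, and the library's own calculation may differ in detail.

```python
import math


def alpha_diversity(abundances):
    """Compute richness, Shannon, and Gini-Simpson diversity for one sample."""
    # Only features that are actually present contribute to diversity
    positive = [a for a in abundances if a > 0]
    total = sum(positive)
    props = [a / total for a in positive]
    return {
        "species_richness": len(props),
        "shannon_diversity": -sum(p * math.log(p) for p in props),
        "simpson_diversity": 1.0 - sum(p * p for p in props),
    }


sample = [40.0, 30.0, 20.0, 10.0, 0.0]  # made-up relative abundances
indices = alpha_diversity(sample)
print(indices)
```

Under these assumptions, a sample dominated by a single glycan scores close to zero on both Shannon and Simpson diversity, while an even distribution maximizes them.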

get_SparCC

 get_SparCC (df1, df2, motifs=False, feature_set=['known', 'exhaustive'],
             custom_motifs=[], transform=None, gamma=0.1,
             partial_correlations=False)

*Performs SparCC (Sparse Correlations for Compositional Data) on two (glycomics) datasets. Samples should be in the same order.

Arguments:
df1 (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
df2 (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘known’, ‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features),
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1
partial_correlations (bool): whether to use regularized partial correlations instead (enriches for direct effects); default:False
Returns:
Returns (i) a dataframe of pairwise correlations (Spearman’s rho) and (ii) a dataframe with corrected p-values (two-stage Benjamini-Hochberg)*
df1 = glycomics_data_loader.time_series_N_PMID32149347
df2 = glycomics_data_loader.time_series_O_PMID32149347
df1 = pd.merge(df1, df2[['ID']], on = 'ID', how = 'inner')
df2 = pd.merge(df2, df1[['ID']], on = 'ID', how = 'inner')
df1 = df1.set_index(df1.columns.tolist()[0]).T.reset_index()
df2 = df2.set_index(df2.columns.tolist()[0]).T.reset_index()

corr, pval = get_SparCC(df1, df2, motifs = True, transform = "CLR")
sns.clustermap(corr)
You're working with an alpha of 0.04787928055709467 that has been adjusted for your sample size of 31.
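
The example above passes transform = "CLR". As a hedged illustration of what a centered log-ratio transformation does to one sample, the sketch below subtracts the mean log-abundance from every log-abundance. The function name clr and the pseudocount parameter are hypothetical, and the gamma-based scale uncertainty described in the arguments is deliberately left out, so this is not the library's implementation.

```python
import math


def clr(composition, pseudocount=1e-6):
    """Centered log-ratio transform of one compositional sample."""
    # A small pseudocount (an assumption of this sketch) avoids log(0)
    logs = [math.log(x + pseudocount) for x in composition]
    mean_log = sum(logs) / len(logs)
    # Centering makes the result independent of the sample's total signal
    return [v - mean_log for v in logs]


sample = [0.5, 0.3, 0.2]  # made-up relative abundances
transformed = clr(sample)
print(transformed)
```

Because the values are centered, they sum to zero within a sample, and multiplying all abundances by a constant leaves the result unchanged (exactly so when the pseudocount is zero).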


get_roc

 get_roc (df, group1, group2, motifs=False, feature_set=['known',
          'exhaustive'], paired=False, impute=True, min_samples=0.1,
          custom_motifs=[], transform=None, gamma=0.1, custom_scale=0,
          filepath='', multi_score=False)

*Calculates ROC AUC for every feature and, optionally, plots the ROC curve for the best feature

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
group1 (list): list of column indices or names for the first group of samples, usually the control
group2 (list): list of column indices or names for the second group of samples (note, if an empty list is provided, group 1 can be used as a list of group identifiers for each column - e.g., [1,1,2,2,3,3…])
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘known’, ‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features),
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False
impute (bool): whether to replace zeroes using a Random Forest based model; default:True
min_samples (float): fraction of samples that need to have non-zero values for a glycan to be kept; default: 0.1 (10%)
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1
custom_scale (float or dict): Ratio of total signal in group2/group1 for an informed scale model (or group_idx: mean(group)/min(mean(groups)) signal dict for multivariate)
filepath (string): absolute path including the full filename, which allows for saving the plot, if plot=True
multi_score (bool): whether to find the best glycan risk score, containing multiple glycan features; default:False
Returns:
Returns a sorted list of tuples of type (glycan, AUC score) and, optionally, ROC curve for best feature*
get_roc(test_df, group1 = [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39],
        group2 = [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40], motifs = True, paired = True)
[('GlcNAc6S(b1-6)GalNAc', np.float64(0.765)),
 ('Neu5Ac(a2-3)Gal', np.float64(0.685)),
 ('Neu5Ac', np.float64(0.66)),
 ('Oglycan_core1', np.float64(0.6325)),
 ('Gal', np.float64(0.61)),
 ('Gal(b1-3)GalNAc', np.float64(0.6)),
 ('GalNAc', np.float64(0.595)),
 ('Mucin_elongated_core2', np.float64(0.515)),
 ('Disialyl_T_antigen', np.float64(0.48000000000000004)),
 ('Neu5Ac(a2-6)GalNAc', np.float64(0.46)),
 ('Neu5Ac(a2-8)Neu5Ac', np.float64(0.36250000000000004)),
 ('Terminal_LacNAc_type2', np.float64(0.28)),
 ('H_type2', np.float64(0.26250000000000007)),
 ('GalOS(b1-3)GalNAc', np.float64(0.2375))]
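
For readers unfamiliar with the AUC scores listed above, the following pure-Python sketch shows one standard way to compute a ROC AUC for a single feature: as the fraction of (group1, group2) sample pairs in which the group2 value is higher, with ties counted as half. The function name roc_auc and the values are hypothetical, and treating group2 as the positive class is an assumption of this sketch rather than a statement about the library's internals.

```python
def roc_auc(scores_group1, scores_group2):
    """AUC as the probability that a group2 value exceeds a group1 value."""
    wins = 0.0
    for neg in scores_group1:
        for pos in scores_group2:
            if pos > neg:
                wins += 1.0   # group2 sample ranks higher
            elif pos == neg:
                wins += 0.5   # ties count as half a win
    return wins / (len(scores_group1) * len(scores_group2))


control = [0.10, 0.20, 0.30, 0.40]  # made-up feature values, group 1
case = [0.25, 0.35, 0.45, 0.50]     # made-up feature values, group 2
auc = roc_auc(control, case)
print(auc)  # 0.8125
```

Under this convention, a score near 0.5 means the feature does not separate the groups, and a score well below 0.5 means the feature tends to be lower in group 2.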


get_lectin_array

 get_lectin_array (df, group1, group2, paired=False, transform='')

*Function for analyzing lectin array data for two or more groups.

Arguments:
df (dataframe): dataframe containing samples as rows and lectins as columns [alternative: filepath to .csv or .xlsx]
group1 (list): list of indices or names for the first group of samples, usually the control
group2 (list): list of indices or names for the second group of samples (note, if an empty list is provided, group 1 can be used as a list of group identifiers for each column - e.g., [1,1,2,2,3,3…])
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False
transform (string): optional data-processing, “log2” transforms df with np.log2; default:nothing
Returns:
Returns an output dataframe with:
(i) Deduced glycan motifs altered between groups
(ii) human names for features identified in the motifs from (i)
(iii) Lectins supporting the change in (i)
(iv) Direction of the change (e.g., “up” means higher in group2)
(v) Score/Magnitude of the change (remember, if you have more than two groups this reports on any pairwise combination, like an ANOVA)
(vi) Clustering of the scores into highly/moderate/low significance findings*
lectin_df = lectin_array_data_loader.A549_influenza_PMID33046650
get_lectin_array(lectin_df, [5,6,7], [8,9,10])
Lectin "Ab-LeB-1" is not found in our annotated lectin library and is excluded from analysis.
Lectin "APA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "APP" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Blood Group B [CLCP-19B]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Blood Group H2" is not found in our annotated lectin library and is excluded from analysis.
Lectin "CA19-9 [121SLE]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "CCA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "CD15 [ICRF29-2]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "CD15 [MY-1]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "CD15 [SP-159]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Frossman" is not found in our annotated lectin library and is excluded from analysis.
Lectin "IAA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "IRA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Le X [P12]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Lewis A [7LE]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Lewis B [218]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Lewis Y [F3]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "LFA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "LPA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "MNA-M " is not found in our annotated lectin library and is excluded from analysis.
Lectin "MUC5Ac Ab" is not found in our annotated lectin library and is excluded from analysis.
Lectin "PMA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "PTA_1" is not found in our annotated lectin library and is excluded from analysis.
Lectin "PTA_2" is not found in our annotated lectin library and is excluded from analysis.
Lectin "SNA-S" is not found in our annotated lectin library and is excluded from analysis.
Lectin "SNA-V" is not found in our annotated lectin library and is excluded from analysis.
Lectin "VFA" is not found in our annotated lectin library and is excluded from analysis.
motif named_motifs lectin(s) change score significance
39 Neu5Ac(a2-6)Gal(b1-3)GlcNAc [Internal_LacNAc_type1] PSL, SNA, TJA-I, BDA, BPA, WGA_1, WGA_2 down 9.08 highly significant
38 Neu5Ac(a2-6)Gal(b1-4)GlcNAc [Internal_LacNAc_type2] PSL, SNA, TJA-I, BDA, BPA, ECA, RCA120, Ricin ... down 8.57 highly significant
7 Man(a1-2) [] ASA, Con A, CVN, HHL, SVN_1, GRFT, SVN_2, SNA-... up 4.87 moderately significant
14 Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc... [Chitobiose, Trimannosylcore, Terminal_LacNAc_... CA, CAA, DSA_1, DSA_2, DSA_3, AMA, BDA, BPA, C... up 3.57 moderately significant
4 Gal(b1-3)GalNAc [Oglycan_core1] ACA, AIA, MPA, PNA_1, PNA_2, BDA, BPA up 3.49 moderately significant
10 Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-4)][G... [Chitobiose, Trimannosylcore, Terminal_LacNAc_... Blackbean, Calsepa, PHA-E_1, PHA-E_2, AMA, BDA... up 2.77 moderately significant
16 Fuc(a1-2)Gal(b1-3)GalNAc(b1-4)[Neu5Ac(a2-3)]Ga... [Internal_LacNAc_type2, H_type3] Cholera Toxin, AAA, AAL, ACA, AIA, AOL, BDA, B... up 2.54 moderately significant
47 GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Ma... [Chitobiose, Trimannosylcore, core_fucose, Ngl... TL, AAL, AMA, AOL, Con A, GNA, GNL, HHL, LcH, ... up 2.50 moderately significant
15 Gal(b1-3)GalNAc(b1-4)[Neu5Ac(a2-3)]Gal(b1-4)Gl... [Internal_LacNAc_type2] Cholera Toxin, ACA, AIA, BDA, BPA, CSA, ECA, L... up 2.49 moderately significant
18 Man(a1-6) [] Con A, GNA, GNL, HHL, NPA, SNA-II, UDA up 2.35 moderately significant
17 Man(a1-3) [] Con A, GNA, GNL, HHL, NPA, SNA-II, UDA up 2.35 moderately significant
43 Neu5Ac(a2-6)GalNAc(b1-4)GlcNAc [Internal_LacdiNAc_type2] SNA, CSA, SBA, VVA_1, VVA_2, WFA, BPA, ECA, ST... down 2.30 moderately significant
22 Gal(b1-4)GlcNAc(b1-2)[Gal(b1-4)GlcNAc(b1-4)]Ma... [Chitobiose, Trimannosylcore, Terminal_LacNAc_... DSA_1, DSA_2, DSA_3, AMA, BDA, Blackbean, BPA,... up 2.17 moderately significant
46 Fuc(a1-2)Gal(b1-3)GalNAc [H_type3, Oglycan_core1] TJA-II, AAA, AAL, ACA, AIA, AOL, BDA, BPA, MPA... up 1.99 moderately significant
3 Fuc(a1-6) [] AAL, AOL, LcH, PSA up 1.82 moderately significant
6 Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc [Chitobiose, Trimannosylcore] AMA, Con A, GNA, GNL, HHL, NPA, SNA-II, UDA, W... up 1.62 moderately significant
34 Neu5Ac(a2-3)Gal(b1-3)GalNAc [Oglycan_core1] MAL-II, ACA, AIA, BDA, BPA, MPA, PNA_1, PNA_2,... up 1.60 moderately significant
11 GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)[GlcNAc(b1-6... [Chitobiose, Trimannosylcore, Nglycan_complex] Blackbean, PHA-L, AMA, Con A, GNA, GNL, HHL, N... up 1.54 moderately significant
42 GlcNAc(b1-2)[GlcNAc(b1-6)]Man(a1-6)[GlcNAc(b1-... [Chitobiose, Trimannosylcore, bisectingGlcNAc,... RPA, AMA, Blackbean, Con A, GNA, GNL, HHL, NPA... up 1.47 moderately significant
41 GlcNAc(b1-2)[GlcNAc(b1-4)]Man(a1-3)[GlcNAc(b1-... [Chitobiose, Trimannosylcore, bisectingGlcNAc,... RPA, AMA, Con A, GNA, GNL, HHL, NPA, SNA-II, U... up 1.40 moderately significant
23 Gal(b1-4)GlcNAc [Terminal_LacNAc_type2] ECA, RCA120, Ricin B Chain, SJA, BDA, BPA up 1.06 low significance
5 GlcNAc(b1-3)GalNAc [Oglycan_core3] AIA, UEA-II, WGA_1, WGA_2 up 0.86 low significance
26 Gal(a1-3) [] GS-I_1, GS-I_2, GS-I_3, GS-I_4, MNA-G, PA-IL up 0.83 low significance
27 Gal(a1-4) [] GS-I_1, GS-I_2, GS-I_3, GS-I_4, MNA-G, PA-IL up 0.83 low significance
30 Gal(b1-4)GlcNAc(b1-3) [Terminal_LacNAc_type2] LEA_1, LEA_2, STA, BDA, BPA, ECA, RCA120, Rici... up 0.55 low significance
25 Gal(a1-3)Gal [] EEA, EEL, MOA, GS-I_1, GS-I_2, GS-I_3, GS-I_4,... up 0.53 low significance
33 Neu5Ac(a2-3)Gal(b1-4)GlcNAc [Internal_LacNAc_type2] MAA_1, MAA_2, MAL-I, BDA, BPA, ECA, RCA120, Ri... up 0.50 low significance
37 Gal(a1-3)GalNAc [] MOA, EEA, EEL, GS-I_1, GS-I_2, GS-I_3, GS-I_4,... up 0.47 low significance
20 GalNAc(a1-4) [] GHA, HAA, HPA, CSA, GS-I_1, GS-I_2, GS-I_3, GS... up 0.41 low significance
19 GalNAc(a1-3) [] GHA, HAA, HPA, CSA, GS-I_1, GS-I_2, GS-I_3, GS... up 0.41 low significance
21 GalNAc(a1-3)GalNAc(b1-3) [] DBA, SBA, CSA, GHA, HAA, HPA, VVA_1, VVA_2, WF... up 0.27 low significance
24 GalNAc(b1-4)GlcNAc [Terminal_LacdiNAc_type2] ECA, STA, CSA, SBA, VVA_1, VVA_2, WFA, BPA, WG... up 0.21 low significance
44 Fuc(a1-2)Gal(b1-4)GalNAc(b1-3) [] SNA-II, AAA, AAL, AOL, BDA, BPA, CSA, SBA, VVA... up 0.19 low significance
13 GalNAc(b1-4) [] CSA, SBA, VVA_1, VVA_2, WFA, BPA, WGA_1, WGA_2 up 0.16 low significance
12 GalNAc(b1-3) [] CSA, SBA, VVA_1, VVA_2, WFA, BPA, WGA_1, WGA_2 up 0.16 low significance
40 Fuc(a1-2)Gal(b1-4)GlcNAc [H_type2, Internal_LacNAc_type2] PTL-II, TJA-II, UEA-I, UEA-II, AAA, AAL, AOL, ... up 0.14 low significance
32 Gal3S(b1-4)GlcNAc [] MAA_1, MAA_2, MAL-I, MAL-II down 0.12 low significance
28 GlcNAc(a1-3) [] HAA, HPA, WGA_1, WGA_2 up 0.12 low significance
29 GlcNAc(a1-4) [] HAA, HPA, WGA_1, WGA_2 up 0.12 low significance
0 Fuc(a1-2) [] AAA, AAL, AOL up 0.08 low significance
36 Gal3S(b1-4) [] MAL-II down 0.08 low significance
35 Gal3S(b1-3) [] MAL-II down 0.08 low significance
49 Fuc(a1-2)Gal(b1-4)GalNAc [] UEA-II, AAA, AAL, AOL, BDA, BPA up 0.07 low significance
9 Gal(b1-4) [] BDA, BPA up 0.06 low significance
8 Gal(b1-3) [] BDA, BPA up 0.06 low significance
2 Fuc(a1-4) [] AAL, AOL down 0.04 low significance
1 Fuc(a1-3) [] AAL, AOL, Lotus down 0.04 low significance
31 GlcNAc(b1-4)GlcNAc(b1-4) [Chitobiose] LEA_1, LEA_2, WGA_1, WGA_2 down 0.01 low significance
50 GlcNAc(b1-3) [] WGA_1, WGA_2 down 0.01 low significance
51 GlcNAc(b1-4) [] WGA_1, WGA_2 down 0.01 low significance
45 GlcNAc(b1-4)GlcNAc(b1-4)GlcNAc(b1-4) [Chitobiose] STA, LEA_1, LEA_2, WGA_1, WGA_2 down 0.00 low significance
48 GlcNAc(b1-3)Gal [] UEA-II, WGA_1, WGA_2 up 0.00 low significance
52 Neu5Ac(a2-3) [] WGA_1, WGA_2 down 0.00 low significance
53 Neu5Ac(a2-6) [] WGA_1, WGA_2 down 0.00 low significance
54 Neu5Ac(a2-8) [] WGA_1, WGA_2 down 0.00 low significance
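
The note on group2 describes an alternative calling convention for more than two groups: group2 is left empty and group1 carries one group identifier per sample. A minimal sketch of how such a label list maps onto samples is shown below; the helper split_by_group_labels and the sample names are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def split_by_group_labels(samples, labels):
    """Group sample identifiers by their per-sample group label."""
    if len(samples) != len(labels):
        raise ValueError("need exactly one group label per sample")
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return dict(groups)


sample_names = ["s1", "s2", "s3", "s4", "s5", "s6"]  # made-up sample names
group_labels = [1, 1, 2, 2, 3, 3]                    # as in the [1,1,2,2,3,3] note
grouped = split_by_group_labels(sample_names, group_labels)
print(grouped)
```

With labels like these, the reported score covers any pairwise combination of groups, as described in item (v) of the return values.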

get_glycoshift_per_site

 get_glycoshift_per_site (df, group1, group2, paired=False, impute=True,
                          min_samples=0.2, gamma=0.1, custom_scale=0)

*Calculates differentially expressed glycans or motifs from glycoproteomics data

Arguments:
df (dataframe): glycoproteomics dataset, expects first column to be formatted as protein_site_composition and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
group1 (list): list of column indices or names for the first group of samples, usually the control
group2 (list): list of column indices or names for the second group of samples
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False
impute (bool): whether to replace zeroes using a Random Forest based model; default:True
min_samples (float): fraction of samples that need to have non-zero values for a glycan to be kept; default: 0.2 (20%)
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1
custom_scale (float or dict): Ratio of total signal in group2/group1 for an informed scale model (or group_idx: mean(group)/min(mean(groups)) signal dict for multivariate)
Returns:
Returns a dataframe with:
(for each condition/interaction feature)
(i) Regression coefficient from the GLM (indicating direction of change in the treatment condition)
(ii) Corrected p-values (two-tailed t-test with two-stage Benjamini-Hochberg correction) for testing the coefficient against zero
(iii) Significance: True/False of whether the corrected p-value lies below the sample size-appropriate significance threshold*
df_milk = glycoproteomics_data_loader.human_milk_N_PMID34087070

get_glycoshift_per_site(df_milk, ['Colostrum1', 'Colostrum2', 'Colostrum3'], ['Mature1', 'Mature2', 'Mature3'])
You're working with an alpha of 0.07862467893233027 that has been adjusted for your sample size of 6.
Condition_coefficient Condition_corr_pval Condition_significant Hex_Condition_coefficient Hex_Condition_corr_pval Hex_Condition_significant HexNAc_Condition_coefficient HexNAc_Condition_corr_pval HexNAc_Condition_significant complex_Condition_coefficient ... Neu5Ac_Condition_significant dHex_Condition_coefficient dHex_Condition_corr_pval dHex_Condition_significant high_Man_Condition_coefficient high_Man_Condition_corr_pval high_Man_Condition_significant hybrid_Condition_coefficient hybrid_Condition_corr_pval hybrid_Condition_significant
sp|P10909|CLUS_103 -0.154462 6.132844e-267 True -0.772309 6.132844e-267 True -0.617847 6.132844e-267 True 4.615035 ... True -0.154462 5.913814e-267 True 0.000000 1.000000e+00 False -4.769497 0.000000e+00 True
sp|P01024|CO3_85 -12.526980 2.581301e-204 True 11.700922 1.551607e-205 True -25.053959 2.581301e-204 True 0.000000 ... False 0.000000 1.000000e+00 False -12.526980 6.084495e-204 True -12.526980 2.396922e-204 True
sp|P47710|CASA1_69 0.290159 5.449434e-31 True -1.271521 2.240400e-31 True 1.160635 5.449434e-31 True 0.000000 ... True 3.012474 5.268244e-32 True 0.000000 1.000000e+00 False 0.290159 3.795142e-31 True
sp|Q08380|LG3BP_125 0.001841 2.313991e-04 True 0.009204 2.313991e-04 True 0.007364 2.313991e-04 True 0.000000 ... True 0.001841 1.487566e-04 True 0.000000 1.000000e+00 False 0.001841 1.432471e-04 True
sp|P00709|LALBA_90 -1.353837 4.248180e-04 True 4.058993 4.323787e-03 True -5.415348 4.248180e-04 True -0.734612 ... True 6.595320 2.009174e-20 True 0.000000 1.000000e+00 False -0.619225 5.502628e-01 False
sp|Q13410|BT1A1_55 -16.641783 7.593800e-04 True -0.699019 3.161822e-01 False 15.942763 4.284923e-04 True -4.052916 ... True -6.082883 6.878363e-03 True 0.000000 1.000000e+00 False -12.588866 1.116992e-02 True
sp|P19652|A1AG2_HUMAN/sp|P02763|A1AG1 -0.000937 3.594687e-01 False -0.004685 2.981904e-01 False -0.003748 2.614318e-01 False -0.000937 ... False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False
sp|P07602|SAP_101 0.002402 3.594687e-01 False 0.012011 2.981904e-01 False 0.009609 2.614318e-01 False 0.000000 ... False 0.002402 3.081160e-01 False 0.000000 1.000000e+00 False 0.002402 2.967043e-01 False
sp|P06858|LIPL_70 -0.001105 3.975872e-01 False -0.005524 2.981904e-01 False -0.004419 2.752527e-01 False -0.001105 ... False -0.001105 3.450489e-01 False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False
sp|P01833|PIGR_421 5.832778 4.195389e-01 False -5.697931 9.086043e-03 True 5.339127 4.356248e-02 True 0.000000 ... False -1.795784 3.877522e-01 False 0.000000 1.000000e+00 False 5.832778 3.246432e-01 False
sp|P07602|SAP_215 -0.008382 4.586658e-01 False -0.016764 3.363550e-01 False -0.016764 3.485725e-01 False 0.000000 ... False -0.008382 3.877522e-01 False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False
sp|P01833|PIGR_186 -0.008637 4.917509e-01 False 0.032629 5.341361e-01 False -0.034550 3.688132e-01 False -0.084454 ... False 0.075817 4.506151e-01 False 0.000000 1.000000e+00 False 0.075817 4.776920e-01 False
sp|P00738|HPT_241 0.001039 5.165593e-01 False 0.005195 4.017684e-01 False 0.004156 4.017684e-01 False 0.001039 ... False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False
sp|P10909|CLUS_86 0.001587 5.165593e-01 False 0.007935 4.017684e-01 False 0.006348 4.017684e-01 False 0.000000 ... False 0.001587 4.506151e-01 False 0.000000 1.000000e+00 False 0.001587 4.790272e-01 False
sp|P02788|TRFL_497 0.345354 5.925908e-01 False 1.804335 3.415734e-01 False 1.381416 4.581199e-01 False -11.595332 ... False -0.304229 7.941176e-01 False 0.000000 1.000000e+00 False -14.451221 1.982223e-06 True
sp|P02788|TRFL_156 -4.029533 5.925908e-01 False 3.516740 2.958771e-01 False -4.899001 5.275917e-02 True -0.165193 ... False 4.741650 2.666697e-08 True 0.000000 1.000000e+00 False -3.864340 3.246432e-01 False
sp|P0C0L5|CO4B_HUMAN/sp|P0C0L4|CO4A 0.000646 5.925908e-01 False 0.005811 5.302128e-01 False 0.001291 4.797163e-01 False 0.000000 ... False 0.000000 1.000000e+00 False 0.000646 9.705882e-01 False 0.000646 5.502628e-01 False
sp|P01833|PIGR_469 -4.319860 7.288590e-01 False -2.844144 2.981904e-01 False 2.497897 3.485725e-01 False 16.293140 ... False 0.273577 7.941176e-01 False 0.000000 1.000000e+00 False 10.667323 7.917684e-03 True
sp|P01876|IGHA1_340 5.404028 7.288590e-01 False -0.747848 7.306030e-01 False -0.523444 6.448328e-01 False -6.757407 ... True -1.901116 4.506151e-01 False 3.007233 9.705882e-01 False -1.767393 7.124186e-01 False
sp|P10909|CLUS_374 -0.000942 7.288590e-01 False -0.004711 6.337904e-01 False -0.003769 6.337904e-01 False 0.000000 ... False -0.001884 7.809203e-01 False 0.000000 1.000000e+00 False -0.000942 7.124186e-01 False
sp|P01591|IGJ_71 2.236056 7.376554e-01 False -1.364109 2.981904e-01 False 0.576360 4.024670e-01 False -1.475300 ... False 0.527272 5.322575e-01 False 0.000000 1.000000e+00 False 1.397952 5.246436e-01 False
sp|P08571|CD14_151 -0.000794 7.376554e-01 False -0.004761 6.717008e-01 False -0.001587 6.448328e-01 False 0.000000 ... False 0.000000 1.000000e+00 False -0.000794 9.705882e-01 False -0.000794 7.128253e-01 False
sp|P01871|IGHM_46 0.000313 7.896314e-01 False 0.001567 7.306030e-01 False 0.001253 7.306030e-01 False 0.000000 ... False 0.000313 7.941176e-01 False 0.000000 1.000000e+00 False 0.000313 7.647059e-01 False
sp|P02765|FETUA_156 0.000313 7.896314e-01 False 0.001563 7.431825e-01 False 0.001251 7.657032e-01 False 0.000313 ... False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False
sp|P02749|APOH_253 0.000138 7.896314e-01 False 0.000690 7.807170e-01 False 0.000552 7.807170e-01 False 0.000138 ... False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False
sp|P10909|CLUS_291 0.000169 8.188730e-01 False 0.000843 8.188730e-01 False 0.000506 8.188730e-01 False 0.000000 ... False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False 0.000169 8.188730e-01 False
sp|P01833|PIGR_499 -2.354848 8.373113e-01 False -1.412526 6.076487e-01 False 3.270966 2.715501e-01 False -1.166694 ... False 3.649581 8.212144e-02 False 0.000000 1.000000e+00 False -2.378288 7.128253e-01 False
sp|P07602|SAP_426 0.000803 8.570498e-01 False 0.004013 8.570498e-01 False 0.001605 8.570498e-01 False 0.000000 ... False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False 0.000803 8.570498e-01 False
sp|P02790|HEMO_453 0.000150 8.857313e-01 False 0.000750 8.857313e-01 False 0.000600 8.857313e-01 False 0.000150 ... False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False 0.000000 1.000000e+00 False
sp|P25311|ZA2G_109 0.002373 8.858231e-01 False 0.011864 8.858231e-01 False 0.009491 8.858231e-01 False -0.079942 ... False 0.002373 8.858231e-01 False 0.000000 1.000000e+00 False 0.082314 8.203274e-01 False
sp|P0DOX2|IGA2_HUMAN/sp|P01877|IGHA2 0.687611 8.875324e-01 False 0.862922 5.341361e-01 False -0.427391 5.598894e-01 False -5.206276 ... False -1.255648 3.877522e-01 False -2.187064 9.705882e-01 False -2.559791 3.246432e-01 False
sp|P01011|AACT_106 0.001222 9.024359e-01 False 0.006108 9.024359e-01 False 0.004886 9.024359e-01 False -2.853751 ... True 2.856194 5.926258e-33 True 0.000000 1.000000e+00 False 2.854973 3.096444e-33 True
sp|P01877|IGHA2_327 -0.197039 9.317804e-01 False -0.332926 9.003150e-01 False -0.394079 9.317804e-01 False 0.000000 ... False 0.000000 1.000000e+00 False 4.492679 9.705882e-01 False -0.197039 9.317804e-01 False
sp|Q08431|MFGM_238 0.000435 9.985248e-01 False 0.115394 3.363550e-01 False -0.300049 2.614318e-01 False 0.000000 ... False 0.000000 1.000000e+00 False 0.207069 9.705882e-01 False 0.000435 9.985248e-01 False

34 rows × 24 columns
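
The min_samples argument controls which rows survive filtering before the analysis. The sketch below illustrates the idea in pure Python, assuming a row is kept when its fraction of non-zero samples is at least min_samples (whether the library uses "at least" or "more than" is an assumption here). The function name filter_by_min_samples and the row identifiers, which merely mimic the protein_site_composition format, are hypothetical.

```python
def filter_by_min_samples(abundances, min_samples=0.2):
    """Keep rows whose fraction of non-zero samples is at least min_samples."""
    kept = {}
    for name, values in abundances.items():
        nonzero_fraction = sum(1 for v in values if v != 0) / len(values)
        if nonzero_fraction >= min_samples:
            kept[name] = values
    return kept


# Made-up rows in a protein_site_composition-like format
data = {
    "ProtA_12_Hex5HexNAc4": [0.0, 1.2, 0.8, 0.0, 0.5],  # 3 of 5 non-zero
    "ProtA_12_Hex5HexNAc2": [0.0, 0.0, 0.0, 0.0, 0.0],  # never observed
    "ProtB_88_Hex6HexNAc5": [0.3, 0.0, 0.0, 0.4, 0.0],  # 2 of 5 non-zero
}
filtered = filter_by_min_samples(data)
print(sorted(filtered))
```

Rows that pass the filter can still contain zeroes, which is where the impute option comes in.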

annotate

extract curated motifs, graph features, and sequence features from glycan sequences


annotate_glycan

 annotate_glycan (glycan, motifs=None, termini_list=[], gmotifs=None)

*searches for known motifs in glycan sequence

Arguments:
glycan (string or networkx): glycan in IUPAC-condensed format (or as networkx graph) that has to contain a floating substituent
motifs (dataframe): dataframe of glycan motifs (name + sequence), can be used with a list of glycans too; default:motif_list
termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
gmotifs (networkx): precalculated motif graphs for speed-up; default:None
Returns:
Returns dataframe with counts of motifs in glycan*
annotate_glycan("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
motif_name Terminal_LewisX Internal_LewisX LewisY SialylLewisX SulfoSialylLewisX Terminal_LewisA Internal_LewisA LewisB SialylLewisA SulfoLewisA ... Arabinogalactan_type1 Galactomannan Tetraantennary_Nglycan Mucin_elongated_core2 Fucoidan Alginate FG XX Difucosylated_core GalFuc_core
Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc 0 1 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

1 rows × 156 columns


annotate_dataset

 annotate_dataset (glycans, motifs=None, feature_set=['known'],
                   termini_list=[], condense=False, custom_motifs=[])

*wrapper function to annotate motifs in list of glycans

Arguments:
glycans (list): list of IUPAC-condensed glycan sequences as strings
motifs (dataframe): dataframe of glycan motifs (name + sequence); default:motif_list
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
termini_list (list): list of monosaccharide/linkage positions for motifs (from ‘terminal’, ‘internal’, and ‘flexible’)
condense (bool): if True, throws away columns with only zeroes; default:False
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
Returns dataframe of glycans (rows) and presence/absence of known motifs (columns)*
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P']
print("Annotate Test")
out = annotate_dataset(glycans)
Annotate Test
motif_name Terminal_LewisX Internal_LewisX LewisY SialylLewisX SulfoSialylLewisX Terminal_LewisA Internal_LewisA LewisB SialylLewisA SulfoLewisA H_type2 H_type1 A_antigen B_antigen Galili_antigen GloboH Gb5 Gb4 Gb3 3SGb3 8DSGb3 3SGb4 8DSGb4 6DSGb4 3SGb5 8DSGb5 6DSGb5 6DSGb5_2 6SGb3 8DSGb3_2 6SGb4 8DSGb4_2 6SGb5 8DSGb5_2 66DSGb5 Forssman_antigen iGb3 I_antigen i_antigen PI_antigen Chitobiose Trimannosylcore Internal_LacNAc_type1 Terminal_LacNAc_type1 Internal_LacNAc_type2 Terminal_LacNAc_type2 Internal_LacdiNAc_type1 Terminal_LacdiNAc_type1 Internal_LacdiNAc_type2 Terminal_LacdiNAc_type2 bisectingGlcNAc VIM PolyLacNAc Ganglio_Series Lacto_Series(LewisC) NeoLacto_Series betaGlucan KeratanSulfate Hyluronan Mollu_series Arthro_series Cellulose_like Chondroitin_4S GPI_anchor Isoglobo_series LewisD Globo_series Sda SDA Muco_series Heparin Peptidoglycan Dermatansulfate CAD Lactosylceramide Lactotriaosylceramide LexLex GM3 H_type3 GM2 GM1 cisGM1 VIM2 GD3 GD1a GD2 GD1b SDLex Nglycolyl_GM2 Fuc_LN3 GT1b GD1 GD1a_2 LcGg4 GT3 Disialyl_T_antigen GT1a GT2 GT1c 2Fuc_GM1 GQ1c O_linked_mannose GT1aa GQ1b HNK1 GQ1ba O_mannose_Lex 2Fuc_GD1b Sialopentaosylceramide Sulfogangliotetraosylceramide B-GM1 GQ1aa bisSulfo-Lewis x para-Forssman core_fucose core_fucose(a1-3) GP1c B-GD1b GP1ca Isoglobotetraosylceramide polySia high_mannose Gala_series LPS_core Nglycan_complex Nglycan_complex2 Oglycan_core1 Oglycan_core2 Oglycan_core3 Oglycan_core4 Oglycan_core5 Oglycan_core6 Oglycan_core7 Xylogalacturonan Sialosylparagloboside LDNF OFuc Arabinogalactan_type2 EGF_repeat Nglycan_hybrid Arabinan Xyloglucan Acharan_Sulfate M3FX M3X 1-6betaGalactan Arabinogalactan_type1 Galactomannan Tetraantennary_Nglycan Mucin_elongated_core2 Fucoidan Alginate FG XX Difucosylated_core GalFuc_core
Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

quantify_motifs

 quantify_motifs (df, glycans, feature_set, custom_motifs=[],
                  remove_redundant=True)

*Extracts and quantifies motifs for a dataset

Arguments:
df (dataframe): dataframe containing relative abundances (each sample one column) [alternative: filepath to .csv or .xlsx]
glycans (list): glycans as IUPAC-condensed strings
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘exhaustive’,‘known’]; options are: ‘known’ (hand-crafted glycan features),
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
remove_redundant (bool): whether to remove redundant motifs via clean_up_heatmap; default:True
Returns:
Returns a pandas DataFrame of motif abundances; in the example output below, motifs are rows and samples are columns*
quantify_motifs(test_df.iloc[:, 1:], test_df.iloc[:, 0].values.tolist(), ['known', 'exhaustive'])
control_1 tumor_1 control_2 tumor_2 control_3 tumor_3 control_4 tumor_4 control_5 tumor_5 ... control_16 tumor_16 control_17 tumor_17 control_18 tumor_18 control_19 tumor_19 control_20 tumor_20
Neu5Ac(a2-8)Neu5Ac 0.084745 0.120050 0.388219 0.055402 0.279696 0.082135 0.369784 0.022555 0.080158 0.084913 ... 0.485839 0.629202 0.535171 0.637019 0.245015 0.127952 0.029853 0.022643 0.219166 0.331947
GalOS(b1-3)GalNAc 0.843710 1.185047 2.152084 0.687093 1.564450 0.381914 2.389590 0.533142 2.497482 0.338889 ... 2.066978 1.088630 1.462826 2.259636 1.687785 1.137672 0.024033 0.117449 1.972512 1.304717
H_type2 1.347737 0.892651 2.468405 1.810795 1.589162 0.449339 2.640132 0.572828 2.763890 0.737076 ... 1.070249 0.647786 1.440912 1.810304 1.722289 1.475260 4.847788 4.552496 0.480035 0.494123
GlcNAc6S(b1-6)GalNAc 2.707913 4.438043 6.198123 6.684838 1.478960 11.921934 0.892356 3.821469 4.605009 28.210391 ... 6.241593 11.157860 7.997660 4.916252 0.937290 15.269626 1.463159 0.565249 1.251077 2.680253
Terminal_LacNAc_type2 8.845085 10.063160 13.435501 28.834006 5.585973 11.359659 11.672584 21.193308 12.734919 28.597709 ... 10.883437 17.991155 21.166792 16.161351 11.909325 29.924308 12.820872 19.107379 8.802443 10.268911
Terminal_LacNAc_type2 52.982192 13.183951 24.413523 12.870782 9.555884 9.822266 12.628910 13.916662 26.569737 10.733867 ... 18.779972 12.157928 14.828507 20.879287 27.689619 10.734756 28.328965 37.870847 14.835019 8.910804
Disialyl_T_antigen 20.803836 36.895471 32.803297 20.401157 33.971366 30.150599 37.703636 24.728411 31.798990 15.989214 ... 46.337629 39.476930 39.087708 40.348217 35.791797 22.968160 11.026029 2.613718 44.676379 46.125360
Neu5Ac(a2-6)GalNAc 23.063482 39.304399 36.644881 22.263129 36.571122 31.229766 41.628644 26.256121 37.088978 17.054227 ... 50.675599 41.982557 42.829042 46.391984 38.682564 25.118814 11.540028 2.937334 47.171520 48.274238
Oglycan_core1 37.329013 75.567842 59.998893 57.608119 83.293693 78.436161 73.308916 64.356888 58.197862 60.329536 ... 68.269613 68.762287 62.541874 60.699726 58.713271 58.203265 58.826129 42.904325 74.390026 79.515568
Neu5Ac(a2-3)Gal 57.345927 94.670033 83.675402 103.574200 91.775344 106.231617 90.136699 98.461821 81.110136 117.087919 ... 97.928245 109.749014 101.760261 93.222423 86.403840 96.715461 80.029183 69.040921 95.565848 99.973512
Mucin_elongated_core2 61.827277 23.247111 37.849024 41.704788 15.141858 21.181925 24.301494 35.109970 39.304656 39.331576 ... 29.663409 30.149083 35.995300 37.040638 39.598944 40.659064 41.149838 56.978227 23.637462 19.179715
Neu5Ac 80.494155 134.094482 120.708503 125.892731 128.626161 137.543517 132.135127 124.740497 118.279272 134.227059 ... 149.089683 152.360772 145.124475 140.251427 125.331418 121.962226 91.599064 72.000898 142.956534 148.579697
Gal(b1-3)GalNAc 99.156290 98.814953 97.847916 99.312907 98.435550 99.618086 97.610410 99.466858 97.502518 99.661111 ... 97.933022 98.911370 98.537174 97.740364 98.312215 98.862328 99.975967 99.882551 98.027488 98.695283
GalNAc 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 ... 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000
Gal 163.691481 126.500106 141.895063 147.702533 115.056369 132.721945 122.804259 138.398297 141.412183 167.203077 ... 133.838024 140.218313 142.530133 139.697255 138.848449 154.791018 142.588964 157.426027 122.916027 120.555251

15 rows × 40 columns
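
The numbers above are consistent with weighting each glycan's motif counts by its relative abundance and summing per sample (for instance, GalNAc totals 100 in every sample). The following is a minimal sketch of that idea in plain Python, not the library's implementation; `weighted_motif_abundance` and all data in it are hypothetical.

```python
def weighted_motif_abundance(motif_counts, abundances):
    """Sum motif counts per sample, weighted by glycan relative abundance.

    motif_counts: {glycan: {motif: count}}          (hypothetical input shape)
    abundances:   {sample: {glycan: rel_abundance}} (hypothetical input shape)
    """
    result = {}
    for sample, per_glycan in abundances.items():
        totals = {}
        for glycan, abundance in per_glycan.items():
            for motif, count in motif_counts.get(glycan, {}).items():
                totals[motif] = totals.get(motif, 0.0) + count * abundance
        result[sample] = totals
    return result


# Made-up illustration data: two O-glycans, two samples.
motif_counts = {
    "Neu5Ac(a2-3)Gal(b1-3)GalNAc": {"Neu5Ac": 1, "Gal": 1, "GalNAc": 1},
    "Gal(b1-3)GalNAc": {"Gal": 1, "GalNAc": 1},
}
abundances = {
    "sample_1": {"Neu5Ac(a2-3)Gal(b1-3)GalNAc": 60.0, "Gal(b1-3)GalNAc": 40.0},
    "sample_2": {"Neu5Ac(a2-3)Gal(b1-3)GalNAc": 25.0, "Gal(b1-3)GalNAc": 75.0},
}

result = weighted_motif_abundance(motif_counts, abundances)
print(result["sample_1"])
```

Because a motif can occur more than once per glycan, weighted totals can exceed 100 even when the input abundances sum to 100.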


get_k_saccharides

 get_k_saccharides (glycans, size=2, up_to=False, just_motifs=False,
                    terminal=False)

*function to retrieve k-saccharides (default:disaccharides) occurring in a list of glycans

Arguments:
glycans (list): list of glycans in IUPAC-condensed nomenclature
size (int): number of monosaccharides per k-saccharide, default:2 (for disaccharides)
up_to (bool): in theory: include k-saccharides up to size k; in practice: include monosaccharides; default:False
just_motifs (bool): if you only want the motifs as a nested list, no dataframe with counts; default:False
terminal (bool): whether to only count terminal subgraphs; default:False
Returns:
Returns dataframe with k-saccharide counts (columns) for each glycan (rows)*
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P']
out = get_k_saccharides(glycans, size = 3)
  GalNAc(a1-4)GlcNAcA(a1-4)Kdo GlcN(b1-7)Kdo(a2-5)Kdo GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAcA(a1-4)Kdo(a2-5)Kdo GlcNAcA(a1-4)[GlcN(b1-7)]Kdo Kdo(a2-4)Kdo(a2-6)GlcN4P Kdo(a2-5)Kdo(a2-6)GlcN4P Kdo(a2-5)[Kdo(a2-4)]Kdo Kdo(a2-6)GlcN4P(b1-6)GlcN4P Kdo(a2-?)Kdo(a2-?)GlcN4P Man(a1-2)Man(a1-2)Man Man(a1-2)Man(a1-3)Man Man(a1-3)Man(a1-6)Man Man(a1-3)Man(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man Man(a1-3)[Xyl(b1-2)]Man Man(a1-6)Man(b1-4)GlcNAc Man(a1-6)[Xyl(b1-2)]Man Man(a1-?)Man(a1-?)Man Man(a1-?)Man(b1-?)GlcNAc Man(a1-?)[Xyl(b1-?)]Man Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)Man(b1-4)GlcNAc
0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 2 2 1 1
1 0 0 0 0 0 0 0 0 0 0 1 1 2 1 0 0 1 0 4 2 0 1 0
2 1 1 0 1 1 1 1 1 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0

get_terminal_structures

 get_terminal_structures (glycan, size=1)

*returns terminal structures from all non-reducing ends (monosaccharide+linkage)

Arguments:
glycan (string or networkx): glycan in IUPAC-condensed nomenclature or as networkx graph
size (int): how large the extracted motif should be in terms of monosaccharides (for now 1 or 2 are supported); default:1
Returns:
Returns a list of terminal structures (strings)*
get_terminal_structures("Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc")
['Neu5Ac(a2-3)', 'Neu5Ac(a2-6)']
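
In IUPAC-condensed notation, a residue sits at a non-reducing end when it opens the string or opens a bracketed branch. The sketch below uses that observation to recover the size-1 result shown above with a regular expression. It is an illustration only: `terminal_residues` is a hypothetical helper, and it does not handle floating substituents in curly braces or size=2.

```python
import re


def terminal_residues(glycan):
    """Return 'monosaccharide(linkage)' for every non-reducing end.

    A terminal residue either starts the string or follows an opening '['.
    """
    return re.findall(r"(?:^|\[)([^\[\]()]+\([^()]*\))", glycan)


glycan = ("Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)"
          "[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]"
          "Man(b1-4)GlcNAc(b1-4)GlcNAc")
terminals = terminal_residues(glycan)
print(terminals)  # ['Neu5Ac(a2-3)', 'Neu5Ac(a2-6)']
```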

get_molecular_properties

 get_molecular_properties (glycan_list, verbose=False, placeholder=False)

*given a list of glycans, uses pubchempy to return various molecular parameters retrieved from PubChem

Arguments:
glycan_list (list): list of glycans in IUPAC-condensed
verbose (bool): set True to print SMILES not found on PubChem; default:False
placeholder (bool): whether failed requests should return dummy values or be dropped; default:False
Returns:
Returns a dataframe with all the molecular parameters retrieved from PubChem*
out = get_molecular_properties(["Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"])
  h_bond_acceptor_count molecular_weight atom_stereo_count rotatable_bond_count undefined_bond_stereo_count complexity defined_atom_stereo_count exact_mass h_bond_donor_count xlogp tpsa undefined_atom_stereo_count monoisotopic_mass isotope_atom_count defined_bond_stereo_count charge covalent_unit_count heavy_atom_count bond_stereo_count
Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc 62 2224.0 57 43 0 4410 56 2222.7830048 39 -23.600000 1070 1 2222.7830048 0 0 0 1 152 0

graph

converts glycan sequences to graphs and contains helper functions to search for motifs, check whether two sequences describe the same glycan, etc.


glycan_to_nxGraph

 glycan_to_nxGraph (glycan, libr=None, termini='ignore',
                    termini_list=None)

*wrapper for converting glycans into networkx graphs; also works with floating substituents

Arguments:
glycan (string): glycan in IUPAC-condensed format
libr (dict): dictionary of form glycoletter:index
termini (string): whether to encode terminal/internal position of monosaccharides, ‘ignore’ for skipping, ‘calc’ for automatic annotation, or ‘provided’ if this information is provided in termini_list; default:‘ignore’
termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
Returns:
Returns networkx graph object of glycan*
print('Glycan to networkx Graph (only edges printed)')
print(glycan_to_nxGraph('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc').edges())
Glycan to networkx Graph (only edges printed)
[(0, 1), (1, 4), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 10), (8, 9), (9, 10)]
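
The edge list above suggests that monosaccharides and linkages are both nodes, numbered in the order they appear in the string, with each linkage connecting a residue to its parent. A minimal stdlib sketch that reproduces this edge list is given below. It is an assumption about the layout, not the library's parser; `glycan_to_edges` is a hypothetical name, and it ignores `libr`, termini, and floating substituents.

```python
import re

# Tokens: '[' , ']' , a linkage in parentheses, or a monosaccharide name.
TOKEN = re.compile(r"\[|\]|\(([^()]*)\)|([^\[\]()]+)")


def glycan_to_edges(glycan):
    """Return (node labels, sorted edge list) for an IUPAC-condensed glycan."""
    labels, edges = [], []
    pending, stack = [], []  # linkage nodes still waiting for their parent residue
    last_mono = None
    for match in TOKEN.finditer(glycan):
        token = match.group(0)
        if token == "[":
            stack.append(pending)  # park the main chain while reading a branch
            pending = []
        elif token == "]":
            pending = stack.pop() + pending  # branch joins the same parent
        elif match.group(1) is not None:  # linkage, e.g. 'a1-3'
            labels.append(match.group(1))
            node = len(labels) - 1
            edges.append((last_mono, node))
            pending.append(node)
        else:  # monosaccharide
            labels.append(token)
            node = len(labels) - 1
            edges.extend((link, node) for link in pending)
            pending = []
            last_mono = node
    return labels, sorted(edges)


labels, edges = glycan_to_edges(
    "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
print(edges)
```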

graph_to_string

 graph_to_string (graph)

*converts glycan graph back to IUPAC-condensed format

Assumptions: 1. The root node is the one with the highest index.

Arguments:
graph (networkx object): glycan graph
Returns:
Returns glycan in IUPAC-condensed format (string)*
graph_to_string(glycan_to_nxGraph('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'

compare_glycans

 compare_glycans (glycan_a, glycan_b)

*returns True if glycans are the same and False if not

Arguments:
glycan_a (string or networkx object): glycan in IUPAC-condensed format or as a precomputed networkx object
glycan_b (string or networkx object): glycan in IUPAC-condensed format or as a precomputed networkx object
Returns:
Returns True if two glycans are the same and False if not*
print("Graph Isomorphism Test")
print(compare_glycans('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
                      'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
Graph Isomorphism Test
True
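
The two strings above differ only in the order in which the branches are written. One way to see why they count as the same glycan is to parse each into a rooted tree and sort the branches before comparing. The sketch below does that with exact label matching. It is a simplified stand-in for the graph isomorphism test named in the printed label; `parse_tree`, `canonical`, and `same_glycan` are hypothetical names, and wildcards such as `?` are not treated specially.

```python
import re

TOKEN = re.compile(r"\[|\]|\(([^()]*)\)|([^\[\]()]+)")


def parse_tree(glycan):
    """Return (monosaccharide, [(linkage, subtree), ...]) rooted at the reducing end."""
    pending, stack = [], []  # (linkage, subtree) pairs waiting for their parent
    current = None
    for match in TOKEN.finditer(glycan):
        token = match.group(0)
        if token == "[":
            stack.append(pending)
            pending = []
        elif token == "]":
            pending = stack.pop() + pending
        elif match.group(1) is not None:  # linkage closes the current subtree
            pending.append((match.group(1), current))
        else:  # monosaccharide adopts every pending subtree as a child
            current = (token, pending)
            pending = []
    return current


def canonical(node):
    """Order-independent form: children are sorted recursively."""
    name, children = node
    return (name, tuple(sorted((link, canonical(child)) for link, child in children)))


def same_glycan(glycan_a, glycan_b):
    return canonical(parse_tree(glycan_a)) == canonical(parse_tree(glycan_b))


print(same_glycan("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
                  "Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc"))
```

The same canonical form could serve as a dictionary key when collapsing differently written copies of one glycan.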

handle_negation.<locals>.wrapper

 handle_negation.<locals>.wrapper (glycan, motif, *args, **kwargs)
print("Subgraph Isomorphism Test")
print(subgraph_isomorphism('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
                           'Fuc(a1-6)GlcNAc'))
Subgraph Isomorphism Test
True

generate_graph_features

 generate_graph_features (glycan, glycan_graph=True, label='network')

*compute graph features of glycan

Arguments:
glycan (string or networkx object): glycan in IUPAC-condensed format (or glycan network if glycan_graph=False)
glycan_graph (bool): True expects a glycan, False expects a network (from construct_network); default:True
label (string): Label to place in output dataframe if glycan_graph=False; default:‘network’
Returns:
Returns a pandas dataframe with different graph features as columns and glycan as row*
generate_graph_features("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
diameter branching nbrLeaves avgDeg varDeg maxDeg nbrDeg4 max_deg_leaves mean_deg_leaves deg_assort ... flow_edgeMax flow_edgeMin flow_edgeAvg flow_edgeVar secorderMax secorderMin secorderAvg secorderVar egap entropyStation
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc 8 1 3 1.818182 0.330579 3.0 0 3.0 3.0 -1.850372e-15 ... 0.333333 0.111111 0.217778 0.007289 45.607017 20.736441 31.679285 62.422895 0.060159 -2.374318

1 rows × 49 columns
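
A few of the columns above can be checked by hand from the edge list printed in the glycan_to_nxGraph example for the same glycan. The sketch below recomputes avgDeg, varDeg, maxDeg, nbrLeaves, and diameter with the standard library. The definitions used here (population variance, leaves as degree-1 nodes, diameter as the longest shortest path) are assumptions chosen because they reproduce the displayed values; the remaining columns are not covered.

```python
from collections import deque

# Edge list as printed for
# 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
edges = [(0, 1), (1, 4), (2, 3), (3, 4), (4, 5),
         (5, 6), (6, 7), (7, 10), (8, 9), (9, 10)]


def degree_features(edges):
    """Degree statistics over all nodes (monosaccharides and linkages)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    values = list(degree.values())
    mean = sum(values) / len(values)
    variance = sum((d - mean) ** 2 for d in values) / len(values)
    return {"avgDeg": mean, "varDeg": variance, "maxDeg": max(values),
            "nbrLeaves": sum(d == 1 for d in values)}


def diameter(edges):
    """Longest shortest path, found by a breadth-first search from every node."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    longest = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        longest = max(longest, max(dist.values()))
    return longest


features = degree_features(edges)
features["diameter"] = diameter(edges)
print(features)
```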


largest_subgraph

 largest_subgraph (glycan_a, glycan_b)

*find the largest common subgraph of two glycans

Arguments:
glycan_a (string or networkx): glycan in IUPAC-condensed format or as networkx graph
glycan_b (string or networkx): glycan in IUPAC-condensed format or as networkx graph
Returns:
Returns the largest common subgraph as a string in IUPAC-condensed; returns empty string if there is no common subgraph*
glycan1 = 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
glycan2 = 'Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
largest_subgraph(glycan1, glycan2)
'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'

ensure_graph

 ensure_graph (glycan, **kwargs)

*ensures function compatibility with string glycans and graph glycans

Arguments:
glycan (string or networkx graph): glycan in IUPAC-condensed format or as a networkx graph
**kwargs: keyword arguments that are directly passed on to glycan_to_nxGraph
Returns:
Returns networkx graph object of glycan*
ensure_graph("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
<networkx.classes.graph.Graph>

get_possible_topologies

 get_possible_topologies (glycan, exhaustive=False,
                          allowed_disaccharides=None,
                          modification_map={'6S': {'GlcNAc', 'Gal'}, '3S':
                          {'Gal'}, '4S': {'GalNAc'}, 'OS': {'GalNAc',
                          'GlcNAc', 'Gal'}})

*creates possible glycans given a floating substituent; only works with max one floating substituent

Arguments:
glycan (string or networkx): glycan in IUPAC-condensed format or as networkx graph
exhaustive (bool): whether to also allow additions at internal positions; default:False
allowed_disaccharides (set): disaccharides that are permitted when creating possible glycans; default:not used
Returns:
Returns list of NetworkX-like glycan graphs of possible topologies*

possible_topology_check

 possible_topology_check (glycan, glycans, exhaustive=False, **kwargs)

*checks whether glycan with floating substituent could match glycans from a list; only works with max one floating substituent

Arguments:
glycan (string or networkx): glycan in IUPAC-condensed format (or as networkx graph) that has to contain a floating substituent
glycans (list): list of glycans in IUPAC-condensed format (or networkx graphs; should not contain floating substituents)
exhaustive (bool): whether to also allow additions at internal positions; default:False
**kwargs: keyword arguments that are directly passed on to compare_glycans
Returns:
Returns list of glycans that could match input glycan*
possible_topology_check("{Neu5Ac(a2-3)}Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc",
                       ["Fuc(a1-2)Gal(b1-3)GalNAc", "Neu5Ac(a2-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc",
                       "Neu5Ac(a2-6)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc"])
['Neu5Ac(a2-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc']
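
With exhaustive=False, a floating substituent is only placed at non-reducing ends. The sketch below enumerates those placements at the string level for a single substituent written in curly braces. It is a rough illustration, not the library's logic: `candidate_topologies` is a hypothetical helper, it returns strings rather than graphs, and it ignores `allowed_disaccharides` and `modification_map`.

```python
import re


def candidate_topologies(glycan):
    """Attach one floating substituent '{X(linkage)}' to each non-reducing end."""
    match = re.fullmatch(r"\{([^{}]+)\}(.+)", glycan)
    if match is None:
        raise ValueError("expected exactly one leading floating substituent")
    substituent, core = match.group(1), match.group(2)
    # Terminal residues start the string or directly follow an opening '['.
    positions = [0] + [i + 1 for i, char in enumerate(core) if char == "["]
    return [core[:p] + substituent + core[p:] for p in positions]


candidates = candidate_topologies(
    "{Neu5Ac(a2-3)}Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc")
for candidate in candidates:
    print(candidate)
```

The second candidate is the glycan returned in the example above, written with its branches in the other order, so matching candidates against a list still needs an order-independent comparison.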

deduplicate_glycans

 deduplicate_glycans (glycans)

*removes duplicate glycans from a list/set, even if they have different strings

Arguments:
glycans (list or set): glycans in IUPAC-condensed format
Returns:
Returns deduplicated list of glycans*
deduplicate_glycans(["Fuc(a1-2)Gal(b1-3)GalNAc", "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Neu5Ac(a2-3)Gal(b1-3)]GalNAc",
                     "Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)]GalNAc", "Neu5Ac(a2-6)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc"])
['Neu5Ac(a2-6)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc',
 'Fuc(a1-2)Gal(b1-3)GalNAc',
 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Neu5Ac(a2-3)Gal(b1-3)]GalNAc']

processing

process IUPAC-condensed glycan sequences into glycoletters etc.


min_process_glycans

 min_process_glycans (glycan_list)

*converts list of glycans into a nested list of glycoletters

Arguments:
glycan_list (list): list of glycans in IUPAC-condensed format as strings
Returns:
Returns list of glycoletter lists*
min_process_glycans(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
                     'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
[['Man', 'a1-3', 'Man', 'a1-6', 'Man', 'b1-4', 'GlcNAc', 'b1-4', 'GlcNAc'],
 ['Man',
  'a1-2',
  'Man',
  'a1-3',
  'Man',
  'a1-6',
  'Man',
  'b1-4',
  'GlcNAc',
  'b1-4',
  'GlcNAc']]
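
The output above is what remains once brackets and parentheses are treated as separators. A minimal regex sketch that gives the same nested lists for these two inputs follows; `to_glycoletters` is a hypothetical name and the library may treat special cases differently.

```python
import re


def to_glycoletters(glycan):
    """Split an IUPAC-condensed string into monosaccharides and linkages."""
    return re.findall(r"[^\[\]()]+", glycan)


glycans = ["Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc",
           "Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"]
processed = [to_glycoletters(glycan) for glycan in glycans]
print(processed)
```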

get_lib

 get_lib (glycan_list)

*returns dictionary of form glycoletter:index

Arguments:
glycan_list (list): list of IUPAC-condensed glycan sequences as strings
Returns:
Returns dictionary of form glycoletter:index*
get_lib(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
                     'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
{'GlcNAc': 0, 'Man': 1, 'a1-2': 2, 'a1-3': 3, 'a1-6': 4, 'b1-4': 5}

expand_lib

 expand_lib (libr, glycan_list)

*updates libr with newly introduced glycoletters

Arguments:
libr (dict): dictionary of form glycoletter:index
glycan_list (list): list of IUPAC-condensed glycan sequences as strings
Returns:
Returns new lib*
lib1 = get_lib(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
                     'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
lib2 = expand_lib(lib1, ['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'])
lib2
{'GlcNAc': 0, 'Man': 1, 'a1-2': 2, 'a1-3': 3, 'a1-6': 4, 'b1-4': 5, 'Fuc': 6}
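
The two dictionaries above look like a sorted vocabulary of glycoletters, with unseen glycoletters appended at the next free index on expansion. The sketch below reproduces both outputs under that assumption; `build_lib` and `extend_lib` are hypothetical stand-ins, not the library's functions.

```python
import re


def glycoletters(glycan):
    return re.findall(r"[^\[\]()]+", glycan)


def build_lib(glycans):
    """Map each distinct glycoletter to its index in sorted order."""
    letters = sorted({letter for glycan in glycans for letter in glycoletters(glycan)})
    return {letter: index for index, letter in enumerate(letters)}


def extend_lib(lib, glycans):
    """Return a copy of lib with unseen glycoletters appended at the end."""
    new_lib = dict(lib)
    unseen = sorted({letter for glycan in glycans for letter in glycoletters(glycan)})
    for letter in unseen:
        if letter not in new_lib:
            new_lib[letter] = len(new_lib)
    return new_lib


lib1 = build_lib(["Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc",
                  "Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"])
lib2 = extend_lib(lib1, ["Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc"])
print(lib2)
```

Appending rather than re-sorting keeps existing indices stable, so anything already encoded with the old dictionary stays valid.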

presence_to_matrix

 presence_to_matrix (df, glycan_col_name='glycan',
                     label_col_name='Species')

*converts a dataframe such as df_species to absence/presence matrix

Arguments:
df (dataframe): dataframe with glycan occurrence, rows are glycan-label pairs
glycan_col_name (string): column name under which glycans are stored; default:glycan
label_col_name (string): column name under which labels are stored; default:Species
Returns:
Returns pandas dataframe with labels as rows and glycan occurrences as columns*
out = presence_to_matrix(df_species[df_species.Order == 'Fabales'].reset_index(drop = True),
                         label_col_name = 'Family')
glycan Apif(a1-2)Xyl(b1-2)[Glc6Ac(b1-4)]Glc Ara(a1-2)Ara(a1-6)GlcNAc Ara(a1-2)Glc(b1-2)Ara Ara(a1-2)GlcA Ara(a1-2)[Glc(b1-6)]Glc Ara(a1-6)Glc Araf(a1-3)Araf(a1-5)[Araf(a1-6)Gal(b1-6)Glc(b1-6)Man(a1-3)]Araf(a1-5)Araf(a1-3)Araf(a1-3)Araf Araf(a1-3)Gal(b1-6)Gal D-Apif(b1-2)Glc D-Apif(b1-2)GlcA D-Apif(b1-3)Xyl(b1-2)[Glc6Ac(b1-4)]Glc D-Apif(b1-3)Xyl(b1-4)Rha(a1-2)Ara D-Apif(b1-3)Xyl(b1-4)Rha(a1-2)D-Fuc D-Apif(b1-3)Xyl(b1-4)[Glc(b1-3)]Rha(a1-2)D-Fuc D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-2)D-Fuc D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-2)[Rha(a1-3)]D-Fuc D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-3)D-Fuc D-Apif(b1-6)Glc D-ApifOMe(b1-3)XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe D-ApifOMe(b1-3)XylOMe(b1-4)[GlcOMe(b1-3)]RhaOMe(a1-2)D-FucOMe Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc4Ac6Ac(b1-3)]Glc Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc4Ac6Ac(b1-3)]Glc6Ac Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc6Ac(b1-3)]Glc Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)Glc3Ac6Ac Fruf(b2-1)Glc4Ac6Ac Fruf(b2-1)Glc6Ac Fruf(b2-1)[Glc(b1-2)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-3)Glc(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc6Ac(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-2)][Glc6Ac(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-4)Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)[Glc3Ac(b1-2)]Glc Fruf(b2-1)[Glc6Ac(b1-2)]Glc Fruf1Ac(b2-1)Glc2Ac4Ac6Ac Fuc(a1-2)Gal(b1-2)Xyl(a1-6)Glc Fuc(a1-2)Gal(b1-2)Xyl(a1-6)Glc(b1-4)Glc Fuc(a1-2)Gal(b1-2)Xyl(a1-6)[Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)]Glc(b1-4)Glc Fuc(a1-2)Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)Glc Fuc(a1-2)Gal(b1-4)Xyl Fuc(a1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Fuc(a1-6)GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 
Gal(?1-?)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Man(a1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(?1-?)[Gal(?1-?)]GlcNAc(?1-?)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(?1-?)Man(a1-3)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(a1-4)Gal Gal(a1-6)Gal Gal(a1-6)Gal(a1-6)Gal Gal(a1-6)Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Gal(a1-6)Gal(a1-6)Gal(a1-6)[Fruf(b2-1)]Glc Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Gal(a1-6)Glc Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Man Gal(a1-6)Man(b1-4)Man Gal(a1-6)Man(b1-4)Man(b1-4)Man(b1-4)Man Gal(a1-6)Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Gal(a1-6)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Gal(a1-6)Man(b1-4)[Gal(a1-6)]Man Gal(b1-2)GlcA Gal(b1-2)GlcA6Me Gal(b1-2)Xyl(a1-6)Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Gal(b1-2)[Xyl(b1-3)]GlcA Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[Gal(b1-3)GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 
Gal(b1-3)GlcNAc(b1-4)Man(a1-6)[GlcNAc(b1-4)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[Gal(b1-3)GlcNAc(b1-2)Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[GlcNAc(b1-2)Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-6)]GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-4)Gal(b1-4)Man Gal(b1-4)Gal(b1-4)ManOMe Gal(b1-4)GlcA Gal(b1-4)GlcNAc(b1-2)[Gal(b1-4)GlcNAc(b1-4)]Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)[Gal(b1-4)GlcNAc(b1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Gal(b1-4)Man(b1-4)Man Gal(b1-4)Man(b1-4)Man(b1-4)Gal Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc1CoumOMe Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc1FerOMe Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-Fuc Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-Fuc1CoumOMe Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-FucOMeOSin Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)D-Fuc Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)D-Fuc1CoumOMe Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)[Rha(a1-3)]D-Fuc 
Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)[Rha(a1-3)]D-Fuc1CoumOMe Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GalA(a1-2)[Araf(a1-5)Araf(a1-4)]Rha(b1-4)GalA GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-2)Rha(a1-4)GalA(a1-2)Rha(a1-4)GalA(a1-2)GalA GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA GalOMe(b1-2)[XylOMe(b1-3)]GlcAOMe GalOMe(b1-4)XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe GalOMe(b1-4)XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-3)]D-FucOMe GalOMe(b1-4)XylOMe(b1-4)[D-ApifOMe(b1-3)]RhaOMe(a1-2)[RhaOMe(a1-3)]D-FucOMe Galf(b1-2)[Galf(b1-4)]Man Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Glc(a1-2)Rha(a1-6)Glc Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-4)Glc(a1-2)Rha(a1-6)Glc Glc(a1-4)Glc(a1-4)Glc(a1-6)Glc Glc(a1-4)Glc(a1-4)GlcA Glc(a1-4)GlcA(b1-2)GlcA Glc(b1-2)Ara Glc(b1-2)Ara(a1-2)GlcA Glc(b1-2)Gal(b1-2)Gal(b1-2)GlcA Glc(b1-2)Gal(b1-2)GlcA Glc(b1-2)Gal(b1-2)GlcA(b1-3)[Glc(b1-3)]Ara Glc(b1-2)Glc Glc(b1-2)Glc(a1-2)FrufOBzOCin Glc(b1-2)Glc(b1-2)Glc Glc(b1-2)GlcA Glc(b1-2)[Ara(a1-3)]GlcA6Me Glc(b1-2)[Ara(a1-3)]GlcAOMe Glc(b1-2)[Ara(a1-6)]Glc Glc(b1-2)[Glc(b1-3)]Glc(a1-2)Fruf Glc(b1-2)[Glc(b1-3)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-2)[Glc6Ac(b1-3)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-2)[Rha(a1-3)]GlcA Glc(b1-2)[Xyl(b1-2)Ara(a1-6)]Glc 
Glc(b1-2)[Xyl(b1-2)D-Fuc(b1-6)]Glc Glc(b1-3)Ara Glc(b1-3)Glc Glc(b1-3)Glc(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)][Rha(a1-4)]Glc1Coum6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)][Rha(a1-4)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Rha1Fer(a1-4)Fruf(b2-1)GlcOBz Glc(b1-3)[Araf(a1-4)]Rha(a1-2)Glc Glc(b1-3)[Xyl(b1-4)]Rha(a1-2)D-FucOMe Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc(b1-4)Glc Glc(b1-4)Glc(b1-4)Glc(b1-4)Man Glc(b1-4)Glc6Ac(b1-3)Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Man(b1-4)Glc Glc(b1-4)Rha Glc(b1-4)Rha1Fer(a1-4)Fruf(b2-1)GlcOBz Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Glc(b1-6)Glc(b1-3)Glc Glc1Cer Glc2Ac(b1-4)[D-Apif(b1-3)Xyl(b1-2)]Glc Glc2Ac3Ac4Ac6Ac(b1-3)Ara Glc6Ac(b1-2)Glc(a1-2)FrufOBzOCin Glc6Ac(b1-3)Glc6Ac(b1-3)[Glc6Ac(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOAcOBz Glc6Ac(b1-3)Glc6Ac(b1-3)[Glc6Ac(b1-2)][RhaOAc(a1-4)]Glc1Fer6Ac(a1-2)Fruf1CoumOAcOBz Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum(a1-2)Fruf1CoumOBz Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1CoumOBz Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz GlcA(b1-2)Glc 
GlcA(b1-2)GlcA GlcA(b1-2)GlcA(b1-2)Rha GlcA4Me(a1-2)[Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)]Xyl GlcA4Me(a1-2)[Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)]Xyl GlcA4Me(a1-2)[Xyl(b1-4)]Xyl GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Gal(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-?)[Xyl(b1-2)][Man(a1-?)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-?)[Xyl(b1-2)][Man(a1-?)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 
GlcNAc(b1-?)Man(a1-3)[GlcNAc(b1-?)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcOMe(b1-3)[XylOMe(b1-4)]RhaOMe(a1-2)D-FucOMe Glcf(b1-2)Xyl(b1-4)Rha(b1-4)[Xyl(b1-3)]Xyl Hexf(?1-?)Xyl(b1-4)Rha(b1-4)[Xyl(a1-3)]Xyl L-Lyx(a1-2)Ara(a1-2)GlcA Lyx(a1-2)Ara(a1-2)GlcA Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-6)[Man(a1-2)Man(a1-3)]Man(a1-3)[Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)[Man(a1-6)]Man(a1-3)[Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc Man(a1-2)Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN 
Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-2)Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-3)[Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)[Man(a1-3)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(a1-6)Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-3)[Man(a1-3)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-6)][Xylf(a1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 
Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc-ol Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc-ol Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAcN Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]Hex Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)ManNAc Man(a1-3)[Xylf(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(a1-6)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-?)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc Man(a1-?)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-?)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-?)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(b1-2)Man Man(b1-4)Gal(b1-4)Gal(b1-4)Man Man(b1-4)Gal(b1-4)Gal(b1-4)ManOMe Man(b1-4)Man Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-3)Gal(a1-3)Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man Man(b1-4)[Gal(a1-6)]Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man Man(b1-6)Glc Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc 
Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)[Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-4)]Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Rha(a1-2)Ara Rha(a1-2)Ara(a1-2)GlcA Rha(a1-2)Ara(a1-2)GlcA6Me Rha(a1-2)Ara(a1-2)GlcAOMe Rha(a1-2)D-Ara(b1-2)GlcA Rha(a1-2)Gal(b1-2)Glc Rha(a1-2)Gal(b1-2)GlcA Rha(a1-2)Gal(b1-2)GlcA6Me Rha(a1-2)Gal(b1-2)GlcAOMe Rha(a1-2)Glc(b1-2)Glc Rha(a1-2)Glc(b1-2)GlcA Rha(a1-2)Glc(b1-2)GlcA6Me Rha(a1-2)Glc(b1-2)GlcAOMe Rha(a1-2)Glc(b1-6)Glc Rha(a1-2)GlcA(b1-2)GlcA Rha(a1-2)GlcAOMe(b1-2)GlcAOMe Rha(a1-2)Rha(a1-2)Gal(b1-4)[Glc(b1-2)]GlcA Rha(a1-2)Xyl Rha(a1-2)Xyl(b1-2)GlcA Rha(a1-2)Xyl(b1-2)GlcA6Me Rha(a1-2)Xyl(b1-2)GlcAOMe Rha(a1-2)Xyl3Ac Rha(a1-2)Xyl4Ac Rha(a1-2)[Glc(b1-3)]Glc Rha(a1-2)[Glc(b1-6)]Gal(b1-2)GlcA6Me Rha(a1-2)[Rha(a1-4)]Glc Rha(a1-2)[Rha(a1-6)]Gal Rha(a1-2)[Rha(a1-6)]Glc Rha(a1-2)[Xyl(b1-4)]Glc Rha(a1-2)[Xyl(b1-4)]Glc(b1-6)Glc Rha(a1-3)GlcA Rha(a1-4)Gal(b1-2)GlcA Rha(a1-4)Gal(b1-2)GlcAOMe Rha(a1-4)Gal(b1-2)GlcOMe Rha(a1-4)Gal(b1-4)Gal(b1-4)GalGro Rha(a1-4)Xyl(b1-2)Glc Rha(a1-4)Xyl(b1-2)GlcA Rha(a1-4)Xyl(b1-2)GlcAOMe Rha(a1-6)[Xyl(b1-3)Xyl(b1-2)]Glc(b1-2)Glc Rha(b1-2)Glc(b1-2)GlcA Rha1Fer(a1-4)Fruf(b2-1)GlcOBz RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe-ol RhaOMe(a1-6)GlcOMe(b1-2)GlcOMe-ol Xyl(a1-6)Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc 
Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol 
Xyl(b1-2)Ara(a1-6)Glc Xyl(b1-2)Ara(a1-6)GlcNAc Xyl(b1-2)Ara(a1-6)[Glc(b1-2)]Glc Xyl(b1-2)Ara(a1-6)[Glc(b1-4)]GlcNAc Xyl(b1-2)D-Fuc(b1-6)Glc Xyl(b1-2)D-Fuc(b1-6)GlcNAc Xyl(b1-2)D-Fuc(b1-6)[Glc(b1-2)]Glc Xyl(b1-2)Fuc(a1-6)Glc Xyl(b1-2)Fuc(a1-6)GlcNAc Xyl(b1-2)Gal(b1-2)GlcA6Me Xyl(b1-2)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)Rha(a1-2)Ara Xyl(b1-2)[Glc(b1-3)]Ara Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAcN Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc Xyl(b1-2)[Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Rha(a1-3)]GlcA Xyl(b1-3)Ara Xyl(b1-3)Xyl(b1-2)[Rha(a1-6)]Glc(b1-2)Glc Xyl(b1-3)Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc Xyl(b1-3)Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc(b1-2)Glc Xyl(b1-4)Rha(a1-2)Ara Xyl(b1-4)Rha(a1-2)D-Fuc Xyl(b1-4)Rha(a1-2)D-FucOMe Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl3Ac(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA4Me(a1-2)]Xyl(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA4Me(a1-2)]Xyl3Ac(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl Xyl(b1-4)[GlcAOMe(a1-2)]Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl Xyl2Ac3Ac4Ac(b1-3)Ara XylOMe(b1-2)[RhaOMe(a1-6)]GlcOMe(b1-2)GlcOMe-ol XylOMe(b1-3)XylOMe(b1-2)[RhaOMe(a1-6)]GlcOMe(b1-2)GlcOMe-ol XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe-ol 
Xylf(b1-2)Xyl(b1-3)[Rha(b1-2)Rha(b1-4)]Xyl [Araf(a1-3)Gal(b1-3)Gal(b1-6)]Gal(b1-3)Gal [Araf(a1-3)Gal(b1-6)]Gal(b1-3)Gal [Gal(a1-4)Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Man(b1-4)Man(b1-4)Man(b1-4)Gal(a1-6)]Man(b1-2)[Gal(a1-6)]Man(b1-2)[Gal(a1-4)Gal(a1-6)]Man(b1-4)Man [Gal(a1-6)]Man(b1-4)Man [Gal(a1-6)]Man(b1-4)Man(b1-4)Man [Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)Man(b1-4)Man [Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man [Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man [Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Gal(b1-3)Gal(b1-6)[Araf(a1-3)]Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-3)Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-6)Gal(b1-6)Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-6)Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-6)]Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc 
[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc
Family                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
Fabaceae 1 4 1 3 1 1 0 1 3 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 1 1 2 1 1 1 1 4 2 1 2 2 7 4 4 4 2 8 4 2 5 4 1 1 1 1 1 0 1 1 3 1 1 1 1 1 1 2 6 1 1 1 1 2 2 1 1 2 1 1 1 1 3 2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 3 1 1 1 1 1 1 2 1 3 1 1 0 1 2 1 1 2 0 0 0 1 1 1 4 1 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1 1 0 0 0 0 0 1 2 0 1 1 1 5 1 1 0 0 0 0 0 0 0 1 3 1 0 0 0 1 1 4 6 1 1 1 1 2 1 1 1 4 1 1 3 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 2 2 1 1 1 1 3 1 1 2 1 1 1 1 1 2 1 1 3 2 1 2 1 1 2 2 1 2 1 1 1 1 1 1 1 2 1 1 1 4 6 4 4 4 1 1 5 4 1 4 1 1 0 1 1 1 7 1 1 2 3 22 6 7 1 8 3 4 1 3 1 1 1 2 2 2 1 1 1 1 1 0 2 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 2 1 2 2 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 7 1 1 1 2 3 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 3 2 1 1 3 2 1 0 0 2 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 4 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Fagaceae 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Polygalaceae 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 2 2 1 1 1 1 2 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 2 2 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 0 0 0 1 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Quillajaceae 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

choose_correct_isoform

 choose_correct_isoform (glycans, reverse=False)

*given a list of glycan branch isomers, this function returns the correct isomer

Arguments:
glycans (list): glycans in IUPAC-condensed nomenclature
reverse (bool): whether to return the correct isomer (False) or everything except the correct isomer (True); default:False
Returns:
Returns the correct isomer as a string (if reverse=False; otherwise it returns a list of strings)*
choose_correct_isoform(["Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
                        "Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc"])
'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'

enforce_class

 enforce_class (glycan, glycan_class, conf=None, extra_thresh=0.3)

*given a glycan and glycan class, determines whether glycan is from this class

Arguments:
glycan (string): glycan in IUPAC-condensed nomenclature
glycan_class (string): glycan class in form of “O”, “N”, “free”, or “lipid”
conf (float): prediction confidence; can be used to override class
extra_thresh (float): threshold to override class; default:0.3
Returns:
Returns True if glycan is in glycan class and False if not*
enforce_class("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc", "O")
False

IUPAC_to_SMILES

 IUPAC_to_SMILES (glycan_list)

*given a list of IUPAC-condensed glycans, uses GlyLES to return a list of corresponding isomeric SMILES

Arguments:
glycan_list (list): list of IUPAC-condensed glycans
Returns:
Returns a list of corresponding isomeric SMILES*
IUPAC_to_SMILES(['Neu5Ac(a2-3)Gal(b1-4)Glc'])
['O1C(O)[C@H](O)[C@@H](O)[C@H](O[C@@H]2O[C@H](CO)[C@H](O)[C@H](O[C@]3(C(=O)O)C[C@H](O)[C@@H](NC(C)=O)[C@H]([C@H](O)[C@H](O)CO)O3)[C@H]2O)[C@H]1CO']

canonicalize_composition

 canonicalize_composition (comp)

*converts a composition from any common format into the dictionary that is optimized for glycowork

Arguments:
comp (string): composition formatted either in the style of Hex5HexNAc4Fuc1Neu5Ac2 or H5N4F1A2
Returns:
Returns composition as a dictionary of style monosaccharide : count*
print(canonicalize_composition("HexNAc2Hex1Fuc3Neu5Ac1"))
print(canonicalize_composition("N2H1F3A1"))
{'HexNAc': 2, 'Hex': 1, 'dHex': 3, 'Neu5Ac': 1}
{'HexNAc': 2, 'Hex': 1, 'dHex': 3, 'Neu5Ac': 1}
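
To make the two accepted input styles concrete, here is a minimal pure-Python sketch of the parsing idea. It is not glycowork's implementation: `parse_composition` and both lookup tables are hypothetical names introduced for illustration, and only the codes that appear in the example above (H, N, F, A and their long forms) are covered.

```python
import re

# Hypothetical lookup tables for this sketch; only the codes shown in the
# documented example are included.
SHORT_CODES = {"H": "Hex", "N": "HexNAc", "F": "dHex", "A": "Neu5Ac"}
LONG_NAMES = {"HexNAc": "HexNAc", "Hex": "Hex", "dHex": "dHex",
              "Fuc": "dHex", "Neu5Ac": "Neu5Ac"}


def parse_composition(comp):
    """Parse 'HexNAc2Hex1Fuc3Neu5Ac1' or 'N2H1F3A1' into {monosaccharide: count}."""
    if re.fullmatch(r"(?:[HNFA]\d+)+", comp):
        # Short style: one letter followed by a count
        pairs = [(SHORT_CODES[code], n) for code, n in re.findall(r"([HNFA])(\d+)", comp)]
    else:
        # Long style: list 'HexNAc' before 'Hex' so the longer name wins
        found = re.findall(r"(HexNAc|dHex|Hex|Fuc|Neu5Ac)(\d+)", comp)
        pairs = [(LONG_NAMES[name], n) for name, n in found]
    result = {}
    for name, n in pairs:
        # Sum counts in case a monosaccharide is listed more than once
        result[name] = result.get(name, 0) + int(n)
    return result


long_style = parse_composition("HexNAc2Hex1Fuc3Neu5Ac1")
short_style = parse_composition("N2H1F3A1")
print(long_style)
print(short_style)
```

Both calls produce the same dictionary, with Fuc reported under the generic `dHex` key as in the documented output.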

canonicalize_iupac

 canonicalize_iupac (glycan)

*converts a glycan from IUPAC-extended, LinearCode, GlycoCT, and WURCS into the exact IUPAC-condensed version that is optimized for glycowork

Arguments:
glycan (string): glycan sequence; some rare post-biosynthetic modifications could still be an issue
Returns:
Returns glycan as a string in canonicalized IUPAC-condensed*
print(canonicalize_iupac("NeuAc?1-36SGalb1-4GlcNACb1-6(Fuc?1-2Galb1-4GlcNacb1-3Galb1-3)GalNAc-sp3"))
print(canonicalize_iupac("WURCS=2.0/5,11,10/[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5][a1122h-1a_1-5][a2112h-1b_1-5][a1221m-1a_1-5]/1-1-2-3-1-4-3-1-4-5-5/a4-b1_a6-k1_b4-c1_c3-d1_c6-g1_d2-e1_e4-f1_g2-h1_h4-i1_i2-j1"))
print(canonicalize_iupac("Ma3(Ma6)Mb4GNb4GN;N"))
print(canonicalize_iupac("α-D-Manp-(1→3)[α-D-Manp-(1→6)]-β-D-Manp-(1→4)-β-D-GlcpNAc-(1→4)-β-D-GlcpNAc-(1→"))
print(canonicalize_iupac("""RES
1b:b-dgal-HEX-1:5
2s:n-acetyl
3b:b-dgal-HEX-1:5
4b:b-dglc-HEX-1:5
5b:b-dgal-HEX-1:5
6b:a-dglc-HEX-1:5
7b:b-dgal-HEX-1:5
8b:a-lgal-HEX-1:5|6:d
9b:a-dgal-HEX-1:5
10s:n-acetyl
11s:n-acetyl
12b:b-dglc-HEX-1:5
13b:b-dgal-HEX-1:5
14b:a-lgal-HEX-1:5|6:d
15b:a-lgal-HEX-1:5|6:d
16s:n-acetyl
17s:n-acetyl
18b:b-dgal-HEX-1:5
LIN
1:1d(2+1)2n
2:1o(3+1)3d
3:3o(3+1)4d
4:4o(-1+1)5d
5:5o(-1+1)6d
6:6o(-1+1)7d
7:7o(2+1)8d
8:7o(3+1)9d
9:9d(2+1)10n
10:6d(2+1)11n
11:5o(-1+1)12d
12:12o(-1+1)13d
13:13o(2+1)14d
14:12o(-1+1)15d
15:12d(2+1)16n
16:4d(2+1)17n
17:1o(6+1)18d
"""))
Fuc(a1-2)Gal(b1-4)GlcNAc(b1-3)Gal(b1-3)[Neu5Ac(a2-3)Gal6S(b1-4)GlcNAc(b1-6)]GalNAc
Fuc(a1-2)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)[Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
Fuc(a1-2)Gal(b1-?)[Fuc(a1-?)]GlcNAc(b1-?)[GalNAc(a1-3)[Fuc(a1-2)]Gal(b1-?)GlcNAc(a1-?)]Gal(b1-?)GlcNAc(b1-3)Gal(b1-3)[Gal(b1-6)]GalNAc

get_possible_linkages

 get_possible_linkages (wildcard, linkage_list={'b1-9', 'a2-?', 'a1-3',
                        'a1-6', '?1-4', '1-4', 'b1-2', '?1-3', 'b1-?',
                        '?1-2', 'b1-8', '1-6', 'a2-7', 'a2-1', 'a2-9',
                        'a2-11', 'b1-6', '?2-3', 'b2-4', '?2-?', '?2-6',
                        'b2-7', 'a2-5', 'b2-1', 'b2-6', 'b2-3', 'b1-4',
                        'b1-7', 'a2-8', 'a1-7', 'a2-3', 'a1-9', 'b2-8',
                        'a2-4', 'a1-?', 'a1-5', 'a2-2', 'b1-3', '?1-?',
                        '?2-8', 'b1-1', 'a2-6', 'a1-1', 'b1-5', 'b2-5',
                        'a1-2', 'a1-11', '?1-6', 'a1-4', 'a1-8', 'b2-2'})

*Retrieves all linkages that match a given wildcard pattern from a list of linkages

Arguments:
wildcard (string): The pattern to match, where ‘?’ can be used as a wildcard for any single character.
linkage_list (list): List of linkages as strings to search within; default:linkages
Returns:
Returns a list of linkages that match the wildcard pattern.*
get_possible_linkages("a1-?")
['a1-?',
 'a1-3',
 'a1-7',
 'a1-9',
 'a1-4',
 'a1-1',
 'a1-2',
 'a1-6',
 'a1-8',
 'a1-5']
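
The single-character wildcard rule can be sketched with the standard `re` module. This is an illustration under stated assumptions, not the library's code: `match_linkages` is a hypothetical helper and `demo` is a made-up subset of linkages.

```python
import re


def match_linkages(wildcard, linkages):
    """Return the linkages matching `wildcard`, where '?' stands for any one character."""
    # Escape the literal parts, then turn each escaped '?' into the regex '.'
    pattern = re.compile(re.escape(wildcard).replace(r"\?", "."))
    return sorted(link for link in linkages if pattern.fullmatch(link))


demo = {"a1-3", "a1-?", "a1-11", "b1-4", "a2-3"}
matches = match_linkages("a1-?", demo)
print(matches)
```

In this sketch `'a1-11'` is rejected because `?` covers exactly one character, which is consistent with its absence from the documented output above, while `'a1-?'` itself is kept because a literal `?` is also a single character.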

get_possible_monosaccharides

 get_possible_monosaccharides (wildcard)

*Retrieves all matching common monosaccharides of a type, given the type

Arguments:
wildcard (string): Monosaccharide type, from “HexNAc”, “HexNAcOS”, “Hex”, “HexOS”, “dHex”, “Sia”, “HexA”, “Pen”
Returns:
Returns a list of specified monosaccharides of that type*
get_possible_monosaccharides("HexNAc")
{'GalNAc', 'GlcNAc', 'HexNAc', 'ManNAc'}

equal_repeats

 equal_repeats (r1, r2)

*checks whether two repeat units could stem from the same repeating structure, just shifted

Arguments:
r1 (string): glycan sequence in IUPAC-condensed nomenclature
r2 (string): glycan sequence in IUPAC-condensed nomenclature
Returns:
Returns True if repeat structures are shifted versions of each other, else False*
equal_repeats("Fuc2S3S(a1-3)Fuc2S(a1-4)Fuc2S3S", "Fuc2S(a1-4)Fuc2S3S(a1-3)Fuc2S")
True
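
One way to picture the check is as a rotation test on (monosaccharide, linkage) pairs. The sketch below is a simplified illustration, not glycowork's implementation: `repeat_units` and `is_shifted_repeat` are hypothetical names, only linear (unbranched) repeats are handled, and the trailing residue is assumed to be the first residue of the next repeat.

```python
import re


def repeat_units(repeat):
    """Split a linear repeat into (monosaccharide, linkage) pairs.

    The trailing residue has no linkage after it and is ignored.
    """
    return re.findall(r"([^()\[\]]+)\(([^()]+)\)", repeat)


def is_shifted_repeat(r1, r2):
    """Return True if the units of r2 are a cyclic rotation of the units of r1."""
    u1, u2 = repeat_units(r1), repeat_units(r2)
    if len(u1) != len(u2):
        return False
    if not u1:
        # No linkages at all: fall back to plain string equality
        return r1 == r2
    return any(u1[i:] + u1[:i] == u2 for i in range(len(u1)))


shifted = is_shifted_repeat("Fuc2S3S(a1-3)Fuc2S(a1-4)Fuc2S3S",
                            "Fuc2S(a1-4)Fuc2S3S(a1-3)Fuc2S")
different = is_shifted_repeat("Gal(b1-4)Glc(b1-3)Gal", "Gal(b1-3)Glc(b1-4)Gal")
print(shifted, different)
```

The first pair corresponds to the example above; the second swaps the linkages between residues, so no rotation lines the units up.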

get_class

 get_class (glycan)

*given a glycan, determines its class

Arguments:
glycan (string): glycan in IUPAC-condensed nomenclature
Returns:
Returns “O”, “N”, “free”, or “lipid” (or an empty string if none of these apply)*
get_class("Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
'N'

query

for interacting with the databases contained in glycowork, delivering insights for sequences of interest


get_insight

 get_insight (glycan, motifs=None)

*prints out meta-information about a glycan

Arguments:
glycan (string): glycan in IUPAC-condensed format
motifs (dataframe): dataframe of glycan motifs (name + sequence); default:motif_list*
print("Test get_insight with 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'")
get_insight('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc')
Test get_insight with 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
Let's get rolling! Give us a few moments to crunch some numbers.

This glycan occurs in the following species: ['Acanthocheilonema_viteae', 'Adeno-associated_dependoparvovirusA', 'Aedes_aegypti', 'Angiostrongylus_cantonensis', 'Anopheles_gambiae', 'Antheraea_pernyi', 'Apis_mellifera', 'Ascaris_suum', 'Autographa_californica_nucleopolyhedrovirus', 'AvianInfluenzaA_Virus', 'Bombus_ignitus', 'Bombyx_mori', 'Bos_taurus', 'Bos_taurus', 'Bos_taurus', 'Brugia_malayi', 'Caenorhabditis_elegans', 'Cardicola_forsteri', 'Cooperia_onchophora', 'Cornu_aspersum', 'Crassostrea_gigas', 'Crassostrea_virginica', 'Cricetulus_griseus', 'Danio_rerio', 'Dictyocaulus_viviparus', 'Dirofilaria_immitis', 'Drosophila_melanogaster', 'Fasciola_hepatica', 'Gallus_gallus', 'Glossina_morsitans', 'Haemonchus_contortus', 'Haliotis_tuberculata', 'Heligmosomoides_polygyrus', 'Helix_lucorum', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'HumanImmunoDeficiency_Virus', 'Hylesia_metabus', 'Lutzomyia_longipalpis', 'Lymantria_dispar', 'Macaca_mulatta', 'Mamestra_brassicae', 'Megathura_crenulata', 'Mus_musculus', 'Mus_musculus', 'Nilaparvata_lugens', 'Oesophagostomum_dentatum', 'Onchocerca_volvulus', 'Onchocerca_volvulus', 'Ophiactis_savignyi', 'Opisthorchis_viverrini', 'Ostrea_edulis', 'Ovis_aries', 'Pan_troglodytes', 'Pan_troglodytes', 'Pan_troglodytes', 'Pan_troglodytes', 'Pristionchus_pacificus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Schistosoma_mansoni', 'SemlikiForest_Virus', 'Spodoptera_frugiperda', 'Sus_scrofa', 'Tick_borne_encephalitis_virus', 'Tribolium_castaneum', 'Trichinella_spiralis', 'Trichoplusia_ni', 'Trichuris_suis', 'Tropidolaemus_subannulatus', 'Volvarina_rubella', 'undetermined', 'unidentified_influenza_virus']

Puh, that's quite a lot! Here are the phyla of those species: ['Arthropoda', 'Artverviricota', 'Chordata', 'Cossaviricota', 'Echinodermata', 'Kitrinoviricota', 'Mollusca', 'Negarnaviricota', 'Nematoda', 'Platyhelminthes', 'Virus']

This glycan contains the following motifs: ['Chitobiose', 'Trimannosylcore', 'core_fucose']

This is the GlyTouCan ID for this glycan: G63041RA

This glycan has been reported to be expressed in: ['2A3_cell_line', 'A549_cell_line', 'AML_193_cell_line', 'CHOK1_cell_line', 'CHOS_cell_line', 'CRL_1620_cell_line', 'Cal-27_cell_line', 'Cervicovaginal_Secretion', 'EOL_1_cell_line', 'FaDu_cell_line', 'HEK293_cell_line', 'HEL92_1_7_cell_line', 'HEL_cell_line', 'HL_60_cell_line', 'KG_1_cell_line', 'KG_1a_cell_line', 'Kasumi_1_cell_line', 'MDA_MB_231BR_cell_line', 'ME_1_cell_line', 'ML_1_cell_line', 'MOLM_13_cell_line', 'MOLM_14_cell_line', 'MV4_11_cell_line', 'M_07e_cell_line', 'NB_4_cell_line', 'NS0_cell_line', 'OCI_AML2_cell_line', 'OCI_AML3_cell_line', 'PLB_985_cell_line', 'SCC-9_cell_line', 'SCC_25_cell_line', 'TF_1_cell_line', 'THP_1_cell_line', 'U_937_cell_line', 'VU-147T_cell_line', 'alveolus_of_lung', 'brain', 'brain', 'brain', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellum', 'colon', 'cortex', 'digestive_tract', 'digestive_tract', 'forebrain', 'gills', 'gills', 'heart', 'heart', 'heart', 'hindbrain', 'hippocampal_formation', 'hippocampus', 'hippocampus', 'hippocampus', 'hippocampus', 'iPS1A_cell_line', 'iPS2A_cell_line', 'kidney', 'liver', 'lung', 'mantle', 'mantle', 'metastatic_pancreatic_ductal_adenocarcinoma', 'milk', 'milk', 'milk', 'mucus', 'muscle_of_leg', 'nerve_ending', 'ovary', 'pancreas', 'placenta', 'prefrontal_cortex', 'prefrontal_cortex', 'prefrontal_cortex', 'prefrontal_cortex', 'primary_pancreatic_ductal_adenocarcinoma', 'prostate_gland', 'seminal_fluid', 'striatum', 'striatum', 'striatum', 'striatum', 'testicle', 'testis', 'trachea', 'urine', 'urine', 'urine', 'urothelium']

This glycan has been reported to be dysregulated in (disease, direction, sample): [('REM_sleep_behavior_disorder', 'down', 'serum'), ('benign_breast_tumor_tissues_vs_para_carcinoma_tissues', 'up', 'breast'), ('cystic_fibrosis', 'up', 'sputum'), ('female_breast_cancer', 'up', 'breast'), ('female_breast_cancer', 'up', 'cell_line'), ('prostate_cancer', 'up', 'prostate_cancer_biopsy'), ('thyroid_gland_papillary_carcinoma', 'up', 'serum'), ('urinary_bladder_cancer', 'down', 'urine'), ('', '', ''), ('', '', ''), ('', '', ''), ('', '', '')]

That's all we can do for you at this point!

glytoucan_to_glycan

 glytoucan_to_glycan (ids, revert=False)

*interconverts GlyTouCan IDs and glycans in IUPAC-condensed

Arguments:
ids (list): list of GlyTouCan IDs as strings (if using glycans instead, change ‘revert’ to True)
revert (bool): whether glycans should be mapped to GlyTouCan IDs or vice versa; default:False
Returns:
Returns list of either GlyTouCan IDs or glycans in IUPAC-condensed*
glytoucan_to_glycan(['G63041RA'])
['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc']

regex

for performing regular expression-like searches in glycans, very powerful to find complicated motifs


get_match

 get_match (pattern, glycan, return_matches=True)

*finds matches for a glyco-regular expression in a glycan

Arguments:
pattern (string): glyco-regular expression in the form of “Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc”
glycan (string or networkx): glycan sequence in IUPAC-condensed or as networkx graph
return_matches (bool): whether to return True/False or return the matches as a list of strings; default:True
Returns:
Returns either a boolean (return_matches = False) or a list of matches as strings (return_matches = True)*
# {} = between min and max occurrences, e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# * = zero or more occurrences, e.g., "Hex-HexNAc-([Hex|Fuc])*-HexNAc"
# + = one or more occurrences, e.g., "Hex-HexNAc-([Hex|Fuc])+-HexNAc"
# ? = zero or one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc])?-HexNAc"
# {1,} = at minimum one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc]){1,}-HexNAc"
# {,1} = at maximum one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc]){,1}-HexNAc"
# {2} = exactly two occurrences, e.g., "Hex-HexNAc-([Hex|Fuc]){2}-HexNAc"
# ^ = start of sequence, e.g., "^Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# % = middle of sequence (i.e., neither start nor end)
# $ = end of sequence, e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc$"
# ?<= = lookbehind (i.e., provided pattern must be present before rest of pattern but is not included in match), e.g., "(?<=Xyl-)Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# ?<! = negative lookbehind (i.e., provided pattern is not present before rest of pattern and is also not included in match), e.g., "(?<!Xyl-)Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# ?= = lookahead (i.e., provided pattern must be present after rest of pattern but is not included in match), e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc(?=-HexNAc)"
# ?! = negative lookahead (i.e., provided pattern is not present after rest of pattern and is not included in match), e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc(?!-HexNAc)"

# Example: extracting the sequence from the a1-6 branch of N-glycans
pattern = "r[Sia]{,1}-Monosaccharide-([dHex]){,1}-Monosaccharide(?=-Mana6-Monosaccharide)"
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)GalNAc(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
['Gal(b1-4)GlcNAc']
['GalNAc(b1-4)GlcNAc']
['Neu5Ac(a2-6)GalNAc(b1-4)GlcNAc']
['Neu5Gc(a2-6)GalNAc(b1-4)[Fuc(a1-3)]GlcNAc']
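
The quantifier and lookaround symbols listed above follow the conventions of ordinary regular expressions. As a rough analogy only, the sketch below applies the same operators with Python's standard `re` module to a toy dash-separated string. It does not use the glyco-regex engine, the toy string is not a real glycan format, and `seq` is a made-up input for illustration.

```python
import re

# Toy dash-separated sequence (hypothetical); real glyco-regex input is a glycan.
seq = "Xyl-Hex-HexNAc-Fuc-Hex-HexNAc-HexNAc"

# {1,2}: one or two Hex/Fuc units between the flanking residues
bounded = re.search(r"Hex-HexNAc-(?:(?:Hex|Fuc)-){1,2}HexNAc", seq)

# (?<=...): the match must be preceded by "Xyl-", which stays out of the match
behind = re.search(r"(?<=Xyl-)Hex-HexNAc", seq)

# (?<!...): the match must NOT be preceded by "Xyl-"
not_behind = re.search(r"(?<!Xyl-)Hex-HexNAc", seq)

# (?=...): the match must be followed by "-HexNAc", which stays out of the match
ahead = re.search(r"Hex-HexNAc(?=-HexNAc)", seq)

print(bounded.group(0))
print(behind.start(), not_behind.start(), ahead.start())
```

The lookbehind variant matches the first `Hex-HexNAc` (directly after `Xyl-`), whereas the negative lookbehind and the lookahead both skip it and match the second occurrence.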

For interested users, here is a selection of regular expression patterns that we find useful in our own work:

  • Lewis or sialyl-Lewis structures:
    pattern = "r[Sia]{,1}-[Gal|GalOS]{1}-([Fuc]){1}-[GlcNAc|GlcNAc6S]{1}"
  • Blood groups:
    pattern = "rFuc-([Gal|GalNAc])?-Gal-GlcNAc"
  • a1-6 branch in N-glycans:
    pattern = "r[Sia]{,1}-[Hex|HexNAc]{,1}-([dHex]){,1}-[Man|GlcNAc]{1}-([.-.|.]){,1}-Mana6(?=-Manb4-GlcNAc)"
  • b1-6 branch in O-glycans (from core 2/4/6):
    pattern = "r[Sia|dHex]{,1}-[Hex|HexNAc]{,1}-([dHex]){,1}-.b6(?=-GalNAc)"
  • b1-3 branch in O-glycans (from core 1/2):
    pattern = "r[Sia]{,1}-[.]{,1}-([dHex]){,1}-.b3(?=-GalNAc)"

get_match_batch

 get_match_batch (pattern, glycan_list, return_matches=True)

*finds matches for a glyco-regular expression in a list of glycans

Arguments:
pattern (string): glyco-regular expression in the form of “Hex-HexNAc-([Hex
glycan_list (list of strings or networkx): list of glycan sequences in IUPAC-condensed or as networkx graphs
return_matches (bool): whether to return True/False or return the matches as a list of strings; default:True
Returns:
Returns either a list of booleans (return_matches = False) or a list of lists of matches as strings (return_matches = True)*

motif_to_regex

 motif_to_regex (motif)

*tries to convert motif into a regular expression

Arguments:
motif (string): glycan in IUPAC-condensed nomenclature
Returns:
Returns regular expression if successful*
motif_to_regex("Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-?)")
'Fuca3-([Galb4]){1}-GlcNAcb?'

tokenization

helper functions to map m/z -> composition, composition -> structure, structure -> motif, and more


string_to_labels

 string_to_labels (character_string, libr=None)

*tokenizes word by indexing characters in passed library

Arguments:
character_string (string): string of characters to index
libr (dict): dict of library items
Returns:
Returns indexes of characters in library*
string_to_labels(['Man','a1-3','Man','a1-6','Man'])
[None, None, None, None, None]

pad_sequence

 pad_sequence (seq, max_length, pad_label=None, libr=None)

*brings all sequences to same length by adding padding token

Arguments:
seq (list): sequence to pad (from string_to_labels)
max_length (int): sequence length to pad to
pad_label (int): which padding label to use
libr (list): list of library items
Returns:
Returns padded sequence*
pad_sequence(string_to_labels(['Man','a1-3','Man','a1-6','Man']), 7)
[None, None, None, None, None, 25, 25]
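
To make the two steps above concrete, here is a minimal sketch of tokenization followed by padding, written in plain Python. The helpers `toy_string_to_labels` and `toy_pad_sequence` and the three-entry `toy_libr` are hypothetical stand-ins, not the library's implementation. The sketch assumes that the library is a dict mapping glycoletters to indices, that unknown tokens map to `None`, and that the padding label is chosen as `len(toy_libr)`.

```python
def toy_string_to_labels(tokens, libr):
    """Look up each token's index in the library (None if it is missing)."""
    return [libr.get(token) for token in tokens]


def toy_pad_sequence(seq, max_length, pad_label):
    """Append pad_label until the sequence reaches max_length."""
    n_missing = max(0, max_length - len(seq))
    return seq + [pad_label] * n_missing


# Hypothetical three-entry library, for illustration only
toy_libr = {"Man": 0, "a1-3": 1, "a1-6": 2}

labels = toy_string_to_labels(["Man", "a1-3", "Man", "a1-6", "Man"], toy_libr)
padded = toy_pad_sequence(labels, 7, pad_label=len(toy_libr))
print(labels)
print(padded)
```

With an explicit library every token receives an integer label, and the padded positions carry a label that no real glycoletter uses.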

stemify_glycan

 stemify_glycan (glycan, stem_lib=None, libr=None)

*removes modifications from all monosaccharides in a glycan

Arguments:
glycan (string): glycan in IUPAC-condensed format
stem_lib (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
libr (dict): dictionary of form glycoletter:index; default:lib
Returns:
Returns stemmed glycan as string*
stemify_glycan("Neu5Ac9Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
'Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc'

stemify_dataset

 stemify_dataset (df, stem_lib=None, libr=None, glycan_col_name='glycan',
                  rarity_filter=1)

*stemifies all glycans in a dataset by removing monosaccharide modifications

Arguments:
df (dataframe): dataframe with glycans in IUPAC-condensed format in column glycan_col_name
stem_lib (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
libr (dict): dictionary of form glycoletter:index; default:lib
glycan_col_name (string): column name under which glycans are stored; default:glycan
rarity_filter (int): how often monosaccharide modification has to occur to not get removed; default:1
Returns:
Returns df with glycans stemified*

mask_rare_glycoletters

 mask_rare_glycoletters (glycans, thresh_monosaccharides=None,
                         thresh_linkages=None)

*masks rare monosaccharides and linkages in a list of glycans

Arguments:
glycans (list): list of glycans in IUPAC-condensed form
thresh_monosaccharides (int): threshold-value for monosaccharides seen as “rare”; default:(0.001*len(glycans))
thresh_linkages (int): threshold-value for linkages seen as “rare”; default:(0.03*len(glycans))
Returns:
Returns list of glycans in IUPAC-condensed with masked rare monosaccharides and linkages*
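
The idea of frequency-based masking can be sketched in plain Python as follows. This is an illustrative approximation, not the library's code: the function `toy_mask_rare`, the mask tokens `Monosaccharide` and `?1-?`, and the rule "mask if the count is below the threshold" are all assumptions made for the example.

```python
import re
from collections import Counter

# Splits an IUPAC-condensed string into linkages "(...)", brackets, and monosaccharides
TOKEN = re.compile(r"\([^)]*\)|\[|\]|[^()\[\]]+")


def toy_mask_rare(glycans, thresh_monosaccharides, thresh_linkages):
    """Replace glycoletters that occur fewer times than their threshold."""
    tokenized = [TOKEN.findall(glycan) for glycan in glycans]
    counts = Counter(token for tokens in tokenized for token in tokens)
    masked = []
    for tokens in tokenized:
        out = []
        for token in tokens:
            if token in ("[", "]"):
                out.append(token)  # branch brackets are kept as-is
            elif token.startswith("("):
                keep = counts[token] >= thresh_linkages
                out.append(token if keep else "(?1-?)")
            else:
                keep = counts[token] >= thresh_monosaccharides
                out.append(token if keep else "Monosaccharide")
        masked.append("".join(out))
    return masked


toy_glycans = ["Man(a1-3)Man", "Man(a1-3)Man", "Man(a1-6)Man", "Rha(a1-3)Man"]
masked = toy_mask_rare(toy_glycans, thresh_monosaccharides=2, thresh_linkages=2)
print(masked)
```

In this toy dataset `Rha` and `a1-6` each occur once, so both are masked while the common `Man` and `a1-3` are kept.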

mz_to_composition

 mz_to_composition (mz_value, mode='negative', mass_value='monoisotopic',
                    reduced=False, sample_prep='underivatized',
                    mass_tolerance=0.5, kingdom='Animalia',
                    glycan_class='all', df_use=None, filter_out=None,
                    extras=['doubly_charged'], adduct=None)

*Mapping an m/z value to a matching monosaccharide composition within SugarBase

Arguments:
mz_value (float): the actual m/z value from mass spectrometry
mode (string): whether mz_value comes from MS in ‘positive’ or ‘negative’ mode; default:‘negative’
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
reduced (bool): whether glycans are reduced at reducing end; default:False
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
mass_tolerance (float): how much deviation to tolerate for a match; default:0.5
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
glycan_class (string): glycan class that the m/z value stems from: ‘N’-, ‘O’-, or ‘lipid’-linked glycans, or ‘free’ glycans; default:‘all’
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan
filter_out (set): set of monosaccharide types to ignore during composition finding; default:None
extras (list): additional operations to perform if regular m/z matching does not yield a result; options include “adduct” and “doubly_charged”
adduct (string): chemical formula of adduct that contributes to m/z, e.g., “C2H4O2”; default:None
Returns:
Returns a list of matching compositions in dict form*
mz_to_composition(665.4, glycan_class='O', filter_out={'Kdn', 'P', 'HexA', 'Pen', 'HexN', 'Me', 'PCho', 'PEtN'},
                    reduced = True)
[{'dHex': 1, 'HexNAc': 2, 'Hex': 1, 'Neu5Ac': 1, 'Neu5Gc': 1}]
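
The result above is only consistent with the input if the ion is doubly charged, which is what the default `extras=['doubly_charged']` allows for. The sketch below checks this by hand. It is not the library's implementation: the helper names are hypothetical, the residue masses are standard monoisotopic values rounded to four decimals, and it assumes that negative-mode ions are formed by proton loss and that reduction adds the mass of H2.

```python
# Monoisotopic residue masses (Da), rounded; used here for illustration only
RESIDUE = {"Hex": 162.0528, "HexNAc": 203.0794, "dHex": 146.0579,
           "Neu5Ac": 291.0954, "Neu5Gc": 307.0903}
WATER = 18.0106    # one water completes the free glycan
H2 = 2.0157        # assumed mass shift for a reduced reducing end
PROTON = 1.00728


def toy_neutral_mass(composition, reduced=False):
    """Sum residue masses plus water (plus H2 if reduced)."""
    mass = sum(RESIDUE[name] * count for name, count in composition.items())
    mass += WATER
    return mass + H2 if reduced else mass


def toy_matches_mz(composition, mz_value, charge=1, tolerance=0.5, reduced=False):
    """Negative mode: compare mz_value with (M - charge * proton) / charge."""
    expected = (toy_neutral_mass(composition, reduced) - charge * PROTON) / charge
    return abs(expected - mz_value) <= tolerance


comp = {"dHex": 1, "HexNAc": 2, "Hex": 1, "Neu5Ac": 1, "Neu5Gc": 1}
single = toy_matches_mz(comp, 665.4, charge=1, reduced=True)
double = toy_matches_mz(comp, 665.4, charge=2, reduced=True)
print(single, double)
```

Under these assumptions the singly charged ion lies far from 665.4, while the doubly charged ion falls within the default tolerance of 0.5.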

match_composition_relaxed

 match_composition_relaxed (composition, glycan_class='N',
                            kingdom='Animalia', df_use=None,
                            reducing_end=None)

*Given a coarse-grained monosaccharide composition (Hex, HexNAc, etc.), it returns all corresponding glycans

Arguments:
composition (dict): a dictionary indicating the composition to match (for example {“dHex”: 1, “Hex”: 1, “HexNAc”: 1})
glycan_class (string): glycan class to search: ‘N’-, ‘O’-, or ‘lipid’-linked glycans, or ‘free’ glycans; default:‘N’
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
df_use (dataframe): glycan dataframe for searching glycan structures; default:df_glycan
Returns:
Returns list of glycans matching composition in IUPAC-condensed*
match_composition_relaxed({"Hex":3, "HexNAc":2, "dHex":1}, glycan_class = 'O')
['Fuc(a1-2)[Gal(a1-3)]Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
 'Fuc(a1-2)[Gal(a1-3)]Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc',
 'Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-3)Gal(b1-4)GlcNAc(b1-3)Gal',
 'Gal(?1-?)Gal(b1-4)GlcNAc(b1-6)[Fuc(a1-2)Gal(b1-3)]GalNAc',
 'Gal(a1-3)GalNAc(a1-3)[Fuc(a1-2)]Gal(b1-3)Gal(b1-3)GalNAc',
 'Man(a1-6)Glc(a1-4)GlcNAc(b1-4)[Fuc(a1-2)]Gal(b1-3)GalNAc',
 'Man(a1-6)Glc(b1-4)GlcNAc(b1-4)[Fuc(a1-2)]Gal(b1-3)GalNAc',
 'Gal(?1-?)Gal(b1-?)[Fuc(a1-?)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
 'Gal(b1-3)[Gal(b1-4)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)]GalNAc',
 'Gal(b1-4)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-3)Gal(b1-3)GalNAc',
 'Gal(b1-4)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
 'Fuc(a1-2)Gal(b1-3)GlcNAc(b1-3)Gal(b1-4)GlcNAc(b1-?)Man',
 'Fuc(a1-2)Gal(b1-4)GlcNAc(b1-6)[Gal(?1-?)Gal(b1-3)]GalNAc',
 'Fuc(a1-2)Gal(b1-?)GlcNAc(b1-3)Gal(b1-3)[Gal(b1-6)]GalNAc',
 'Fuc(a1-2)[Gal(a1-3)]Gal(b1-3)GlcNAc(b1-3)Gal(b1-3)GalNAc',
 'Fuc(a1-2)Gal(b1-3)Gal(b1-3)GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
 'Fuc(a1-2)Gal(b1-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc',
 'Gal(b1-4)Gal(b1-3)[Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-6)]GalNAc',
 'Gal(b1-2)Gal(a1-3)[Fuc(a1-2)]Gal(b1-3)[GlcNAc(b1-6)]GalNAc',
 'Fuc(a1-2)Gal(a1-3)Gal(a1-4)Gal(b1-3)[GlcNAc(b1-6)]GalNAc',
 'Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)[Gal(b1-3)]Gal(b1-3)GalNAc',
 'Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-?)Gal(b1-6)[Gal(b1-3)]GalNAc',
 'Fuc(a1-2)Gal(?1-?)Gal(b1-?)GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
 'Fuc(a1-2)[Gal(a1-3)]Gal(b1-4)GlcNAc(?1-?)Gal(b1-3)GalNAc']

condense_composition_matching

 condense_composition_matching (matched_composition)

*Given a list of glycans matching a composition, find the minimum number of glycans characterizing this set

Arguments:
matched_composition (list): list of glycans matching to a composition
Returns:
Returns minimal list of glycans that match a composition*
match_comp = match_composition_relaxed({'Hex':1, 'HexNAc':1, 'Neu5Ac':1}, glycan_class = 'O')
print(match_comp)
condense_composition_matching(match_comp)
['Neu5Ac(a2-3)Gal(b1-3)GalNAc', 'Gal(b1-3)[Neu5Ac(a2-6)]GalNAc', '{Neu5Ac(a2-?)}Gal(b1-3)GalNAc', 'Neu5Ac(a2-3)[GalNAc(b1-4)]Gal', 'Neu5Ac(a2-3)Gal(b1-4)GalNAc', 'Neu5Ac(a2-6)Gal(b1-3)GalNAc', 'Gal(a1-3)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-?)Hex(?1-?)GalNAc', 'Gal(?1-3)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-3)Gal(?1-?)GalNAc', 'Neu5Ac(a2-6)Gal(a1-3)GalNAc', 'Neu5Ac(a2-?)Gal(?1-3)GalNAc', 'Neu5Ac(a2-?)GalNAc(a1-6)Gal', 'Neu5Ac(a2-?)Gal(b1-?)GalNAc', 'Gal(b1-4)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-3)GalNAc(b1-3)Gal']
['Neu5Ac(a2-3)Gal(b1-3)GalNAc',
 'Gal(b1-3)[Neu5Ac(a2-6)]GalNAc',
 'Gal(a1-3)[Neu5Ac(a2-6)]GalNAc',
 '{Neu5Ac(a2-?)}Gal(b1-3)GalNAc',
 'Neu5Ac(a2-3)[GalNAc(b1-4)]Gal',
 'Neu5Ac(a2-3)Gal(b1-4)GalNAc',
 'Neu5Ac(a2-6)Gal(b1-3)GalNAc',
 'Neu5Ac(a2-?)Hex(?1-?)GalNAc',
 'Neu5Ac(a2-3)Gal(?1-?)GalNAc',
 'Neu5Ac(a2-6)Gal(a1-3)GalNAc',
 'Neu5Ac(a2-?)Gal(?1-3)GalNAc',
 'Neu5Ac(a2-?)GalNAc(a1-6)Gal',
 'Neu5Ac(a2-?)Gal(b1-?)GalNAc',
 'Gal(b1-4)[Neu5Ac(a2-6)]GalNAc',
 'Neu5Ac(a2-3)GalNAc(b1-3)Gal']

mz_to_structures

 mz_to_structures (mz_list, glycan_class, kingdom='Animalia',
                   abundances=None, mode='negative',
                   mass_value='monoisotopic', sample_prep='underivatized',
                   mass_tolerance=0.5, reduced=False, df_use=None,
                   filter_out=None, verbose=False)

*wrapper function to map precursor masses to structures, condense them, and match them with relative intensities

Arguments:
mz_list (list): list of precursor masses
glycan_class (string): glycan class that the m/z values stem from: ‘N’-, ‘O’-, or ‘lipid’-linked glycans, or ‘free’ glycans
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
abundances (dataframe): every row one composition (matching mz_list in order), every column one sample; default:pd.DataFrame([range(len(mz_list))]*2).T
mode (string): whether mz_value comes from MS in ‘positive’ or ‘negative’ mode; default:‘negative’
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
mass_tolerance (float): how much deviation to tolerate for a match; default:0.5
reduced (bool): whether glycans are reduced at reducing end; default:False
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan
filter_out (set): set of monosaccharide types to ignore during composition finding; default:None
verbose (bool): whether to print any non-matching compositions; default:False
Returns:
Returns dataframe of (matched structures) x (relative intensities)*
mz_to_structures([674.29], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
    glycan                         abundance
0   Neu5Ac(a2-3)Gal(b1-3)GalNAc    0
1   Gal(b1-3)[Neu5Ac(a2-6)]GalNAc  0
2   Gal(a1-3)[Neu5Ac(a2-6)]GalNAc  0
3   {Neu5Ac(a2-?)}Gal(b1-3)GalNAc  0
4   Neu5Ac(a2-3)[GalNAc(b1-4)]Gal  0
5   Neu5Ac(a2-3)Gal(b1-4)GalNAc    0
6   Neu5Ac(a2-6)Gal(b1-3)GalNAc    0
7   Neu5Ac(a2-?)Hex(?1-?)GalNAc    0
8   Neu5Ac(a2-3)Gal(?1-?)GalNAc    0
9   Neu5Ac(a2-6)Gal(a1-3)GalNAc    0
10  Neu5Ac(a2-?)Gal(?1-3)GalNAc    0
11  Neu5Ac(a2-?)GalNAc(a1-6)Gal    0
12  Neu5Ac(a2-?)Gal(b1-?)GalNAc    0
13  Gal(b1-4)[Neu5Ac(a2-6)]GalNAc  0
14  Neu5Ac(a2-3)GalNAc(b1-3)Gal    0

compositions_to_structures

 compositions_to_structures (composition_list, glycan_class='N',
                             kingdom='Animalia', abundances=None,
                             df_use=None, verbose=False)

*wrapper function to map compositions to structures, condense them, and match them with relative intensities

Arguments:
composition_list (list): list of composition dictionaries of the form {‘Hex’: 1, ‘HexNAc’: 1}
glycan_class (string): glycan class that the compositions stem from: ‘N’-, ‘O’-, or ‘lipid’-linked glycans, or ‘free’ glycans; default:‘N’
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
abundances (dataframe): every row one composition (matching composition_list in order), every column one sample; default:pd.DataFrame([range(len(composition_list))]*2).T
df_use (dataframe): glycan dataframe for searching glycan structures; default:df_glycan
verbose (bool): whether to print any non-matching compositions; default:False
Returns:
Returns dataframe of (matched structures) x (relative intensities)*
compositions_to_structures([{'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1}], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
   glycan                                     abundance
0  Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc  0
1  Gal(b1-3)[Neu5Ac(a2-8)Neu5Ac(a2-6)]GalNAc  0
2  Neu5Ac(a2-8)Neu5Ac(a2-6)[Gal(b1-3)]GalNAc  0
3  Neu5Ac(a2-3)[Neu5Ac(a2-6)]Gal(b1-3)GalNAc  0
4  Neu5Ac(a2-3)Gal(b1-4)[Neu5Ac(a2-6)]GalNAc  0
compositions_to_structures(["H1N1A2"], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
   glycan                                     abundance
0  Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc  0
1  Gal(b1-3)[Neu5Ac(a2-8)Neu5Ac(a2-6)]GalNAc  0
2  Neu5Ac(a2-8)Neu5Ac(a2-6)[Gal(b1-3)]GalNAc  0
3  Neu5Ac(a2-3)[Neu5Ac(a2-6)]Gal(b1-3)GalNAc  0
4  Neu5Ac(a2-3)Gal(b1-4)[Neu5Ac(a2-6)]GalNAc  0

structure_to_basic

 structure_to_basic (glycan)

*converts a monosaccharide- and linkage-defined glycan structure to the base topology

Arguments:
glycan (string): glycan in IUPAC-condensed nomenclature
Returns:
Returns the glycan topology as a string*
structure_to_basic("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
'Neu5Ac(?1-?)HexOS(?1-?)[Neu5Ac(?1-?)]HexNAc'

glycan_to_composition

 glycan_to_composition (glycan, stem_libr=None)

*maps glycan to its composition

Arguments:
glycan (string): glycan in IUPAC-condensed format
stem_libr (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
Returns:
Returns a dictionary of form “Monosaccharide” : count*
glycan_to_composition("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
{'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1, 'S': 1}
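
As a rough illustration of how such a composition can be derived from an IUPAC-condensed string, the sketch below splits the string into monosaccharides, separates a trailing sulfate or phosphate modification, and counts coarse-grained types. The `COARSE` mapping and the function `toy_composition` are hypothetical and cover only a handful of monosaccharides. The library itself derives this mapping from its own data.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny mapping from specific to coarse-grained types
COARSE = {"Gal": "Hex", "Glc": "Hex", "Man": "Hex",
          "GalNAc": "HexNAc", "GlcNAc": "HexNAc",
          "Fuc": "dHex", "Neu5Ac": "Neu5Ac"}


def toy_composition(glycan):
    """Count coarse-grained monosaccharides and S/P modifications."""
    # Drop linkages "(...)" and branch brackets, keeping the monosaccharides
    monos = [m for m in re.split(r"\([^)]*\)|\[|\]", glycan) if m]
    counts = Counter()
    for mono in monos:
        # Separate a trailing modification such as "6S" or "3P" from the stem
        stem, modification = re.fullmatch(r"(.+?)(?:\d+(S|P))?", mono).groups()
        counts[COARSE[stem]] += 1
        if modification:
            counts[modification] += 1
    return dict(counts)


result = toy_composition("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
print(result)
```

For the example glycan this toy version arrives at the same counts as the output shown above. Monosaccharides outside the small mapping would raise a `KeyError`.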

glycan_to_mass

 glycan_to_mass (glycan, mass_value='monoisotopic',
                 sample_prep='underivatized', stem_libr=None, adduct=None)

*given a glycan, calculates its theoretical mass; only allowed extra-modifications are methylation, sulfation, phosphorylation

Arguments:
glycan (string): glycan in IUPAC-condensed format
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
stem_libr (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
adduct (string): chemical formula of adduct to be added, e.g., “C2H4O2”; default:None
Returns:
Returns the theoretical mass of input glycan*
glycan_to_mass("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
1045.2903546

composition_to_mass

 composition_to_mass (dict_comp_in, mass_value='monoisotopic',
                      sample_prep='underivatized', adduct=None)

*given a composition, calculates its theoretical mass; only allowed extra-modifications are methylation, sulfation, phosphorylation

Arguments:
dict_comp_in (dict): composition in form monosaccharide:count
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
adduct (string): chemical formula of adduct to be added, e.g., “C2H4O2”; default:None
Returns:
Returns the theoretical mass of input composition*
composition_to_mass({'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1, 'S': 1})
1045.2903546

calculate_adduct_mass

 calculate_adduct_mass (adduct, mass_value='monoisotopic')

*Calculate the mass of the adduct based on its chemical formula

Arguments:
adduct (string): chemical formula of adduct, e.g., “C2H4O2”
mass_value (string): whether to use ‘monoisotopic’ or ‘average’ mass; default:‘monoisotopic’
Returns:
Returns the mass of the adduct*
calculate_adduct_mass("C2H4O2")
60.021
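
The value above can be reproduced by hand: parse the formula into element counts and sum the monoisotopic atomic masses. The following is a minimal sketch of that arithmetic, not the library's code. The function `toy_adduct_mass` and the small `MONOISOTOPIC` table are hypothetical, and only simple formulas without brackets or charges are handled.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
                "O": 15.99491462, "S": 31.972071, "P": 30.973762}


def toy_adduct_mass(formula):
    """Sum atomic masses for a formula such as 'C2H4O2'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


acetic_acid = toy_adduct_mass("C2H4O2")
print(round(acetic_acid, 3))
```

For C2H4O2 this gives 2 × 12.0 + 4 × 1.00782503 + 2 × 15.99491462 ≈ 60.021, in agreement with the output above.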

get_unique_topologies

 get_unique_topologies (composition, glycan_type, df_use=None,
                        universal_replacers=None, taxonomy_rank='Kingdom',
                        taxonomy_value='Animalia')

*given a composition, retrieves all observed and unique base topologies

Arguments:
composition (dict): composition in form monosaccharide:count
glycan_type (string): which glycan class to search, ‘N’, ‘O’, ‘lipid’, ‘free’, or ‘repeat’
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan
universal_replacers (dictionary): dictionary of form base monosaccharide : specific monosaccharide
taxonomy_rank (string): at which taxonomic rank to filter; default: Kingdom
taxonomy_value (string): which value to filter at taxonomy_rank; default: Animalia
Returns:
Returns a list of observed base topologies for the given composition*
get_unique_topologies({'HexNAc':2, 'Hex':1}, 'O', universal_replacers = {'dHex':'Fuc'})
['HexNAc(?1-?)[HexNAc(?1-?)]Hex',
 'HexNAc(?1-?)[Hex(?1-?)]HexNAc',
 'HexNAc(?1-?)Hex(?1-?)HexNAc',
 'HexNAc(?1-?)HexNAc(?1-?)Hex',
 'Hex(?1-?)HexNAc(?1-?)HexNAc']