
The motif module provides functions for processing glycans and for analyzing them via curated motifs, graph features, and sequence features. It contains the following modules:


drawing glycans in SNFG style


 GlycoDraw (draw_this, vertical=False, compact=False, show_linkage=True,
            dim=50, highlight_motif=None, highlight_termini_list=[],
            repeat=None, repeat_range=None, filepath=None, suppress=False)

Draws a glycan structure based on the provided input.

draw_this (string): The glycan structure or motif to be drawn.
vertical (bool, optional): Set to True to draw the structure vertically. Default: False.
compact (bool, optional): Set to True to draw the structure in a compact form. Default: False.
show_linkage (bool, optional): Set to False to hide the linkage information. Default: True.
dim (int, optional): The dimension (size) of the individual sugar units in the structure. Default: 50.
highlight_motif (string, optional): Glycan motif to highlight within the parent structure.
highlight_termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
repeat (bool
repeat_range (list of 2 int): List of index integers for the first and last main-chain monosaccharide in repeating unit. Monosaccharides are numbered starting from 0 (invisible placeholder = 0 in case of structure terminating in a linkage) at the reducing end.
filepath (string, optional): The path to the output file to save as SVG or PDF. Default: None.
suppress (bool, optional): Whether to suppress the visual display of drawings into the console; default:False
         highlight_motif = "GlcNAc(b1-?)Man")


 annotate_figure (svg_input, scale_range=(25, 80), compact=False,
                  glycan_size='medium', filepath='', scale_by_DE_res=None,
                  x_thresh=1, y_thresh=0.05, x_metric='Log2FC')

Modify matplotlib svg figure to replace text labels with glycan figures

svg_input (string): absolute path including full filename for input svg figure
scale_range (tuple): tuple of two integers defining min/max glycan dim; default:(25,80)
compact (bool): if True, draw compact glycan figures; default:False
glycan_size (string): modify glycan size; default:‘medium’; options are ‘small’, ‘medium’, ‘large’
filepath (string): absolute path including full filename allows for saving the plot
scale_by_DE_res (df): result table from motif_analysis.get_differential_expression. Include to scale glycan figure size by -10logp
x_thresh (float): absolute x metric threshold for datapoints included for scaling, set to match get_differential_expression; default:1.0
y_thresh (float): corr p threshold for datapoints included for scaling, set to match get_differential_expression; default:0.05
x_metric (string): x-axis metric; default:‘Log2FC’; options are ‘Log2FC’, ‘Effect size’
Returns the modified figure SVG code


 plot_glycans_excel (df, folder_filepath, glycan_col_num=0,
                     scaling_factor=0.2, compact=False)

plots SNFG images of glycans into new column in df and saves df as Excel file

df (dataframe): dataframe containing glycan sequences [alternative: filepath to .csv or .xlsx]
folder_filepath (string): full filepath to the folder you want to save the output to
glycan_col_num (int): index of the column containing glycan sequences; default:0 (first column)
scaling_factor (float): how large the glycans should be; default:0.2
compact (bool, optional): Set to True to draw the structures in a compact form. Default: False.
Saves the dataframe with glycan images as output.xlsx into folder_filepath


downstream analyses of important glycan motifs


 get_pvals_motifs (df, glycan_col_name='glycan', label_col_name='target',
                   zscores=True, thresh=1.645, sorting=True,
                   feature_set=['exhaustive'], multiple_samples=False,
                   motifs=None, custom_motifs=[])

returns enriched motifs based on label data or predicted data

df (dataframe): dataframe containing glycan sequences and labels [alternative: filepath to .csv or .xlsx]
glycan_col_name (string): column name for glycan sequences; arbitrary if multiple_samples = True; default:‘glycan’
label_col_name (string): column name for labels; arbitrary if multiple_samples = True; default:‘target’
zscores (bool): whether data are presented as z-scores or not, will be z-score transformed if False; default:True
thresh (float): threshold value to separate positive/negative; default is 1.645 for Z-scores
sorting (bool): whether p-value dataframe should be sorted ascendingly; default: True
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features),
multiple_samples (bool): set to True if you have multiple samples (rows) with glycan information (columns); default:False
motifs (dataframe): can be used to pass a modified motif_list to the function; default:None
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns dataframe with p-values, corrected p-values, and Cohen’s d as effect size for every glycan motif
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
label = [3.234, 2.423, 0.733, 3.102, 0.108]
test_df = pd.DataFrame({'glycan':glycans, 'binding':label})

print("Glyco-Motif enrichment p-value test")
out = get_pvals_motifs(test_df, 'glycan', 'binding').iloc[:10,:]
Glyco-Motif enrichment p-value test
  motif pval corr_pval effect_size
4 GlcNAc 0.038120 0.205849 1.530905
8 Man 0.054356 0.234990 1.390253
25 Man(a1-?)Man 0.060923 0.234990 1.308333
10 Man(a1-3)Man 0.034212 0.205849 1.196586
11 Man(a1-6)Man 0.019543 0.175885 1.168815
13 Man(b1-4)GlcNAc 0.019543 0.175885 1.168815
14 GlcNAc(b1-4)GlcNAc 0.019543 0.175885 1.168815
7 Kdo 0.328790 0.479672 -0.811679
2 Glc 0.644180 0.668956 -0.811679
16 Man(a1-2)Man 0.177461 0.479672 0.772320
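
The effect_size column above is reported as Cohen’s d. As a minimal sketch of that statistic (not the library’s internal code; the helper name cohens_d, the pooled-standard-deviation form, and the toy numbers are assumptions made for illustration), the standardized difference between two groups of values can be computed as follows:

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation (illustrative helper, not a glycowork function)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy values: responses of glycans with and without a hypothetical motif
with_motif = [2.0, 4.0, 6.0]
without_motif = [1.0, 3.0, 5.0]
print(cohens_d(with_motif, without_motif))
```

A positive value means the first group has the higher mean; the sign convention used by get_pvals_motifs itself is not specified here.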


 get_representative_substructures (enrichment_df)

builds minimal glycans that contain enriched motifs from get_pvals_motifs

enrichment_df (dataframe): output from get_pvals_motifs
Returns up to 10 minimal glycans in a list


 get_heatmap (df, motifs=False, feature_set=['known'],
              datatype='response', rarity_filter=0.05, filepath='',
              index_col='glycan', custom_motifs=[], **kwargs)

clusters samples based on glycan data (for instance glycan binding etc.)

df (dataframe): dataframe with glycan data, rows are samples and columns are glycans [alternative: filepath to .csv or .xlsx]
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
datatype (string): whether df comes from a dataset with quantitative variable (‘response’) or from presence_to_matrix (‘presence’)
rarity_filter (float): proportion of samples that need to have a non-zero value for a variable to be included; default:0.05
filepath (string): absolute path including full filename allows for saving the plot
index_col (string): default column to convert to dataframe index; default:‘glycan’
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
**kwargs: keyword arguments that are directly passed on to seaborn clustermap
Prints clustermap
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
label = [3.234, 2.423, 0.733, 3.102, 0.108]
label2 = [0.134, 0.345, 1.15, 0.233, 2.981]
label3 = [0.334, 0.245, 1.55, 0.133, 2.581]
test_df = pd.DataFrame([label, label2, label3], columns = glycans)

get_heatmap(test_df, motifs = True, feature_set = ['known', 'exhaustive'])


 plot_embeddings (glycans, emb=None, label_list=None, shape_feature=None,
                  filepath='', alpha=0.8, palette='colorblind', **kwargs)

plots glycan representations for a list of glycans

glycans (list): list of IUPAC-condensed glycan sequences as strings
emb (dictionary): stored glycan representations; default takes them from trained species-level SweetNet model
label_list (list): list of same length as glycans if coloring of the plot is desired
shape_feature (string): monosaccharide/bond used to display alternative shapes for dots on the plot
filepath (string): absolute path including full filename allows for saving the plot
alpha (float): transparency of points in plot; default:0.8
palette (string): color palette to color different classes; default:‘colorblind’
**kwargs: keyword arguments that are directly passed on to matplotlib
df_fabales = df_species[df_species.Order == 'Fabales'].reset_index(drop = True)
plot_embeddings(df_fabales.glycan.values.tolist(), label_list = df_fabales.Family.values.tolist())


 characterize_monosaccharide (sugar, df=None, mode='sugar',
                              glycan_col_name='glycan', rank=None,
                              focus=None, modifications=False,
                              filepath='', thresh=10)

for a given monosaccharide/linkage, return typical neighboring linkage/monosaccharide

sugar (string): monosaccharide or linkage
df (dataframe): dataframe to use for analysis; default:df_species
mode (string): either ‘sugar’ (connected monosaccharides), ‘bond’ (monosaccharides making a provided linkage), or ‘sugarbond’ (linkages that a provided monosaccharides makes); default:‘sugar’
glycan_col_name (string): column name under which glycans can be found; default:‘glycan’
rank (string): add column name as string if you want to filter for a group
focus (string): add row value as string if you want to filter for a group
modifications (bool): set to True if you want to consider modified versions of a monosaccharide; default:False
filepath (string): absolute path including full filename allows for saving the plot
thresh (int): threshold count of when to include motifs in plot; default:10 occurrences
Plots modification distribution and typical neighboring bond/monosaccharide
characterize_monosaccharide('D-Rha', rank = 'Kingdom', focus = 'Bacteria', modifications = True)


 get_differential_expression (df, group1, group2, motifs=False,
                              feature_set=['exhaustive', 'known'],
                              paired=False, impute=True, sets=False,
                              set_thresh=0.9, effect_size_variance=False,
                              min_samples=None, grouped_BH=False,

Calculates differentially expressed glycans or motifs from glycomics data

df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
group1 (list): list of column indices or names for the first group of samples, usually the control
group2 (list): list of column indices or names for the second group of samples
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘exhaustive’, ‘known’]; options are: ‘known’ (hand-crafted glycan features),
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False
impute (bool): replaces zeroes with a Random Forest based model; default:True
sets (bool): whether to identify clusters of highly correlated glycans/motifs to test for differential expression; default:False
set_thresh (float): correlation value used as a threshold for clusters; only used when sets=True; default:0.9
effect_size_variance (bool): whether effect size variance should also be calculated/estimated; default:False
min_samples (int): How many samples per group need to have non-zero values for glycan to be kept; default: at least half per group
grouped_BH (bool): whether to perform two-stage adaptive Benjamini-Hochberg as a grouped multiple testing correction; will SIGNIFICANTLY increase runtime; default:False
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns a dataframe with:
(i) Differentially expressed glycans/motifs/sets
(ii) Their mean abundance across all samples in group1 + group2
(iii) Log2-transformed fold change of group2 vs group1 (i.e., negative = lower in group2)
(iv) Uncorrected p-values (Welch’s t-test) for difference in mean
(v) Corrected p-values (Welch’s t-test with Benjamini-Hochberg correction) for difference in mean
(vi) Significance: True/False of whether the corrected p-value lies below the sample size-appropriate significance threshold
(vii) Corrected p-values (Levene’s test for equality of variances with Benjamini-Hochberg correction) for difference in variance
(viii) Effect size as Cohen’s d (sets=False) or Mahalanobis distance (sets=True)
(ix) [only if effect_size_variance=True] Effect size variance
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
label = [3.234, 2.423, 0.733, 3.102, 0.108]
label2 = [2.952, 2.011, 0.456, 4.006, 0.0]
label3 = [3.88, 1.771, 0.811, 3.562, 0.073]
label4 = [0.134, 0.345, 1.15, 0.233, 2.981]
label5 = [0.334, 0.245, 1.55, 0.133, 2.581]
label6 = [0.234, 0.423, 1.733, 0.102, 2.108]
test_df = pd.DataFrame([glycans, label, label2, label3, label4, label5, label6]).T

res = get_differential_expression(test_df, group1 = [4,5,6], group2 = [1,2,3], motifs = True, impute = True)
You're working with an alpha of 0.07862467893233027 that has been adjusted for your sample size of 6.
Glycan Mean abundance Log2FC p-val corr p-val significant corr Levene p-val Effect size
5 GlcNAc 9.587462 1.825183 2.469905e-07 0.000003 True 0.971435 78.585109
1 GlcNAc(b1-4)GlcNAc 4.793731 1.825183 1.385197e-05 0.000090 True 0.971435 27.336027
3 Man(a1-3)Man 6.144649 1.574705 2.879608e-04 0.001248 True 0.971435 20.479380
0 core_fucose(a1-3) 1.739635 2.038186 4.174583e-04 0.001357 True 0.971435 8.916848
9 Man 20.137125 1.653026 6.770485e-04 0.001760 True 0.971435 12.113443
8 betaGlucan 3.883061 -4.345913 1.386085e-03 0.003003 True 0.971435 -7.288949
7 Man(a1-?)Man 15.343394 1.601387 2.067233e-03 0.003833 True 0.971435 11.849348
11 Glc(b1-3)Glc 7.766123 -4.345913 2.358767e-03 0.003833 True 0.971435 -7.483606
10 Kdo 7.275312 -2.944967 3.000722e-03 0.004334 True 0.971435 -5.255214
6 Kdo(a2-?)Kdo 4.850208 -2.944967 4.905883e-03 0.006378 True 0.971435 -4.640157
12 Glc 11.649184 -4.345913 6.515318e-03 0.007700 True 0.971435 -6.980519
2 Man(a1-2)Man 4.405014 1.411105 7.118918e-03 0.007712 True 0.971435 8.139906
4 GalNAc(a1-4)GlcNAcA 2.425104 -2.944967 2.127494e-02 0.021275 True 0.971435 -3.175931
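
The corr p-val column is described as a Benjamini-Hochberg correction of the p-val column. The sketch below is a plain-Python version of the standard Benjamini-Hochberg step-up adjustment (the function name benjamini_hochberg is an assumption for illustration, not a glycowork function), applied to the p-values printed in the table above. Under that assumption it appears to reproduce the reported corrected values to the printed precision.

```python
def benjamini_hochberg(pvals):
    """Standard Benjamini-Hochberg adjusted p-values (illustrative helper, not a glycowork function)."""
    n = len(pvals)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# p-val column copied from the example output above
pvals = [2.469905e-07, 1.385197e-05, 2.879608e-04, 4.174583e-04, 6.770485e-04,
         1.386085e-03, 2.067233e-03, 2.358767e-03, 3.000722e-03, 4.905883e-03,
         6.515318e-03, 7.118918e-03, 2.127494e-02]
for p, q in zip(pvals, benjamini_hochberg(pvals)):
    print(f"{p:.6e} -> {q:.6f}")
```

Note how the seventh and eighth entries share one corrected value: the adjustment never lets a smaller raw p-value end up with a larger corrected p-value than the next one.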


 get_volcano (df_res, y_thresh=0.05, x_thresh=1.0, label_changed=True,
              x_metric='Log2FC', annotate_volcano=False, filepath='')

Plots glycan differential expression results in a volcano plot

df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv or .xlsx]
y_thresh (float): corr p threshold for labeling datapoints; default:0.05
x_thresh (float): absolute x metric threshold for labeling datapoints; default:1.0
label_changed (bool): if True, add text labels to significantly up- and downregulated datapoints; default:True
x_metric (string): x-axis metric; default:‘Log2FC’; options are ‘Log2FC’, ‘Effect size’
annotate_volcano (bool): whether to annotate the dots in the plot with SNFG images; default: False
filepath (string): absolute path including full filename allows for saving the plot
Prints volcano plot
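
To illustrate how x_thresh and y_thresh interact, the following sketch selects the datapoints that would qualify for labeling. It is only an approximation of the idea: the helper name label_candidates and the use of strict comparisons are assumptions, and the rows are a subset copied from the get_differential_expression example output.

```python
def label_candidates(rows, x_thresh=1.0, y_thresh=0.05):
    """Return names whose |x metric| exceeds x_thresh and whose corrected p-value is below y_thresh.

    Illustrative helper, not a glycowork function; strict inequalities are an assumption.
    """
    return [name for name, x_value, corr_p in rows
            if abs(x_value) > x_thresh and corr_p < y_thresh]


# (Glycan, Log2FC, corr p-val) taken from the differential expression example
rows = [
    ("GlcNAc", 1.825183, 0.000003),
    ("Man(a1-3)Man", 1.574705, 0.001248),
    ("betaGlucan", -4.345913, 0.003003),
    ("Man(a1-2)Man", 1.411105, 0.007712),
]

print(label_candidates(rows))                # default thresholds keep all four rows
print(label_candidates(rows, x_thresh=1.5))  # a stricter x threshold drops Man(a1-2)Man
```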


 get_coverage (df, filepath='')

Plot glycan coverage across samples, ordered by average intensity

df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
filepath (string): absolute path including full filename allows for saving the plot
Prints the heatmap
test_df = pd.concat([test_df.iloc[:, 0], test_df[test_df.columns[1:]].astype(float)], axis = 1)



 get_pca (df, groups=None, motifs=False, feature_set=['known',
          'exhaustive'], pc_x=1, pc_y=2, color=None, shape=None,
          filepath='', custom_motifs=[])

PCA plot from glycomics abundance dataframe

df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
groups (list): a list of group identifiers for each sample (e.g., [1,1,1,2,2,2,3,3,3]); default:None
alternatively: design dataframe with ‘id’ column of samples names and additional columns with meta information
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘known’, ‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features),
pc_x (int): principal component to plot on x axis; default:1
pc_y (int): principal component to plot on y axis; default:2
color (string): if design dataframe is provided: column name for color grouping; default:None
shape (string): if design dataframe is provided: column name for shape grouping; default:None
filepath (string): absolute path including full filename allows for saving the plot
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Prints PCA plot
get_pca(test_df, motifs = True, groups = [1,1,1,2,2,2])


 get_pval_distribution (df_res, filepath='')

p-value distribution plot of glycan differential expression result

df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv]
filepath (string): absolute path including full filename allows for saving the plot
prints p-value distribution plot


 get_ma (df_res, log2fc_thresh=1, sig_thresh=0.05, filepath='')

MA plot of glycan differential expression result

df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv or .xlsx]
log2fc_thresh (int): absolute Log2FC threshold for highlighting datapoints
sig_thresh (float): significance threshold for highlighting datapoints
filepath (string): absolute path including full filename allows for saving the plot
prints MA plot


 get_glycanova (df, groups, impute=True, motifs=False,
                feature_set=['exhaustive', 'known'], min_samples=None,
                posthoc=True, custom_motifs=[])

Calculate an ANOVA for each glycan (or motif) in the DataFrame

df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
groups (list): a list of group identifiers for each sample (e.g., [1,1,1,2,2,2,3,3,3])
impute (bool): replaces zeroes with a Random Forest based model; default:True
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘exhaustive’, ‘known’]; options are: ‘known’ (hand-crafted glycan features),
min_samples (int): How many samples per group need to have non-zero values for glycan to be kept; default: at least half per group
posthoc (bool): whether to do Tukey’s HSD test post-hoc to find out which differences were significant; default:True
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
(i) a pandas DataFrame with an F statistic, corrected p-value, and indication of its significance for each glycan.
(ii) a dictionary of type glycan : pandas DataFrame, with post-hoc results for each glycan with a significant ANOVA.
test_df['label_7'] = [0.234, 0.023, 5.733, 8.102, 0.308]
test_df['label_8'] = [0.177, 0.009, 6.105, 5.549, 0.278]
test_df['label_9'] = [0.511, 0.011, 4.998, 7.005, 0.414]

anv, ph = get_glycanova(test_df, [1,1,1,2,2,2,3,3,3], motifs = True)
You're working with an alpha of 0.0694557066556809 that has been adjusted for your sample size of 9.
Glycan F statistic corr p-val significant
10 GlcNAc 735.493169 8.715006e-07 True
8 GlcNAc(b1-4)GlcNAc 464.897344 1.713268e-06 True
9 Man(a1-3)Man 286.009486 4.846738e-06 True
11 Man(a1-?)Man 124.016453 4.073794e-05 True
12 Man 116.889001 4.073794e-05 True
0 betaGlucan 78.483797 8.679929e-05 True
6 core_fucose(a1-3) 77.931941 8.679929e-05 True
7 Man(a1-2)Man 76.658780 8.679929e-05 True
1 Glc(b1-3)Glc 67.371670 1.119105e-04 True
2 Glc 56.146940 1.696330e-04 True
5 Kdo 32.477874 7.145649e-04 True
4 Kdo(a2-?)Kdo 27.295755 1.051915e-03 True
3 GalNAc(a1-4)GlcNAcA 18.383777 2.761281e-03 True
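
The F statistic column comes from a one-way ANOVA per glycan or motif. As a rough sketch of that statistic only (the helper name one_way_anova_f and the toy values are assumptions; any imputation or transformation the library applies beforehand is not modeled here), the F value for one row can be computed from the group labels like this:

```python
from statistics import mean


def one_way_anova_f(values, groups):
    """One-way ANOVA F statistic for one glycan/motif (illustrative helper, not a glycowork function)."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    grand_mean = mean(values)
    # Between-group and within-group sums of squares
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2 for vs in by_group.values())
    ss_within = sum((v - mean(vs)) ** 2 for vs in by_group.values() for v in vs)
    df_between = len(by_group) - 1
    df_within = len(values) - len(by_group)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy abundances for nine samples in three groups, same group layout as the example call
toy_values = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
toy_groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(one_way_anova_f(toy_values, toy_groups))
```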


 get_meta_analysis (effect_sizes, variances, model='fixed', filepath='',

Fixed-effects model or random-effects model for meta-analysis of glycan effect sizes

effect_sizes (array-like): Effect sizes (e.g., Cohen’s d) from each study
variances (array-like): Corresponding effect size variances from each study
model (string): Whether to use ‘fixed’ or ‘random’ effects model
filepath (string): absolute path including full filename allows for saving the Forest plot
study_names (list): list of strings indicating the name of each study
(1) The combined effect size
(2) The p-value for the combined effect size
get_meta_analysis([-8.759, -6.363, -5.199, -3.952],
                 [7.061, 4.041, 2.919, 1.968])
(-5.326913553837341, 3.005077298112724e-09)
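
For the default model='fixed', the output above is consistent with a standard inverse-variance weighted fixed-effects estimate and a two-sided normal p-value. The sketch below shows that calculation in plain Python; the function name fixed_effects_meta is an assumption for illustration, and whether the library computes its result in exactly this way is also an assumption.

```python
from math import erfc, sqrt


def fixed_effects_meta(effect_sizes, variances):
    """Inverse-variance weighted fixed-effects meta-analysis (illustrative, not a glycowork function)."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    combined = sum(w * e for w, e in zip(weights, effect_sizes)) / total_weight
    standard_error = sqrt(1.0 / total_weight)
    z = combined / standard_error
    # Two-sided p-value under a standard normal distribution
    p_value = erfc(abs(z) / sqrt(2.0))
    return combined, p_value


combined, p_value = fixed_effects_meta([-8.759, -6.363, -5.199, -3.952],
                                       [7.061, 4.041, 2.919, 1.968])
print(combined, p_value)
```

Studies with smaller variances receive larger weights, which is why the combined effect lies closer to the last two studies than to the first.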


 get_time_series (df, impute=True, motifs=False, feature_set=['known',
                  'exhaustive'], degree=1, min_samples=None,

Analyzes time series data of glycans using an OLS model

df (dataframe): dataframe containing sample IDs of style sampleID_UnitTimepoint_replicate (e.g., T1_h5_r1) in first column and glycan relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
impute (bool): replaces zeroes with a Random Forest based model; default:True
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘known’, ‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features),
degree (int): degree of the polynomial for regression, default:1 for linear regression
min_samples (int): How many samples per group need to have non-zero values for glycan to be kept; default: at least half per group
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns a dataframe with:
(i) Glycans/motifs potentially exhibiting significant changes over time
(ii) The slope of their expression curve over time
(iii) Uncorrected p-values (t-test) for testing whether slope is significantly different from zero
(iv) Corrected p-values (t-test with Benjamini-Hochberg correction) for testing whether slope is significantly different from zero
(v) Significance: True/False whether the corrected p-value lies below the sample size-appropriate significance threshold
t_dic = {}
t_dic["ID"] = ["D1_h5_r1", "D1_h5_r2", "D1_h5_r3", "D1_h10_r1", "D1_h10_r2", "D1_h10_r3", "D1_h15_r1", "D1_h15_r2", "D1_h15_r3"]
t_dic["Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc"] = [0.33, 0.31, 0.35, 1.51, 1.57, 1.66, 2.11, 2.04, 2.09]
t_dic["Fuc(a1-2)Gal(b1-3)GalNAc"] = [0.78, 1.01, 0.98, 0.88, 1.11, 0.72, 1.22, 1.00, 0.54]
You're working with an alpha of 0.0694557066556809 that has been adjusted for your sample size of 9.
Glycan Change p-val corr p-val significant
1 Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]Ga... 5.326852 0.002697 0.005394 True
0 Fuc(a1-2)Gal(b1-3)GalNAc -2.030518 0.328428 0.328428 False
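
The sample ID convention sampleID_UnitTimepoint_replicate and the degree=1 regression can be sketched as follows. This is a simplified illustration on the raw example values: the helper names parse_timepoint and ols_slope are assumptions, and because the sketch skips whatever preprocessing the library performs, it is not expected to reproduce the Change values printed above.

```python
import re


def parse_timepoint(sample_id):
    """Extract the numeric timepoint from an ID such as 'D1_h5_r1' (illustrative helper)."""
    unit_timepoint = sample_id.split("_")[1]  # e.g. 'h5'
    return float(re.search(r"\d+(?:\.\d+)?", unit_timepoint).group())


def ols_slope(x, y):
    """Slope of the ordinary least squares line y = a + b*x."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    return s_xy / s_xx


ids = ["D1_h5_r1", "D1_h5_r2", "D1_h5_r3", "D1_h10_r1", "D1_h10_r2",
       "D1_h10_r3", "D1_h15_r1", "D1_h15_r2", "D1_h15_r3"]
abundances = [0.33, 0.31, 0.35, 1.51, 1.57, 1.66, 2.11, 2.04, 2.09]

times = [parse_timepoint(sample_id) for sample_id in ids]
slope = ols_slope(times, abundances)
print(times)
print(slope)  # raw abundance change per unit of time
```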


 get_jtk (df_in, timepoints, periods, interval, motifs=False,
          feature_set=['known', 'exhaustive', 'terminal'],

Detecting rhythmically expressed glycans via the Jonckheere–Terpstra–Kendall (JTK) algorithm

df_in (pd.DataFrame): A dataframe containing data for analysis. [alternative: filepath to .csv or .xlsx]
(column 0 = molecule IDs, then arranged in groups and by ascending timepoints)
timepoints (int): number of timepoints in the experiment (each timepoint must have the same number of replicates).
periods (list): number of timepoints (as int) per cycle.
interval (int): units of time (Arbitrary units) between experimental timepoints.
motifs (bool): a flag for running structural or motif-based analysis (True = run motif analysis); default:False.
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘known’, ‘exhaustive’, ‘terminal’]; options are: ‘known’ (hand-crafted glycan features),
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns a pandas dataframe containing the adjusted p-values, and most important waveform parameters for each
molecule in the analysis.
t_dic = {}
t_dic["Neu5Ac(a2-3)Gal(b1-3)GalNAc"] = [0.433138901, 0.149729209, 0.358018822, 0.537641256, 1.526963756, 1.349986672, 0.75156406, 0.736710183]
t_dic["Gal(b1-3)GalNAc"] = [0.919762334, 0.760237184, 0.725566662, 0.459945797, 0.523801515, 0.695106926, 0.627632047, 1.183511209]
t_dic["Gal(b1-3)[Neu5Ac(a2-6)]GalNAc"] = [0.533138901, 0.119729209, 0.458018822, 0.637641256, 1.726963756, 1.249986672, 0.55156406, 0.436710183]
t_dic["Fuc(a1-2)Gal(b1-3)GalNAc"] = [3.862169504, 5.455032837, 3.858163289, 5.614650335, 3.124254095, 4.189550337, 4.641831312, 4.19538484]
tps = 8  # number of timepoints in experiment
periods = [8]  # number of timepoints per cycle
interval = 3  # units of time between experimental timepoints
t_df = pd.DataFrame(t_dic).T
t_df.columns = ["T3", "T6", "T9", "T12", "T15", "T18", "T21", "T24"]
get_jtk(t_df.reset_index(), tps, periods, interval)
You're working with an alpha of 0.22004505213567527 that has been adjusted for your sample size of 1.
Molecule_Name BH_Q_Value Adjusted_P_value Period_Length Lag_Phase Amplitude significant
0 Neu5Ac(a2-3)Gal(b1-3)GalNAc 0.055556 0.013889 24.0 16.5 0.357084 True
2 Gal(b1-3)[Neu5Ac(a2-6)]GalNAc 0.075397 0.044048 24.0 13.5 0.101473 True
1 Gal(b1-3)GalNAc 0.075397 0.056548 24.0 22.5 0.127140 True
3 Fuc(a1-2)Gal(b1-3)GalNAc 1.000000 1.000000 24.0 0.0 0.546986 False
get_jtk(t_df.reset_index(), tps, periods, interval, motifs = True, feature_set = ['terminal'])
You're working with an alpha of 0.22004505213567527 that has been adjusted for your sample size of 1.
Molecule_Name BH_Q_Value Adjusted_P_value Period_Length Lag_Phase Amplitude significant
0 Neu5Ac(a2-3) 0.034722 0.013889 24.0 16.5 0.357084 True
2 Neu5Ac(a2-?) 0.034722 0.013889 0.0 0.0 0.000000 True
1 Neu5Ac(a2-6) 0.073413 0.044048 24.0 13.5 0.101473 True
3 Gal(b1-3) 0.543403 0.434722 24.0 16.5 0.208071 False
4 Fuc(a1-2) 1.000000 1.000000 24.0 0.0 0.546986 False


 get_biodiversity (df, group1, group2, motifs=False,
                   feature_set=['exhaustive', 'known'], paired=False,

Calculates diversity indices from glycomics data, similar to alpha diversity etc. in microbiome data

df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
group1 (list): list of column indices or names for the first group of samples, usually the control
group2 (list): list of column indices or names for the second group of samples
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘exhaustive’, ‘known’]; options are: ‘known’ (hand-crafted glycan features),
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns a dataframe with:
(i) Diversity indices/metrics
(ii) Mean value of diversity metrics in group 1
(iii) Mean value of diversity metrics in group 2
(iv) Uncorrected p-values (Welch’s t-test) for difference in mean
(v) Corrected p-values (Welch’s t-test with Benjamini-Hochberg correction) for difference in mean
(vi) Significance: True/False of whether the corrected p-value lies below the sample size-appropriate significance threshold
(vii) Effect size as Cohen’s d
res = get_biodiversity(test_df, group1 = [4,5,6], group2 = [1,2,3], motifs = True)
You're working with an alpha of 0.07862467893233027 that has been adjusted for your sample size of 6.
Metric Group1 mean Group2 mean p-val corr p-val significant Effect size
1 shannon_diversity 2.278677 1.855369 0.000420 0.001261 True -8.941867
2 simpson_diversity 0.876248 0.804112 0.002471 0.003706 True -8.011404
0 richness 13.000000 12.000000 0.422650 0.422650 False -0.816497
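
The three metrics in the table can be illustrated for a single sample with the sketch below. The formulas are the textbook definitions (richness as the number of non-zero entries, Shannon diversity with the natural logarithm, Simpson diversity as one minus the sum of squared proportions); that the library uses exactly these variants is an assumption, as is the helper name diversity_indices.

```python
from math import log


def diversity_indices(abundances):
    """Richness, Shannon and Simpson diversity for one sample (illustrative, not a glycowork function)."""
    present = [a for a in abundances if a > 0]
    total = sum(present)
    proportions = [a / total for a in present]
    return {
        "richness": len(present),
        "shannon_diversity": -sum(p * log(p) for p in proportions),
        "simpson_diversity": 1.0 - sum(p * p for p in proportions),
    }


# Toy sample: four equally abundant glycans/motifs and one that is absent
toy_sample = [1.0, 1.0, 1.0, 1.0, 0.0]
print(diversity_indices(toy_sample))
```

For an even distribution over k entries, Shannon diversity equals ln(k) and Simpson diversity equals 1 - 1/k, which makes toy inputs like the one above easy to check by hand.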


extract curated motifs, graph features, and sequence features from glycan sequences


 annotate_glycan (glycan, motifs=None, termini_list=[], gmotifs=None)

searches for known motifs in glycan sequence

glycan (string or networkx): glycan in IUPAC-condensed format (or as networkx graph) that has to contain a floating substituent
motifs (dataframe): dataframe of glycan motifs (name + sequence), can be used with a list of glycans too; default:motif_list
termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
gmotifs (networkx): precalculated motif graphs for speed-up; default:None
Returns dataframe with counts of motifs in glycan
motif_name Terminal_LewisX Internal_LewisX LewisY SialylLewisX SulfoSialylLewisX Terminal_LewisA Internal_LewisA LewisB SialylLewisA SulfoLewisA ... Arabinogalactan_type1 Galactomannan Tetraantennary_Nglycan Mucin_elongated_core2 Fucoidan Alginate FG XX Difucosylated_core GalFuc_core
Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc 0 1 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

1 rows × 156 columns


 annotate_dataset (glycans, motifs=None, feature_set=['known'],
                   termini_list=[], condense=False, custom_motifs=[])

wrapper function to annotate motifs in list of glycans

glycans (list): list of IUPAC-condensed glycan sequences as strings
motifs (dataframe): dataframe of glycan motifs (name + sequence); default:motif_list
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
termini_list (list): list of monosaccharide/linkage positions (from ‘terminal’, ‘internal’, and ‘flexible’)
condense (bool): if True, throws away columns with only zeroes; default:False
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns dataframe of glycans (rows) and presence/absence of known motifs (columns)
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
print("Annotate Test")
out = annotate_dataset(glycans)
Annotate Test
  Terminal_LewisX Internal_LewisX LewisY SialylLewisX SulfoSialylLewisX Terminal_LewisA Internal_LewisA LewisB SialylLewisA SulfoLewisA H_type2 H_type1 A_antigen B_antigen Galili_antigen GloboH Gb5 Gb4 Gb3 3SGb3 8DSGb3 3SGb4 8DSGb4 6DSGb4 3SGb5 8DSGb5 6DSGb5 6DSGb5_2 6SGb3 8DSGb3_2 6SGb4 8DSGb4_2 6SGb5 8DSGb5_2 66DSGb5 Forssman_antigen iGb3 I_antigen i_antigen PI_antigen Chitobiose Trimannosylcore Internal_LacNAc_type1 Terminal_LacNAc_type1 Internal_LacNAc_type2 Terminal_LacNAc_type2 Internal_LacdiNAc_type1 Terminal_LacdiNAc_type1 Internal_LacdiNAc_type2 Terminal_LacdiNAc_type2 bisectingGlcNAc VIM PolyLacNAc Ganglio_Series Lacto_Series(LewisC) NeoLacto_Series betaGlucan KeratanSulfate Hyluronan Mollu_series Arthro_series Cellulose_like Chondroitin_4S GPI_anchor Isoglobo_series LewisD Globo_series Sda SDA Muco_series Heparin Peptidoglycan Dermatansulfate CAD Lactosylceramide Lactotriaosylceramide LexLex GM3 H_type3 GM2 GM1 cisGM1 VIM2 GD3 GD1a GD2 GD1b SDLex Nglycolyl_GM2 Fuc_LN3 GT1b GD1 GD1a_2 LcGg4 GT3 Disialyl_T_antigen GT1a GT2 GT1c 2Fuc_GM1 GQ1c O_linked_mannose GT1aa GQ1b HNK1 GQ1ba O_mannose_Lex 2Fuc_GD1b Sialopentaosylceramide Sulfogangliotetraosylceramide B-GM1 GQ1aa bisSulfo-Lewis x para-Forssman core_fucose core_fucose(a1-3) GP1c B-GD1b GP1ca Isoglobotetraosylceramide polySia high_mannose Gala_series LPS_core Nglycan_complex Nglycan_complex2 Oglycan_core1 Oglycan_core2 Oglycan_core3 Oglycan_core4 Oglycan_core5 Oglycan_core6 Oglycan_core7 Xylogalacturonan Sialosylparagloboside LDNF OFuc Arabinogalactan_type2 EGF_repeat Nglycan_hybrid Arabinan Xyloglucan Acharan_Sulfate M3FX M3X 1-6betaGalactan Arabinogalactan_type1 Galactomannan Tetraantennary_Nglycan Mucin_elongated_core2 Fucoidan Alginate FG XX Difucosylated_core GalFuc_core
Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


 quantify_motifs (df, glycans, feature_set, custom_motifs=[])

Extracts and quantifies motifs for a dataset

df (dataframe): dataframe containing relative abundances (each sample one column) [alternative: filepath to .csv or .xlsx]
glycans (list): glycans as IUPAC-condensed strings
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘exhaustive’,‘known’]; options are: ‘known’ (hand-crafted glycan features),
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns a pandas DataFrame with motifs as columns and samples as rows
quantify_motifs(test_df.iloc[:, 1:], test_df.iloc[:, 0].values.tolist(), ['known', 'exhaustive'])
Chitobiose Trimannosylcore betaGlucan core_fucose(a1-3) M3FX Fuc GalNAc Glc GlcN GlcN4P ... GalNAc(a1-4)GlcNAcA GlcNAcA(a1-4)Kdo GlcN(b1-7)Kdo Kdo(a2-5)Kdo Kdo(a2-4)Kdo Kdo(a2-6)GlcN4P GlcN4P(b1-6)GlcN4P Glc(b1-3)Glc Man(a1-?)Man Kdo(a2-?)Kdo
1 8.759 8.759 0.108 3.234 3.234 3.234 0.733 0.324 0.733 1.466 ... 0.733 0.733 0.733 0.733 0.733 0.733 0.733 0.216 27.889 1.466
2 8.969 8.969 0.000 2.952 2.952 2.952 0.456 0.000 0.456 0.912 ... 0.456 0.456 0.456 0.456 0.456 0.456 0.456 0.000 27.977 0.912
3 9.213 9.213 0.073 3.880 3.880 3.880 0.811 0.219 0.811 1.622 ... 0.811 0.811 0.811 0.811 0.811 0.811 0.811 0.146 27.301 1.622
4 0.712 0.712 2.981 0.134 0.134 0.134 1.150 8.943 1.150 2.300 ... 1.150 1.150 1.150 1.150 1.150 1.150 1.150 5.962 2.692 2.300
5 0.712 0.712 2.581 0.334 0.334 0.334 1.550 7.743 1.550 3.100 ... 1.550 1.550 1.550 1.550 1.550 1.550 1.550 5.162 2.292 3.100
6 0.759 0.759 2.108 0.234 0.234 0.234 1.733 6.324 1.733 3.466 ... 1.733 1.733 1.733 1.733 1.733 1.733 1.733 4.216 2.889 3.466
label_7 8.359 8.359 0.308 0.234 0.234 0.234 5.733 0.924 5.733 11.466 ... 5.733 5.733 5.733 5.733 5.733 5.733 5.733 0.616 24.889 11.466
label_8 5.735 5.735 0.278 0.177 0.177 0.177 6.105 0.834 6.105 12.210 ... 6.105 6.105 6.105 6.105 6.105 6.105 6.105 0.556 17.046 12.210
label_9 7.527 7.527 0.414 0.511 0.511 0.511 4.998 1.242 4.998 9.996 ... 4.998 4.998 4.998 4.998 4.998 4.998 4.998 0.828 22.092 9.996

9 rows × 32 columns
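The weighting behind these numbers can be sketched in plain Python. The sketch assumes that a motif's abundance in a sample is the sum of the glycan abundances, each multiplied by how often the motif occurs in that glycan; this is consistent with the output above, where GlcN4P is exactly twice GlcN in every row. The glycan names, counts, and abundances are hypothetical, and this is not glycowork's implementation.

```python
# Hypothetical motif counts per glycan (what an annotation step would yield).
motif_counts = {
    "glycan_A": {"GlcN": 1, "GlcN4P": 2},
    "glycan_B": {"Man": 3},
}

# Hypothetical relative abundances: one dict per sample, keyed by glycan.
samples = {
    "sample_1": {"glycan_A": 20.0, "glycan_B": 80.0},
    "sample_2": {"glycan_A": 50.0, "glycan_B": 50.0},
}

def quantify(samples, motif_counts):
    """Return {sample: {motif: count-weighted abundance}}."""
    result = {}
    for sample, abundances in samples.items():
        totals = {}
        for glycan, abundance in abundances.items():
            # Each occurrence of a motif contributes the glycan's abundance once.
            for motif, count in motif_counts.get(glycan, {}).items():
                totals[motif] = totals.get(motif, 0.0) + count * abundance
        result[sample] = totals
    return result

motif_table = quantify(samples, motif_counts)
print(motif_table["sample_1"])
```

As in the output of quantify_motifs, samples end up as rows and motifs as columns.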


 get_k_saccharides (glycans, size=2, up_to=False, just_motifs=False,

function to retrieve k-saccharides (default:disaccharides) occurring in a list of glycans

glycans (list): list of glycans in IUPAC-condensed nomenclature
size (int): number of monosaccharides per k-saccharide, default:2 (for disaccharides)
up_to (bool): in theory: include k-saccharides up to size k; in practice: include monosaccharides; default:False
just_motifs (bool): if you only want the motifs as a nested list, no dataframe with counts; default:False
terminal (bool): whether to only count terminal subgraphs; default:False
Returns dataframe with k-saccharide counts (columns) for each glycan (rows)
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
out = get_k_saccharides(glycans, size = 3)
  Man(a1-3)[Man(a1-6)]Man Man(a1-3)[Xyl(b1-2)]Man Man(a1-3)Man(b1-4)GlcNAc Man(a1-6)[Xyl(b1-2)]Man Man(a1-6)Man(b1-4)GlcNAc Xyl(b1-2)Man(b1-4)GlcNAc Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-2)Man(a1-2)Man Man(a1-2)Man(a1-3)Man Man(a1-3)Man(a1-6)Man GalNAc(a1-4)GlcNAcA(a1-4)Kdo GlcNAcA(a1-4)[GlcN(b1-7)]Kdo GlcNAcA(a1-4)Kdo(a2-5)Kdo GlcN(b1-7)Kdo(a2-5)Kdo ]Kdo(a2-5)[Kdo(a2-4)]Kdo Kdo(a2-5)Kdo(a2-6)GlcN4P Kdo(a2-4)Kdo(a2-6)GlcN4P Kdo(a2-6)GlcN4P(b1-6)GlcN4P Man(a1-?)[Xyl(b1-?)]Man Man(a1-?)Man(b1-?)GlcNAc Man(a1-?)Man(a1-?)Man Kdo(a2-?)Kdo(a2-?)GlcN4P
0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0
1 0 0 1 0 1 0 1 0 1 1 2 0 0 0 0 0 0 0 0 0 2 4 0
2 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 2
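To make the idea of k-saccharide counting concrete, here is a minimal stdlib-only sketch for the simplest case, disaccharides (size=2). It is not glycowork's implementation: the helper name `count_disaccharides` is hypothetical, wildcard variants such as Man(a1-?)Man are not generated, and floating substituents are not handled.

```python
import re
from collections import Counter

# One token per match: "[", "]", a linkage inside parentheses, or a monosaccharide.
TOKEN = re.compile(r"(?P<open>\[)|(?P<close>\])|\((?P<link>[^()]*)\)|(?P<mono>[^\[\]()]+)")

def count_disaccharides(glycan):
    """Count every child(linkage)parent pair in an IUPAC-condensed glycan."""
    counts = Counter()
    stack = []      # linkages saved while a branch is being read
    pending = []    # (linkage, child) pairs waiting for their parent residue
    current = None  # most recently read monosaccharide
    for match in TOKEN.finditer(glycan):
        kind = match.lastgroup
        if kind == "mono":
            name = match.group("mono")
            for link, child in pending:
                counts[f"{child}({link}){name}"] += 1
            pending, current = [], name
        elif kind == "link":
            pending.append((match.group("link"), current))
        elif kind == "open":
            stack.append(pending)
            pending = []
        else:  # closing bracket: the branch attaches to the same parent
            pending = stack.pop() + pending
    return counts

glycan = "Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc"
disaccharides = count_disaccharides(glycan)
print(dict(disaccharides))
```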


 get_terminal_structures (glycan, size=1)

returns terminal structures from all non-reducing ends (monosaccharide+linkage)

glycan (string or networkx): glycan in IUPAC-condensed nomenclature or as networkx graph
size (int): how large the extracted motif should be in terms of monosaccharides (for now 1 or 2 are supported;
Returns a list of terminal structures (strings)
['Neu5Ac(a2-3)', 'Neu5Ac(a2-6)']
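For size=1, the result can be approximated directly on the string: in IUPAC-condensed notation, a non-reducing end is the residue at the very start of the sequence or immediately after an opening bracket. The regex sketch below relies on that observation; the helper name `terminal_residues` is hypothetical, and glycowork itself may work on the graph rather than on the string.

```python
import re

# A terminal residue sits at the start of the string or right after "[".
TERMINAL = re.compile(r"(?:^|\[)([^\[\]()]+\([^()]*\))")

def terminal_residues(glycan):
    """Return monosaccharide+linkage for every non-reducing end."""
    return TERMINAL.findall(glycan)

glycan = ("Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)"
          "[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc")
termini = terminal_residues(glycan)
print(termini)
```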


 get_molecular_properties (glycan_list, verbose=False, placeholder=False)

given a list of glycans, uses pubchempy to return various molecular parameters retrieved from PubChem

glycan_list (list): list of glycans in IUPAC-condensed
verbose (bool): set True to print SMILES not found on PubChem; default:False
placeholder (bool): whether failed requests should return dummy values or be dropped; default:False
Returns a dataframe with all the molecular parameters retrieved from PubChem
out = get_molecular_properties(["Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"])
  complexity h_bond_acceptor_count defined_bond_stereo_count monoisotopic_mass atom_stereo_count isotope_atom_count undefined_bond_stereo_count covalent_unit_count undefined_atom_stereo_count rotatable_bond_count charge heavy_atom_count defined_atom_stereo_count xlogp h_bond_donor_count bond_stereo_count exact_mass tpsa molecular_weight
Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc 4410 62 0 2222.7830048 57 0 0 1 1 43 0 152 56 -23.600000 39 0 2222.7830048 1070 2224.0


converts glycan sequences to graphs and contains helper functions to search for motifs, check whether two sequences describe the same glycan, etc.


 glycan_to_nxGraph (glycan, libr=None, termini='ignore',

wrapper for converting glycans into networkx graphs; also works with floating substituents

glycan (string): glycan in IUPAC-condensed format
libr (dict): dictionary of form glycoletter:index
termini (string): whether to encode terminal/internal position of monosaccharides, ‘ignore’ for skipping, ‘calc’ for automatic annotation, or ‘provided’ if this information is provided in termini_list; default:‘ignore’
termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
Returns networkx graph object of glycan
print('Glycan to networkx Graph (only edges printed)')
Glycan to networkx Graph (only edges printed)
[(0, 1), (1, 4), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 10), (8, 9), (9, 10)]
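The edge list above reflects a layout in which monosaccharides and linkages are both nodes, numbered in the order they appear in the string, with each branch attaching to the residue that follows its closing bracket. A stdlib-only sketch of that layout is shown below, assuming the printed edges belong to Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc (11 nodes). The helper `glycan_edges` is hypothetical; it returns plain tuples rather than a networkx graph and ignores floating substituents.

```python
import re

# One token per match: "[", "]", a linkage inside parentheses, or a monosaccharide.
TOKEN = re.compile(r"(?P<open>\[)|(?P<close>\])|\((?P<link>[^()]*)\)|(?P<mono>[^\[\]()]+)")

def glycan_edges(glycan):
    """Return sorted (node, node) edges; residues and linkages are all nodes."""
    edges = []
    stack = []        # dangling linkages saved while inside a branch
    pending = []      # linkage nodes waiting for the next monosaccharide
    last_mono = None
    index = 0         # node index; brackets do not consume an index
    for match in TOKEN.finditer(glycan):
        kind = match.lastgroup
        if kind == "mono":
            edges.extend((link, index) for link in pending)
            pending, last_mono = [], index
            index += 1
        elif kind == "link":
            edges.append((last_mono, index))
            pending.append(index)
            index += 1
        elif kind == "open":
            stack.append(pending)
            pending = []
        else:  # closing bracket: branch and main chain share the next residue
            pending = stack.pop() + pending
    return sorted(edges)

edges = glycan_edges("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
print(edges)
```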


 graph_to_string (graph)

converts glycan graph back to IUPAC-condensed format

graph (networkx object): glycan graph
Returns glycan in IUPAC-condensed format (string)


 compare_glycans (glycan_a, glycan_b, wildcards_ptm=False)

returns True if glycans are the same and False if not

glycan_a (string or networkx object): glycan in IUPAC-condensed format or as a precomputed networkx object
glycan_b (string or networkx object): glycan in IUPAC-condensed format or as a precomputed networkx object
wildcards_ptm (bool): set to True to allow modification wildcards (e.g., ‘OS’ matching with ‘6S’); default:False
Returns True if two glycans are the same and False if not
print("Graph Isomorphism Test")
Graph Isomorphism Test
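Two IUPAC-condensed strings can describe the same glycan while listing branches in a different order, which is why a plain string comparison is not enough. The sketch below illustrates the idea with a different, simpler technique than graph isomorphism: each glycan is parsed into a tree rooted at the reducing end and reduced to a canonical form with sorted branches. The helpers `parse_tree`, `canonical`, and `same_glycan` are hypothetical and do not support wildcards or floating substituents.

```python
import re

# One token per match: "[", "]", a linkage inside parentheses, or a monosaccharide.
TOKEN = re.compile(r"(?P<open>\[)|(?P<close>\])|\((?P<link>[^()]*)\)|(?P<mono>[^\[\]()]+)")

def parse_tree(glycan):
    """Parse into (name, [(linkage, child_tree), ...]) rooted at the reducing end."""
    stack, pending, current = [], [], None
    for match in TOKEN.finditer(glycan):
        kind = match.lastgroup
        if kind == "mono":
            current = (match.group("mono"), pending)
            pending = []
        elif kind == "link":
            pending.append((match.group("link"), current))
        elif kind == "open":
            stack.append(pending)
            pending = []
        else:  # closing bracket
            pending = stack.pop() + pending
    return current

def canonical(node):
    """Order-independent form: children are sorted by linkage, then subtree."""
    name, children = node
    return (name, tuple(sorted((link, canonical(child)) for link, child in children)))

def same_glycan(glycan_a, glycan_b):
    return canonical(parse_tree(glycan_a)) == canonical(parse_tree(glycan_b))

a = "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"
b = "Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc"   # branches swapped
c = "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc"
print(same_glycan(a, b), same_glycan(a, c))
```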


 subgraph_isomorphism (glycan, motif, termini_list=[], count=False,
                       wildcards_ptm=False, return_matches=False)

returns True if motif is in glycan and False if not

glycan (string or networkx): glycan in IUPAC-condensed format or as graph in NetworkX format
motif (string or networkx): glycan motif in IUPAC-condensed format or as graph in NetworkX format
termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
count (bool): whether to return the number or absence/presence of motifs; default:False
wildcards_ptm (bool): set to True to allow modification wildcards (e.g., ‘OS’ matching with ‘6S’); default:False
return_matches (bool): whether the matched subgraphs in input glycan should be returned as node lists as an additional output; default:False
Returns True if motif is in glycan and False if not
print("Subgraph Isomorphism Test")
Subgraph Isomorphism Test


 generate_graph_features (glycan, glycan_graph=True, label='network')

compute graph features of glycan

glycan (string or networkx object): glycan in IUPAC-condensed format (or glycan network if glycan_graph=False)
glycan_graph (bool): True expects a glycan, False expects a network (from construct_network); default:True
label (string): Label to place in output dataframe if glycan_graph=False; default:‘network’
Returns a pandas dataframe with different graph features as columns and glycan as row
diameter branching nbrLeaves avgDeg varDeg maxDeg nbrDeg4 max_deg_leaves mean_deg_leaves deg_assort ... flow_edgeMax flow_edgeMin flow_edgeAvg flow_edgeVar secorderMax secorderMin secorderAvg secorderVar egap entropyStation
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc 8 1 3 1.818182 0.330579 3.0 0 3.0 3.0 -1.850372e-15 ... 0.333333 0.111111 0.217778 0.007289 45.607017 20.736441 31.679285 62.422895 0.026397 -2.35847

1 rows × 49 columns


 largest_subgraph (glycan_a, glycan_b)

find the largest common subgraph of two glycans

glycan_a (string or networkx): glycan in IUPAC-condensed format or as networkx graph
glycan_b (string or networkx): glycan in IUPAC-condensed format or as networkx graph
Returns the largest common subgraph as a string in IUPAC-condensed; returns empty string if there is no common subgraph
glycan1 = 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
glycan2 = 'Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
largest_subgraph(glycan1, glycan2)


 ensure_graph (glycan, **kwargs)

ensures function compatibility with string glycans and graph glycans

glycan (string or networkx graph): glycan in IUPAC-condensed format or as a networkx graph
**kwargs: keyword arguments that are directly passed on to glycan_to_nxGraph
Returns networkx graph object of glycan


 get_possible_topologies (glycan, exhaustive=False)

creates possible glycans given a floating substituent; only works with max one floating substituent

glycan (string or networkx): glycan in IUPAC-condensed format or as networkx graph
exhaustive (bool): whether to also allow additions at internal positions; default:False
Returns list of NetworkX-like glycan graphs of possible topologies


 possible_topology_check (glycan, glycans, exhaustive=False, **kwargs)

checks whether glycan with floating substituent could match glycans from a list; only works with max one floating substituent

glycan (string or networkx): glycan in IUPAC-condensed format (or as networkx graph) that has to contain a floating substituent
glycans (list): list of glycans in IUPAC-condensed format (or networkx graphs; should not contain floating substituents)
exhaustive (bool): whether to also allow additions at internal positions; default:False
**kwargs: keyword arguments that are directly passed on to compare_glycans
Returns list of glycans that could match input glycan
                       ["Fuc(a1-2)Gal(b1-3)GalNAc", "Neu5Ac(a2-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc",


process IUPAC-condensed glycan sequences into glycoletters etc.


 min_process_glycans (glycan_list)

converts list of glycans into a nested list of glycoletters

glycan_list (list): list of glycans in IUPAC-condensed format as strings
Returns list of glycoletter lists
[['Man', 'a1-3', 'Man', 'a1-6', 'Man', 'b1-4', 'GlcNAc', 'b1-4', 'GlcNAc'],
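The splitting into glycoletters can be approximated with a single regular expression that treats parentheses and brackets as separators. This is a minimal sketch, not the library's code; the helper name `split_glycoletters` is hypothetical.

```python
import re

def split_glycoletters(glycan_list):
    """Split each glycan at parentheses and brackets, dropping empty pieces."""
    return [[piece for piece in re.split(r"[()\[\]]", glycan) if piece]
            for glycan in glycan_list]

letters = split_glycoletters(["Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"])
print(letters)
```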


 get_lib (glycan_list)

returns dictionary of form glycoletter:index

glycan_list (list): list of IUPAC-condensed glycan sequences as strings
Returns dictionary of form glycoletter:index
{'GlcNAc': 0, 'Man': 1, 'a1-2': 2, 'a1-3': 3, 'a1-6': 4, 'b1-4': 5}


 expand_lib (libr, glycan_list)

updates libr with newly introduced glycoletters

libr (dict): dictionary of form glycoletter:index
glycan_list (list): list of IUPAC-condensed glycan sequences as strings
Returns new lib
lib1 = get_lib(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
lib2 = expand_lib(lib1, ['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'])
{'GlcNAc': 0, 'Man': 1, 'a1-2': 2, 'a1-3': 3, 'a1-6': 4, 'b1-4': 5, 'Fuc': 6}
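The behaviour shown above (sorted glycoletters receive consecutive indices, and new glycoletters are appended at the end) can be sketched with the standard library alone. The helpers `build_lib` and `extend_lib` are hypothetical stand-ins, and the second input glycan is an assumption chosen so that 'a1-2' occurs.

```python
import re

def glycoletters(glycan):
    """Split a glycan at parentheses and brackets, dropping empty pieces."""
    return [piece for piece in re.split(r"[()\[\]]", glycan) if piece]

def build_lib(glycan_list):
    """Map each distinct glycoletter to an index, in sorted order."""
    letters = sorted({letter for g in glycan_list for letter in glycoletters(g)})
    return {letter: i for i, letter in enumerate(letters)}

def extend_lib(libr, glycan_list):
    """Return a copy of libr with unseen glycoletters appended at the end."""
    new_lib = dict(libr)
    unseen = sorted({letter for g in glycan_list for letter in glycoletters(g)} - set(libr))
    for letter in unseen:
        new_lib[letter] = len(new_lib)
    return new_lib

lib_a = build_lib(["Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc",
                   "Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"])
lib_b = extend_lib(lib_a, ["Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc"])
print(lib_b)
```

Returning a copy keeps the original dictionary unchanged, so indices already assigned to existing glycoletters stay stable.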


 presence_to_matrix (df, glycan_col_name='glycan',

converts a dataframe such as df_species to absence/presence matrix

df (dataframe): dataframe with glycan occurrence, rows are glycan-label pairs
glycan_col_name (string): column name under which glycans are stored; default:glycan
label_col_name (string): column name under which labels are stored; default:Species
Returns pandas dataframe with labels as rows and glycan occurrences as columns
out = presence_to_matrix(df_species[df_species.Order == 'Fabales'].reset_index(drop = True),
                         label_col_name = 'Family')
glycan Apif(a1-2)Xyl(b1-2)[Glc6Ac(b1-4)]Glc Ara(a1-2)Ara(a1-6)GlcNAc Ara(a1-2)Glc(b1-2)Ara Ara(a1-2)GlcA Ara(a1-2)[Glc(b1-6)]Glc Ara(a1-6)Glc Araf(a1-3)Araf(a1-5)[Araf(a1-6)Gal(b1-6)Glc(b1-6)Man(a1-3)]Araf(a1-5)Araf(a1-3)Araf(a1-3)Araf Araf(a1-3)Gal(b1-6)Gal D-Apif(b1-2)Glc D-Apif(b1-2)GlcA D-Apif(b1-3)Xyl(b1-2)[Glc6Ac(b1-4)]Glc D-Apif(b1-3)Xyl(b1-4)Rha(a1-2)Ara D-Apif(b1-3)Xyl(b1-4)Rha(a1-2)D-Fuc D-Apif(b1-3)Xyl(b1-4)[Glc(b1-3)]Rha(a1-2)D-Fuc D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-2)D-Fuc D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-2)[Rha(a1-3)]D-Fuc D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-3)D-Fuc D-Apif(b1-6)Glc D-ApifOMe(b1-3)XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe D-ApifOMe(b1-3)XylOMe(b1-4)[GlcOMe(b1-3)]RhaOMe(a1-2)D-FucOMe Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc4Ac6Ac(b1-3)]Glc Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc4Ac6Ac(b1-3)]Glc6Ac Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc6Ac(b1-3)]Glc Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)Glc3Ac6Ac Fruf(b2-1)Glc4Ac6Ac Fruf(b2-1)Glc6Ac Fruf(b2-1)[Glc(b1-2)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-3)Glc(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc6Ac(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-2)][Glc6Ac(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-4)Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)[Glc3Ac(b1-2)]Glc Fruf(b2-1)[Glc6Ac(b1-2)]Glc Fruf1Ac(b2-1)Glc2Ac4Ac6Ac Fuc(a1-2)Gal(b1-2)Xyl(a1-6)Glc Fuc(a1-2)Gal(b1-2)Xyl(a1-6)Glc(b1-4)Glc Fuc(a1-2)Gal(b1-2)Xyl(a1-6)[Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)]Glc(b1-4)Glc Fuc(a1-2)Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)Glc Fuc(a1-2)Gal(b1-4)Xyl Fuc(a1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Fuc(a1-6)GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 
Gal(?1-?)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Man(a1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(?1-?)[Gal(?1-?)]GlcNAc(?1-?)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(?1-?)Man(a1-3)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(a1-4)Gal Gal(a1-6)Gal Gal(a1-6)Gal(a1-6)Gal Gal(a1-6)Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Gal(a1-6)Gal(a1-6)Gal(a1-6)[Fruf(b2-1)]Glc Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Gal(a1-6)Glc Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Man Gal(a1-6)Man(b1-4)Man Gal(a1-6)Man(b1-4)Man(b1-4)Man(b1-4)Man Gal(a1-6)Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Gal(a1-6)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Gal(a1-6)Man(b1-4)[Gal(a1-6)]Man Gal(b1-2)GlcA Gal(b1-2)GlcA6Me Gal(b1-2)Xyl(a1-6)Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Gal(b1-2)[Xyl(b1-3)]GlcA Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[Gal(b1-3)GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 
Gal(b1-3)GlcNAc(b1-4)Man(a1-6)[GlcNAc(b1-4)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(a1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[Gal(b1-3)GlcNAc(b1-2)Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[GlcNAc(b1-2)Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-6)]GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-4)Gal(b1-4)Man Gal(b1-4)Gal(b1-4)ManOMe Gal(b1-4)GlcA Gal(b1-4)GlcNAc(b1-2)[Gal(b1-4)GlcNAc(b1-4)]Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)[Gal(b1-4)GlcNAc(b1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Gal(b1-4)Man(b1-4)Man Gal(b1-4)Man(b1-4)Man(b1-4)Gal Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc1CoumOMe Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc1FerOMe Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-Fuc Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-Fuc1CoumOMe Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-FucOMeOSin 
Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)D-Fuc Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)D-Fuc1CoumOMe Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)[Rha(a1-3)]D-Fuc Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)[Rha(a1-3)]D-Fuc1CoumOMe Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GalA(a1-2)[Araf(a1-5)Araf(a1-4)]Rha(b1-4)GalA GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-2)Rha(a1-4)GalA(a1-2)Rha(a1-4)GalA(a1-2)GalA GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA GalOMe(b1-2)[XylOMe(b1-3)]GlcAOMe GalOMe(b1-4)XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe GalOMe(b1-4)XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-3)]D-FucOMe GalOMe(b1-4)XylOMe(b1-4)[D-ApifOMe(b1-3)]RhaOMe(a1-2)[RhaOMe(a1-3)]D-FucOMe Galf(b1-2)[Galf(b1-4)]Man Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Glc(a1-2)Rha(a1-6)Glc Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-4)Glc(a1-2)Rha(a1-6)Glc Glc(a1-4)Glc(a1-4)Glc(a1-6)Glc Glc(a1-4)Glc(a1-4)GlcA Glc(a1-4)GlcA(b1-2)GlcA Glc(b1-2)Ara Glc(b1-2)Ara(a1-2)GlcA Glc(b1-2)Gal(b1-2)Gal(b1-2)GlcA Glc(b1-2)Gal(b1-2)GlcA Glc(b1-2)Gal(b1-2)GlcA(b1-3)[Glc(b1-3)]Ara Glc(b1-2)Glc Glc(b1-2)Glc(a1-2)FrufOBzOCin Glc(b1-2)Glc(b1-2)Glc Glc(b1-2)GlcA Glc(b1-2)[Ara(a1-3)]GlcA6Me Glc(b1-2)[Ara(a1-3)]GlcAOMe Glc(b1-2)[Ara(a1-6)]Glc Glc(b1-2)[Glc(b1-3)]Glc(a1-2)Fruf 
Glc(b1-2)[Glc(b1-3)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-2)[Glc6Ac(b1-3)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-2)[Rha(a1-3)]GlcA Glc(b1-2)[Xyl(b1-2)Ara(a1-6)]Glc Glc(b1-2)[Xyl(b1-2)D-Fuc(b1-6)]Glc Glc(b1-3)Ara Glc(b1-3)Glc Glc(b1-3)Glc(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)][Rha(a1-4)]Glc1Coum6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)][Rha(a1-4)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Rha1Fer(a1-4)Fruf(b2-1)GlcOBz Glc(b1-3)[Araf(a1-4)]Rha(a1-2)Glc Glc(b1-3)[Xyl(b1-4)]Rha(a1-2)D-FucOMe Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc(b1-4)Glc Glc(b1-4)Glc(b1-4)Glc(b1-4)Man Glc(b1-4)Glc6Ac(b1-3)Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Man(b1-4)Glc Glc(b1-4)Rha Glc(b1-4)Rha1Fer(a1-4)Fruf(b2-1)GlcOBz Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Glc(b1-6)Glc(b1-3)Glc Glc1Cer Glc2Ac(b1-4)[D-Apif(b1-3)Xyl(b1-2)]Glc Glc2Ac3Ac4Ac6Ac(b1-3)Ara Glc6Ac(b1-2)Glc(a1-2)FrufOBzOCin Glc6Ac(b1-3)Glc6Ac(b1-3)[Glc6Ac(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOAcOBz Glc6Ac(b1-3)Glc6Ac(b1-3)[Glc6Ac(b1-2)][RhaOAc(a1-4)]Glc1Fer6Ac(a1-2)Fruf1CoumOAcOBz Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum(a1-2)Fruf1CoumOBz 
Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1CoumOBz Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz GlcA(b1-2)Glc GlcA(b1-2)GlcA GlcA(b1-2)GlcA(b1-2)Rha GlcA4Me(a1-2)[Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)]Xyl GlcA4Me(a1-2)[Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)]Xyl GlcA4Me(a1-2)[Xyl(b1-4)]Xyl GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Gal(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-?)[Xyl(b1-2)][Man(a1-?)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-?)[Xyl(b1-2)][Man(a1-?)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 
GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-?)Man(a1-3)[GlcNAc(b1-?)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcOMe(b1-3)[XylOMe(b1-4)]RhaOMe(a1-2)D-FucOMe Glcf(b1-2)Xyl(b1-4)Rha(b1-4)[Xyl(b1-3)]Xyl Hexf(?1-?)Xyl(b1-4)Rha(b1-4)[Xyl(a1-3)]Xyl L-Lyx(a1-2)Ara(a1-2)GlcA Lyx(a1-2)Ara(a1-2)GlcA Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-6)[Man(a1-2)Man(a1-3)]Man(a1-3)[Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)[Man(a1-6)]Man(a1-3)[Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc Man(a1-2)Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN 
Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-2)Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-3)[Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)[Man(a1-3)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(a1-6)Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc 
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-6)][Xylf(a1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc-ol Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]Hex Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)ManNAc Man(a1-3)[Xylf(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(a1-6)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-?)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc Man(a1-?)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-?)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-?)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(b1-2)Man Man(b1-4)Gal(b1-4)Gal(b1-4)Man Man(b1-4)Gal(b1-4)Gal(b1-4)ManOMe Man(b1-4)Man Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-3)Gal(a1-3)Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man Man(b1-4)[Gal(a1-6)]Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man Man(b1-6)Glc Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc 
Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)[Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-4)]Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Rha(a1-2)Ara Rha(a1-2)Ara(a1-2)GlcA Rha(a1-2)Ara(a1-2)GlcA6Me Rha(a1-2)Ara(a1-2)GlcAOMe Rha(a1-2)D-Ara(b1-2)GlcA Rha(a1-2)Gal(b1-2)Glc Rha(a1-2)Gal(b1-2)GlcA Rha(a1-2)Gal(b1-2)GlcA6Me Rha(a1-2)Gal(b1-2)GlcAOMe Rha(a1-2)Glc(b1-2)Glc Rha(a1-2)Glc(b1-2)GlcA Rha(a1-2)Glc(b1-2)GlcA6Me Rha(a1-2)Glc(b1-2)GlcAOMe Rha(a1-2)Glc(b1-6)Glc Rha(a1-2)GlcA(b1-2)GlcA Rha(a1-2)GlcAOMe(b1-2)GlcAOMe Rha(a1-2)Rha(a1-2)Gal(b1-4)[Glc(b1-2)]GlcA Rha(a1-2)Xyl Rha(a1-2)Xyl(b1-2)GlcA Rha(a1-2)Xyl(b1-2)GlcA6Me Rha(a1-2)Xyl(b1-2)GlcAOMe Rha(a1-2)Xyl3Ac Rha(a1-2)Xyl4Ac Rha(a1-2)[Glc(b1-3)]Glc Rha(a1-2)[Glc(b1-6)]Gal(b1-2)GlcA6Me Rha(a1-2)[Rha(a1-4)]Glc Rha(a1-2)[Rha(a1-6)]Gal Rha(a1-2)[Rha(a1-6)]Glc Rha(a1-2)[Xyl(b1-4)]Glc Rha(a1-2)[Xyl(b1-4)]Glc(b1-6)Glc Rha(a1-3)GlcA Rha(a1-4)Gal(b1-2)GlcA Rha(a1-4)Gal(b1-2)GlcAOMe Rha(a1-4)Gal(b1-2)GlcOMe Rha(a1-4)Gal(b1-4)Gal(b1-4)GalGro Rha(a1-4)Xyl(b1-2)Glc Rha(a1-4)Xyl(b1-2)GlcA Rha(a1-4)Xyl(b1-2)GlcAOMe Rha(a1-6)[Xyl(b1-3)Xyl(b1-2)]Glc(b1-2)Glc Rha(b1-2)Glc(b1-2)GlcA Rha1Fer(a1-4)Fruf(b2-1)GlcOBz RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe-ol RhaOMe(a1-6)GlcOMe(b1-2)GlcOMe-ol Xyl(a1-6)Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc 
Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol 
Xyl(b1-2)Ara(a1-6)Glc Xyl(b1-2)Ara(a1-6)GlcNAc Xyl(b1-2)Ara(a1-6)[Glc(b1-2)]Glc Xyl(b1-2)Ara(a1-6)[Glc(b1-4)]GlcNAc Xyl(b1-2)D-Fuc(b1-6)Glc Xyl(b1-2)D-Fuc(b1-6)GlcNAc Xyl(b1-2)D-Fuc(b1-6)[Glc(b1-2)]Glc Xyl(b1-2)Fuc(a1-6)Glc Xyl(b1-2)Fuc(a1-6)GlcNAc Xyl(b1-2)Gal(b1-2)GlcA6Me Xyl(b1-2)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)Rha(a1-2)Ara Xyl(b1-2)[Glc(b1-3)]Ara Xyl(b1-2)[Man(a1-3)][Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAcN Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc Xyl(b1-2)[Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Rha(a1-3)]GlcA Xyl(b1-3)Ara Xyl(b1-3)Xyl(b1-2)[Rha(a1-6)]Glc(b1-2)Glc Xyl(b1-3)Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc Xyl(b1-3)Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc(b1-2)Glc Xyl(b1-4)Rha(a1-2)Ara Xyl(b1-4)Rha(a1-2)D-Fuc Xyl(b1-4)Rha(a1-2)D-FucOMe Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl3Ac(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA4Me(a1-2)]Xyl(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA4Me(a1-2)]Xyl3Ac(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl Xyl(b1-4)[GlcAOMe(a1-2)]Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl Xyl2Ac3Ac4Ac(b1-3)Ara XylOMe(b1-2)[RhaOMe(a1-6)]GlcOMe(b1-2)GlcOMe-ol XylOMe(b1-3)XylOMe(b1-2)[RhaOMe(a1-6)]GlcOMe(b1-2)GlcOMe-ol XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe 
XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe-ol Xylf(b1-2)Xyl(b1-3)[Rha(b1-2)Rha(b1-4)]Xyl [Araf(a1-3)Gal(b1-3)Gal(b1-6)]Gal(b1-3)Gal [Araf(a1-3)Gal(b1-6)]Gal(b1-3)Gal [Gal(a1-4)Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Man(b1-4)Man(b1-4)Man(b1-4)Gal(a1-6)]Man(b1-2)[Gal(a1-6)]Man(b1-2)[Gal(a1-4)Gal(a1-6)]Man(b1-4)Man [Gal(a1-6)]Man(b1-4)Man [Gal(a1-6)]Man(b1-4)Man(b1-4)Man [Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)Man(b1-4)Man [Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man [Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man [Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Gal(b1-3)Gal(b1-6)[Araf(a1-3)]Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-3)Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-6)Gal(b1-6)Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-6)Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-6)]Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc 
[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc
Fabaceae 1 4 1 3 1 1 0 1 3 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 1 1 2 1 1 1 1 4 2 1 2 2 7 4 4 4 2 8 4 2 5 4 1 1 1 1 1 0 1 1 3 1 1 1 1 1 1 2 1 5 1 1 1 1 2 2 1 1 2 1 1 1 1 3 2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 3 1 1 1 1 1 1 2 1 3 1 1 0 1 2 1 1 2 0 0 0 1 1 1 4 1 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1 1 0 0 0 0 0 1 2 0 1 1 1 5 1 1 0 0 0 0 0 0 0 1 3 1 0 0 0 1 1 4 6 1 1 1 1 2 1 1 1 3 1 1 3 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 2 2 1 1 1 1 3 1 1 2 1 1 1 1 1 1 1 3 2 1 2 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 4 6 4 4 4 1 1 5 4 1 4 1 1 0 1 1 1 7 1 1 2 3 22 6 7 1 8 3 4 1 3 1 1 1 2 2 2 1 1 1 1 1 0 2 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 2 1 2 2 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 7 1 1 1 2 3 1 1 1 1 1 1 1 1 1 1 1 5 2 1 1 1 3 2 1 1 3 2 1 0 0 2 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 4 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Fagaceae 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Polygalaceae 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 2 2 1 1 1 1 2 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 2 2 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 0 0 0 1 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Quillajaceae 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


 choose_correct_isoform (glycans, reverse=False)

given a list of glycan branch isomers, this function returns the correct isomer

glycans (list): glycans in IUPAC-condensed nomenclature
reverse (bool): whether to return the correct isomer (False) or everything except the correct isomer (True); default:False
Returns the correct isomer as a string (if reverse=False; otherwise it returns a list of strings)


 enforce_class (glycan, glycan_class, conf=None, extra_thresh=0.3)

given a glycan and glycan class, determines whether glycan is from this class

glycan (string): glycan in IUPAC-condensed nomenclature
glycan_class (string): glycan class in form of “O”, “N”, “free”, or “lipid”
conf (float): prediction confidence; can be used to override class
extra_thresh (float): threshold to override class; default:0.3
Returns True if glycan is in glycan class and False if not
enforce_class("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc", "O")


 IUPAC_to_SMILES (glycan_list)

given a list of IUPAC-condensed glycans, uses GlyLES to return a list of corresponding isomeric SMILES

glycan_list (list): list of IUPAC-condensed glycans
Returns a list of corresponding isomeric SMILES


 canonicalize_composition (comp)

converts a composition from any common format into the dictionary that is optimized for glycowork

comp (string): composition formatted either in the style of HexNAc2Hex1Fuc3Neu5Ac1 or N2H1F3A1
Returns composition as a dictionary of style monosaccharide : count
canonicalize_composition("HexNAc2Hex1Fuc3Neu5Ac1")
{'HexNAc': 2, 'Hex': 1, 'dHex': 3, 'Neu5Ac': 1}
canonicalize_composition("N2H1F3A1")
{'HexNAc': 2, 'Hex': 1, 'dHex': 3, 'Neu5Ac': 1}
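
To make the two accepted input styles concrete, the following is a minimal sketch of how such strings could be parsed into a dictionary. It is not glycowork's implementation: the function name `parse_composition` and both lookup tables are hypothetical and cover only a handful of building blocks.

```python
import re

# Hypothetical lookup tables for this sketch; a real library covers far more building blocks.
SHORT_CODES = {"H": "Hex", "N": "HexNAc", "F": "dHex", "A": "Neu5Ac", "G": "Neu5Gc"}
LONG_NAMES = {"HexNAc": "HexNAc", "Hex": "Hex", "dHex": "dHex", "Fuc": "dHex",
              "Neu5Ac": "Neu5Ac", "Neu5Gc": "Neu5Gc"}

def parse_composition(comp):
    """Parse a composition string into {monosaccharide: count} (illustrative only)."""
    if re.fullmatch(r"(?:[HNFAG]\d+)+", comp):
        # Single-letter style, e.g. N2H1F3A1
        pairs = re.findall(r"([HNFAG])(\d+)", comp)
        table = SHORT_CODES
    else:
        # Spelled-out style; try longest names first so 'HexNAc' is not read as 'Hex'
        names = "|".join(sorted(LONG_NAMES, key=len, reverse=True))
        pairs = re.findall(rf"({names})(\d+)", comp)
        table = LONG_NAMES
    result = {}
    for name, count in pairs:
        key = table[name]
        result[key] = result.get(key, 0) + int(count)
    return result

print(parse_composition("HexNAc2Hex1Fuc3Neu5Ac1"))
print(parse_composition("N2H1F3A1"))
```

Both calls yield the same dictionary, with Fuc counted under dHex.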


 canonicalize_iupac (glycan)

converts a glycan from IUPAC-extended, LinearCode, GlycoCT, or WURCS into the exact IUPAC-condensed version that is optimized for glycowork

glycan (string): glycan sequence; some rare post-biosynthetic modifications could still be an issue
Returns glycan as a string in canonicalized IUPAC-condensed


 get_possible_linkages (wildcard, linkage_list={'a1-7', 'a1-8', 'a1-3',
                        'b1-5', 'b1-7', 'b1-4', 'a2-7', '?2-8', 'b1-3',
                        'a1-6', 'b1-1', 'b2-2', 'b1-8', 'b2-1', 'b2-3',
                        '?2-3', 'a1-11', 'b1-?', 'a1-?', 'a2-?', '?2-?',
                        '1-4', '?1-3', 'b2-6', '?1-2', 'a2-6', 'a2-11',
                        'b2-4', 'a2-4', 'a2-8', '?2-6', 'a2-3', 'b1-2',
                        '1-6', 'a2-2', 'a1-4', '?1-4', 'a2-9', 'a2-5',
                        'a2-1', 'a1-1', 'b2-5', 'b1-6', 'a1-9', 'a1-5',
                        '?1-?', 'b1-9', 'b2-8', '?1-6', 'a1-2', 'b2-7'})

Retrieves all linkages that match a given wildcard pattern from a list of linkages

wildcard (string): The pattern to match, where ‘?’ can be used as a wildcard for any single character.
linkage_list (list): List of linkages as strings to search within; default:linkages
Returns a list of linkages that match the wildcard pattern.
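
The wildcard rule described above ('?' stands for any single character) can be sketched with the standard `re` module. This is an illustration under stated assumptions, not the library's code: `match_linkages` is a hypothetical name, and the sketch does not address how wildcard entries inside the linkage list itself are treated.

```python
import re

def match_linkages(wildcard, linkages):
    """Return the linkages matching a pattern in which '?' stands for any single character."""
    # Escape literal characters, then turn each '?' into the regex for 'any one character'
    regex = re.compile("".join("." if ch == "?" else re.escape(ch) for ch in wildcard))
    return sorted(link for link in linkages if regex.fullmatch(link))

toy_linkages = {"a1-3", "a1-4", "b1-3", "b1-4", "a2-3", "a2-6", "a1-11"}
print(match_linkages("a1-?", toy_linkages))
print(match_linkages("?1-3", toy_linkages))
```

Because '?' matches exactly one character, 'a1-11' is not returned for 'a1-?' in this sketch.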


 get_possible_monosaccharides (wildcard)

Retrieves all matching common monosaccharides of a type, given the type

wildcard (string): Monosaccharide type, from “HexNAc”, “Hex”, “dHex”, “Sia”, “HexA”, “Pen”
Returns a list of specified monosaccharides of that type
{'GalNAc', 'GlcNAc', 'HexNAc', 'ManNAc'}


 equal_repeats (r1, r2)

checks whether two repeat units could stem from the same repeating structure, just shifted

r1 (string): glycan sequence in IUPAC-condensed nomenclature
r2 (string): glycan sequence in IUPAC-condensed nomenclature
Returns True if repeat structures are shifted versions of each other, else False
equal_repeats("Fuc2S3S(a1-3)Fuc2S(a1-4)Fuc2S3S", "Fuc2S(a1-4)Fuc2S3S(a1-3)Fuc2S")
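
One way to picture the "shifted repeat" check is as a rotation test on the list of monosaccharide(linkage) units. The sketch below assumes linear repeats with at least one linkage, and assumes the trailing monosaccharide is simply the start of the next repeat and can be dropped. The helper names are hypothetical and the approach is not necessarily the one used by equal_repeats.

```python
import re

def repeat_units(repeat):
    """Split a linear repeat into 'Monosaccharide(linkage)' units, dropping the trailing residue."""
    core = repeat[:repeat.rindex(")") + 1]
    return re.findall(r"[^()]+\([^()]+\)", core)

def is_shifted_repeat(r1, r2):
    """True if the unit list of r2 is a rotation of the unit list of r1."""
    u1, u2 = repeat_units(r1), repeat_units(r2)
    if len(u1) != len(u2):
        return False
    return any(u1[i:] + u1[:i] == u2 for i in range(len(u1)))

print(is_shifted_repeat("Fuc2S3S(a1-3)Fuc2S(a1-4)Fuc2S3S",
                        "Fuc2S(a1-4)Fuc2S3S(a1-3)Fuc2S"))
```

Comparing whole units rather than raw characters avoids accidental matches that start in the middle of a monosaccharide name.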


for interacting with the databases contained in glycowork, delivering insights for sequences of interest


 get_insight (glycan, motifs=None)

prints out meta-information about a glycan

glycan (string): glycan in IUPAC-condensed format
motifs (dataframe): dataframe of glycan motifs (name + sequence); default:motif_list
print("Test get_insight with 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'")
Test get_insight with 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
Let's get rolling! Give us a few moments to crunch some numbers.

This glycan occurs in the following species: ['Acanthocheilonema_viteae', 'Adeno-associated_dependoparvovirusA', 'Aedes_aegypti', 'Angiostrongylus_cantonensis', 'Anopheles_gambiae', 'Antheraea_pernyi', 'Apis_mellifera', 'Ascaris_suum', 'Autographa_californica_nucleopolyhedrovirus', 'AvianInfluenzaA_Virus', 'Bombus_ignitus', 'Bombyx_mori', 'Bos_taurus', 'Bos_taurus', 'Bos_taurus', 'Brugia_malayi', 'Caenorhabditis_elegans', 'Cardicola_forsteri', 'Cooperia_onchophora', 'Cornu_aspersum', 'Crassostrea_gigas', 'Crassostrea_virginica', 'Cricetulus_griseus', 'Danio_rerio', 'Dictyocaulus_viviparus', 'Dirofilaria_immitis', 'Drosophila_melanogaster', 'Fasciola_hepatica', 'Gallus_gallus', 'Glossina_morsitans', 'Haemonchus_contortus', 'Haliotis_tuberculata', 'Heligmosomoides_polygyrus', 'Helix_lucorum', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'HumanImmunoDeficiency_Virus', 'Hylesia_metabus', 'Lutzomyia_longipalpis', 'Lymantria_dispar', 'Macaca_mulatta', 'Macaca_mulatta', 'Macaca_mulatta', 'Macaca_mulatta', 'Mamestra_brassicae', 'Megathura_crenulata', 'Mus_musculus', 'Nilaparvata_lugens', 'Oesophagostomum_dentatum', 'Onchocerca_volvulus', 'Ophiactis_savignyi', 'Opisthorchis_viverrini', 'Ostrea_edulis', 'Ovis_aries', 'Pan_troglodytes', 'Pan_troglodytes', 'Pan_troglodytes', 'Pan_troglodytes', 'Pristionchus_pacificus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Schistosoma_mansoni', 'SemlikiForest_Virus', 'Spodoptera_frugiperda', 'Sus_scrofa', 'Tick_borne_encephalitis_virus', 'Tribolium_castaneum', 'Trichinella_spiralis', 'Trichoplusia_ni', 'Trichuris_suis', 'Tropidolaemus_subannulatus', 'Volvarina_rubella', 'undetermined', 'unidentified_influenza_virus']

Puh, that's quite a lot! Here are the phyla of those species: ['Arthropoda', 'Artverviricota', 'Chordata', 'Cossaviricota', 'Echinodermata', 'Kitrinoviricota', 'Mollusca', 'Negarnaviricota', 'Nematoda', 'Platyhelminthes', 'Virus']

This glycan contains the following motifs: ['Chitobiose', 'Trimannosylcore', 'core_fucose']

This is the GlyTouCan ID for this glycan: G63041RA

This glycan has been reported to be expressed in: ['2A3_cell_line', 'A549_cell_line', 'AML_193_cell_line', 'CHOK1_cell_line', 'CHOS_cell_line', 'CRL_1620_cell_line', 'Cal-27_cell_line', 'Cervicovaginal_Secretion', 'EOL_1_cell_line', 'FaDu_cell_line', 'HEK293_cell_line', 'HEL92_1_7_cell_line', 'HEL_cell_line', 'HL_60_cell_line', 'KG_1_cell_line', 'KG_1a_cell_line', 'Kasumi_1_cell_line', 'MDA_MB_231BR_cell_line', 'ME_1_cell_line', 'ML_1_cell_line', 'MOLM_13_cell_line', 'MOLM_14_cell_line', 'MV4_11_cell_line', 'M_07e_cell_line', 'NB_4_cell_line', 'NS0_cell_line', 'OCI_AML2_cell_line', 'OCI_AML3_cell_line', 'PLB_985_cell_line', 'SCC-9_cell_line', 'SCC_25_cell_line', 'TF_1_cell_line', 'THP_1_cell_line', 'U_937_cell_line', 'VU-147T_cell_line', 'alveolus_of_lung', 'brain', 'brain', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellum', 'colon', 'cortex', 'digestive_tract', 'digestive_tract', 'forebrain', 'gills', 'gills', 'heart', 'heart', 'heart', 'hindbrain', 'hippocampal_formation', 'hippocampus', 'hippocampus', 'hippocampus', 'hippocampus', 'iPS1A_cell_line', 'iPS2A_cell_line', 'kidney', 'liver', 'lung', 'mantle', 'mantle', 'milk', 'milk', 'milk', 'mucus', 'muscle_of_leg', 'nerve_ending', 'ovary', 'pancreas', 'placenta', 'prefrontal_cortex', 'prefrontal_cortex', 'prefrontal_cortex', 'prefrontal_cortex', 'prostate_gland', 'striatum', 'striatum', 'striatum', 'striatum', 'testicle', 'testis', 'trachea', 'urine', 'urothelium']

This glycan has been reported to be dysregulated in (disease, direction, sample): [('REM_sleep_behavior_disorder', 'down', 'serum'), ('benign_breast_tumor_tissues_vs_para_carcinoma_tissues', 'up', 'breast'), ('cystic_fibrosis', 'up', 'sputum'), ('female_breast_cancer', 'up', 'breast'), ('female_breast_cancer', 'up', 'cell_line'), ('prostate_cancer', 'up', 'prostate_cancer_biopsy'), ('thyroid_gland_papillary_carcinoma', 'up', 'serum'), ('urinary_bladder_cancer', 'down', 'urine')]

That's all we can do for you at this point!


 glytoucan_to_glycan (ids, revert=False)

interconverts GlyTouCan IDs and glycans in IUPAC-condensed

ids (list): list of GlyTouCan IDs as strings (if using glycans instead, change ‘revert’ to True)
revert (bool): whether glycans should be mapped to GlyTouCan IDs or vice versa; default:False
Returns list of either GlyTouCan IDs or glycans in IUPAC-condensed


for performing regular expression-like searches in glycans, very powerful to find complicated motifs


 get_match (pattern, glycan, return_matches=True)

finds matches for a glyco-regular expression in a glycan

pattern (string): glyco-regular expression in the form of “Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc”
glycan (string or networkx): glycan sequence in IUPAC-condensed or as networkx graph
return_matches (bool): whether to return True/False or return the matches as a list of strings; default:True
Returns either a boolean (return_matches = False) or a list of matches as strings (return_matches = True)
# {} = between min and max occurrences, e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# * = zero or more occurrences, e.g., "Hex-HexNAc-([Hex|Fuc])*-HexNAc"
# + = one or more occurrences, e.g., "Hex-HexNAc-([Hex|Fuc])+-HexNAc"
# ? = zero or one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc])?-HexNAc"
# {1,} = at minimum one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc]){1,}-HexNAc"
# {,1} = at maximum one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc]){,1}-HexNAc"
# {2} = exactly two occurrences, e.g., "Hex-HexNAc-([Hex|Fuc]){2}-HexNAc"
# ^ = start of sequence, e.g., "^Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# % = middle of sequence (i.e., neither start nor end)
# $ = end of sequence, e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc$"
# ?<= = lookbehind (i.e., provided pattern must be present before rest of pattern but is not included in match), e.g., "(?<=Xyl-)Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# ?<! = negative lookbehind (i.e., provided pattern is not present before rest of pattern and is also not included in match), e.g., "(?<!Xyl-)Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# ?= = lookahead (i.e., provided pattern must be present after rest of pattern but is not included in match), e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc(?=-HexNAc)"
# ?! = negative lookahead (i.e., provided pattern is not present after rest of pattern and is not included in match), e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc(?!-HexNAc)"

# Example: extracting the sequence from the a1-6 branch of N-glycans
pattern = "r[Sia]{,1}-Monosaccharide-([dHex]){,1}-Monosaccharide(?=-Mana6-Monosaccharide)"
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)GalNAc(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
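
The quantifier syntax listed above boils down to a (minimum, maximum) occurrence range per pattern component. The following sketch shows one way to read those ranges. `parse_quantifier` is a hypothetical helper, not part of glycowork, and splitting the pattern on '-' is an assumption that only holds for simple patterns without lookarounds.

```python
import re

def parse_quantifier(component):
    """Return (minimum, maximum) occurrences for one pattern component; None means unbounded."""
    braces = re.search(r"\{(\d*)(,?)(\d*)\}$", component)
    if braces:
        low, comma, high = braces.groups()
        if not comma:
            # {2} -> exactly two occurrences
            return int(low), int(low)
        # {1,2}, {1,} and {,1}: a missing bound means 0 (lower) or unbounded (upper)
        return int(low or 0), (int(high) if high else None)
    suffix = {"*": (0, None), "+": (1, None), "?": (0, 1)}
    # No quantifier at all means the component occurs exactly once
    return suffix.get(component[-1], (1, 1))

pattern = "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
print([parse_quantifier(part) for part in pattern.split("-")])
```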

For interested users, we here compile a selection of regular expression patterns that we find useful in our own work:

  • Lewis or sialyl-Lewis structures:
    pattern = “r[Sia]{,1}-[Gal|GalOS]{1}-([Fuc]){1}-[GlcNAc|GlcNAc6S]{1}”
  • Blood groups:
    pattern = “rFuc-([Gal|GalNAc])?-Gal-GlcNAc”
  • a1-6 branch in N-glycans:
    pattern = “r[Sia]{,1}-[Hex|HexNAc]{,1}-([dHex]){,1}-[Man|GlcNAc]{1}-([.-.|.]){,1}-Mana6(?=-Manb4-GlcNAc)”
  • b1-6 branch in O-glycans (from core 2/4/6):
    pattern = “r[Sia|dHex]{,1}-[Hex|HexNAc]{,1}-([dHex]){,1}-.b6(?=-GalNAc)”
  • b1-3 branch in O-glycans (from core 1/2):
    pattern = “r[Sia]{,1}-[.]{,1}-([dHex]){,1}-.b3(?=-GalNAc)”


 get_match_batch (pattern, glycan_list, return_matches=True)

finds matches for a glyco-regular expression in a list of glycans

pattern (string): glyco-regular expression in the form of “Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc”
glycan_list (list of strings or networkx): list of glycan sequences in IUPAC-condensed or as networkx graphs
return_matches (bool): whether to return True/False or return the matches as a list of strings; default:True
Returns either a list of booleans (return_matches = False) or a list of lists of matches as strings (return_matches = True)


 motif_to_regex (motif)

tries to convert motif into a regular expression

motif (string): glycan in IUPAC-condensed nomenclature
Returns regular expression if successful


helper functions to map m/z -> composition, composition -> structure, structure -> motif, and more


 string_to_labels (character_string, libr=None)

tokenizes word by indexing characters in passed library

character_string (string): string of characters to index
libr (dict): dict of library items
Returns indexes of characters in library
[None, None, None, None, None]


 pad_sequence (seq, max_length, pad_label=None, libr=None)

brings all sequences to same length by adding padding token

seq (list): sequence to pad (from string_to_labels)
max_length (int): sequence length to pad to
pad_label (int): which padding label to use
libr (list): list of library items
Returns padded sequence
pad_sequence(string_to_labels(['Man','a1-3','Man','a1-6','Man']), 7)
[None, None, None, None, None, 25, 25]
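
The interplay of tokenizing and padding can be sketched with a toy library. The names `to_labels`, `pad`, and `toy_libr` are hypothetical, and using the library size as padding label is an assumption made for this sketch.

```python
def to_labels(tokens, libr):
    """Index each token in the library; tokens missing from the library map to None."""
    return [libr.get(token) for token in tokens]

def pad(seq, max_length, pad_label):
    """Right-pad seq with pad_label until it has max_length entries."""
    return seq + [pad_label] * max(0, max_length - len(seq))

toy_libr = {"Man": 0, "a1-3": 1, "a1-6": 2}
labels = to_labels(["Man", "a1-3", "Man", "a1-6", "Man"], toy_libr)
# Assumption: the padding label is the first index not used by the library
print(pad(labels, 7, pad_label=len(toy_libr)))
```

Sequences that already reach max_length are returned unchanged in this sketch.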


 stemify_glycan (glycan, stem_lib=None, libr=None)

removes modifications from all monosaccharides in a glycan

glycan (string): glycan in IUPAC-condensed format
stem_lib (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
libr (dict): dictionary of form glycoletter:index; default:lib
Returns stemmed glycan as string
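
Given a dictionary of the form modified_monosaccharide:core_monosaccharide, stemming amounts to a token-wise lookup. The sketch below uses a hand-written `toy_stem_lib` and a hypothetical `strip_modifications` helper. How glycowork builds its default stem_lib is not shown here.

```python
import re

def strip_modifications(glycan, stem_lib):
    """Replace each modified monosaccharide by its core form, leaving linkages and brackets alone."""
    # The capturing group keeps linkages '(a1-3)' and branch brackets as separate tokens
    tokens = re.split(r"(\([^()]*\)|\[|\])", glycan)
    return "".join(stem_lib.get(token, token) for token in tokens)

toy_stem_lib = {"Fuc2S3S": "Fuc", "Fuc2S": "Fuc", "GlcNAc6S": "GlcNAc"}
print(strip_modifications("Fuc2S3S(a1-3)Fuc2S(a1-4)Fuc2S3S", toy_stem_lib))
print(strip_modifications("Gal(b1-4)[Fuc(a1-3)]GlcNAc6S", toy_stem_lib))
```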


 stemify_dataset (df, stem_lib=None, libr=None, glycan_col_name='glycan',
                  rarity_filter=1)

stemifies all glycans in a dataset by removing monosaccharide modifications

df (dataframe): dataframe with glycans in IUPAC-condensed format in column glycan_col_name
stem_lib (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
libr (dict): dictionary of form glycoletter:index; default:lib
glycan_col_name (string): column name under which glycans are stored; default:glycan
rarity_filter (int): how often monosaccharide modification has to occur to not get removed; default:1
Returns df with glycans stemified


 mask_rare_glycoletters (glycans, thresh_monosaccharides=None,
                         thresh_linkages=None)

masks rare monosaccharides and linkages in a list of glycans

glycans (list): list of glycans in IUPAC-condensed form
thresh_monosaccharides (int): threshold-value for monosaccharides seen as “rare”; default:(0.001*len(glycans))
thresh_linkages (int): threshold-value for linkages seen as “rare”; default:(0.03*len(glycans))
Returns list of glycans in IUPAC-condensed with masked rare monosaccharides and linkages
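
The masking idea can be sketched as a frequency count followed by a token-wise replacement. This is a sketch under several assumptions: occurrences are counted per token rather than per glycan, "rare" means strictly below the threshold, and the placeholder strings 'Monosaccharide' and '?1-?' are chosen for illustration. The helper name `mask_rare` is hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"(\([^()]*\)|\[|\])")

def mask_rare(glycans, thresh_mono, thresh_link):
    """Replace monosaccharides and linkages seen fewer times than their threshold by placeholders."""
    split = [TOKEN.split(glycan) for glycan in glycans]
    counts = Counter(tok for toks in split for tok in toks if tok not in ("", "[", "]"))
    masked = []
    for toks in split:
        out = []
        for tok in toks:
            if tok in ("", "[", "]"):
                out.append(tok)
            elif tok.startswith("("):
                # Linkage token such as '(b1-4)'
                out.append(tok if counts[tok] >= thresh_link else "(?1-?)")
            else:
                out.append(tok if counts[tok] >= thresh_mono else "Monosaccharide")
        masked.append("".join(out))
    return masked

toy_glycans = ["Gal(b1-4)GlcNAc", "Gal(b1-4)Glc", "Gal(b1-3)GlcNAc", "Fuc(a1-2)Gal"]
print(mask_rare(toy_glycans, thresh_mono=2, thresh_link=2))
```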


 mz_to_composition (mz_value, mode='negative', mass_value='monoisotopic',
                    reduced=False, sample_prep='underivatized',
                    mass_tolerance=0.5, kingdom='Animalia',
                    glycan_class='N', df_use=None, filter_out=set())

Maps an m/z value to matching monosaccharide compositions within SugarBase

mz_value (float): the actual m/z value from mass spectrometry
mode (string): whether mz_value comes from MS in ‘positive’ or ‘negative’ mode; default:‘negative’
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
reduced (bool): whether glycans are reduced at reducing end; default:False
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
mass_tolerance (float): how much deviation to tolerate for a match; default:0.5
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
glycan_class (string): which glycan class does the m/z value stem from, ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans; default:‘N’
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan
filter_out (set): set of monosaccharide types to ignore during composition finding; default:empty set
Returns a list of matching compositions in dict form
mz_to_composition(665.4, glycan_class='O', filter_out={'Kdn', 'P', 'HexA', 'Pen', 'HexN', 'Me', 'PCho', 'PEtN'},
                    reduced = True)
[{'HexNAc': 2, 'Hex': 2, 'Neu5Ac': 2}]
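
The principle behind mapping an m/z value to a composition can be illustrated with a brute-force search over residue masses. The sketch makes strong simplifying assumptions: only four residue types, underivatized and non-reduced glycans, only the singly deprotonated ion [M-H]-, and no database lookup at all. `compositions_for_mz` is a hypothetical helper, so its results are not expected to match those of mz_to_composition.

```python
from itertools import product

# Monoisotopic residue masses (Da) of underivatized monosaccharides
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "dHex": 146.0579, "Neu5Ac": 291.0954}
WATER = 18.0106    # added once for the free reducing end
PROTON = 1.00728

def compositions_for_mz(mz_value, mass_tolerance=0.5, max_count=3):
    """Brute-force search for compositions whose [M-H]- ion lies within the tolerance."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        if not any(counts):
            continue
        neutral = WATER + sum(n * RESIDUE_MASS[name] for n, name in zip(counts, names))
        # Negative mode, single charge: subtract one proton from the neutral mass
        if abs((neutral - PROTON) - mz_value) <= mass_tolerance:
            hits.append({name: n for name, n in zip(names, counts) if n})
    return hits

print(compositions_for_mz(673.23))
```

Restricting candidates to compositions that actually occur in a glycan database, as the function description states, prunes this search space considerably.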


 match_composition_relaxed (composition, glycan_class='N',
                            kingdom='Animalia', df_use=None,

Given a coarse-grained monosaccharide composition (Hex, HexNAc, etc.), it returns all corresponding glycans

composition (dict): a dictionary indicating the composition to match (for example {“dHex”: 1, “Hex”: 1, “HexNAc”: 1})
glycan_class (string): which glycan class the composition stems from, ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans; default:N
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
df_use (dataframe): glycan dataframe for searching glycan structures; default:df_glycan
Returns list of glycans matching composition in IUPAC-condensed
match_composition_relaxed({"Hex":3, "HexNAc":2, "dHex":1}, glycan_class = 'O')


 condense_composition_matching (matched_composition)

Given a list of glycans matching a composition, find the minimum number of glycans characterizing this set

matched_composition (list): list of glycans matching to a composition
Returns minimal list of glycans that match a composition
match_comp = match_composition_relaxed({'Hex':1, 'HexNAc':1, 'Neu5Ac':1}, glycan_class = 'O')
['Neu5Ac(a2-3)Gal(b1-3)GalNAc', 'Gal(b1-3)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-?)Gal(b1-3)GalNAc', '{Neu5Ac(a2-?)}Gal(b1-3)GalNAc', 'Neu5Ac(a2-3)[GalNAc(b1-4)]Gal', 'Neu5Ac(a2-3)Gal(b1-4)GalNAc', 'Neu5Ac(a2-6)Gal(b1-3)GalNAc', 'Gal(a1-3)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-?)Hex(?1-?)GalNAc', 'Gal(?1-3)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-3)Gal(?1-?)GalNAc', 'Neu5Ac(a2-6)Gal(a1-3)GalNAc', 'Neu5Ac(a2-?)Gal(?1-3)GalNAc']


 mz_to_structures (mz_list, glycan_class, kingdom='Animalia',
                   abundances=None, mode='negative',
                   mass_value='monoisotopic', sample_prep='underivatized',
                   mass_tolerance=0.5, reduced=False, df_use=None,
                   filter_out=set(), verbose=False)

wrapper function to map precursor masses to structures, condense them, and match them with relative intensities

mz_list (list): list of precursor masses
glycan_class (string): which glycan class does the m/z value stem from, ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
abundances (dataframe): every row one composition (matching mz_list in order), every column one sample; default:pd.DataFrame([range(len(mz_list))]*2).T
mode (string): whether mz_value comes from MS in ‘positive’ or ‘negative’ mode; default:‘negative’
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
mass_tolerance (float): how much deviation to tolerate for a match; default:0.5
reduced (bool): whether glycans are reduced at reducing end; default:False
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan
filter_out (set): set of monosaccharide types to ignore during composition finding; default:empty set
verbose (bool): whether to print any non-matching compositions; default:False
Returns dataframe of (matched structures) x (relative intensities)
mz_to_structures([674.29], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
   glycan                         abundance
0  Neu5Ac(a2-3)Gal(b1-3)GalNAc    0
1  Gal(b1-3)[Neu5Ac(a2-6)]GalNAc  0
2  Gal(a1-3)[Neu5Ac(a2-6)]GalNAc  0
3  {Neu5Ac(a2-?)}Gal(b1-3)GalNAc  0
4  Neu5Ac(a2-3)[GalNAc(b1-4)]Gal  0
5  Neu5Ac(a2-3)Gal(b1-4)GalNAc    0
6  Neu5Ac(a2-6)Gal(b1-3)GalNAc    0
7  Neu5Ac(a2-?)Hex(?1-?)GalNAc    0
8  Neu5Ac(a2-3)Gal(?1-?)GalNAc    0
9  Neu5Ac(a2-6)Gal(a1-3)GalNAc    0


 compositions_to_structures (composition_list, glycan_class='N',
                             kingdom='Animalia', abundances=None,
                             df_use=None, verbose=False)

wrapper function to map compositions to structures, condense them, and match them with relative intensities

composition_list (list): list of composition dictionaries of the form {‘Hex’: 1, ‘HexNAc’: 1}
glycan_class (string): which glycan class the compositions stem from, ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans; default:N
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
abundances (dataframe): every row one composition (matching composition_list in order), every column one sample; default:pd.DataFrame([range(len(composition_list))]*2).T
df_use (dataframe): glycan dataframe for searching glycan structures; default:df_glycan
verbose (bool): whether to print any non-matching compositions; default:False
Returns dataframe of (matched structures) x (relative intensities)
compositions_to_structures([{'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1}], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
glycan abundance
0 Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc 0
1 Gal(b1-3)[Neu5Ac(a2-8)Neu5Ac(a2-6)]GalNAc 0
2 Neu5Ac(a2-8)Neu5Ac(a2-6)[Gal(b1-3)]GalNAc 0
3 Neu5Ac(a2-3)[Neu5Ac(a2-6)]Gal(b1-3)GalNAc 0
compositions_to_structures(["H1N1A2"], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
glycan abundance
0 Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc 0
1 Gal(b1-3)[Neu5Ac(a2-8)Neu5Ac(a2-6)]GalNAc 0
2 Neu5Ac(a2-8)Neu5Ac(a2-6)[Gal(b1-3)]GalNAc 0
3 Neu5Ac(a2-3)[Neu5Ac(a2-6)]Gal(b1-3)GalNAc 0
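
The two calls above return the same structures, which suggests that the shorthand "H1N1A2" corresponds to {'Hex': 1, 'HexNAc': 1, 'Neu5Ac': 2}. A minimal sketch of such a conversion follows. The function parse_shorthand is hypothetical, and the letter mapping is inferred only from this one example; it is not the library's own parser and covers just the three letters shown.

```python
import re

# Letter codes inferred from the example above (assumption; other codes are not covered)
SHORTHAND = {"H": "Hex", "N": "HexNAc", "A": "Neu5Ac"}


def parse_shorthand(shorthand):
    """Convert a string such as 'H1N1A2' into a composition dictionary."""
    composition = {}
    for letter, count in re.findall(r"([A-Z])(\d+)", shorthand):
        if letter not in SHORTHAND:
            raise ValueError(f"Unknown shorthand letter: {letter}")
        name = SHORTHAND[letter]
        composition[name] = composition.get(name, 0) + int(count)
    return composition


print(parse_shorthand("H1N1A2"))
```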


 structure_to_basic (glycan)

converts a monosaccharide- and linkage-defined glycan structure to the base topology

glycan (string): glycan in IUPAC-condensed nomenclature
Returns the glycan topology as a string


 glycan_to_composition (glycan, stem_libr=None)

maps glycan to its composition

glycan (string): glycan in IUPAC-condensed format
stem_libr (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
Returns a dictionary of form “Monosaccharide” : count
{'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1, 'S': 1}
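
To make the idea of a composition concrete, here is a minimal sketch that counts monosaccharides in an IUPAC-condensed string and collapses them to base types. The function sketch_composition and the small STEM table are hypothetical stand-ins for stem_libr; the real function handles modifications (such as the 'S' entry above) and many more monosaccharides, which this sketch does not.

```python
import re
from collections import Counter

# Tiny illustrative lookup of specific monosaccharide -> base type (assumption)
STEM = {
    "Gal": "Hex", "Glc": "Hex", "Man": "Hex",
    "GalNAc": "HexNAc", "GlcNAc": "HexNAc",
    "Fuc": "dHex", "Neu5Ac": "Neu5Ac",
}


def sketch_composition(glycan):
    """Count base monosaccharide types in an IUPAC-condensed glycan string."""
    # Replace linkages such as (a2-3) with a separator
    stripped = re.sub(r"\([^)]*\)", " ", glycan)
    # Replace branch brackets and floating-substituent braces with a separator
    stripped = re.sub(r"[\[\]{}]", " ", stripped)
    units = stripped.split()
    # Unknown names are kept unchanged rather than guessed
    return dict(Counter(STEM.get(unit, unit) for unit in units))


print(sketch_composition("Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc"))
```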


 glycan_to_mass (glycan, mass_value='monoisotopic',
                 sample_prep='underivatized', stem_libr=None)

given a glycan, calculates its theoretical mass; only allowed extra-modifications are methylation, sulfation, phosphorylation

glycan (string): glycan in IUPAC-condensed format
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
stem_libr (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
Returns the theoretical mass of input glycan


 composition_to_mass (dict_comp, mass_value='monoisotopic',
                      sample_prep='underivatized')

given a composition, calculates its theoretical mass; only allowed extra-modifications are methylation, sulfation, phosphorylation

dict_comp (dict): composition in form monosaccharide:count
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
Returns the theoretical mass of input composition
composition_to_mass({'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1, 'S': 1})
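
The underlying arithmetic for an underivatized glycan can be sketched as the sum of residue masses plus one water. This is an illustration only: sketch_composition_mass is a hypothetical helper, it uses standard monoisotopic residue masses for a handful of monosaccharide types, and it ignores modifications such as 'S', sample preparation, and ion adducts, so its result need not equal the library's output.

```python
# Standard monoisotopic residue masses (monosaccharide minus water), in Da
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "dHex": 146.0579,
    "Neu5Ac": 291.0954,
}
WATER = 18.0106  # monoisotopic mass of H2O, added once for the free reducing end


def sketch_composition_mass(dict_comp):
    """Sum residue masses for an underivatized composition and add one water."""
    return sum(RESIDUE_MASS[name] * count for name, count in dict_comp.items()) + WATER


mass = sketch_composition_mass({"Neu5Ac": 1, "Hex": 1, "HexNAc": 1})
print(round(mass, 4))
```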


 get_unique_topologies (composition, glycan_type, df_use=None,
                        universal_replacers={}, taxonomy_rank='Kingdom',
                        taxonomy_value='Animalia')

given a composition, retrieves all observed and unique base topologies

composition (dict): composition in form monosaccharide:count
glycan_type (string): which glycan class to search, ‘N’, ‘O’, ‘lipid’, ‘free’, or ‘repeat’
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan
universal_replacers (dictionary): dictionary of form base monosaccharide : specific monosaccharide
taxonomy_rank (string): at which taxonomic rank to filter; default: Kingdom
taxonomy_value (string): which value to filter at taxonomy_rank; default: Animalia
Returns a list of observed base topologies for the given composition
get_unique_topologies({'HexNAc':2, 'Hex':1}, 'O', universal_replacers = {'dHex':'Fuc'})