motif

motif contains many functions to process glycans in various ways and use this processing to analyze glycans via curated motifs, graph features, and sequence features. It contains the following modules:

draw

drawing glycans in SNFG style


GlycoDraw

 GlycoDraw (draw_this, vertical=False, compact=False, show_linkage=True,
            dim=50, highlight_motif=None, highlight_termini_list=[],
            repeat=None, repeat_range=None, filepath=None, suppress=False)

Draws a glycan structure based on the provided input.

Arguments:
draw_this (string): The glycan structure or motif to be drawn.
vertical (bool, optional): Set to True to draw the structure vertically. Default: False.
compact (bool, optional): Set to True to draw the structure in a compact form. Default: False.
show_linkage (bool, optional): Set to False to hide the linkage information. Default: True.
dim (int, optional): The dimension (size) of the individual sugar units in the structure. Default: 50.
highlight_motif (string, optional): Glycan motif to highlight within the parent structure.
highlight_termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
repeat (bool
repeat_range (list of 2 int): List of index integers for the first and last main-chain monosaccharide in repeating unit. Monosaccharides are numbered starting from 0 (invisible placeholder = 0 in case of structure terminating in a linkage) at the reducing end.
filepath (string, optional): The path to the output file to save as SVG or PDF. Default: None.
suppress (bool, optional): Whether to suppress the visual display of drawings into the console; default:False
GlycoDraw("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)][GlcNAc(b1-4)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
         highlight_motif = "GlcNAc(b1-?)Man")
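
The motif passed to highlight_motif in the call above contains a `?` in its linkage (b1-?). As a rough illustration of how such a placeholder can be read, the sketch below compares two linkage strings character by character and treats `?` as matching anything. The helper name linkage_matches is hypothetical and not part of glycowork, whose own motif matching may behave differently.

```python
def linkage_matches(pattern: str, linkage: str) -> bool:
    """Return True if `linkage` fits `pattern`; '?' in the pattern matches any single character."""
    if len(pattern) != len(linkage):
        return False
    return all(p == "?" or p == c for p, c in zip(pattern, linkage))


# 'b1-?' accepts any position on the reducing-end side, but not a different anomer
for candidate in ["b1-2", "b1-4", "a1-3"]:
    print(candidate, linkage_matches("b1-?", candidate))
```

Under this reading, GlcNAc(b1-?)Man would cover both the GlcNAc(b1-2)Man and the GlcNAc(b1-4)Man linkages in the example glycan.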


annotate_figure

 annotate_figure (svg_input, scale_range=(25, 80), compact=False,
                  glycan_size='medium', filepath='', scale_by_DE_res=None,
                  x_thresh=1, y_thresh=0.05, x_metric='Log2FC')

Modify matplotlib svg figure to replace text labels with glycan figures

Arguments:
svg_input (string): absolute path including full filename for input svg figure
scale_range (tuple): tuple of two integers defining min/max glycan dim; default:(25,80)
compact (bool): if True, draw compact glycan figures; default:False
glycan_size (string): modify glycan size; default:‘medium’; options are ‘small’, ‘medium’, ‘large’
filepath (string): absolute path including full filename allows for saving the plot
scale_by_DE_res (df): result table from motif_analysis.get_differential_expression. Include to scale glycan figure size by -10logp
x_thresh (float): absolute x metric threshold for datapoints included for scaling, set to match get_differential_expression; default:1.0
y_thresh (float): corr p threshold for datapoints included for scaling, set to match get_differential_expression; default:0.05
x_metric (string): x-axis metric; default:‘Log2FC’; options are ‘Log2FC’, ‘Effect size’
Returns:
Modified figure svg code

plot_glycans_excel

 plot_glycans_excel (df, folder_filepath, glycan_col_num=0,
                     scaling_factor=0.2, compact=False)

plots SNFG images of glycans into new column in df and saves df as Excel file

Arguments:
df (dataframe): dataframe containing glycan sequences [alternative: filepath to .csv or .xlsx]
folder_filepath (string): full filepath to the folder you want to save the output to
glycan_col_num (int): index of the column containing glycan sequences; default:0 (first column)
scaling_factor (float): how large the glycans should be; default:0.2
compact (bool, optional): Set to True to draw the structures in a compact form. Default: False.
Returns:
Saves the dataframe with glycan images as output.xlsx into folder_filepath

analysis

downstream analyses of important glycan motifs


get_pvals_motifs

 get_pvals_motifs (df, glycan_col_name='glycan', label_col_name='target',
                   zscores=True, thresh=1.645, sorting=True,
                   feature_set=['exhaustive'], multiple_samples=False,
                   motifs=None, custom_motifs=[])

returns enriched motifs based on label data or predicted data

Arguments:
df (dataframe): dataframe containing glycan sequences and labels [alternative: filepath to .csv or .xlsx]
glycan_col_name (string): column name for glycan sequences; arbitrary if multiple_samples = True; default:‘glycan’
label_col_name (string): column name for labels; arbitrary if multiple_samples = True; default:‘target’
zscores (bool): whether data are presented as z-scores or not, will be z-score transformed if False; default:True
thresh (float): threshold value to separate positive/negative; default is 1.645 for Z-scores
sorting (bool): whether p-value dataframe should be sorted ascendingly; default: True
feature_set (list): which feature set to use for annotations, add more to list to expand; default:[‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features),
multiple_samples (bool): set to True if you have multiple samples (rows) with glycan information (columns); default:False
motifs (dataframe): can be used to pass a modified motif_list to the function; default:None
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
Returns dataframe with p-values, corrected p-values, and Cohen’s d as effect size for every glycan motif
import pandas as pd

glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcOPN(b1-6)GlcOPN',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Glc(b1-3)Glc(b1-3)Glc']
label = [3.234, 2.423, 0.733, 3.102, 0.108]
test_df = pd.DataFrame({'glycan':glycans, 'binding':label})

print("Glyco-Motif enrichment p-value test")
out = get_pvals_motifs(test_df, 'glycan', 'binding').iloc[:10,:]
Glyco-Motif enrichment p-value test
    motif               pval      corr_pval  effect_size
4   GlcNAc              0.038120  0.205849   1.530905
8   Man                 0.054356  0.234990   1.390253
25  Man(a1-?)Man        0.060923  0.234990   1.308333
10  Man(a1-3)Man        0.034212  0.205849   1.196586
11  Man(a1-6)Man        0.019543  0.175885   1.168815
13  Man(b1-4)GlcNAc     0.019543  0.175885   1.168815
14  GlcNAc(b1-4)GlcNAc  0.019543  0.175885   1.168815
7   Kdo                 0.328790  0.479672   -0.811679
2   Glc                 0.644180  0.668956   -0.811679
16  Man(a1-2)Man        0.177461  0.479672   0.772320
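
The zscores and thresh arguments decide which glycans count as positive. The sketch below shows one plausible reading using only the standard library: values are optionally z-score transformed, then split at the threshold (1.645 is the one-sided 5% critical value of the standard normal distribution). The helper names to_zscores and split_by_threshold are hypothetical, and the use of the sample standard deviation and a strict `>` comparison are assumptions, not confirmed details of get_pvals_motifs.

```python
from statistics import mean, stdev


def to_zscores(values):
    """Z-score transform using the sample standard deviation (an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def split_by_threshold(scores, thresh=1.645):
    """Return the indices of scores above the threshold and of all remaining scores."""
    positive = [i for i, z in enumerate(scores) if z > thresh]
    negative = [i for i, z in enumerate(scores) if z <= thresh]
    return positive, negative


# Labels from the example above, treated as z-scores (zscores=True)
label = [3.234, 2.423, 0.733, 3.102, 0.108]
positive, negative = split_by_threshold(label)
print("positive glycans:", positive)
print("negative glycans:", negative)
```

With the example labels, the three mannose-rich N-glycans land in the positive group, which fits the GlcNAc and Man motifs at the top of the enrichment table.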

get_representative_substructures

 get_representative_substructures (enrichment_df)

builds minimal glycans that contain enriched motifs from get_pvals_motifs

Arguments:
enrichment_df (dataframe): output from get_pvals_motifs
Returns:
Returns up to 10 minimal glycans in a list

get_heatmap

 get_heatmap (df, motifs=False, feature_set=['known'],
              datatype='response', rarity_filter=0.05, filepath='',
              index_col='glycan', custom_motifs=[], **kwargs)

clusters samples based on glycan data (for instance glycan binding etc.)

Arguments:
df (dataframe): dataframe with glycan data, rows are samples and columns are glycans [alternative: filepath to .csv or .xlsx]
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
datatype (string): whether df comes from a dataset with quantitative variable (‘response’) or from presence_to_matrix (‘presence’)
rarity_filter (float): proportion of samples that need to have a non-zero value for a variable to be included; default:0.05
filepath (string): absolute path including full filename allows for saving the plot
index_col (string): default column to convert to dataframe index; default:‘glycan’
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
**kwargs: keyword arguments that are directly passed on to seaborn clustermap
Returns:
Prints clustermap
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Glc(b1-3)Glc(b1-3)Glc']
label = [3.234, 2.423, 0.733, 3.102, 0.108]
label2 = [0.134, 0.345, 1.15, 0.233, 2.981]
label3 = [0.334, 0.245, 1.55, 0.133, 2.581]
test_df = pd.DataFrame([label, label2, label3], columns = glycans)

get_heatmap(test_df, motifs = True, feature_set = ['known', 'exhaustive'])
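
The rarity_filter argument drops variables that are non-zero in too few samples. A minimal standard-library sketch of that idea follows; the function name apply_rarity_filter, the dict-of-lists layout, and the inclusive `>=` comparison are illustrative assumptions rather than the behavior of get_heatmap itself.

```python
def apply_rarity_filter(table, rarity_filter=0.05):
    """Keep variables whose share of non-zero samples reaches the threshold.

    `table` maps a variable name (glycan or motif) to one value per sample.
    """
    kept = {}
    for name, values in table.items():
        nonzero_fraction = sum(v != 0 for v in values) / len(values)
        if nonzero_fraction >= rarity_filter:
            kept[name] = values
    return kept


# Hypothetical abundances across four samples
table = {
    "Man": [1.2, 0.0, 3.1, 0.4],
    "Kdo": [0.0, 0.0, 0.0, 0.0],
    "Glc": [0.0, 0.0, 0.0, 2.0],
}
print(sorted(apply_rarity_filter(table)))        # default threshold of 0.05
print(sorted(apply_rarity_filter(table, 0.5)))   # stricter threshold
```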


plot_embeddings

 plot_embeddings (glycans, emb=None, label_list=None, shape_feature=None,
                  filepath='', alpha=0.8, palette='colorblind', **kwargs)

plots glycan representations for a list of glycans

Arguments:
glycans (list): list of IUPAC-condensed glycan sequences as strings
emb (dictionary): stored glycan representations; default takes them from trained species-level SweetNet model
label_list (list): list of same length as glycans if coloring of the plot is desired
shape_feature (string): monosaccharide/bond used to display alternative shapes for dots on the plot
filepath (string): absolute path including full filename allows for saving the plot
alpha (float): transparency of points in plot; default:0.8
palette (string): color palette to color different classes; default:‘colorblind’
**kwargs: keyword arguments that are directly passed on to matplotlib
df_fabales = df_species[df_species.Order == 'Fabales'].reset_index(drop = True)
plot_embeddings(df_fabales.glycan.values.tolist(), label_list = df_fabales.Family.values.tolist())


characterize_monosaccharide

 characterize_monosaccharide (sugar, df=None, mode='sugar',
                              glycan_col_name='glycan', rank=None,
                              focus=None, modifications=False,
                              filepath='', thresh=10)

for a given monosaccharide/linkage, return typical neighboring linkage/monosaccharide

Arguments:
sugar (string): monosaccharide or linkage
df (dataframe): dataframe to use for analysis; default:df_species
mode (string): either ‘sugar’ (connected monosaccharides), ‘bond’ (monosaccharides making a provided linkage), or ‘sugarbond’ (linkages that a provided monosaccharides makes); default:‘sugar’
glycan_col_name (string): column name under which glycans can be found; default:‘glycan’
rank (string): add column name as string if you want to filter for a group
focus (string): add row value as string if you want to filter for a group
modifications (bool): set to True if you want to consider modified versions of a monosaccharide; default:False
filepath (string): absolute path including full filename allows for saving the plot
thresh (int): threshold count of when to include motifs in plot; default:10 occurrences
Returns:
Plots modification distribution and typical neighboring bond/monosaccharide
characterize_monosaccharide('D-Rha', rank = 'Kingdom', focus = 'Bacteria', modifications = True)


get_differential_expression

 get_differential_expression (df, group1, group2, motifs=False,
                              feature_set=['exhaustive', 'known'],
                              paired=False, impute=True, sets=False,
                              set_thresh=0.9, effect_size_variance=False,
                              min_samples=None, grouped_BH=False,
                              custom_motifs=[])

Calculates differentially expressed glycans or motifs from glycomics data

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
group1 (list): list of column indices or names for the first group of samples, usually the control
group2 (list): list of column indices or names for the second group of samples
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default:[‘exhaustive’, ‘known’]; options are: ‘known’ (hand-crafted glycan features),
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False
impute (bool): replaces zeroes with a Random Forest based model; default:True
sets (bool): whether to identify clusters of highly correlated glycans/motifs to test for differential expression; default:False
set_thresh (float): correlation value used as a threshold for clusters; only used when sets=True; default:0.9
effect_size_variance (bool): whether effect size variance should also be calculated/estimated; default:False
min_samples (int): How many samples per group need to have non-zero values for glycan to be kept; default: at least half per group
grouped_BH (bool): whether to perform two-stage adaptive Benjamini-Hochberg as a grouped multiple testing correction; will SIGNIFICANTLY increase runtime; default:False
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
Returns a dataframe with:
(i) Differentially expressed glycans/motifs/sets
(ii) Their mean abundance across all samples in group1 + group2
(iii) Log2-transformed fold change of group2 vs group1 (i.e., negative = lower in group2)
(iv) Uncorrected p-values (Welch’s t-test) for difference in mean
(v) Corrected p-values (Welch’s t-test with Benjamini-Hochberg correction) for difference in mean
(vi) Significance: True/False of whether the corrected p-value lies below the sample size-appropriate significance threshold
(vii) Corrected p-values (Levene’s test for equality of variances with Benjamini-Hochberg correction) for difference in variance
(viii) Effect size as Cohen’s d (sets=False) or Mahalanobis distance (sets=True)
(ix) [only if effect_size_variance=True] Effect size variance
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Glc(b1-3)Glc(b1-3)Glc']
label = [3.234, 2.423, 0.733, 3.102, 0.108]
label2 = [2.952, 2.011, 0.456, 4.006, 0.0]
label3 = [3.88, 1.771, 0.811, 3.562, 0.073]
label4 = [0.134, 0.345, 1.15, 0.233, 2.981]
label5 = [0.334, 0.245, 1.55, 0.133, 2.581]
label6 = [0.234, 0.423, 1.733, 0.102, 2.108]
test_df = pd.DataFrame([glycans, label, label2, label3, label4, label5, label6]).T

res = get_differential_expression(test_df, group1 = [4,5,6], group2 = [1,2,3], motifs = True, impute = True)
res
You're working with an alpha of 0.07862467893233027 that has been adjusted for your sample size of 6.
    Glycan               Mean abundance  Log2FC     p-val         corr p-val  significant  corr Levene p-val  Effect size
5   GlcNAc               9.587462        1.825183   2.469905e-07  0.000003    True         0.971435           78.585109
1   GlcNAc(b1-4)GlcNAc   4.793731        1.825183   1.385197e-05  0.000090    True         0.971435           27.336027
3   Man(a1-3)Man         6.144649        1.574705   2.879608e-04  0.001248    True         0.971435           20.479380
0   core_fucose(a1-3)    1.739635        2.038186   4.174583e-04  0.001357    True         0.971435           8.916848
9   Man                  20.137125       1.653026   6.770485e-04  0.001760    True         0.971435           12.113443
8   betaGlucan           3.883061        -4.345913  1.386085e-03  0.003003    True         0.971435           -7.288949
7   Man(a1-?)Man         15.343394       1.601387   2.067233e-03  0.003833    True         0.971435           11.849348
11  Glc(b1-3)Glc         7.766123        -4.345913  2.358767e-03  0.003833    True         0.971435           -7.483606
10  Kdo                  7.275312        -2.944967  3.000722e-03  0.004334    True         0.971435           -5.255214
6   Kdo(a2-?)Kdo         4.850208        -2.944967  4.905883e-03  0.006378    True         0.971435           -4.640157
12  Glc                  11.649184       -4.345913  6.515318e-03  0.007700    True         0.971435           -6.980519
2   Man(a1-2)Man         4.405014        1.411105   7.118918e-03  0.007712    True         0.971435           8.139906
4   GalNAc(a1-4)GlcNAcA  2.425104        -2.944967  2.127494e-02  0.021275    True         0.971435           -3.175931
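
The corrected p-values in the output above follow from the uncorrected ones via the Benjamini-Hochberg procedure. The sketch below is a plain standard-library version of that procedure, applied to the p-val column of the example; the function name benjamini_hochberg is hypothetical and this is not glycowork's own code (it does not cover the grouped_BH variant).

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the order of the input."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices from smallest to largest p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped by its successors
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Uncorrected p-values copied from the p-val column of the example output
pvals = [2.469905e-07, 1.385197e-05, 2.879608e-04, 4.174583e-04, 6.770485e-04,
         1.386085e-03, 2.067233e-03, 2.358767e-03, 3.000722e-03, 4.905883e-03,
         6.515318e-03, 7.118918e-03, 2.127494e-02]
corrected = benjamini_hochberg(pvals)
for p, q in zip(pvals, corrected):
    print(f"{p:.6e} -> {q:.6f}")
```

The two motifs that share a corrected value of 0.003833 show the monotonicity step at work: the raw adjustment of Man(a1-?)Man is capped by the smaller adjusted value of the next-ranked Glc(b1-3)Glc.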

get_volcano

 get_volcano (df_res, y_thresh=0.05, x_thresh=1.0, label_changed=True,
              x_metric='Log2FC', annotate_volcano=False, filepath='')

Plots glycan differential expression results in a volcano plot

Arguments:
df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv or .xlsx]
y_thresh (float): corr p threshold for labeling datapoints; default:0.05
x_thresh (float): absolute x metric threshold for labeling datapoints; default:1.0
label_changed (bool): if True, add text labels to significantly up- and downregulated datapoints; default:True
x_metric (string): x-axis metric; default:‘Log2FC’; options are ‘Log2FC’, ‘Effect size’
annotate_volcano (bool): whether to annotate the dots in the plot with SNFG images; default: False
filepath (string): absolute path including full filename allows for saving the plot
Returns:
Prints volcano plot
get_volcano(res)
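
To make the roles of x_thresh and y_thresh concrete, the sketch below classifies a few datapoints the way a volcano plot usually does: a point counts as changed when its corrected p-value is below y_thresh and its absolute x value exceeds x_thresh. The function name classify_point, the strict inequalities, and the third row are illustrative assumptions; the first two rows reuse values from the differential expression output above.

```python
from math import log10


def classify_point(x_value, corr_p, x_thresh=1.0, y_thresh=0.05):
    """Label a datapoint as 'up', 'down', or 'unchanged' based on both thresholds."""
    if corr_p < y_thresh and x_value > x_thresh:
        return "up"
    if corr_p < y_thresh and x_value < -x_thresh:
        return "down"
    return "unchanged"


rows = [
    ("GlcNAc", 1.825183, 0.000003),
    ("betaGlucan", -4.345913, 0.003003),
    ("hypothetical_motif", 0.4, 0.2),  # made-up row that passes neither threshold
]
for name, log2fc, corr_p in rows:
    # The y axis of a volcano plot is conventionally -log10 of the (corrected) p-value
    print(name, classify_point(log2fc, corr_p), round(-log10(corr_p), 2))
```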


get_coverage

 get_coverage (df, filepath='')

Plot glycan coverage across samples, ordered by average intensity

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
filepath (string): absolute path including full filename allows for saving the plot
Returns:
Prints the heatmap
test_df = pd.concat([test_df.iloc[:, 0], test_df[test_df.columns[1:]].astype(float)], axis = 1)

get_coverage(test_df)


get_pca

 get_pca (df, groups=None, motifs=False, feature_set=['known',
          'exhaustive'], pc_x=1, pc_y=2, color=None, shape=None,
          filepath='', custom_motifs=[])

PCA plot from glycomics abundance dataframe

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
groups (list): a list of group identifiers for each sample (e.g., [1,1,1,2,2,2,3,3,3]); default:None
alternatively: design dataframe with ‘id’ column of samples names and additional columns with meta information
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default:[‘known’, ‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features),
pc_x (int): principal component to plot on x axis; default:1
pc_y (int): principal component to plot on y axis; default:2
color (string): if design dataframe is provided: column name for color grouping; default:None
shape (string): if design dataframe is provided: column name for shape grouping; default:None
filepath (string): absolute path including full filename allows for saving the plot
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
Prints PCA plot
get_pca(test_df, motifs = True, groups = [1,1,1,2,2,2])


get_pval_distribution

 get_pval_distribution (df_res, filepath='')

p-value distribution plot of glycan differential expression result

Arguments:
df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv]
filepath (string): absolute path including full filename allows for saving the plot
Returns:
prints p-value distribution plot
get_pval_distribution(res)


get_ma

 get_ma (df_res, log2fc_thresh=1, sig_thresh=0.05, filepath='')

MA plot of glycan differential expression result

Arguments:
df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv or .xlsx]
log2fc_thresh (int): absolute Log2FC threshold for highlighting datapoints; default:1
sig_thresh (float): significance threshold for highlighting datapoints; default:0.05
filepath (string): absolute path including full filename allows for saving the plot
Returns:
prints MA plot
get_ma(res)


get_glycanova

 get_glycanova (df, groups, impute=True, motifs=False,
                feature_set=['exhaustive', 'known'], min_samples=None,
                posthoc=True, custom_motifs=[])

Calculate an ANOVA for each glycan (or motif) in the DataFrame

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
groups (list): a list of group identifiers for each sample (e.g., [1,1,1,2,2,2,3,3,3])
impute (bool): replaces zeroes with a Random Forest based model; default:True
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default:[‘exhaustive’, ‘known’]; options are: ‘known’ (hand-crafted glycan features),
min_samples (int): How many samples per group need to have non-zero values for glycan to be kept; default: at least half per group
posthoc (bool): whether to do Tukey’s HSD test post-hoc to find out which differences were significant; default:True
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
(i) a pandas DataFrame with an F statistic, corrected p-value, and indication of its significance for each glycan.
(ii) a dictionary of type glycan : pandas DataFrame, with post-hoc results for each glycan with a significant ANOVA.
test_df['label_7'] = [0.234, 0.023, 5.733, 8.102, 0.308]
test_df['label_8'] = [0.177, 0.009, 6.105, 5.549, 0.278]
test_df['label_9'] = [0.511, 0.011, 4.998, 7.005, 0.414]

anv, ph = get_glycanova(test_df, [1,1,1,2,2,2,3,3,3], motifs = True)
anv
You're working with an alpha of 0.0694557066556809 that has been adjusted for your sample size of 9.
    Glycan               F statistic  corr p-val    significant
10  GlcNAc               735.493169   8.715006e-07  True
8   GlcNAc(b1-4)GlcNAc   464.897344   1.713268e-06  True
9   Man(a1-3)Man         286.009486   4.846738e-06  True
11  Man(a1-?)Man         124.016453   4.073794e-05  True
12  Man                  116.889001   4.073794e-05  True
0   betaGlucan           78.483797    8.679929e-05  True
6   core_fucose(a1-3)    77.931941    8.679929e-05  True
7   Man(a1-2)Man         76.658780    8.679929e-05  True
1   Glc(b1-3)Glc         67.371670    1.119105e-04  True
2   Glc                  56.146940    1.696330e-04  True
5   Kdo                  32.477874    7.145649e-04  True
4   Kdo(a2-?)Kdo         27.295755    1.051915e-03  True
3   GalNAc(a1-4)GlcNAcA  18.383777    2.761281e-03  True
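
For readers unfamiliar with the F statistic reported above, the following standard-library sketch computes a textbook one-way ANOVA F value for a single glycan, using a list of group identifiers in the same style as the groups argument. The helper names and the abundance values are hypothetical, and the sketch leaves out the imputation, transformation, and multiple-testing steps that get_glycanova performs, so it is not expected to reproduce the table.

```python
from statistics import mean


def split_by_group(values, groups):
    """Collect the values of each group identifier, preserving first-seen group order."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return list(by_group.values())


def one_way_anova_f(grouped):
    """Textbook one-way ANOVA: between-group mean square over within-group mean square."""
    all_values = [v for g in grouped for v in g]
    grand_mean = mean(all_values)
    k, n = len(grouped), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in grouped)
    ss_within = sum((v - mean(g)) ** 2 for g in grouped for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical abundances of one glycan across nine samples in three groups
abundances = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
f_stat = one_way_anova_f(split_by_group(abundances, groups))
print(f_stat)
```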

get_meta_analysis

 get_meta_analysis (effect_sizes, variances, model='fixed', filepath='',
                    study_names=[])

Fixed-effects model or random-effects model for meta-analysis of glycan effect sizes

Arguments:
effect_sizes (array-like): Effect sizes (e.g., Cohen’s d) from each study
variances (array-like): Corresponding effect size variances from each study
model (string): Whether to use ‘fixed’ or ‘random’ effects model
filepath (string): absolute path including full filename allows for saving the Forest plot
study_names (list): list of strings indicating the name of each study
Returns:
(1) The combined effect size
(2) The p-value for the combined effect size
get_meta_analysis([-8.759, -6.363, -5.199, -3.952],
                 [7.061, 4.041, 2.919, 1.968])
(-5.326913553837341, 3.005077298112724e-09)
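
For the default model='fixed', the numbers in the example above are consistent with standard inverse-variance weighting followed by a two-sided z-test. The sketch below reproduces that calculation with the standard library only; the function name fixed_effects_meta is hypothetical, and glycowork's implementation may differ in detail (this sketch covers neither the random-effects model nor the Forest plot).

```python
from math import erfc, sqrt


def fixed_effects_meta(effect_sizes, variances):
    """Inverse-variance weighted mean effect and its two-sided normal p-value."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    combined = sum(w * e for w, e in zip(weights, effect_sizes)) / total_weight
    standard_error = sqrt(1.0 / total_weight)
    z = combined / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return combined, p_value


combined, p_value = fixed_effects_meta([-8.759, -6.363, -5.199, -3.952],
                                       [7.061, 4.041, 2.919, 1.968])
print(combined, p_value)
```

With the inputs from the example, this gives a combined effect size of about -5.327 and a p-value of about 3e-9, in line with the output shown above. Studies with smaller variances dominate the estimate, which is why the result sits closer to -3.952 than to -8.759.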

get_time_series

 get_time_series (df, impute=True, motifs=False, feature_set=['known',
                  'exhaustive'], degree=1, min_samples=None,
                  custom_motifs=[])

Analyzes time series data of glycans using an OLS model

Arguments:
df (dataframe): dataframe containing sample IDs of style sampleID_UnitTimepoint_replicate (e.g., T1_h5_r1) in first column and glycan relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
impute (bool): replaces zeroes with a Random Forest based model; default:True
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default:[‘known’, ‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features),
degree (int): degree of the polynomial for regression, default:1 for linear regression
min_samples (int): How many samples per group need to have non-zero values for glycan to be kept; default: at least half per group
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
Returns a dataframe with:
(i) Glycans/motifs potentially exhibiting significant changes over time
(ii) The slope of their expression curve over time
(iii) Uncorrected p-values (t-test) for testing whether slope is significantly different from zero
(iv) Corrected p-values (t-test with Benjamini-Hochberg correction) for testing whether slope is significantly different from zero
(v) Significance: True/False whether the corrected p-value lies below the sample size-appropriate significance threshold
t_dic = {}
t_dic["ID"] = ["D1_h5_r1", "D1_h5_r2", "D1_h5_r3", "D1_h10_r1", "D1_h10_r2", "D1_h10_r3", "D1_h15_r1", "D1_h15_r2", "D1_h15_r3"]
t_dic["Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc"] = [0.33, 0.31, 0.35, 1.51, 1.57, 1.66, 2.11, 2.04, 2.09]
t_dic["Fuc(a1-2)Gal(b1-3)GalNAc"] = [0.78, 1.01, 0.98, 0.88, 1.11, 0.72, 1.22, 1.00, 0.54]
get_time_series(pd.DataFrame(t_dic))
You're working with an alpha of 0.0694557066556809 that has been adjusted for your sample size of 9.
    Glycan                                             Change     p-val     corr p-val  significant
1   Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]Ga...  5.326852   0.002697  0.005394    True
0   Fuc(a1-2)Gal(b1-3)GalNAc                           -2.030518  0.328428  0.328428    False
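
The sample IDs in the first column carry the timepoint, so they have to follow the sampleID_UnitTimepoint_replicate pattern. The sketch below shows one way to pull the numeric timepoint out of such an ID and fit an ordinary least-squares slope with the standard library. The helper names parse_timepoint and ols_slope are hypothetical. The slope is computed on the raw example values and is not meant to reproduce the Change column above, since get_time_series may preprocess the data first.

```python
import re


def parse_timepoint(sample_id):
    """Extract the numeric timepoint from an ID such as 'D1_h5_r1'."""
    _, unit_time, _ = sample_id.split("_")
    match = re.fullmatch(r"[A-Za-z]*(\d+(?:\.\d+)?)", unit_time)
    if match is None:
        raise ValueError(f"cannot read a timepoint from {sample_id!r}")
    return float(match.group(1))


def ols_slope(x, y):
    """Slope of the least-squares line through the points (x, y)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


ids = ["D1_h5_r1", "D1_h5_r2", "D1_h5_r3", "D1_h10_r1", "D1_h10_r2",
       "D1_h10_r3", "D1_h15_r1", "D1_h15_r2", "D1_h15_r3"]
values = [0.33, 0.31, 0.35, 1.51, 1.57, 1.66, 2.11, 2.04, 2.09]
times = [parse_timepoint(i) for i in ids]
print(times)
print(ols_slope(times, values))
```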

get_jtk

 get_jtk (df_in, timepoints, periods, interval, motifs=False,
          feature_set=['known', 'exhaustive', 'terminal'],
          custom_motifs=[])

Detecting rhythmically expressed glycans via the Jonckheere–Terpstra–Kendall (JTK) algorithm

Arguments:
df_in (pd.DataFrame): A dataframe containing data for analysis. [alternative: filepath to .csv or .xlsx]
(column 0 = molecule IDs, then arranged in groups and by ascending timepoints)
timepoints (int): number of timepoints in the experiment (each timepoint must have the same number of replicates).
periods (list): number of timepoints (as int) per cycle.
interval (int): units of time (Arbitrary units) between experimental timepoints.
motifs (bool): a flag for running structural or motif-based analysis (True = run motif analysis); default:False.
feature_set (list): which feature set to use for annotations, add more to list to expand; default:[‘known’, ‘exhaustive’, ‘terminal’]; options are: ‘known’ (hand-crafted glycan features),
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
Returns a pandas dataframe containing the adjusted p-values and the most important waveform parameters for each molecule in the analysis.
t_dic = {}
t_dic["Neu5Ac(a2-3)Gal(b1-3)GalNAc"] = [0.433138901, 0.149729209, 0.358018822, 0.537641256, 1.526963756, 1.349986672, 0.75156406, 0.736710183]
t_dic["Gal(b1-3)GalNAc"] = [0.919762334, 0.760237184, 0.725566662, 0.459945797, 0.523801515, 0.695106926, 0.627632047, 1.183511209]
t_dic["Gal(b1-3)[Neu5Ac(a2-6)]GalNAc"] = [0.533138901, 0.119729209, 0.458018822, 0.637641256, 1.726963756, 1.249986672, 0.55156406, 0.436710183]
t_dic["Fuc(a1-2)Gal(b1-3)GalNAc"] = [3.862169504, 5.455032837, 3.858163289, 5.614650335, 3.124254095, 4.189550337, 4.641831312, 4.19538484]
tps = 8  # number of timepoints in experiment
periods = [8]  # number of timepoints per cycle
interval = 3  # units of time between experimental timepoints
t_df = pd.DataFrame(t_dic).T
t_df.columns = ["T3", "T6", "T9", "T12", "T15", "T18", "T21", "T24"]
get_jtk(t_df.reset_index(), tps, periods, interval)
You're working with an alpha of 0.22004505213567527 that has been adjusted for your sample size of 1.
   Molecule_Name                  BH_Q_Value  Adjusted_P_value  Period_Length  Lag_Phase  Amplitude  significant
0  Neu5Ac(a2-3)Gal(b1-3)GalNAc    0.055556    0.013889          24.0           16.5       0.357084   True
2  Gal(b1-3)[Neu5Ac(a2-6)]GalNAc  0.075397    0.044048          24.0           13.5       0.101473   True
1  Gal(b1-3)GalNAc                0.075397    0.056548          24.0           22.5       0.127140   True
3  Fuc(a1-2)Gal(b1-3)GalNAc       1.000000    1.000000          24.0           0.0        0.546986   False
get_jtk(t_df.reset_index(), tps, periods, interval, motifs = True, feature_set = ['terminal'])
You're working with an alpha of 0.22004505213567527 that has been adjusted for your sample size of 1.
   Molecule_Name  BH_Q_Value  Adjusted_P_value  Period_Length  Lag_Phase  Amplitude  significant
0  Neu5Ac(a2-3)   0.034722    0.013889          24.0           16.5       0.357084   True
2  Neu5Ac(a2-?)   0.034722    0.013889          0.0            0.0        0.000000   True
1  Neu5Ac(a2-6)   0.073413    0.044048          24.0           13.5       0.101473   True
3  Gal(b1-3)      0.543403    0.434722          24.0           16.5       0.208071   False
4  Fuc(a1-2)      1.000000    1.000000          24.0           0.0        0.546986   False
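
The timepoints, periods, and interval arguments are easy to mix up because periods is counted in timepoints rather than in time units. One reading that is consistent with the example above (column labels T3 to T24 and a reported Period_Length of 24.0) is sketched below; it is plain arithmetic on the example's inputs and does not call get_jtk.

```python
tps = 8          # number of timepoints in the experiment
periods = [8]    # candidate cycle lengths, counted in timepoints
interval = 3     # units of time between consecutive timepoints

# Sampling times implied by the interval, matching the T3 ... T24 column labels
sampling_times = [interval * (i + 1) for i in range(tps)]

# A cycle of p timepoints spans p * interval units of time
period_lengths = [p * interval for p in periods]

print(sampling_times)
print(period_lengths)
```

To test for additional rhythms, such as a hypothetical 12-unit cycle in the same design, this reading suggests adding 4 to periods, since four timepoints at an interval of 3 span 12 units.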

get_biodiversity

 get_biodiversity (df, group1, group2, motifs=False,
                   feature_set=['exhaustive', 'known'], paired=False,
                   custom_motifs=[])

Calculates diversity indices from glycomics data, similar to alpha diversity etc. in microbiome data

Arguments:
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx]
group1 (list): list of column indices or names for the first group of samples, usually the control
group2 (list): list of column indices or names for the second group of samples
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False
feature_set (list): which feature set to use for annotations, add more to list to expand; default:[‘exhaustive’, ‘known’]; options are: ‘known’ (hand-crafted glycan features),
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
Returns a dataframe with:
(i) Diversity indices/metrics
(ii) Mean value of diversity metrics in group 1
(iii) Mean value of diversity metrics in group 2
(iv) Uncorrected p-values (Welch’s t-test) for difference in mean
(v) Corrected p-values (Welch’s t-test with Benjamini-Hochberg correction) for difference in mean
(vi) Significance: True/False of whether the corrected p-value lies below the sample size-appropriate significance threshold
(vii) Effect size as Cohen’s d
res = get_biodiversity(test_df, group1 = [4,5,6], group2 = [1,2,3], motifs = True)
res
You're working with an alpha of 0.07862467893233027 that has been adjusted for your sample size of 6.
   Metric             Group1 mean  Group2 mean  p-val     corr p-val  significant  Effect size
1  shannon_diversity  2.278677     1.855369     0.000420  0.001261    True         -8.941867
2  simpson_diversity  0.876248     0.804112     0.002471  0.003706    True         -8.011404
0  richness           13.000000    12.000000    0.422650  0.422650    False        -0.816497
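
The three metrics in the output above are standard alpha diversity indices. The sketch below computes them for a single sample with the standard library, assuming the natural logarithm for Shannon diversity and the 1 - sum(p^2) form for Simpson diversity. Both conventions are assumptions here, as is the function name diversity_indices, and the abundances are made up.

```python
from math import log


def diversity_indices(abundances):
    """Return (richness, Shannon diversity, Simpson diversity) for one sample."""
    present = [a for a in abundances if a > 0]
    total = sum(present)
    proportions = [a / total for a in present]
    richness = len(present)                             # number of detected glycans/motifs
    shannon = -sum(p * log(p) for p in proportions)     # natural-log Shannon entropy
    simpson = 1.0 - sum(p * p for p in proportions)     # Gini-Simpson form
    return richness, shannon, simpson


# Hypothetical samples: one perfectly even, one dominated by a single glycan
print(diversity_indices([1.0, 1.0, 1.0, 1.0]))
print(diversity_indices([9.7, 0.1, 0.1, 0.1, 0.0]))
```

Richness only counts what is present, while the Shannon and Simpson values fall as a sample becomes dominated by a few species. That matches the example, where richness barely differs between groups but both diversity indices do.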

annotate

extract curated motifs, graph features, and sequence features from glycan sequences


annotate_glycan

 annotate_glycan (glycan, motifs=None, termini_list=[], gmotifs=None)

searches for known motifs in glycan sequence

Arguments:
glycan (string or networkx): glycan in IUPAC-condensed format (or as networkx graph) that has to contain a floating substituent
motifs (dataframe): dataframe of glycan motifs (name + sequence), can be used with a list of glycans too; default:motif_list
termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
gmotifs (networkx): precalculated motif graphs for speed-up; default:None
Returns:
Returns dataframe with counts of motifs in glycan
annotate_glycan("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
motif_name Terminal_LewisX Internal_LewisX LewisY SialylLewisX SulfoSialylLewisX Terminal_LewisA Internal_LewisA LewisB SialylLewisA SulfoLewisA ... Arabinogalactan_type1 Galactomannan Tetraantennary_Nglycan Mucin_elongated_core2 Fucoidan Alginate FG XX Difucosylated_core GalFuc_core
Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc 0 1 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

1 rows × 156 columns


annotate_dataset

 annotate_dataset (glycans, motifs=None, feature_set=['known'],
                   termini_list=[], condense=False, custom_motifs=[])

wrapper function to annotate motifs in list of glycans

Arguments:
glycans (list): list of IUPAC-condensed glycan sequences as strings
motifs (dataframe): dataframe of glycan motifs (name + sequence); default:motif_list
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features),
termini_list (list): list of monosaccharide/linkage positions (from ‘terminal’, ‘internal’, and ‘flexible’)
condense (bool): if True, throws away columns with only zeroes; default:False
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
Returns dataframe of glycans (rows) and presence/absence of known motifs (columns)
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P']
print("Annotate Test")
out = annotate_dataset(glycans)
Annotate Test
  Terminal_LewisX Internal_LewisX LewisY SialylLewisX SulfoSialylLewisX Terminal_LewisA Internal_LewisA LewisB SialylLewisA SulfoLewisA H_type2 H_type1 A_antigen B_antigen Galili_antigen GloboH Gb5 Gb4 Gb3 3SGb3 8DSGb3 3SGb4 8DSGb4 6DSGb4 3SGb5 8DSGb5 6DSGb5 6DSGb5_2 6SGb3 8DSGb3_2 6SGb4 8DSGb4_2 6SGb5 8DSGb5_2 66DSGb5 Forssman_antigen iGb3 I_antigen i_antigen PI_antigen Chitobiose Trimannosylcore Internal_LacNAc_type1 Terminal_LacNAc_type1 Internal_LacNAc_type2 Terminal_LacNAc_type2 Internal_LacdiNAc_type1 Terminal_LacdiNAc_type1 Internal_LacdiNAc_type2 Terminal_LacdiNAc_type2 bisectingGlcNAc VIM PolyLacNAc Ganglio_Series Lacto_Series(LewisC) NeoLacto_Series betaGlucan KeratanSulfate Hyluronan Mollu_series Arthro_series Cellulose_like Chondroitin_4S GPI_anchor Isoglobo_series LewisD Globo_series Sda SDA Muco_series Heparin Peptidoglycan Dermatansulfate CAD Lactosylceramide Lactotriaosylceramide LexLex GM3 H_type3 GM2 GM1 cisGM1 VIM2 GD3 GD1a GD2 GD1b SDLex Nglycolyl_GM2 Fuc_LN3 GT1b GD1 GD1a_2 LcGg4 GT3 Disialyl_T_antigen GT1a GT2 GT1c 2Fuc_GM1 GQ1c O_linked_mannose GT1aa GQ1b HNK1 GQ1ba O_mannose_Lex 2Fuc_GD1b Sialopentaosylceramide Sulfogangliotetraosylceramide B-GM1 GQ1aa bisSulfo-Lewis x para-Forssman core_fucose core_fucose(a1-3) GP1c B-GD1b GP1ca Isoglobotetraosylceramide polySia high_mannose Gala_series LPS_core Nglycan_complex Nglycan_complex2 Oglycan_core1 Oglycan_core2 Oglycan_core3 Oglycan_core4 Oglycan_core5 Oglycan_core6 Oglycan_core7 Xylogalacturonan Sialosylparagloboside LDNF OFuc Arabinogalactan_type2 EGF_repeat Nglycan_hybrid Arabinan Xyloglucan Acharan_Sulfate M3FX M3X 1-6betaGalactan Arabinogalactan_type1 Galactomannan Tetraantennary_Nglycan Mucin_elongated_core2 Fucoidan Alginate FG XX Difucosylated_core GalFuc_core
Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

quantify_motifs

 quantify_motifs (df, glycans, feature_set, custom_motifs=[])

Extracts and quantifies motifs for a dataset

Arguments:
df (dataframe): dataframe containing relative abundances (each sample one column) [alternative: filepath to .csv or .xlsx]
glycans (list): glycans as IUPAC-condensed strings
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘exhaustive’,‘known’]; options are: ‘known’ (hand-crafted glycan features),
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty
Returns:
Returns a pandas DataFrame with motifs as columns and samples as rows
quantify_motifs(test_df.iloc[:, 1:], test_df.iloc[:, 0].values.tolist(), ['known', 'exhaustive'])
Chitobiose Trimannosylcore betaGlucan core_fucose(a1-3) M3FX Fuc GalNAc Glc GlcN GlcN4P ... GalNAc(a1-4)GlcNAcA GlcNAcA(a1-4)Kdo GlcN(b1-7)Kdo Kdo(a2-5)Kdo Kdo(a2-4)Kdo Kdo(a2-6)GlcN4P GlcN4P(b1-6)GlcN4P Glc(b1-3)Glc Man(a1-?)Man Kdo(a2-?)Kdo
1 8.759 8.759 0.108 3.234 3.234 3.234 0.733 0.324 0.733 1.466 ... 0.733 0.733 0.733 0.733 0.733 0.733 0.733 0.216 27.889 1.466
2 8.969 8.969 0.000 2.952 2.952 2.952 0.456 0.000 0.456 0.912 ... 0.456 0.456 0.456 0.456 0.456 0.456 0.456 0.000 27.977 0.912
3 9.213 9.213 0.073 3.880 3.880 3.880 0.811 0.219 0.811 1.622 ... 0.811 0.811 0.811 0.811 0.811 0.811 0.811 0.146 27.301 1.622
4 0.712 0.712 2.981 0.134 0.134 0.134 1.150 8.943 1.150 2.300 ... 1.150 1.150 1.150 1.150 1.150 1.150 1.150 5.962 2.692 2.300
5 0.712 0.712 2.581 0.334 0.334 0.334 1.550 7.743 1.550 3.100 ... 1.550 1.550 1.550 1.550 1.550 1.550 1.550 5.162 2.292 3.100
6 0.759 0.759 2.108 0.234 0.234 0.234 1.733 6.324 1.733 3.466 ... 1.733 1.733 1.733 1.733 1.733 1.733 1.733 4.216 2.889 3.466
label_7 8.359 8.359 0.308 0.234 0.234 0.234 5.733 0.924 5.733 11.466 ... 5.733 5.733 5.733 5.733 5.733 5.733 5.733 0.616 24.889 11.466
label_8 5.735 5.735 0.278 0.177 0.177 0.177 6.105 0.834 6.105 12.210 ... 6.105 6.105 6.105 6.105 6.105 6.105 6.105 0.556 17.046 12.210
label_9 7.527 7.527 0.414 0.511 0.511 0.511 4.998 1.242 4.998 9.996 ... 4.998 4.998 4.998 4.998 4.998 4.998 4.998 0.828 22.092 9.996

9 rows × 32 columns
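
The output above is consistent with each motif's value being the sum, over glycans, of motif count times relative abundance (for example, 1.466 = 2 × 0.733). The sketch below illustrates that idea in plain Python under this assumption. The function name, glycan labels, and toy values are hypothetical, and the library's actual computation may differ.

```python
def weighted_motif_abundance(counts, abundances):
    """counts: {glycan: {motif: count}}
    abundances: {sample: {glycan: relative abundance}}
    Returns {sample: {motif: count-weighted abundance}}."""
    result = {}
    for sample, profile in abundances.items():
        row = {}
        for glycan, abundance in profile.items():
            for motif, n in counts.get(glycan, {}).items():
                row[motif] = row.get(motif, 0.0) + n * abundance
        result[sample] = row
    return result

# Hypothetical toy inputs, not taken from test_df
counts = {'glycan_A': {'Kdo(a2-?)Kdo': 2, 'GlcN': 1},
          'glycan_B': {'Man(a1-?)Man': 3}}
abundances = {'sample_1': {'glycan_A': 0.733, 'glycan_B': 5.0}}
out = weighted_motif_abundance(counts, abundances)
print(out)
```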


get_k_saccharides

 get_k_saccharides (glycans, size=2, up_to=False, just_motifs=False,
                    terminal=False)

function to retrieve k-saccharides (default:disaccharides) occurring in a list of glycans

Arguments:
glycans (list): list of glycans in IUPAC-condensed nomenclature
size (int): number of monosaccharides per k-saccharide; default:2 (for disaccharides)
up_to (bool): in theory: include k-saccharides up to size k; in practice: include monosaccharides; default:False
just_motifs (bool): if you only want the motifs as a nested list, no dataframe with counts; default:False
terminal (bool): whether to only count terminal subgraphs; default:False
Returns:
Returns dataframe with k-saccharide counts (columns) for each glycan (rows)
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P']
out = get_k_saccharides(glycans, size = 3)
  Man(a1-3)[Man(a1-6)]Man Man(a1-3)[Xyl(b1-2)]Man Man(a1-3)Man(b1-4)GlcNAc Man(a1-6)[Xyl(b1-2)]Man Man(a1-6)Man(b1-4)GlcNAc Xyl(b1-2)Man(b1-4)GlcNAc Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-2)Man(a1-2)Man Man(a1-2)Man(a1-3)Man Man(a1-3)Man(a1-6)Man GalNAc(a1-4)GlcNAcA(a1-4)Kdo GlcNAcA(a1-4)[GlcN(b1-7)]Kdo GlcNAcA(a1-4)Kdo(a2-5)Kdo GlcN(b1-7)Kdo(a2-5)Kdo ]Kdo(a2-5)[Kdo(a2-4)]Kdo Kdo(a2-5)Kdo(a2-6)GlcN4P Kdo(a2-4)Kdo(a2-6)GlcN4P Kdo(a2-6)GlcN4P(b1-6)GlcN4P Man(a1-?)[Xyl(b1-?)]Man Man(a1-?)Man(b1-?)GlcNAc Man(a1-?)Man(a1-?)Man Kdo(a2-?)Kdo(a2-?)GlcN4P
0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0
1 0 0 1 0 1 0 1 0 1 1 2 0 0 0 0 0 0 0 0 0 2 4 0
2 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 2
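
To illustrate what is being counted, here is a minimal sketch for the default size of 2 (disaccharides). It reads the IUPAC-condensed string from the reducing end and tracks branch points with a stack. It is not the library implementation: it ignores wildcard columns such as Man(a1-?)Man as well as floating substituents, and the function name is illustrative.

```python
import re
from collections import Counter

def count_disaccharides(glycan):
    # Tokens: '[', ']', '(linkage)', or a monosaccharide name
    tokens = re.findall(r'\[|\]|\([^()]*\)|[^\[\]()]+', glycan)
    found, stack = [], []
    parent, linkage = None, None
    # Walk from the reducing end (right) towards the non-reducing ends (left)
    for tok in reversed(tokens):
        if tok == ']':            # a branch starts here when reading right to left
            stack.append(parent)
        elif tok == '[':          # the branch is finished, return to the branch point
            parent = stack.pop()
        elif tok.startswith('('):
            linkage = tok
        else:
            if parent is not None:
                found.append(f'{tok}{linkage}{parent}')
            parent = tok
    return Counter(found)

counts = count_disaccharides('Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc')
print(counts)
```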

get_terminal_structures

 get_terminal_structures (glycan, size=1)

returns terminal structures from all non-reducing ends (monosaccharide+linkage)

Arguments:
glycan (string or networkx): glycan in IUPAC-condensed nomenclature or as networkx graph
size (int): how large the extracted motif should be in terms of monosaccharides (for now 1 or 2 are supported); default:1
Returns:
Returns a list of terminal structures (strings)
get_terminal_structures("Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc")
['Neu5Ac(a2-3)', 'Neu5Ac(a2-6)']
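
In IUPAC-condensed notation, a non-reducing-end monosaccharide sits either at the very start of the string or directly after an opening bracket. The following string-based sketch uses that observation for size=1. It is an illustration rather than the library's graph-based implementation, it does not handle floating substituents in curly brackets, and the function name is illustrative.

```python
import re

def terminal_units(glycan):
    # Match 'Monosaccharide(linkage)' at the string start or right after '['
    return re.findall(r'(?:^|\[)([^\[\]()]+\([^()]*\))', glycan)

terminals = terminal_units("Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc")
print(terminals)
```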

get_molecular_properties

 get_molecular_properties (glycan_list, verbose=False, placeholder=False)

given a list of glycans, uses pubchempy to return various molecular parameters retrieved from PubChem

Arguments:
glycan_list (list): list of glycans in IUPAC-condensed format
verbose (bool): set True to print SMILES not found on PubChem; default:False
placeholder (bool): whether failed requests should return dummy values or be dropped; default:False
Returns:
Returns a dataframe with all the molecular parameters retrieved from PubChem
out = get_molecular_properties(["Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"])
  complexity h_bond_acceptor_count defined_bond_stereo_count monoisotopic_mass atom_stereo_count isotope_atom_count undefined_bond_stereo_count covalent_unit_count undefined_atom_stereo_count rotatable_bond_count charge heavy_atom_count defined_atom_stereo_count xlogp h_bond_donor_count bond_stereo_count exact_mass tpsa molecular_weight
Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc 4410 62 0 2222.7830048 57 0 0 1 1 43 0 152 56 -23.600000 39 0 2222.7830048 1070 2224.0

graph

converts glycan sequences to graphs and contains helper functions to search for motifs, check whether two sequences describe the same glycan, etc.


glycan_to_nxGraph

 glycan_to_nxGraph (glycan, libr=None, termini='ignore',
                    termini_list=None)

wrapper for converting glycans into networkx graphs; also works with floating substituents

Arguments:
glycan (string): glycan in IUPAC-condensed format
libr (dict): dictionary of form glycoletter:index
termini (string): whether to encode terminal/internal position of monosaccharides, ‘ignore’ for skipping, ‘calc’ for automatic annotation, or ‘provided’ if this information is provided in termini_list; default:‘ignore’
termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
Returns:
Returns networkx graph object of glycan
print('Glycan to networkx Graph (only edges printed)')
print(glycan_to_nxGraph('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc').edges())
Glycan to networkx Graph (only edges printed)
[(0, 1), (1, 4), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 10), (8, 9), (9, 10)]

graph_to_string

 graph_to_string (graph)

converts glycan graph back to IUPAC-condensed format

Arguments:
graph (networkx object): glycan graph
Returns:
Returns glycan in IUPAC-condensed format (string)
graph_to_string(glycan_to_nxGraph('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'

compare_glycans

 compare_glycans (glycan_a, glycan_b, wildcards_ptm=False)

returns True if glycans are the same and False if not

Arguments:
glycan_a (string or networkx object): glycan in IUPAC-condensed format or as a precomputed networkx object
glycan_b (string or networkx object): glycan in IUPAC-condensed format or as a precomputed networkx object
wildcards_ptm (bool): set to True to allow modification wildcards (e.g., ‘OS’ matching with ‘6S’); default:False
Returns:
Returns True if two glycans are the same and False if not
print("Graph Isomorphism Test")
print(compare_glycans('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
                      'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
Graph Isomorphism Test
True
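
The example above returns True because the two strings differ only in the order in which branches are written. A minimal way to see this without graph isomorphism is to parse each string into a tree rooted at the reducing end and sort the children of every node before comparing. The sketch below does this in plain Python. It is not how the library compares glycans, it handles neither wildcards nor floating substituents, and the function names are illustrative.

```python
import re

def parse_tree(glycan):
    """Parse IUPAC-condensed into nested (name, [(linkage, subtree), ...])."""
    tokens = re.findall(r'\[|\]|\([^()]*\)|[^\[\]()]+', glycan)
    root, current, linkage, stack = None, None, None, []
    # Read from the reducing end (right) to the non-reducing ends (left)
    for tok in reversed(tokens):
        if tok == ']':            # entering a branch: remember the branch point
            stack.append(current)
        elif tok == '[':          # leaving a branch: return to the branch point
            current = stack.pop()
        elif tok.startswith('('):
            linkage = tok
        else:
            node = (tok, [])
            if current is None:
                root = node
            else:
                current[1].append((linkage, node))
            current = node
    return root

def canonical(node):
    # Sorting children makes the representation independent of branch order
    name, children = node
    return (name, tuple(sorted((link, canonical(child)) for link, child in children)))

def same_glycan(glycan_a, glycan_b):
    return canonical(parse_tree(glycan_a)) == canonical(parse_tree(glycan_b))

print(same_glycan('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
                  'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
```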

subgraph_isomorphism

 subgraph_isomorphism (glycan, motif, termini_list=[], count=False,
                       wildcards_ptm=False, return_matches=False)

returns True if motif is in glycan and False if not

Arguments:
glycan (string or networkx): glycan in IUPAC-condensed format or as graph in NetworkX format
motif (string or networkx): glycan motif in IUPAC-condensed format or as graph in NetworkX format
termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’)
count (bool): whether to return the number or absence/presence of motifs; default:False
wildcards_ptm (bool): set to True to allow modification wildcards (e.g., ‘OS’ matching with ‘6S’); default:False
return_matches (bool): whether the matched subgraphs in input glycan should be returned as node lists as an additional output; default:False
Returns:
Returns True if motif is in glycan and False if not
print("Subgraph Isomorphism Test")
print(subgraph_isomorphism('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
                           'Fuc(a1-6)GlcNAc'))
Subgraph Isomorphism Test
True

generate_graph_features

 generate_graph_features (glycan, glycan_graph=True, label='network')

compute graph features of glycan

Arguments:
glycan (string or networkx object): glycan in IUPAC-condensed format (or glycan network if glycan_graph=False)
glycan_graph (bool): True expects a glycan, False expects a network (from construct_network); default:True
label (string): Label to place in output dataframe if glycan_graph=False; default:‘network’
Returns:
Returns a pandas dataframe with different graph features as columns and glycan as row
generate_graph_features("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
diameter branching nbrLeaves avgDeg varDeg maxDeg nbrDeg4 max_deg_leaves mean_deg_leaves deg_assort ... flow_edgeMax flow_edgeMin flow_edgeAvg flow_edgeVar secorderMax secorderMin secorderAvg secorderVar egap entropyStation
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc 8 1 3 1.818182 0.330579 3.0 0 3.0 3.0 -1.850372e-15 ... 0.333333 0.111111 0.217778 0.007289 45.607017 20.736441 31.679285 62.422895 0.026397 -2.35847

1 rows × 49 columns


largest_subgraph

 largest_subgraph (glycan_a, glycan_b)

find the largest common subgraph of two glycans

Arguments:
glycan_a (string or networkx): glycan in IUPAC-condensed format or as networkx graph
glycan_b (string or networkx): glycan in IUPAC-condensed format or as networkx graph
Returns:
Returns the largest common subgraph as a string in IUPAC-condensed format; returns empty string if there is no common subgraph
glycan1 = 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
glycan2 = 'Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
largest_subgraph(glycan1, glycan2)
'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'

ensure_graph

 ensure_graph (glycan, **kwargs)

ensures function compatibility with string glycans and graph glycans

Arguments:
glycan (string or networkx graph): glycan in IUPAC-condensed format or as a networkx graph
**kwargs: keyword arguments that are directly passed on to glycan_to_nxGraph
Returns:
Returns networkx graph object of glycan
ensure_graph("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
<networkx.classes.graph.Graph>

get_possible_topologies

 get_possible_topologies (glycan, exhaustive=False)

creates possible glycans given a floating substituent; only works with max one floating substituent

Arguments:
glycan (string or networkx): glycan in IUPAC-condensed format or as networkx graph
exhaustive (bool): whether to also allow additions at internal positions; default:False
Returns:
Returns list of NetworkX-like glycan graphs of possible topologies

possible_topology_check

 possible_topology_check (glycan, glycans, exhaustive=False, **kwargs)

checks whether glycan with floating substituent could match glycans from a list; only works with max one floating substituent

Arguments:
glycan (string or networkx): glycan in IUPAC-condensed format (or as networkx graph) that has to contain a floating substituent
glycans (list): list of glycans in IUPAC-condensed format (or networkx graphs; should not contain floating substituents)
exhaustive (bool): whether to also allow additions at internal positions; default:False
**kwargs: keyword arguments that are directly passed on to compare_glycans
Returns:
Returns list of glycans that could match input glycan
possible_topology_check("{Neu5Ac(a2-3)}Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc",
                        ["Fuc(a1-2)Gal(b1-3)GalNAc", "Neu5Ac(a2-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc",
                         "Neu5Ac(a2-6)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc"])
['Neu5Ac(a2-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc']

processing

process IUPAC-condensed glycan sequences into glycoletters etc.


min_process_glycans

 min_process_glycans (glycan_list)

converts list of glycans into a nested list of glycoletters

Arguments:
glycan_list (list): list of glycans in IUPAC-condensed format as strings
Returns:
Returns list of glycoletter lists
min_process_glycans(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
                     'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
[['Man', 'a1-3', 'Man', 'a1-6', 'Man', 'b1-4', 'GlcNAc', 'b1-4', 'GlcNAc'],
 ['Man',
  'a1-2',
  'Man',
  'a1-3',
  'Man',
  'a1-6',
  'Man',
  'b1-4',
  'GlcNAc',
  'b1-4',
  'GlcNAc']]
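
For glycans without floating substituents, the same split into glycoletters can be reproduced with a few lines of plain Python: drop the branch brackets, then split on the parentheses that surround linkages. This is a sketch of the idea only, not the library's implementation, and the function name is illustrative.

```python
import re

def split_glycoletters(glycan):
    # Remove branch brackets, then split on '(' and ')' around linkages
    flat = glycan.replace('[', '').replace(']', '')
    return [tok for tok in re.split(r'[()]', flat) if tok]

letters = split_glycoletters('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc')
print(letters)
```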

get_lib

 get_lib (glycan_list)

returns dictionary of form glycoletter:index

Arguments:
glycan_list (list): list of IUPAC-condensed glycan sequences as strings
Returns:
Returns dictionary of form glycoletter:index
get_lib(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
         'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
{'GlcNAc': 0, 'Man': 1, 'a1-2': 2, 'a1-3': 3, 'a1-6': 4, 'b1-4': 5}

expand_lib

 expand_lib (libr, glycan_list)

updates libr with newly introduced glycoletters

Arguments:
libr (dict): dictionary of form glycoletter:index
glycan_list (list): list of IUPAC-condensed glycan sequences as strings
Returns:
Returns new lib
lib1 = get_lib(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
                'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
lib2 = expand_lib(lib1, ['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'])
lib2
{'GlcNAc': 0, 'Man': 1, 'a1-2': 2, 'a1-3': 3, 'a1-6': 4, 'b1-4': 5, 'Fuc': 6}
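
The outputs of get_lib and expand_lib above suggest that a library maps alphabetically sorted glycoletters to consecutive indices, and that expanding it appends unseen glycoletters while leaving existing indices unchanged. The sketch below reproduces that behaviour in plain Python under those assumptions. The function names are illustrative and this is not the library's code.

```python
import re

def split_glycoletters(glycan):
    flat = glycan.replace('[', '').replace(']', '')
    return [tok for tok in re.split(r'[()]', flat) if tok]

def build_lib(glycans):
    # Sorted unique glycoletters get indices 0..n-1
    letters = sorted({tok for g in glycans for tok in split_glycoletters(g)})
    return {letter: idx for idx, letter in enumerate(letters)}

def extend_lib(lib, glycans):
    # Copy the library and append unseen glycoletters at the end
    new_lib = dict(lib)
    for letter in sorted({tok for g in glycans for tok in split_glycoletters(g)}):
        if letter not in new_lib:
            new_lib[letter] = len(new_lib)
    return new_lib

base = build_lib(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
                  'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
extended = extend_lib(base, ['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'])
print(extended)
```

Copying the dictionary before extending keeps the original library untouched, so indices already used elsewhere stay valid.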

presence_to_matrix

 presence_to_matrix (df, glycan_col_name='glycan',
                     label_col_name='Species')

converts a dataframe such as df_species to absence/presence matrix

Arguments:
df (dataframe): dataframe with glycan occurrence, rows are glycan-label pairs
glycan_col_name (string): column name under which glycans are stored; default:glycan
label_col_name (string): column name under which labels are stored; default:Species
Returns:
Returns pandas dataframe with labels as rows and glycan occurrences as columns
out = presence_to_matrix(df_species[df_species.Order == 'Fabales'].reset_index(drop = True),
                         label_col_name = 'Family')
glycan Apif(a1-2)Xyl(b1-2)[Glc6Ac(b1-4)]Glc Ara(a1-2)Ara(a1-6)GlcNAc Ara(a1-2)Glc(b1-2)Ara Ara(a1-2)GlcA Ara(a1-2)[Glc(b1-6)]Glc Ara(a1-6)Glc Araf(a1-3)Araf(a1-5)[Araf(a1-6)Gal(b1-6)Glc(b1-6)Man(a1-3)]Araf(a1-5)Araf(a1-3)Araf(a1-3)Araf Araf(a1-3)Gal(b1-6)Gal D-Apif(b1-2)Glc D-Apif(b1-2)GlcA D-Apif(b1-3)Xyl(b1-2)[Glc6Ac(b1-4)]Glc D-Apif(b1-3)Xyl(b1-4)Rha(a1-2)Ara D-Apif(b1-3)Xyl(b1-4)Rha(a1-2)D-Fuc D-Apif(b1-3)Xyl(b1-4)[Glc(b1-3)]Rha(a1-2)D-Fuc D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-2)D-Fuc D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-2)[Rha(a1-3)]D-Fuc D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-3)D-Fuc D-Apif(b1-6)Glc D-ApifOMe(b1-3)XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe D-ApifOMe(b1-3)XylOMe(b1-4)[GlcOMe(b1-3)]RhaOMe(a1-2)D-FucOMe Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc4Ac6Ac(b1-3)]Glc Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc4Ac6Ac(b1-3)]Glc6Ac Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc6Ac(b1-3)]Glc Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)Glc3Ac6Ac Fruf(b2-1)Glc4Ac6Ac Fruf(b2-1)Glc6Ac Fruf(b2-1)[Glc(b1-2)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-3)Glc(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc6Ac(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-2)][Glc6Ac(b1-3)]Glc Fruf(b2-1)[Glc(b1-2)][Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)[Glc(b1-4)Glc6Ac(b1-3)]Glc6Ac Fruf(b2-1)[Glc3Ac(b1-2)]Glc Fruf(b2-1)[Glc6Ac(b1-2)]Glc Fruf1Ac(b2-1)Glc2Ac4Ac6Ac Fuc(a1-2)Gal(b1-2)Xyl(a1-6)Glc Fuc(a1-2)Gal(b1-2)Xyl(a1-6)Glc(b1-4)Glc Fuc(a1-2)Gal(b1-2)Xyl(a1-6)[Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)]Glc(b1-4)Glc Fuc(a1-2)Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)Glc Fuc(a1-2)Gal(b1-4)Xyl Fuc(a1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Fuc(a1-6)GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 
Gal(?1-?)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Man(a1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(?1-?)[Gal(?1-?)]GlcNAc(?1-?)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(?1-?)Man(a1-3)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(a1-4)Gal Gal(a1-6)Gal Gal(a1-6)Gal(a1-6)Gal Gal(a1-6)Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Gal(a1-6)Gal(a1-6)Gal(a1-6)[Fruf(b2-1)]Glc Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Gal(a1-6)Glc Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Glc(a1-2)Fruf Gal(a1-6)Man Gal(a1-6)Man(b1-4)Man Gal(a1-6)Man(b1-4)Man(b1-4)Man(b1-4)Man Gal(a1-6)Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Gal(a1-6)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Gal(a1-6)Man(b1-4)[Gal(a1-6)]Man Gal(b1-2)GlcA Gal(b1-2)GlcA6Me Gal(b1-2)Xyl(a1-6)Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Gal(b1-2)[Xyl(b1-3)]GlcA Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[Gal(b1-3)GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 
Gal(b1-3)GlcNAc(b1-4)Man(a1-6)[GlcNAc(b1-4)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(a1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[Gal(b1-3)GlcNAc(b1-2)Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[GlcNAc(b1-2)Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-3)[Fuc(a1-6)]GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Gal(b1-4)Gal(b1-4)Man Gal(b1-4)Gal(b1-4)ManOMe Gal(b1-4)GlcA Gal(b1-4)GlcNAc(b1-2)[Gal(b1-4)GlcNAc(b1-4)]Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)[Gal(b1-4)GlcNAc(b1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Gal(b1-4)Man(b1-4)Man Gal(b1-4)Man(b1-4)Man(b1-4)Gal Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc1CoumOMe Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc1FerOMe Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-Fuc Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-Fuc1CoumOMe Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-FucOMeOSin 
Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)D-Fuc Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)D-Fuc1CoumOMe Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)[Rha(a1-3)]D-Fuc Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)[Rha(a1-3)]D-Fuc1CoumOMe Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GalA(a1-2)[Araf(a1-5)Araf(a1-4)]Rha(b1-4)GalA GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-2)Rha(a1-4)GalA(a1-2)Rha(a1-4)GalA(a1-2)GalA GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA GalOMe(b1-2)[XylOMe(b1-3)]GlcAOMe GalOMe(b1-4)XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe GalOMe(b1-4)XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-3)]D-FucOMe GalOMe(b1-4)XylOMe(b1-4)[D-ApifOMe(b1-3)]RhaOMe(a1-2)[RhaOMe(a1-3)]D-FucOMe Galf(b1-2)[Galf(b1-4)]Man Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Glc(a1-2)Rha(a1-6)Glc Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Glc(a1-4)Glc(a1-2)Rha(a1-6)Glc Glc(a1-4)Glc(a1-4)Glc(a1-6)Glc Glc(a1-4)Glc(a1-4)GlcA Glc(a1-4)GlcA(b1-2)GlcA Glc(b1-2)Ara Glc(b1-2)Ara(a1-2)GlcA Glc(b1-2)Gal(b1-2)Gal(b1-2)GlcA Glc(b1-2)Gal(b1-2)GlcA Glc(b1-2)Gal(b1-2)GlcA(b1-3)[Glc(b1-3)]Ara Glc(b1-2)Glc Glc(b1-2)Glc(a1-2)FrufOBzOCin Glc(b1-2)Glc(b1-2)Glc Glc(b1-2)GlcA Glc(b1-2)[Ara(a1-3)]GlcA6Me Glc(b1-2)[Ara(a1-3)]GlcAOMe Glc(b1-2)[Ara(a1-6)]Glc Glc(b1-2)[Glc(b1-3)]Glc(a1-2)Fruf 
Glc(b1-2)[Glc(b1-3)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-2)[Glc6Ac(b1-3)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-2)[Rha(a1-3)]GlcA Glc(b1-2)[Xyl(b1-2)Ara(a1-6)]Glc Glc(b1-2)[Xyl(b1-2)D-Fuc(b1-6)]Glc Glc(b1-3)Ara Glc(b1-3)Glc Glc(b1-3)Glc(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)][Rha(a1-4)]Glc1Coum6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)][Rha(a1-4)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-3)Rha1Fer(a1-4)Fruf(b2-1)GlcOBz Glc(b1-3)[Araf(a1-4)]Rha(a1-2)Glc Glc(b1-3)[Xyl(b1-4)]Rha(a1-2)D-FucOMe Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc(b1-4)Glc Glc(b1-4)Glc(b1-4)Glc(b1-4)Man Glc(b1-4)Glc6Ac(b1-3)Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz Glc(b1-4)Man(b1-4)Glc Glc(b1-4)Rha Glc(b1-4)Rha1Fer(a1-4)Fruf(b2-1)GlcOBz Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Glc(b1-6)Glc(b1-3)Glc Glc1Cer Glc2Ac(b1-4)[D-Apif(b1-3)Xyl(b1-2)]Glc Glc2Ac3Ac4Ac6Ac(b1-3)Ara Glc6Ac(b1-2)Glc(a1-2)FrufOBzOCin Glc6Ac(b1-3)Glc6Ac(b1-3)[Glc6Ac(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOAcOBz Glc6Ac(b1-3)Glc6Ac(b1-3)[Glc6Ac(b1-2)][RhaOAc(a1-4)]Glc1Fer6Ac(a1-2)Fruf1CoumOAcOBz Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum(a1-2)Fruf1CoumOBz 
Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1CoumOBz Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz GlcA(b1-2)Glc GlcA(b1-2)GlcA GlcA(b1-2)GlcA(b1-2)Rha GlcA4Me(a1-2)[Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)]Xyl GlcA4Me(a1-2)[Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)]Xyl GlcA4Me(a1-2)[Xyl(b1-4)]Xyl GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Gal(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-2)Man(a1-?)[Xyl(b1-2)][Man(a1-?)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-2)Man(a1-?)[Xyl(b1-2)][Man(a1-?)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc 
GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcNAc(b1-?)Man(a1-3)[GlcNAc(b1-?)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc GlcOMe(b1-3)[XylOMe(b1-4)]RhaOMe(a1-2)D-FucOMe Glcf(b1-2)Xyl(b1-4)Rha(b1-4)[Xyl(b1-3)]Xyl Hexf(?1-?)Xyl(b1-4)Rha(b1-4)[Xyl(a1-3)]Xyl L-Lyx(a1-2)Ara(a1-2)GlcA Lyx(a1-2)Ara(a1-2)GlcA Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)Man(a1-6)[Man(a1-2)Man(a1-3)]Man(a1-3)[Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-2)[Man(a1-6)]Man(a1-3)[Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc Man(a1-2)Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN 
Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-2)Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-3)[Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-2)[Man(a1-3)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(a1-6)Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc 
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Man(a1-6)][Xylf(a1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc-ol Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]Hex Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)ManNAc Man(a1-3)[Xylf(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(a1-6)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(a1-?)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc Man(a1-?)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-?)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc Man(a1-?)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Man(b1-2)Man Man(b1-4)Gal(b1-4)Gal(b1-4)Man Man(b1-4)Gal(b1-4)Gal(b1-4)ManOMe Man(b1-4)Man Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Man(b1-4)Man(b1-4)[Gal(a1-6)]Man Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)Man Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-3)Gal(a1-3)Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man Man(b1-4)[Gal(a1-6)]Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man Man(b1-6)Glc Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc 
Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)[Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-4)]Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Rha(a1-2)Ara Rha(a1-2)Ara(a1-2)GlcA Rha(a1-2)Ara(a1-2)GlcA6Me Rha(a1-2)Ara(a1-2)GlcAOMe Rha(a1-2)D-Ara(b1-2)GlcA Rha(a1-2)Gal(b1-2)Glc Rha(a1-2)Gal(b1-2)GlcA Rha(a1-2)Gal(b1-2)GlcA6Me Rha(a1-2)Gal(b1-2)GlcAOMe Rha(a1-2)Glc(b1-2)Glc Rha(a1-2)Glc(b1-2)GlcA Rha(a1-2)Glc(b1-2)GlcA6Me Rha(a1-2)Glc(b1-2)GlcAOMe Rha(a1-2)Glc(b1-6)Glc Rha(a1-2)GlcA(b1-2)GlcA Rha(a1-2)GlcAOMe(b1-2)GlcAOMe Rha(a1-2)Rha(a1-2)Gal(b1-4)[Glc(b1-2)]GlcA Rha(a1-2)Xyl Rha(a1-2)Xyl(b1-2)GlcA Rha(a1-2)Xyl(b1-2)GlcA6Me Rha(a1-2)Xyl(b1-2)GlcAOMe Rha(a1-2)Xyl3Ac Rha(a1-2)Xyl4Ac Rha(a1-2)[Glc(b1-3)]Glc Rha(a1-2)[Glc(b1-6)]Gal(b1-2)GlcA6Me Rha(a1-2)[Rha(a1-4)]Glc Rha(a1-2)[Rha(a1-6)]Gal Rha(a1-2)[Rha(a1-6)]Glc Rha(a1-2)[Xyl(b1-4)]Glc Rha(a1-2)[Xyl(b1-4)]Glc(b1-6)Glc Rha(a1-3)GlcA Rha(a1-4)Gal(b1-2)GlcA Rha(a1-4)Gal(b1-2)GlcAOMe Rha(a1-4)Gal(b1-2)GlcOMe Rha(a1-4)Gal(b1-4)Gal(b1-4)GalGro Rha(a1-4)Xyl(b1-2)Glc Rha(a1-4)Xyl(b1-2)GlcA Rha(a1-4)Xyl(b1-2)GlcAOMe Rha(a1-6)[Xyl(b1-3)Xyl(b1-2)]Glc(b1-2)Glc Rha(b1-2)Glc(b1-2)GlcA Rha1Fer(a1-4)Fruf(b2-1)GlcOBz RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe-ol RhaOMe(a1-6)GlcOMe(b1-2)GlcOMe-ol Xyl(a1-6)Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc 
Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol 
Xyl(b1-2)Ara(a1-6)Glc Xyl(b1-2)Ara(a1-6)GlcNAc Xyl(b1-2)Ara(a1-6)[Glc(b1-2)]Glc Xyl(b1-2)Ara(a1-6)[Glc(b1-4)]GlcNAc Xyl(b1-2)D-Fuc(b1-6)Glc Xyl(b1-2)D-Fuc(b1-6)GlcNAc Xyl(b1-2)D-Fuc(b1-6)[Glc(b1-2)]Glc Xyl(b1-2)Fuc(a1-6)Glc Xyl(b1-2)Fuc(a1-6)GlcNAc Xyl(b1-2)Gal(b1-2)GlcA6Me Xyl(b1-2)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)Rha(a1-2)Ara Xyl(b1-2)[Glc(b1-3)]Ara Xyl(b1-2)[Man(a1-3)][Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAcN Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc Xyl(b1-2)[Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc Xyl(b1-2)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc Xyl(b1-2)[Rha(a1-3)]GlcA Xyl(b1-3)Ara Xyl(b1-3)Xyl(b1-2)[Rha(a1-6)]Glc(b1-2)Glc Xyl(b1-3)Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc Xyl(b1-3)Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc(b1-2)Glc Xyl(b1-4)Rha(a1-2)Ara Xyl(b1-4)Rha(a1-2)D-Fuc Xyl(b1-4)Rha(a1-2)D-FucOMe Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl3Ac(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA4Me(a1-2)]Xyl(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA4Me(a1-2)]Xyl3Ac(b1-4)Xyl Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl Xyl(b1-4)[GlcAOMe(a1-2)]Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl Xyl2Ac3Ac4Ac(b1-3)Ara XylOMe(b1-2)[RhaOMe(a1-6)]GlcOMe(b1-2)GlcOMe-ol XylOMe(b1-3)XylOMe(b1-2)[RhaOMe(a1-6)]GlcOMe(b1-2)GlcOMe-ol XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe 
XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe-ol Xylf(b1-2)Xyl(b1-3)[Rha(b1-2)Rha(b1-4)]Xyl [Araf(a1-3)Gal(b1-3)Gal(b1-6)]Gal(b1-3)Gal [Araf(a1-3)Gal(b1-6)]Gal(b1-3)Gal [Gal(a1-4)Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Man(b1-4)Man(b1-4)Man(b1-4)Gal(a1-6)]Man(b1-2)[Gal(a1-6)]Man(b1-2)[Gal(a1-4)Gal(a1-6)]Man(b1-4)Man [Gal(a1-6)]Man(b1-4)Man [Gal(a1-6)]Man(b1-4)Man(b1-4)Man [Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)Man(b1-4)Man [Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man [Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man [Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Gal(b1-3)Gal(b1-6)[Araf(a1-3)]Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-3)Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-6)Gal(b1-6)Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-6)Gal(b1-6)]Gal(b1-3)Gal [Gal(b1-6)]Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc 
[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc
Family
Fabaceae 1 4 1 3 1 1 0 1 3 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 1 1 2 1 1 1 1 4 2 1 2 2 7 4 4 4 2 8 4 2 5 4 1 1 1 1 1 0 1 1 3 1 1 1 1 1 1 2 1 5 1 1 1 1 2 2 1 1 2 1 1 1 1 3 2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 3 1 1 1 1 1 1 2 1 3 1 1 0 1 2 1 1 2 0 0 0 1 1 1 4 1 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1 1 0 0 0 0 0 1 2 0 1 1 1 5 1 1 0 0 0 0 0 0 0 1 3 1 0 0 0 1 1 4 6 1 1 1 1 2 1 1 1 3 1 1 3 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 2 2 1 1 1 1 3 1 1 2 1 1 1 1 1 1 1 3 2 1 2 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 4 6 4 4 4 1 1 5 4 1 4 1 1 0 1 1 1 7 1 1 2 3 22 6 7 1 8 3 4 1 3 1 1 1 2 2 2 1 1 1 1 1 0 2 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 2 1 2 2 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 7 1 1 1 2 3 1 1 1 1 1 1 1 1 1 1 1 5 2 1 1 1 3 2 1 1 3 2 1 0 0 2 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 4 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Fagaceae 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Polygalaceae 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 2 2 1 1 1 1 2 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 2 2 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 0 0 0 1 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Quillajaceae 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

choose_correct_isoform

 choose_correct_isoform (glycans, reverse=False)

given a list of glycan branch isomers, this function returns the correct isomer

Arguments:
glycans (list): glycans in IUPAC-condensed nomenclature
reverse (bool): whether to return the correct isomer (False) or everything except the correct isomer (True); default:False
Returns:
Returns the correct isomer as a string (if reverse=False; otherwise it returns a list of strings)
choose_correct_isoform(["Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
                        "Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc"])
'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'

enforce_class

 enforce_class (glycan, glycan_class, conf=None, extra_thresh=0.3)

given a glycan and glycan class, determines whether glycan is from this class

Arguments:
glycan (string): glycan in IUPAC-condensed nomenclature
glycan_class (string): glycan class in form of “O”, “N”, “free”, or “lipid”
conf (float): prediction confidence; can be used to override class
extra_thresh (float): threshold to override class; default:0.3
Returns:
Returns True if glycan is in glycan class and False if not
enforce_class("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc", "O")
False

IUPAC_to_SMILES

 IUPAC_to_SMILES (glycan_list)

given a list of IUPAC-condensed glycans, uses GlyLES to return a list of corresponding isomeric SMILES

Arguments:
glycan_list (list): list of IUPAC-condensed glycans
Returns:
Returns a list of corresponding isomeric SMILES
IUPAC_to_SMILES(['Neu5Ac(a2-3)Gal(b1-4)Glc'])
['O1C(O)[C@H](O)[C@@H](O)[C@H](O[C@@H]2O[C@H](CO)[C@H](O)[C@H](O[C@]3(C(=O)O)C[C@H](O)[C@@H](NC(C)=O)[C@H]([C@H](O)[C@H](O)CO)O3)[C@H]2O)[C@H]1CO']

canonicalize_composition

 canonicalize_composition (comp)

converts a composition from any common format into the dictionary that is optimized for glycowork

Arguments:
comp (string): composition formatted either in the style of HexNAc2Hex1Fuc3Neu5Ac1 or N2H1F3A1
Returns:
Returns composition as a dictionary of style monosaccharide : count
print(canonicalize_composition("HexNAc2Hex1Fuc3Neu5Ac1"))
print(canonicalize_composition("N2H1F3A1"))
{'HexNAc': 2, 'Hex': 1, 'dHex': 3, 'Neu5Ac': 1}
{'HexNAc': 2, 'Hex': 1, 'dHex': 3, 'Neu5Ac': 1}
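
The normalization shown above can be illustrated with a small standalone sketch. It is not glycowork's implementation: `parse_composition` and both lookup tables are hypothetical and cover only the tokens that appear in this example.

```python
import re

# Hypothetical lookup tables, limited to the tokens used in the example above
LONG_NAMES = {"HexNAc": "HexNAc", "Hex": "Hex", "Fuc": "dHex",
              "dHex": "dHex", "Neu5Ac": "Neu5Ac"}
SHORT_NAMES = {"N": "HexNAc", "H": "Hex", "F": "dHex", "A": "Neu5Ac"}

def parse_composition(comp):
    """Parse 'HexNAc2Hex1Fuc3Neu5Ac1' or 'N2H1F3A1' into {monosaccharide: count}."""
    # Single capital letters followed by counts indicate the short format
    is_short = re.fullmatch(r"(?:[A-Z]\d+)+", comp) is not None
    table = SHORT_NAMES if is_short else LONG_NAMES
    # Try longer names first so that 'HexNAc' is not read as 'Hex'
    names = sorted(table, key=len, reverse=True)
    token = re.compile("(" + "|".join(map(re.escape, names)) + r")(\d+)")
    result, pos = {}, 0
    while pos < len(comp):
        m = token.match(comp, pos)
        if m is None:
            raise ValueError(f"unrecognized token at position {pos}: {comp[pos:]!r}")
        name, count = m.groups()
        key = table[name]
        result[key] = result.get(key, 0) + int(count)
        pos = m.end()
    return result

long_form = parse_composition("HexNAc2Hex1Fuc3Neu5Ac1")
short_form = parse_composition("N2H1F3A1")
```

Both calls produce the same dictionary, mirroring the two identical outputs printed above.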

canonicalize_iupac

 canonicalize_iupac (glycan)

converts a glycan from IUPAC-extended, LinearCode, GlycoCT, and WURCS into the exact IUPAC-condensed version that is optimized for glycowork

Arguments:
glycan (string): glycan sequence; some rare post-biosynthetic modifications could still be an issue
Returns:
Returns glycan as a string in canonicalized IUPAC-condensed
print(canonicalize_iupac("NeuAc?1-36SGalb1-4GlcNACb1-6(Fuc?1-2Galb1-4GlcNacb1-3Galb1-3)GalNAc-sp3"))
print(canonicalize_iupac("WURCS=2.0/5,11,10/[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5][a1122h-1a_1-5][a2112h-1b_1-5][a1221m-1a_1-5]/1-1-2-3-1-4-3-1-4-5-5/a4-b1_a6-k1_b4-c1_c3-d1_c6-g1_d2-e1_e4-f1_g2-h1_h4-i1_i2-j1"))
print(canonicalize_iupac("Ma3(Ma6)Mb4GNb4GN;N"))
print(canonicalize_iupac("α-D-Manp-(1→3)[α-D-Manp-(1→6)]-β-D-Manp-(1→4)-β-D-GlcpNAc-(1→4)-β-D-GlcpNAc-(1→"))
print(canonicalize_iupac("""RES
1b:b-dgal-HEX-1:5
2s:n-acetyl
3b:b-dgal-HEX-1:5
4b:b-dglc-HEX-1:5
5b:b-dgal-HEX-1:5
6b:a-dglc-HEX-1:5
7b:b-dgal-HEX-1:5
8b:a-lgal-HEX-1:5|6:d
9b:a-dgal-HEX-1:5
10s:n-acetyl
11s:n-acetyl
12b:b-dglc-HEX-1:5
13b:b-dgal-HEX-1:5
14b:a-lgal-HEX-1:5|6:d
15b:a-lgal-HEX-1:5|6:d
16s:n-acetyl
17s:n-acetyl
18b:b-dgal-HEX-1:5
LIN
1:1d(2+1)2n
2:1o(3+1)3d
3:3o(3+1)4d
4:4o(-1+1)5d
5:5o(-1+1)6d
6:6o(-1+1)7d
7:7o(2+1)8d
8:7o(3+1)9d
9:9d(2+1)10n
10:6d(2+1)11n
11:5o(-1+1)12d
12:12o(-1+1)13d
13:13o(2+1)14d
14:12o(-1+1)15d
15:12d(2+1)16n
16:4d(2+1)17n
17:1o(6+1)18d
"""))
Fuc(a1-2)Gal(b1-4)GlcNAc(b1-3)Gal(b1-3)[Neu5Ac(a2-3)Gal6S(b1-4)GlcNAc(b1-6)]GalNAc
Fuc(a1-2)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)[Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
Fuc(a1-2)Gal(b1-?)[Fuc(a1-?)]GlcNAc(b1-?)[GalNAc(a1-3)[Fuc(a1-2)]Gal(b1-?)GlcNAc(a1-?)]Gal(b1-?)GlcNAc(b1-3)Gal(b1-3)[Gal(b1-6)]GalNAc

get_possible_linkages

 get_possible_linkages (wildcard, linkage_list={'a1-7', 'a1-8', 'a1-3',
                        'b1-5', 'b1-7', 'b1-4', 'a2-7', '?2-8', 'b1-3',
                        'a1-6', 'b1-1', 'b2-2', 'b1-8', 'b2-1', 'b2-3',
                        '?2-3', 'a1-11', 'b1-?', 'a1-?', 'a2-?', '?2-?',
                        '1-4', '?1-3', 'b2-6', '?1-2', 'a2-6', 'a2-11',
                        'b2-4', 'a2-4', 'a2-8', '?2-6', 'a2-3', 'b1-2',
                        '1-6', 'a2-2', 'a1-4', '?1-4', 'a2-9', 'a2-5',
                        'a2-1', 'a1-1', 'b2-5', 'b1-6', 'a1-9', 'a1-5',
                        '?1-?', 'b1-9', 'b2-8', '?1-6', 'a1-2', 'b2-7'})

Retrieves all linkages that match a given wildcard pattern from a list of linkages

Arguments:
wildcard (string): The pattern to match, where ‘?’ can be used as a wildcard for any single character.
linkage_list (list): List of linkages as strings to search within; default:linkages
Returns:
Returns a list of linkages that match the wildcard pattern.
get_possible_linkages("a1-?")
['a1-7',
 'a1-1',
 'a1-9',
 'a1-5',
 'a1-4',
 'a1-?',
 'a1-6',
 'a1-3',
 'a1-8',
 'a1-2']
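
As a rough illustration of the single-character wildcard described above, the following sketch treats each `?` in the pattern as "any one character" and compares against a small, made-up set of linkages. The helper name `match_linkages` and the sample set are assumptions for this example, not part of glycowork.

```python
import re

def match_linkages(wildcard, linkage_list):
    """Return the linkages that match wildcard, where '?' stands for one character."""
    # Build a regex in which '?' becomes '.', and everything else is literal
    regex = "".join("." if ch == "?" else re.escape(ch) for ch in wildcard)
    return sorted(l for l in linkage_list if re.fullmatch(regex, l))

sample = {"a1-3", "a1-11", "a1-?", "a2-3", "b1-4"}
matches = match_linkages("a1-?", sample)
```

Here `a1-11` is excluded because `?` covers exactly one character, which is consistent with the output above listing no two-digit positions.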

get_possible_monosaccharides

 get_possible_monosaccharides (wildcard)

Retrieves all common monosaccharides that belong to a given monosaccharide type

Arguments:
wildcard (string): Monosaccharide type, from “HexNAc”, “Hex”, “dHex”, “Sia”, “HexA”, “Pen”
Returns:
Returns a list of specified monosaccharides of that type
get_possible_monosaccharides("HexNAc")
{'GalNAc', 'GlcNAc', 'HexNAc', 'ManNAc'}

equal_repeats

 equal_repeats (r1, r2)

checks whether two repeat units could stem from the same repeating structure, just shifted

Arguments:
r1 (string): glycan sequence in IUPAC-condensed nomenclature
r2 (string): glycan sequence in IUPAC-condensed nomenclature
Returns:
Returns True if repeat structures are shifted versions of each other, else False
equal_repeats("Fuc2S3S(a1-3)Fuc2S(a1-4)Fuc2S3S", "Fuc2S(a1-4)Fuc2S3S(a1-3)Fuc2S")
True
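
The idea of "shifted" repeat units can be sketched for linear repeats as a rotation check. This is a simplified illustration, not the library's code: `is_shifted_repeat` is a hypothetical helper, it ignores branches, and it assumes that the trailing monosaccharide of each unit duplicates the leading one, as in the example above.

```python
import re

def unit_pairs(repeat):
    """Split a linear repeat into (monosaccharide, linkage) pairs.

    The trailing monosaccharide has no linkage after it and is dropped.
    """
    return re.findall(r"([^()]+)\(([^()]+)\)", repeat)

def is_shifted_repeat(r1, r2):
    """Check whether the pairs of r2 are a cyclic rotation of the pairs of r1."""
    p1, p2 = unit_pairs(r1), unit_pairs(r2)
    if len(p1) != len(p2):
        return False
    if not p1:
        # No linkages at all: fall back to plain string comparison
        return r1 == r2
    return any(p2 == p1[i:] + p1[:i] for i in range(len(p1)))

same = is_shifted_repeat("Fuc2S3S(a1-3)Fuc2S(a1-4)Fuc2S3S",
                         "Fuc2S(a1-4)Fuc2S3S(a1-3)Fuc2S")
different = is_shifted_repeat("Fuc(a1-3)Fuc(a1-4)Fuc",
                              "Fuc(a1-3)Fuc(a1-3)Fuc")
```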

query

for interacting with the databases contained in glycowork, delivering insights for sequences of interest


get_insight

 get_insight (glycan, motifs=None)

prints out meta-information about a glycan

Arguments:
glycan (string): glycan in IUPAC-condensed format
motifs (dataframe): dataframe of glycan motifs (name + sequence); default:motif_list
print("Test get_insight with 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'")
get_insight('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc')
Test get_insight with 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
Let's get rolling! Give us a few moments to crunch some numbers.

This glycan occurs in the following species: ['Acanthocheilonema_viteae', 'Adeno-associated_dependoparvovirusA', 'Aedes_aegypti', 'Angiostrongylus_cantonensis', 'Anopheles_gambiae', 'Antheraea_pernyi', 'Apis_mellifera', 'Ascaris_suum', 'Autographa_californica_nucleopolyhedrovirus', 'AvianInfluenzaA_Virus', 'Bombus_ignitus', 'Bombyx_mori', 'Bos_taurus', 'Bos_taurus', 'Bos_taurus', 'Brugia_malayi', 'Caenorhabditis_elegans', 'Cardicola_forsteri', 'Cooperia_onchophora', 'Cornu_aspersum', 'Crassostrea_gigas', 'Crassostrea_virginica', 'Cricetulus_griseus', 'Danio_rerio', 'Dictyocaulus_viviparus', 'Dirofilaria_immitis', 'Drosophila_melanogaster', 'Fasciola_hepatica', 'Gallus_gallus', 'Glossina_morsitans', 'Haemonchus_contortus', 'Haliotis_tuberculata', 'Heligmosomoides_polygyrus', 'Helix_lucorum', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'HumanImmunoDeficiency_Virus', 'Hylesia_metabus', 'Lutzomyia_longipalpis', 'Lymantria_dispar', 'Macaca_mulatta', 'Macaca_mulatta', 'Macaca_mulatta', 'Macaca_mulatta', 'Mamestra_brassicae', 'Megathura_crenulata', 'Mus_musculus', 'Nilaparvata_lugens', 'Oesophagostomum_dentatum', 'Onchocerca_volvulus', 'Ophiactis_savignyi', 'Opisthorchis_viverrini', 'Ostrea_edulis', 'Ovis_aries', 'Pan_troglodytes', 'Pan_troglodytes', 'Pan_troglodytes', 'Pan_troglodytes', 'Pristionchus_pacificus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Schistosoma_mansoni', 'SemlikiForest_Virus', 'Spodoptera_frugiperda', 'Sus_scrofa', 'Tick_borne_encephalitis_virus', 'Tribolium_castaneum', 'Trichinella_spiralis', 'Trichoplusia_ni', 'Trichuris_suis', 'Tropidolaemus_subannulatus', 'Volvarina_rubella', 'undetermined', 'unidentified_influenza_virus']

Puh, that's quite a lot! Here are the phyla of those species: ['Arthropoda', 'Artverviricota', 'Chordata', 'Cossaviricota', 'Echinodermata', 'Kitrinoviricota', 'Mollusca', 'Negarnaviricota', 'Nematoda', 'Platyhelminthes', 'Virus']

This glycan contains the following motifs: ['Chitobiose', 'Trimannosylcore', 'core_fucose']

This is the GlyTouCan ID for this glycan: G63041RA

This glycan has been reported to be expressed in: ['2A3_cell_line', 'A549_cell_line', 'AML_193_cell_line', 'CHOK1_cell_line', 'CHOS_cell_line', 'CRL_1620_cell_line', 'Cal-27_cell_line', 'Cervicovaginal_Secretion', 'EOL_1_cell_line', 'FaDu_cell_line', 'HEK293_cell_line', 'HEL92_1_7_cell_line', 'HEL_cell_line', 'HL_60_cell_line', 'KG_1_cell_line', 'KG_1a_cell_line', 'Kasumi_1_cell_line', 'MDA_MB_231BR_cell_line', 'ME_1_cell_line', 'ML_1_cell_line', 'MOLM_13_cell_line', 'MOLM_14_cell_line', 'MV4_11_cell_line', 'M_07e_cell_line', 'NB_4_cell_line', 'NS0_cell_line', 'OCI_AML2_cell_line', 'OCI_AML3_cell_line', 'PLB_985_cell_line', 'SCC-9_cell_line', 'SCC_25_cell_line', 'TF_1_cell_line', 'THP_1_cell_line', 'U_937_cell_line', 'VU-147T_cell_line', 'alveolus_of_lung', 'brain', 'brain', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellum', 'colon', 'cortex', 'digestive_tract', 'digestive_tract', 'forebrain', 'gills', 'gills', 'heart', 'heart', 'heart', 'hindbrain', 'hippocampal_formation', 'hippocampus', 'hippocampus', 'hippocampus', 'hippocampus', 'iPS1A_cell_line', 'iPS2A_cell_line', 'kidney', 'liver', 'lung', 'mantle', 'mantle', 'milk', 'milk', 'milk', 'mucus', 'muscle_of_leg', 'nerve_ending', 'ovary', 'pancreas', 'placenta', 'prefrontal_cortex', 'prefrontal_cortex', 'prefrontal_cortex', 'prefrontal_cortex', 'prostate_gland', 'striatum', 'striatum', 'striatum', 'striatum', 'testicle', 'testis', 'trachea', 'urine', 'urothelium']

This glycan has been reported to be dysregulated in (disease, direction, sample): [('REM_sleep_behavior_disorder', 'down', 'serum'), ('benign_breast_tumor_tissues_vs_para_carcinoma_tissues', 'up', 'breast'), ('cystic_fibrosis', 'up', 'sputum'), ('female_breast_cancer', 'up', 'breast'), ('female_breast_cancer', 'up', 'cell_line'), ('prostate_cancer', 'up', 'prostate_cancer_biopsy'), ('thyroid_gland_papillary_carcinoma', 'up', 'serum'), ('urinary_bladder_cancer', 'down', 'urine')]

That's all we can do for you at this point!

glytoucan_to_glycan

 glytoucan_to_glycan (ids, revert=False)

interconverts GlyTouCan IDs and glycans in IUPAC-condensed

Arguments:
ids (list): list of GlyTouCan IDs as strings (if using glycans instead, change ‘revert’ to True)
revert (bool): whether glycans should be mapped to GlyTouCan IDs or vice versa; default:False
Returns:
Returns list of either GlyTouCan IDs or glycans in IUPAC-condensed
glytoucan_to_glycan(['G63041RA'])
['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc']
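
The effect of the `revert` flag can be pictured as switching between a lookup table and its inverse. The sketch below is only a toy model: the table holds the single ID/glycan pair shown above, `convert` is a hypothetical name, and passing unknown entries through unchanged is an assumption of this sketch rather than documented behavior.

```python
# Toy table containing only the pair shown in the example above
ID_TO_GLYCAN = {
    "G63041RA": "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
}
GLYCAN_TO_ID = {glycan: gid for gid, glycan in ID_TO_GLYCAN.items()}

def convert(ids, revert=False):
    """Map IDs to glycans (revert=False) or glycans to IDs (revert=True)."""
    table = GLYCAN_TO_ID if revert else ID_TO_GLYCAN
    # Assumption of this sketch: entries missing from the table pass through unchanged
    return [table.get(entry, entry) for entry in ids]

glycans = convert(["G63041RA"])
round_trip = convert(glycans, revert=True)
```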

regex

for performing regular expression-like searches in glycans; particularly useful for finding complicated motifs


get_match

 get_match (pattern, glycan, return_matches=True)

finds matches for a glyco-regular expression in a glycan

Arguments:
pattern (string): glyco-regular expression in the form of “Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc”
glycan (string or networkx): glycan sequence in IUPAC-condensed or as networkx graph
return_matches (bool): whether to return True/False or return the matches as a list of strings; default:True
Returns:
Returns either a boolean (return_matches = False) or a list of matches as strings (return_matches = True)
# {} = between min and max occurrences, e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# * = zero or more occurrences, e.g., "Hex-HexNAc-([Hex|Fuc])*-HexNAc"
# + = one or more occurrences, e.g., "Hex-HexNAc-([Hex|Fuc])+-HexNAc"
# ? = zero or one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc])?-HexNAc"
# {1,} = at minimum one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc]){1,}-HexNAc"
# {,1} = at maximum one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc]){,1}-HexNAc"
# {2} = exactly two occurrences, e.g., "Hex-HexNAc-([Hex|Fuc]){2}-HexNAc"
# ^ = start of sequence, e.g., "^Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# % = middle of sequence (i.e., neither start nor end)
# $ = end of sequence, e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc$"
# ?<= = lookbehind (i.e., provided pattern must be present before rest of pattern but is not included in match), e.g., "(?<=Xyl-)Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# ?<! = negative lookbehind (i.e., provided pattern is not present before rest of pattern and is also not included in match), e.g., "(?<!Xyl-)Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# ?= = lookahead (i.e., provided pattern must be present after rest of pattern but is not included in match), e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc(?=-HexNAc)"
# ?! = negative lookahead (i.e., provided pattern is not present after rest of pattern and is not included in match), e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc(?!-HexNAc)"

# Example: extracting the sequence from the a1-6 branch of N-glycans
pattern = "r[Sia]{,1}-Monosaccharide-([dHex]){,1}-Monosaccharide(?=-Mana6-Monosaccharide)"
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)GalNAc(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
['Gal(b1-4)GlcNAc']
['GalNAc(b1-4)GlcNAc']
['Neu5Ac(a2-6)GalNAc(b1-4)GlcNAc']
['Neu5Gc(a2-6)GalNAc(b1-4)[Fuc(a1-3)]GlcNAc']
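The quantifiers and lookarounds listed above carry the same meaning as in ordinary regular expressions. As a rough analogy only, the sketch below applies Python's built-in re module to hyphen-joined toy strings; it is not the matcher used by get_match, and the toy chains and the name toy_pattern are made up for illustration.

```python
import re

# Toy, hyphen-joined residue strings (illustrative stand-ins, not real glycan notation)
chain_a = "Sia-Gal-GlcNAc-Man"
chain_b = "Gal-GlcNAc-Man"
chain_c = "Gal-GlcNAc-Glc"

# Optional leading Sia ({,1}), then Gal-GlcNAc, which must be followed by -Man.
# The lookahead (?=-Man) has to be present but is not part of the returned match.
toy_pattern = re.compile(r"(?:Sia-){,1}Gal-GlcNAc(?=-Man)")

for chain in (chain_a, chain_b, chain_c):
    hit = toy_pattern.search(chain)
    print(chain, "->", hit.group(0) if hit else None)
```

The first two chains match without the trailing Man being returned, while the third chain fails because the lookahead is not satisfied.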

For interested users, here is a selection of regular expression patterns that we find useful in our own work:

  • Lewis or sialyl-Lewis structures:
    pattern = "r[Sia]{,1}-[Gal|GalOS]{1}-([Fuc]){1}-[GlcNAc|GlcNAc6S]{1}"
  • Blood groups:
    pattern = "rFuc-([Gal|GalNAc])?-Gal-GlcNAc"
  • a1-6 branch in N-glycans:
    pattern = "r[Sia]{,1}-[Hex|HexNAc]{,1}-([dHex]){,1}-[Man|GlcNAc]{1}-([.-.|.]){,1}-Mana6(?=-Manb4-GlcNAc)"
  • b1-6 branch in O-glycans (from core 2/4/6):
    pattern = "r[Sia|dHex]{,1}-[Hex|HexNAc]{,1}-([dHex]){,1}-.b6(?=-GalNAc)"
  • b1-3 branch in O-glycans (from core 1/2):
    pattern = "r[Sia]{,1}-[.]{,1}-([dHex]){,1}-.b3(?=-GalNAc)"

get_match_batch

 get_match_batch (pattern, glycan_list, return_matches=True)

finds matches for a glyco-regular expression in a list of glycans

Arguments:
pattern (string): glyco-regular expression in the form of “Hex-HexNAc-([Hex|Fuc])*-HexNAc”
glycan_list (list of strings or networkx): list of glycan sequences in IUPAC-condensed nomenclature or as networkx graphs
return_matches (bool): whether to return the matches as a list of strings (True) or only True/False per glycan (False); default:True
Returns:
Returns either a list of booleans (return_matches = False) or a list of lists of matches as strings (return_matches = True)
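To make the two return modes concrete, here is a minimal sketch of a batch wrapper. It assumes a single-glycan matcher that returns a list of matched strings; toy_matcher is a hypothetical substring-based stand-in and match_batch_sketch is not the library's implementation.

```python
def toy_matcher(glycan):
    """Hypothetical stand-in for a single-glycan matcher: returns a list of matches."""
    target = "Fuc(a1-2)Gal"
    return [target] if target in glycan else []

def match_batch_sketch(matcher, glycan_list, return_matches=True):
    """Apply matcher to every glycan; return the matches or only True/False per glycan."""
    results = [matcher(glycan) for glycan in glycan_list]
    if return_matches:
        return results
    return [bool(matches) for matches in results]

glycans = ["Fuc(a1-2)Gal(b1-3)GalNAc", "Gal(b1-3)GalNAc"]
print(match_batch_sketch(toy_matcher, glycans))
print(match_batch_sketch(toy_matcher, glycans, return_matches=False))
```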

motif_to_regex

 motif_to_regex (motif)

tries to convert motif into a regular expression

Arguments:
motif (string): glycan in IUPAC-condensed nomenclature
Returns:
Returns regular expression if successful
motif_to_regex("Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-?)")
'Fuca3-([Galb4]){1}-GlcNAcb?'
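The output above suggests a simple textual mapping: each linkage such as (a1-3) is shortened to a3, residues are joined with hyphens, and a branch becomes a group that must occur exactly once. The following is a simplified sketch of that mapping, assuming non-nested branches only; the helper names are hypothetical and this is not the library's own conversion routine.

```python
import re

# One residue with an optional linkage, e.g. "Fuc(a1-3)" or a bare "GlcNAc"
RESIDUE = re.compile(r"([A-Za-z0-9]+)(?:\(([ab?])[\d?]-([\d?])\))?")

def chain_to_pattern(chain):
    """Turn 'Gal(b1-4)GlcNAc' into 'Galb4-GlcNAc'."""
    return "-".join(
        m.group(1) + (m.group(2) or "") + (m.group(3) or "")
        for m in RESIDUE.finditer(chain)
    )

def motif_to_pattern_sketch(motif):
    """Convert a motif with non-nested branches into glyco-regex-like text."""
    parts = []
    # Splitting on a capturing group keeps the bracketed branches in the result
    for segment in re.split(r"(\[[^\[\]]*\])", motif):
        if not segment:
            continue
        if segment.startswith("["):
            parts.append("([" + chain_to_pattern(segment[1:-1]) + "]){1}")
        else:
            parts.append(chain_to_pattern(segment))
    return "-".join(parts)

print(motif_to_pattern_sketch("Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-?)"))
```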

tokenization

helper functions to map m/z->composition, composition->structure, structure->motif, and more


string_to_labels

 string_to_labels (character_string, libr=None)

tokenizes word by indexing characters in passed library

Arguments:
character_string (string): string of characters to index
libr (dict): dict of library items
Returns:
Returns indexes of characters in library
string_to_labels(['Man','a1-3','Man','a1-6','Man'])
[None, None, None, None, None]

pad_sequence

 pad_sequence (seq, max_length, pad_label=None, libr=None)

brings all sequences to same length by adding padding token

Arguments:
seq (list): sequence to pad (from string_to_labels)
max_length (int): sequence length to pad to
pad_label (int): which padding label to use
libr (list): list of library items
Returns:
Returns padded sequence
pad_sequence(string_to_labels(['Man','a1-3','Man','a1-6','Man']), 7)
[None, None, None, None, None, 25, 25]
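The idea behind these two helpers can be sketched with a plain dictionary lookup followed by list padding. This is an illustration under assumptions: toy_libr is a made-up library, the function names are hypothetical, and choosing len(toy_libr) as the padding label is just one possible convention. A token missing from the library yields None in this sketch, which is one plausible reading of the None entries in the outputs above.

```python
def labels_sketch(tokens, libr):
    """Index each token in the library; dict.get gives None for unknown tokens."""
    return [libr.get(token) for token in tokens]

def pad_sketch(seq, max_length, pad_label):
    """Append pad_label until the sequence has max_length entries."""
    return seq + [pad_label] * max(0, max_length - len(seq))

# Made-up library mapping glycoletters to indices
toy_libr = {"Man": 0, "a1-3": 1, "a1-6": 2}

labels = labels_sketch(["Man", "a1-3", "Man", "a1-6", "Man"], toy_libr)
padded = pad_sketch(labels, 7, pad_label=len(toy_libr))
print(labels)
print(padded)
```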

stemify_glycan

 stemify_glycan (glycan, stem_lib=None, libr=None)

removes modifications from all monosaccharides in a glycan

Arguments:
glycan (string): glycan in IUPAC-condensed format
stem_lib (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
libr (dict): dictionary of form glycoletter:index; default:lib
Returns:
Returns stemmed glycan as string
stemify_glycan("Neu5Ac9Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
'Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc'
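Conceptually, stemming is a lookup of every monosaccharide in a modified-to-core dictionary, as the stem_lib argument describes. A minimal sketch of that idea follows, assuming a tiny hand-written dictionary (toy_stem_lib) in place of the one the library builds from lib; stemify_sketch is a hypothetical name.

```python
import re

# Hypothetical, hand-written stem library (modified -> core monosaccharide)
toy_stem_lib = {"Neu5Ac9Ac": "Neu5Ac", "Gal6S": "Gal"}

# A monosaccharide name is a run of letters/digits directly before "(" or at the end
MONO = re.compile(r"[A-Za-z0-9]+(?=\(|$)")

def stemify_sketch(glycan, stem_lib):
    """Replace each monosaccharide found in stem_lib by its core form."""
    return MONO.sub(lambda m: stem_lib.get(m.group(0), m.group(0)), glycan)

print(stemify_sketch("Neu5Ac9Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc", toy_stem_lib))
```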

stemify_dataset

 stemify_dataset (df, stem_lib=None, libr=None, glycan_col_name='glycan',
                  rarity_filter=1)

stemifies all glycans in a dataset by removing monosaccharide modifications

Arguments:
df (dataframe): dataframe with glycans in IUPAC-condensed format in column glycan_col_name
stem_lib (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
libr (dict): dictionary of form glycoletter:index; default:lib
glycan_col_name (string): column name under which glycans are stored; default:glycan
rarity_filter (int): how often monosaccharide modification has to occur to not get removed; default:1
Returns:
Returns df with glycans stemified

mask_rare_glycoletters

 mask_rare_glycoletters (glycans, thresh_monosaccharides=None,
                         thresh_linkages=None)

masks rare monosaccharides and linkages in a list of glycans

Arguments:
glycans (list): list of glycans in IUPAC-condensed form
thresh_monosaccharides (int): threshold-value for monosaccharides seen as “rare”; default:(0.001*len(glycans))
thresh_linkages (int): threshold-value for linkages seen as “rare”; default:(0.03*len(glycans))
Returns:
Returns list of glycans in IUPAC-condensed with masked rare monosaccharides and linkages
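The masking step can be pictured as counting every monosaccharide and linkage across the list and replacing those below a threshold with a generic placeholder. The sketch below is an illustration under assumptions: the placeholders "Monosaccharide" and "?1-?" and the explicit integer thresholds are choices made here, not confirmed details of the library.

```python
import re
from collections import Counter

MONO = re.compile(r"[A-Za-z0-9]+(?=\(|$)")  # monosaccharide names
LINK = re.compile(r"(?<=\()[^()]+(?=\))")    # linkages inside parentheses

def mask_rare_sketch(glycans, thresh_mono, thresh_link):
    """Mask monosaccharides/linkages seen fewer than the given number of times."""
    mono_counts = Counter(m for g in glycans for m in MONO.findall(g))
    link_counts = Counter(l for g in glycans for l in LINK.findall(g))

    def mask(glycan):
        glycan = MONO.sub(
            lambda m: m.group(0) if mono_counts[m.group(0)] >= thresh_mono
            else "Monosaccharide", glycan)
        return LINK.sub(
            lambda m: m.group(0) if link_counts[m.group(0)] >= thresh_link
            else "?1-?", glycan)

    return [mask(g) for g in glycans]

toy_glycans = ["Gal(b1-4)GlcNAc", "Gal(b1-4)GlcNAc", "Tyv(a1-3)GlcNAc"]
print(mask_rare_sketch(toy_glycans, thresh_mono=2, thresh_link=2))
```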

mz_to_composition

 mz_to_composition (mz_value, mode='negative', mass_value='monoisotopic',
                    reduced=False, sample_prep='underivatized',
                    mass_tolerance=0.5, kingdom='Animalia',
                    glycan_class='N', df_use=None, filter_out=set())

Maps an m/z value to a matching monosaccharide composition within SugarBase

Arguments:
mz_value (float): the actual m/z value from mass spectrometry
mode (string): whether mz_value comes from MS in ‘positive’ or ‘negative’ mode; default:‘negative’
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
reduced (bool): whether glycans are reduced at reducing end; default:False
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
mass_tolerance (float): how much deviation to tolerate for a match; default:0.5
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
glycan_class (string): which glycan class does the m/z value stem from, ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans; default:‘N’
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan
filter_out (set): set of monosaccharide types to ignore during composition finding; default: empty set
Returns:
Returns a list of matching compositions in dict form
mz_to_composition(665.4, glycan_class='O', filter_out={'Kdn', 'P', 'HexA', 'Pen', 'HexN', 'Me', 'PCho', 'PEtN'},
                    reduced = True)
[{'HexNAc': 2, 'Hex': 2, 'Neu5Ac': 2}]
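The example above can be sanity-checked by hand. The sketch below sums standard monoisotopic residue masses, adds one water, adds H2 for the reduced reducing end, and then computes the m/z of a doubly deprotonated ion. The charge state of 2 is an assumption made here for the check (it is not stated in the documentation), and mz_negative_sketch is a hypothetical helper.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 6 decimals)
RESIDUE_MASS = {"Hex": 162.052824, "HexNAc": 203.079373, "Neu5Ac": 291.095417}
WATER = 18.010565   # one water per free glycan
H2 = 2.015650       # mass added by reducing the reducing end
PROTON = 1.007276

def mz_negative_sketch(composition, charge, reduced=False):
    """m/z of the [M - nH]n- ion for a composition given as monosaccharide:count."""
    neutral = sum(RESIDUE_MASS[k] * n for k, n in composition.items()) + WATER
    if reduced:
        neutral += H2
    return (neutral - charge * PROTON) / charge

mz = mz_negative_sketch({"HexNAc": 2, "Hex": 2, "Neu5Ac": 2}, charge=2, reduced=True)
print(round(mz, 2))  # 665.23, within the 0.5 tolerance of the queried 665.4
```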

match_composition_relaxed

 match_composition_relaxed (composition, glycan_class='N',
                            kingdom='Animalia', df_use=None,
                            reducing_end=None)

Given a coarse-grained monosaccharide composition (Hex, HexNAc, etc.), it returns all corresponding glycans

Arguments:
composition (dict): a dictionary indicating the composition to match (for example {“dHex”: 1, “Hex”: 1, “HexNAc”: 1})
glycan_class (string): which glycan class the composition stems from: ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans; default:‘N’
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
df_use (dataframe): glycan dataframe for searching glycan structures; default:df_glycan
Returns:
Returns list of glycans matching composition in IUPAC-condensed
match_composition_relaxed({"Hex":3, "HexNAc":2, "dHex":1}, glycan_class = 'O')
['Fuc(a1-2)[Gal(a1-3)]Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
 'Gal(a1-3)GalNAc(a1-3)[Fuc(a1-2)]Gal(b1-3)Gal(b1-3)GalNAc',
 'Man(a1-6)Glc(a1-4)GlcNAc(b1-4)[Fuc(a1-2)]Gal(b1-3)GalNAc',
 'Gal(?1-?)Gal(b1-?)[Fuc(a1-?)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
 'Gal(b1-3)[Gal(b1-4)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)]GalNAc',
 'Gal(b1-4)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-3)Gal(b1-3)GalNAc',
 'Gal(b1-4)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
 'Fuc(a1-2)[Gal(a1-3)]Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc',
 'Fuc(a1-2)Gal(b1-?)GlcNAc(b1-3)Gal(b1-3)[Gal(b1-6)]GalNAc',
 'Fuc(a1-2)[Gal(a1-3)]Gal(b1-3)GlcNAc(b1-3)Gal(b1-3)GalNAc',
 'Fuc(a1-2)Gal(b1-3)Gal(b1-3)GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
 'Fuc(a1-2)Gal(b1-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc',
 'Gal(b1-4)Gal(b1-3)[Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-6)]GalNAc',
 'Gal(b1-2)Gal(a1-3)[Fuc(a1-2)]Gal(b1-3)[GlcNAc(b1-6)]GalNAc',
 'Fuc(a1-2)Gal(a1-3)Gal(a1-4)Gal(b1-3)[GlcNAc(b1-6)]GalNAc',
 'Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)[Gal(b1-3)]Gal(b1-3)GalNAc',
 'Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-?)Gal(b1-6)[Gal(b1-3)]GalNAc',
 'Fuc(a1-2)Gal(?1-?)Gal(b1-?)GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
 'Fuc(a1-2)Gal(b1-4)GlcNAc(b1-6)[Gal(?1-?)Gal(b1-3)]GalNAc']

condense_composition_matching

 condense_composition_matching (matched_composition)

Given a list of glycans matching a composition, find the minimum number of glycans characterizing this set

Arguments:
matched_composition (list): list of glycans matching to a composition
Returns:
Returns minimal list of glycans that match a composition
match_comp = match_composition_relaxed({'Hex':1, 'HexNAc':1, 'Neu5Ac':1}, glycan_class = 'O')
print(match_comp)
condense_composition_matching(match_comp)
['Neu5Ac(a2-3)Gal(b1-3)GalNAc', 'Gal(b1-3)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-?)Gal(b1-3)GalNAc', '{Neu5Ac(a2-?)}Gal(b1-3)GalNAc', 'Neu5Ac(a2-3)[GalNAc(b1-4)]Gal', 'Neu5Ac(a2-3)Gal(b1-4)GalNAc', 'Neu5Ac(a2-6)Gal(b1-3)GalNAc', 'Gal(a1-3)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-?)Hex(?1-?)GalNAc', 'Gal(?1-3)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-3)Gal(?1-?)GalNAc', 'Neu5Ac(a2-6)Gal(a1-3)GalNAc', 'Neu5Ac(a2-?)Gal(?1-3)GalNAc']
['Neu5Ac(a2-3)Gal(b1-3)GalNAc',
 'Gal(b1-3)[Neu5Ac(a2-6)]GalNAc',
 'Gal(a1-3)[Neu5Ac(a2-6)]GalNAc',
 '{Neu5Ac(a2-?)}Gal(b1-3)GalNAc',
 'Neu5Ac(a2-3)[GalNAc(b1-4)]Gal',
 'Neu5Ac(a2-3)Gal(b1-4)GalNAc',
 'Neu5Ac(a2-6)Gal(b1-3)GalNAc',
 'Neu5Ac(a2-?)Hex(?1-?)GalNAc',
 'Neu5Ac(a2-3)Gal(?1-?)GalNAc',
 'Neu5Ac(a2-6)Gal(a1-3)GalNAc']

mz_to_structures

 mz_to_structures (mz_list, glycan_class, kingdom='Animalia',
                   abundances=None, mode='negative',
                   mass_value='monoisotopic', sample_prep='underivatized',
                   mass_tolerance=0.5, reduced=False, df_use=None,
                   filter_out=set(), verbose=False)

wrapper function to map precursor masses to structures, condense them, and match them with relative intensities

Arguments:
mz_list (list): list of precursor masses
glycan_class (string): which glycan class does the m/z value stem from, ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
abundances (dataframe): every row one composition (matching mz_list in order), every column one sample; default:pd.DataFrame([range(len(mz_list))]*2).T
mode (string): whether mz_value comes from MS in ‘positive’ or ‘negative’ mode; default:‘negative’
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
mass_tolerance (float): how much deviation to tolerate for a match; default:0.5
reduced (bool): whether glycans are reduced at reducing end; default:False
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan
filter_out (set): set of monosaccharide types to ignore during composition finding; default: empty set
verbose (bool): whether to print any non-matching compositions; default:False
Returns:
Returns dataframe of (matched structures) x (relative intensities)
mz_to_structures([674.29], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
   glycan                         abundance
0  Neu5Ac(a2-3)Gal(b1-3)GalNAc    0
1  Gal(b1-3)[Neu5Ac(a2-6)]GalNAc  0
2  Gal(a1-3)[Neu5Ac(a2-6)]GalNAc  0
3  {Neu5Ac(a2-?)}Gal(b1-3)GalNAc  0
4  Neu5Ac(a2-3)[GalNAc(b1-4)]Gal  0
5  Neu5Ac(a2-3)Gal(b1-4)GalNAc    0
6  Neu5Ac(a2-6)Gal(b1-3)GalNAc    0
7  Neu5Ac(a2-?)Hex(?1-?)GalNAc    0
8  Neu5Ac(a2-3)Gal(?1-?)GalNAc    0
9  Neu5Ac(a2-6)Gal(a1-3)GalNAc    0

compositions_to_structures

 compositions_to_structures (composition_list, glycan_class='N',
                             kingdom='Animalia', abundances=None,
                             df_use=None, verbose=False)

wrapper function to map compositions to structures, condense them, and match them with relative intensities

Arguments:
composition_list (list): list of composition dictionaries of the form {‘Hex’: 1, ‘HexNAc’: 1}
glycan_class (string): which glycan class the compositions stem from: ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans; default:‘N’
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’
abundances (dataframe): every row one composition (matching composition_list in order), every column one sample;default:pd.DataFrame([range(len(composition_list))]*2).T
df_use (dataframe): glycan dataframe for searching glycan structures; default:df_glycan
verbose (bool): whether to print any non-matching compositions; default:False
Returns:
Returns dataframe of (matched structures) x (relative intensities)
compositions_to_structures([{'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1}], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
   glycan                                     abundance
0  Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc  0
1  Gal(b1-3)[Neu5Ac(a2-8)Neu5Ac(a2-6)]GalNAc  0
2  Neu5Ac(a2-8)Neu5Ac(a2-6)[Gal(b1-3)]GalNAc  0
3  Neu5Ac(a2-3)[Neu5Ac(a2-6)]Gal(b1-3)GalNAc  0
compositions_to_structures(["H1N1A2"], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
   glycan                                     abundance
0  Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc  0
1  Gal(b1-3)[Neu5Ac(a2-8)Neu5Ac(a2-6)]GalNAc  0
2  Neu5Ac(a2-8)Neu5Ac(a2-6)[Gal(b1-3)]GalNAc  0
3  Neu5Ac(a2-3)[Neu5Ac(a2-6)]Gal(b1-3)GalNAc  0

structure_to_basic

 structure_to_basic (glycan)

converts a monosaccharide- and linkage-defined glycan structure to the base topology

Arguments:
glycan (string): glycan in IUPAC-condensed nomenclature
Returns:
Returns the glycan topology as a string
structure_to_basic("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
'Neu5Ac(a2-?)HexOS(?1-?)[Neu5Ac(a2-?)]HexNAc'

glycan_to_composition

 glycan_to_composition (glycan, stem_libr=None)

maps glycan to its composition

Arguments:
glycan (string): glycan in IUPAC-condensed format
stem_libr (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
Returns:
Returns a dictionary of form “Monosaccharide” : count
glycan_to_composition("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
{'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1, 'S': 1}
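The mapping from structure to composition can be sketched as: extract each monosaccharide, split off a trailing modification, map the core to its coarse class, and count. The class table and the restriction to S, P, and Me suffixes below are simplifying assumptions for illustration; composition_sketch is a hypothetical name, not the library routine.

```python
import re
from collections import Counter

# Hand-written toy mapping from monosaccharide to coarse class
TOY_CLASS = {
    "Gal": "Hex", "Glc": "Hex", "Man": "Hex",
    "GalNAc": "HexNAc", "GlcNAc": "HexNAc",
    "Fuc": "dHex", "Neu5Ac": "Neu5Ac",
}
MONO = re.compile(r"[A-Za-z0-9]+(?=\(|$)")       # monosaccharide names
MODIFIED = re.compile(r"(.+?)\d(S|P|Me)")        # e.g. "Gal6S" -> ("Gal", "S")

def composition_sketch(glycan):
    """Count coarse monosaccharide classes and simple modifications in a glycan."""
    counts = Counter()
    for mono in MONO.findall(glycan):
        modified = MODIFIED.fullmatch(mono)
        if modified:
            mono = modified.group(1)
            counts[modified.group(2)] += 1
        counts[TOY_CLASS[mono]] += 1
    return dict(counts)

print(composition_sketch("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc"))
```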

glycan_to_mass

 glycan_to_mass (glycan, mass_value='monoisotopic',
                 sample_prep='underivatized', stem_libr=None)

given a glycan, calculates its theoretical mass; only allowed extra-modifications are methylation, sulfation, phosphorylation

Arguments:
glycan (string): glycan in IUPAC-condensed format
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
stem_libr (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib
Returns:
Returns the theoretical mass of input glycan
glycan_to_mass("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
1045.2903546

composition_to_mass

 composition_to_mass (dict_comp, mass_value='monoisotopic',
                      sample_prep='underivatized')

given a composition, calculates its theoretical mass; only allowed extra-modifications are methylation, sulfation, phosphorylation

Arguments:
dict_comp (dict): composition in form monosaccharide:count
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’
Returns:
Returns the theoretical mass of input composition
composition_to_mass({'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1, 'S': 1})
1045.2903546
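For an underivatized glycan, the value above is consistent with summing monoisotopic residue masses and adding one water molecule. The following sketch reproduces it approximately; the mass table holds standard values rounded to six decimals (sulfation counted as SO3), so the last digits can differ slightly from the library output, and mass_sketch is a hypothetical helper.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 6 decimals)
MONOISOTOPIC = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "Neu5Ac": 291.095417,
    "S": 79.956815,  # sulfation, counted as SO3
}
WATER = 18.010565

def mass_sketch(composition):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(MONOISOTOPIC[k] * n for k, n in composition.items()) + WATER

mass = mass_sketch({"Neu5Ac": 2, "Hex": 1, "HexNAc": 1, "S": 1})
print(round(mass, 4))
```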

get_unique_topologies

 get_unique_topologies (composition, glycan_type, df_use=None,
                        universal_replacers={}, taxonomy_rank='Kingdom',
                        taxonomy_value='Animalia')

given a composition, retrieves all observed and unique base topologies

Arguments:
composition (dict): composition in form monosaccharide:count
glycan_type (string): which glycan class to search, ‘N’, ‘O’, ‘lipid’, ‘free’, or ‘repeat’
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan
universal_replacers (dictionary): dictionary of form base monosaccharide : specific monosaccharide
taxonomy_rank (string): at which taxonomic rank to filter; default: Kingdom
taxonomy_value (string): which value to filter at taxonomy_rank; default: Animalia
Returns:
Returns a list of observed base topologies for the given composition
get_unique_topologies({'HexNAc':2, 'Hex':1}, 'O', universal_replacers = {'dHex':'Fuc'})
['Hex(?1-?)HexNAc(?1-?)HexNAc',
 'Hex(?1-?)[HexNAc(?1-?)]HexNAc',
 'HexNAc(?1-?)Hex(?1-?)HexNAc']