"Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)][GlcNAc(b1-4)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
GlycoDraw(= "GlcNAc(b1-?)Man") highlight_motif
motif
The motif module contains many functions to process glycans in various ways and to use this processing to analyze glycans via curated motifs, graph features, and sequence features. It contains the following modules:
draw: contains the GlycoDraw function to draw glycans in SNFG style
analysis: contains functions for downstream analyses of important glycan motifs etc.
annotate: contains functions to extract curated motifs, graph features, and sequence features from glycan sequences
graph: is used to convert glycan sequences to graphs and contains helper functions to search for motifs, check whether two sequences describe the same glycan, etc.
processing: contains functions to process IUPAC-condensed glycan sequences, as well as conversion functions to convert other nomenclatures into IUPAC-condensed
regex: contains functionality for performing powerful regular expression-like searches on glycans; get_match is the user-facing function
query: is used to interact with the databases contained in glycowork, delivering insights for sequences of interest
tokenization: has helper functions to map m/z -> composition, composition -> structure, structure -> motif, and more
draw
drawing glycans in SNFG style
GlycoDraw
GlycoDraw (draw_this, vertical=False, compact=False, show_linkage=True, dim=50, highlight_motif=None, highlight_termini_list=[], repeat=None, repeat_range=None, draw_method=None, filepath=None, suppress=False, per_residue=[])
*Draws a glycan structure based on the provided input.
Arguments: |
---|
draw_this (string): The glycan structure or motif to be drawn. |
vertical (bool, optional): Set to True to draw the structure vertically. Default: False. |
compact (bool, optional): Set to True to draw the structure in a compact form. Default: False. |
show_linkage (bool, optional): Set to False to hide the linkage information. Default: True. |
dim (int, optional): The dimension (size) of the individual sugar units in the structure. Default: 50. |
highlight_motif (string, optional): Glycan motif to highlight within the parent structure. |
highlight_termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’) |
repeat (bool |
repeat_range (list of 2 int): List of index integers for the first and last main-chain monosaccharide in repeating unit. Monosaccharides are numbered starting from 0 (invisible placeholder = 0 in case of structure terminating in a linkage) at the reducing end. |
draw_method (string, optional): Specify ‘chem2d’ or ‘chem3d’ to draw chemical structures; default:None (SNFG figure) |
filepath (string, optional): The path to the output file to save as SVG or PDF when drawing SNFG/chem2d figures or PDB when generating 3D conformers. Default: None. |
suppress (bool, optional): Whether to suppress the visual display of drawings into the console; default:False |
per_residue (list, optional): list of floats (order should be the same as the monosaccharides in glycan string) to quantitatively highlight monosaccharides.* |
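To illustrate how per_residue values line up with a glycan string, the following minimal sketch lists monosaccharides in the order they appear in an IUPAC-condensed sequence and pairs them with scores. The helper name monosaccharides_in_order and the score values are assumptions made for illustration; they are not part of glycowork, whose own parser may tokenize sequences differently.

```python
import re

def monosaccharides_in_order(glycan):
    """Return monosaccharide names in left-to-right string order.

    Hypothetical helper (not a glycowork function): it drops branch
    brackets and splits on linkages such as (a2-3) or (b1-?).
    """
    without_brackets = re.sub(r"[\[\]]", "", glycan)
    parts = re.split(r"\([^)]*\)", without_brackets)
    return [p for p in parts if p]

glycan = "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc"
residues = monosaccharides_in_order(glycan)

# Hypothetical per-residue scores: one float per monosaccharide, same order
scores = [0.9, 0.4, 0.7, 0.1]
paired = list(zip(residues, scores))
print(paired)
```

A list built this way has one entry per monosaccharide, which is the shape the per_residue description asks for.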
annotate_figure
annotate_figure (svg_input, scale_range=(25, 80), compact=False, glycan_size='medium', filepath='', scale_by_DE_res=None, x_thresh=1, y_thresh=0.05, x_metric='Log2FC')
*Modify matplotlib svg figure to replace text labels with glycan figures
Arguments: |
---|
svg_input (string): absolute path including full filename for input svg figure |
scale_range (tuple): tuple of two integers defining min/max glycan dim; default:(25,80) |
compact (bool): if True, draw compact glycan figures; default:False |
glycan_size (string): modify glycan size; default:‘medium’; options are ‘small’, ‘medium’, ‘large’ |
filepath (string): absolute path including full filename allows for saving the plot |
scale_by_DE_res (df): result table from motif_analysis.get_differential_expression. Include to scale glycan figure size by -10logp |
x_thresh (float): absolute x metric threshold for datapoints included for scaling, set to match get_differential_expression; default:1.0 |
y_thresh (float): corr p threshold for datapoints included for scaling, set to match get_differential_expression; default:0.05 |
x_metric (string): x-axis metric; default:‘Log2FC’; options are ‘Log2FC’, ‘Effect size’ |
Returns: |
---|
Modified figure svg code* |
plot_glycans_excel
plot_glycans_excel (df, folder_filepath, glycan_col_num=0, scaling_factor=0.2, compact=False)
*plots SNFG images of glycans into new column in df and saves df as Excel file
Arguments: |
---|
df (dataframe): dataframe containing glycan sequences [alternative: filepath to .csv or .xlsx] |
folder_filepath (string): full filepath to the folder you want to save the output to |
glycan_col_num (int): index of the column containing glycan sequences; default:0 (first column) |
scaling_factor (float): how large the glycans should be; default:0.2 |
compact (bool, optional): Set to True to draw the structures in a compact form. Default: False. |
Returns: |
---|
Saves the dataframe with glycan images as output.xlsx into folder_filepath* |
analysis
downstream analyses of important glycan motifs
get_pvals_motifs
get_pvals_motifs (df, glycan_col_name='glycan', label_col_name='target', zscores=True, thresh=1.645, sorting=True, feature_set=['exhaustive'], multiple_samples=False, motifs=None, custom_motifs=[])
*returns enriched motifs based on label data or predicted data
Arguments: |
---|
df (dataframe): dataframe containing glycan sequences and labels [alternative: filepath to .csv or .xlsx] |
glycan_col_name (string): column name for glycan sequences; arbitrary if multiple_samples = True; default:‘glycan’ |
label_col_name (string): column name for labels; arbitrary if multiple_samples = True; default:‘target’ |
zscores (bool): whether data are presented as z-scores or not, will be z-score transformed if False; default:True |
thresh (float): threshold value to separate positive/negative; default is 1.645 for Z-scores |
sorting (bool): whether p-value dataframe should be sorted ascendingly; default: True |
feature_set (list): which feature set to use for annotations, add more to list to expand; default per the signature is [‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features), |
multiple_samples (bool): set to True if you have multiple samples (rows) with glycan information (columns); default:False |
motifs (dataframe): can be used to pass a modified motif_list to the function; default:None |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
Returns: |
---|
Returns dataframe with p-values, corrected p-values, and Cohen’s d as effect size for every glycan motif* |
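As a rough illustration of two ideas in the argument list, splitting glycans into positive and negative sets at thresh and summarizing a motif with Cohen's d, here is a minimal sketch. It uses the label values from this page's example and hand-tallied Man counts for the five example glycans. The pooled-standard-deviation formula and the strict `>` comparison are assumptions; the sketch is not expected to reproduce the tabulated effect sizes, since the library's exact procedure is not shown here.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (one common definition)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Label values from the example on this page, treated as z-scores
labels = [3.234, 2.423, 0.733, 3.102, 0.108]
thresh = 1.645

# Assumption: values strictly above thresh count as positive
positive = [i for i, z in enumerate(labels) if z > thresh]
negative = [i for i, z in enumerate(labels) if z <= thresh]

# Number of Man residues in each example glycan, tallied by hand
motif_counts = [3, 6, 0, 4, 0]
d = cohens_d([motif_counts[i] for i in positive],
             [motif_counts[i] for i in negative])
print(positive, negative, round(d, 3))
```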
import pandas as pd
from glycowork.motif.analysis import get_pvals_motifs

# five example glycans with one binding value each
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcOPN(b1-6)GlcOPN',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Glc(b1-3)Glc(b1-3)Glc']
label = [3.234, 2.423, 0.733, 3.102, 0.108]
test_df = pd.DataFrame({'glycan':glycans, 'binding':label})
print("Glyco-Motif enrichment p-value test")
# keep the first ten rows of the enrichment result
out = get_pvals_motifs(test_df, 'glycan', 'binding').iloc[:10,:]
Glyco-Motif enrichment p-value test
index | motif | pval | corr_pval | effect_size |
---|---|---|---|---|
4 | GlcNAc | 0.038120 | 0.205849 | 1.530905 |
8 | Man | 0.054356 | 0.234990 | 1.390253 |
24 | Man(a1-?)Man | 0.060923 | 0.234990 | 1.308333 |
22 | Man(a1-3)Man | 0.034212 | 0.205849 | 1.196586 |
14 | GlcNAc(b1-4)GlcNAc | 0.019543 | 0.175885 | 1.168815 |
23 | Man(a1-6)Man | 0.019543 | 0.175885 | 1.168815 |
25 | Man(b1-4)GlcNAc | 0.019543 | 0.175885 | 1.168815 |
7 | Kdo | 0.328790 | 0.479672 | -0.811679 |
2 | Glc | 0.644180 | 0.668956 | -0.811679 |
21 | Man(a1-2)Man | 0.177461 | 0.479672 | 0.772320 |
get_representative_substructures
get_representative_substructures (enrichment_df)
*builds minimal glycans that contain enriched motifs from get_pvals_motifs
Arguments: |
---|
enrichment_df (dataframe): output from get_pvals_motifs |
Returns: |
---|
Returns up to 10 minimal glycans in a list* |
get_heatmap
get_heatmap (df, motifs=False, feature_set=['known'], transform='', datatype='response', rarity_filter=0.05, filepath='', index_col='glycan', custom_motifs=[], return_plot=False, show_all=False, **kwargs)
*clusters samples based on glycan data (for instance glycan binding etc.)
Arguments: |
---|
df (dataframe): dataframe with glycan data, rows are samples and columns are glycans [alternative: filepath to .csv or .xlsx] |
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False |
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features), |
transform (string): whether to transform the data before plotting, currently the only option is “CLR”, recommended for glycomics data; default: no transformation |
datatype (string): whether df comes from a dataset with quantitative variable (‘response’) or from presence_to_matrix (‘presence’) |
rarity_filter (float): proportion of samples that need to have a non-zero value for a variable to be included; default:0.05 |
filepath (string): absolute path including full filename allows for saving the plot |
index_col (string): default column to convert to dataframe index; default:‘glycan’ |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
return_plot (bool): whether to return the plot object for external saving; default:False |
show_all (bool): whether to plot all ticklabels, no matter how many there are (this might cause visual overlaps); default:False |
**kwargs: keyword arguments that are directly passed on to seaborn clustermap |
Returns: |
---|
Prints clustermap* |
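The transform argument mentions CLR. For readers unfamiliar with it, the sketch below shows the textbook centered log-ratio transform for a single sample: take logarithms and subtract their mean. This is only the standard definition, not glycowork's implementation, and it assumes strictly positive abundances (zeros would need to be handled beforehand).

```python
import math

def clr(abundances):
    """Centered log-ratio transform of one sample's strictly positive values."""
    logs = [math.log(x) for x in abundances]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]

# Hypothetical relative abundances for five glycans in one sample
sample = [3.234, 2.423, 0.733, 3.102, 0.108]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
```

By construction the transformed values of a sample sum to zero, and the ranking of glycans within the sample is preserved.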
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Glc(b1-3)Glc(b1-3)Glc']
label = [3.234, 2.423, 0.733, 3.102, 0.108]
label2 = [0.134, 0.345, 1.15, 0.233, 2.981]
label3 = [0.334, 0.245, 1.55, 0.133, 2.581]
test_df = pd.DataFrame([label, label2, label3], columns = glycans)
get_heatmap(test_df, motifs = True, feature_set = ['known', 'exhaustive'])
plot_embeddings
plot_embeddings (glycans, emb=None, label_list=None, shape_feature=None, filepath='', alpha=0.8, palette='colorblind', **kwargs)
*plots glycan representations for a list of glycans
Arguments: |
---|
glycans (list): list of IUPAC-condensed glycan sequences as strings |
emb (dictionary): stored glycan representations; default takes them from trained species-level SweetNet model |
label_list (list): list of same length as glycans if coloring of the plot is desired |
shape_feature (string): monosaccharide/bond used to display alternative shapes for dots on the plot |
filepath (string): absolute path including full filename allows for saving the plot |
alpha (float): transparency of points in plot; default:0.8 |
palette (string): color palette to color different classes; default:‘colorblind’ |
**kwargs: keyword arguments that are directly passed on to matplotlib* |
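from glycowork.glycan_data.loader import df_species

# restrict the species-level glycan table to the order Fabales and color by family
df_fabales = df_species[df_species.Order == 'Fabales'].reset_index(drop = True)
plot_embeddings(df_fabales.glycan.values.tolist(), label_list = df_fabales.Family.values.tolist())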
characterize_monosaccharide
characterize_monosaccharide (sugar, df=None, mode='sugar', glycan_col_name='glycan', rank=None, focus=None, modifications=False, filepath='', thresh=10)
*for a given monosaccharide/linkage, return typical neighboring linkage/monosaccharide
Arguments: |
---|
sugar (string): monosaccharide or linkage |
df (dataframe): dataframe to use for analysis; default:df_species |
mode (string): either ‘sugar’ (connected monosaccharides), ‘bond’ (monosaccharides making a provided linkage), or ‘sugarbond’ (linkages that a provided monosaccharides makes); default:‘sugar’ |
glycan_col_name (string): column name under which glycans can be found; default:‘glycan’ |
rank (string): add column name as string if you want to filter for a group |
focus (string): add row value as string if you want to filter for a group |
modifications (bool): set to True if you want to consider modified versions of a monosaccharide; default:False |
filepath (string): absolute path including full filename allows for saving the plot |
thresh (int): threshold count of when to include motifs in plot; default:10 occurrences |
Returns: |
---|
Plots modification distribution and typical neighboring bond/monosaccharide* |
characterize_monosaccharide('D-Rha', rank = 'Kingdom', focus = 'Bacteria', modifications = True)
get_differential_expression
get_differential_expression (df, group1, group2, motifs=False, feature_set=['exhaustive', 'known'], paired=False, impute=True, sets=False, set_thresh=0.9, effect_size_variance=False, min_samples=0.1, grouped_BH=False, custom_motifs=[], transform=None, gamma=0.1, custom_scale=0, glycoproteomics=False, level='peptide', monte_carlo=False)
*Calculates differentially expressed glycans or motifs from glycomics data
Arguments: |
---|
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx] |
group1 (list): list of column indices or names for the first group of samples, usually the control |
group2 (list): list of column indices or names for the second group of samples |
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False |
feature_set (list): which feature set to use for annotations, add more to list to expand; default per the signature is [‘exhaustive’, ‘known’]; options are: ‘known’ (hand-crafted glycan features), |
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False |
impute (bool): replaces zeroes with a Random Forest based model; default:True |
sets (bool): whether to identify clusters of highly correlated glycans/motifs to test for differential expression; default:False |
set_thresh (float): correlation value used as a threshold for clusters; only used when sets=True; default:0.9 |
effect_size_variance (bool): whether effect size variance should also be calculated/estimated; default:False |
min_samples (float): Percent of the samples that need to have non-zero values for glycan to be kept; default: 10% |
grouped_BH (bool): whether to perform two-stage adaptive Benjamini-Hochberg as a grouped multiple testing correction; will SIGNIFICANTLY increase runtime; default:False |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred |
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1 |
custom_scale (float or dict): Ratio of total signal in group2/group1 for an informed scale model (or group_idx: mean(group)/min(mean(groups)) signal dict for multivariate) |
glycoproteomics (bool): whether the analyzed data in df comes from a glycoproteomics experiment; default:False |
level (string; only relevant if glycoproteomics=True): whether to analyze glycoform differential expression at the level of ‘peptide’ or ‘protein’; default:‘peptide’ |
monte_carlo (bool): whether to account for technical variation via Monte Carlo simulations; will be slower and much more conservative; default:False |
Returns: |
---|
Returns a dataframe with: |
(i) Differentially expressed glycans/motifs/sets |
(ii) Their mean abundance across all samples in group1 + group2 |
(iii) Log2-transformed fold change of group2 vs group1 (i.e., negative = lower in group2) |
(iv) Uncorrected p-values (Welch’s t-test) for difference in mean |
(v) Corrected p-values (Welch’s t-test with two-stage Benjamini-Hochberg correction) for difference in mean |
(vi) Significance: True/False of whether the corrected p-value lies below the sample size-appropriate significance threshold |
(vii) Corrected p-values (Levene’s test for equality of variances with Benjamini-Hochberg correction) for difference in variance |
(viii) Effect size as Cohen’s d (sets=False) or Mahalanobis distance (sets=True) |
(ix) Corrected p-values of equivalence test to test whether means are significantly equivalent; only done for p-values > 0.05 from (iv) |
(x) [only if effect_size_variance=True] Effect size variance* |
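Two of the returned quantities can be sketched compactly: the Log2FC sign convention from (iii) and a multiple-testing correction as in (v). The code below is a minimal illustration, not glycowork's implementation. It uses the plain one-stage Benjamini-Hochberg procedure rather than the two-stage variant named above, computes the fold change on raw means, and the function names are hypothetical.

```python
import math

def log2_fold_change(group1, group2):
    """Log2 of mean(group2) / mean(group1); negative means lower in group2."""
    mean1 = sum(group1) / len(group1)
    mean2 = sum(group2) / len(group2)
    return math.log2(mean2 / mean1)

def benjamini_hochberg(pvals):
    """One-stage Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical abundances: the glycan is halved in group2
print(log2_fold_change([2.0, 2.0], [1.0, 1.0]))
# Hypothetical uncorrected p-values
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```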
from glycowork.glycan_data.loader import glycomics_data_loader

test_df = glycomics_data_loader.human_skin_O_PMC5871710_BCC
# odd-numbered columns form group1, even-numbered columns form group2
res = get_differential_expression(test_df, group1 = [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39],
                                  group2 = [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40], motifs = True, paired = True)
res
You're working with an alpha of 0.044390023979542614 that has been adjusted for your sample size of 40.
index | Glycan | Mean abundance | Log2FC | p-val | corr p-val | significant | corr Levene p-val | Effect size | Equivalence p-val |
---|---|---|---|---|---|---|---|---|---|
1 | GalOS(b1-3)GalNAc | 0.159900 | -0.949766 | 0.000471 | 0.003768 | True | 0.935976 | -0.942071 | 1.000000 |
4 | Terminal_LacNAc_type2 | 2.328696 | 0.500929 | 0.001587 | 0.003801 | True | 0.690077 | 0.823108 | 1.000000 |
10 | Neu5Ac(a2-3)Gal | 7.956860 | 0.276936 | 0.001957 | 0.003801 | True | 0.935976 | 0.802525 | 1.000000 |
3 | GlcNAc6S(b1-6)GalNAc | 1.046247 | 0.922369 | 0.002160 | 0.003801 | True | 0.935976 | 0.792804 | 1.000000 |
2 | H_type2 | 0.247156 | -0.701400 | 0.002376 | 0.003801 | True | 0.935976 | -0.783442 | 1.000000 |
5 | Terminal_LacNAc_type2 | 2.440640 | -0.475144 | 0.009863 | 0.013151 | True | 0.935976 | -0.641129 | 1.000000 |
13 | Neu5Ac | 12.726196 | 0.194904 | 0.022240 | 0.025417 | True | 0.935976 | 0.556605 | 1.000000 |
0 | Neu5Ac(a2-8)Neu5Ac | 0.038663 | -0.588491 | 0.033743 | 0.033743 | True | 0.690077 | -0.511708 | 1.000000 |
8 | Oglycan_core1 | 3.790085 | 0.186020 | 0.042067 | 0.042067 | True | 0.935976 | 0.487392 | 1.000000 |
14 | Gal | 12.886096 | 0.132187 | 0.049873 | 0.049873 | False | 0.935976 | 0.468301 | 1.000000 |
11 | Gal(b1-3)GalNAc | 4.769337 | 0.116511 | 0.082862 | 0.082862 | False | 0.935976 | 0.409377 | 0.681761 |
12 | GalNAc | 12.345115 | 0.106238 | 0.106760 | 0.106760 | False | 0.935976 | 0.378600 | 0.681761 |
7 | Neu5Ac(a2-6)GalNAc | 2.440640 | -0.078716 | 0.502123 | 0.502123 | False | 0.935976 | -0.152987 | 0.605365 |
6 | Disialyl_T_antigen | 2.328696 | -0.027796 | 0.818170 | 0.818170 | False | 0.935976 | -0.052125 | 0.570158 |
9 | Mucin_elongated_core2 | 4.169726 | 0.010202 | 0.932748 | 0.932748 | False | 0.935976 | 0.019121 | 0.570158 |
get_volcano
get_volcano (df_res, y_thresh=0.05, x_thresh=0, n=None, label_changed=True, x_metric='Log2FC', annotate_volcano=False, filepath='', **kwargs)
*Plots glycan differential expression results in a volcano plot
Arguments: |
---|
df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv or .xlsx] |
y_thresh (float): corr p threshold for labeling datapoints; default:0.05 |
x_thresh (float): absolute x metric threshold for labeling datapoints; default:0 |
n (float): sample size for Bayesian-Adaptive Alpha Adjustment; default = None |
label_changed (bool): if True, add text labels to significantly up- and downregulated datapoints; default:True |
x_metric (string): x-axis metric; default:‘Log2FC’; options are ‘Log2FC’, ‘Effect size’ |
annotate_volcano (bool): whether to annotate the dots in the plot with SNFG images; default: False |
filepath (string): absolute path including full filename allows for saving the plot |
**kwargs: keyword arguments that are directly passed on to seaborn scatterplot |
Returns: |
---|
Prints volcano plot* |
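To make the roles of y_thresh and x_thresh concrete, the sketch below computes volcano coordinates (x-axis metric against the negative log10 of the corrected p-value) for a few rows copied from the get_differential_expression output above, and flags which points would be labeled. The strict comparisons and the helper name volcano_points are assumptions; the plotting itself is omitted.

```python
import math

# (motif, Log2FC, corrected p-value), copied from the result table above
rows = [
    ("GalOS(b1-3)GalNAc", -0.949766, 0.003768),
    ("Terminal_LacNAc_type2", 0.500929, 0.003801),
    ("Gal(b1-3)GalNAc", 0.116511, 0.082862),
    ("Mucin_elongated_core2", 0.010202, 0.932748),
]

def volcano_points(rows, y_thresh=0.05, x_thresh=0.0):
    """Return (name, x, y, labeled) tuples for a volcano plot."""
    points = []
    for name, log2fc, corr_p in rows:
        # Assumption: label when the corrected p-value is below y_thresh
        # and the absolute x metric exceeds x_thresh
        labeled = corr_p < y_thresh and abs(log2fc) > x_thresh
        points.append((name, log2fc, -math.log10(corr_p), labeled))
    return points

points = volcano_points(rows)
for name, x, y, labeled in points:
    print(f"{name}: x={x:.3f}, y={y:.3f}, labeled={labeled}")
```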
get_volcano(res)
You're working with a default alpha of 0.05. Set sample size (n = ...) for Bayesian-Adaptive Alpha Adjustment
get_coverage
get_coverage (df, filepath='')
*Plot glycan coverage across samples, ordered by average intensity
Arguments: |
---|
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx] |
filepath (string): absolute path including full filename allows for saving the plot |
Returns: |
---|
Prints the heatmap* |
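The idea behind a coverage plot can be sketched without any plotting: order glycans by their average intensity and record, per sample, whether each glycan was detected. The abundances below are hypothetical, and both the descending sort order and the "greater than zero" detection rule are assumptions rather than documented behavior of get_coverage.

```python
# Hypothetical relative abundances: glycan -> values across four samples
abundances = {
    "Gal(b1-3)GalNAc": [4.8, 5.1, 0.0, 4.4],
    "Neu5Ac(a2-3)Gal(b1-3)GalNAc": [12.0, 11.5, 13.2, 12.7],
    "Fuc(a1-2)Gal(b1-3)GalNAc": [0.0, 0.3, 0.0, 0.0],
}

def coverage_table(abundances):
    """Order glycans by mean intensity (highest first) and mark detection."""
    ordered = sorted(
        abundances,
        key=lambda g: sum(abundances[g]) / len(abundances[g]),
        reverse=True,
    )
    return [(g, [v > 0 for v in abundances[g]]) for g in ordered]

table = coverage_table(abundances)
for glycan, detected in table:
    print(glycan, detected)
```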
test_df = pd.concat([test_df.iloc[:, 0], test_df[test_df.columns[1:]].astype(float)], axis = 1)
get_coverage(test_df)
get_pca
get_pca (df, groups=None, motifs=False, feature_set=['known', 'exhaustive'], pc_x=1, pc_y=2, color=None, shape=None, filepath='', custom_motifs=[], transform=None, rarity_filter=0.05)
*PCA plot from glycomics abundance dataframe
Arguments: |
---|
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx] |
groups (list): a list of group identifiers for each sample (e.g., [1,1,1,2,2,2,3,3,3]); default:None |
alternatively: design dataframe with ‘id’ column of samples names and additional columns with meta information |
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False |
feature_set (list): which feature set to use for annotations, add more to list to expand; default per the signature is [‘known’, ‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features), |
pc_x (int): principal component to plot on x axis; default:1 |
pc_y (int): principal component to plot on y axis; default:2 |
color (string): if design dataframe is provided: column name for color grouping; default:None |
shape (string): if design dataframe is provided: column name for shape grouping; default:None |
filepath (string): absolute path including full filename allows for saving the plot |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
transform (string): whether to transform the data before plotting, options are “CLR” and “ALR”, recommended for glycomics data; default: no transformation |
rarity_filter (float): proportion of samples that need to have a non-zero value for a variable to be included; default:0.05 |
Returns: |
---|
Prints PCA plot* |
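The rarity_filter argument can be read as a simple per-variable check: keep a glycan or motif only if enough samples have a non-zero value. A minimal sketch of that reading follows; the helper name and the `>=` comparison are assumptions, and the abundance lists are hypothetical.

```python
def passes_rarity_filter(values, rarity_filter=0.05):
    """True if the share of samples with a non-zero value reaches rarity_filter."""
    non_zero = sum(1 for v in values if v != 0)
    return non_zero / len(values) >= rarity_filter

# Hypothetical motif abundances across 40 samples
common_motif = [1.2] * 30 + [0.0] * 10   # present in 75% of samples
rare_motif = [0.4] + [0.0] * 39          # present in 2.5% of samples

print(passes_rarity_filter(common_motif))
print(passes_rarity_filter(rare_motif))
```

With the default of 0.05 and 40 samples, a variable seen in only one sample falls below the cutoff and would be dropped under this reading.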
= True, groups = [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2]) get_pca(test_df, motifs
get_pval_distribution
get_pval_distribution (df_res, filepath='')
*p-value distribution plot of glycan differential expression result
Arguments: |
---|
df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv] |
filepath (string): absolute path including full filename allows for saving the plot |
Returns: |
---|
prints p-value distribution plot* |
get_pval_distribution(res)
get_ma
get_ma (df_res, log2fc_thresh=1, sig_thresh=0.05, filepath='')
*MA plot of glycan differential expression result
Arguments: |
---|
df_res (dataframe): output from get_differential_expression [alternative: filepath to .csv or .xlsx] |
log2fc_thresh (int): absolute Log2FC threshold for highlighting datapoints; default:1 |
sig_thresh (float): significance threshold for highlighting datapoints; default:0.05 |
filepath (string): absolute path including full filename allows for saving the plot |
Returns: |
---|
prints MA plot* |
get_ma(res)
get_glycanova
get_glycanova (df, groups, impute=True, motifs=False, feature_set=['exhaustive', 'known'], min_samples=0.1, posthoc=True, custom_motifs=[], transform=None, gamma=0.1, custom_scale=0)
*Calculate an ANOVA for each glycan (or motif) in the DataFrame
Arguments: |
---|
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx] |
groups (list): a list of group identifiers for each sample (e.g., [1,1,1,2,2,2,3,3,3]) |
impute (bool): replaces zeroes with a Random Forest based model; default:True |
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False |
feature_set (list): which feature set to use for annotations, add more to list to expand; default per the signature is [‘exhaustive’, ‘known’]; options are: ‘known’ (hand-crafted glycan features), |
min_samples (float): Percent of the samples that need to have non-zero values for glycan to be kept; default: 10% |
posthoc (bool): whether to do Tukey’s HSD test post-hoc to find out which differences were significant; default:True |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred |
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1 |
custom_scale (dict): dictionary of type group_idx : mean(group)/min(mean(groups)) for an informed scale model |
Returns: |
---|
(i) a pandas DataFrame with an F statistic, corrected p-value, indication of its significance, and effect size (Omega squared) for each glycan. |
(ii) a dictionary of type glycan : pandas DataFrame, with post-hoc results for each glycan with a significant ANOVA.* |
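For orientation, the sketch below computes a classical one-way ANOVA F statistic and omega squared for a single glycan, given a groups list of the kind described above. It uses textbook formulas on hypothetical values and omits imputation, transformation, and p-value correction, so it should not be expected to match glycowork's numbers.

```python
def split_by_group(values, groups):
    """Partition one glycan's values by the per-sample group identifiers."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return list(by_group.values())

def one_way_anova(samples):
    """Return (F statistic, omega squared) for a list of groups."""
    all_values = [v for g in samples for v in g]
    n_total, k = len(all_values), len(samples)
    grand_mean = sum(all_values) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in samples)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in samples for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    f_stat = ms_between / ms_within
    omega_sq = (ss_between - (k - 1) * ms_within) / (ss_between + ss_within + ms_within)
    return f_stat, omega_sq

groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
values = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # hypothetical abundances
f_stat, omega_sq = one_way_anova(split_by_group(values, groups))
print(round(f_stat, 3), round(omega_sq, 3))
```

Under this formula omega squared becomes negative whenever the F statistic is below 1, which is why a negative effect size in an ANOVA result table is not necessarily an error.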
test_df2 = glycomics_data_loader.HIV_gagtransfection_O_PMID35112714
anv, ph = get_glycanova(test_df2, [1,1,1,1,2,2,2,2,3,3,3,3], motifs = False)
anv
You're working with an alpha of 0.06364810000741428 that has been adjusted for your sample size of 12.
index | Glycan | F statistic | p-val | corr p-val | significant | Effect size |
---|---|---|---|---|---|---|
3 | Neu5Ac(a2-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]Ga... | 5.598977 | 0.026315 | 0.118419 | False | 0.304589 |
4 | Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc | 6.438216 | 0.018374 | 0.118419 | False | 0.341206 |
0 | Gal(b1-3)[Neu5Ac(a2-6)]GalNAc | 2.987765 | 0.101128 | 0.303384 | False | 0.159177 |
1 | Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc | 1.471480 | 0.279954 | 0.402510 | False | 0.042973 |
6 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Neu5Ac(a2-3)... | 1.324978 | 0.313063 | 0.402510 | False | 0.030021 |
5 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]Ga... | 1.500984 | 0.273814 | 0.402510 | False | 0.045540 |
8 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc6S(b1-6)[Neu5Ac(a2-... | 1.923255 | 0.201631 | 0.402510 | False | 0.080822 |
2 | Neu5Ac(a2-3)Gal(b1-3)GalNAc | 1.060368 | 0.385914 | 0.434153 | False | 0.005716 |
7 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-?)[GlcNAc(b1-?)... | 0.514362 | 0.614449 | 0.614449 | False | -0.048494 |
get_meta_analysis
get_meta_analysis (effect_sizes, variances, model='fixed', filepath='', study_names=[])
*Fixed-effects model or random-effects model for meta-analysis of glycan effect sizes
Arguments: |
---|
effect_sizes (array-like): Effect sizes (e.g., Cohen’s d) from each study |
variances (array-like): Corresponding effect size variances from each study |
model (string): Whether to use ‘fixed’ or ‘random’ effects model |
filepath (string): absolute path including full filename allows for saving the Forest plot |
study_names (list): list of strings indicating the name of each study |
Returns: |
---|
(1) The combined effect size |
(2) The p-value for the combined effect size* |
get_meta_analysis([-8.759, -6.363, -5.199, -3.952],
                  [7.061, 4.041, 2.919, 1.968])
(np.float64(-5.326913553837341), np.float64(3.005077298112724e-09))
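The numbers returned by the call above are consistent with standard inverse-variance weighting for a fixed-effects model. The sketch below is a standard-library illustration of that textbook calculation, written as an assumption about what a fixed-effects combination involves rather than as glycowork's code; the function name fixed_effects_meta is hypothetical.

```python
import math

def fixed_effects_meta(effect_sizes, variances):
    """Inverse-variance weighted mean effect and its two-sided normal p-value."""
    weights = [1.0 / v for v in variances]
    combined = sum(w * e for w, e in zip(weights, effect_sizes)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    z = combined / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return combined, p_value

combined, p_value = fixed_effects_meta(
    [-8.759, -6.363, -5.199, -3.952],
    [7.061, 4.041, 2.919, 1.968],
)
print(combined, p_value)
```

Studies with smaller variances receive larger weights, so the combined effect sits closer to the more precise estimates.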
get_time_series
get_time_series (df, impute=True, motifs=False, feature_set=['known', 'exhaustive'], degree=1, min_samples=0.1, custom_motifs=[], transform=None, gamma=0.1, custom_scale=0)
*Analyzes time series data of glycans using an OLS model
Arguments: |
---|
df (dataframe): dataframe containing sample IDs of style sampleID_UnitTimepoint_replicate (e.g., T1_h5_r1) in first column and glycan relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx] |
impute (bool): replaces zeroes with a Random Forest based model; default:True |
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False |
feature_set (list): which feature set to use for annotations, add more to list to expand; default per the signature is [‘known’, ‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features), |
degree (int): degree of the polynomial for regression, default:1 for linear regression |
min_samples (float): Percent of the samples that need to have non-zero values for glycan to be kept; default: 10% |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred |
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1 |
custom_scale (dict): dictionary of type timepoint : mean(timepoint)/min(mean(timepoints)) for an informed scale model |
Returns: |
---|
Returns a dataframe with: |
(i) Glycans/motifs potentially exhibiting significant changes over time |
(ii) The slope of their expression curve over time |
(iii) Uncorrected p-values (t-test) for testing whether slope is significantly different from zero |
(iv) Corrected p-values (t-test with two-stage Benjamini-Hochberg correction) for testing whether slope is significantly different from zero |
(v) Significance: True/False whether the corrected p-value lies below the sample size-appropriate significance threshold* |
t_dic = {}
t_dic["ID"] = ["D1_h5_r1", "D1_h5_r2", "D1_h5_r3", "D1_h10_r1", "D1_h10_r2", "D1_h10_r3", "D1_h15_r1", "D1_h15_r2", "D1_h15_r3"]
t_dic["Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc"] = [0.33, 0.31, 0.35, 1.51, 1.57, 1.66, 2.11, 2.04, 2.09]
t_dic["Fuc(a1-2)Gal(b1-3)GalNAc"] = [0.78, 1.01, 0.98, 0.88, 1.11, 0.72, 1.22, 1.00, 0.54]
t_dic["Neu5Ac(a2-6)GalNAc"] = [0.11, 0.09, 0.14, 0.02, 0.07, 0.10, 0.11, 0.09, 0.08]
get_time_series(pd.DataFrame(t_dic).set_index("ID").T)
You're working with an alpha of 0.0694557066556809 that has been adjusted for your sample size of 9.
index | Glycan | Change | p-val | corr p-val | significant |
---|---|---|---|---|---|
1 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]Ga... | 0.196162 | 0.004659 | 0.004659 | True |
0 | Fuc(a1-2)Gal(b1-3)GalNAc | -0.083848 | 0.011912 | 0.011912 | True |
2 | Neu5Ac(a2-6)GalNAc | -0.109381 | 0.090226 | 0.090226 | False |
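Two steps of this analysis lend themselves to a short sketch: reading the timepoint out of sample IDs of the style sampleID_UnitTimepoint_replicate, and fitting a degree-1 least-squares line. The code below does both on the raw values from the example above. The helper names and the regular expression are assumptions; because no transformation or imputation is applied here, the slope differs from the Change value reported by get_time_series.

```python
import re

def parse_timepoint(sample_id):
    """Extract the numeric timepoint from an ID such as 'D1_h5_r1'."""
    match = re.fullmatch(r"[^_]+_[A-Za-z]+(\d+(?:\.\d+)?)_[^_]+", sample_id)
    if match is None:
        raise ValueError(f"unexpected sample ID format: {sample_id}")
    return float(match.group(1))

def ols_slope(x, y):
    """Slope of the ordinary least-squares line through (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = sum((a - mean_x) ** 2 for a in x)
    return numerator / denominator

ids = ["D1_h5_r1", "D1_h5_r2", "D1_h5_r3", "D1_h10_r1", "D1_h10_r2",
       "D1_h10_r3", "D1_h15_r1", "D1_h15_r2", "D1_h15_r3"]
values = [0.33, 0.31, 0.35, 1.51, 1.57, 1.66, 2.11, 2.04, 2.09]

times = [parse_timepoint(s) for s in ids]
slope = ols_slope(times, values)
print(times, slope)
```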
get_jtk
get_jtk (df_in, timepoints, periods, interval, motifs=False, feature_set=['known', 'exhaustive', 'terminal'], custom_motifs=[], transform=None, gamma=0.1, correction_method='two-stage')
*Detecting rhythmically expressed glycans via the Jonckheere–Terpstra–Kendall (JTK) algorithm
Arguments: |
---|
df_in (pd.DataFrame): A dataframe containing data for analysis. [alternative: filepath to .csv or .xlsx] |
(column 0 = molecule IDs, then arranged in groups and by ascending timepoints) |
timepoints (int): number of timepoints in the experiment (each timepoint must have the same number of replicates). |
periods (list): number of timepoints (as int) per cycle. |
interval (int): units of time (Arbitrary units) between experimental timepoints. |
motifs (bool): a flag for running structural or motif-based analysis (True = run motif analysis); default:False. |
feature_set (list): which feature set to use for annotations, add more to list to expand; default per the signature is [‘known’, ‘exhaustive’, ‘terminal’]; options are: ‘known’ (hand-crafted glycan features), |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred |
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1 |
correction_method (string): whether to use “two-stage” or “one-stage” Benjamini-Hochberg for correction; default:“two-stage” |
Returns: |
---|
Returns a pandas dataframe containing the adjusted p-values, and most important waveform parameters for each |
molecule in the analysis.* |
t_dic = {}
t_dic["Neu5Ac(a2-3)Gal(b1-3)GalNAc"] = [0.433138901, 0.149729209, 0.358018822, 0.537641256, 1.526963756, 1.349986672, 0.75156406, 0.736710183]
t_dic["Gal(b1-3)GalNAc"] = [0.919762334, 0.760237184, 0.725566662, 0.459945797, 0.523801515, 0.695106926, 0.627632047, 1.183511209]
t_dic["Gal(b1-3)[Neu5Ac(a2-6)]GalNAc"] = [0.533138901, 0.119729209, 0.458018822, 0.637641256, 1.726963756, 1.249986672, 0.55156406, 0.436710183]
t_dic["Fuc(a1-2)Gal(b1-3)GalNAc"] = [3.862169504, 5.455032837, 3.858163289, 5.614650335, 3.124254095, 4.189550337, 4.641831312, 4.19538484]
tps = 8 # number of timepoints in experiment
periods = [8] # number of timepoints per cycle
interval = 3 # units of time between experimental timepoints
t_df = pd.DataFrame(t_dic).T
t_df.columns = ["T3", "T6", "T9", "T12", "T15", "T18", "T21", "T24"]
get_jtk(t_df.reset_index(), tps, periods, interval)
You're working with an alpha of 0.22004505213567527 that has been adjusted for your sample size of 1.
Significance inflation detected. The CLR/ALR transformation possibly cannot handle this dataset. Consider running again with a higher gamma value. Proceed with caution; for now switching to Bonferroni correction to be conservative about this.
| | Molecule_Name | BH_Q_Value | Adjusted_P_value | Period_Length | Lag_Phase | Amplitude | significant |
---|---|---|---|---|---|---|---|
0 | Neu5Ac(a2-3)Gal(b1-3)GalNAc | 0.006944 | 0.001736 | 24.0 | 16.5 | 0.474136 | True |
1 | Gal(b1-3)GalNAc | 0.006944 | 0.001736 | 24.0 | 1.5 | 0.220136 | True |
2 | Gal(b1-3)[Neu5Ac(a2-6)]GalNAc | 0.056548 | 0.014137 | 24.0 | 13.5 | 0.379760 | False |
3 | Fuc(a1-2)Gal(b1-3)GalNAc | 0.434722 | 0.108681 | 24.0 | 4.5 | 0.310215 | False |
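The Period_Length of 24.0 reported above is consistent with multiplying each entry of periods (timepoints per cycle) by interval (time units between timepoints): 8 × 3 = 24. A minimal sketch of that bookkeeping follows. The helpers `period_lengths` and `timepoint_labels` are hypothetical names used for illustration and are not part of glycowork.

```python
def period_lengths(periods, interval):
    """Convert candidate periods (timepoints per cycle) into time units."""
    return [p * interval for p in periods]


def timepoint_labels(timepoints, interval, prefix="T"):
    """Build column labels for evenly spaced timepoints, starting at one interval."""
    return [f"{prefix}{interval * (i + 1)}" for i in range(timepoints)]


tps = 8        # number of timepoints in the experiment
periods = [8]  # number of timepoints per cycle
interval = 3   # units of time between timepoints

print(period_lengths(periods, interval))  # [24]
print(timepoint_labels(tps, interval))
```

With these inputs the labels reproduce the T3 to T24 column names used in the example above.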
get_jtk(t_df.reset_index(), tps, periods, interval, motifs = True, feature_set = ['terminal'])
You're working with an alpha of 0.22004505213567527 that has been adjusted for your sample size of 1.
| | Molecule_Name | BH_Q_Value | Adjusted_P_value | Period_Length | Lag_Phase | Amplitude | significant |
---|---|---|---|---|---|---|---|
2 | Terminal_Neu5Ac(a2-?) | 0.000794 | 0.000397 | 0.0 | 0.0 | 0.000000e+00 | True |
0 | Terminal_Neu5Ac(a2-3) | 0.001736 | 0.001736 | 24.0 | 15.0 | 1.110223e-16 | True |
1 | Terminal_Neu5Ac(a2-6) | 0.014137 | 0.014137 | 24.0 | 13.5 | 2.283195e-01 | True |
4 | Terminal_Fuc(a1-2) | 0.061012 | 0.061012 | 24.0 | 4.5 | 2.825447e-01 | True |
3 | Terminal_Gal(b1-3) | 0.398760 | 0.398760 | 24.0 | 3.0 | 6.938894e-18 | False |
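The correction_method argument of get_jtk selects between one-stage and two-stage Benjamini-Hochberg correction. As a point of reference, here is a minimal sketch of the plain one-stage procedure in pure Python. It is a textbook version, not glycowork's implementation, and the function name `benjamini_hochberg` is an assumption made for this example. The two-stage variant additionally estimates the proportion of true null hypotheses and is not shown.

```python
def benjamini_hochberg(pvals):
    """Return one-stage BH-adjusted p-values in the original input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

Each adjusted value is compared against the significance threshold in place of the raw p-value.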
get_biodiversity
get_biodiversity (df, group1, group2, metrics=['alpha', 'beta'], motifs=False, feature_set=['exhaustive', 'known'], custom_motifs=[], paired=False, permutations=999, transform=None, gamma=0.1, custom_scale=0)
*Calculates diversity indices from glycomics data, similar to alpha/beta diversity etc in microbiome data
Arguments: |
---|
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx] |
group1 (list): a list of column identifiers corresponding to samples in group 1 |
group2 (list): a list of column identifiers corresponding to samples in group 2 (note, if an empty list is provided, group1 can be used as a list of group identifiers for each column - e.g., [1,1,2,2,3,3…]) |
metrics (list): which diversity metrics to calculate (alpha, beta); default:[‘alpha’, ‘beta’] |
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False |
feature_set (list): which feature set to use for annotations, add more to list to expand; default: [‘exhaustive’, ‘known’]; options are: ‘known’ (hand-crafted glycan features), |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False |
permutations (int): number of permutations to perform in ANOSIM and PERMANOVA statistical test; default:999 |
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred |
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1 |
custom_scale (float or dict): Ratio of total signal in group2/group1 for an informed scale model (or group_idx: mean(group)/min(mean(groups)) signal dict for multivariate) |
Returns: |
---|
Returns a dataframe with: |
(i) Diversity indices/metrics |
(ii) Mean value of diversity metrics in group 1 (only alpha) |
(iii) Mean value of diversity metrics in group 2 (only alpha) |
(iv) Uncorrected p-values (Welch’s t-test) for difference in mean |
(v) Corrected p-values (Welch’s t-test with two-stage Benjamini-Hochberg correction) for difference in mean |
(vi) Significance: True/False of whether the corrected p-value lies below the sample size-appropriate significance threshold |
(vii) Effect size as Cohen’s d (ANOSIM R for beta; F statistics for PERMANOVA and Shannon/Simpson (ANOVA))* |
res = get_biodiversity(test_df, group1 = [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39],
                       group2 = [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40], motifs = True, paired = True)
res
You're working with an alpha of 0.044390023979542614 that has been adjusted for your sample size of 40.
| | Metric | Group1 mean | Group2 mean | p-val | Effect size | corr p-val | significant |
---|---|---|---|---|---|---|---|
0 | simpson_diversity | 0.876756 | 0.874348 | 0.000443 | -0.948203 | 0.000443 | True |
1 | shannon_diversity | 2.244523 | 2.225758 | 0.001255 | -0.846077 | 0.001255 | True |
2 | Beta diversity (ANOSIM) | NaN | NaN | 0.002002 | 0.145276 | 0.002002 | True |
3 | Beta diversity (PERMANOVA) | NaN | NaN | 0.003003 | 43.547762 | 0.003003 | True |
4 | species_richness | 15.000000 | 15.000000 | 1.000000 | 0.000000 | 1.000000 | False |
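The alpha diversity metrics listed above (species richness, Shannon diversity, Simpson diversity) can be illustrated with a short sketch for a single sample. This uses common textbook definitions (natural-log Shannon entropy and the Gini-Simpson index, 1 minus the sum of squared proportions). Whether glycowork uses exactly these conventions is an assumption, and `alpha_diversity` is a hypothetical helper name.

```python
import math


def alpha_diversity(abundances):
    """Compute richness, Shannon, and Gini-Simpson diversity for one sample."""
    total = sum(abundances)
    # Proportions of the features that are actually present.
    props = [a / total for a in abundances if a > 0]
    return {
        "species_richness": len(props),
        "shannon_diversity": -sum(p * math.log(p) for p in props),
        "simpson_diversity": 1.0 - sum(p * p for p in props),
    }


sample = [40.0, 30.0, 20.0, 10.0, 0.0]  # made-up relative abundances
print(alpha_diversity(sample))
```

In the table above every sample contains all 15 features, which is why species richness cannot separate the two groups.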
get_SparCC
get_SparCC (df1, df2, motifs=False, feature_set=['known', 'exhaustive'], custom_motifs=[], transform=None, gamma=0.1, partial_correlations=False)
*Performs SparCC (Sparse Correlations for Compositional Data) on two (glycomics) datasets. Samples should be in the same order.
Arguments: |
---|
df1 (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx] |
df2 (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx] |
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False |
feature_set (list): which feature set to use for annotations, add more to list to expand; default: [‘known’, ‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features), |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred |
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1 |
partial_correlations (bool): whether to use regularized partial correlations instead (enriches for direct effects); default:False |
Returns: |
---|
Returns (i) a dataframe of pairwise correlations (Spearman’s rho) |
and (ii) a dataframe with corrected p-values (two-stage Benjamini-Hochberg)* |
df1 = glycomics_data_loader.time_series_N_PMID32149347
df2 = glycomics_data_loader.time_series_O_PMID32149347
df1 = pd.merge(df1, df2[['ID']], on = 'ID', how = 'inner')
df2 = pd.merge(df2, df1[['ID']], on = 'ID', how = 'inner')
df1 = df1.set_index(df1.columns.tolist()[0]).T.reset_index()
df2 = df2.set_index(df2.columns.tolist()[0]).T.reset_index()

corr, pval = get_SparCC(df1, df2, motifs = True, transform = "CLR")
sns.clustermap(corr)
You're working with an alpha of 0.04787928055709467 that has been adjusted for your sample size of 31.
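The transform argument accepts CLR (centered log-ratio) and ALR (additive log-ratio). The sketch below shows the plain textbook versions of both for one strictly positive composition. It does not include the scale-uncertainty model controlled by gamma, and the function names `clr` and `alr` are assumptions for illustration rather than glycowork's API. Zeros must be imputed before either transform can be applied.

```python
import math


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of all logs."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alr(composition, ref_index=-1):
    """Additive log-ratio: log of each part relative to one reference part."""
    ref = math.log(composition[ref_index])
    idx = ref_index % len(composition)
    # The reference part itself is dropped from the output.
    return [math.log(x) - ref for i, x in enumerate(composition) if i != idx]


sample = [0.5, 0.3, 0.2]  # made-up relative abundances
print(clr(sample))
print(alr(sample))
```

CLR values sum to zero by construction, while ALR returns one value fewer than the input because the reference part is removed.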
get_roc
get_roc (df, group1, group2, motifs=False, feature_set=['known', 'exhaustive'], paired=False, impute=True, min_samples=0.1, custom_motifs=[], transform=None, gamma=0.1, custom_scale=0, filepath='', multi_score=False)
*Calculates ROC AUC for every feature and, optionally, plots the best
Arguments: |
---|
df (dataframe): dataframe containing glycan sequences in first column and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx] |
group1 (list): list of column indices or names for the first group of samples, usually the control |
group2 (list): list of column indices or names for the second group of samples (note, if an empty list is provided, group1 can be used as a list of group identifiers for each column - e.g., [1,1,2,2,3,3…]) |
motifs (bool): whether to analyze full sequences (False) or motifs (True); default:False |
feature_set (list): which feature set to use for annotations, add more to list to expand; default: [‘known’, ‘exhaustive’]; options are: ‘known’ (hand-crafted glycan features), |
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False |
impute (bool): replaces zeroes with a Random Forest based model; default:True |
min_samples (float): fraction of the samples that need to have non-zero values for a glycan to be kept; default: 0.1 (10%) |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
transform (str): transformation to escape Aitchison space; options are CLR and ALR (use ALR if you have many glycans (>100) with low values); default:will be inferred |
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1 |
custom_scale (float or dict): Ratio of total signal in group2/group1 for an informed scale model (or group_idx: mean(group)/min(mean(groups)) signal dict for multivariate) |
filepath (string): absolute path including full filename allows for saving the plot, if plot=True |
multi_score (bool): whether to find the best glycan risk score, containing multiple glycan features; default:False |
Returns: |
---|
Returns a sorted list of tuples of type (glycan, AUC score) and, optionally, ROC curve for best feature* |
get_roc(test_df, group1 = [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39],
        group2 = [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40], motifs = True, paired = True)
[('GlcNAc6S(b1-6)GalNAc', np.float64(0.765)),
('Neu5Ac(a2-3)Gal', np.float64(0.685)),
('Neu5Ac', np.float64(0.66)),
('Oglycan_core1', np.float64(0.6325)),
('Gal', np.float64(0.61)),
('Gal(b1-3)GalNAc', np.float64(0.6)),
('GalNAc', np.float64(0.595)),
('Mucin_elongated_core2', np.float64(0.515)),
('Disialyl_T_antigen', np.float64(0.48000000000000004)),
('Neu5Ac(a2-6)GalNAc', np.float64(0.46)),
('Neu5Ac(a2-8)Neu5Ac', np.float64(0.36250000000000004)),
('Terminal_LacNAc_type2', np.float64(0.28)),
('H_type2', np.float64(0.26250000000000007)),
('GalOS(b1-3)GalNAc', np.float64(0.2375))]
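Each tuple above pairs a feature with its ROC AUC. For a single feature, the AUC equals the probability that a randomly chosen sample from one group has a higher value than a randomly chosen sample from the other, with ties counted as one half. The sketch below computes it that way. Treating group2 as the positive class is an assumption made for this example, and `roc_auc` is a hypothetical helper, not glycowork's implementation.

```python
def roc_auc(group1_values, group2_values):
    """Pairwise AUC: share of (group2, group1) pairs where group2 is higher."""
    wins = 0.0
    for pos in group2_values:
        for neg in group1_values:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(group1_values) * len(group2_values))


control = [0.10, 0.20, 0.30, 0.40]  # made-up feature values, group 1
case = [0.25, 0.35, 0.45, 0.50]     # made-up feature values, group 2
print(roc_auc(control, case))  # 0.8125
```

An AUC below 0.5, as for several features in the list above, means the feature tends to be lower in the group treated as positive.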
get_lectin_array
get_lectin_array (df, group1, group2, paired=False, transform='')
*Function for analyzing lectin array data for two or more groups.
Arguments: |
---|
df (dataframe): dataframe containing samples as rows and lectins as columns [alternative: filepath to .csv or .xlsx] |
group1 (list): list of indices or names for the first group of samples, usually the control |
group2 (list): list of indices or names for the second group of samples (note, if an empty list is provided, group1 can be used as a list of group identifiers for each column - e.g., [1,1,2,2,3,3…]) |
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False |
transform (string): optional data-processing, “log2” transforms df with np.log2; default:nothing |
Returns: |
---|
Returns an output dataframe with: |
(i) Deduced glycan motifs altered between groups |
(ii) human names for features identified in the motifs from (i) |
(iii) Lectins supporting the change in (i) |
(iv) Direction of the change (e.g., “up” means higher in group2) |
(v) Score/Magnitude of the change (remember, if you have more than two groups this reports on any pairwise combination, like an ANOVA) |
(vi) Clustering of the scores into highly/moderate/low significance findings* |
lectin_df = lectin_array_data_loader.A549_influenza_PMID33046650
get_lectin_array(lectin_df, [5,6,7], [8,9,10])
Lectin "Ab-LeB-1" is not found in our annotated lectin library and is excluded from analysis.
Lectin "APA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "APP" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Blood Group B [CLCP-19B]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Blood Group H2" is not found in our annotated lectin library and is excluded from analysis.
Lectin "CA19-9 [121SLE]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "CCA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "CD15 [ICRF29-2]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "CD15 [MY-1]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "CD15 [SP-159]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Frossman" is not found in our annotated lectin library and is excluded from analysis.
Lectin "IAA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "IRA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Le X [P12]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Lewis A [7LE]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Lewis B [218]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "Lewis Y [F3]" is not found in our annotated lectin library and is excluded from analysis.
Lectin "LFA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "LPA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "MNA-M " is not found in our annotated lectin library and is excluded from analysis.
Lectin "MUC5Ac Ab" is not found in our annotated lectin library and is excluded from analysis.
Lectin "PMA" is not found in our annotated lectin library and is excluded from analysis.
Lectin "PTA_1" is not found in our annotated lectin library and is excluded from analysis.
Lectin "PTA_2" is not found in our annotated lectin library and is excluded from analysis.
Lectin "SNA-S" is not found in our annotated lectin library and is excluded from analysis.
Lectin "SNA-V" is not found in our annotated lectin library and is excluded from analysis.
Lectin "VFA" is not found in our annotated lectin library and is excluded from analysis.
| | motif | named_motifs | lectin(s) | change | score | significance |
---|---|---|---|---|---|---|
39 | Neu5Ac(a2-6)Gal(b1-3)GlcNAc | [Internal_LacNAc_type1] | PSL, SNA, TJA-I, BDA, BPA, WGA_1, WGA_2 | down | 9.08 | highly significant |
38 | Neu5Ac(a2-6)Gal(b1-4)GlcNAc | [Internal_LacNAc_type2] | PSL, SNA, TJA-I, BDA, BPA, ECA, RCA120, Ricin ... | down | 8.57 | highly significant |
7 | Man(a1-2) | [] | ASA, Con A, CVN, HHL, SVN_1, GRFT, SVN_2, SNA-... | up | 4.87 | moderately significant |
14 | Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc... | [Chitobiose, Trimannosylcore, Terminal_LacNAc_... | CA, CAA, DSA_1, DSA_2, DSA_3, AMA, BDA, BPA, C... | up | 3.57 | moderately significant |
4 | Gal(b1-3)GalNAc | [Oglycan_core1] | ACA, AIA, MPA, PNA_1, PNA_2, BDA, BPA | up | 3.49 | moderately significant |
10 | Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-4)][G... | [Chitobiose, Trimannosylcore, Terminal_LacNAc_... | Blackbean, Calsepa, PHA-E_1, PHA-E_2, AMA, BDA... | up | 2.77 | moderately significant |
16 | Fuc(a1-2)Gal(b1-3)GalNAc(b1-4)[Neu5Ac(a2-3)]Ga... | [Internal_LacNAc_type2, H_type3] | Cholera Toxin, AAA, AAL, ACA, AIA, AOL, BDA, B... | up | 2.54 | moderately significant |
47 | GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Ma... | [Chitobiose, Trimannosylcore, core_fucose, Ngl... | TL, AAL, AMA, AOL, Con A, GNA, GNL, HHL, LcH, ... | up | 2.50 | moderately significant |
15 | Gal(b1-3)GalNAc(b1-4)[Neu5Ac(a2-3)]Gal(b1-4)Gl... | [Internal_LacNAc_type2] | Cholera Toxin, ACA, AIA, BDA, BPA, CSA, ECA, L... | up | 2.49 | moderately significant |
18 | Man(a1-6) | [] | Con A, GNA, GNL, HHL, NPA, SNA-II, UDA | up | 2.35 | moderately significant |
17 | Man(a1-3) | [] | Con A, GNA, GNL, HHL, NPA, SNA-II, UDA | up | 2.35 | moderately significant |
43 | Neu5Ac(a2-6)GalNAc(b1-4)GlcNAc | [Internal_LacdiNAc_type2] | SNA, CSA, SBA, VVA_1, VVA_2, WFA, BPA, ECA, ST... | down | 2.30 | moderately significant |
22 | Gal(b1-4)GlcNAc(b1-2)[Gal(b1-4)GlcNAc(b1-4)]Ma... | [Chitobiose, Trimannosylcore, Terminal_LacNAc_... | DSA_1, DSA_2, DSA_3, AMA, BDA, Blackbean, BPA,... | up | 2.17 | moderately significant |
46 | Fuc(a1-2)Gal(b1-3)GalNAc | [H_type3, Oglycan_core1] | TJA-II, AAA, AAL, ACA, AIA, AOL, BDA, BPA, MPA... | up | 1.99 | moderately significant |
3 | Fuc(a1-6) | [] | AAL, AOL, LcH, PSA | up | 1.82 | moderately significant |
6 | Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | [Chitobiose, Trimannosylcore] | AMA, Con A, GNA, GNL, HHL, NPA, SNA-II, UDA, W... | up | 1.62 | moderately significant |
34 | Neu5Ac(a2-3)Gal(b1-3)GalNAc | [Oglycan_core1] | MAL-II, ACA, AIA, BDA, BPA, MPA, PNA_1, PNA_2,... | up | 1.60 | moderately significant |
11 | GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)[GlcNAc(b1-6... | [Chitobiose, Trimannosylcore, Nglycan_complex] | Blackbean, PHA-L, AMA, Con A, GNA, GNL, HHL, N... | up | 1.54 | moderately significant |
42 | GlcNAc(b1-2)[GlcNAc(b1-6)]Man(a1-6)[GlcNAc(b1-... | [Chitobiose, Trimannosylcore, bisectingGlcNAc,... | RPA, AMA, Blackbean, Con A, GNA, GNL, HHL, NPA... | up | 1.47 | moderately significant |
41 | GlcNAc(b1-2)[GlcNAc(b1-4)]Man(a1-3)[GlcNAc(b1-... | [Chitobiose, Trimannosylcore, bisectingGlcNAc,... | RPA, AMA, Con A, GNA, GNL, HHL, NPA, SNA-II, U... | up | 1.40 | moderately significant |
23 | Gal(b1-4)GlcNAc | [Terminal_LacNAc_type2] | ECA, RCA120, Ricin B Chain, SJA, BDA, BPA | up | 1.06 | low significance |
5 | GlcNAc(b1-3)GalNAc | [Oglycan_core3] | AIA, UEA-II, WGA_1, WGA_2 | up | 0.86 | low significance |
26 | Gal(a1-3) | [] | GS-I_1, GS-I_2, GS-I_3, GS-I_4, MNA-G, PA-IL | up | 0.83 | low significance |
27 | Gal(a1-4) | [] | GS-I_1, GS-I_2, GS-I_3, GS-I_4, MNA-G, PA-IL | up | 0.83 | low significance |
30 | Gal(b1-4)GlcNAc(b1-3) | [Terminal_LacNAc_type2] | LEA_1, LEA_2, STA, BDA, BPA, ECA, RCA120, Rici... | up | 0.55 | low significance |
25 | Gal(a1-3)Gal | [] | EEA, EEL, MOA, GS-I_1, GS-I_2, GS-I_3, GS-I_4,... | up | 0.53 | low significance |
33 | Neu5Ac(a2-3)Gal(b1-4)GlcNAc | [Internal_LacNAc_type2] | MAA_1, MAA_2, MAL-I, BDA, BPA, ECA, RCA120, Ri... | up | 0.50 | low significance |
37 | Gal(a1-3)GalNAc | [] | MOA, EEA, EEL, GS-I_1, GS-I_2, GS-I_3, GS-I_4,... | up | 0.47 | low significance |
20 | GalNAc(a1-4) | [] | GHA, HAA, HPA, CSA, GS-I_1, GS-I_2, GS-I_3, GS... | up | 0.41 | low significance |
19 | GalNAc(a1-3) | [] | GHA, HAA, HPA, CSA, GS-I_1, GS-I_2, GS-I_3, GS... | up | 0.41 | low significance |
21 | GalNAc(a1-3)GalNAc(b1-3) | [] | DBA, SBA, CSA, GHA, HAA, HPA, VVA_1, VVA_2, WF... | up | 0.27 | low significance |
24 | GalNAc(b1-4)GlcNAc | [Terminal_LacdiNAc_type2] | ECA, STA, CSA, SBA, VVA_1, VVA_2, WFA, BPA, WG... | up | 0.21 | low significance |
44 | Fuc(a1-2)Gal(b1-4)GalNAc(b1-3) | [] | SNA-II, AAA, AAL, AOL, BDA, BPA, CSA, SBA, VVA... | up | 0.19 | low significance |
13 | GalNAc(b1-4) | [] | CSA, SBA, VVA_1, VVA_2, WFA, BPA, WGA_1, WGA_2 | up | 0.16 | low significance |
12 | GalNAc(b1-3) | [] | CSA, SBA, VVA_1, VVA_2, WFA, BPA, WGA_1, WGA_2 | up | 0.16 | low significance |
40 | Fuc(a1-2)Gal(b1-4)GlcNAc | [H_type2, Internal_LacNAc_type2] | PTL-II, TJA-II, UEA-I, UEA-II, AAA, AAL, AOL, ... | up | 0.14 | low significance |
32 | Gal3S(b1-4)GlcNAc | [] | MAA_1, MAA_2, MAL-I, MAL-II | down | 0.12 | low significance |
28 | GlcNAc(a1-3) | [] | HAA, HPA, WGA_1, WGA_2 | up | 0.12 | low significance |
29 | GlcNAc(a1-4) | [] | HAA, HPA, WGA_1, WGA_2 | up | 0.12 | low significance |
0 | Fuc(a1-2) | [] | AAA, AAL, AOL | up | 0.08 | low significance |
36 | Gal3S(b1-4) | [] | MAL-II | down | 0.08 | low significance |
35 | Gal3S(b1-3) | [] | MAL-II | down | 0.08 | low significance |
49 | Fuc(a1-2)Gal(b1-4)GalNAc | [] | UEA-II, AAA, AAL, AOL, BDA, BPA | up | 0.07 | low significance |
9 | Gal(b1-4) | [] | BDA, BPA | up | 0.06 | low significance |
8 | Gal(b1-3) | [] | BDA, BPA | up | 0.06 | low significance |
2 | Fuc(a1-4) | [] | AAL, AOL | down | 0.04 | low significance |
1 | Fuc(a1-3) | [] | AAL, AOL, Lotus | down | 0.04 | low significance |
31 | GlcNAc(b1-4)GlcNAc(b1-4) | [Chitobiose] | LEA_1, LEA_2, WGA_1, WGA_2 | down | 0.01 | low significance |
50 | GlcNAc(b1-3) | [] | WGA_1, WGA_2 | down | 0.01 | low significance |
51 | GlcNAc(b1-4) | [] | WGA_1, WGA_2 | down | 0.01 | low significance |
45 | GlcNAc(b1-4)GlcNAc(b1-4)GlcNAc(b1-4) | [Chitobiose] | STA, LEA_1, LEA_2, WGA_1, WGA_2 | down | 0.00 | low significance |
48 | GlcNAc(b1-3)Gal | [] | UEA-II, WGA_1, WGA_2 | up | 0.00 | low significance |
52 | Neu5Ac(a2-3) | [] | WGA_1, WGA_2 | down | 0.00 | low significance |
53 | Neu5Ac(a2-6) | [] | WGA_1, WGA_2 | down | 0.00 | low significance |
54 | Neu5Ac(a2-8) | [] | WGA_1, WGA_2 | down | 0.00 | low significance |
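For more than two groups, the arguments above allow group2 to be an empty list while group1 carries one group identifier per sample, e.g. [1,1,2,2,3,3]. The sketch below shows one way such a label list can be turned into per-group sample positions. The helper `split_by_group` is hypothetical, and how glycowork resolves the labels internally is not shown here.

```python
from collections import defaultdict


def split_by_group(identifiers):
    """Map each group label to the positions of the samples carrying it."""
    groups = defaultdict(list)
    for position, label in enumerate(identifiers):
        groups[label].append(position)
    return dict(groups)


labels = [1, 1, 2, 2, 3, 3]  # one label per sample, in sample order
print(split_by_group(labels))  # {1: [0, 1], 2: [2, 3], 3: [4, 5]}
```

The labels must be listed in the same order as the samples in the dataframe.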
get_glycoshift_per_site
get_glycoshift_per_site (df, group1, group2, paired=False, impute=True, min_samples=0.2, gamma=0.1, custom_scale=0)
*Calculates differentially expressed glycans or motifs from glycoproteomics data
Arguments: |
---|
df (dataframe): glycoproteomics dataset, expects first column to be formatted as protein_site_composition and relative abundances in subsequent columns [alternative: filepath to .csv or .xlsx] |
group1 (list): list of column indices or names for the first group of samples, usually the control |
group2 (list): list of column indices or names for the second group of samples |
paired (bool): whether samples are paired or not (e.g., tumor & tumor-adjacent tissue from same patient); default:False |
impute (bool): replaces zeroes with a Random Forest based model; default:True |
min_samples (float): fraction of the samples that need to have non-zero values for a glycan to be kept; default: 0.2 (20%) |
gamma (float): uncertainty parameter to estimate scale uncertainty for CLR transformation; default: 0.1 |
custom_scale (float or dict): Ratio of total signal in group2/group1 for an informed scale model (or group_idx: mean(group)/min(mean(groups)) signal dict for multivariate) |
Returns: |
---|
Returns a dataframe with: |
(for each condition/interaction feature) |
(i) Regression coefficient from the GLM (indicating direction of change in the treatment condition) |
(ii) Corrected p-values (two-tailed t-test with two-stage Benjamini-Hochberg correction) for testing the coefficient against zero |
(iii) Significance: True/False of whether the corrected p-value lies below the sample size-appropriate significance threshold* |
df_milk = glycoproteomics_data_loader.human_milk_N_PMID34087070
get_glycoshift_per_site(df_milk, ['Colostrum1', 'Colostrum2', 'Colostrum3'], ['Mature1', 'Mature2', 'Mature3'])
You're working with an alpha of 0.07862467893233027 that has been adjusted for your sample size of 6.
| | Condition_coefficient | Condition_corr_pval | Condition_significant | Hex_Condition_coefficient | Hex_Condition_corr_pval | Hex_Condition_significant | HexNAc_Condition_coefficient | HexNAc_Condition_corr_pval | HexNAc_Condition_significant | complex_Condition_coefficient | ... | Neu5Ac_Condition_significant | dHex_Condition_coefficient | dHex_Condition_corr_pval | dHex_Condition_significant | high_Man_Condition_coefficient | high_Man_Condition_corr_pval | high_Man_Condition_significant | hybrid_Condition_coefficient | hybrid_Condition_corr_pval | hybrid_Condition_significant |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
sp|P10909|CLUS_103 | -0.154462 | 6.132844e-267 | True | -0.772309 | 6.132844e-267 | True | -0.617847 | 6.132844e-267 | True | 4.615035 | ... | True | -0.154462 | 5.913814e-267 | True | 0.000000 | 1.000000e+00 | False | -4.769497 | 0.000000e+00 | True |
sp|P01024|CO3_85 | -12.526980 | 2.581301e-204 | True | 11.700922 | 1.551607e-205 | True | -25.053959 | 2.581301e-204 | True | 0.000000 | ... | False | 0.000000 | 1.000000e+00 | False | -12.526980 | 6.084495e-204 | True | -12.526980 | 2.396922e-204 | True |
sp|P47710|CASA1_69 | 0.290159 | 5.449434e-31 | True | -1.271521 | 2.240400e-31 | True | 1.160635 | 5.449434e-31 | True | 0.000000 | ... | True | 3.012474 | 5.268244e-32 | True | 0.000000 | 1.000000e+00 | False | 0.290159 | 3.795142e-31 | True |
sp|Q08380|LG3BP_125 | 0.001841 | 2.313991e-04 | True | 0.009204 | 2.313991e-04 | True | 0.007364 | 2.313991e-04 | True | 0.000000 | ... | True | 0.001841 | 1.487566e-04 | True | 0.000000 | 1.000000e+00 | False | 0.001841 | 1.432471e-04 | True |
sp|P00709|LALBA_90 | -1.353837 | 4.248180e-04 | True | 4.058993 | 4.323787e-03 | True | -5.415348 | 4.248180e-04 | True | -0.734612 | ... | True | 6.595320 | 2.009174e-20 | True | 0.000000 | 1.000000e+00 | False | -0.619225 | 5.502628e-01 | False |
sp|Q13410|BT1A1_55 | -16.641783 | 7.593800e-04 | True | -0.699019 | 3.161822e-01 | False | 15.942763 | 4.284923e-04 | True | -4.052916 | ... | True | -6.082883 | 6.878363e-03 | True | 0.000000 | 1.000000e+00 | False | -12.588866 | 1.116992e-02 | True |
sp|P19652|A1AG2_HUMAN/sp|P02763|A1AG1 | -0.000937 | 3.594687e-01 | False | -0.004685 | 2.981904e-01 | False | -0.003748 | 2.614318e-01 | False | -0.000937 | ... | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False |
sp|P07602|SAP_101 | 0.002402 | 3.594687e-01 | False | 0.012011 | 2.981904e-01 | False | 0.009609 | 2.614318e-01 | False | 0.000000 | ... | False | 0.002402 | 3.081160e-01 | False | 0.000000 | 1.000000e+00 | False | 0.002402 | 2.967043e-01 | False |
sp|P06858|LIPL_70 | -0.001105 | 3.975872e-01 | False | -0.005524 | 2.981904e-01 | False | -0.004419 | 2.752527e-01 | False | -0.001105 | ... | False | -0.001105 | 3.450489e-01 | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False |
sp|P01833|PIGR_421 | 5.832778 | 4.195389e-01 | False | -5.697931 | 9.086043e-03 | True | 5.339127 | 4.356248e-02 | True | 0.000000 | ... | False | -1.795784 | 3.877522e-01 | False | 0.000000 | 1.000000e+00 | False | 5.832778 | 3.246432e-01 | False |
sp|P07602|SAP_215 | -0.008382 | 4.586658e-01 | False | -0.016764 | 3.363550e-01 | False | -0.016764 | 3.485725e-01 | False | 0.000000 | ... | False | -0.008382 | 3.877522e-01 | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False |
sp|P01833|PIGR_186 | -0.008637 | 4.917509e-01 | False | 0.032629 | 5.341361e-01 | False | -0.034550 | 3.688132e-01 | False | -0.084454 | ... | False | 0.075817 | 4.506151e-01 | False | 0.000000 | 1.000000e+00 | False | 0.075817 | 4.776920e-01 | False |
sp|P00738|HPT_241 | 0.001039 | 5.165593e-01 | False | 0.005195 | 4.017684e-01 | False | 0.004156 | 4.017684e-01 | False | 0.001039 | ... | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False |
sp|P10909|CLUS_86 | 0.001587 | 5.165593e-01 | False | 0.007935 | 4.017684e-01 | False | 0.006348 | 4.017684e-01 | False | 0.000000 | ... | False | 0.001587 | 4.506151e-01 | False | 0.000000 | 1.000000e+00 | False | 0.001587 | 4.790272e-01 | False |
sp|P02788|TRFL_497 | 0.345354 | 5.925908e-01 | False | 1.804335 | 3.415734e-01 | False | 1.381416 | 4.581199e-01 | False | -11.595332 | ... | False | -0.304229 | 7.941176e-01 | False | 0.000000 | 1.000000e+00 | False | -14.451221 | 1.982223e-06 | True |
sp|P02788|TRFL_156 | -4.029533 | 5.925908e-01 | False | 3.516740 | 2.958771e-01 | False | -4.899001 | 5.275917e-02 | True | -0.165193 | ... | False | 4.741650 | 2.666697e-08 | True | 0.000000 | 1.000000e+00 | False | -3.864340 | 3.246432e-01 | False |
sp|P0C0L5|CO4B_HUMAN/sp|P0C0L4|CO4A | 0.000646 | 5.925908e-01 | False | 0.005811 | 5.302128e-01 | False | 0.001291 | 4.797163e-01 | False | 0.000000 | ... | False | 0.000000 | 1.000000e+00 | False | 0.000646 | 9.705882e-01 | False | 0.000646 | 5.502628e-01 | False |
sp|P01833|PIGR_469 | -4.319860 | 7.288590e-01 | False | -2.844144 | 2.981904e-01 | False | 2.497897 | 3.485725e-01 | False | 16.293140 | ... | False | 0.273577 | 7.941176e-01 | False | 0.000000 | 1.000000e+00 | False | 10.667323 | 7.917684e-03 | True |
sp|P01876|IGHA1_340 | 5.404028 | 7.288590e-01 | False | -0.747848 | 7.306030e-01 | False | -0.523444 | 6.448328e-01 | False | -6.757407 | ... | True | -1.901116 | 4.506151e-01 | False | 3.007233 | 9.705882e-01 | False | -1.767393 | 7.124186e-01 | False |
sp|P10909|CLUS_374 | -0.000942 | 7.288590e-01 | False | -0.004711 | 6.337904e-01 | False | -0.003769 | 6.337904e-01 | False | 0.000000 | ... | False | -0.001884 | 7.809203e-01 | False | 0.000000 | 1.000000e+00 | False | -0.000942 | 7.124186e-01 | False |
sp|P01591|IGJ_71 | 2.236056 | 7.376554e-01 | False | -1.364109 | 2.981904e-01 | False | 0.576360 | 4.024670e-01 | False | -1.475300 | ... | False | 0.527272 | 5.322575e-01 | False | 0.000000 | 1.000000e+00 | False | 1.397952 | 5.246436e-01 | False |
sp|P08571|CD14_151 | -0.000794 | 7.376554e-01 | False | -0.004761 | 6.717008e-01 | False | -0.001587 | 6.448328e-01 | False | 0.000000 | ... | False | 0.000000 | 1.000000e+00 | False | -0.000794 | 9.705882e-01 | False | -0.000794 | 7.128253e-01 | False |
sp|P01871|IGHM_46 | 0.000313 | 7.896314e-01 | False | 0.001567 | 7.306030e-01 | False | 0.001253 | 7.306030e-01 | False | 0.000000 | ... | False | 0.000313 | 7.941176e-01 | False | 0.000000 | 1.000000e+00 | False | 0.000313 | 7.647059e-01 | False |
sp|P02765|FETUA_156 | 0.000313 | 7.896314e-01 | False | 0.001563 | 7.431825e-01 | False | 0.001251 | 7.657032e-01 | False | 0.000313 | ... | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False |
sp|P02749|APOH_253 | 0.000138 | 7.896314e-01 | False | 0.000690 | 7.807170e-01 | False | 0.000552 | 7.807170e-01 | False | 0.000138 | ... | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False |
sp|P10909|CLUS_291 | 0.000169 | 8.188730e-01 | False | 0.000843 | 8.188730e-01 | False | 0.000506 | 8.188730e-01 | False | 0.000000 | ... | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False | 0.000169 | 8.188730e-01 | False |
sp|P01833|PIGR_499 | -2.354848 | 8.373113e-01 | False | -1.412526 | 6.076487e-01 | False | 3.270966 | 2.715501e-01 | False | -1.166694 | ... | False | 3.649581 | 8.212144e-02 | False | 0.000000 | 1.000000e+00 | False | -2.378288 | 7.128253e-01 | False |
sp|P07602|SAP_426 | 0.000803 | 8.570498e-01 | False | 0.004013 | 8.570498e-01 | False | 0.001605 | 8.570498e-01 | False | 0.000000 | ... | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False | 0.000803 | 8.570498e-01 | False |
sp|P02790|HEMO_453 | 0.000150 | 8.857313e-01 | False | 0.000750 | 8.857313e-01 | False | 0.000600 | 8.857313e-01 | False | 0.000150 | ... | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False | 0.000000 | 1.000000e+00 | False |
sp|P25311|ZA2G_109 | 0.002373 | 8.858231e-01 | False | 0.011864 | 8.858231e-01 | False | 0.009491 | 8.858231e-01 | False | -0.079942 | ... | False | 0.002373 | 8.858231e-01 | False | 0.000000 | 1.000000e+00 | False | 0.082314 | 8.203274e-01 | False |
sp|P0DOX2|IGA2_HUMAN/sp|P01877|IGHA2 | 0.687611 | 8.875324e-01 | False | 0.862922 | 5.341361e-01 | False | -0.427391 | 5.598894e-01 | False | -5.206276 | ... | False | -1.255648 | 3.877522e-01 | False | -2.187064 | 9.705882e-01 | False | -2.559791 | 3.246432e-01 | False |
sp|P01011|AACT_106 | 0.001222 | 9.024359e-01 | False | 0.006108 | 9.024359e-01 | False | 0.004886 | 9.024359e-01 | False | -2.853751 | ... | True | 2.856194 | 5.926258e-33 | True | 0.000000 | 1.000000e+00 | False | 2.854973 | 3.096444e-33 | True |
sp|P01877|IGHA2_327 | -0.197039 | 9.317804e-01 | False | -0.332926 | 9.003150e-01 | False | -0.394079 | 9.317804e-01 | False | 0.000000 | ... | False | 0.000000 | 1.000000e+00 | False | 4.492679 | 9.705882e-01 | False | -0.197039 | 9.317804e-01 | False |
sp|Q08431|MFGM_238 | 0.000435 | 9.985248e-01 | False | 0.115394 | 3.363550e-01 | False | -0.300049 | 2.614318e-01 | False | 0.000000 | ... | False | 0.000000 | 1.000000e+00 | False | 0.207069 | 9.705882e-01 | False | 0.000435 | 9.985248e-01 | False |
34 rows × 24 columns
annotate
extract curated motifs, graph features, and sequence features from glycan sequences
annotate_glycan
annotate_glycan (glycan, motifs=None, termini_list=[], gmotifs=None)
*searches for known motifs in glycan sequence
Arguments: |
---|
glycan (string or networkx): glycan in IUPAC-condensed format (or as networkx graph) |
motifs (dataframe): dataframe of glycan motifs (name + sequence), can be used with a list of glycans too; default:motif_list |
termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’) |
gmotifs (networkx): precalculated motif graphs for speed-up; default:None |
Returns: |
---|
Returns dataframe with counts of motifs in glycan* |
annotate_glycan("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
motif_name | Terminal_LewisX | Internal_LewisX | LewisY | SialylLewisX | SulfoSialylLewisX | Terminal_LewisA | Internal_LewisA | LewisB | SialylLewisA | SulfoLewisA | ... | Arabinogalactan_type1 | Galactomannan | Tetraantennary_Nglycan | Mucin_elongated_core2 | Fucoidan | Alginate | FG | XX | Difucosylated_core | GalFuc_core |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 rows × 156 columns
annotate_dataset
annotate_dataset (glycans, motifs=None, feature_set=['known'], termini_list=[], condense=False, custom_motifs=[])
*wrapper function to annotate motifs in list of glycans
Arguments: |
---|
glycans (list): list of IUPAC-condensed glycan sequences as strings |
motifs (dataframe): dataframe of glycan motifs (name + sequence); default:motif_list |
feature_set (list): which feature set to use for annotations, add more to list to expand; default is ‘known’; options are: ‘known’ (hand-crafted glycan features), |
termini_list (list): list of monosaccharide/linkage positions for motifs (from ‘terminal’, ‘internal’, and ‘flexible’) |
condense (bool): if True, throws away columns with only zeroes; default:False |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
Returns: |
---|
Returns dataframe of glycans (rows) and presence/absence of known motifs (columns)* |
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P']
print("Annotate Test")
out = annotate_dataset(glycans)
Annotate Test
motif_name | Terminal_LewisX | Internal_LewisX | LewisY | SialylLewisX | SulfoSialylLewisX | Terminal_LewisA | Internal_LewisA | LewisB | SialylLewisA | SulfoLewisA | H_type2 | H_type1 | A_antigen | B_antigen | Galili_antigen | GloboH | Gb5 | Gb4 | Gb3 | 3SGb3 | 8DSGb3 | 3SGb4 | 8DSGb4 | 6DSGb4 | 3SGb5 | 8DSGb5 | 6DSGb5 | 6DSGb5_2 | 6SGb3 | 8DSGb3_2 | 6SGb4 | 8DSGb4_2 | 6SGb5 | 8DSGb5_2 | 66DSGb5 | Forssman_antigen | iGb3 | I_antigen | i_antigen | PI_antigen | Chitobiose | Trimannosylcore | Internal_LacNAc_type1 | Terminal_LacNAc_type1 | Internal_LacNAc_type2 | Terminal_LacNAc_type2 | Internal_LacdiNAc_type1 | Terminal_LacdiNAc_type1 | Internal_LacdiNAc_type2 | Terminal_LacdiNAc_type2 | bisectingGlcNAc | VIM | PolyLacNAc | Ganglio_Series | Lacto_Series(LewisC) | NeoLacto_Series | betaGlucan | KeratanSulfate | Hyluronan | Mollu_series | Arthro_series | Cellulose_like | Chondroitin_4S | GPI_anchor | Isoglobo_series | LewisD | Globo_series | Sda | SDA | Muco_series | Heparin | Peptidoglycan | Dermatansulfate | CAD | Lactosylceramide | Lactotriaosylceramide | LexLex | GM3 | H_type3 | GM2 | GM1 | cisGM1 | VIM2 | GD3 | GD1a | GD2 | GD1b | SDLex | Nglycolyl_GM2 | Fuc_LN3 | GT1b | GD1 | GD1a_2 | LcGg4 | GT3 | Disialyl_T_antigen | GT1a | GT2 | GT1c | 2Fuc_GM1 | GQ1c | O_linked_mannose | GT1aa | GQ1b | HNK1 | GQ1ba | O_mannose_Lex | 2Fuc_GD1b | Sialopentaosylceramide | Sulfogangliotetraosylceramide | B-GM1 | GQ1aa | bisSulfo-Lewis x | para-Forssman | core_fucose | core_fucose(a1-3) | GP1c | B-GD1b | GP1ca | Isoglobotetraosylceramide | polySia | high_mannose | Gala_series | LPS_core | Nglycan_complex | Nglycan_complex2 | Oglycan_core1 | Oglycan_core2 | Oglycan_core3 | Oglycan_core4 | Oglycan_core5 | Oglycan_core6 | Oglycan_core7 | Xylogalacturonan | Sialosylparagloboside | LDNF | OFuc | Arabinogalactan_type2 | EGF_repeat | Nglycan_hybrid | Arabinan | Xyloglucan | Acharan_Sulfate | M3FX | M3X | 1-6betaGalactan | Arabinogalactan_type1 | Galactomannan | Tetraantennary_Nglycan | 
Mucin_elongated_core2 | Fucoidan | Alginate | FG | XX | Difucosylated_core | GalFuc_core |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
quantify_motifs
quantify_motifs (df, glycans, feature_set, custom_motifs=[], remove_redundant=True)
*Extracts and quantifies motifs for a dataset
Arguments: |
---|
df (dataframe): dataframe containing relative abundances (each sample one column) [alternative: filepath to .csv or .xlsx] |
glycans(list): glycans as IUPAC-condensed strings |
feature_set (list): which feature set to use for annotations, add more to list to expand; default is [‘exhaustive’,‘known’]; options are: ‘known’ (hand-crafted glycan features), |
custom_motifs (list): list of glycan motifs, used if feature_set includes ‘custom’; default:empty |
remove_redundant (bool): whether to remove redundant motifs via clean_up_heatmap; default:True |
Returns: |
---|
Returns a pandas DataFrame with motifs as columns and samples as rows* |
quantify_motifs(test_df.iloc[:, 1:], test_df.iloc[:, 0].values.tolist(), ['known', 'exhaustive'])
| control_1 | tumor_1 | control_2 | tumor_2 | control_3 | tumor_3 | control_4 | tumor_4 | control_5 | tumor_5 | ... | control_16 | tumor_16 | control_17 | tumor_17 | control_18 | tumor_18 | control_19 | tumor_19 | control_20 | tumor_20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Neu5Ac(a2-8)Neu5Ac | 0.084745 | 0.120050 | 0.388219 | 0.055402 | 0.279696 | 0.082135 | 0.369784 | 0.022555 | 0.080158 | 0.084913 | ... | 0.485839 | 0.629202 | 0.535171 | 0.637019 | 0.245015 | 0.127952 | 0.029853 | 0.022643 | 0.219166 | 0.331947 |
GalOS(b1-3)GalNAc | 0.843710 | 1.185047 | 2.152084 | 0.687093 | 1.564450 | 0.381914 | 2.389590 | 0.533142 | 2.497482 | 0.338889 | ... | 2.066978 | 1.088630 | 1.462826 | 2.259636 | 1.687785 | 1.137672 | 0.024033 | 0.117449 | 1.972512 | 1.304717 |
H_type2 | 1.347737 | 0.892651 | 2.468405 | 1.810795 | 1.589162 | 0.449339 | 2.640132 | 0.572828 | 2.763890 | 0.737076 | ... | 1.070249 | 0.647786 | 1.440912 | 1.810304 | 1.722289 | 1.475260 | 4.847788 | 4.552496 | 0.480035 | 0.494123 |
GlcNAc6S(b1-6)GalNAc | 2.707913 | 4.438043 | 6.198123 | 6.684838 | 1.478960 | 11.921934 | 0.892356 | 3.821469 | 4.605009 | 28.210391 | ... | 6.241593 | 11.157860 | 7.997660 | 4.916252 | 0.937290 | 15.269626 | 1.463159 | 0.565249 | 1.251077 | 2.680253 |
Terminal_LacNAc_type2 | 8.845085 | 10.063160 | 13.435501 | 28.834006 | 5.585973 | 11.359659 | 11.672584 | 21.193308 | 12.734919 | 28.597709 | ... | 10.883437 | 17.991155 | 21.166792 | 16.161351 | 11.909325 | 29.924308 | 12.820872 | 19.107379 | 8.802443 | 10.268911 |
Terminal_LacNAc_type2 | 52.982192 | 13.183951 | 24.413523 | 12.870782 | 9.555884 | 9.822266 | 12.628910 | 13.916662 | 26.569737 | 10.733867 | ... | 18.779972 | 12.157928 | 14.828507 | 20.879287 | 27.689619 | 10.734756 | 28.328965 | 37.870847 | 14.835019 | 8.910804 |
Disialyl_T_antigen | 20.803836 | 36.895471 | 32.803297 | 20.401157 | 33.971366 | 30.150599 | 37.703636 | 24.728411 | 31.798990 | 15.989214 | ... | 46.337629 | 39.476930 | 39.087708 | 40.348217 | 35.791797 | 22.968160 | 11.026029 | 2.613718 | 44.676379 | 46.125360 |
Neu5Ac(a2-6)GalNAc | 23.063482 | 39.304399 | 36.644881 | 22.263129 | 36.571122 | 31.229766 | 41.628644 | 26.256121 | 37.088978 | 17.054227 | ... | 50.675599 | 41.982557 | 42.829042 | 46.391984 | 38.682564 | 25.118814 | 11.540028 | 2.937334 | 47.171520 | 48.274238 |
Oglycan_core1 | 37.329013 | 75.567842 | 59.998893 | 57.608119 | 83.293693 | 78.436161 | 73.308916 | 64.356888 | 58.197862 | 60.329536 | ... | 68.269613 | 68.762287 | 62.541874 | 60.699726 | 58.713271 | 58.203265 | 58.826129 | 42.904325 | 74.390026 | 79.515568 |
Neu5Ac(a2-3)Gal | 57.345927 | 94.670033 | 83.675402 | 103.574200 | 91.775344 | 106.231617 | 90.136699 | 98.461821 | 81.110136 | 117.087919 | ... | 97.928245 | 109.749014 | 101.760261 | 93.222423 | 86.403840 | 96.715461 | 80.029183 | 69.040921 | 95.565848 | 99.973512 |
Mucin_elongated_core2 | 61.827277 | 23.247111 | 37.849024 | 41.704788 | 15.141858 | 21.181925 | 24.301494 | 35.109970 | 39.304656 | 39.331576 | ... | 29.663409 | 30.149083 | 35.995300 | 37.040638 | 39.598944 | 40.659064 | 41.149838 | 56.978227 | 23.637462 | 19.179715 |
Neu5Ac | 80.494155 | 134.094482 | 120.708503 | 125.892731 | 128.626161 | 137.543517 | 132.135127 | 124.740497 | 118.279272 | 134.227059 | ... | 149.089683 | 152.360772 | 145.124475 | 140.251427 | 125.331418 | 121.962226 | 91.599064 | 72.000898 | 142.956534 | 148.579697 |
Gal(b1-3)GalNAc | 99.156290 | 98.814953 | 97.847916 | 99.312907 | 98.435550 | 99.618086 | 97.610410 | 99.466858 | 97.502518 | 99.661111 | ... | 97.933022 | 98.911370 | 98.537174 | 97.740364 | 98.312215 | 98.862328 | 99.975967 | 99.882551 | 98.027488 | 98.695283 |
GalNAc | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | ... | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
Gal | 163.691481 | 126.500106 | 141.895063 | 147.702533 | 115.056369 | 132.721945 | 122.804259 | 138.398297 | 141.412183 | 167.203077 | ... | 133.838024 | 140.218313 | 142.530133 | 139.697255 | 138.848449 | 154.791018 | 142.588964 | 157.426027 | 122.916027 | 120.555251 |
15 rows × 40 columns
get_k_saccharides
get_k_saccharides (glycans, size=2, up_to=False, just_motifs=False, terminal=False)
*function to retrieve k-saccharides (default:disaccharides) occurring in a list of glycans
Arguments: |
---|
glycans (list): list of glycans in IUPAC-condensed nomenclature |
size (int): number of monosaccharides per k-saccharide; default:2 (for disaccharides) |
up_to (bool): in theory, include all saccharides up to size k; in practice, include monosaccharides; default:False |
just_motifs (bool): if you only want the motifs as a nested list, no dataframe with counts; default:False |
terminal (bool): whether to only count terminal subgraphs; default:False |
Returns: |
---|
Returns dataframe with k-saccharide counts (columns) for each glycan (rows)* |
glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcN4P(b1-6)GlcN4P']
out = get_k_saccharides(glycans, size = 3)
| GalNAc(a1-4)GlcNAcA(a1-4)Kdo | GlcN(b1-7)Kdo(a2-5)Kdo | GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAcA(a1-4)Kdo(a2-5)Kdo | GlcNAcA(a1-4)[GlcN(b1-7)]Kdo | Kdo(a2-4)Kdo(a2-6)GlcN4P | Kdo(a2-5)Kdo(a2-6)GlcN4P | Kdo(a2-5)[Kdo(a2-4)]Kdo | Kdo(a2-6)GlcN4P(b1-6)GlcN4P | Kdo(a2-?)Kdo(a2-?)GlcN4P | Man(a1-2)Man(a1-2)Man | Man(a1-2)Man(a1-3)Man | Man(a1-3)Man(a1-6)Man | Man(a1-3)Man(b1-4)GlcNAc | Man(a1-3)[Man(a1-6)]Man | Man(a1-3)[Xyl(b1-2)]Man | Man(a1-6)Man(b1-4)GlcNAc | Man(a1-6)[Xyl(b1-2)]Man | Man(a1-?)Man(a1-?)Man | Man(a1-?)Man(b1-?)GlcNAc | Man(a1-?)[Xyl(b1-?)]Man | Man(b1-4)GlcNAc(b1-4)GlcNAc | Xyl(b1-2)Man(b1-4)GlcNAc |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 2 | 2 | 1 | 1 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 1 | 0 | 0 | 1 | 0 | 4 | 2 | 0 | 1 | 0 |
2 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
get_terminal_structures
get_terminal_structures (glycan, size=1)
*returns terminal structures from all non-reducing ends (monosaccharide+linkage)
Arguments: |
---|
glycan (string or networkx): glycan in IUPAC-condensed nomenclature or as networkx graph |
size (int): how large the extracted motif should be in terms of monosaccharides (for now, 1 or 2 are supported); default:1 |
Returns: |
---|
Returns a list of terminal structures (strings)* |
get_terminal_structures("Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc")
['Neu5Ac(a2-3)', 'Neu5Ac(a2-6)']
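The idea behind the size=1 case can be illustrated without glycowork. The following is only a minimal sketch, not the library's implementation: `terminal_residues` is a hypothetical helper, and it assumes a bracketed IUPAC-condensed string without floating substituents, in which every non-reducing end sits either at the start of the string or directly after an opening `[`.

```python
import re

def terminal_residues(glycan):
    """Hypothetical helper: terminal monosaccharides together with their linkage.

    Assumption: a residue is terminal (non-reducing end) exactly when it is
    the first residue of the string or the first residue after a '['.
    """
    # One residue name followed by its parenthesised linkage,
    # anchored at the string start or right after an opening bracket.
    pattern = r"(?:^|\[)([^()\[\]]+\([^()]+\))"
    return re.findall(pattern, glycan)

glycan = ("Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)"
          "[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]"
          "Man(b1-4)GlcNAc(b1-4)GlcNAc")
print(terminal_residues(glycan))
```

For the glycan used above, this sketch yields the same two terminal structures as the documented call.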
get_molecular_properties
get_molecular_properties (glycan_list, verbose=False, placeholder=False)
*given a list of glycans, uses pubchempy to return various molecular parameters retrieved from PubChem
Arguments: |
---|
glycan_list (list): list of glycans in IUPAC-condensed |
verbose (bool): set True to print SMILES not found on PubChem; default:False |
placeholder (bool): whether failed requests should return dummy values or be dropped; default:False |
Returns: |
---|
Returns a dataframe with all the molecular parameters retrieved from PubChem* |
out = get_molecular_properties(["Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"])
| h_bond_acceptor_count | molecular_weight | atom_stereo_count | rotatable_bond_count | undefined_bond_stereo_count | complexity | defined_atom_stereo_count | exact_mass | h_bond_donor_count | xlogp | tpsa | undefined_atom_stereo_count | monoisotopic_mass | isotope_atom_count | defined_bond_stereo_count | charge | covalent_unit_count | heavy_atom_count | bond_stereo_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | 62 | 2224.0 | 57 | 43 | 0 | 4410 | 56 | 2222.7830048 | 39 | -23.600000 | 1070 | 1 | 2222.7830048 | 0 | 0 | 0 | 1 | 152 | 0 |
graph
converts glycan sequences to graphs and contains helper functions to search for motifs, check whether two sequences describe the same glycan, etc.
glycan_to_nxGraph
glycan_to_nxGraph (glycan, libr=None, termini='ignore', termini_list=None)
*wrapper for converting glycans into networkx graphs; also works with floating substituents
Arguments: |
---|
glycan (string): glycan in IUPAC-condensed format |
libr (dict): dictionary of form glycoletter:index |
termini (string): whether to encode terminal/internal position of monosaccharides, ‘ignore’ for skipping, ‘calc’ for automatic annotation, or ‘provided’ if this information is provided in termini_list; default:‘ignore’ |
termini_list (list): list of monosaccharide positions (from ‘terminal’, ‘internal’, and ‘flexible’) |
Returns: |
---|
Returns networkx graph object of glycan* |
print('Glycan to networkx Graph (only edges printed)')
print(glycan_to_nxGraph('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc').edges())
Glycan to networkx Graph (only edges printed)
[(0, 1), (1, 4), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 10), (8, 9), (9, 10)]
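The edge list above suggests that every glycoletter (monosaccharide or linkage) becomes one node, numbered in string order, and that each linkage node connects the residue before it to its parent residue. The sketch below reproduces that numbering with the standard library only. `glycan_edges` is a hypothetical helper written for illustration; it does not build a networkx graph, attach node labels, or handle floating substituents.

```python
import re

def glycan_edges(glycan):
    """Hypothetical helper: edges between glycoletters numbered in string order."""
    edges = []
    pending = [[]]  # stack: linkage nodes still waiting for their parent residue
    idx = 0
    for m in re.finditer(r"\[|\]|\(([^()]+)\)|([^()\[\]]+)", glycan):
        token = m.group(0)
        if token == "[":
            # A branch starts: its linkages wait in their own frame.
            pending.append([])
        elif token == "]":
            # The branch ends: its open linkage attaches to the same parent
            # as the linkages waiting one level below.
            finished = pending.pop()
            pending[-1].extend(finished)
        elif m.group(1) is not None:
            # Linkage: connect it to the residue written just before it.
            edges.append((idx - 1, idx))
            pending[-1].append(idx)
            idx += 1
        else:
            # Monosaccharide: it is the parent of every waiting linkage.
            edges.extend((link, idx) for link in pending[-1])
            pending[-1].clear()
            idx += 1
    return sorted(edges)

print(glycan_edges('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
```

The stack keeps branch linkages apart until the closing bracket, which is why both `a1-3` (node 1) and `a1-6` (node 3) end up attached to the branching Man at node 4.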
graph_to_string
graph_to_string (graph)
*converts glycan graph back to IUPAC-condensed format
Assumptions: 1. The root node is the one with the highest index.
Arguments: |
---|
graph (networkx object): glycan graph |
Returns: |
---|
Returns glycan in IUPAC-condensed format (string)* |
graph_to_string(glycan_to_nxGraph('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
compare_glycans
compare_glycans (glycan_a, glycan_b)
*returns True if glycans are the same and False if not
Arguments: |
---|
glycan_a (string or networkx object): glycan in IUPAC-condensed format or as a precomputed networkx object |
glycan_b (string or networkx object): glycan in IUPAC-condensed format or as a precomputed networkx object |
Returns: |
---|
Returns True if two glycans are the same and False if not* |
print("Graph Isomorphism Test")
print(compare_glycans('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
Graph Isomorphism Test
True
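The example shows that two strings with swapped branch order describe the same glycan. The test label indicates that compare_glycans relies on graph isomorphism; the sketch below deliberately swaps in a simpler technique, a canonical tree string with sorted branches, to illustrate why branch order does not matter. `parse`, `canonical`, and `same_glycan` are hypothetical names, and the sketch assumes fully defined linkages (no `?` wildcards) and no floating substituents.

```python
def parse(glycan):
    """Parse IUPAC-condensed text into (residue, [(linkage, subtree), ...])."""
    # The root residue is whatever follows the last ')' or ']'.
    cut = max(glycan.rfind(")"), glycan.rfind("]")) + 1
    name, rest = glycan[cut:], glycan[:cut]
    children = []
    while rest:
        if rest.endswith("]"):
            # Walk left to the '[' that matches the trailing ']'.
            depth = 0
            for start in range(len(rest) - 1, -1, -1):
                depth += {"]": 1, "[": -1}.get(rest[start], 0)
                if depth == 0:
                    break
            branch, rest = rest[start + 1:-1], rest[:start]
        else:
            # No bracket left: the remainder is the main chain.
            branch, rest = rest, ""
        open_paren = branch.rfind("(")
        linkage = branch[open_paren + 1:-1]
        children.append((linkage, parse(branch[:open_paren])))
    return name, children

def canonical(tree):
    """Serialise a tree with branches sorted, so branch order is irrelevant."""
    name, children = tree
    parts = sorted(f"{canonical(sub)}({link})" for link, sub in children)
    return "".join(f"[{part}]" for part in parts) + name

def same_glycan(glycan_a, glycan_b):
    return canonical(parse(glycan_a)) == canonical(parse(glycan_b))

print(same_glycan('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
                  'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
```

Unlike a graph-based comparison, this string canonicalisation cannot match ambiguous linkages against specific ones; it only demonstrates the branch-order invariance.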
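subgraph_isomorphism
subgraph_isomorphism (glycan, motif, *args, **kwargs)
*rendered by the documentation tool as handle_negation.<locals>.wrapper; the example below calls it as subgraph_isomorphism and tests whether motif occurs in glycan*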
print("Subgraph Isomorphism Test")
print(subgraph_isomorphism('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
'Fuc(a1-6)GlcNAc'))
Subgraph Isomorphism Test
True
generate_graph_features
generate_graph_features (glycan, glycan_graph=True, label='network')
*compute graph features of glycan
Arguments: |
---|
glycan (string or networkx object): glycan in IUPAC-condensed format (or glycan network if glycan_graph=False) |
glycan_graph (bool): True expects a glycan, False expects a network (from construct_network); default:True |
label (string): Label to place in output dataframe if glycan_graph=False; default:‘network’ |
Returns: |
---|
Returns a pandas dataframe with different graph features as columns and glycan as row* |
generate_graph_features("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
| diameter | branching | nbrLeaves | avgDeg | varDeg | maxDeg | nbrDeg4 | max_deg_leaves | mean_deg_leaves | deg_assort | ... | flow_edgeMax | flow_edgeMin | flow_edgeAvg | flow_edgeVar | secorderMax | secorderMin | secorderAvg | secorderVar | egap | entropyStation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc | 8 | 1 | 3 | 1.818182 | 0.330579 | 3.0 | 0 | 3.0 | 3.0 | -1.850372e-15 | ... | 0.333333 | 0.111111 | 0.217778 | 0.007289 | 45.607017 | 20.736441 | 31.679285 | 62.422895 | 0.060159 | -2.374318 |
1 rows × 49 columns
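Several of the simpler columns can be checked by hand from the edge list that glycan_to_nxGraph prints for the same glycan. The sketch below is an illustration under assumptions, not the library's code: it takes `branching` to mean the number of nodes with degree above 2, `nbrLeaves` the number of degree-1 nodes, and `varDeg` the population variance of the degrees. These definitions were chosen because they reproduce the values in the table.

```python
from collections import defaultdict, deque

# Edge list printed by glycan_to_nxGraph for
# 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
edges = [(0, 1), (1, 4), (2, 3), (3, 4), (4, 5),
         (5, 6), (6, 7), (7, 10), (8, 9), (9, 10)]

adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
nodes = sorted(adj)

degrees = [len(adj[n]) for n in nodes]
avg_deg = sum(degrees) / len(degrees)
var_deg = sum((d - avg_deg) ** 2 for d in degrees) / len(degrees)
max_deg = max(degrees)
nbr_leaves = sum(d == 1 for d in degrees)   # assumed definition
branching = sum(d > 2 for d in degrees)     # assumed definition

def eccentricity(source):
    """Longest shortest-path distance from source, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())

diameter = max(eccentricity(n) for n in nodes)

print(diameter, branching, nbr_leaves, round(avg_deg, 6), round(var_deg, 6), max_deg)
```

Because linkages are nodes too, the diameter of 8 counts both residues and linkages along the longest path (from a terminal Man to the core Fuc).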
largest_subgraph
largest_subgraph (glycan_a, glycan_b)
*find the largest common subgraph of two glycans
Arguments: |
---|
glycan_a (string or networkx): glycan in IUPAC-condensed format or as networkx graph |
glycan_b (string or networkx): glycan in IUPAC-condensed format or as networkx graph |
Returns: |
---|
Returns the largest common subgraph as a string in IUPAC-condensed; returns empty string if there is no common subgraph* |
glycan1 = 'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
glycan2 = 'Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
largest_subgraph(glycan1, glycan2)
'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
ensure_graph
ensure_graph (glycan, **kwargs)
*ensures function compatibility with string glycans and graph glycans
Arguments: |
---|
glycan (string or networkx graph): glycan in IUPAC-condensed format or as a networkx graph |
**kwargs: keyword arguments that are directly passed on to glycan_to_nxGraph |
Returns: |
---|
Returns networkx graph object of glycan* |
ensure_graph("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
<networkx.classes.graph.Graph>
get_possible_topologies
get_possible_topologies (glycan, exhaustive=False, allowed_disaccharides=None, modification_map={'6S': {'GlcNAc', 'Gal'}, '3S': {'Gal'}, '4S': {'GalNAc'}, 'OS': {'GalNAc', 'GlcNAc', 'Gal'}})
*creates possible glycans given a floating substituent; only works with max one floating substituent
Arguments: |
---|
glycan (string or networkx): glycan in IUPAC-condensed format or as networkx graph |
exhaustive (bool): whether to also allow additions at internal positions; default:False |
allowed_disaccharides (set): disaccharides that are permitted when creating possible glycans; default:not used |
Returns: |
---|
Returns list of NetworkX-like glycan graphs of possible topologies* |
possible_topology_check
possible_topology_check (glycan, glycans, exhaustive=False, **kwargs)
*checks whether glycan with floating substituent could match glycans from a list; only works with max one floating substituent
Arguments: |
---|
glycan (string or networkx): glycan in IUPAC-condensed format (or as networkx graph) that has to contain a floating substituent |
glycans (list): list of glycans in IUPAC-condensed format (or networkx graphs; should not contain floating substituents) |
exhaustive (bool): whether to also allow additions at internal positions; default:False |
**kwargs: keyword arguments that are directly passed on to compare_glycans |
Returns: |
---|
Returns list of glycans that could match input glycan* |
possible_topology_check("{Neu5Ac(a2-3)}Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc",
                        ["Fuc(a1-2)Gal(b1-3)GalNAc", "Neu5Ac(a2-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc",
                         "Neu5Ac(a2-6)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc"])
['Neu5Ac(a2-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc']
deduplicate_glycans
deduplicate_glycans (glycans)
*removes duplicate glycans from a list/set, even if they have different strings
Arguments: |
---|
glycans (list or set): glycans in IUPAC-condensed format |
Returns: |
---|
Returns deduplicated list of glycans* |
deduplicate_glycans(["Fuc(a1-2)Gal(b1-3)GalNAc", "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Neu5Ac(a2-3)Gal(b1-3)]GalNAc",
                     "Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)]GalNAc", "Neu5Ac(a2-6)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc"])
['Neu5Ac(a2-6)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc',
'Fuc(a1-2)Gal(b1-3)GalNAc',
'Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-6)[Neu5Ac(a2-3)Gal(b1-3)]GalNAc']
processing
process IUPAC-condensed glycan sequences into glycoletters etc.
min_process_glycans
min_process_glycans (glycan_list)
*converts a list of glycans into a nested list of glycoletters
Arguments: |
---|
glycan_list (list): list of glycans in IUPAC-condensed format as strings |
Returns: |
---|
Returns list of glycoletter lists* |
min_process_glycans(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
                     'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
[['Man', 'a1-3', 'Man', 'a1-6', 'Man', 'b1-4', 'GlcNAc', 'b1-4', 'GlcNAc'],
['Man',
'a1-2',
'Man',
'a1-3',
'Man',
'a1-6',
'Man',
'b1-4',
'GlcNAc',
'b1-4',
'GlcNAc']]
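The output above amounts to splitting each sequence at its parentheses and brackets. A minimal standard-library sketch of that idea follows; `split_glycoletters` is a hypothetical name, and the sketch assumes plain IUPAC-condensed input without floating substituents in curly braces.

```python
import re

def split_glycoletters(glycan):
    """Hypothetical helper: monosaccharides and linkages in string order."""
    # Parentheses delimit linkages and brackets delimit branches,
    # so splitting on all four characters leaves only glycoletters.
    return [token for token in re.split(r"[()\[\]]", glycan) if token]

glycans = ['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc']
print([split_glycoletters(g) for g in glycans])
```

Note that the branching information carried by the brackets is discarded in this flat representation.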
get_lib
get_lib (glycan_list)
*returns dictionary of form glycoletter:index
Arguments: |
---|
glycan_list (list): list of IUPAC-condensed glycan sequences as strings |
Returns: |
---|
Returns dictionary of form glycoletter:index* |
get_lib(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
         'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
{'GlcNAc': 0, 'Man': 1, 'a1-2': 2, 'a1-3': 3, 'a1-6': 4, 'b1-4': 5}
expand_lib
expand_lib (libr, glycan_list)
*updates libr with newly introduced glycoletters
Arguments: |
---|
libr (dict): dictionary of form glycoletter:index |
glycan_list (list): list of IUPAC-condensed glycan sequences as strings |
Returns: |
---|
Returns new lib* |
lib1 = get_lib(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
                'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
lib2 = expand_lib(lib1, ['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'])
lib2
{'GlcNAc': 0, 'Man': 1, 'a1-2': 2, 'a1-3': 3, 'a1-6': 4, 'b1-4': 5, 'Fuc': 6}
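The two outputs are consistent with a library that indexes the sorted unique glycoletters and appends previously unseen ones at the end. The sketch below reproduces that behaviour under exactly this assumption; `build_lib` and `extend_lib` are hypothetical stand-ins for illustration and are not claimed to mirror how get_lib and expand_lib are implemented.

```python
import re

def glycoletters(glycan):
    # Split at parentheses and brackets; drop the empty fragments.
    return [token for token in re.split(r"[()\[\]]", glycan) if token]

def build_lib(glycans):
    """Assumed behaviour: index the sorted set of glycoletters."""
    letters = sorted({t for g in glycans for t in glycoletters(g)})
    return {letter: index for index, letter in enumerate(letters)}

def extend_lib(lib, glycans):
    """Assumed behaviour: keep existing indices, append unseen glycoletters."""
    new_lib = dict(lib)  # leave the input dictionary untouched
    unseen = sorted({t for g in glycans for t in glycoletters(g)} - set(lib))
    for letter in unseen:
        new_lib[letter] = len(new_lib)
    return new_lib

lib1 = build_lib(['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
                  'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc'])
lib2 = extend_lib(lib1, ['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'])
print(lib1)
print(lib2)
```

Appending rather than re-sorting keeps the indices of existing glycoletters stable, which matters when a model or encoding was built against the earlier library.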
presence_to_matrix
presence_to_matrix (df, glycan_col_name='glycan', label_col_name='Species')
*converts a dataframe such as df_species to absence/presence matrix
Arguments: |
---|
df (dataframe): dataframe with glycan occurrence, rows are glycan-label pairs |
glycan_col_name (string): column name under which glycans are stored; default:glycan |
label_col_name (string): column name under which labels are stored; default:Species |
Returns: |
---|
Returns pandas dataframe with labels as rows and glycan occurrences as columns* |
out = presence_to_matrix(df_species[df_species.Order == 'Fabales'].reset_index(drop = True),
                         label_col_name = 'Family')
glycan | Apif(a1-2)Xyl(b1-2)[Glc6Ac(b1-4)]Glc | Ara(a1-2)Ara(a1-6)GlcNAc | Ara(a1-2)Glc(b1-2)Ara | Ara(a1-2)GlcA | Ara(a1-2)[Glc(b1-6)]Glc | Ara(a1-6)Glc | Araf(a1-3)Araf(a1-5)[Araf(a1-6)Gal(b1-6)Glc(b1-6)Man(a1-3)]Araf(a1-5)Araf(a1-3)Araf(a1-3)Araf | Araf(a1-3)Gal(b1-6)Gal | D-Apif(b1-2)Glc | D-Apif(b1-2)GlcA | D-Apif(b1-3)Xyl(b1-2)[Glc6Ac(b1-4)]Glc | D-Apif(b1-3)Xyl(b1-4)Rha(a1-2)Ara | D-Apif(b1-3)Xyl(b1-4)Rha(a1-2)D-Fuc | D-Apif(b1-3)Xyl(b1-4)[Glc(b1-3)]Rha(a1-2)D-Fuc | D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-2)D-Fuc | D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-2)[Rha(a1-3)]D-Fuc | D-Apif(b1-3)[Gal(b1-4)Xyl(b1-4)]Rha(a1-3)D-Fuc | D-Apif(b1-6)Glc | D-ApifOMe(b1-3)XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe | D-ApifOMe(b1-3)XylOMe(b1-4)[GlcOMe(b1-3)]RhaOMe(a1-2)D-FucOMe | Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc4Ac6Ac(b1-3)]Glc | Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc4Ac6Ac(b1-3)]Glc6Ac | Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc6Ac(b1-3)]Glc | Fruf(a2-1)[Glc(b1-2)][Glc(b1-3)Glc6Ac(b1-3)]Glc6Ac | Fruf(b2-1)Glc3Ac6Ac | Fruf(b2-1)Glc4Ac6Ac | Fruf(b2-1)Glc6Ac | Fruf(b2-1)[Glc(b1-2)]Glc | Fruf(b2-1)[Glc(b1-2)][Glc(b1-3)Glc(b1-3)]Glc | Fruf(b2-1)[Glc(b1-2)][Glc(b1-3)]Glc6Ac | Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc(b1-3)]Glc | Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc(b1-3)]Glc6Ac | Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc6Ac(b1-3)]Glc | Fruf(b2-1)[Glc(b1-2)][Glc(b1-4)Glc6Ac(b1-3)]Glc6Ac | Fruf(b2-1)[Glc(b1-2)][Glc6Ac(b1-3)]Glc | Fruf(b2-1)[Glc(b1-2)][Glc6Ac(b1-3)]Glc6Ac | Fruf(b2-1)[Glc(b1-4)Glc6Ac(b1-3)]Glc6Ac | Fruf(b2-1)[Glc3Ac(b1-2)]Glc | Fruf(b2-1)[Glc6Ac(b1-2)]Glc | Fruf1Ac(b2-1)Glc2Ac4Ac6Ac | Fuc(a1-2)Gal(b1-2)Xyl(a1-6)Glc | Fuc(a1-2)Gal(b1-2)Xyl(a1-6)Glc(b1-4)Glc | Fuc(a1-2)Gal(b1-2)Xyl(a1-6)[Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)]Glc(b1-4)Glc | Fuc(a1-2)Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)Glc | Fuc(a1-2)Gal(b1-4)Xyl | Fuc(a1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | 
Fuc(a1-6)GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(?1-?)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Man(a1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(?1-?)[Gal(?1-?)]GlcNAc(?1-?)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(?1-?)Man(a1-3)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(a1-4)Gal | Gal(a1-6)Gal | Gal(a1-6)Gal(a1-6)Gal | Gal(a1-6)Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf | Gal(a1-6)Gal(a1-6)Gal(a1-6)Gal(a1-6)[Fruf(b2-1)]Glc | Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc | Gal(a1-6)Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf | Gal(a1-6)Gal(a1-6)Glc | Gal(a1-6)Gal(a1-6)Glc(a1-2)Fruf | Gal(a1-6)Glc(a1-2)Fruf | Gal(a1-6)Man | Gal(a1-6)Man(b1-4)Man | Gal(a1-6)Man(b1-4)Man(b1-4)Man(b1-4)Man | Gal(a1-6)Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man | Gal(a1-6)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man | Gal(a1-6)Man(b1-4)[Gal(a1-6)]Man | Gal(b1-2)GlcA | Gal(b1-2)GlcA6Me | Gal(b1-2)Xyl(a1-6)Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Gal(b1-2)Xyl(a1-6)[Glc(b1-4)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc | Gal(b1-2)[Xyl(b1-3)]GlcA | Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[Gal(b1-3)GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | 
Gal(b1-3)GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)GlcNAc(b1-4)Man(a1-6)[GlcNAc(b1-4)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[Gal(b1-3)GlcNAc(b1-2)Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[GlcNAc(b1-2)Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-3)[Fuc(a1-6)]GlcNAc(b1-2)[Man(a1-6)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Gal(b1-4)Gal(b1-4)Man | Gal(b1-4)Gal(b1-4)ManOMe | Gal(b1-4)GlcA | Gal(b1-4)GlcNAc(b1-2)[Gal(b1-4)GlcNAc(b1-4)]Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)[Gal(b1-4)GlcNAc(b1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Gal(b1-4)Man(b1-4)Man | Gal(b1-4)Man(b1-4)Man(b1-4)Gal | Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc | Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc1CoumOMe | Gal(b1-4)Xyl(b1-4)Rha(a1-2)D-Fuc1FerOMe | Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-Fuc | Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-Fuc1CoumOMe | Gal(b1-4)Xyl(b1-4)Rha(a1-2)[Rha(a1-3)]D-FucOMeOSin | 
Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)D-Fuc | Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)D-Fuc1CoumOMe | Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)[Rha(a1-3)]D-Fuc | Gal(b1-4)Xyl(b1-4)[D-Apif(b1-3)]Rha(a1-2)[Rha(a1-3)]D-Fuc1CoumOMe | Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-6)[GlcNAc(b1-2)Man(a1-3)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc | GalA(a1-2)[Araf(a1-5)Araf(a1-4)]Rha(b1-4)GalA | GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-2)Rha(a1-4)GalA(a1-2)Rha(a1-4)GalA(a1-2)GalA | GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA(a1-4)GalA | GalOMe(b1-2)[XylOMe(b1-3)]GlcAOMe | GalOMe(b1-4)XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe | GalOMe(b1-4)XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-3)]D-FucOMe | GalOMe(b1-4)XylOMe(b1-4)[D-ApifOMe(b1-3)]RhaOMe(a1-2)[RhaOMe(a1-3)]D-FucOMe | Galf(b1-2)[Galf(b1-4)]Man | Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Glc(a1-2)Glc(a1-3)Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Glc(a1-2)Rha(a1-6)Glc | Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Glc(a1-3)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Glc(a1-4)Glc(a1-2)Rha(a1-6)Glc | Glc(a1-4)Glc(a1-4)Glc(a1-6)Glc | Glc(a1-4)Glc(a1-4)GlcA | Glc(a1-4)GlcA(b1-2)GlcA | Glc(b1-2)Ara | Glc(b1-2)Ara(a1-2)GlcA | Glc(b1-2)Gal(b1-2)Gal(b1-2)GlcA | Glc(b1-2)Gal(b1-2)GlcA | Glc(b1-2)Gal(b1-2)GlcA(b1-3)[Glc(b1-3)]Ara | Glc(b1-2)Glc | Glc(b1-2)Glc(a1-2)FrufOBzOCin | Glc(b1-2)Glc(b1-2)Glc | Glc(b1-2)GlcA | Glc(b1-2)[Ara(a1-3)]GlcA6Me | Glc(b1-2)[Ara(a1-3)]GlcAOMe | 
Glc(b1-2)[Ara(a1-6)]Glc | Glc(b1-2)[Glc(b1-3)]Glc(a1-2)Fruf | Glc(b1-2)[Glc(b1-3)]Glc1Fer6Ac(a1-2)Fruf1FerOBz | Glc(b1-2)[Glc6Ac(b1-3)]Glc1Fer(a1-2)Fruf1FerOBz | Glc(b1-2)[Rha(a1-3)]GlcA | Glc(b1-2)[Xyl(b1-2)Ara(a1-6)]Glc | Glc(b1-2)[Xyl(b1-2)D-Fuc(b1-6)]Glc | Glc(b1-3)Ara | Glc(b1-3)Glc | Glc(b1-3)Glc(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf | Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf | Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1CoumOBz | Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1CoumOBz | Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz | Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz | Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz | Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)][Rha(a1-4)]Glc1Coum6Ac(a1-2)Fruf1CoumOBz | Glc(b1-3)Glc6Ac(b1-3)[Glc(b1-2)][Rha(a1-4)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz | Glc(b1-3)Rha1Fer(a1-4)Fruf(b2-1)GlcOBz | Glc(b1-3)[Araf(a1-4)]Rha(a1-2)Glc | Glc(b1-3)[Xyl(b1-4)]Rha(a1-2)D-FucOMe | Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc(a1-2)Fruf | Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1FerOBz | Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz | Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz | Glc(b1-4)Glc(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz | Glc(b1-4)Glc(b1-4)Glc | Glc(b1-4)Glc(b1-4)Glc(b1-4)Man | Glc(b1-4)Glc6Ac(b1-3)Glc1Fer6Ac(a1-2)Fruf1FerOBz | Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum6Ac(a1-2)Fruf1FerOBz | Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz | Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOBz | Glc(b1-4)Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz | Glc(b1-4)Man(b1-4)Glc | Glc(b1-4)Rha | Glc(b1-4)Rha1Fer(a1-4)Fruf(b2-1)GlcOBz | Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc | Glc(b1-6)Glc(b1-3)Glc | Glc1Cer | Glc2Ac(b1-4)[D-Apif(b1-3)Xyl(b1-2)]Glc | Glc2Ac3Ac4Ac6Ac(b1-3)Ara | Glc6Ac(b1-2)Glc(a1-2)FrufOBzOCin | Glc6Ac(b1-3)Glc6Ac(b1-3)[Glc6Ac(b1-2)]Glc1Fer6Ac(a1-2)Fruf1CoumOAcOBz 
| Glc6Ac(b1-3)Glc6Ac(b1-3)[Glc6Ac(b1-2)][RhaOAc(a1-4)]Glc1Fer6Ac(a1-2)Fruf1CoumOAcOBz | Glc6Ac(b1-3)[Glc(b1-2)]Glc1Coum(a1-2)Fruf1CoumOBz | Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1CoumOBz | Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer(a1-2)Fruf1FerOBz | Glc6Ac(b1-3)[Glc(b1-2)]Glc1Fer6Ac(a1-2)Fruf1FerOBz | GlcA(b1-2)Glc | GlcA(b1-2)GlcA | GlcA(b1-2)GlcA(b1-2)Rha | GlcA4Me(a1-2)[Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)]Xyl | GlcA4Me(a1-2)[Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)]Xyl | GlcA4Me(a1-2)[Xyl(b1-4)]Xyl | GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc | GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Gal(a1-3)]GlcNAc | GlcNAc(b1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc | GlcNAc(b1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | GlcNAc(b1-2)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-2)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | GlcNAc(b1-2)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc | GlcNAc(b1-2)Man(a1-?)[Man(a1-?)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-2)Man(a1-?)[Xyl(b1-2)][Man(a1-?)]Man(b1-4)GlcNAc(b1-4)GlcNAc | GlcNAc(b1-2)Man(a1-?)[Xyl(b1-2)][Man(a1-?)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | 
GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-4)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | GlcNAc(b1-4)Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | GlcNAc(b1-4)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcNAc(b1-?)Man(a1-3)[GlcNAc(b1-?)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | GlcOMe(b1-3)[XylOMe(b1-4)]RhaOMe(a1-2)D-FucOMe | Glcf(b1-2)Xyl(b1-4)Rha(b1-4)[Xyl(b1-3)]Xyl | Hexf(?1-?)Xyl(b1-4)Rha(b1-4)[Xyl(a1-3)]Xyl | L-Lyx(a1-2)Ara(a1-2)GlcA | Lyx(a1-2)Ara(a1-2)GlcA | Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc | Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-2)Man(a1-6)[Man(a1-2)Man(a1-3)]Man(a1-3)[Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-2)[Man(a1-6)]Man(a1-3)[Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)]Man(b1-4)GlcNAc | Man(a1-2)Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | 
Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-2)Man(a1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc | Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Man(a1-2)Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Man(a1-2)Man(a1-6)[Man(a1-2)Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-3)[Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAcN | Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-2)[Man(a1-3)]Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Man(a1-3)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-3)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(a1-6)Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-3)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Man(a1-3)[Gal(b1-3)GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-3)[Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Man(a1-3)[Man(a1-2)Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Man(a1-3)[Man(a1-3)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)GlcNAc | 
Man(a1-3)[Man(a1-3)[Man(a1-6)]Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc | Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc | Man(a1-3)[Man(a1-6)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Man(a1-3)[Man(a1-6)][Xylf(a1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc | Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc-ol | Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAcN | Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc-ol | Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAcN | Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]Hex | Man(a1-3)[Xyl(b1-2)][Man(a1-6)]Man(b1-4)ManNAc | Man(a1-3)[Xylf(b1-2)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Man(a1-6)[Man(a1-3)]Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(a1-6)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Man(a1-?)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc | Man(a1-?)Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-?)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Man(a1-?)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Man(b1-2)Man | Man(b1-4)Gal(b1-4)Gal(b1-4)Man | Man(b1-4)Gal(b1-4)Gal(b1-4)ManOMe | Man(b1-4)Man | Man(b1-4)Man(b1-4)Man | Man(b1-4)Man(b1-4)Man(b1-4)Man | Man(b1-4)Man(b1-4)Man(b1-4)Man(b1-4)Man | Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man | Man(b1-4)Man(b1-4)[Gal(a1-6)]Man | Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)Man | 
Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-3)Gal(a1-3)Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man(b1-4)[Man(b1-6)]Man(b1-4)[Man(b1-6)]Man(b1-4)Man(b1-4)Man | Man(b1-4)[Gal(a1-6)]Man | Man(b1-4)[Gal(a1-6)]Man(b1-4)Man | Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man | Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man | Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man | Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man | Man(b1-6)Glc | Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc | Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)[Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-4)]Man(a1-3)[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Rha(a1-2)Ara | Rha(a1-2)Ara(a1-2)GlcA | Rha(a1-2)Ara(a1-2)GlcA6Me | Rha(a1-2)Ara(a1-2)GlcAOMe | Rha(a1-2)D-Ara(b1-2)GlcA | Rha(a1-2)Gal(b1-2)Glc | Rha(a1-2)Gal(b1-2)GlcA | Rha(a1-2)Gal(b1-2)GlcA6Me | Rha(a1-2)Gal(b1-2)GlcAOMe | Rha(a1-2)Glc(b1-2)Glc | Rha(a1-2)Glc(b1-2)GlcA | Rha(a1-2)Glc(b1-2)GlcA6Me | Rha(a1-2)Glc(b1-2)GlcAOMe | Rha(a1-2)Glc(b1-6)Glc | Rha(a1-2)GlcA(b1-2)GlcA | Rha(a1-2)GlcAOMe(b1-2)GlcAOMe | Rha(a1-2)Rha(a1-2)Gal(b1-4)[Glc(b1-2)]GlcA | Rha(a1-2)Xyl | Rha(a1-2)Xyl(b1-2)GlcA | Rha(a1-2)Xyl(b1-2)GlcA6Me | Rha(a1-2)Xyl(b1-2)GlcAOMe | Rha(a1-2)Xyl3Ac | Rha(a1-2)Xyl4Ac | Rha(a1-2)[Glc(b1-3)]Glc | Rha(a1-2)[Glc(b1-6)]Gal(b1-2)GlcA6Me | Rha(a1-2)[Rha(a1-4)]Glc | Rha(a1-2)[Rha(a1-6)]Gal | Rha(a1-2)[Rha(a1-6)]Glc | Rha(a1-2)[Xyl(b1-4)]Glc | Rha(a1-2)[Xyl(b1-4)]Glc(b1-6)Glc | Rha(a1-3)GlcA | Rha(a1-4)Gal(b1-2)GlcA | Rha(a1-4)Gal(b1-2)GlcAOMe | Rha(a1-4)Gal(b1-2)GlcOMe | Rha(a1-4)Gal(b1-4)Gal(b1-4)GalGro | Rha(a1-4)Xyl(b1-2)Glc | Rha(a1-4)Xyl(b1-2)GlcA | Rha(a1-4)Xyl(b1-2)GlcAOMe | Rha(a1-6)[Xyl(b1-3)Xyl(b1-2)]Glc(b1-2)Glc | Rha(b1-2)Glc(b1-2)GlcA 
| Rha1Fer(a1-4)Fruf(b2-1)GlcOBz | RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe-ol | RhaOMe(a1-6)GlcOMe(b1-2)GlcOMe-ol | Xyl(a1-6)Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)Glc-ol | Xyl(a1-6)Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc | 
Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc-ol | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc | Xyl(a1-6)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc-ol | Xyl(b1-2)Ara(a1-6)Glc | Xyl(b1-2)Ara(a1-6)GlcNAc | Xyl(b1-2)Ara(a1-6)[Glc(b1-2)]Glc | Xyl(b1-2)Ara(a1-6)[Glc(b1-4)]GlcNAc | Xyl(b1-2)D-Fuc(b1-6)Glc | Xyl(b1-2)D-Fuc(b1-6)GlcNAc | Xyl(b1-2)D-Fuc(b1-6)[Glc(b1-2)]Glc | Xyl(b1-2)Fuc(a1-6)Glc | Xyl(b1-2)Fuc(a1-6)GlcNAc | Xyl(b1-2)Gal(b1-2)GlcA6Me | Xyl(b1-2)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Xyl(b1-2)Rha(a1-2)Ara | Xyl(b1-2)[Glc(b1-3)]Ara | Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)GlcNAc | Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAcN | Xyl(b1-2)[Man(a1-3)][Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc | Xyl(b1-2)[Man(a1-6)]Man(a1-3)Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Xyl(b1-2)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Xyl(b1-2)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | Xyl(b1-2)[Rha(a1-3)]GlcA | Xyl(b1-3)Ara | 
Xyl(b1-3)Xyl(b1-2)[Rha(a1-6)]Glc(b1-2)Glc | Xyl(b1-3)Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc | Xyl(b1-3)Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc(b1-2)Glc | Xyl(b1-4)Rha(a1-2)Ara | Xyl(b1-4)Rha(a1-2)D-Fuc | Xyl(b1-4)Rha(a1-2)D-FucOMe | Xyl(b1-4)Rha(a1-2)[Rha(a1-6)]Glc | Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl(b1-4)Xyl | Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl3Ac(b1-4)Xyl | Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA4Me(a1-2)]Xyl(b1-4)Xyl | Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl3Ac(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA4Me(a1-2)]Xyl3Ac(b1-4)Xyl | Xyl(b1-4)Xyl(b1-4)[GlcA(a1-2)]Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl | Xyl(b1-4)[GlcAOMe(a1-2)]Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)Xyl | Xyl2Ac3Ac4Ac(b1-3)Ara | XylOMe(b1-2)[RhaOMe(a1-6)]GlcOMe(b1-2)GlcOMe-ol | XylOMe(b1-3)XylOMe(b1-2)[RhaOMe(a1-6)]GlcOMe(b1-2)GlcOMe-ol | XylOMe(b1-4)RhaOMe(a1-2)D-FucOMe | XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe | XylOMe(b1-4)RhaOMe(a1-2)[RhaOMe(a1-6)]GlcOMe-ol | Xylf(b1-2)Xyl(b1-3)[Rha(b1-2)Rha(b1-4)]Xyl | [Araf(a1-3)Gal(b1-3)Gal(b1-6)]Gal(b1-3)Gal | [Araf(a1-3)Gal(b1-6)]Gal(b1-3)Gal | [Gal(a1-4)Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Man(b1-4)Man(b1-4)Man(b1-4)Gal(a1-6)]Man(b1-2)[Gal(a1-6)]Man(b1-2)[Gal(a1-4)Gal(a1-6)]Man(b1-4)Man | [Gal(a1-6)]Man(b1-4)Man | [Gal(a1-6)]Man(b1-4)Man(b1-4)Man | [Gal(a1-6)]Man(b1-4)Man(b1-4)Man(b1-4)Man(b1-4)Man | [Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man(b1-4)Man | [Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)[Gal(a1-6)]Man(b1-4)Man | [Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Gal(b1-3)Gal(b1-6)[Araf(a1-3)]Gal(b1-6)]Gal(b1-3)Gal | [Gal(b1-3)Gal(b1-6)]Gal(b1-3)Gal | [Gal(b1-6)Gal(b1-6)Gal(b1-6)]Gal(b1-3)Gal | [Gal(b1-6)Gal(b1-6)]Gal(b1-3)Gal | [Gal(b1-6)]Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal(b1-3)Gal | 
[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Araf(a1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Fuc(a1-2)Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | 
[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Gal(b1-2)Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc | [Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)[Gal(b1-5)Araf(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)[Xyl(a1-6)]Glc(b1-4)Glc(b1-4)Glc |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Family | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Fabaceae | 1 | 4 | 1 | 3 | 1 | 1 | 0 | 1 | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 4 | 2 | 1 | 2 | 2 | 7 | 4 | 4 | 4 | 2 | 8 | 4 | 2 | 5 | 4 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 6 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 0 | 1 | 2 | 1 | 1 | 2 | 0 | 0 | 0 | 1 | 1 | 1 | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 1 | 1 | 1 | 5 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 1 | 0 | 0 | 0 | 1 | 1 | 4 | 6 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 4 | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 3 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 3 | 2 | 1 | 2 | 1 | 1 | 2 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 4 | 6 | 4 | 4 | 4 | 1 | 1 | 5 | 4 | 1 | 4 | 1 | 1 | 0 | 1 | 1 | 1 | 7 | 1 | 1 | 2 | 3 | 22 | 6 | 7 | 1 | 8 | 3 | 4 | 1 | 3 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 7 | 1 | 1 | 1 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 3 | 2 | 1 | 0 | 0 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 4 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Fagaceae | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Polygalaceae | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Quillajaceae | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
choose_correct_isoform
choose_correct_isoform (glycans, reverse=False)
*given a list of glycan branch isomers, this function returns the correct isomer
Arguments: |
---|
glycans (list): glycans in IUPAC-condensed nomenclature |
reverse (bool): whether to return the correct isomer (False) or everything except the correct isomer (True); default: False |
Returns: |
---|
Returns the correct isomer as a string (if reverse=False; otherwise it returns a list of strings)* |
choose_correct_isoform(["Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
                        "Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc"])
'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
enforce_class
enforce_class (glycan, glycan_class, conf=None, extra_thresh=0.3)
*given a glycan and glycan class, determines whether glycan is from this class
Arguments: |
---|
glycan (string): glycan in IUPAC-condensed nomenclature |
glycan_class (string): glycan class in form of “O”, “N”, “free”, or “lipid” |
conf (float): prediction confidence; can be used to override class |
extra_thresh (float): threshold to override class; default:0.3 |
Returns: |
---|
Returns True if glycan is in glycan class and False if not* |
enforce_class("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc", "O")
False
IUPAC_to_SMILES
IUPAC_to_SMILES (glycan_list)
*given a list of IUPAC-condensed glycans, uses GlyLES to return a list of corresponding isomeric SMILES
Arguments: |
---|
glycan_list (list): list of IUPAC-condensed glycans |
Returns: |
---|
Returns a list of corresponding isomeric SMILES* |
IUPAC_to_SMILES(['Neu5Ac(a2-3)Gal(b1-4)Glc'])
['O1C(O)[C@H](O)[C@@H](O)[C@H](O[C@@H]2O[C@H](CO)[C@H](O)[C@H](O[C@]3(C(=O)O)C[C@H](O)[C@@H](NC(C)=O)[C@H]([C@H](O)[C@H](O)CO)O3)[C@H]2O)[C@H]1CO']
canonicalize_composition
canonicalize_composition (comp)
*converts a composition from any common format into the dictionary that is optimized for glycowork
Arguments: |
---|
comp (string): composition formatted either in the style of Hex5HexNAc4Fuc1Neu5Ac2 or H5N4F1A2 |
Returns: |
---|
Returns composition as a dictionary of style monosaccharide : count* |
print(canonicalize_composition("HexNAc2Hex1Fuc3Neu5Ac1"))
print(canonicalize_composition("N2H1F3A1"))
{'HexNAc': 2, 'Hex': 1, 'dHex': 3, 'Neu5Ac': 1}
{'HexNAc': 2, 'Hex': 1, 'dHex': 3, 'Neu5Ac': 1}
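The two input styles in the example can be told apart with a regular expression. The following is a minimal sketch of that idea, not glycowork's implementation: `parse_composition` and both lookup tables are hypothetical, and only the abbreviations visible in the example above (H, N, F, A) are covered.

```python
import re

# Hypothetical lookup tables; the mapping Fuc -> dHex follows the output shown above.
LONG_NAMES = {"HexNAc": "HexNAc", "Hex": "Hex", "Fuc": "dHex",
              "dHex": "dHex", "Neu5Ac": "Neu5Ac"}
SHORT_NAMES = {"H": "Hex", "N": "HexNAc", "F": "dHex", "A": "Neu5Ac"}


def parse_composition(comp):
    """Parse 'Hex5HexNAc4Fuc1Neu5Ac2' or 'H5N4F1A2' into a dict of counts."""
    if re.fullmatch(r"(?:[HNFA]\d+)+", comp):
        # Single-letter style: every token is one letter followed by a count.
        pairs = re.findall(r"([HNFA])(\d+)", comp)
        names = SHORT_NAMES
    else:
        # Long style: try longer names first so 'HexNAc' is not read as 'Hex'.
        alternation = "|".join(sorted(LONG_NAMES, key=len, reverse=True))
        pairs = re.findall(rf"({alternation})(\d+)", comp)
        names = LONG_NAMES
    result = {}
    for token, count in pairs:
        result[names[token]] = result.get(names[token], 0) + int(count)
    return result


long_form = parse_composition("HexNAc2Hex1Fuc3Neu5Ac1")
short_form = parse_composition("N2H1F3A1")
print(long_form)
print(short_form)
```

Sorting the long names by length before building the alternation is the one design choice that matters here, since several monosaccharide names share a prefix.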
canonicalize_iupac
canonicalize_iupac (glycan)
*converts a glycan from IUPAC-extended, LinearCode, GlycoCT, and WURCS into the exact IUPAC-condensed version that is optimized for glycowork
Arguments: |
---|
glycan (string): glycan sequence; some rare post-biosynthetic modifications could still be an issue |
Returns: |
---|
Returns glycan as a string in canonicalized IUPAC-condensed* |
print(canonicalize_iupac("NeuAc?1-36SGalb1-4GlcNACb1-6(Fuc?1-2Galb1-4GlcNacb1-3Galb1-3)GalNAc-sp3"))
print(canonicalize_iupac("WURCS=2.0/5,11,10/[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5][a1122h-1a_1-5][a2112h-1b_1-5][a1221m-1a_1-5]/1-1-2-3-1-4-3-1-4-5-5/a4-b1_a6-k1_b4-c1_c3-d1_c6-g1_d2-e1_e4-f1_g2-h1_h4-i1_i2-j1"))
print(canonicalize_iupac("Ma3(Ma6)Mb4GNb4GN;N"))
print(canonicalize_iupac("α-D-Manp-(1→3)[α-D-Manp-(1→6)]-β-D-Manp-(1→4)-β-D-GlcpNAc-(1→4)-β-D-GlcpNAc-(1→"))
print(canonicalize_iupac("""RES
1b:b-dgal-HEX-1:5
2s:n-acetyl
3b:b-dgal-HEX-1:5
4b:b-dglc-HEX-1:5
5b:b-dgal-HEX-1:5
6b:a-dglc-HEX-1:5
7b:b-dgal-HEX-1:5
8b:a-lgal-HEX-1:5|6:d
9b:a-dgal-HEX-1:5
10s:n-acetyl
11s:n-acetyl
12b:b-dglc-HEX-1:5
13b:b-dgal-HEX-1:5
14b:a-lgal-HEX-1:5|6:d
15b:a-lgal-HEX-1:5|6:d
16s:n-acetyl
17s:n-acetyl
18b:b-dgal-HEX-1:5
LIN
1:1d(2+1)2n
2:1o(3+1)3d
3:3o(3+1)4d
4:4o(-1+1)5d
5:5o(-1+1)6d
6:6o(-1+1)7d
7:7o(2+1)8d
8:7o(3+1)9d
9:9d(2+1)10n
10:6d(2+1)11n
11:5o(-1+1)12d
12:12o(-1+1)13d
13:13o(2+1)14d
14:12o(-1+1)15d
15:12d(2+1)16n
16:4d(2+1)17n
17:1o(6+1)18d
"""))
Fuc(a1-2)Gal(b1-4)GlcNAc(b1-3)Gal(b1-3)[Neu5Ac(a2-3)Gal6S(b1-4)GlcNAc(b1-6)]GalNAc
Fuc(a1-2)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)[Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
Fuc(a1-2)Gal(b1-?)[Fuc(a1-?)]GlcNAc(b1-?)[GalNAc(a1-3)[Fuc(a1-2)]Gal(b1-?)GlcNAc(a1-?)]Gal(b1-?)GlcNAc(b1-3)Gal(b1-3)[Gal(b1-6)]GalNAc
get_possible_linkages
get_possible_linkages (wildcard, linkage_list={'b1-9', 'a2-?', 'a1-3', 'a1-6', '?1-4', '1-4', 'b1-2', '?1-3', 'b1-?', '?1-2', 'b1-8', '1-6', 'a2-7', 'a2-1', 'a2-9', 'a2-11', 'b1-6', '?2-3', 'b2-4', '?2-?', '?2-6', 'b2-7', 'a2-5', 'b2-1', 'b2-6', 'b2-3', 'b1-4', 'b1-7', 'a2-8', 'a1-7', 'a2-3', 'a1-9', 'b2-8', 'a2-4', 'a1-?', 'a1-5', 'a2-2', 'b1-3', '?1-?', '?2-8', 'b1-1', 'a2-6', 'a1-1', 'b1-5', 'b2-5', 'a1-2', 'a1-11', '?1-6', 'a1-4', 'a1-8', 'b2-2'})
*Retrieves all linkages that match a given wildcard pattern from a list of linkages
Arguments: |
---|
wildcard (string): The pattern to match, where ‘?’ can be used as a wildcard for any single character. |
linkage_list (list): List of linkages as strings to search within; default:linkages |
Returns: |
---|
Returns a list of linkages that match the wildcard pattern.* |
get_possible_linkages("a1-?")
['a1-?',
'a1-3',
'a1-7',
'a1-9',
'a1-4',
'a1-1',
'a1-2',
'a1-6',
'a1-8',
'a1-5']
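Because `?` stands for exactly one character, `a1-11` from the default linkage set does not appear in the output above. A minimal sketch of this kind of matching follows; `match_linkages` is a hypothetical helper written for illustration and is not taken from glycowork.

```python
import re


def match_linkages(wildcard, linkages):
    """Return all linkages matching the wildcard, where '?' is any single character."""
    # Translate the wildcard into a regex: '?' -> '.', everything else literal.
    regex = re.compile("".join("." if ch == "?" else re.escape(ch) for ch in wildcard))
    return sorted(link for link in linkages if regex.fullmatch(link))


candidates = {"a1-3", "a1-6", "a1-11", "a1-?", "b1-4", "?1-4", "a2-3"}
matches = match_linkages("a1-?", candidates)
print(matches)
```

Note that a literal `a` in the wildcard does not match a `?` in the candidate, so `?1-4` is excluded as well.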
get_possible_monosaccharides
get_possible_monosaccharides (wildcard)
*Retrieves all matching common monosaccharides of a type, given the type
Arguments: |
---|
wildcard (string): Monosaccharide type, from “HexNAc”, “HexNAcOS”, “Hex”, “HexOS”, “dHex”, “Sia”, “HexA”, “Pen” |
Returns: |
---|
Returns a list of specified monosaccharides of that type* |
get_possible_monosaccharides("HexNAc")
{'GalNAc', 'GlcNAc', 'HexNAc', 'ManNAc'}
equal_repeats
equal_repeats (r1, r2)
*checks whether two repeat units could stem from the same repeating structure, just shifted
Arguments: |
---|
r1 (string): glycan sequence in IUPAC-condensed nomenclature |
r2 (string): glycan sequence in IUPAC-condensed nomenclature |
Returns: |
---|
Returns True if repeat structures are shifted versions of each other, else False* |
equal_repeats("Fuc2S3S(a1-3)Fuc2S(a1-4)Fuc2S3S", "Fuc2S(a1-4)Fuc2S3S(a1-3)Fuc2S")
True
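One way to picture the check is as a rotation test on the token sequence. The sketch below is an illustration under stated assumptions, not glycowork's code: `is_shifted_repeat` is a hypothetical name, only linear (unbranched) repeats are handled, and the final monosaccharide of each string is assumed to be the first residue of the next repeat unit.

```python
import re


def tokenize(glycan):
    """Split a linear IUPAC-condensed string into alternating residue/linkage tokens."""
    return [tok for tok in re.split(r"[()]", glycan) if tok]


def is_shifted_repeat(r1, r2):
    """Check whether r2 is a cyclic shift of r1 at monosaccharide boundaries."""
    # Drop the closing residue (assumed to belong to the next repeat unit).
    unit1, unit2 = tokenize(r1)[:-1], tokenize(r2)[:-1]
    if len(unit1) != len(unit2):
        return False
    doubled = unit1 + unit1
    # Step by 2 so that every candidate window starts on a monosaccharide.
    return any(doubled[i:i + len(unit2)] == unit2 for i in range(0, len(unit1), 2))


shifted = is_shifted_repeat("Fuc2S3S(a1-3)Fuc2S(a1-4)Fuc2S3S",
                            "Fuc2S(a1-4)Fuc2S3S(a1-3)Fuc2S")
different = is_shifted_repeat("Gal(b1-4)Glc(b1-3)Gal", "Gal(b1-3)Glc(b1-4)Gal")
print(shifted, different)
```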
get_class
get_class (glycan)
*given a glycan, determines its class
Arguments: |
---|
glycan (string): glycan in IUPAC-condensed nomenclature |
Returns: |
---|
Returns “O”, “N”, “free”, or “lipid” (or an empty string if none of these apply)* |
get_class("Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
'N'
query
for interacting with the databases contained in glycowork, delivering insights for sequences of interest
get_insight
get_insight (glycan, motifs=None)
*prints out meta-information about a glycan
Arguments: |
---|
glycan (string): glycan in IUPAC-condensed format |
motifs (dataframe): dataframe of glycan motifs (name + sequence); default:motif_list* |
print("Test get_insight with 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'")
get_insight('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc')
Test get_insight with 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
Let's get rolling! Give us a few moments to crunch some numbers.
This glycan occurs in the following species: ['Acanthocheilonema_viteae', 'Adeno-associated_dependoparvovirusA', 'Aedes_aegypti', 'Angiostrongylus_cantonensis', 'Anopheles_gambiae', 'Antheraea_pernyi', 'Apis_mellifera', 'Ascaris_suum', 'Autographa_californica_nucleopolyhedrovirus', 'AvianInfluenzaA_Virus', 'Bombus_ignitus', 'Bombyx_mori', 'Bos_taurus', 'Bos_taurus', 'Bos_taurus', 'Brugia_malayi', 'Caenorhabditis_elegans', 'Cardicola_forsteri', 'Cooperia_onchophora', 'Cornu_aspersum', 'Crassostrea_gigas', 'Crassostrea_virginica', 'Cricetulus_griseus', 'Danio_rerio', 'Dictyocaulus_viviparus', 'Dirofilaria_immitis', 'Drosophila_melanogaster', 'Fasciola_hepatica', 'Gallus_gallus', 'Glossina_morsitans', 'Haemonchus_contortus', 'Haliotis_tuberculata', 'Heligmosomoides_polygyrus', 'Helix_lucorum', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'Homo_sapiens', 'HumanImmunoDeficiency_Virus', 'Hylesia_metabus', 'Lutzomyia_longipalpis', 'Lymantria_dispar', 'Macaca_mulatta', 'Mamestra_brassicae', 'Megathura_crenulata', 'Mus_musculus', 'Mus_musculus', 'Nilaparvata_lugens', 'Oesophagostomum_dentatum', 'Onchocerca_volvulus', 'Onchocerca_volvulus', 'Ophiactis_savignyi', 'Opisthorchis_viverrini', 'Ostrea_edulis', 'Ovis_aries', 'Pan_troglodytes', 'Pan_troglodytes', 'Pan_troglodytes', 'Pan_troglodytes', 'Pristionchus_pacificus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Rattus_norvegicus', 'Schistosoma_mansoni', 'SemlikiForest_Virus', 'Spodoptera_frugiperda', 'Sus_scrofa', 'Tick_borne_encephalitis_virus', 'Tribolium_castaneum', 'Trichinella_spiralis', 'Trichoplusia_ni', 'Trichuris_suis', 'Tropidolaemus_subannulatus', 'Volvarina_rubella', 'undetermined', 'unidentified_influenza_virus']
Puh, that's quite a lot! Here are the phyla of those species: ['Arthropoda', 'Artverviricota', 'Chordata', 'Cossaviricota', 'Echinodermata', 'Kitrinoviricota', 'Mollusca', 'Negarnaviricota', 'Nematoda', 'Platyhelminthes', 'Virus']
This glycan contains the following motifs: ['Chitobiose', 'Trimannosylcore', 'core_fucose']
This is the GlyTouCan ID for this glycan: G63041RA
This glycan has been reported to be expressed in: ['2A3_cell_line', 'A549_cell_line', 'AML_193_cell_line', 'CHOK1_cell_line', 'CHOS_cell_line', 'CRL_1620_cell_line', 'Cal-27_cell_line', 'Cervicovaginal_Secretion', 'EOL_1_cell_line', 'FaDu_cell_line', 'HEK293_cell_line', 'HEL92_1_7_cell_line', 'HEL_cell_line', 'HL_60_cell_line', 'KG_1_cell_line', 'KG_1a_cell_line', 'Kasumi_1_cell_line', 'MDA_MB_231BR_cell_line', 'ME_1_cell_line', 'ML_1_cell_line', 'MOLM_13_cell_line', 'MOLM_14_cell_line', 'MV4_11_cell_line', 'M_07e_cell_line', 'NB_4_cell_line', 'NS0_cell_line', 'OCI_AML2_cell_line', 'OCI_AML3_cell_line', 'PLB_985_cell_line', 'SCC-9_cell_line', 'SCC_25_cell_line', 'TF_1_cell_line', 'THP_1_cell_line', 'U_937_cell_line', 'VU-147T_cell_line', 'alveolus_of_lung', 'brain', 'brain', 'brain', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellar_cortex', 'cerebellum', 'colon', 'cortex', 'digestive_tract', 'digestive_tract', 'forebrain', 'gills', 'gills', 'heart', 'heart', 'heart', 'hindbrain', 'hippocampal_formation', 'hippocampus', 'hippocampus', 'hippocampus', 'hippocampus', 'iPS1A_cell_line', 'iPS2A_cell_line', 'kidney', 'liver', 'lung', 'mantle', 'mantle', 'metastatic_pancreatic_ductal_adenocarcinoma', 'milk', 'milk', 'milk', 'mucus', 'muscle_of_leg', 'nerve_ending', 'ovary', 'pancreas', 'placenta', 'prefrontal_cortex', 'prefrontal_cortex', 'prefrontal_cortex', 'prefrontal_cortex', 'primary_pancreatic_ductal_adenocarcinoma', 'prostate_gland', 'seminal_fluid', 'striatum', 'striatum', 'striatum', 'striatum', 'testicle', 'testis', 'trachea', 'urine', 'urine', 'urine', 'urothelium']
This glycan has been reported to be dysregulated in (disease, direction, sample): [('REM_sleep_behavior_disorder', 'down', 'serum'), ('benign_breast_tumor_tissues_vs_para_carcinoma_tissues', 'up', 'breast'), ('cystic_fibrosis', 'up', 'sputum'), ('female_breast_cancer', 'up', 'breast'), ('female_breast_cancer', 'up', 'cell_line'), ('prostate_cancer', 'up', 'prostate_cancer_biopsy'), ('thyroid_gland_papillary_carcinoma', 'up', 'serum'), ('urinary_bladder_cancer', 'down', 'urine'), ('', '', ''), ('', '', ''), ('', '', ''), ('', '', '')]
That's all we can do for you at this point!
glytoucan_to_glycan
glytoucan_to_glycan (ids, revert=False)
*interconverts GlyTouCan IDs and glycans in IUPAC-condensed
Arguments: |
---|
ids (list): list of GlyTouCan IDs as strings (if using glycans instead, set ‘revert’ to True) |
revert (bool): whether glycans should be mapped to GlyTouCan IDs or vice versa; default:False |
Returns: |
---|
Returns list of either GlyTouCan IDs or glycans in IUPAC-condensed* |
glytoucan_to_glycan(['G63041RA'])
['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc']
regex
for performing regular expression-like searches in glycans; particularly useful for finding complicated motifs
get_match
get_match (pattern, glycan, return_matches=True)
*finds matches for a glyco-regular expression in a glycan
Arguments: |
---|
pattern (string): glyco-regular expression in the form of “Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc” |
glycan (string or networkx): glycan sequence in IUPAC-condensed or as networkx graph |
return_matches (bool): whether to return True/False or return the matches as a list of strings; default:True |
Returns: |
---|
Returns either a boolean (return_matches = False) or a list of matches as strings (return_matches = True)* |
# {} = between min and max occurrences, e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# * = zero or more occurrences, e.g., "Hex-HexNAc-([Hex|Fuc])*-HexNAc"
# + = one or more occurrences, e.g., "Hex-HexNAc-([Hex|Fuc])+-HexNAc"
# ? = zero or one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc])?-HexNAc"
# {1,} = at minimum one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc]){1,}-HexNAc"
# {,1} = at maximum one occurrence, e.g., "Hex-HexNAc-([Hex|Fuc]){,1}-HexNAc"
# {2} = exactly two occurrences, e.g., "Hex-HexNAc-([Hex|Fuc]){2}-HexNAc"
# ^ = start of sequence, e.g., "^Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# % = middle of sequence (i.e., neither start nor end)
# $ = end of sequence, e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc$"
# ?<= = lookbehind (i.e., provided pattern must be present before rest of pattern but is not included in match), e.g., "(?<=Xyl-)Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# ?<! = negative lookbehind (i.e., provided pattern is not present before rest of pattern and is also not included in match), e.g., "(?<!Xyl-)Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc"
# ?= = lookahead (i.e., provided pattern must be present after rest of pattern but is not included in match), e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc(?=-HexNAc)"
# ?! = negative lookahead (i.e., provided pattern is not present after rest of pattern and is not included in match), e.g., "Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc(?!-HexNAc)"
# Example: extracting the sequence from the a1-6 branch of N-glycans
pattern = "r[Sia]{,1}-Monosaccharide-([dHex]){,1}-Monosaccharide(?=-Mana6-Monosaccharide)"
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-6)GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
print(get_match(pattern, "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)GalNAc(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
['Gal(b1-4)GlcNAc']
['GalNAc(b1-4)GlcNAc']
['Neu5Ac(a2-6)GalNAc(b1-4)GlcNAc']
['Neu5Gc(a2-6)GalNAc(b1-4)[Fuc(a1-3)]GlcNAc']
For interested users, here is a selection of regular expression patterns that we find useful in our own work:
- Lewis or sialyl-Lewis structures:
  pattern = "r[Sia]{,1}-[Gal|GalOS]{1}-([Fuc]){1}-[GlcNAc|GlcNAc6S]{1}"
- Blood groups:
  pattern = "rFuc-([Gal|GalNAc])?-Gal-GlcNAc"
- a1-6 branch in N-glycans:
  pattern = "r[Sia]{,1}-[Hex|HexNAc]{,1}-([dHex]){,1}-[Man|GlcNAc]{1}-([.-.|.]){,1}-Mana6(?=-Manb4-GlcNAc)"
- b1-6 branch in O-glycans (from core 2/4/6):
  pattern = "r[Sia|dHex]{,1}-[Hex|HexNAc]{,1}-([dHex]){,1}-.b6(?=-GalNAc)"
- b1-3 branch in O-glycans (from core 1/2):
  pattern = "r[Sia]{,1}-[.]{,1}-([dHex]){,1}-.b3(?=-GalNAc)"
get_match_batch
get_match_batch (pattern, glycan_list, return_matches=True)
*finds matches for a glyco-regular expression in a list of glycans
Arguments: |
---|
pattern (string): glyco-regular expression in the form of “Hex-HexNAc-([Hex|Fuc]){1,2}-HexNAc” |
glycan_list (list of strings or networkx): list of glycan sequence in IUPAC-condensed or as networkx graph |
return_matches (bool): whether to return True/False or return the matches as a list of strings; default:True |
Returns: |
---|
Returns either a list of booleans (return_matches = False) or a list of list of matches as strings (return_matches = True)* |
motif_to_regex
motif_to_regex (motif)
*tries to convert motif into a regular expression
Arguments: |
---|
motif (string): glycan in IUPAC-condensed nomenclature |
Returns: |
---|
Returns regular expression if successful* |
motif_to_regex("Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-?)")
'Fuca3-([Galb4]){1}-GlcNAcb?'
tokenization
helper functions to map m/z–>composition, composition–>structure, structure–>motif, and more
string_to_labels
string_to_labels (character_string, libr=None)
*tokenizes word by indexing characters in passed library
Arguments: |
---|
character_string (string): string of characters to index |
libr (dict): dict of library items |
Returns: |
---|
Returns indexes of characters in library* |
string_to_labels(['Man','a1-3','Man','a1-6','Man'])
[None, None, None, None, None]
pad_sequence
pad_sequence (seq, max_length, pad_label=None, libr=None)
*brings all sequences to same length by adding padding token
Arguments: |
---|
seq (list): sequence to pad (from string_to_labels) |
max_length (int): sequence length to pad to |
pad_label (int): which padding label to use |
libr (list): list of library items |
Returns: |
---|
Returns padded sequence* |
pad_sequence(string_to_labels(['Man','a1-3','Man','a1-6','Man']), 7)
[None, None, None, None, None, 25, 25]
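The `None` entries in the two examples above presumably arise because no library was passed, so no token could be looked up. The sketch below shows the same two steps with an explicit library. It is an illustration only: `tokens_to_indices`, `pad_to_length`, and the three-entry `library` are hypothetical, and using `len(library)` as the padding label is an assumption.

```python
def tokens_to_indices(tokens, library):
    """Map each glycoletter to its index in the library (None if unknown)."""
    return [library.get(tok) for tok in tokens]


def pad_to_length(seq, max_length, pad_label):
    """Append pad_label until the sequence has max_length entries."""
    return seq + [pad_label] * max(0, max_length - len(seq))


# Hypothetical toy library of form glycoletter:index.
library = {"Man": 0, "a1-3": 1, "a1-6": 2}

labels = tokens_to_indices(["Man", "a1-3", "Man", "a1-6", "Man"], library)
# Assumption: the first index not used by the library serves as padding label.
padded = pad_to_length(labels, 7, pad_label=len(library))
print(labels)
print(padded)
```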
stemify_glycan
stemify_glycan (glycan, stem_lib=None, libr=None)
*removes modifications from all monosaccharides in a glycan
Arguments: |
---|
glycan (string): glycan in IUPAC-condensed format |
stem_lib (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib |
libr (dict): dictionary of form glycoletter:index; default:lib |
Returns: |
---|
Returns stemmed glycan as string* |
stemify_glycan("Neu5Ac9Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
'Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc'
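Given a dictionary of the form modified_monosaccharide:core_monosaccharide, stemming amounts to a token-wise lookup. The following sketch assumes such a dictionary is supplied by the caller; `stemify` and the two-entry `stem_lib` are hypothetical and do not reproduce how glycowork builds its own stem library.

```python
import re


def stemify(glycan, stem_lib):
    """Replace every modified monosaccharide by its core monosaccharide."""
    # The capturing group keeps linkages and branch brackets as separate tokens,
    # so the string can be reassembled unchanged apart from the substitutions.
    parts = re.split(r"(\([^)]*\)|\[|\])", glycan)
    return "".join(stem_lib.get(part, part) for part in parts)


# Hypothetical stem library covering only the example glycan.
stem_lib = {"Neu5Ac9Ac": "Neu5Ac", "Gal6S": "Gal"}
stemmed = stemify("Neu5Ac9Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc", stem_lib)
print(stemmed)
```

A lookup table avoids the pitfall of a purely pattern-based approach, which could mistake the `5Ac` in `Neu5Ac` for a modification.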
stemify_dataset
stemify_dataset (df, stem_lib=None, libr=None, glycan_col_name='glycan', rarity_filter=1)
*stemifies all glycans in a dataset by removing monosaccharide modifications
Arguments: |
---|
df (dataframe): dataframe with glycans in IUPAC-condensed format in column glycan_col_name |
stem_lib (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib |
libr (dict): dictionary of form glycoletter:index; default:lib |
glycan_col_name (string): column name under which glycans are stored; default:glycan |
rarity_filter (int): how often monosaccharide modification has to occur to not get removed; default:1 |
Returns: |
---|
Returns df with glycans stemified* |
mask_rare_glycoletters
mask_rare_glycoletters (glycans, thresh_monosaccharides=None, thresh_linkages=None)
*masks rare monosaccharides and linkages in a list of glycans
Arguments: |
---|
glycans (list): list of glycans in IUPAC-condensed form |
thresh_monosaccharides (int): threshold-value for monosaccharides seen as “rare”; default:(0.001*len(glycans)) |
thresh_linkages (int): threshold-value for linkages seen as “rare”; default:(0.03*len(glycans)) |
Returns: |
---|
Returns list of glycans in IUPAC-condensed with masked rare monosaccharides and linkages* |
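The masking idea can be sketched as counting every glycoletter across the list and replacing those at or below a threshold. Everything in this block is an assumption made for illustration: the function name `mask_rare_tokens`, the "at most N occurrences" rule, and the placeholder tokens `Monosaccharide` and `?1-?` are not confirmed by the description above.

```python
import re
from collections import Counter

TOKEN = re.compile(r"(\([^)]*\)|\[|\])")
STRUCTURAL = ("", "[", "]")


def mask_rare_tokens(glycans, thresh_mono, thresh_link):
    """Mask residues/linkages that occur at most thresh_* times in the whole list."""
    split = [TOKEN.split(g) for g in glycans]
    counts = Counter(tok for parts in split for tok in parts if tok not in STRUCTURAL)
    masked = []
    for parts in split:
        out = []
        for tok in parts:
            if tok in STRUCTURAL:
                out.append(tok)
            elif tok.startswith("("):
                # Linkage token such as '(a2-3)'.
                out.append("(?1-?)" if counts[tok] <= thresh_link else tok)
            else:
                out.append("Monosaccharide" if counts[tok] <= thresh_mono else tok)
        masked.append("".join(out))
    return masked


glycans = ["Gal(b1-4)GlcNAc", "Gal(b1-4)GlcNAc", "Gal(b1-4)Glc",
           "Gal(b1-4)Glc", "Kdn(a2-3)Gal(b1-4)Glc"]
masked = mask_rare_tokens(glycans, thresh_mono=1, thresh_link=1)
print(masked[-1])
```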
mz_to_composition
mz_to_composition (mz_value, mode='negative', mass_value='monoisotopic', reduced=False, sample_prep='underivatized', mass_tolerance=0.5, kingdom='Animalia', glycan_class='all', df_use=None, filter_out=None, extras=['doubly_charged'], adduct=None)
*Maps an m/z value to a matching monosaccharide composition within SugarBase
Arguments: |
---|
mz_value (float): the actual m/z value from mass spectrometry |
mode (string): whether mz_value comes from MS in ‘positive’ or ‘negative’ mode; default:‘negative’ |
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’ |
reduced (bool): whether glycans are reduced at reducing end; default:False |
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’ |
mass_tolerance (float): how much deviation to tolerate for a match; default:0.5 |
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’ |
glycan_class (string): which glycan class does the m/z value stem from, ‘N’, ‘O’, ‘lipid’ linked glycans, or ‘free’ glycans; default:‘all’ |
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan |
filter_out (set): set of monosaccharide types to ignore during composition finding; default:None |
extras (list): additional operations to perform if regular m/z matching does not yield a result; options include “adduct” and “doubly_charged” |
adduct (string): chemical formula of adduct that contributes to m/z, e.g., “C2H4O2”; default:None |
Returns: |
---|
Returns a list of matching compositions in dict form* |
mz_to_composition(665.4, glycan_class='O', filter_out={'Kdn', 'P', 'HexA', 'Pen', 'HexN', 'Me', 'PCho', 'PEtN'},
                  reduced=True)
[{'dHex': 1, 'HexNAc': 2, 'Hex': 1, 'Neu5Ac': 1, 'Neu5Gc': 1}]
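A quick arithmetic check suggests why 665.4 can map to a composition whose neutral mass is roughly twice that value: the numbers fit a doubly charged ion, which the default `extras=['doubly_charged']` permits. The sketch below uses standard monoisotopic residue masses; the helper names are hypothetical, and the claim that the match was made via the doubly charged route is an inference from the arithmetic, not something the output states.

```python
# Standard monoisotopic masses (Da) of glycosidically bound residues.
RESIDUE = {"Hex": 162.052823, "HexNAc": 203.079373, "dHex": 146.057909,
           "Neu5Ac": 291.095417, "Neu5Gc": 307.090331}
WATER = 18.010565      # added once for the free reducing end
H2 = 2.015650          # added when the reducing end is reduced to an alditol
PROTON = 1.007276


def neutral_mass(comp, reduced=False):
    """Sum residue masses plus one water (plus H2 if reduced)."""
    mass = sum(RESIDUE[name] * n for name, n in comp.items()) + WATER
    return mass + H2 if reduced else mass


def negative_mode_mz(mass, charge=1):
    """m/z of the [M - zH]z- ion."""
    return (mass - charge * PROTON) / charge


comp = {"dHex": 1, "HexNAc": 2, "Hex": 1, "Neu5Ac": 1, "Neu5Gc": 1}
mass = neutral_mass(comp, reduced=True)
singly = negative_mode_mz(mass, charge=1)
doubly = negative_mode_mz(mass, charge=2)
print(round(singly, 3), round(doubly, 3))
```

Only the doubly charged value falls within the default `mass_tolerance` of 0.5 around 665.4.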
match_composition_relaxed
match_composition_relaxed (composition, glycan_class='N', kingdom='Animalia', df_use=None, reducing_end=None)
*Given a coarse-grained monosaccharide composition (Hex, HexNAc, etc.), it returns all corresponding glycans
Arguments: |
---|
composition (dict): a dictionary indicating the composition to match (for example {“dHex”: 1, “Hex”: 1, “HexNAc”: 1}) |
glycan_class (string): which glycan class does the m/z value stem from, ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans; default:N |
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’ |
df_use (dataframe): glycan dataframe for searching glycan structures; default:df_glycan |
Returns: |
---|
Returns list of glycans matching composition in IUPAC-condensed* |
match_composition_relaxed({"Hex":3, "HexNAc":2, "dHex":1}, glycan_class = 'O')
['Fuc(a1-2)[Gal(a1-3)]Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
'Fuc(a1-2)[Gal(a1-3)]Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc',
'Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-3)Gal(b1-4)GlcNAc(b1-3)Gal',
'Gal(?1-?)Gal(b1-4)GlcNAc(b1-6)[Fuc(a1-2)Gal(b1-3)]GalNAc',
'Gal(a1-3)GalNAc(a1-3)[Fuc(a1-2)]Gal(b1-3)Gal(b1-3)GalNAc',
'Man(a1-6)Glc(a1-4)GlcNAc(b1-4)[Fuc(a1-2)]Gal(b1-3)GalNAc',
'Man(a1-6)Glc(b1-4)GlcNAc(b1-4)[Fuc(a1-2)]Gal(b1-3)GalNAc',
'Gal(?1-?)Gal(b1-?)[Fuc(a1-?)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
'Gal(b1-3)[Gal(b1-4)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)]GalNAc',
'Gal(b1-4)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-3)Gal(b1-3)GalNAc',
'Gal(b1-4)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
'Fuc(a1-2)Gal(b1-3)GlcNAc(b1-3)Gal(b1-4)GlcNAc(b1-?)Man',
'Fuc(a1-2)Gal(b1-4)GlcNAc(b1-6)[Gal(?1-?)Gal(b1-3)]GalNAc',
'Fuc(a1-2)Gal(b1-?)GlcNAc(b1-3)Gal(b1-3)[Gal(b1-6)]GalNAc',
'Fuc(a1-2)[Gal(a1-3)]Gal(b1-3)GlcNAc(b1-3)Gal(b1-3)GalNAc',
'Fuc(a1-2)Gal(b1-3)Gal(b1-3)GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
'Fuc(a1-2)Gal(b1-3)Gal(b1-3)[Gal(b1-4)GlcNAc(b1-6)]GalNAc',
'Gal(b1-4)Gal(b1-3)[Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-6)]GalNAc',
'Gal(b1-2)Gal(a1-3)[Fuc(a1-2)]Gal(b1-3)[GlcNAc(b1-6)]GalNAc',
'Fuc(a1-2)Gal(a1-3)Gal(a1-4)Gal(b1-3)[GlcNAc(b1-6)]GalNAc',
'Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)[Gal(b1-3)]Gal(b1-3)GalNAc',
'Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-?)Gal(b1-6)[Gal(b1-3)]GalNAc',
'Fuc(a1-2)Gal(?1-?)Gal(b1-?)GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
'Fuc(a1-2)[Gal(a1-3)]Gal(b1-4)GlcNAc(?1-?)Gal(b1-3)GalNAc']
condense_composition_matching
condense_composition_matching (matched_composition)
*Given a list of glycans matching a composition, find the minimum number of glycans characterizing this set
Arguments: |
---|
matched_composition (list): list of glycans matching to a composition |
Returns: |
---|
Returns minimal list of glycans that match a composition* |
match_comp = match_composition_relaxed({'Hex':1, 'HexNAc':1, 'Neu5Ac':1}, glycan_class = 'O')
print(match_comp)
condense_composition_matching(match_comp)
['Neu5Ac(a2-3)Gal(b1-3)GalNAc', 'Gal(b1-3)[Neu5Ac(a2-6)]GalNAc', '{Neu5Ac(a2-?)}Gal(b1-3)GalNAc', 'Neu5Ac(a2-3)[GalNAc(b1-4)]Gal', 'Neu5Ac(a2-3)Gal(b1-4)GalNAc', 'Neu5Ac(a2-6)Gal(b1-3)GalNAc', 'Gal(a1-3)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-?)Hex(?1-?)GalNAc', 'Gal(?1-3)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-3)Gal(?1-?)GalNAc', 'Neu5Ac(a2-6)Gal(a1-3)GalNAc', 'Neu5Ac(a2-?)Gal(?1-3)GalNAc', 'Neu5Ac(a2-?)GalNAc(a1-6)Gal', 'Neu5Ac(a2-?)Gal(b1-?)GalNAc', 'Gal(b1-4)[Neu5Ac(a2-6)]GalNAc', 'Neu5Ac(a2-3)GalNAc(b1-3)Gal']
['Neu5Ac(a2-3)Gal(b1-3)GalNAc',
'Gal(b1-3)[Neu5Ac(a2-6)]GalNAc',
'Gal(a1-3)[Neu5Ac(a2-6)]GalNAc',
'{Neu5Ac(a2-?)}Gal(b1-3)GalNAc',
'Neu5Ac(a2-3)[GalNAc(b1-4)]Gal',
'Neu5Ac(a2-3)Gal(b1-4)GalNAc',
'Neu5Ac(a2-6)Gal(b1-3)GalNAc',
'Neu5Ac(a2-?)Hex(?1-?)GalNAc',
'Neu5Ac(a2-3)Gal(?1-?)GalNAc',
'Neu5Ac(a2-6)Gal(a1-3)GalNAc',
'Neu5Ac(a2-?)Gal(?1-3)GalNAc',
'Neu5Ac(a2-?)GalNAc(a1-6)Gal',
'Neu5Ac(a2-?)Gal(b1-?)GalNAc',
'Gal(b1-4)[Neu5Ac(a2-6)]GalNAc',
'Neu5Ac(a2-3)GalNAc(b1-3)Gal']
mz_to_structures
mz_to_structures (mz_list, glycan_class, kingdom='Animalia', abundances=None, mode='negative', mass_value='monoisotopic', sample_prep='underivatized', mass_tolerance=0.5, reduced=False, df_use=None, filter_out=None, verbose=False)
*wrapper function to map precursor masses to structures, condense them, and match them with relative intensities
Arguments: |
---|
mz_list (list): list of precursor masses |
glycan_class (string): which glycan class does the m/z value stem from, ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans |
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’ |
abundances (dataframe): every row one composition (matching mz_list in order), every column one sample; default:pd.DataFrame([range(len(mz_list))]*2).T |
mode (string): whether mz_value comes from MS in ‘positive’ or ‘negative’ mode; default:‘negative’ |
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’ |
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’ |
mass_tolerance (float): how much deviation to tolerate for a match; default:0.5 |
reduced (bool): whether glycans are reduced at reducing end; default:False |
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan |
filter_out (set): set of monosaccharide types to ignore during composition finding; default:None |
verbose (bool): whether to print any non-matching compositions; default:False |
Returns: |
---|
Returns dataframe of (matched structures) x (relative intensities)* |
mz_to_structures([674.29], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
 | glycan | abundance |
---|---|---|
0 | Neu5Ac(a2-3)Gal(b1-3)GalNAc | 0 |
1 | Gal(b1-3)[Neu5Ac(a2-6)]GalNAc | 0 |
2 | Gal(a1-3)[Neu5Ac(a2-6)]GalNAc | 0 |
3 | {Neu5Ac(a2-?)}Gal(b1-3)GalNAc | 0 |
4 | Neu5Ac(a2-3)[GalNAc(b1-4)]Gal | 0 |
5 | Neu5Ac(a2-3)Gal(b1-4)GalNAc | 0 |
6 | Neu5Ac(a2-6)Gal(b1-3)GalNAc | 0 |
7 | Neu5Ac(a2-?)Hex(?1-?)GalNAc | 0 |
8 | Neu5Ac(a2-3)Gal(?1-?)GalNAc | 0 |
9 | Neu5Ac(a2-6)Gal(a1-3)GalNAc | 0 |
10 | Neu5Ac(a2-?)Gal(?1-3)GalNAc | 0 |
11 | Neu5Ac(a2-?)GalNAc(a1-6)Gal | 0 |
12 | Neu5Ac(a2-?)Gal(b1-?)GalNAc | 0 |
13 | Gal(b1-4)[Neu5Ac(a2-6)]GalNAc | 0 |
14 | Neu5Ac(a2-3)GalNAc(b1-3)Gal | 0 |
compositions_to_structures
compositions_to_structures (composition_list, glycan_class='N', kingdom='Animalia', abundances=None, df_use=None, verbose=False)
*wrapper function to map compositions to structures, condense them, and match them with relative intensities
Arguments: |
---|
composition_list (list): list of composition dictionaries of the form {‘Hex’: 1, ‘HexNAc’: 1} |
glycan_class (string): which glycan class does the m/z value stem from, ‘N’, ‘O’, or ‘lipid’ linked glycans or ‘free’ glycans; default:N |
kingdom (string): taxonomic kingdom for choosing a subset of glycans to consider; default:‘Animalia’ |
abundances (dataframe): every row one composition (matching composition_list in order), every column one sample;default:pd.DataFrame([range(len(composition_list))]*2).T |
df_use (dataframe): glycan dataframe for searching glycan structures; default:df_glycan |
verbose (bool): whether to print any non-matching compositions; default:False |
Returns: |
---|
Returns dataframe of (matched structures) x (relative intensities)* |
compositions_to_structures([{'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1}], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
 | glycan | abundance |
---|---|---|
0 | Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc | 0 |
1 | Gal(b1-3)[Neu5Ac(a2-8)Neu5Ac(a2-6)]GalNAc | 0 |
2 | Neu5Ac(a2-8)Neu5Ac(a2-6)[Gal(b1-3)]GalNAc | 0 |
3 | Neu5Ac(a2-3)[Neu5Ac(a2-6)]Gal(b1-3)GalNAc | 0 |
4 | Neu5Ac(a2-3)Gal(b1-4)[Neu5Ac(a2-6)]GalNAc | 0 |
compositions_to_structures(["H1N1A2"], glycan_class = 'O')
0 compositions could not be matched. Run with verbose = True to see which compositions.
 | glycan | abundance |
---|---|---|
0 | Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc | 0 |
1 | Gal(b1-3)[Neu5Ac(a2-8)Neu5Ac(a2-6)]GalNAc | 0 |
2 | Neu5Ac(a2-8)Neu5Ac(a2-6)[Gal(b1-3)]GalNAc | 0 |
3 | Neu5Ac(a2-3)[Neu5Ac(a2-6)]Gal(b1-3)GalNAc | 0 |
4 | Neu5Ac(a2-3)Gal(b1-4)[Neu5Ac(a2-6)]GalNAc | 0 |
structure_to_basic
structure_to_basic (glycan)
*converts a monosaccharide- and linkage-defined glycan structure to the base topology
Arguments: |
---|
glycan (string): glycan in IUPAC-condensed nomenclature |
Returns: |
---|
Returns the glycan topology as a string* |
structure_to_basic("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
'Neu5Ac(?1-?)HexOS(?1-?)[Neu5Ac(?1-?)]HexNAc'
glycan_to_composition
glycan_to_composition (glycan, stem_libr=None)
*maps glycan to its composition
Arguments: |
---|
glycan (string): glycan in IUPAC-condensed format |
stem_libr (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib |
Returns: |
---|
Returns a dictionary of form “Monosaccharide” : count* |
glycan_to_composition("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
{'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1, 'S': 1}
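Counting a composition can be sketched as stripping linkages and brackets, mapping each residue to its base type, and tallying modifications separately. This is a simplified illustration, not glycowork's logic: `count_composition` and `BASE_TYPE` are hypothetical, and only sulfate and phosphate modifications of the form `6S` / `6P` are recognized.

```python
import re
from collections import Counter

# Hypothetical mapping from specific monosaccharides to coarse-grained types.
BASE_TYPE = {"Gal": "Hex", "Glc": "Hex", "Man": "Hex",
             "GalNAc": "HexNAc", "GlcNAc": "HexNAc", "Fuc": "dHex"}


def count_composition(glycan):
    """Tally base monosaccharide types and S/P modifications in a glycan."""
    # Splitting on linkages and brackets leaves only the residues.
    residues = [tok for tok in re.split(r"\([^)]*\)|\[|\]", glycan) if tok]
    counts = Counter()
    for residue in residues:
        # Separate an optional trailing modification such as '6S' from the stem.
        stem, mod = re.fullmatch(r"(.+?)(\d[SP])?", residue).groups()
        counts[BASE_TYPE.get(stem, stem)] += 1
        if mod:
            counts[mod[1]] += 1
    return dict(counts)


composition = count_composition("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
print(composition)
```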
glycan_to_mass
glycan_to_mass (glycan, mass_value='monoisotopic', sample_prep='underivatized', stem_libr=None, adduct=None)
*given a glycan, calculates its theoretical mass; only allowed extra-modifications are methylation, sulfation, phosphorylation
Arguments: |
---|
glycan (string): glycan in IUPAC-condensed format |
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’ |
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’ |
stem_libr (dictionary): dictionary of form modified_monosaccharide:core_monosaccharide; default:created from lib |
adduct (string): chemical formula of adduct to be added, e.g., “C2H4O2”; default:None |
Returns: |
---|
Returns the theoretical mass of input glycan* |
glycan_to_mass("Neu5Ac(a2-3)Gal6S(b1-3)[Neu5Ac(a2-6)]GalNAc")
1045.2903546
composition_to_mass
composition_to_mass (dict_comp_in, mass_value='monoisotopic', sample_prep='underivatized', adduct=None)
*given a composition, calculates its theoretical mass; only allowed extra-modifications are methylation, sulfation, phosphorylation
Arguments: |
---|
dict_comp_in (dict): composition in form monosaccharide:count |
mass_value (string): whether the expected mass is ‘monoisotopic’ or ‘average’; default:‘monoisotopic’ |
sample_prep (string): whether the glycans have been ‘underivatized’, ‘permethylated’, or ‘peracetylated’; default:‘underivatized’ |
adduct (string): chemical formula of adduct to be added, e.g., “C2H4O2”; default:None |
Returns: |
---|
Returns the theoretical mass of input composition* |
composition_to_mass({'Neu5Ac': 2, 'Hex': 1, 'HexNAc': 1, 'S': 1})
1045.2903546
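For an underivatized glycan, the value above can be reproduced by hand: sum the monoisotopic residue masses, add SO3 for each sulfate, and add one water for the free reducing end. The sketch below does exactly that with standard mass constants; `theoretical_mass` is a hypothetical helper, and derivatization or adducts are deliberately left out.

```python
# Standard monoisotopic masses (Da); residues are the glycosidically bound forms.
RESIDUE_MASS = {"Hex": 162.052823, "HexNAc": 203.079373,
                "dHex": 146.057909, "Neu5Ac": 291.095417}
MODIFICATION_MASS = {"S": 79.956815}   # sulfation adds SO3
WATER = 18.010565


def theoretical_mass(composition):
    """Monoisotopic mass of an underivatized glycan with a free reducing end."""
    mass = WATER
    for name, count in composition.items():
        unit = RESIDUE_MASS.get(name, MODIFICATION_MASS.get(name))
        if unit is None:
            raise KeyError(f"no mass known for {name!r}")
        mass += unit * count
    return mass


mass = theoretical_mass({"Neu5Ac": 2, "Hex": 1, "HexNAc": 1, "S": 1})
print(round(mass, 4))
```

The result agrees with the library output above to within the rounding of the constants used here.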
calculate_adduct_mass
calculate_adduct_mass (adduct, mass_value='monoisotopic')
*Calculate the mass of the adduct based on its chemical formula
Arguments: |
---|
adduct (string): chemical formula of adduct, e.g., “C2H4O2” |
mass_value (string): whether to use ‘monoisotopic’ or ‘average’ mass; default:‘monoisotopic’ |
Returns: |
---|
Returns the mass of the adduct* |
calculate_adduct_mass("C2H4O2")
60.021
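The adduct mass follows directly from the chemical formula: parse element symbols with their counts and sum the monoisotopic element masses. A minimal sketch, assuming a flat formula without parentheses or charges; `formula_mass` and the small element table are hypothetical and not glycowork's own lookup.

```python
import re

# Monoisotopic masses (Da) of a few common elements.
ELEMENT_MASS = {"C": 12.000000, "H": 1.007825, "O": 15.994915,
                "N": 14.003074, "S": 31.972071, "P": 30.973762}


def formula_mass(formula):
    """Sum monoisotopic element masses for a flat formula such as 'C2H4O2'."""
    total = 0.0
    # Each match is an element symbol followed by an optional count (default 1).
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ELEMENT_MASS[symbol] * (int(count) if count else 1)
    return total


adduct = formula_mass("C2H4O2")
print(round(adduct, 3))
```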
get_unique_topologies
get_unique_topologies (composition, glycan_type, df_use=None, universal_replacers=None, taxonomy_rank='Kingdom', taxonomy_value='Animalia')
*given a composition, retrieves all observed and unique base topologies
Arguments: |
---|
composition (dict): composition in form monosaccharide:count |
glycan_type (string): which glycan class to search, ‘N’, ‘O’, ‘lipid’, ‘free’, or ‘repeat’ |
df_use (dataframe): species-specific glycan dataframe to use for mapping; default: df_glycan |
universal_replacers (dictionary): dictionary of form base monosaccharide : specific monosaccharide |
taxonomy_rank (string): at which taxonomic rank to filter; default: Kingdom |
taxonomy_value (string): which value to filter at taxonomy_rank; default: Animalia |
Returns: |
---|
Returns a list of observed base topologies for the given composition* |
get_unique_topologies({'HexNAc':2, 'Hex':1}, 'O', universal_replacers = {'dHex':'Fuc'})
['HexNAc(?1-?)[HexNAc(?1-?)]Hex',
'HexNAc(?1-?)[Hex(?1-?)]HexNAc',
'HexNAc(?1-?)Hex(?1-?)HexNAc',
'HexNAc(?1-?)HexNAc(?1-?)Hex',
'Hex(?1-?)HexNAc(?1-?)HexNAc']