
Glycans are fundamental biological sequences, alongside DNA, RNA, and proteins. They are complex carbohydrates that can form branched structures built from monosaccharides joined by linkages. Although conspicuously absent from most research, glycans are ubiquitous in biology: they decorate most proteins and lipids and direct the stability and function of biomolecules, cells, and organisms. This also makes glycans relevant to every human disease.

The analysis of glycans is made difficult by their nonlinearity and their astounding diversity, which stems from the large number of monosaccharides and linkage types. Glycowork is a Python package designed to process and analyze glycan sequences, with a special emphasis on glycan-focused data science and machine learning. In addition to various functions for working with glycans, glycowork also contains glycan data that can be used for glycan alignments, model pre-training, motif comparisons, and more.

The inspiration for glycowork can be found in Bojar et al., 2020 and Burkholz et al., 2021. Both papers also contain examples of possible use cases for the functions in glycowork.

The full documentation for glycowork can be found here: https://bojarlab.github.io/glycowork/

Install

via pip:
pip install glycowork

alternative (directly from the GitHub repository):
pip install git+https://github.com/BojarLab/glycowork.git

then, in Python:
import glycowork
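
To confirm that the installation succeeded, you can ask Python for the installed version. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for this example and is not part of glycowork.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("glycowork")
    if version is None:
        print("glycowork is not installed in this environment")
    else:
        print(f"glycowork {version} is installed")
```

If the message reports that glycowork is missing, check that pip and the Python interpreter you are running belong to the same environment.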

How to use

Glycowork currently contains four main modules:

  • alignment
    • can be used to find similar glycan sequences by alignment according to a glycan-specific substitution matrix
  • glycan_data
    • stores several glycan datasets and contains helper functions
  • ml
    • contains all functions for training and using machine learning models, including train-test splitting, computing glycan representations, etc.
  • motif
    • contains functions for processing glycan sequences, identifying motifs and features, and analyzing them

Below are some examples of what you can do with glycowork. Be sure to check out the full documentation for further examples.

from glycowork.motif.graph import compare_glycans

# Same glycan, with the two mannose branches written in swapped order
print(compare_glycans('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
                      'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
# Different glycans: the second one lacks the core fucose
print(compare_glycans('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc',
                      'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc'))

True
False
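
The first comparison returns True because the order in which branches are written inside square brackets carries no structural meaning. One way to see this is to parse each string into a tree and sort the branches before comparing, as sketched below for well-formed strings. This is an illustration only and not how compare_glycans is implemented; `tokenize`, `parse`, and `same_structure` are hypothetical helper names.

```python
import re

# A token is "[", "]", or a monosaccharide with an optional "(linkage)" suffix
TOKEN = re.compile(r"\[|\]|[^\[\]()]+(?:\([^()]*\))?")


def tokenize(glycan):
    return TOKEN.findall(glycan)


def parse(tokens):
    """Consume tokens from the right (the reducing end) and return a nested
    tuple (residue, children) in which the children are sorted."""
    node = tokens.pop()
    children = []
    # Each "[...]" block directly to the left is a branch attached to this residue
    while tokens and tokens[-1] == "]":
        tokens.pop()
        children.append(parse(tokens))
        if not tokens or tokens.pop() != "[":
            raise ValueError("unbalanced brackets")
    # Whatever remains to the left (up to an enclosing "[") continues the chain
    if tokens and tokens[-1] != "[":
        children.append(parse(tokens))
    return (node, tuple(sorted(children)))


def same_structure(glycan_a, glycan_b):
    return parse(tokenize(glycan_a)) == parse(tokenize(glycan_b))


a = 'Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
b = 'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'
c = 'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc'
print(same_structure(a, b))  # True
print(same_structure(a, c))  # False
```

Sorting the children gives every glycan one canonical tree regardless of how its branches were written, so plain equality is enough for the comparison.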
from glycowork.motif.query import get_insight
# Print a summary of the species, phyla, and motifs associated with this glycan
get_insight('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc')
Let's get rolling! Give us a few moments to crunch some numbers.

This glycan occurs in the following species: ['Antheraea_pernyi', 'Apis_mellifera', 'Autographa_californica_nucleopolyhedrovirus', 'AvianInfluenzaA_Virus', 'Bombyx_mori', 'Bos_taurus', 'Caenorhabditis_elegans', 'Drosophila_melanogaster', 'Homo_sapiens', 'HumanImmunoDeficiency_Virus', 'Mamestra_brassicae', 'Megathura_crenulata', 'Mus_musculus', 'Rattus_norvegicus', 'Spodoptera_frugiperda', 'Sus_scrofa', 'Trichinella_spiralis']

Puh, that's quite a lot! Here are the phyla of those species: ['Arthropoda', 'Chordata', 'Mollusca', 'Negarnaviricota', 'Nematoda', 'Virus']

This glycan contains the following motifs: ['Chitobiose', 'Trimannosylcore', 'core_fucose']

That's all we can do for you at this point!
from glycowork.motif.annotate import annotate_dataset

glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcOPN(b1-6)GlcOPN']
# Annotate each glycan with known motifs, graph features, and exhaustive mono-/disaccharide features
out = annotate_dataset(glycans, feature_set = ['known', 'graph', 'exhaustive']).head()
Abridged output (one row per glycan; the full table has one column per known motif, graph feature, monosaccharide, linkage, and disaccharide):

                                                                                      Chitobiose  Trimannosylcore  ...  diameter  branching  nbrLeaves  ...  Man(a1-2)Man  Man(a1-3)Man  Man(a1-6)Man  Man(b1-4)GlcNAc  Xyl(b1-2)Man
Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc                          1                1  ...       8.0        1.0        4.0  ...             0             1             1                1             1
Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc                     1                1  ...      10.0        1.0        3.0  ...             1             1             1                1             0
GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcOPN(b1-6)GlcOPN           0                0  ...      10.0        2.0        4.0  ...             0             0             0                0             0
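
To make the 'exhaustive' columns more concrete, the sketch below counts monosaccharides and linkages directly from a glycan string with regular expressions. It is a simplified illustration and not the code behind annotate_dataset; the helper name `count_building_blocks` is invented here.

```python
import re
from collections import Counter


def count_building_blocks(glycan):
    """Return (monosaccharide counts, linkage counts) for one glycan string."""
    # Linkages are the parenthesized parts, e.g. "a1-3" in "Man(a1-3)"
    linkages = Counter(re.findall(r"\(([^()]+)\)", glycan))
    # Blank out linkages and branch brackets; what remains are monosaccharides
    stripped = re.sub(r"\([^()]*\)|[\[\]]", " ", glycan)
    monosaccharides = Counter(stripped.split())
    return monosaccharides, linkages


glycan = 'Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc'
monosaccharides, linkages = count_building_blocks(glycan)
print(dict(monosaccharides))  # {'Man': 3, 'Xyl': 1, 'GlcNAc': 2, 'Fuc': 1}
print(dict(linkages))         # {'a1-3': 2, 'a1-6': 1, 'b1-2': 1, 'b1-4': 2}
```

Stacking such counts for every glycan, with one column per building block, gives a feature table that can be passed to downstream statistics or machine learning.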
import pandas as pd
from glycowork.motif.analysis import get_pvals_motifs

glycans = ['Man(a1-3)[Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc',
           'Man(a1-2)Man(a1-2)Man(a1-3)[Man(a1-3)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'GalNAc(a1-4)GlcNAcA(a1-4)[GlcN(b1-7)]Kdo(a2-5)[Kdo(a2-4)]Kdo(a2-6)GlcOPN(b1-6)GlcOPN',
           'Man(a1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
           'Glc(b1-3)Glc(b1-3)Glc']
# One measured value (here: binding) per glycan, in the same order as the list above
label = [3.234, 2.423, 0.733, 3.102, 0.108]
test_df = pd.DataFrame({'glycan': glycans, 'binding': label})
# Keep only the first ten rows of the resulting table
out = get_pvals_motifs(test_df, glycan_col_name = 'glycan', label_col_name = 'binding').iloc[:10,:]
    motif                   pval  corr_pval
4   GlcNAc              0.013469   0.394527
19  b1-4                0.013469   0.394527
8   Man                 0.025198   0.590671
11  a1-3                0.025636   0.590671
13  a1-6                0.091752   0.958241
26  GlcNAc(b1-4)GlcNAc  0.091752   0.958241
33  Man(a1-3)Man        0.091752   0.958241
34  Man(a1-6)Man        0.091752   0.958241
35  Man(b1-4)GlcNAc     0.091752   0.958241
10  a1-2                0.130826   0.980276
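
The idea behind such a table is to ask, for each motif, whether glycans containing it have labels that differ systematically (for example, are higher) from those of glycans without it. The sketch below illustrates that idea with an exact one-sided permutation test on the five example glycans. This is a stand-in technique chosen because it needs only the standard library. It is not the statistical test used by get_pvals_motifs, so its p-values will differ from those above, and it applies no multiple-testing correction. The helper name `permutation_pval` is hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_pval(labels, has_motif):
    """Exact one-sided permutation p-value for 'labels are higher with the motif'."""
    n = len(labels)
    n_with = sum(has_motif)
    if n_with == 0 or n_with == n:
        raise ValueError("need at least one glycan with and one without the motif")

    def mean_difference(with_idx):
        chosen = set(with_idx)
        inside = [labels[i] for i in range(n) if i in chosen]
        outside = [labels[i] for i in range(n) if i not in chosen]
        return mean(inside) - mean(outside)

    observed = mean_difference([i for i, flag in enumerate(has_motif) if flag])
    # Enumerate every way of assigning the motif to n_with of the n glycans
    diffs = [mean_difference(c) for c in combinations(range(n), n_with)]
    # Small tolerance guards against floating-point noise in ties
    return sum(d >= observed - 1e-12 for d in diffs) / len(diffs)


label = [3.234, 2.423, 0.733, 3.102, 0.108]
# GlcNAc occurs as a monosaccharide in the first, second, and fourth example glycan
has_glcnac = [True, True, False, True, False]
print(permutation_pval(label, has_glcnac))  # 0.1
```

With only five glycans there are just ten possible assignments, so 0.1 is the smallest p-value this test can produce. Parametric tests such as the one behind the table above are not bounded in this way, which is one reason the numbers are not comparable.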