models
Describes example machine learning architectures applicable to glycans. The main portal is prep_model, which allows users to set up (optionally pretrained) models via their string names.
SweetNet
 SweetNet (lib_size:int, num_classes:int=1, hidden_dim:int=128)
*Given glycan graphs as input, predicts glycan properties via a graph convolutional neural network (a torch.nn.Module subclass).*
| | Type | Default | Details |
|---|---|---|---|
| lib_size | int | | number of unique tokens for graph nodes |
| num_classes | int | 1 | number of output classes (>1 for multilabel) |
| hidden_dim | int | 128 | dimension of hidden layers |
| Returns | None | | |
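A minimal sketch of direct instantiation, assuming the `glycowork.ml.models` import path; the `lib_size` and `num_classes` values are placeholders and must match your glycoletter vocabulary and task:

```python
from glycowork.ml.models import SweetNet

# Hypothetical setup: 2565 unique glycoletter tokens, 5 output classes
model = SweetNet(lib_size=2565, num_classes=5, hidden_dim=128)
```

In practice, prep_model (below) is the recommended entry point, since it also handles weight initialization and device placement.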
LectinOracle
 LectinOracle (input_size_glyco:int, hidden_size:int=128,
               num_classes:int=1, data_min:float=-11.355,
               data_max:float=23.892, input_size_prot:int=960)
*Given glycan graphs and protein representations as input, predicts protein-glycan binding (a torch.nn.Module subclass).*
| | Type | Default | Details |
|---|---|---|---|
| input_size_glyco | int | | number of unique tokens for graph nodes |
| hidden_size | int | 128 | layer size for graph convolutions |
| num_classes | int | 1 | number of output classes (>1 for multilabel) |
| data_min | float | -11.355 | minimum observed value in training data |
| data_max | float | 23.892 | maximum observed value in training data |
| input_size_prot | int | 960 | dimensionality of protein representations |
| Returns | None | | |
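A hedged instantiation sketch (same import-path and vocabulary-size assumptions as for SweetNet above); data_min and data_max default to the value range observed in the original training data:

```python
from glycowork.ml.models import LectinOracle

# input_size_prot=960 matches the dimensionality of the protein representations
model = LectinOracle(input_size_glyco=2565,  # placeholder vocabulary size
                     num_classes=1)
```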
LectinOracle_flex
 LectinOracle_flex (input_size_glyco:int, hidden_size:int=128,
                    num_classes:int=1, data_min:float=-11.355,
                    data_max:float=23.892, input_size_prot:int=1000)
*Variant of LectinOracle that takes raw protein sequences (padded or cut to input_size_prot residues) instead of precomputed protein representations, and predicts protein-glycan binding (a torch.nn.Module subclass).*
| | Type | Default | Details |
|---|---|---|---|
| input_size_glyco | int | | number of unique tokens for graph nodes |
| hidden_size | int | 128 | layer size for graph convolutions |
| num_classes | int | 1 | number of output classes (>1 for multilabel) |
| data_min | float | -11.355 | minimum observed value in training data |
| data_max | float | 23.892 | maximum observed value in training data |
| input_size_prot | int | 1000 | maximum protein sequence length for padding/cutting |
| Returns | None | | |
NSequonPred
 NSequonPred ()
*Given an ESM1b representation of an N-sequon context (20 AA + N + 20 AA), predicts whether the sequon is glycosylated (a torch.nn.Module subclass).*
init_weights
 init_weights (model:torch.nn.modules.module.Module, mode:str='sparse',
               sparsity:float=0.1)
Initializes the linear layers of a PyTorch model with the chosen weight-initialization scheme
| | Type | Default | Details |
|---|---|---|---|
| model | Module | | neural network for analyzing glycans |
| mode | str | sparse | initialization algorithm: ‘sparse’, ‘kaiming’, ‘xavier’ |
| sparsity | float | 0.1 | proportion of sparsity after initialization |
| Returns | None | | |
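A short usage sketch (import path assumed; the model instance and lib_size are placeholders):

```python
from glycowork.ml.models import SweetNet, init_weights

model = SweetNet(lib_size=2565)                    # placeholder vocabulary size
init_weights(model, mode="xavier")                 # alternatives: 'kaiming', 'sparse'
init_weights(model, mode="sparse", sparsity=0.1)   # zeroes out the given proportion of weights
```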
prep_model
 prep_model (model_type:Literal['SweetNet','LectinOracle','LectinOracle_flex','NSequonPred'],
             num_classes:int, libr:Optional[Dict[str,int]]=None,
             trained:bool=False, hidden_dim:int=128)
Wrapper to instantiate a model, initialize its weights, and move it to the GPU
| | Type | Default | Details |
|---|---|---|---|
| model_type | Literal | | type of model to create |
| num_classes | int | | number of unique classes for classification |
| libr | Optional | None | dictionary of form glycoletter:index |
| trained | bool | False | whether to use pretrained model |
| hidden_dim | int | 128 | hidden dimension for the model (SweetNet only) |
| Returns | Module | | initialized PyTorch model |
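A minimal sketch (assuming the `glycowork.ml.models` import path; class counts are placeholders):

```python
from glycowork.ml.models import prep_model

# Fresh, untrained SweetNet for a hypothetical 5-class task
model = prep_model("SweetNet", num_classes=5, hidden_dim=128)

# Pretrained LectinOracle; num_classes must match the pretrained output head
oracle = prep_model("LectinOracle", num_classes=1, trained=True)
```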
inference
Can be used to analyze trained models, make predictions, or obtain glycan representations.
glycans_to_emb
 glycans_to_emb (glycans:List[str], model:torch.nn.modules.module.Module,
                 libr:Optional[Dict[str,int]]=None, batch_size:int=32,
                 rep:bool=True, class_list:Optional[List[str]]=None)
Returns a dataframe of learned representations for a list of glycans
| | Type | Default | Details |
|---|---|---|---|
| glycans | List | | list of glycans in IUPAC-condensed |
| model | Module | | trained graph neural network for analyzing glycans |
| libr | Optional | None | dictionary of form glycoletter:index |
| batch_size | int | 32 | batch size used during training |
| rep | bool | True | True returns representations, False returns predicted labels |
| class_list | Optional | None | list of unique classes to map predictions |
| Returns | Union | | dataframe of representations or list of predictions |
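A sketch of retrieving glycan representations (import paths assumed; the glycan and class count are placeholders):

```python
from glycowork.ml.models import prep_model
from glycowork.ml.inference import glycans_to_emb

model = prep_model("SweetNet", num_classes=1, trained=True)  # num_classes: placeholder
glycans = ["Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"]
rep_df = glycans_to_emb(glycans, model)  # rep=True: DataFrame, one row per glycan
```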
get_lectin_preds
 get_lectin_preds (prot:str, glycans:List[str],
                   model:torch.nn.modules.module.Module,
                   prot_dic:Optional[Dict[str,List[float]]]=None,
                   background_correction:bool=False,
                   correction_df:Optional[pandas.core.frame.DataFrame]=None,
                   batch_size:int=128, libr:Optional[Dict[str,int]]=None,
                   sort:bool=True, flex:bool=False)
Wrapper that uses a LectinOracle-type model to predict the binding of a protein to glycans
| | Type | Default | Details |
|---|---|---|---|
| prot | str | | protein amino acid sequence |
| glycans | List | | list of glycans in IUPAC-condensed |
| model | Module | | trained LectinOracle-type model |
| prot_dic | Optional | None | dict of protein sequence:ESM1b representation |
| background_correction | bool | False | whether to correct predictions for background |
| correction_df | Optional | None | background prediction for glycans |
| batch_size | int | 128 | batch size used during training |
| libr | Optional | None | dict of glycoletter:index |
| sort | bool | True | whether to sort prediction results in descending order |
| flex | bool | False | LectinOracle (False) or LectinOracle_flex (True) |
| Returns | DataFrame | | glycan sequences and predicted binding |
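A hedged usage sketch (import paths assumed; the protein sequence and the 960-dim zero vector are placeholders, and a real ESM1b representation is required in practice):

```python
from glycowork.ml.models import prep_model
from glycowork.ml.inference import get_lectin_preds

model = prep_model("LectinOracle", num_classes=1, trained=True)
prot = "MLLTQAQ"                          # placeholder lectin sequence
glycans = ["Gal(b1-4)GlcNAc", "Fuc(a1-2)Gal(b1-4)GlcNAc"]
prot_dic = {prot: [0.0] * 960}            # placeholder for the real 960-dim representation
pred_df = get_lectin_preds(prot, glycans, model, prot_dic=prot_dic)
```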
get_Nsequon_preds
 get_Nsequon_preds (prots:List[str], model:torch.nn.modules.module.Module,
                    prot_dic:Dict[str,List[float]])
Predicts whether an N-sequon will be glycosylated
| | Type | Default | Details |
|---|---|---|---|
| prots | List | | 20 AA + N + 20 AA sequences; replace missing with ‘z’ |
| model | Module | | trained NSequonPred-type model |
| prot_dic | Dict | | dict of protein sequence:ESM1b representation |
| Returns | DataFrame | | protein sequences and predicted likelihood |
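A sketch under the same assumptions; the sequence window and the placeholder embedding vector (1280-dim here, an assumption) are illustrative only, and the real ESM1b representation must be supplied:

```python
from glycowork.ml.models import prep_model
from glycowork.ml.inference import get_Nsequon_preds

model = prep_model("NSequonPred", num_classes=1, trained=True)
seq = "z" * 20 + "N" + "ASQKL" + "z" * 15   # 20 AA + N + 20 AA; 'z' pads missing residues
prot_dic = {seq: [0.0] * 1280}              # placeholder; dimensionality must match the model
pred_df = get_Nsequon_preds([seq], model, prot_dic)
```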
get_esmc_representations
 get_esmc_representations (prots:List[str],
                           model:torch.nn.modules.module.Module)
Retrieves ESMC-300M representations of proteins for use as input to LectinOracle
| | Type | Default | Details |
|---|---|---|---|
| prots | List | | list of protein sequences to convert |
| model | Module | | trained ESMC model |
| Returns | Dict | | dict of protein sequence:ESMC-300M representation |
In order to run get_esmc_representations, you first have to run this snippet (the ESMC class lives in the `esm` package, not `fair-esm`):

```python
!pip install esm
import torch
from esm.models.esmc import ESMC

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ESMC.from_pretrained("esmc_300m").to(device)
```
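A hedged usage sketch following that setup (the protein sequence is a placeholder):

```python
from glycowork.ml.inference import get_esmc_representations

prots = ["MLLTQAQ"]                                 # placeholder protein sequence
prot_dic = get_esmc_representations(prots, model)   # 'model' from the snippet above
# prot_dic maps each sequence to its ESMC-300M representation, e.g. for get_lectin_preds
```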