models
Describes example machine learning architectures applicable to glycans. The main entry point is prep_model, which allows users to set up (optionally pretrained) models by their string names.
SweetNet
SweetNet (lib_size:int, num_classes:int=1, hidden_dim:int=128)
Graph neural network for analyzing glycans: operates on glycan graphs whose nodes are encoded as glycoletter tokens and returns predictions for num_classes output classes.
| | Type | Default | Details |
|---|---|---|---|
| lib_size | int | | number of unique tokens for graph nodes |
| num_classes | int | 1 | number of output classes (>1 for multilabel) |
| hidden_dim | int | 128 | dimension of hidden layers |
| Returns | None | | |
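As a minimal sketch, SweetNet can also be instantiated directly; the lib_size value below is a placeholder and should match the number of glycoletters in your library:

```python
from glycowork.ml.models import SweetNet

# lib_size is a placeholder; use the size of your glycoletter library
model = SweetNet(lib_size=2000, num_classes=2, hidden_dim=128)
print(sum(p.numel() for p in model.parameters()))  # quick sanity check of model size
```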
LectinOracle
LectinOracle (input_size_glyco:int, hidden_size:int=128,
num_classes:int=1, data_min:float=-11.355,
data_max:float=23.892, input_size_prot:int=1280)
Model for predicting protein-glycan binding: combines a protein representation (by default a 1280-dimensional ESM1b embedding) with a glycan graph to predict binding.
| | Type | Default | Details |
|---|---|---|---|
| input_size_glyco | int | | number of unique tokens for graph nodes |
| hidden_size | int | 128 | layer size for graph convolutions |
| num_classes | int | 1 | number of output classes (>1 for multilabel) |
| data_min | float | -11.355 | minimum observed value in training data |
| data_max | float | 23.892 | maximum observed value in training data |
| input_size_prot | int | 1280 | dimensionality of protein representations |
| Returns | None | | |
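A minimal instantiation sketch; input_size_glyco is a placeholder for the glycoletter library size, while input_size_prot keeps its default of 1280 to match ESM1b representations:

```python
from glycowork.ml.models import LectinOracle

# input_size_glyco is a placeholder; use the size of your glycoletter library
model = LectinOracle(input_size_glyco=2000, num_classes=1)
```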
LectinOracle_flex
LectinOracle_flex (input_size_glyco:int, hidden_size:int=128,
num_classes:int=1, data_min:float=-11.355,
data_max:float=23.892, input_size_prot:int=1000)
Variant of LectinOracle that consumes raw protein sequences (padded or cut to input_size_prot residues) instead of precomputed ESM1b representations.
| | Type | Default | Details |
|---|---|---|---|
| input_size_glyco | int | | number of unique tokens for graph nodes |
| hidden_size | int | 128 | layer size for graph convolutions |
| num_classes | int | 1 | number of output classes (>1 for multilabel) |
| data_min | float | -11.355 | minimum observed value in training data |
| data_max | float | 23.892 | maximum observed value in training data |
| input_size_prot | int | 1000 | maximum protein sequence length for padding/cutting |
| Returns | None | | |
NSequonPred
NSequonPred ()
Model that predicts, from the ESM1b representation of an N-sequon (20 AA + N + 20 AA), whether the sequon will be glycosylated (see get_Nsequon_preds below).
init_weights
init_weights (model:torch.nn.modules.module.Module, mode:str='sparse',
sparsity:float=0.1)
Initializes the linear layers of a PyTorch model with a given weight initialization scheme.
| | Type | Default | Details |
|---|---|---|---|
| model | Module | | neural network for analyzing glycans |
| mode | str | sparse | initialization algorithm: 'sparse', 'kaiming', 'xavier' |
| sparsity | float | 0.1 | proportion of sparsity after initialization |
| Returns | None | | |
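For illustration, a hedged sketch applying sparse initialization to a freshly constructed model (lib_size is again a placeholder):

```python
from glycowork.ml.models import SweetNet, init_weights

model = SweetNet(lib_size=2000, num_classes=2)  # placeholder lib_size
init_weights(model, mode="sparse", sparsity=0.1)  # 10% of linear weights set to zero
```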
prep_model
prep_model (model_type:Literal['SweetNet','LectinOracle','LectinOracle_flex','NSequonPred'],
            num_classes:int, libr:Optional[Dict[str,int]]=None,
            trained:bool=False, hidden_dim:int=128)
Wrapper to instantiate a model, initialize it, and put it on the GPU.
| | Type | Default | Details |
|---|---|---|---|
| model_type | Literal | | type of model to create |
| num_classes | int | | number of unique classes for classification |
| libr | Optional | None | dictionary of form glycoletter:index |
| trained | bool | False | whether to use pretrained model |
| hidden_dim | int | 128 | hidden dimension for the model (SweetNet/LectinOracle only) |
| Returns | Module | | initialized PyTorch model |
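A minimal usage sketch of prep_model; num_classes here is a placeholder for your task, and with trained=True it must match the pretrained checkpoint:

```python
from glycowork.ml.models import prep_model

# untrained SweetNet for a hypothetical binary classification task
model = prep_model("SweetNet", num_classes=2, trained=False, hidden_dim=128)
model.eval()  # switch to evaluation mode for inference
```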
inference
Contains functions that can be used to analyze trained models, make predictions, or obtain glycan representations.
glycans_to_emb
glycans_to_emb (glycans:List[str], model:torch.nn.modules.module.Module,
libr:Optional[Dict[str,int]]=None, batch_size:int=32,
rep:bool=True, class_list:Optional[List[str]]=None)
Returns a dataframe of learned representations for a list of glycans
| | Type | Default | Details |
|---|---|---|---|
| glycans | List | | list of glycans in IUPAC-condensed |
| model | Module | | trained graph neural network for analyzing glycans |
| libr | Optional | None | dictionary of form glycoletter:index |
| batch_size | int | 32 | batch size used during training |
| rep | bool | True | True returns representations, False returns predicted labels |
| class_list | Optional | None | list of unique classes to map predictions |
| Returns | Union | | dataframe of representations or list of predictions |
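A hedged usage sketch (the import path is inferred from this section's layout, and the glycan string is only an example; with trained=True, num_classes must match the pretrained checkpoint):

```python
from glycowork.ml.models import prep_model
from glycowork.ml.inference import glycans_to_emb  # import path assumed

model = prep_model("SweetNet", num_classes=1, trained=True)
glycans = ["Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"]
df_rep = glycans_to_emb(glycans, model, rep=True)  # DataFrame of learned representations
```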
get_lectin_preds
get_lectin_preds (prot:str, glycans:List[str],
                  model:torch.nn.modules.module.Module,
                  prot_dic:Optional[Dict[str,List[float]]]=None,
                  background_correction:bool=False,
                  correction_df:Optional[pandas.core.frame.DataFrame]=None,
                  batch_size:int=128, libr:Optional[Dict[str,int]]=None,
                  sort:bool=True, flex:bool=False)
Wrapper that uses a LectinOracle-type model to predict binding of a protein to glycans.
| | Type | Default | Details |
|---|---|---|---|
| prot | str | | protein amino acid sequence |
| glycans | List | | list of glycans in IUPAC-condensed |
| model | Module | | trained LectinOracle-type model |
| prot_dic | Optional | None | dict of protein sequence:ESM1b representation |
| background_correction | bool | False | whether to correct predictions for background |
| correction_df | Optional | None | background prediction for glycans |
| batch_size | int | 128 | batch size used during training |
| libr | Optional | None | dict of glycoletter:index |
| sort | bool | True | whether to sort prediction results in descending order |
| flex | bool | False | LectinOracle (False) or LectinOracle_flex (True) |
| Returns | DataFrame | | glycan sequences and predicted binding |
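A hedged end-to-end sketch; the protein sequence, glycans, and prot_dic entry are placeholders (build prot_dic with get_esm1b_representations, described below), and flex=True would be used with a LectinOracle_flex model instead:

```python
from glycowork.ml.models import prep_model
from glycowork.ml.inference import get_lectin_preds  # import path assumed

model = prep_model("LectinOracle", num_classes=1, trained=True)
prot = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # placeholder lectin sequence
prot_dic = {prot: [0.0] * 1280}  # placeholder; use get_esm1b_representations in practice
glycans = ["Gal(b1-4)Glc", "Fuc(a1-2)Gal(b1-4)Glc"]
df_preds = get_lectin_preds(prot, glycans, model, prot_dic=prot_dic)  # ranked by predicted binding
```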
get_Nsequon_preds
get_Nsequon_preds (prots:List[str], model:torch.nn.modules.module.Module,
prot_dic:Dict[str,List[float]])
Predicts whether an N-sequon will be glycosylated
| | Type | Details |
|---|---|---|
| prots | List | 20 AA + N + 20 AA sequences; replace missing with 'z' |
| model | Module | trained NSequonPred-type model |
| prot_dic | Dict | dict of protein sequence:ESM1b representation |
| Returns | DataFrame | protein sequences and predicted likelihood |
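A hedged sketch; each input is a 41-character window (20 AA + N + 20 AA, 'z'-padded where context is missing), and the prot_dic entry is a placeholder for a real ESM1b representation:

```python
from glycowork.ml.models import prep_model
from glycowork.ml.inference import get_Nsequon_preds  # import path assumed

model = prep_model("NSequonPred", num_classes=1, trained=True)
seq = "z" * 20 + "N" + "ASTQLLPK" + "z" * 12  # 20 AA + N + 20 AA (41 characters)
prot_dic = {seq: [0.0] * 1280}  # placeholder; use get_esm1b_representations (below)
df = get_Nsequon_preds([seq], model, prot_dic)  # predicted glycosylation likelihood
```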
get_esm1b_representations
get_esm1b_representations (prots:List[str],
model:torch.nn.modules.module.Module,
alphabet:Any)
Retrieves ESM1b representations of proteins for use as input to LectinOracle.
| | Type | Details |
|---|---|---|
| prots | List | list of protein sequences to convert |
| model | Module | trained ESM1b model |
| alphabet | Any | used for converting sequences |
| Returns | Dict | dict of protein sequence:ESM1b representation |
In order to run get_esm1b_representations, you first have to run this snippet:

```python
!pip install fair-esm
import esm
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
```
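The loaded model and alphabet can then be passed straight to get_esm1b_representations (import path assumed from this section):

```python
from glycowork.ml.inference import get_esm1b_representations  # import path assumed

prots = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"]  # placeholder sequence
prot_dic = get_esm1b_representations(prots, model, alphabet)
```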