pyprobound.layers.psam.PSAM
- class PSAM(kernel_size, alphabet=None, out_channels=None, in_channels=None, pairwise_distance=0, dilation=1, symmetry=None, seed=None, seed_scale=1, score_reverse=None, shift_footprint=False, shift_footprint_heuristic=False, increment_footprint=False, increment_flank=False, increment_flank_with_footprint=False, information_threshold=0.1, max_kernel_size=None, frozen_parameters=frozenset({}), normalize=False, train_bias=False, train_betas=True, name='')
Bases: LayerSpec
PSAM parameters with symmetry encoding and pairwise features.
The PSAM is a sliding filter where the weight \(\beta_\phi\) of feature \(\phi\) is defined as \(-\Delta\Delta G_\phi/RT\). A feature \(\phi\) can be the presence of a letter at a position, or a pair of two letters at two different positions.
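As a rough illustration of the definition above, the sketch below converts a free-energy difference into a weight and sums the monomer and pairwise feature weights present in one window. Only the relation \(\beta_\phi = -\Delta\Delta G_\phi/RT\) comes from the text. The function names, the dictionary layout of the features, the reading of pairwise_distance as a maximum separation, and all numeric values are assumptions made for this example and are not part of pyprobound.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def beta_from_ddg(ddg_kcal_per_mol, temperature_k=298.15):
    """Convert a free-energy difference to a weight, beta = -ddG / (R*T)."""
    return -ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k)


def window_score(window, monomer, pairwise, pairwise_distance=0):
    """Sum the weights of all features present in one window.

    monomer maps (position, letter) to a weight; pairwise maps
    (pos1, letter1, pos2, letter2) to a weight. Both layouts are
    hypothetical and chosen for this sketch only.
    """
    score = sum(monomer.get((i, c), 0.0) for i, c in enumerate(window))
    last = len(window) - 1
    for i in range(len(window)):
        # Pairs separated by at most pairwise_distance positions (assumed).
        for j in range(i + 1, min(i + pairwise_distance, last) + 1):
            score += pairwise.get((i, window[i], j, window[j]), 0.0)
    return score


# Hypothetical weights for a two-position site.
monomer = {(0, "A"): 0.0, (0, "C"): beta_from_ddg(1.0), (1, "T"): 0.0}
pairwise = {(0, "A", 1, "T"): 0.5}
score = window_score("AT", monomer, pairwise, pairwise_distance=1)
relative_affinity = math.exp(score)  # Boltzmann factor of the summed weights
```

A positive \(\Delta\Delta G\) (a less favorable feature) gives a negative weight, so the window containing it contributes a smaller Boltzmann factor.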
- betas
The sequence-specific parameters.
- Type:
- __init__(kernel_size, alphabet=None, out_channels=None, in_channels=None, pairwise_distance=0, dilation=1, symmetry=None, seed=None, seed_scale=1, score_reverse=None, shift_footprint=False, shift_footprint_heuristic=False, increment_footprint=False, increment_flank=False, increment_flank_with_footprint=False, information_threshold=0.1, max_kernel_size=None, frozen_parameters=frozenset({}), normalize=False, train_bias=False, train_betas=True, name='')
Initializes the PSAM.
- Parameters:
kernel_size (int) – The size of the convolving PSAM kernel.
alphabet (Alphabet | None) – The alphabet used to encode sequences into tensors. Must be provided if input sequences are not embedded, or if out_channels or in_channels are not specified.
out_channels (int | None) – The number of output channels, inferred from score_reverse if not specified. If score_reverse, must be even, with half the channels representing the complement.
in_channels (int | None) – The number of input channels, inferred from alphabet if not specified.
pairwise_distance (int) – The distance between two positions on the PSAM for which pairwise letter features will be scored.
dilation (int) – The spacing between kernel elements.
symmetry (Sequence[int] | None) – An encoding of reverse-complement and translational symmetries. All positions with the same integer will share parameters, while two positions with opposite signs will be complementary. For example, [1,2,3,-3,-2,-1] encodes a reverse-complement symmetric PSAM.
seed (Sequence[Sequence[str]] | None) – A seed string for each non-reverse-complement output channel. Any character in the alphabet's encoding may be used; for example, ["TATAWAW"] might be a seed for the TATA box.
seed_scale (int) – A scaling factor for the strength of the seed.
score_reverse (bool | None) – Whether to score the reverse strand. Inferred from alphabet if not specified; defaults to False if neither is given.
shift_footprint (bool) – Whether to add a greedy exploration of shifts of positions on the PSAM to the sequential optimization procedure. For example, given a symmetry vector [2,3,4,5], it will attempt [3,4,5,6] and [1,2,3,4] to try to escape local optima.
shift_footprint_heuristic (bool) – Like shift_footprint, but in one step by calculating the center of mass of the information content.
increment_footprint (bool) – Whether to add a greedy exploration of the kernel size to the sequential optimization procedure. For example, given a symmetry vector [2,3,4,5], it will attempt [1,2,3,4,5,6].
increment_flank_with_footprint (bool) – Whether to increment the flank length with the footprint to keep the output length constant.
information_threshold (float) – The minimum information in the first two and last two positions on the PSAM for incrementing the footprint.
max_kernel_size (int | None) – The maximum kernel size allowed.
frozen_parameters (Set[str]) – The names of the parameters in betas that will never be trained.
normalize (bool) – Whether to mean-center the PSAM.
train_bias (bool) – Whether to train a bias term. Should be on if normalize is True.
train_betas (bool) – Whether to train any PSAM parameters; used to restrict gradient calculation to sequence-independent parameters only.
name (str) – A string used to describe the PSAM.
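The symmetry examples given for symmetry, shift_footprint, and increment_footprint can be reproduced with a few plain-Python helpers. This is a minimal sketch of one reading of the encoding, not pyprobound's implementation: all function names are hypothetical, and the shift and increment helpers assume a vector of positive, increasing integers like the [2,3,4,5] example.

```python
from collections import defaultdict


def shared_positions(symmetry):
    """Group position indices tied together by the same absolute code."""
    groups = defaultdict(list)
    for index, code in enumerate(symmetry):
        groups[abs(code)].append(index)
    return dict(groups)


def is_reverse_complement_symmetric(symmetry):
    """True if reversing the vector and flipping all signs leaves it unchanged."""
    return list(symmetry) == [-code for code in reversed(symmetry)]


def shifted(symmetry, offset):
    """Shift every code by offset (assumes a positive, increasing vector)."""
    return [code + offset for code in symmetry]


def incremented(symmetry):
    """Add one new position on each side (assumes a positive, increasing vector)."""
    return [min(symmetry) - 1, *symmetry, max(symmetry) + 1]


palindrome = [1, 2, 3, -3, -2, -1]
groups = shared_positions(palindrome)          # positions 0 and 5 are tied, etc.
symmetric = is_reverse_complement_symmetric(palindrome)

footprint = [2, 3, 4, 5]
right = shifted(footprint, 1)                  # [3, 4, 5, 6]
left = shifted(footprint, -1)                  # [1, 2, 3, 4]
wider = incremented(footprint)                 # [1, 2, 3, 4, 5, 6]
```

Grouping by absolute value only shows which positions are tied; the sign still determines whether a tied position uses the shared parameters directly or in complemented form.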
Methods
check_length_consistency() – Checks that input lengths of Binding components are consistent.
components() – Iterator of child components.
fix_gauge() – Removes invariances between monomer and pairwise parameters.
forward() – Define the computation performed at every call.
freeze() – Turns off gradient calculation for all parameters.
frozen_positions(positions, symmetry, ...) – The keys of betas parameters to be frozen.
get_bias() – A bias parameter passed to Conv1d.
get_dirichlet() – Calculates the Dirichlet-inspired regularization value (see pub).
get_filter(dist) – PSAM filter for a given pairwise distance.
get_information(position) – Get the information content of the position on the PSAM.
in_len(length[, mode]) – Calculates the receptive field.
max_embedding_size() – The maximum number of bytes needed to encode a sequence.
optim_procedure([ancestry, current_order]) – The sequential optimization procedure for all Binding components.
out_len(length[, mode]) – Calculates the number of elements in the output length dimension.
reload(checkpoint) – Loads the model from a checkpoint file.
reload_from_state_dict(state_dict) – Loads the model from a state dict.
save(checkpoint[, flank_lengths]) – Saves the model to a file with "state_dict" and "metadata" fields.
shift_heuristic() – Shift the footprint to center the information COM.
unfreeze([parameter]) – Turns on gradient calculation for the specified parameter.
update_binding_optim(binding_optim) – Updates a BindingOptim with the specification's optimization steps.
update_footprint([left_shift, right_shift, ...]) – Extend or shrink the symmetry in either direction.
update_params([pairwise_grad]) – Update the betas according to the symmetry string.
Attributes
dilation – The spacing between kernel elements.
in_channels – The number of input channels.
kernel_size – The size of the convolving PSAM kernel.
n_strands – The number of strands scored by the PSAM.
out_channels – The number of output channels.
pairwise_distance – The distance between two positions with pairwise letter features.
score_reverse – Whether to score the reverse strand.
unfreezable – alias of Literal['all', 'monomer', 'pairwise']
Non-Inherited Members
- unfreezable
alias of Literal['all', 'monomer', 'pairwise']
- property kernel_size: int
The size of the convolving PSAM kernel.
- property dilation: int
The spacing between kernel elements.
- property n_strands: Literal[1, 2]
The number of strands scored by the PSAM.
- property pairwise_distance: int
The distance between two positions with pairwise letter features.
- property score_reverse: bool
Whether to score the reverse strand.
- out_len(length, mode='shape')
Calculates the number of elements in the output length dimension.
- Parameters:
length (TypeVar(T, int, Tensor)) – The input length.
mode (Literal['min', 'max', 'shape']) – Either shape, which returns the number of elements, or min or max, which return the minimum or maximum number of finite elements.
- Return type:
TypeVar(T, int, Tensor)
- Returns:
The number of elements in the output length dimension, according to the specified mode.
- in_len(length, mode='max')
Calculates the receptive field.
- Parameters:
length (TypeVar(T, int, Tensor)) – The output length.
mode (Literal['min', 'max']) – Either min or max, representing the minimum or maximum number of positions contributing to the output length.
- Return type:
TypeVar(T, int, Tensor)
- Returns:
The number of input positions that contribute to the values of the corresponding number of output positions. Returns None if the maximum receptive field is undefined.
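For orientation, the length arithmetic of a generic unpadded, stride-1 dilated convolution is sketched below. It illustrates the relationship between kernel_size, dilation, output length, and receptive field in general terms. Whether PSAM.out_len and PSAM.in_len follow exactly these formulas is an assumption, and the min and max modes, which concern finite elements, are not modeled. Both function names are hypothetical.

```python
def conv_out_len(length, kernel_size, dilation=1):
    """Output positions of an unpadded, stride-1 dilated convolution."""
    return length - dilation * (kernel_size - 1)


def conv_in_len(length, kernel_size, dilation=1):
    """Input positions covered by `length` consecutive output positions."""
    return length + dilation * (kernel_size - 1)


# A kernel of size 7 sliding over 20 input positions.
n_out = conv_out_len(20, kernel_size=7)
# Receptive field of a single output position with dilation 2.
field = conv_in_len(1, kernel_size=7, dilation=2)
```

Under these assumptions the two functions are inverses of each other, so conv_in_len(conv_out_len(n, k, d), k, d) recovers n.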
- static frozen_positions(positions, symmetry, out_channels, in_channels)
The keys of betas parameters to be frozen.
- Parameters:
positions (list[int]) – The positions on the symmetry that will be frozen.
symmetry (list[int]) – The vector encoding reverse-complement and translational symmetries.
out_channels (int) – The number of output channels.
in_channels (int) – The number of input channels.
- Return type:
set[str]
- Returns:
The keys of betas parameters to be passed to __init__.
- unfreeze(parameter='all')
Turns on gradient calculation for the specified parameter.
- Parameters:
parameter (Literal['all', 'monomer', 'pairwise']) – Parameter to be unfrozen, defaults to all parameters.
- Return type:
None
- update_binding_optim(binding_optim)
Updates a BindingOptim with the specification’s optimization steps.
- Parameters:
binding_optim (BindingOptim) – The parent BindingOptim to be updated.
- Return type:
- Returns:
The updated BindingOptim.
- update_params(pairwise_grad=False)
Update the betas according to the symmetry string.
- Parameters:
pairwise_grad (bool) – Whether to enable gradient calculation on new pairwise parameters. Monomer parameters always have it enabled.
- Return type:
tuple[tuple[Tensor, ...], tuple[Tensor, ...]]
- Returns:
A tuple of the added and removed parameters, respectively.
- update_footprint(left_shift=0, right_shift=0, check_threshold=False)
Extend or shrink the symmetry in either direction.
- Parameters:
left_shift (int) – The number of positions to add to the left of the PSAM.
right_shift (int) – The number of positions to add to the right of the PSAM.
check_threshold (bool) – Whether to ensure that the information content of the first two and last two positions is above information_threshold.
- Return type:
tuple[tuple[Tensor, ...], tuple[Tensor, ...]]
- Returns:
A tuple of the added and removed parameters, respectively.
- shift_heuristic()
Shift the footprint to center the information COM.
- Return type:
tuple[tuple[Tensor, ...], tuple[Tensor, ...]]
- Returns:
A tuple of the added and removed parameters, respectively.
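The idea of centering the information center of mass (COM) can be sketched as follows. This is an assumption-laden illustration, not the library's code: the per-position information values are made up, both function names are hypothetical, and how PSAM turns the offset into concrete left and right shifts is not stated here.

```python
def center_of_mass(information):
    """Index-weighted mean of per-position information content."""
    total = sum(information)
    if total == 0:
        raise ValueError("information content is zero at every position")
    return sum(i * value for i, value in enumerate(information)) / total


def offset_from_center(information):
    """Signed distance of the COM from the geometric center of the footprint."""
    geometric_center = (len(information) - 1) / 2
    return center_of_mass(information) - geometric_center


# Hypothetical profile with all information in the right half of the footprint.
skewed = [0.0, 0.0, 1.0, 1.0]
com = center_of_mass(skewed)
offset = offset_from_center(skewed)
```

A positive offset means the informative positions sit to the right of the footprint's center, which suggests moving the footprint rightward; a symmetric profile gives an offset of zero.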
- get_information(position)
Get the information content of the position on the PSAM.
- Return type:
float
- get_dirichlet()
Calculates the Dirichlet-inspired regularization value (see pub).
- Return type:
Tensor
- get_bias()
A bias parameter passed to Conv1d.
- Return type:
Tensor
- get_filter(dist)
PSAM filter for a given pairwise distance.
- Return type:
Tensor
- fix_gauge()
Removes invariances between monomer and pairwise parameters.
- Return type:
None