pyprobound.layers.psam.PSAM

class PSAM(kernel_size, alphabet=None, out_channels=None, in_channels=None, pairwise_distance=0, dilation=1, symmetry=None, seed=None, seed_scale=1, score_reverse=None, shift_footprint=False, shift_footprint_heuristic=False, increment_footprint=False, increment_flank=False, increment_flank_with_footprint=False, information_threshold=0.1, max_kernel_size=None, frozen_parameters=frozenset({}), normalize=False, train_bias=False, train_betas=True, name='')

Bases: LayerSpec

PSAM parameters with symmetry encoding and pairwise features.

The PSAM is a sliding filter where the weight \(\beta_\phi\) of feature \(\phi\) is defined as \(-\Delta\Delta G_\phi/RT\). A feature \(\phi\) can be the presence of a letter at a position, or a pair of two letters at two different positions.
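
As a rough illustration of the sliding-filter idea, the sketch below scores every window of a sequence by summing per-position letter weights. It is not the library's implementation: the `BETAS` table and the helper names `ddg_to_beta` and `score_windows` are hypothetical, the weights are made-up values, and pairwise features are left out.

```python
# Hypothetical monomer weights: BETAS[position][letter] = -ddG/RT (dimensionless).
BETAS = [
    {"A": 0.0, "C": -1.0, "G": -2.0, "T": -0.5},
    {"A": -1.5, "C": 0.0, "G": -1.0, "T": -2.0},
    {"A": -2.0, "C": -1.0, "G": 0.0, "T": -1.5},
]


def ddg_to_beta(ddg_kcal_per_mol, temperature_kelvin=298.15):
    """Convert a free-energy penalty ddG (kcal/mol) to beta = -ddG / RT."""
    gas_constant = 1.987204e-3  # kcal / (mol * K)
    return -ddg_kcal_per_mol / (gas_constant * temperature_kelvin)


def score_windows(sequence, betas):
    """Slide the filter along the sequence and sum the letter weights per window."""
    width = len(betas)
    return [
        sum(betas[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    ]


# One score per window of width 3; the best-matching windows ("ACG") score 0.
scores = score_windows("ACGTACG", BETAS)
```

Because the weights are log-scale energies, a window's total is the sum of its feature weights, which is what makes a convolution a natural fit for this model.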

betas

The sequence-specific parameters.

Type:

TParameterDict

__init__(kernel_size, alphabet=None, out_channels=None, in_channels=None, pairwise_distance=0, dilation=1, symmetry=None, seed=None, seed_scale=1, score_reverse=None, shift_footprint=False, shift_footprint_heuristic=False, increment_footprint=False, increment_flank=False, increment_flank_with_footprint=False, information_threshold=0.1, max_kernel_size=None, frozen_parameters=frozenset({}), normalize=False, train_bias=False, train_betas=True, name='')

Initializes the PSAM.

Parameters:
  • kernel_size (int) – The size of the convolving PSAM kernel.

  • alphabet (Alphabet | None) – The alphabet used to encode sequences into tensors. Must be provided if input sequences are not embedded, or if out_channels or in_channels are not specified.

  • out_channels (int | None) – The number of output channels, inferred from score_reverse if not specified. If score_reverse is True, it must be even, with half the channels representing the complement.

  • in_channels (int | None) – The number of input channels, inferred from alphabet if not specified.

  • pairwise_distance (int) – The distance between two positions on the PSAM for which pairwise letter features will be scored.

  • dilation (int) – The spacing between kernel elements.

  • symmetry (Sequence[int] | None) – An encoding of reverse-complement and translational symmetries. All positions with the same integer will share parameters, while two positions with opposite signs will be complementary. For example, [1,2,3,-3,-2,-1] encodes a reverse-complement symmetric PSAM.

  • seed (Sequence[Sequence[str]] | None) – A seed string for each non-reverse complement output channel. Any character in the alphabet’s encoding may be used; for example, ["TATAWAW"] might be a seed for the TATA box.

  • seed_scale (int) – A scaling factor for the strength of the seed.

  • score_reverse (bool | None) – Whether to score the reverse strand. Inferred from alphabet if not specified; defaults to False if neither is provided.

  • shift_footprint (bool) – Whether to add a greedy exploration of shifts of positions on the PSAM to the sequential optimization procedure. For example, given a symmetry vector [2,3,4,5], will attempt [3,4,5,6] and [1,2,3,4] to try to escape local optima.

  • shift_footprint_heuristic (bool) – Like shift_footprint, but in one step by calculating the center of mass of the information content.

  • increment_footprint (bool) – Whether to add a greedy exploration of the kernel size to the sequential optimization procedure. For example, given a symmetry vector [2,3,4,5], will attempt [1,2,3,4,5,6].

  • increment_flank_with_footprint (bool) – Whether to increment the flank length with the footprint to keep the output length constant.

  • information_threshold (float) – The minimum information in the first two and last two positions on the PSAM for incrementing the footprint.

  • max_kernel_size (int | None) – The maximum kernel size allowed.

  • frozen_parameters (Set[str]) – The names of the parameters in betas that will never be trained.

  • normalize (bool) – Whether to mean-center the PSAM.

  • train_bias (bool) – Whether to train a bias term. Should be True if normalize is True.

  • train_betas (bool) – Whether to train any PSAM parameters, used to restrict gradient calculation to sequence-independent parameters only.

  • name (str) – A string used to describe the PSAM.
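
One way to read the symmetry encoding described above is sketched here in plain Python: positions with the same integer share one set of weights, and a negative integer reuses those weights with the letters complemented. The names `expand_symmetry`, `score`, `reverse_complement`, and `FREE_PARAMS` are hypothetical and the weights are made up; this is an illustration of the encoding, not the library's internal representation.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def expand_symmetry(symmetry, free_params):
    """Build per-position weights from parameters shared via a symmetry vector.

    free_params maps each positive integer in the vector to {letter: weight}.
    A negative entry -k reuses the weights of k with the letters complemented.
    """
    columns = []
    for code in symmetry:
        shared = free_params[abs(code)]
        if code > 0:
            columns.append(dict(shared))
        else:
            columns.append({COMPLEMENT[letter]: w for letter, w in shared.items()})
    return columns


def score(sequence, columns):
    """Sum the weight of each letter at its position."""
    return sum(col[letter] for col, letter in zip(columns, sequence))


def reverse_complement(sequence):
    return "".join(COMPLEMENT[letter] for letter in reversed(sequence))


# Three free positions describe a six-position, reverse-complement symmetric filter.
FREE_PARAMS = {
    1: {"A": 0.0, "C": -1.0, "G": -2.0, "T": -0.5},
    2: {"A": -1.5, "C": 0.0, "G": -1.0, "T": -2.0},
    3: {"A": -2.0, "C": -1.0, "G": 0.0, "T": -1.5},
}
columns = expand_symmetry([1, 2, 3, -3, -2, -1], FREE_PARAMS)

# A site and its reverse complement receive the same score.
site = "ACGTTA"
assert score(site, columns) == score(reverse_complement(site), columns)
```

With this encoding the six-position filter has only three positions' worth of free parameters, which is the point of declaring the symmetry up front.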

Methods

check_length_consistency()

Checks that input lengths of Binding components are consistent.

components()

Iterator of child components.

fix_gauge()

Removes invariances between monomer and pairwise parameters.

forward()

Define the computation performed at every call.

freeze()

Turns off gradient calculation for all parameters.

frozen_positions(positions, symmetry, ...)

The keys of betas parameters to be frozen.

get_bias()

A bias parameter passed to Conv1d.

get_dirichlet()

Calculates the Dirichlet-inspired regularization value (see the publication).

get_filter(dist)

PSAM filter for a given pairwise distance.

get_information(position)

Get the information content of the position on the PSAM.

in_len(length[, mode])

Calculates the receptive field.

max_embedding_size()

The maximum number of bytes needed to encode a sequence.

optim_procedure([ancestry, current_order])

The sequential optimization procedure for all Binding components.

out_len(length[, mode])

Calculates the number of elements in the output length dimension.

reload(checkpoint)

Loads the model from a checkpoint file.

reload_from_state_dict(state_dict)

Loads the model from a state dict.

save(checkpoint[, flank_lengths])

Saves the model to a file with "state_dict" and "metadata" fields.

shift_heuristic()

Shift the footprint to center the center of mass (COM) of the information content.

unfreeze([parameter])

Turns on gradient calculation for the specified parameter.

update_binding_optim(binding_optim)

Updates a BindingOptim with the specification's optimization steps.

update_footprint([left_shift, right_shift, ...])

Extend or shrink the symmetry in either direction.

update_params([pairwise_grad])

Update the betas according to the symmetry vector.

Attributes

dilation

The spacing between kernel elements.

in_channels

The number of input channels.

kernel_size

The size of the convolving PSAM kernel.

n_strands

The number of strands scored by the PSAM.

out_channels

The number of output channels.

pairwise_distance

The distance between two positions with pairwise letter features.

score_reverse

Whether to score the reverse strand.

unfreezable

alias of Literal['all', 'monomer', 'pairwise']

Non-Inherited Members

unfreezable

alias of Literal['all', 'monomer', 'pairwise']

property kernel_size: int

The size of the convolving PSAM kernel.

property dilation: int

The spacing between kernel elements.

property n_strands: Literal[1, 2]

The number of strands scored by the PSAM.

property pairwise_distance: int

The distance between two positions with pairwise letter features.

property score_reverse: bool

Whether to score the reverse strand.

out_len(length, mode='shape')

Calculates the number of elements in the output length dimension.

Parameters:
  • length (TypeVar(T, int, Tensor)) – The input length.

  • mode (Literal['min', 'max', 'shape']) – Either shape, which returns the number of elements, or min or max, which return the minimum or maximum number of finite elements.

Return type:

TypeVar(T, int, Tensor)

Returns:

The number of elements in the output length dimension, according to the specified mode.

in_len(length, mode='max')

Calculates the receptive field.

Parameters:
  • length (TypeVar(T, int, Tensor)) – The output length.

  • mode (Literal['min', 'max']) – Either min or max, representing the minimum or maximum number of positions contributing to the output length.

Return type:

TypeVar(T, int, Tensor)

Returns:

The number of input positions that contribute to the values of the corresponding number of output positions. Outputs None if the max receptive field is undefined.
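
For intuition about how output length and receptive field relate, the sketch below uses the standard arithmetic for an unpadded, stride-1 dilated convolution. This is an assumption made for illustration: the helper names are hypothetical, and the library's out_len and in_len also handle the min, max, and shape modes, which this sketch ignores.

```python
def footprint_span(kernel_size, dilation=1):
    """Number of input positions covered by one placement of the kernel."""
    return dilation * (kernel_size - 1) + 1


def conv_out_len(in_length, kernel_size, dilation=1):
    """Output length of an unpadded, stride-1 convolution (never negative)."""
    return max(in_length - footprint_span(kernel_size, dilation) + 1, 0)


def conv_in_len(out_length, kernel_size, dilation=1):
    """Input positions that contribute to out_length consecutive outputs."""
    return out_length + footprint_span(kernel_size, dilation) - 1


# A 6-wide kernel over a 20-letter sequence fits in 15 places.
windows = conv_out_len(20, kernel_size=6)
# Those 15 outputs together read all 20 input positions.
receptive_field = conv_in_len(windows, kernel_size=6)
```

Under these assumptions the two functions are inverses of each other whenever the input is at least as long as the kernel's span.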

static frozen_positions(positions, symmetry, out_channels, in_channels)

The keys of betas parameters to be frozen.

Parameters:
  • positions (list[int]) – The positions on the symmetry that will be frozen.

  • symmetry (list[int]) – The vector encoding reverse-complement and translational symmetries.

  • out_channels (int) – The number of output channels.

  • in_channels (int) – The number of input channels.

Return type:

set[str]

Returns:

The keys of betas parameters to be passed to __init__.

unfreeze(parameter='all')

Turns on gradient calculation for the specified parameter.

Parameters:

parameter (Literal['all', 'monomer', 'pairwise']) – Parameter to be unfrozen, defaults to all parameters.

Return type:

None

update_binding_optim(binding_optim)

Updates a BindingOptim with the specification’s optimization steps.

Parameters:

binding_optim (BindingOptim) – The parent BindingOptim to be updated.

Return type:

BindingOptim

Returns:

The updated BindingOptim.

update_params(pairwise_grad=False)

Update the betas according to the symmetry vector.

Parameters:

pairwise_grad (bool) – Whether to enable gradient calculation on new pairwise parameters. Monomer parameters always have it enabled.

Return type:

tuple[tuple[Tensor, ...], tuple[Tensor, ...]]

Returns:

A tuple of the added and removed parameters, respectively.

update_footprint(left_shift=0, right_shift=0, check_threshold=False)

Extend or shrink the symmetry in either direction.

Parameters:
  • left_shift (int) – The number of positions to add to the left of the PSAM.

  • right_shift (int) – The number of positions to add to the right of the PSAM.

  • check_threshold (bool) – Whether to ensure that the information content of the first two and last two positions is above information_threshold.

Return type:

tuple[tuple[Tensor, ...], tuple[Tensor, ...]]

Returns:

A tuple of the added and removed parameters, respectively.
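
The effect of extending or shrinking a footprint can be pictured on the symmetry vector alone. The sketch below is a hypothetical helper, `resize_footprint`, that reproduces the [2,3,4,5] examples given for shift_footprint and increment_footprint. It assumes an all-positive vector of consecutive integers and that a negative shift trims positions; it does not touch any parameters and is not the library's method.

```python
def resize_footprint(symmetry, left_shift=0, right_shift=0):
    """Extend (positive shift) or trim (negative shift) a symmetry vector.

    Assumes all-positive, consecutive integers, as in [2, 3, 4, 5].
    """
    if any(code <= 0 for code in symmetry):
        raise ValueError("this sketch only handles positive symmetry codes")
    out = list(symmetry)
    if left_shift > 0:
        # Prepend new, smaller integers so the new positions get fresh parameters.
        out = list(range(out[0] - left_shift, out[0])) + out
    elif left_shift < 0:
        out = out[-left_shift:]
    if right_shift > 0:
        # Append new, larger integers on the right.
        out = out + list(range(out[-1] + 1, out[-1] + 1 + right_shift))
    elif right_shift < 0:
        out = out[:right_shift]
    return out


grown = resize_footprint([2, 3, 4, 5], left_shift=1, right_shift=1)
moved_right = resize_footprint([2, 3, 4, 5], left_shift=-1, right_shift=1)
moved_left = resize_footprint([2, 3, 4, 5], left_shift=1, right_shift=-1)
```

A shift is then just a trim on one side paired with an extension on the other, which keeps the kernel size unchanged.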

shift_heuristic()

Shift the footprint to center the center of mass (COM) of the information content.

Return type:

tuple[tuple[Tensor, ...], tuple[Tensor, ...]]

Returns:

A tuple of the added and removed parameters, respectively.
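
A minimal sketch of a center-of-mass heuristic follows, assuming the per-position information values are already available as a list of floats. The function name `center_of_mass_shift`, the rounding rule, and the sign convention are assumptions for illustration; the library may compute and apply the shift differently.

```python
def center_of_mass_shift(information):
    """Offset, in positions, of the information center of mass from the middle.

    A positive result means the information sits to the right of the middle,
    so the footprint would need to move right to re-center it.
    """
    total = sum(information)
    if total == 0:
        return 0  # nothing to center on
    center_of_mass = sum(i * value for i, value in enumerate(information)) / total
    middle = (len(information) - 1) / 2
    return round(center_of_mass - middle)


# All of the information is in the right half of a 4-position footprint.
shift = center_of_mass_shift([0.0, 0.0, 1.0, 1.0])
```

Compared with trying each one-position shift in turn, this picks the shift in a single step, which matches the description of shift_footprint_heuristic.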

get_information(position)

Get the information content of the position on the PSAM.

Return type:

float
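
The exact definition used by get_information is not stated here. One common way to assign an information content to a position, shown below purely as an assumption, is to turn the letter weights into Boltzmann-like probabilities and subtract their entropy from the maximum possible entropy. The helper name `position_information` is hypothetical.

```python
import math


def position_information(betas):
    """Information content, in bits, of one position given its letter weights.

    Assumes probabilities proportional to exp(beta); a uniform position
    carries 0 bits and a fully specific one carries log2(number of letters).
    """
    weights = [math.exp(beta) for beta in betas]
    total = sum(weights)
    probabilities = [w / total for w in weights]
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return math.log2(len(betas)) - entropy


uninformative = position_information([0.0, 0.0, 0.0, 0.0])
specific = position_information([0.0, -50.0, -50.0, -50.0])
```

Under this definition, a threshold such as information_threshold=0.1 would separate nearly uniform flanking positions from positions that discriminate between letters.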

get_dirichlet()

Calculates the Dirichlet-inspired regularization value (see the publication).

Return type:

Tensor

get_bias()

A bias parameter passed to Conv1d.

Return type:

Tensor

get_filter(dist)

PSAM filter for a given pairwise distance.

Return type:

Tensor

fix_gauge()

Removes invariances between monomer and pairwise parameters.

Return type:

None