pyprobound.fitting.BaseFit
- class BaseFit(rnd, dataset, prediction, observation=<function BaseFit.<lambda>>, update_construct=False, train_posbias=False, train_hill=False, max_split=None, batch_size=None, checkpoint='valmodel.pt', output='/dev/null', device=None, sampler=None, optimizer=<class 'torch.optim.lbfgs.LBFGS'>, optim_args=None, sampler_args=None, name='')
Bases: BaseLoss[CountBatch], ABC
Base class for curve fitting to independent validation data.
- __init__(rnd, dataset, prediction, observation=<function BaseFit.<lambda>>, update_construct=False, train_posbias=False, train_hill=False, max_split=None, batch_size=None, checkpoint='valmodel.pt', output='/dev/null', device=None, sampler=None, optimizer=<class 'torch.optim.lbfgs.LBFGS'>, optim_args=None, sampler_args=None, name='')
Initializes the curve fitting.
- Parameters:
rnd (BaseRound | Aggregate) – A component containing an aggregate of different modes.
dataset (CountTable) – A CountTable with 1 to 3 columns, with the first column taken as the target; if 2 columns are provided, the second column is taken as a symmetrical error; if 3 columns are provided, the second is taken as the lower error and the third is taken as the upper error.
prediction (Callable[[Tensor], Tensor]) – A callable applied to the log aggregate \(\log Z\).
observation (Callable[[Tensor], Tensor]) – A callable applied to the target \(y\).
update_construct (bool) – Whether to reset experiment-specific parameters.
train_posbias (bool) – Whether to retrain positional bias profiles \(\omega\).
train_hill (bool) – Whether to train a Hill coefficient.
max_split (int | None) – Maximum number of sequences scored at a time (lower values reduce memory but increase computation time).
batch_size (int | None) – The number of sequences used to optimize the model at a time.
checkpoint (str | PathLike[str]) – The file to which the model will be checkpointed.
output (str | PathLike[str]) – The file to which the optimization output will be written.
device (str | None) – The device on which to perform optimization.
sampler (type[Sampler[CountBatch]] | None) – The sampler used when creating the dataloader.
optimizer (type[Optimizer]) – The optimizer used for optimization.
optim_args (MutableMapping[str, Any] | None) – Parameters passed to the optimizer (defaults to {"line_search_fn": "strong_wolfe"} if available).
sampler_args (MutableMapping[str, Any] | None) – Parameters passed to the sampler.
name (str) – A string used to describe the validation dataset.
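To illustrate how the prediction and observation callables pair up, here is a minimal sketch that uses plain Python floats as stand-ins for tensors. The two transforms shown (negating \(\log Z\) and taking the natural log of the target) are hypothetical choices made for illustration only; they are not documented defaults of BaseFit, and the residual computed at the end is not necessarily the loss the class uses.

```python
import math

# Hypothetical transforms; BaseFit accepts any Callable[[Tensor], Tensor].
def prediction(log_z):
    """Applied to the log aggregate log Z (here: a sign flip)."""
    return -log_z

def observation(y):
    """Applied to the target y (here: the natural logarithm)."""
    return math.log(y)

# Stand-in values: one log aggregate and one measured target.
log_z = 2.0
y = math.exp(-2.0)

pred = prediction(log_z)   # transformed predicted value
obs = observation(y)       # transformed observed value
residual = obs - pred      # illustrative comparison only
```

In this sketch both sides are mapped into the same (log) space before being compared, which is the role the two callables play in the signature above.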
Methods
check_length_consistency() – Checks that input lengths of Binding components are consistent.
components() – Iterator of child components.
forward(batches) – Calculates the multitask weighted loss and regularization.
freeze() – Turns off gradient calculation for all parameters.
get_setup_string() – A description used when printing the output of an optimizer.
log_aggregate(seqs) – Calculates the log aggregate \(\log Z_i\).
max_embedding_size() – The maximum number of bytes needed to encode a sequence.
negloglik(transform, batch) – Calculates the negative log-likelihood plus a normalization factor.
obs_pred(seqs, target) – Calculates the observed and predicted values used for the loss.
optim_procedure([ancestry, current_order]) – The sequential optimization procedure for all Binding components.
regularization(component) – Calculates parameter regularization.
reload(checkpoint) – Loads the model from a checkpoint file.
reload_from_state_dict(state_dict) – Loads the model from a state dict.
save(checkpoint[, flank_lengths]) – Saves the model to a file with "state_dict" and "metadata" fields.
score(batch) – Wraps obs_pred, automatically managing devices.
unfreeze([parameter]) – Turns on gradient calculation for the specified parameter.
Attributes
unfreezable – alias of Literal['all']
Non-Inherited Members
- scale: Parameter
- intercept: Parameter
- get_setup_string()
A description used when printing the output of an optimizer.
- Return type:
str
- log_aggregate(seqs)
Calculates the log aggregate \(\log Z_i\).
- Parameters:
seqs (Tensor) – A sequence tensor of shape \((\text{minibatch},\text{length})\) or \((\text{minibatch},\text{in_channels},\text{length})\).
- Return type:
Tensor
- Returns:
The log aggregate tensor of shape \((\text{minibatch},)\).
- abstract obs_pred(seqs, target)
Calculates the observed and predicted values used for the loss.
- Parameters:
seqs (Tensor) – A sequence tensor of shape \((\text{minibatch},\text{length})\) or \((\text{minibatch},\text{in_channels},\text{length})\).
target (Tensor) – A target tensor of shape \((\text{minibatch},1-3)\).
- Return type:
tuple[Tensor, Tensor, Tensor | None, Tensor | None]
- Returns:
A tuple of four tensors of shape \((\text{minibatch},)\), being the transformed observed values, the transformed predicted values, the lower error values, and the upper error values.
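The relationship between the 1 to 3 target columns and the optional error entries of the returned tuple can be sketched as follows. This is a plain-Python illustration with a list standing in for one row of the target tensor; the helper name split_target is hypothetical, and copying a symmetrical error into both the lower and upper slots is an assumption about how a subclass might handle the 2-column case, not confirmed library behavior.

```python
def split_target(row):
    """Split one target row of 1 to 3 values into (target, lower, upper).

    Hypothetical helper: mirrors the documented column meanings only.
    """
    if len(row) == 1:
        # Target only: no error information is available.
        return row[0], None, None
    if len(row) == 2:
        # Second column is a symmetrical error (assumed to fill both slots).
        return row[0], row[1], row[1]
    if len(row) == 3:
        # Second column is the lower error, third is the upper error.
        return row[0], row[1], row[2]
    raise ValueError("expected a row with 1 to 3 columns")
```

The None entries in the 1-column case correspond to the Tensor | None positions in the return type.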
- score(batch)
Wraps obs_pred, automatically managing devices.
- Return type:
tuple[Tensor,Tensor,Tensor|None,Tensor|None]
- negloglik(transform, batch)
Calculates the negative log-likelihood plus a normalization factor.
- Parameters:
transform (Transform) – The component used for scoring.
batch (CountBatch) – The batch to be scored.
- Return type:
tuple[Tensor, int]
- Returns:
A tuple of scalar tensors (negloglik, norm), where negloglik is the negative log-likelihood of the batch and norm is a scaling factor. A running sum of each is kept, and the loss is the ratio.
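The "running sum, then ratio" description can be sketched as below. This is a minimal illustration with plain Python numbers in place of scalar tensors; the function name accumulate_loss and the list of precomputed (negloglik, norm) pairs are hypothetical, and the actual accumulation inside the library may differ in detail.

```python
def accumulate_loss(batch_results):
    """Combine per-batch (negloglik, norm) pairs into a single loss.

    Hypothetical helper: keeps a running sum of each quantity and
    returns their ratio, as described for negloglik above.
    """
    total_nll = 0.0
    total_norm = 0
    for nll, norm in batch_results:
        total_nll += nll
        total_norm += norm
    return total_nll / total_norm

# Two stand-in batches: (negloglik, norm) per batch.
loss = accumulate_loss([(6.0, 2), (4.0, 3)])
```

Summing before dividing weights each batch by its norm, so the result does not depend on how the data were split into batches.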