espnet2.asr.state_spaces.s4.S4
class espnet2.asr.state_spaces.s4.S4(d_model, d_state=64, l_max=None, channels=1, bidirectional=False, activation='gelu', postact='glu', hyper_act=None, dropout=0.0, tie_dropout=False, bottleneck=None, gate=None, transposed=True, verbose=False, **kernel_args)
Bases: Module
Initialize S4 module.
d_state: the dimension of the state, also denoted by N
l_max: the maximum kernel length, also denoted by L. Set l_max=None to always use a global kernel
channels: can be interpreted as a number of "heads"; the SSM is a map from a 1-dim to C-dim sequence. Changing this is not recommended unless you are desperate for parameters to tune; instead, increase d_model for larger models
bidirectional: if True, convolution kernel will be two-sided
Position-wise feedforward components:
activation: activation in between SS and FF
postact: activation after FF
hyper_act: use a "hypernetwork" multiplication (experimental)
dropout: standard dropout argument
tie_dropout: if True, tie the dropout mask across the sequence length, emulating nn.Dropout1d
Other arguments:
transposed: choose the backbone axis ordering: (B, L, H) if False or (B, H, L) if True [B=batch size, L=sequence length, H=hidden dimension]
gate: add gated activation (GSS)
bottleneck: reduce SSM dimension (GSS)
See the SSKernel class for the kernel constructor, which accepts kernel_args. Relevant options worth considering and tuning include "mode", "measure", "dt_min", "dt_max", and "lr"; all other options are experimental and should not need to be configured.
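A minimal construction sketch (the values below are illustrative, not recommended settings; the trailing keyword arguments are examples of kernel_args forwarded to SSKernel, as described above):

    from espnet2.asr.state_spaces.s4 import S4

    layer = S4(
        d_model=256,        # H, the model/feature dimension
        d_state=64,         # N, the SSM state dimension
        l_max=None,         # use a global kernel (no fixed maximum length)
        bidirectional=False,
        transposed=True,    # expect inputs of shape (B, H, L)
        dropout=0.1,
        dt_min=0.001,       # example kernel_args forwarded to SSKernel
        dt_max=0.1,
    )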
property d_output
default_state(*batch_shape, device=None)
forward(u, state=None, rate=1.0, lengths=None, **kwargs)
Forward pass.
u: (B, H, L) if self.transposed else (B, L, H)
state: (H, N); never needed unless you know what you are doing
Returns: same shape as u
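A hedged sketch of a forward pass, assuming transposed=True so the input is (B, H, L). Depending on the version, forward may return the output tensor directly or a tuple whose first element is the output; in either case the output has the same shape as u:

    import torch
    from espnet2.asr.state_spaces.s4 import S4

    layer = S4(d_model=256, d_state=64, transposed=True)
    u = torch.randn(8, 256, 100)           # (B, H, L)
    out = layer(u)
    y = out[0] if isinstance(out, tuple) else out
    assert y.shape == u.shape              # same shape as u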
setup_step(**kwargs)
step(u, state, **kwargs)
Step one time step as a recurrent model.
Intended to be used during validation.
u: (B, H)
state: (B, H, N)
Returns: output (B, H), state (B, H, N)
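A hedged sketch of step-by-step recurrent inference, assuming setup_step() is called once before stepping and that default_state() yields a state compatible with step(); shapes follow the docstring above:

    import torch
    from espnet2.asr.state_spaces.s4 import S4

    layer = S4(d_model=256, d_state=64, transposed=True)
    layer.eval()
    with torch.no_grad():
        layer.setup_step()
        state = layer.default_state(8)     # batch of 8 -> state (B, H, N)
        outputs = []
        for _ in range(100):
            u_t = torch.randn(8, 256)      # one time step, shape (B, H)
            y_t, state = layer.step(u_t, state)
            outputs.append(y_t)
        y = torch.stack(outputs, dim=-1)   # stack steps back into (B, H, L)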