spm_encode
usage: spm_encode [-h] --model MODEL [--inputs INPUTS [INPUTS ...]]
                  [--outputs OUTPUTS [OUTPUTS ...]]
                  [--output_format {piece,id}] [--min-len N] [--max-len N]
options:
  --model MODEL         sentencepiece model to use for encoding
  --inputs INPUTS [INPUTS ...]
                        input files to filter/encode
  --outputs OUTPUTS [OUTPUTS ...]
                        path to save encoded outputs
  --output_format {piece,id}
  --min-len N           filter sentence pairs with fewer than N tokens
  --max-len N           filter sentence pairs with more than N tokens
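
The length filter described by --min-len and --max-len can be illustrated with a small sketch. This is not the tool's source. It assumes that parallel input files are read line by line in lockstep and that a pair is dropped when any of its lines is empty or falls outside the bounds. The names `within_bounds`, `encode_parallel`, and `toy_encode` are hypothetical, and whitespace splitting stands in for a real SentencePiece model.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple


def within_bounds(tokens: Sequence, min_len: Optional[int] = None,
                  max_len: Optional[int] = None) -> bool:
    """Return True if the token count satisfies the optional bounds."""
    n = len(tokens)
    if min_len is not None and n < min_len:
        return False
    if max_len is not None and n > max_len:
        return False
    return True


def encode_parallel(columns: Sequence[Iterable[str]],
                    encode: Callable[[str], List],
                    min_len: Optional[int] = None,
                    max_len: Optional[int] = None) -> Iterator[Tuple[str, ...]]:
    """Encode aligned lines and keep a pair only if every side passes.

    `columns` holds one iterable of lines per input file (assumed aligned).
    `encode` is any callable mapping a string to a list of pieces or ids.
    """
    for lines in zip(*columns):
        encoded = [encode(line.strip()) for line in lines]
        # Assumption: an empty line or an out-of-range length on either
        # side drops the whole pair, so the outputs stay aligned.
        if all(len(tokens) > 0 and within_bounds(tokens, min_len, max_len)
               for tokens in encoded):
            yield tuple(" ".join(map(str, tokens)) for tokens in encoded)


# Stand-in encoder for illustration only: split on whitespace.
toy_encode = str.split

src = ["a b c", "a", "a b c d e f"]
tgt = ["x y z", "x y", "x y"]
kept = list(encode_parallel([src, tgt], toy_encode, min_len=2, max_len=5))
print(kept)  # [('a b c', 'x y z')]
```

With the sentencepiece Python package installed, `toy_encode` could be swapped for the `EncodeAsPieces` or `EncodeAsIds` method of a `SentencePieceProcessor` that has loaded the file given to --model, which would correspond to the two --output_format choices.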