spm_encode
usage: spm_encode [-h] --model MODEL [--inputs INPUTS [INPUTS ...]]
                  [--outputs OUTPUTS [OUTPUTS ...]]
                  [--output_format {piece,id}] [--min-len N] [--max-len N]
options:
  --model MODEL         sentencepiece model to use for encoding
  --inputs INPUTS [INPUTS ...]
                        input files to filter/encode
  --outputs OUTPUTS [OUTPUTS ...]
                        path to save encoded outputs
  --output_format {piece,id}
  --min-len N           filter sentence pairs with fewer than N tokens
  --max-len N           filter sentence pairs with more than N tokens
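
The --min-len and --max-len options describe filtering on sentence pairs, which suggests that the lines at the same position across all --inputs files are kept or dropped together. The Python sketch below illustrates one plausible reading of that behaviour. It is not the tool's actual source: the names within_limits, encode_and_filter and toy_encode are hypothetical, and a whitespace splitter stands in for the sentencepiece model so the example runs without a model file.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence, Tuple


def within_limits(tokens: Sequence, min_len: Optional[int], max_len: Optional[int]) -> bool:
    """Return True if the token count respects both optional bounds."""
    if min_len is not None and len(tokens) < min_len:
        return False
    if max_len is not None and len(tokens) > max_len:
        return False
    return True


def encode_and_filter(
    pairs: Iterable[Sequence[str]],
    encode: Callable[[str], Sequence],
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
) -> Iterator[Tuple[str, ...]]:
    """Encode each line of a pair; keep the pair only if every side is in range.

    Assumption: a pair is dropped as a whole when any one side is too short
    or too long, so the output files stay aligned line by line.
    """
    for lines in pairs:
        encoded = [encode(line) for line in lines]
        if all(within_limits(tokens, min_len, max_len) for tokens in encoded):
            # Tokens are joined with spaces, whether they are pieces or ids.
            yield tuple(" ".join(map(str, tokens)) for tokens in encoded)


def toy_encode(line: str) -> Sequence[str]:
    """Hypothetical stand-in for a sentencepiece model: split on whitespace."""
    return line.split()


pairs = [
    ("a b c", "x y z"),    # both sides have 3 tokens
    ("a", "x y z"),        # first side has only 1 token
    ("a b c d e", "x y"),  # first side has 5 tokens
]
kept = list(encode_and_filter(pairs, toy_encode, min_len=2, max_len=4))
print(kept)
```

With min_len=2 and max_len=4, only the first pair survives in this sketch, because each of the other two has one side outside the range. Leaving both bounds unset keeps every pair.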