Class ByteTokenizer

java.lang.Object
io.github.kirstenali.deepj.tokenizers.ByteTokenizer
All Implemented Interfaces:
Tokenizer

public final class ByteTokenizer extends Object implements Tokenizer
Minimal byte-level tokenizer (token IDs 0-255). This is enough to train a GPT-style model end-to-end without external dependencies. For real projects, you can swap this for a BPE or Unigram tokenizer.
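
As an illustration of what byte-level tokenization means, the following is a minimal sketch and not the deepj source. The class name ByteTokenizerSketch and the encode, decode, and VOCAB_SIZE members are hypothetical, and UTF-8 is an assumed encoding; the actual ByteTokenizer and Tokenizer signatures may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a byte-level tokenizer; names and the UTF-8
// choice are assumptions, not the deepj ByteTokenizer implementation.
final class ByteTokenizerSketch {
    // One token ID per possible byte value (0-255).
    static final int VOCAB_SIZE = 256;

    // Encode text as its UTF-8 bytes, mapping each byte to an ID in 0..255.
    static int[] encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int[] ids = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            ids[i] = bytes[i] & 0xFF; // Java bytes are signed; mask to unsigned
        }
        return ids;
    }

    // Decode IDs back to text by reassembling the byte sequence.
    static String decode(int[] ids) {
        byte[] bytes = new byte[ids.length];
        for (int i = 0; i < ids.length; i++) {
            if (ids[i] < 0 || ids[i] >= VOCAB_SIZE) {
                throw new IllegalArgumentException("token ID out of range: " + ids[i]);
            }
            bytes[i] = (byte) ids[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Under these assumptions the vocabulary is fixed at 256 entries and needs no training or vocabulary file, which is why no external dependency is required. The trade-off is longer sequences: each non-ASCII character occupies several tokens.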