Text Datasets

HDF Dataset

Follows our team's conventions for translation datasets. It takes a directory and expects the following files in it, where a "(.gz)?" suffix means the file may optionally be gzip-compressed:

- source.dev(.gz)?
- source.train(.gz)?
- source.vocab.pkl
- target.dev(.gz)?
- target.train(.gz)?
- target.vocab.pkl
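
As a rough illustration of the layout above, the following sketch checks a directory for the expected files before it is handed to the dataset. The helper name find_translation_files and the rule of preferring the uncompressed file when both variants exist are assumptions made for this example; they are not part of the dataset's API, and the dataset itself may resolve files differently.

```python
import os

# Expected base names and whether a ".gz" variant is also accepted.
# This mirrors the file list above; the table itself is illustrative.
EXPECTED_FILES = [
    ("source.dev", True),
    ("source.train", True),
    ("source.vocab.pkl", False),
    ("target.dev", True),
    ("target.train", True),
    ("target.vocab.pkl", False),
]


def find_translation_files(data_dir):
    """Hypothetical helper: map each expected base name to its path in data_dir.

    Raises FileNotFoundError if neither the plain file nor (where allowed)
    its ".gz" variant exists.
    """
    found = {}
    for name, may_be_gzipped in EXPECTED_FILES:
        # Assumption: try the uncompressed file first, then the gzipped one.
        candidates = [name]
        if may_be_gzipped:
            candidates.append(name + ".gz")
        for candidate in candidates:
            path = os.path.join(data_dir, candidate)
            if os.path.isfile(path):
                found[name] = path
                break
        else:
            raise FileNotFoundError(
                "missing %s in %s (tried: %s)"
                % (name, data_dir, ", ".join(candidates))
            )
    return found
```

Calling find_translation_files("/path/to/data") returns a dict keyed by base name, for example "source.train", so a missing or misnamed file is reported before any training run starts.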