Train on 20M dataset

To prepare the 20M dataset, restructure the data pipeline to write examples directly to disk.
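
A minimal sketch of what writing examples directly to disk could look like, assuming examples arrive as a stream of dictionaries and are stored as sharded JSONL files. The names `generate_examples` and `write_shards`, the shard naming scheme, and the JSONL format are all hypothetical illustrations, not the pipeline's actual code or on-disk format.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, Iterator, List


def generate_examples(n: int) -> Iterator[Dict[str, object]]:
    """Hypothetical example source; stands in for the real pipeline stage."""
    for i in range(n):
        yield {"id": i, "text": f"example {i}"}


def write_shards(
    examples: Iterable[Dict[str, object]], out_dir: str, shard_size: int
) -> List[str]:
    """Stream examples to JSONL shards, one example in memory at a time."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    handle = None
    try:
        for index, example in enumerate(examples):
            # Start a new shard file every `shard_size` examples.
            if index % shard_size == 0:
                if handle is not None:
                    handle.close()
                path = os.path.join(out_dir, f"shard-{len(paths):05d}.jsonl")
                handle = open(path, "w", encoding="utf-8")
                paths.append(path)
            handle.write(json.dumps(example) + "\n")
    finally:
        # Close the last open shard even if the example source raises.
        if handle is not None:
            handle.close()
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shard_paths = write_shards(generate_examples(10), tmp, shard_size=4)
        print(len(shard_paths))  # 3 shards: 4 + 4 + 2 examples
```

Because the writer consumes a generator and flushes each example to the current shard, peak memory stays roughly constant regardless of dataset size, which is the main reason to avoid building the full dataset in memory at this scale.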