If you are lucky, there is a ready-to-use file in valid CSV or JSON format that contains the raw text documents. This means someone else has already done the work of collecting all retrieved documents into a single file. Such a file is a good starting point for your task, provided that you know which constraints and filters were applied when the structured data file was created.
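As a minimal sketch, reading such a file with Python's standard library could look like the following. The function name `load_documents` and the column/key name `text` are assumptions made for illustration; the actual field name depends on how the file was created. The JSON branch also assumes that the file holds a single array of objects.

```python
import csv
import json
from pathlib import Path


def load_documents(path, text_field="text"):
    """Return the raw text of every record in a CSV or JSON file.

    Assumption: each record stores its document under `text_field`.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # newline="" lets the csv module handle line breaks inside quoted fields.
        with path.open(newline="", encoding="utf-8") as handle:
            records = list(csv.DictReader(handle))
    elif suffix == ".json":
        # Assumption: the file is one JSON array of objects.
        with path.open(encoding="utf-8") as handle:
            records = json.load(handle)
    else:
        raise ValueError(f"Unsupported file type: {suffix}")
    # Skip records whose text field is missing or empty.
    return [record[text_field] for record in records if record.get(text_field)]
```

Calling `load_documents("documents.csv")` (a hypothetical file name) would return a list of strings, one per document. Comparing the length of that list with the number of documents you expected is a quick first check on whether filters were applied when the file was built.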
This article is part of the project Periodic Table of NLP Tasks. Click to read more about the making of the Periodic Table and the project to systemize NLP tasks.