36 - Paragraph Segmentation

Splitting text into paragraphs requires more custom logic than splitting it into sentences. A paragraph often carries a more complete meaning than a single sentence.

Paragraph detection is less mainstream than sentence detection. It usually comes down to custom rules, such as splitting after two consecutive line ends or splitting before an uppercased title. Layout metadata, or explicit paragraph and chapter numbering, can also help when it is available.
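
As an illustration, the two rules mentioned above (split after two line ends, split before an uppercased title) could be sketched in Python roughly as follows. This is a minimal sketch under assumptions, not a standard algorithm: the function names `split_paragraphs` and `is_uppercase_title` are hypothetical, and the 80-character limit for a title line is an arbitrary choice that would need tuning per corpus.

```python
import re

# Two or more consecutive line ends; the "empty" lines may contain spaces or tabs.
BLANK_LINE = re.compile(r"\n[ \t]*(?:\n[ \t]*)+")


def is_uppercase_title(line: str) -> bool:
    """Hypothetical heuristic: a short line whose letters are all uppercase."""
    stripped = line.strip()
    # The 80-character limit is an assumption, not a rule from the text.
    return 0 < len(stripped) <= 80 and stripped.isupper()


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines, and additionally before uppercased title lines."""
    # Normalize Windows and old Mac line ends to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    for block in BLANK_LINE.split(text):
        current = []
        for line in block.split("\n"):
            # An uppercased title closes the paragraph collected so far.
            if is_uppercase_title(line) and current:
                paragraphs.append(" ".join(current))
                current = []
            if line.strip():
                current.append(line.strip())
        if current:
            paragraphs.append(" ".join(current))
    return paragraphs


sample = "First line\nsecond line.\n\nSecond paragraph.\nINTRODUCTION\nBody text."
for paragraph in split_paragraphs(sample):
    print(paragraph)
```

Because the rule is "split before" a title, the title line stays attached to the paragraph that follows it. Whether that is desirable depends on the downstream task; a variant could emit titles as separate units instead.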

In most cases there is no default way of determining a paragraph boundary, so people tend to work with sentences instead. Still, a paragraph can be a more valuable unit than a sentence. Examples include coreference chains that span multiple sentences, questions whose answer is spread across a whole paragraph, and readers who understand a paragraph better than an isolated sentence. The writer's intent is often expressed most clearly at the paragraph level.

