Showing posts with the label Mojibake

How To Identify Likely Broken PDF Pages Before Extracting Its Text?

TL;DR My workflow: download the PDF, split it into pages using pdftk, extract the text of each page using pd…
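One rough way to flag pages whose text came out as mojibake is to score each page's extracted text by the share of characters that rarely appear in real prose. The sketch below is only an illustration, not the post's method: it checks text after extraction rather than before, the names `suspicious_ratio` and `looks_broken` are hypothetical, and the 0.2 threshold is an arbitrary assumption that would need tuning on real pages.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = 0
    for c in chars:
        category = unicodedata.category(c)
        # U+FFFD is the Unicode replacement character.
        # Co = private use, Cn = unassigned, Cc = control characters.
        if c == "\ufffd" or category in ("Co", "Cn", "Cc"):
            bad += 1
    return bad / len(chars)


def looks_broken(text: str, threshold: float = 0.2) -> bool:
    """Flag a page as likely broken; the threshold is an assumed starting point."""
    return suspicious_ratio(text) > threshold
```

Running `looks_broken` on the text of each split page would give a list of pages to inspect by hand. A page with no text at all scores 0.0 here, so empty pages would need a separate check.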