This project seems to convert the PDF into an image before doing the semantic annotation, so it would work on scans as well. That doesn't give you the text, but it gets you halfway there. The other half can be done by passing the discovered regions to an OCR engine to pull out the text.
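
For what it's worth, here is a rough sketch of that second half in Python. It is not anything from the project itself: I'm assuming the regions come out as (left, top, right, bottom) pixel boxes, that the page is a Pillow-style image with a `crop` method, and that pytesseract is installed as the default OCR backend. The name `ocr_regions` is made up for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed region format: (left, top, right, bottom), which is what Pillow's Image.crop takes.
Box = Tuple[int, int, int, int]


def _tesseract_ocr(image) -> str:
    # Imported lazily so the rest of the sketch works without pytesseract installed.
    import pytesseract
    return pytesseract.image_to_string(image)


def ocr_regions(page_image, regions: Iterable[Box],
                ocr: Callable = _tesseract_ocr) -> List[Tuple[Box, str]]:
    """Crop each region out of the page image and OCR it, in rough reading order."""
    # Sort top-to-bottom, then left-to-right; real layouts may need something smarter.
    ordered = sorted(regions, key=lambda box: (box[1], box[0]))
    return [(box, ocr(page_image.crop(box)).strip()) for box in ordered]
```

The `ocr` argument is there so you can swap in a different engine (or a stub) without touching the cropping logic.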

The one time I needed to turn a scanned PDF (a 600+ page book) into searchable text, I used the Ruby script pdfocr (https://github.com/gkovacs/pdfocr/), which pulls out individual pages using pdftk, turns them into images to feed into an OCR engine of your choice (Tesseract seems to be the gold standard), and then puts the pages back together. It can blow up the file size tremendously, but it worked well enough for my use case. (I did write a very special-purpose PDF compressor to shrink the file back down, but that was more for fun.)
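
If you'd rather not depend on the script, the same split / rasterize / OCR / merge shape can be sketched in a few lines of Python. This is not pdfocr's actual code, just my approximation of the idea. It assumes pdftk, pdftoppm (from poppler) and tesseract are on your PATH, that you already know the page count, and that 300 dpi is good enough; the function names and the `pageNNNN` file naming are made up.

```python
import subprocess
from pathlib import Path
from typing import List


def page_commands(src: str, page: int, workdir: str, dpi: int = 300) -> List[List[str]]:
    """Commands that turn one page of `src` into a one-page searchable PDF."""
    base = str(Path(workdir) / f"page{page:04d}")
    return [
        # Extract the single page with pdftk.
        ["pdftk", src, "cat", str(page), "output", base + ".pdf"],
        # Rasterize it; -singlefile makes pdftoppm write exactly base + ".png".
        ["pdftoppm", "-r", str(dpi), "-png", "-singlefile", base + ".pdf", base],
        # Tesseract's "pdf" config writes base + "-ocr.pdf" with an invisible text layer.
        ["tesseract", base + ".png", base + "-ocr", "pdf"],
    ]


def merge_command(page_count: int, workdir: str, dest: str) -> List[str]:
    """Command that concatenates the per-page OCR'd PDFs back into one file."""
    parts = [str(Path(workdir) / f"page{n:04d}-ocr.pdf") for n in range(1, page_count + 1)]
    return ["pdftk", *parts, "cat", "output", dest]


def ocr_pdf(src: str, page_count: int, workdir: str, dest: str) -> None:
    """Run the whole pipeline; raises CalledProcessError if any tool fails."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    for page in range(1, page_count + 1):
        for cmd in page_commands(src, page, workdir):
            subprocess.run(cmd, check=True)
    subprocess.run(merge_command(page_count, workdir, dest), check=True)
```

Something like `ocr_pdf("book.pdf", 600, "work", "book-ocr.pdf")` would run it end to end. Each page gets re-embedded as a full raster image, which is a big part of why the output can end up so much larger than the input.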