![]() ![]() Res = n(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE) SCRIPT_DIR = os.path.dirname(os.path.abspath(_file_)) After trying textract (which seemed to have too many dependencies) and pypdf2 (which could not extract text from the pdfs I tested with) and tika (which was too slow) I ended up using pdftotext from xpdf (as already suggested in another answer) and just called the binary from python directly (you may need to adapt the path to pdftotext): import os, subprocess ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |