This data loader allows loading PDF files from local storage.
To use `PDFLoader`, you first need to install the `purecpp_extract` Python package, for example with `pip install purecpp_extract`.
Instantiate `PDFLoader` by providing the path to a `.pdf` file or a directory containing `.pdf` files.
Call the `Load()` method to extract the contents of the files. This method returns a list of `Document` objects.
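Each `Document` contains the following attributes:

- `metadata`: A dictionary with metadata about the document
- `page_content`: The full text content of the document

If the path points to a single file, the list contains one `Document`. If it points to a directory, the list contains one `Document` per `.pdf` file found in the directory.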
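
The following is a minimal sketch of how the loader and its output might be used together, not a definitive implementation. It assumes that `PDFLoader` accepts the path as its only required argument; the real constructor may take more. `StubDocument`, `summarize`, `load_pdfs`, and the `source` metadata key in the usage note are hypothetical names introduced for illustration. The stub only mirrors the two attributes described above so the summary helper can be tried without any PDF files.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class StubDocument:
    """Hypothetical stand-in with the two attributes a Document exposes."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize(docs: Iterable[Any], preview_chars: int = 40) -> List[Dict[str, Any]]:
    """Build one summary row per document from metadata and page_content."""
    rows = []
    for doc in docs:
        text = doc.page_content
        rows.append({
            "metadata": dict(doc.metadata),   # copy so callers cannot mutate the original
            "length": len(text),              # number of characters extracted
            "preview": text[:preview_chars],  # first few characters for a quick check
        })
    return rows


def load_pdfs(path: str) -> List[Any]:
    """Load a .pdf file or a directory of .pdf files (needs purecpp_extract)."""
    # Imported lazily so the helpers above work without the package installed.
    from purecpp_extract import PDFLoader

    # Assumption: the path is the only required constructor argument.
    loader = PDFLoader(path)
    return loader.Load()
```

With the package installed, `summarize(load_pdfs("docs/"))` would yield one row per `.pdf` file in the directory. Without it, the helper can still be exercised on stubs, for example `summarize([StubDocument("Hello PDF world", {"source": "a.pdf"})])`.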