Installation
To use the PDFLoader, you first need to install the purecpp_extract Python package:
pip install purecpp_extract
Initialization
You can initialize the PDFLoader by providing the path to a single .pdf file or to a directory containing .pdf files.
Load
Once the loader is initialized, call the Load() method to extract the contents of the files. It returns a list of Document objects.
Each Document contains the following attributes:
- metadata: a dictionary with metadata about the document
- page_content: the full text content of the document
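The attribute list can be modeled with a small stand-in class to show how the returned list is typically consumed. This is a hypothetical sketch, not the purecpp_extract Document type: only the metadata and page_content attributes come from this page, and summarize() and the "source" metadata key in the example data are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical stand-in for the Document objects returned by Load().
# Only the two documented attributes are modeled.
@dataclass
class Document:
    metadata: Dict[str, Any] = field(default_factory=dict)
    page_content: str = ""


def summarize(docs: List[Document], preview: int = 40) -> List[str]:
    """Return one line per document: its sorted metadata keys plus a text preview."""
    lines = []
    for doc in docs:
        keys = ", ".join(sorted(doc.metadata))
        # Collapse newlines so each summary stays on a single line.
        text = doc.page_content[:preview].replace("\n", " ")
        lines.append(f"[{keys}] {text}")
    return lines


# Example data; the "source" metadata key is an assumption.
docs = [Document(metadata={"source": "report.pdf"},
                 page_content="Quarterly results\nRevenue grew.")]
for line in summarize(docs):
    print(line)  # prints "[source] Quarterly results Revenue grew."
```

Because metadata is a plain dictionary, inspect its keys on real output before relying on any particular field.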
The number of Documents in the returned list depends on the path used during initialization:
- If a single file path was provided, the list contains one Document.
- If a directory path was provided, the list contains one Document per .pdf file found in the directory.
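The file-or-directory rule can be sketched with pathlib. resolve_pdf_paths is a hypothetical helper, not part of purecpp_extract: it assumes a non-recursive directory scan and a case-insensitive .pdf extension check, and it raises an error for a non-PDF file, none of which is confirmed by this page. Under those assumptions, the length of the list it returns matches the number of Documents described above.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def resolve_pdf_paths(path: Union[str, Path]) -> List[Path]:
    """Hypothetical helper: list the PDF files a loader would read for `path`."""
    p = Path(path)
    if p.is_file():
        # A single file yields exactly one entry (and so one Document).
        if p.suffix.lower() != ".pdf":
            raise ValueError(f"not a .pdf file: {p}")
        return [p]
    if p.is_dir():
        # Assumption: only the top level is scanned and the extension match
        # ignores case; sorted() just makes the order deterministic.
        return sorted(c for c in p.iterdir()
                      if c.is_file() and c.suffix.lower() == ".pdf")
    raise FileNotFoundError(f"no such file or directory: {p}")


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.PDF", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")
    print([p.name for p in resolve_pdf_paths(tmp)])  # prints ['a.pdf', 'b.PDF']
```

In this example, notes.txt is skipped, so a directory like this one would be expected to yield two Documents.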