Install

Before you begin, ensure your environment meets the following requirements:

  • Python 3.9, 3.10, or 3.11: PureCPP supports these Python versions.
  • Linux/WSL support: The library is fully compatible with Linux-based systems and Windows Subsystem for Linux (WSL).
  • pip: Ensure pip is installed and updated to the latest version.

To install the package, run:

pip install purecpp_extract
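
To confirm that the environment matches the requirements above, a short check along these lines may help. It is a minimal sketch, not part of PureCPP: the helper names `python_supported` and `installed_version` are hypothetical, and the lookup uses only the distribution name given in the pip command, since the import name is not stated in this section.

```python
import sys
from importlib import metadata

# Python versions listed in the requirements above.
SUPPORTED = {(3, 9), (3, 10), (3, 11)}


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is one of the listed versions."""
    return (version_info[0], version_info[1]) in SUPPORTED


def installed_version(dist_name: str = "purecpp_extract"):
    """Return the installed version of the distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("purecpp_extract version:", installed_version() or "not installed")
```

If the second line reports "not installed", the pip command was most likely run against a different interpreter or virtual environment.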

Loaders

The library currently includes four loaders: WebLoader, TXTLoader, DOCXLoader, and PDFLoader.

Document Loader    Description
WebLoader          Loads and processes HTML web pages.
TXTLoader          Loads and processes text (.txt) files.
PDFLoader          Loads and processes text from PDF documents.
DOCXLoader         Loads and processes text from Microsoft Word (.docx) files.
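
The table above can be read as a mapping from input type to loader. The sketch below illustrates that mapping with a hypothetical helper, `pick_loader_name`, which returns the name of the loader class as a string. It does not import or call PureCPP, because the loaders' constructor and method signatures are not shown in this section; treating any http(s) URL as a web page is also an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File suffix -> loader class name, taken from the table above.
LOADER_BY_SUFFIX = {
    ".txt": "TXTLoader",
    ".pdf": "PDFLoader",
    ".docx": "DOCXLoader",
}


def pick_loader_name(source: str) -> str:
    """Return the name of the loader that matches a URL or file path.

    Hypothetical helper for illustration only; it is not part of the library.
    """
    # Assumption: any http(s) URL is treated as an HTML page, even if it
    # happens to point at a file such as a PDF.
    if urlparse(source).scheme in ("http", "https"):
        return "WebLoader"

    # Compare suffixes case-insensitively so "notes.TXT" matches ".txt".
    suffix = PurePosixPath(source).suffix.lower()
    try:
        return LOADER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No loader listed for {source!r}") from None


if __name__ == "__main__":
    for src in ("https://example.com/page", "notes.txt", "report.pdf", "letter.docx"):
        print(src, "->", pick_loader_name(src))
```

Returning a name rather than an instance keeps the sketch independent of the library's actual API; consult the package documentation for how each loader is constructed and invoked.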