This data loader allows loading DOCX files from local storage.
To use `DOCXLoader`, first install the `purecpp_extract` Python package.
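
Before using the loader, you can confirm from Python that the package is available. The sketch below assumes the importable module is named `purecpp_extract` and that it is published on PyPI under the same name. The `pip` command in the message is inferred from the package name and is not stated in the original text.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("purecpp_extract"):
    # Assumption: the package installs from PyPI under the same name.
    print("purecpp_extract not found; try: pip install purecpp_extract")
```
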
Create a `DOCXLoader` by providing the path to a `.docx` file or to a directory containing `.docx` files.
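
To make the two accepted inputs concrete, here is a minimal sketch that resolves either kind of path into a list of `.docx` files. The helper `find_docx_files` is hypothetical and is not part of `purecpp_extract`. It also assumes a non-recursive directory scan, which the loader itself may or may not do.

```python
from pathlib import Path
from typing import List


def find_docx_files(path: str) -> List[Path]:
    """Hypothetical helper: resolve a file or directory into .docx paths."""
    target = Path(path)
    if target.is_file():
        # A single file is accepted only if it has the .docx extension.
        return [target] if target.suffix.lower() == ".docx" else []
    if target.is_dir():
        # Assumption: only the top level of the directory is scanned.
        return sorted(
            p for p in target.iterdir()
            if p.is_file() and p.suffix.lower() == ".docx"
        )
    raise FileNotFoundError(f"No such file or directory: {path}")
```

Checking the path up front like this makes it easier to tell an empty result (no `.docx` files present) from a mistyped path.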
Call the `Load()` method to extract the contents of the files. This method returns a list of `Document` objects.
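
Putting these steps together, a typical call might look like the sketch below. The import path `from purecpp_extract import DOCXLoader` and the single-argument constructor are assumptions inferred from the description here, not confirmed signatures. The wrapper `load_docx` is a hypothetical helper, and the import is guarded so the snippet still runs where the package is missing.

```python
try:
    # Assumption: DOCXLoader is exposed at the top level of purecpp_extract.
    from purecpp_extract import DOCXLoader
except ImportError:
    DOCXLoader = None


def load_docx(path: str) -> list:
    """Hypothetical wrapper: load one .docx file or a directory of them."""
    if DOCXLoader is None:
        raise RuntimeError("purecpp_extract is not installed")
    loader = DOCXLoader(path)  # assumed signature: path to a file or directory
    return loader.Load()       # returns a list of Document objects
```
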
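Each `Document` contains the following attributes:

- `metadata`: A dictionary with metadata about the document
- `page_content`: The full text content of the document

If the path points to a single file, the list contains one `Document`. If it points to a directory, the list contains one `Document` per `.docx` file found in the directory.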
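
The shape of the returned objects can be illustrated without the library. The `Document` dataclass below is a stand-in that mirrors the `metadata` and `page_content` attributes; it is not the class shipped by `purecpp_extract`. The `source` metadata key and the `summarize` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Stand-in with the two attributes described in the text."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def summarize(docs: List[Document]) -> List[str]:
    """Build one line per document: its source (if any) and text length."""
    return [
        f"{doc.metadata.get('source', 'unknown')}: {len(doc.page_content)} chars"
        for doc in docs
    ]


# Simulates the result of loading a directory with two .docx files.
docs = [
    Document("First report.", {"source": "a.docx"}),
    Document("Second report, longer.", {"source": "b.docx"}),
]
for line in summarize(docs):
    print(line)
```

Reading metadata with `.get()` and a fallback avoids a `KeyError` when a key is absent, which matters because the exact metadata keys the loader populates are not specified here.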