Data Loaders
WEB Loader
This data loader fetches a webpage from the internet and extracts its content.
Installation
To use the WebLoader, you first need to install the purecpp_extract Python package, for example with pip.
Initialization
You can initialize the WebLoader by providing a single URL.
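As a rough illustration of the single-URL rule, the sketch below wraps construction in a small helper. The helper make_loader and its loader_cls parameter are hypothetical names introduced here. The import path from purecpp_extract import WebLoader and the one-argument constructor are assumptions based on the package and class names above, so check them against the installed version.

```python
def make_loader(url, loader_cls=None):
    """Create a loader for exactly one URL (hypothetical helper)."""
    # Reject lists, tuples, and other non-string inputs up front,
    # since each loader instance accepts only a single URL.
    if not isinstance(url, str):
        raise TypeError("expected a single URL string, not a collection")
    if loader_cls is None:
        # Assumption: WebLoader is exposed at the top level of the package.
        from purecpp_extract import WebLoader as loader_cls
    # Assumption: the constructor takes the URL as its only argument.
    return loader_cls(url)
```

With the package installed, make_loader("https://example.com") would return a loader ready for use. Passing loader_cls lets you substitute a stand-in class when testing without network access.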
Load
Once initialized, use the Load() method to fetch and extract content from the webpage. This method returns a list containing one Document object.
Each Document contains the following attributes:
- metadata: A dictionary with metadata about the document
- page_content: The full text content of the webpage
Since only a single URL is allowed per instance, the returned list will always contain exactly one Document.
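Because each loader handles one URL, loading several pages means creating one loader per URL. The sketch below shows that pattern and how the two Document attributes might be read. StubWebLoader and StubDocument are hypothetical stand-ins that mirror only the interface described above (a single-URL constructor, a Load() method, and the metadata and page_content attributes). The "url" metadata key is invented for the example, and the real class may populate different keys.

```python
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    """Hypothetical stand-in with the two attributes listed above."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class StubWebLoader:
    """Hypothetical stand-in for WebLoader: one URL per instance."""

    def __init__(self, url):
        self.url = url

    def Load(self):
        # A real loader would fetch the page; the stub returns fixed text.
        return [StubDocument(page_content=f"Text of {self.url}",
                             metadata={"url": self.url})]


def load_pages(urls, loader_cls=StubWebLoader):
    """Load several pages by creating one loader per URL."""
    documents = []
    for url in urls:
        docs = loader_cls(url).Load()
        # Each loader returns a list with exactly one Document.
        documents.append(docs[0])
    return documents


pages = load_pages(["https://example.com/a", "https://example.com/b"])
for doc in pages:
    print(doc.metadata, len(doc.page_content))
```

To try this against the real package, pass the actual class as loader_cls, assuming it follows the interface described in this section.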