Installation
To use ChunkDefault, first install the purecpp_chunks_clean Python package.
Initialization
To use ChunkDefault, first create an instance by specifying chunk_size and overlap. These parameters control how the text is split: each chunk is at most chunk_size characters long, and consecutive chunks share overlap characters so that context carries over from one chunk to the next.
| Parameter | Description |
|---|---|
| chunk_size | Maximum size of each chunk, in characters. |
| overlap | Number of characters shared between consecutive chunks. |
Note: overlap must be smaller than chunk_size; otherwise, an error is raised.
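The splitting rule described above can be sketched in plain Python. This is a hypothetical illustration of fixed-size character chunking with overlap, not the purecpp_chunks_clean implementation, and the function name split_text is an assumption. Each window advances by chunk_size - overlap characters, which is why overlap must be smaller than chunk_size: otherwise the step would be zero or negative and the split could never advance.

```python
from typing import List


def split_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Hypothetical helper: split text into windows of at most chunk_size
    characters, where consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if overlap < 0 or overlap >= chunk_size:
        # A step of chunk_size - overlap <= 0 would never move forward.
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # distance each new window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

With chunk_size=4 and overlap=1, each window starts 3 characters after the previous one, so "abcdefghij" becomes "abcd", "defg", "ghij", where each boundary character appears in two adjacent chunks.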
Processing Documents
To split a list of documents into chunks, use the ProcessDocuments method. Each resulting chunk is itself a Document instance.
The max_workers parameter controls the number of concurrent threads used during processing.
ProcessDocuments returns the Document instances containing the chunked text data.
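The processing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's API: the Document dataclass (with text and metadata fields) and the process_documents function are hypothetical stand-ins. The sketch fans the input documents out over a ThreadPoolExecutor sized by max_workers, splits each one with the same overlap rule as above, and flattens the per-document chunks, in input order, into a single list of Document objects that inherit their source document's metadata.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    # Hypothetical stand-in for the package's Document class.
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    # Same overlapping-window rule as the earlier sketch.
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("require 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def process_documents(docs: List[Document], chunk_size: int, overlap: int,
                      max_workers: int = 4) -> List[Document]:
    """Hypothetical: chunk every document using a pool of max_workers threads."""
    def chunk_one(doc: Document) -> List[Document]:
        # Each chunk is a new Document carrying a copy of the source metadata.
        return [Document(piece, dict(doc.metadata))
                for piece in split_text(doc.text, chunk_size, overlap)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so chunk order is stable.
        per_doc = pool.map(chunk_one, docs)
        return [chunk for chunks in per_doc for chunk in chunks]


docs = [Document("abcdefghij", {"source": "a"}), Document("xyz", {"source": "b"})]
for chunk in process_documents(docs, chunk_size=4, overlap=1, max_workers=2):
    print(chunk.metadata["source"], chunk.text)
```

Because of Python's global interpreter lock, threads give little speedup for pure-Python CPU-bound work like this; the sketch only shows how a max_workers setting maps onto a thread pool.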