Installation
To use ChunkCount, you first need to install thepurecpp_chunks_clean Python package:
Initialization
To initialize ChunkCount, define thecount_unit, overlap, and count_threshold.
| Parameter | Description |
|---|---|
count_unit | The element to count before splitting (word, character, regex). |
overlap | Number of characters shared between consecutive chunks. |
count_threshold | Number of times the count_unit must appear before splitting. |
Processing Documents
To process a list of documents and split them into chunks, use theProcessDocuments method. Each resulting chunk will also be an instance of Document.
The max_workers parameter controls the number of concurrent threads used during processing.
Document instances containing the chunked text data.