Chunks
Introduction - Chunks
Chunking modules split large pieces of text into smaller, manageable segments. Overlap between adjacent chunks preserves context across chunk boundaries, which makes these modules essential for Retrieval-Augmented Generation (RAG) pipelines and other text-processing tasks.
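To make the idea of overlap concrete, here is a minimal sketch in plain Python. It is not PureCPP's implementation or API: the function name `chunk_with_overlap` and the parameters `chunk_size` and `overlap` are hypothetical, and the sketch assumes fixed-size, character-based chunks.

```python
def chunk_with_overlap(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into character chunks; adjacent chunks share `overlap` characters.

    Hypothetical helper for illustration only, not part of PureCPP.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is a pure repeat of the previous overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

With `chunk_size=4` and `overlap=2`, each chunk repeats the last two characters of the previous one, so text that straddles a boundary still appears whole in at least one chunk.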
Installation
Before you begin, ensure your environment meets the following requirements:
- Python 3.9, 3.10, or 3.11: PureCPP is compatible with these Python versions.
- Linux/WSL support: The library is fully compatible with Linux-based systems and Windows Subsystem for Linux (WSL).
- pip: Ensure pip is installed and updated to the latest version.
run:
Chunking Modules
The library includes four main chunking modules: ChunkDefault, ChunkCount, ChunkQuery, and ChunkSimilarity.
Module | Description |
---|---|
ChunkDefault | Splits large texts into smaller chunks while maintaining context through overlap. |
ChunkCount | Segments text based on a specific count pattern. |
ChunkQuery | Filters and retrieves chunks most relevant to a given query. |
ChunkSimilarity | Splits and ranks chunks based on their similarity. |
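The query-based retrieval and similarity ranking described in the table can be illustrated with a small, library-independent sketch. It does not use or reproduce the ChunkQuery or ChunkSimilarity APIs: the names `rank_chunks` and `cosine_similarity` are hypothetical, and the bag-of-words cosine score is an assumption chosen for simplicity (production pipelines typically score with embedding vectors instead).

```python
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(chunks: list[str], query: str, top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k (score, chunk) pairs, most similar to the query first.

    Hypothetical helper for illustration only, not part of PureCPP.
    """
    query_vec = _bag_of_words(query)
    scored = [(cosine_similarity(_bag_of_words(c), query_vec), c) for c in chunks]
    # sorted() is stable, so chunks with equal scores keep their original order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


if __name__ == "__main__":
    docs = ["cats chase mice", "dogs chase cats", "stock prices fell"]
    for score, chunk in rank_chunks(docs, "cats and mice", top_k=2):
        print(f"{score:.3f}  {chunk}")
```

Filtering by relevance then amounts to keeping only the highest-scoring chunks, or those above a chosen threshold.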