Embedding
Generate text embeddings using OpenAI’s embedding model.
Installation
To use EmbeddingOpenAI, install the purecpp_embed Python package, for example with pip install purecpp_embed.
Before you begin, ensure your environment meets the following requirements:
- Python 3.9, 3.10, or 3.11: PureCPP supports these Python versions.
- Linux/WSL support: The library is fully compatible with Linux-based systems and Windows Subsystem for Linux (WSL).
- pip: Ensure pip is installed and updated to the latest version.
Introduction
The EmbeddingOpenAI module generates vector embeddings using OpenAI’s embedding models. These embeddings are numerical representations of text, useful for semantic search, document similarity, and information retrieval.
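To make the idea of document similarity concrete, here is a minimal sketch of how two embedding vectors are commonly compared with cosine similarity. It is not part of the library; the function name, the tiny three-dimensional vectors, and the document names are made up for illustration (real embeddings have many more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real embeddings.
query = [0.1, 0.3, 0.5]
candidates = {
    "doc_a": [0.1, 0.29, 0.52],
    "doc_b": [-0.4, 0.1, 0.0],
}

# Pick the candidate whose vector points in the direction closest to the query.
best = max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))
```

A score near 1 means the two texts point in nearly the same direction in embedding space, which is the basis for ranking results in semantic search.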
Initialization
To use EmbeddingOpenAI, set your OpenAI API key as an environment variable through os.environ.
Example:
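A minimal sketch of this step follows. It assumes the key is read from the OPENAI_API_KEY variable, which is the name OpenAI's own client uses; the page does not state the variable name. The helper function and the placeholder key are illustrative only.

```python
import os

# Assumption: the key is read from OPENAI_API_KEY (the name used by OpenAI's
# own client). The source page does not name the variable.
API_KEY_VAR = "OPENAI_API_KEY"


def configure_api_key(key: str) -> None:
    """Store the API key in this process's environment."""
    if not key:
        raise ValueError("API key must be a non-empty string")
    os.environ[API_KEY_VAR] = key


# Placeholder value; replace it with your real key.
configure_api_key("sk-your-key-here")
```

In practice it is usually safer to export the variable in your shell or load it from a secrets store than to hard-code the key in source files.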
Generating Embeddings
The EmbeddingOpenAI class provides a method GenerateEmbeddings that expects two arguments:
- items: A list of Documents (objects with page_content and optional metadata).
- model: A string representing the OpenAI embedding model you wish to use (e.g., "text-embedding-ada-002").
It returns a list of Documents of the same length, preserving the metadata and page_content, but with an additional field embedding containing the embedding vector (a list of floats).
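The input and output contract described above can be checked with a small helper. This is only a sketch: the Document dataclass below is a hypothetical stand-in for the library's own Document type, check_embedding_output is not a library function, and the sample output is hand-written rather than produced by GenerateEmbeddings. The check also assumes that an embedding vector is never empty.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document type."""
    page_content: str
    metadata: dict = field(default_factory=dict)
    embedding: Optional[list] = None


def check_embedding_output(items, embedded):
    """Raise ValueError if `embedded` breaks the documented contract."""
    if len(items) != len(embedded):
        raise ValueError(f"expected {len(items)} documents, got {len(embedded)}")
    for src, out in zip(items, embedded):
        # page_content and metadata must come back unchanged.
        if out.page_content != src.page_content or out.metadata != src.metadata:
            raise ValueError("page_content or metadata was not preserved")
        vec = out.embedding
        # The embedding must be a list of floats (assumed non-empty here).
        if not isinstance(vec, list) or not vec:
            raise ValueError("embedding must be a non-empty list")
        if not all(isinstance(x, float) for x in vec):
            raise ValueError("embedding must contain only floats")


docs = [Document("hello world", {"source": "a.txt"})]
# Hand-written output in the documented shape, used in place of a real API call.
fake_output = [Document("hello world", {"source": "a.txt"}, [0.1, -0.2, 0.3])]
check_embedding_output(docs, fake_output)
```

A check like this can be run on the result of a real call before the vectors are stored in an index.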
Example using a text loader and chunk
For more details on how to chunk documents, refer to the Chunk documentation.