Vector Embeddings
crul can be used to generate vector embeddings, load vector embeddings into a vector database such as Pinecone, and semantically query a vector database. The three commands that support this functionality are:
- The vectorize command, which transforms crul results into vector embeddings using the OpenAI API.
- The vectorload command, which loads vectors into a vector database such as Pinecone.
- The vectorquery command, which queries a vector database such as Pinecone.
Note: These vector-related commands require authentication.
- For the vectorize command, you'll need to configure an openai credential containing your OpenAI API key, with the name openai.
- For the vectorload and vectorquery commands, you'll need to configure a pinecone credential containing your Pinecone API key, with the name pinecone.
How it works​
The two main commands to understand are the vectorize command and the vectorquery command.
The vectorize command can take any crul results, whether from an API, webpage, cellar file, or other source, and transform them into vector embeddings using the OpenAI embeddings endpoint. From here, you can use the api command to push the vector embeddings to a vector database or, if Pinecone is your vector database, simply use the vectorload command.
The vectorquery command can be used to semantically query an existing Pinecone vector database. This can be a database whose vector embeddings were loaded using crul and the api/vectorload commands, or an existing vector database that is already populated with vectors.
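To make the idea of a semantic query concrete, here is a minimal Python sketch of the vectorize, load, and query flow. It is not how crul, OpenAI, or Pinecone are implemented: toy_embed is a hypothetical word-count stand-in for a real embeddings endpoint, the "database" is an in-memory list, and the sample headlines are made up. It only illustrates that a query is embedded the same way as the stored rows and then ranked by vector similarity.

```python
import math


def build_vocab(texts):
    # Map every distinct lowercase word in the corpus to a vector position.
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}


def toy_embed(text, vocab):
    # Hypothetical stand-in for an embeddings API: word counts, scaled to unit length.
    vec = [0.0] * len(vocab)
    for w in text.lower().split():
        if w in vocab:
            vec[vocab[w]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    # Both vectors are unit length (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def vector_query(index, query, vocab, top_k=1):
    # Embed the query, then rank stored records by similarity to it.
    q = toy_embed(query, vocab)
    ranked = sorted(index, key=lambda rec: cosine(q, rec["values"]), reverse=True)
    return ranked[:top_k]


# Made-up sample rows standing in for crul results.
headlines = [
    "California wildfire season starts early",
    "New Rust compiler release improves build times",
    "Show HN: a tiny key-value store",
]
vocab = build_vocab(headlines)

# "vectorize" each row, then "load" the records into an in-memory index.
index = [
    {"id": str(i), "values": toy_embed(h, vocab), "metadata": {"headline": h}}
    for i, h in enumerate(headlines)
]

best = vector_query(index, "Headlines relating to California", vocab)[0]
print(best["metadata"]["headline"])
```

Because this toy embedding only counts shared words, the match here comes from the word "california". A learned embedding model can also rank rows highly when they are related in meaning without sharing any words, which is what makes the query semantic.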
Need support for another vector database? Let us know!
Let's take a look at some examples.
Examples​
vectorize only
Query​
devices
|| vectorize name
vectorquery only
Query​
vectorquery "Headlines relating to California" --pinecone.index "{INDEX}.pinecone.io"
vectorize and vectorload
Query​
devices
|| vectorize name
|| vectorload --pinecone.index "{INDEX}.pinecone.io"
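If you push embeddings yourself with the api command rather than vectorload, the request body has to match what the target database expects. The Python sketch below pairs each row with its vector using the id/values/metadata record shape that Pinecone's upsert endpoint accepts, as we understand it; confirm the field names against the Pinecone API reference before relying on them. How vectorload builds its request internally is not covered here, and build_upsert_payload, the id scheme, the sample rows, and the short vectors are all hypothetical.

```python
import json


def build_upsert_payload(rows, embeddings, text_field="name", namespace=""):
    """Pair each row with its embedding in an id/values/metadata record (assumed shape)."""
    if len(rows) != len(embeddings):
        raise ValueError("expected exactly one embedding per row")
    vectors = [
        {
            "id": str(i),  # hypothetical id scheme: the row's position
            "values": [float(x) for x in vec],
            "metadata": {text_field: row[text_field]},  # keep the source text for display
        }
        for i, (row, vec) in enumerate(zip(rows, embeddings))
    ]
    return {"vectors": vectors, "namespace": namespace}


# Hypothetical rows and 3-dimensional vectors; real embeddings are much longer.
rows = [{"name": "device-a"}, {"name": "device-b"}]
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

payload = build_upsert_payload(rows, embeddings)
print(json.dumps(payload, indent=2))
```

Keeping the original text in metadata means a later query can return readable results instead of bare vector ids.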
vectorize, vectorload and vectorquery
Query​
This example demonstrates all three commands together. We first use the open command to get back a list of headlines, then vectorize the results, vectorload them into a Pinecone vector database index, and finally vectorquery that index with a semantic search for "Headlines relating to California".
open https://news.ycombinator.com/news
|| filter "(nodeName == 'A' and parentElement.attributes.class == 'titleline')"
|| rename innerText headline
|| vectorize headline
|| vectorload --pinecone.index "{INDEX}.pinecone.io"
|| vectorquery "Headlines relating to California" --pinecone.index "{INDEX}.pinecone.io"