# Argilla
Argilla is an open-source data curation platform for LLMs. Using Argilla, everyone can build robust language models through faster data curation using both human and machine feedback. We provide support for each step in the MLOps cycle, from data labeling to model monitoring.
In this guide we will demonstrate how to track the inputs and responses of your LLM to generate a dataset in Argilla, using the `ArgillaCallbackHandler`.

It's useful to keep track of the inputs and outputs of your LLMs to generate datasets for future fine-tuning. This is especially useful when you're using an LLM to generate data for a specific task, such as question answering, summarization, or translation.
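Before wiring up Argilla itself, the core idea can be sketched in plain Python: a callback object that records each prompt/response pair as the LLM produces it. The `PromptResponseLogger` class below is purely illustrative and not part of the Argilla or LangChain APIs; it just mirrors the `prompt` and `response` fields of the dataset created later in this guide.

```python
# Illustrative only: a minimal stand-in for what a callback handler does.
# Each completed LLM call is appended as a prompt/response record, the same
# shape as the "prompt" and "response" fields of the Argilla dataset.
class PromptResponseLogger:
    def __init__(self):
        self.records = []

    def on_llm_end(self, prompt: str, response: str) -> None:
        # Store one record per LLM call for later human annotation.
        self.records.append({"prompt": prompt, "response": response})


logger = PromptResponseLogger()
logger.on_llm_end("Tell me a joke", "Why did the chicken cross the road?")
print(len(logger.records))  # → 1
```

`ArgillaCallbackHandler` plays exactly this role inside LangChain's callback system, except that the records are pushed to an Argilla `FeedbackDataset` instead of a local list.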
## Installation and Setup

```
%pip install --upgrade --quiet langchain langchain-openai argilla
```
## Getting API Credentials

To get the Argilla API credentials, follow these steps:
- Go to your Argilla UI.
- Click on your profile picture and go to "My settings".
- Then copy the API Key.
In Argilla, the API URL will be the same as the URL of your Argilla UI.

To get the OpenAI API credentials, please visit https://platform.openai.com/account/api-keys.
```python
import os

os.environ["ARGILLA_API_URL"] = "..."
os.environ["ARGILLA_API_KEY"] = "..."
os.environ["OPENAI_API_KEY"] = "..."
```
## Setup Argilla

To use the `ArgillaCallbackHandler` we will need to create a new `FeedbackDataset` in Argilla to keep track of your LLM experiments. To do so, please use the following code:
```python
import argilla as rg
from packaging.version import parse as parse_version

if parse_version(rg.__version__) < parse_version("1.8.0"):
    raise RuntimeError(
        "`FeedbackDataset` is only available in Argilla v1.8.0 or higher, please "
        "upgrade `argilla` as `pip install argilla --upgrade`."
    )
```
```python
dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="response"),
    ],
    questions=[
        rg.RatingQuestion(
            name="response-rating",
            description="How would you rate the quality of the response?",
            values=[1, 2, 3, 4, 5],
            required=True,
        ),
        rg.TextQuestion(
            name="response-feedback",
            description="What feedback do you have for the response?",
            required=False,
        ),
    ],
    guidelines="You're asked to rate the quality of the response and provide feedback.",
)
```
Finally, connect to your Argilla instance and push the dataset so it becomes available for annotation:

```python
rg.init(
    api_url=os.environ["ARGILLA_API_URL"],
    api_key=os.environ["ARGILLA_API_KEY"],
)

dataset.push_to_argilla("langchain-dataset")
```
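With the dataset in place, the handler can be attached to an LLM so that every call is logged automatically. The snippet below is a sketch of that wiring: it assumes the environment variables set earlier, a reachable Argilla instance, and the `"langchain-dataset"` created above, so the imports and construction are deferred into a function rather than executed on import.

```python
def track_llm_with_argilla() -> None:
    """Sketch: attach ArgillaCallbackHandler to an OpenAI LLM.

    Assumes ARGILLA_API_URL, ARGILLA_API_KEY and OPENAI_API_KEY are set,
    and that the "langchain-dataset" FeedbackDataset pushed above exists.
    Imports are deferred so the sketch can be defined without live credentials.
    """
    import os

    from langchain_community.callbacks import ArgillaCallbackHandler
    from langchain_openai import OpenAI

    argilla_callback = ArgillaCallbackHandler(
        dataset_name="langchain-dataset",
        api_url=os.environ["ARGILLA_API_URL"],
        api_key=os.environ["ARGILLA_API_KEY"],
    )

    # Every generation made by this LLM is logged to the FeedbackDataset
    # as a prompt/response record, ready for human rating and feedback.
    llm = OpenAI(temperature=0.9, callbacks=[argilla_callback])
    llm.invoke("Tell me a joke")
```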