Diving deep with Prodigy recipes

When using Prodigy, we can either use one of the built-in recipes or write a custom recipe from scratch.

A recipe is a simple Python function that returns a dictionary of its components. The arguments of the recipe function will become available from the command line and let you pass in parameters like the dataset ID, the text source and other settings. Recipes can receive a name and a variable number of argument annotations, following the Plac syntax.

import prodigy

@prodigy.recipe(
    "my-custom-recipe",
    dataset=("Dataset to save answers to", "positional", None, str),
    view_id=("Annotation interface", "option", "v", str),
)
def my_custom_recipe(dataset, view_id="text"):
    # Load your own streams from anywhere you want
    stream = load_my_custom_stream()

    def update(examples):
        # This function is triggered when Prodigy receives annotations
        print(f"Received {len(examples)} annotations!")

    return {
        "dataset": dataset,
        "view_id": view_id,
        "stream": stream,
        "update": update,
    }
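To give a feel for what an update callback can do beyond printing, here is a minimal sketch that keeps a running tally of answers. It assumes each answered example carries an "answer" key with a value such as "accept", "reject" or "ignore"; the make_update helper and the totals counter are names I made up for illustration and are not part of Prodigy.

```python
from collections import Counter


def make_update():
    """Build an update callback plus the counter it writes to (hypothetical helper)."""
    totals = Counter()

    def update(examples):
        # Assumption: every answered example has an "answer" key.
        totals.update(eg.get("answer", "unknown") for eg in examples)
        print(f"Received {len(examples)} annotations, totals so far: {dict(totals)}")

    return update, totals


update, totals = make_update()
update([{"text": "a", "answer": "accept"}, {"text": "b", "answer": "reject"}])
update([{"text": "c", "answer": "accept"}])
```

In a recipe you would return the update function under the "update" key, exactly like the simpler callback in the recipe shown earlier.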

Custom recipes can be used from the command line just like the built-in recipes. All you need to do is point the -F option to the Python file containing your recipe.

prodigy my-custom-recipe my_dataset --view-id text -F recipe.py
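If the annotation tuples feel abstract, it may help to see roughly what command-line interface they describe. The sketch below is only an analogy written with the standard library's argparse. It is not how Prodigy parses arguments internally, and build_parser is a hypothetical name. It mirrors the two annotations from the recipe: a positional dataset argument and a --view-id option with the short form -v.

```python
import argparse


def build_parser():
    """Approximate the recipe's argument annotations with argparse (analogy only)."""
    parser = argparse.ArgumentParser(prog="my-custom-recipe")
    # ("Dataset to save answers to", "positional", None, str)
    parser.add_argument("dataset", type=str, help="Dataset to save answers to")
    # ("Annotation interface", "option", "v", str), default taken from the function signature
    parser.add_argument("-v", "--view-id", dest="view_id", type=str,
                        default="text", help="Annotation interface")
    return parser


args = build_parser().parse_args(["my_dataset", "--view-id", "text"])
print(args.dataset, args.view_id)
```

Each tuple reads as (help text, kind of argument, abbreviation, type), which is the Plac convention mentioned above.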

Files can contain multiple recipes, so you can group them however you like. The argument annotations and the recipe function’s docstring are also displayed when you use the --help flag on the command line.

Components

The components returned by the recipe need to include an iterable stream, a view_id and a dataset (if you want to use the storage to save the annotations to the database). The following components can be defined by a recipe:

Component (type): description

- dataset (str): ID of the current project. Used to associate the annotation with a project in the database.

- view_id (str): Annotation interface to use.

- stream (iterable): Stream of annotation tasks in Prodigy’s JSON format.

- update (callable): Function invoked when Prodigy receives annotations. Can be used to update a model.

- db (no fixed type): Storage ID, True for the default database, False for no database, or a custom database class.

- progress (callable): Function that takes two arguments (as of v1.10): the controller and an update_return_value (the return value of the update callback, if the recipe provides one). It returns a progress value (float).

- on_load (callable): Function that is executed when Prodigy is started. Can be used to update a model with existing annotations.

- on_exit (callable): Function that is executed when the user exits Prodigy. Can be used to save a model’s state to disk or export other data.

- before_db (callable, new in v1.10): Function that is called on examples before they’re placed in the database and can be used to strip out base64 data etc. Use cautiously, as a bug in your code here could lead to data loss.

- validate_answer (callable, new in v1.10): Function that’s called on each answer when it’s submitted in the UI and can raise validation errors.

- get_session_id (callable): Function that returns a custom session ID. If not set, a timestamp is used.

- exclude (list): List of dataset IDs whose annotations to exclude.

- config (dict): Recipe-specific configuration. Can be overwritten by the global and project config.
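Two of the less familiar components, validate_answer and before_db, are easier to picture with a small example. The sketch below rests on several assumptions of mine: that selected options arrive in an "accept" list, that the decision is stored under "answer", that image tasks keep base64 data in "image" and the original file location in "path", and that any raised exception is surfaced as a validation error. Treat it as an illustration rather than the definitive way to write these callbacks.

```python
def validate_answer(eg):
    """Reject an accepted task that has no option selected (assumed task layout)."""
    selected = eg.get("accept", [])
    if eg.get("answer") == "accept" and not selected:
        # Assumption: raising an exception is how a validation error is reported.
        raise ValueError("Select at least one option before accepting.")


def before_db(examples):
    """Swap inline base64 image data for the file path before saving."""
    for eg in examples:
        image = eg.get("image")
        if isinstance(image, str) and image.startswith("data:"):
            # Assumption: the original file location is kept under "path".
            eg["image"] = eg.get("path")
    return examples


cleaned = before_db([
    {"image": "data:image/png;base64,AAAA", "path": "cat.png"},
    {"text": "no image here"},
])
print(cleaned)
```

Both functions would be returned from the recipe under the "validate_answer" and "before_db" keys. Because before_db changes what gets stored, it is worth testing on a throwaway dataset first.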

As an example, I will walk through the customer feedback sentiment use case from the Prodigy website.

Let’s say you’ve extracted examples of customer feedback and support emails, and you want to classify those by sentiment in the categories “happy”, “sad”, “angry” or “neutral”. The result could be used to get some insights into your customers’ overall satisfaction, or to provide statistics for your yearly report. Maybe you also want to experiment with training a model to predict the tone of incoming customer emails, so you can make sure that unhappy customers or critical situations can receive priority support. Your data could look something like this:

feedback.jsonl
{"text": "Thanks for your great work – really made my day!"}
{"text": "Worst experience ever, never ordering from here again."}
{"text": "My order arrived last Tuesday."}
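A file like this can be turned into a stream with a few lines of standard-library Python. Prodigy ships its own loaders, so the read_jsonl function below is just a hypothetical stand-in that keeps the sketch self-contained. It accepts any iterable of lines, for example an open file handle.

```python
import json


def read_jsonl(lines):
    """Yield one task dict per non-empty JSONL line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


sample_lines = [
    '{"text": "Worst experience ever, never ordering from here again."}',
    "",
    '{"text": "My order arrived last Tuesday."}',
]
tasks = list(read_jsonl(sample_lines))
print(len(tasks), tasks[0]["text"])
```

With a real file you could write `with open("feedback.jsonl", encoding="utf8") as f:` and pass f to read_jsonl. Since it is a generator, records are only read as they are requested.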

Prodigy comes with a built-in choice interface that lets you render a task with a set of single-choice or multiple-choice options.

The combination of the options and Prodigy’s ACCEPT, REJECT and IGNORE actions provides a powerful and intuitive annotation system. If none of the options apply, you can simply ignore the task. You can also select an option and reject the task — for instance, to generate negative examples for training a model. To use a multiple-choice interface, you can set "choice_style": "multiple" in the config settings returned by your recipe. "choice_auto_accept": true will automatically accept selected answers in single-choice mode, so you won’t have to hit ACCEPT in addition to selecting the option.
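To connect this back to the recipe components, here is a minimal sketch of how the sentiment task could be assembled. It assumes the choice interface reads its options from an "options" list of dicts with "id" and "text" keys and is selected with the view_id "choice". The names OPTIONS, add_options and sentiment_components are my own, and I left out the @prodigy.recipe decorator so the snippet runs without Prodigy installed.

```python
OPTIONS = [
    {"id": "happy", "text": "happy"},
    {"id": "sad", "text": "sad"},
    {"id": "angry", "text": "angry"},
    {"id": "neutral", "text": "neutral"},
]


def add_options(stream):
    """Attach the four sentiment options to every incoming task."""
    for task in stream:
        task = dict(task)  # copy so the input records stay untouched
        task["options"] = [dict(opt) for opt in OPTIONS]
        yield task


def sentiment_components(dataset, stream):
    """Return the components dict a recipe function would return."""
    return {
        "dataset": dataset,
        "view_id": "choice",
        "stream": add_options(stream),
        # Python booleans here; the JSON config equivalent is true/false.
        "config": {"choice_style": "single", "choice_auto_accept": True},
    }


components = sentiment_components(
    "sentiment", [{"text": "My order arrived last Tuesday."}]
)
first_task = next(components["stream"])
print(first_task["options"][0])
```

Switching "choice_style" to "multiple" would let annotators pick several categories per text, in which case "choice_auto_accept" no longer applies.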

I found Prodigy super helpful for my text classification projects and will be happy to help you with any questions you may have.

Data Science student @Flatiron-School