Diving deeper into Prodigy

Kristinelpetrosyan
2 min read · Apr 12, 2021

Having installed the Prodigy tool is not enough; the next step is to know how and when to use it. The first thing I tried was manual NER: mark entity spans in a text by highlighting them and selecting the respective labels. The model is used to tokenize the text to allow less sensitive highlighting, since the token boundaries are used to set the entity spans. The label set can be defined as a comma-separated list on the command line or as a path to a text file with one label per line. If no labels are specified, Prodigy will check if labels are present in the model.
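To make the "less sensitive highlighting" idea concrete, here is a minimal sketch of how a rough character selection can be snapped to token boundaries. It is an illustration only, not Prodigy's implementation: the whitespace tokenizer and the function names tokenize and snap_to_tokens are assumptions made for this example, whereas Prodigy relies on the spaCy tokenizer.

```python
import re


def tokenize(text):
    """Return (start, end) character offsets for each whitespace-delimited token.

    Assumption: a plain whitespace split stands in for the spaCy tokenizer.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def snap_to_tokens(text, sel_start, sel_end):
    """Expand a rough character selection so it covers every token it overlaps."""
    touched = [(s, e) for s, e in tokenize(text) if s < sel_end and e > sel_start]
    if not touched:
        return None  # the selection covered whitespace only
    return touched[0][0], touched[-1][1]


text = "The moon landing hoax resurfaced online"

# A sloppy highlight that starts inside "moon" and ends inside "hoax".
start, end = snap_to_tokens(text, 6, 19)
print(text[start:end])  # moon landing hoax
```

Because the span always ends up on token boundaries, you do not have to hit the exact first and last character when highlighting.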

$ prodigy ner.manual hoax_conspiracy blank:en ./conspiracy_3000.jsonl --label HOAX --patterns ./hoax_patterns.jsonl

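We use the ner.manual recipe to collect the annotations we will later train from. It requires us to specify a dataset to save the annotations to (hoax_conspiracy), a language model (blank:en), the source text we will annotate (./conspiracy_3000.jsonl), the entity label (HOAX), and a patterns file with the match patterns we've already collected (./hoax_patterns.jsonl). Since we are creating a new model, we pass the recipe a blank English model.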
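For readers who have not prepared these inputs before, the following sketch shows the general shape of the two JSONL files passed to the command: a source file with one {"text": ...} record per line, and a patterns file with one {"label": ..., "pattern": ...} record per line, where the pattern is either a list of token attributes or an exact string. The sample sentences, the sample patterns, and the file names are made up for illustration; they are not the contents of my actual files.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical source texts: each line needs at least a "text" key.
texts = [
    {"text": "They say the moon landing was a hoax."},
    {"text": "The weather was fine yesterday."},
]

# Hypothetical patterns: a token-based pattern and an exact-string pattern.
patterns = [
    {"label": "HOAX", "pattern": [{"lower": "moon"}, {"lower": "landing"}]},
    {"label": "HOAX", "pattern": "chemtrails"},
]

# Written to a temporary directory so nothing real is overwritten.
workdir = Path(tempfile.mkdtemp())
source_path = workdir / "sample_texts.jsonl"
patterns_path = workdir / "sample_patterns.jsonl"

write_jsonl(source_path, texts)
write_jsonl(patterns_path, patterns)

print(source_path.read_text(encoding="utf-8"))
print(patterns_path.read_text(encoding="utf-8"))
```

Files shaped like these would take the place of ./conspiracy_3000.jsonl and ./hoax_patterns.jsonl in the command above.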

After executing the above command, you can use your web browser to annotate interactively. Once again we are working in the browser tool, except now instead of giving us words to choose from, Prodigy pulls a line of text from our source file and pre-highlights the spans that match our patterns as suggested entities. If a suggestion is wrong, we correct the mistake before accepting the example. Note that ner.manual does not update a model while you annotate; recipes such as ner.teach put a model in the loop, so that its predictions can get better and better as you go. After annotating in the browser, Prodigy saves everything you've done to the dataset you've specified.

Typically I try to train on about 500 text examples before calculating model metrics. While training this model, I noticed that spaCy was not improving in its ability to predict entities; its performance was pretty flat. At ~300 examples I decided to stop and run some quick metrics to see if continuing was worth my time.
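As an illustration of what "quick metrics" can look like, here is a minimal sketch that computes precision, recall, and F1 over exact-match entity spans. It is a generic calculation, not necessarily the one I ran: the function name span_prf, the (start, end, label) span representation, and the sample spans are all assumptions made for the example.

```python
def span_prf(gold, predicted):
    """Compute exact-match precision, recall and F1 over entity spans.

    gold and predicted are parallel lists with one entry per text; each entry
    is a collection of (start, end, label) tuples.
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, predicted):
        gold_spans, pred_spans = set(gold_spans), set(pred_spans)
        tp += len(gold_spans & pred_spans)   # predicted and correct
        fp += len(pred_spans - gold_spans)   # predicted but not annotated
        fn += len(gold_spans - pred_spans)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up spans for three texts.
gold = [{(12, 24, "HOAX")}, set(), {(0, 10, "HOAX"), (20, 28, "HOAX")}]
predicted = [{(12, 24, "HOAX")}, {(4, 9, "HOAX")}, {(0, 10, "HOAX")}]

precision, recall, f1 = span_prf(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.67 recall=0.67 f1=0.67
```

Tracking these numbers every hundred examples or so is one way to see whether performance is really flat or still moving.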

This is only a small portion of what we can do using Prodigy. I will explore other Prodigy recipes in my next article.

Thank you for reading, and I hope this was helpful.

Written by Kristinelpetrosyan

Data Science student @Flatiron-School