The IMAGE2TEXT Starter Pack is designed for any newsroom interested in using AI to describe images. It is intended for media organisations of all sizes that aren’t already familiar with this kind of technology and might assume it is out of their reach.
The IMAGE2TEXT API provides access to our model, initially trained to recognise at least 30 women politicians in our countries. We are building it in a way that allows others to contribute: a newsroom in another country can train it to recognise its own politicians (or cultural figures, celebrities, athletes, etc.) and share that knowledge. The main objective is to have numerous newsrooms from different places creating and sharing knowledge.
Facilitates newsroom access to and search of multimedia content through accurate identification and tagging
Generates rich metadata useful for optimising search engine strategies whilst improving accessibility for visually impaired people
Learns from a diverse training dataset that includes varied categories and backgrounds, supporting inclusion and representation
Collect at least 5 images per subject of interest using a public image dataset
Rename files for proper identification
Annotate and label the images
Prepare the input image data in a JSON file
Test images with a pre-trained Convolutional Neural Network (CNN) model, using the preprocessed images and the JSON file
Train the CNN output with a pre-trained Recurrent Neural Network (RNN) model for accuracy and precision
Use the RNN output layer for analysis and validation
Deploy REST API with Amazon API Gateway
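The first data-preparation steps above (collect images per subject, rename them for identification, annotate, and serialise to JSON) can be sketched roughly as follows. This is a minimal illustration: the directory layout, field names, and the `build_manifest` helper are assumptions for the example, not the project’s actual schema.

```python
import json
from pathlib import Path

def build_manifest(image_dir: str, label: str) -> list[dict]:
    """Collect the images for one subject, assign each a standardised
    filename, and return annotation records ready to serialise as JSON.
    (Illustrative sketch: field names are assumed, not the real schema.)"""
    records = []
    for i, path in enumerate(sorted(Path(image_dir).glob("*.jpg")), start=1):
        new_name = f"{label}_{i:03d}.jpg"  # e.g. boric_001.jpg
        records.append({
            "file": new_name,       # standardised name for identification
            "label": label,         # subject of interest (e.g. a politician)
            "original": path.name,  # source filename, kept for traceability
        })
    return records

# Serialising the manifest gives the JSON input file for the CNN stage:
# json.dump(build_manifest("images/boric", "boric"), open("input.json", "w"))
```

In this sketch, at least five images per subject would sit in one folder per label; the resulting JSON manifest is then the single input artefact the rest of the pipeline consumes.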
Although mostly accurate, this description wasn’t relevant for journalistic purposes. There were other elements in the images that could be more useful: journalists holding their telephones, microphones, and stands, which altogether could have been interpreted by the model as a press conference. If a journalist later typed “Gabriel Boric, press conference” into a search engine and got that video as a result, that would indeed be helpful.
On the other hand, the AI-based tools available were not created for the specific context of our countries. When we ran our first trials, none of the presidents and prominent political figures of our countries were recognised (unlike, for example, the president of the United States). The story behind these results is a broader one, about the way AI tools are being built: the datasets used to train the models lack diversity and representation from the Larger World. They are biased toward the regions where most AI development is being done.
So, although using AI to describe images is what got us to collaborate in the first place, we discovered along the way that we had other motivations in common: to contribute to a more diverse AI ecosystem by having more people from diverse regions, genders, and cultures building datasets, training models, and developing AI products.