The IMAGE2TEXT Starter Pack is designed for any newsroom that is interested in using AI to describe images. It is intended for media organisations of all sizes that aren't already familiar with this kind of technology and might assume it is out of their reach.


The IMAGE2TEXT API provides access to our model, initially trained to recognise at least 30 women politicians from our countries. We are building it in a way that allows others to contribute: if a newsroom in another country trains it to recognise its own politicians (or cultural figures, celebrities, athletes, etc.), it can share that knowledge. The main objective is to have numerous newsrooms from different places creating and sharing knowledge.


Identifies, classifies and labels content in video and image files for newsrooms, whilst promoting better data governance by including contexts from the Larger World.

We seek to standardize how computer vision models are built and trained to identify people in images, to include ethical guidelines from its conception and representation through diverse use cases.

I2T Benefits

Facilitates access to and search of multimedia content for newsrooms through accurate identification and tagging

Generates rich metadata useful for optimising search engine strategies whilst improving accessibility for visually impaired people

Learns from a diverse training dataset and includes diverse categories and backgrounds for inclusion and representation

How I2T Works

Data preprocessing

Collect at least 5 images per subject of interest using a public image dataset

Rename files for proper identification

Annotate and label images (image annotation)

Prepare input image data in JSON file
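The preprocessing steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's actual tooling: the `<subject>_<n>.jpg` naming scheme and the function names are assumptions made for the example.

```python
import json
from pathlib import Path

def build_annotations(image_dir, subject, min_images=5):
    """Collect images for one subject and emit simple annotation records.

    Assumes files have already been renamed to '<subject>_<n>.jpg'
    (a hypothetical naming scheme chosen for this sketch).
    """
    images = sorted(Path(image_dir).glob(f"{subject}_*.jpg"))
    if len(images) < min_images:
        raise ValueError(
            f"need at least {min_images} images for {subject}, found {len(images)}"
        )
    return [{"file": img.name, "label": subject} for img in images]

def write_annotation_file(records, out_path="annotations.json"):
    """Write the combined annotation file that the training step consumes."""
    Path(out_path).write_text(json.dumps(records, indent=2))
```

In a real pipeline the records would also carry bounding boxes or other annotation details; here each record only pairs a file with its label to show the JSON shape.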

Testing & training

Run standard image testing with a pre-trained Convolutional Neural Network (CNN) model, using the preprocessed images and JSON file

Train a pre-trained Recurrent Neural Network (RNN) model on the CNN output for accuracy and precision

Use the RNN output layer for analysis and validation
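The encoder-decoder flow described above can be sketched with stand-in functions. The project uses pre-trained CNN and RNN models; these stubs only show how the pieces connect, and every function name here is hypothetical.

```python
def cnn_encode(image_pixels):
    """Stand-in for a pre-trained CNN: maps an image to a feature vector."""
    # A real encoder (e.g. a pre-trained CNN backbone) would return learned
    # features; this placeholder just averages the pixel values.
    return [sum(image_pixels) / len(image_pixels)]

def rnn_decode(features, known_labels):
    """Stand-in for the RNN head: scores candidate labels from features."""
    # A real decoder would compute a score per label from the features;
    # this placeholder returns every known label with a dummy score.
    return [(label, 1.0) for label in known_labels]

def tag_image(image_pixels, known_labels, threshold=0.5):
    """End-to-end pass: encode the image, decode labels, keep confident ones."""
    features = cnn_encode(image_pixels)
    return [
        label
        for label, score in rnn_decode(features, known_labels)
        if score >= threshold
    ]
```

The point of the sketch is the interface: the CNN output becomes the RNN input, and the RNN output layer is what gets validated against the annotated labels.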

Deployment

Deploy REST API with Amazon API Gateway
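A common way to deploy such an API behind Amazon API Gateway is a Lambda function using the proxy-integration response format. This is a minimal sketch under assumed request/response shapes (the `image_url` field and the placeholder tag are inventions for the example, not the real I2T contract).

```python
import json

def lambda_handler(event, context):
    """Minimal AWS Lambda handler behind Amazon API Gateway (proxy integration).

    Assumes a POST body like {"image_url": "..."}; the tagging call is a
    placeholder, since the real model inference is out of scope here.
    """
    body = json.loads(event.get("body") or "{}")
    image_url = body.get("image_url")
    if not image_url:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "image_url is required"}),
        }
    tags = ["placeholder-tag"]  # a real deployment would call the model here
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"image_url": image_url, "tags": tags}),
    }
```

API Gateway turns the incoming HTTP request into the `event` dict and maps the returned `statusCode`, `headers`, and `body` back onto the HTTP response.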

I2T Challenges

Describing images is the next step; first, we had to focus on getting the labels right.

At this stage, building a cloud-agnostic API would have increased development costs and complexity.

Data preprocessing requires many hours of manual work.

Our Journey

Local newsrooms may be key to a more diverse AI ecosystem.

In June 2022, we – a team of journalists and technologists from Argentina, Paraguay and the Philippines – decided to collaborate in the JournalismAI Fellowship to develop a product that uses AI to describe images produced at our newsrooms. We seek to process photos, videos and infographics to automatically get tags – such as names of people – in order to categorise, distribute, and archive material more efficiently. Right from the outset, we encountered a challenge and an opportunity: most computer-vision models available in the market are not trained for our specific contexts.

On the one hand, the tools available were not created for the context of journalism. For example, we ran some trials with a video that showed the recently appointed president of Chile in his first official meeting with the president of Argentina. The AI-based tool that we used was somewhat successful in describing the video, but the resulting tags were incomplete and out of context: “suit” (their outfit), “red carpet” (there was in fact one), “men”, and “first date” — let’s say this algorithm is sort of a romantic, and probably interpreted the elegance of the situation as a date.

Although mostly accurate, this description wasn’t relevant for journalistic purposes. There were other elements in the images that could be more useful: journalists holding their telephones, microphones, and stands, which altogether could have been interpreted by the model as a press conference. If, later on, a journalist typed “Gabriel Boric, press conference” into a search engine and got that video as a result, that would indeed be helpful.

On the other hand, the AI-based tools available were not created for the specific context of our countries. When we ran our first trials, none of the presidents and prominent political figures of our countries were recognised (unlike, for example, the president of the United States). The story behind these results is a broader one, about the way AI tools are being built: the datasets used to train the models lack diversity and representation from the Larger World. They are biased toward the regions where most AI development is being done.

So, even if what got us to collaborate in the first place was to use AI to describe images, we discovered along the way that we had other motivations in common: to contribute to a more diverse AI ecosystem by having more people from diverse regions, genders and cultures building datasets, training models, and developing AI products.

Main Takeaway

AI should be beneficial for everyone, everywhere. Journalists, editors and audiences need to participate in the process of designing databases, models, and tools. For AI to be made accessible to as many people as possible, it must take into account the rich cultural diversity of the world’s population and the particular needs of different societies across the globe.

What's Next for I2T

Launch an interface to open the platform for third-party use, to diversify training data and scale the model to new use cases

Implement an initial pricing strategy
Implement features such as context identification and image description through Natural Language Processing
Simplify infrastructure to facilitate implementation at different levels and promote AI literacy

The Team

Lucila Pinto
Nicolas Russo
Raymund Sarmiento
Jaemark Tordecilla
Sara Campos
This project is part of the 2022 JournalismAI Fellowship Programme. The Fellowship brought together 46 journalists and technologists from across the world to collaboratively explore innovative solutions to improve journalism via the use of AI technologies. You can explore all the Fellowship projects at this link.
The project was developed as a collaboration between El Surti, GMA News and Grupo Octubre. The fellows who contributed to the project are: Jaemark Tordecilla, GMA News; Raymund Sarmiento, GMA News; Lucila Pinto, Grupo Octubre; Nicolas Russo, Grupo Octubre; Sara Campos, El Surti.
JournalismAI is a project of Polis – the journalism think-tank at the London School of Economics and Political Science – and it’s sponsored by the Google News Initiative. If you want to know more about the Fellowship and the other JournalismAI activities, sign up for the newsletter or get in touch with the team via