Accelerating Data-Driven Reporting with Generative AI

Generative AI can help beat reporters leverage data for original stories, pushing the boundaries of what breaking-news reporters can achieve in a fast-paced and time-constrained field. AI-powered chatbots like ChatGPT can be used to identify data patterns, streamline data cleaning, and create on-the-spot interview questions about a dataset, helping time-constrained reporters sift through large volumes of numbers and pursue underreported stories.

While AI tools can expedite data analysis, ChatGPT is ultimately a journalistic tool like any other, and reporters need to learn to catch inaccuracies and must have the final say in how data is used in a story.

Increasing demand for instant news content has pressured journalists into doing more with less. But at times, when the surfeit of data and information threatens to overwhelm the lone beat reporter, many have found themselves doing less with more—less reporting, less analysis, less interpretation, and more regurgitation.

How Filipino reporters use numbers and data in journalism—described by seasoned data journalism expert Philip Meyer as ‘social science in a hurry’—is often constrained by lack of time.

Due to the speed at which the news cycle moves, reporters who break the news are neither incentivised nor expected to come up with original insights from data. Tight deadlines mean that simply lifting a statistic from a government press release often gets the job done. Opportunities to deepen coverage with interesting data buried in government handouts are easily missed.

In many newsrooms, only researchers detached from the daily grind have the luxury of time to scrutinise what enterprise story can be developed from a large dataset. But sadly, beat reporters who have the most access to the sources and stakeholders in their field of coverage are forced to constantly shift their attention to new issues, potentially missing underreported stories.

Not all newsrooms have a research team that can, through numbers, make the news and not break it. That is how we do less with more.

But generative AI offers opportunities for beat reporters like me to take a crack at data-driven reporting. Beyond being used as a tool to paraphrase or grammar-check sentences, AI-powered chatbots like ChatGPT can act as a number cruncher for reporters who cycle through numerous coverages in a single day.

As a reporter for an online news organisation, I find that breaking news takes up more space on my plate than any other journalistic output. Besides education, I cover the Philippine Congress, where numbers are not always part of the biggest stories of the day, unlike in business and economy coverage.

However, I try to set a weekly goal of producing one unique story or explainer where I analyse and visualise data. I have used ChatGPT to act as a more mathematically inclined extension of my brain, with the caveat that I still validate any number it produces.

There are more sophisticated ways of using AI to prepare data for analysis and reportage. But from the perspective of a daily-grind reporter, here are some of the ways ChatGPT has allowed me to save time when I go out of my way to use data in a story:

  • AI can help to identify patterns in a large dataset. In particular, feeding ChatGPT the results of a large survey speeds up pattern recognition. This allows me to go beyond just citing the percentages of responses and more quickly uncover nuanced trends, correlations, and outliers that might otherwise be difficult to spot due to the large volume of data.

    Examples of prompt generation:
    ‘Based on the survey data I have inputted, can you give me a list of bullet points that can be derived from the findings? Identify outliers and interesting correlations, but ensure accuracy.’

    ‘Based on the outliers and correlations you identified, check if the survey supports (argument 1), (argument 2), and explain how.’
  • AI can be a tool for data validation, but accuracy checks are required. ChatGPT can also help me ‘clean’ a dataset. This means ensuring that every entry is uniform, with all names standardised, so that a correct analysis can be done in Google Sheets or Microsoft Excel.

    While there are limits to ChatGPT’s knowledge (only reaching 2021 at the latest), using it to help correct typographical errors and even assign codes is useful. But as written by data journalist Roberto Rocha, ‘It shows great potential, but like with anything critical, it’s not quite ready to be trusted one hundred percent, at least not with the current models.’

    I still go through each entry to ensure accuracy. But it beats spending hours correcting basic formatting mistakes.
  • AI can create instant interview questions from a dataset. Besides creating story leads from data, ChatGPT can also help generate questions that one can ask sources knowledgeable about the data.
    As written by an award-winning group of data journalists at the Online Journalism Blog, ‘Sometimes the data journalism part only takes up 10 minutes and the rest of your time is tracking down case studies, or experts. Know when to put the numbers down and pick up the phone.’

    For many beat reporters, new or crucial data is sent in the middle of a press briefing or conference. How does one analyse the dataset while transcribing officials’ answers and thinking of follow-up questions at the same time? It seems like a nightmare. Questions presented to an official after a press briefing are not always answered in time.

    AI-powered chatbots, by contrast, can help generate questions about a dataset on the spot when prompted to supply talking points based on its findings. While many reporters might be able to do this with small datasets on their own, number-crunching large datasets often takes time and headspace. Math- or number-averse journalists will therefore benefit from this starting point.

    Journalists generating interview questions based on any prompt or data should be mindful, however, that ChatGPT cannot capture nuance. It is best to use this as a mere guide and not blindly present the final questions without due diligence.

    Examples of prompt generation:

    ‘Based on the data I have inputted, can you get the most important findings, and generate a couple of questions that I can ask a (describe your source and their connection to the data)? Ensure that the questions will be helpful for a news story.’
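The entry-by-entry accuracy check on a chatbot-cleaned dataset, mentioned in the data-cleaning tip above, can be organised with a short script. What follows is only a minimal Python sketch under stated assumptions, not a description of any newsroom's actual workflow: it assumes the original entries and the cleaned entries are available as two lists of the same length, the helper name flag_changes is hypothetical, the similarity threshold of 0.8 is an arbitrary illustrative value, and the sample place names are made up.

```python
import difflib


def flag_changes(original, cleaned, threshold=0.8):
    """List every entry the cleaning step changed, marking large rewrites.

    `threshold` is an illustrative assumption: a change whose similarity
    to the original falls below it is marked as suspicious.
    """
    if len(original) != len(cleaned):
        raise ValueError("original and cleaned lists must have the same length")

    flagged = []
    for row, (before, after) in enumerate(zip(original, cleaned)):
        if before == after:
            continue  # untouched entries are not listed
        # Case-insensitive similarity: 0.0 (nothing shared) to 1.0 (identical)
        similarity = difflib.SequenceMatcher(
            None, before.lower(), after.lower()
        ).ratio()
        flagged.append({
            "row": row,
            "before": before,
            "after": after,
            "suspicious": similarity < threshold,
        })
    return flagged


# Made-up sample data for illustration only
original = ["Quezon City", "quezon cty", "Manila"]
cleaned = ["Quezon City", "Quezon City", "Cebu"]

for change in flag_changes(original, cleaned):
    print(change)
```

In this sketch a small typo fix is listed but not marked suspicious, while an entry that was swapped for a different name is. Every changed row is still listed, so the script only prioritises the reporter's review and does not replace it.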

Note that journalists should approach these tips with caution and beware of data hallucinations and mix-ups, which are common in AI tools. The final decision should always rest with the reporter. As the Global Investigative Journalism Network stated, ‘Putting ChatGPT at the end of journalistic workflow risks exchanging more speed and quantity for less credibility.’

Nothing beats the analysis and questions produced by reporters themselves. But what ChatGPT ultimately does is ensure one never starts with a blank page.

Find this story and more in

ARTIQULATE #03

ArtIQulate is a publication associated with the Adenauer Fellowship, a scholarship programme by the Media Programme Asia, Konrad-Adenauer-Stiftung Ltd.
About the author

Cristina Chi

Chi is a multimedia reporter for Philstar.com, one of the Philippines' leading digital news organisations. Chi primarily covers education, the Office of the Vice President, the House of Representatives, and human rights. She specialises in reporting on inequality in education and the broader social justice issues it cuts across through data-driven storytelling. A finalist for the Best Thesis award in the University of the Philippines journalism department in 2023, Chi's undergraduate thesis explored data journalism techniques in education reporting.
