Flash floods are one of the world’s deadliest weather phenomena, killing more than 5,000 people each year. They are also some of the most difficult to predict. But Google thinks it has solved that problem in an unexpected way: by reading the news.
Humans collect a lot of weather data, but flash floods are too short-lived and localized to be comprehensively captured by the long-term monitoring networks used for temperatures or even river flows. This data gap means that deep learning models, which are increasingly capable of predicting weather, are unable to predict flash floods.
To solve this problem, Google researchers used Gemini, Google's large language model, to classify 5 million news articles from around the world, isolate 2.6 million distinct flood reports, and convert them into a geotagged time series dataset called "Groundsource." Gila Loike, a product manager at Google Research, said this is the first time the company has used language models for this type of work. The study and dataset were published Thursday morning.
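Google hasn't released the pipeline's code, but the basic shape is easy to picture: an LLM call that decides whether an article reports a flood and extracts where and when, followed by aggregation into counts per location and date. Here's a minimal Python sketch; `classify_article` is a purely illustrative stand-in for the Gemini call, and the field names and 0.1-degree grid cells are assumptions, not details from the paper.

```python
from collections import defaultdict

# Illustrative stand-in for the LLM call: in the real pipeline, Gemini
# reads the article and extracts the flood's location and date (if any).
def classify_article(article: dict) -> dict | None:
    if "flood" in article["text"].lower():
        return {"lat": article["lat"], "lon": article["lon"], "date": article["date"]}
    return None

def build_groundsource(articles: list[dict]) -> dict:
    """Aggregate per-article flood reports into a geotagged time series:
    (rounded lat/lon cell, date) -> number of corroborating reports."""
    series: dict = defaultdict(int)
    for article in articles:
        report = classify_article(article)
        if report is None:
            continue
        cell = (round(report["lat"], 1), round(report["lon"], 1))
        series[(cell, report["date"])] += 1
    return series

articles = [
    {"text": "Flash flood sweeps market district", "lat": -19.83, "lon": 34.84, "date": "2024-03-11"},
    {"text": "Local election results announced", "lat": -19.83, "lon": 34.84, "date": "2024-03-11"},
]
print(build_groundsource(articles))  # {((-19.8, 34.8), '2024-03-11'): 1}
```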
Using Groundsource as ground truth, the researchers trained a model built on a long short-term memory (LSTM) neural network to ingest global weather forecasts and generate flash flood probabilities for specific regions.
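In outline, that means a sequence model that reads forecast variables over time for one region and outputs a single probability. The sketch below, written in PyTorch as an assumption, shows the general pattern; the feature count, layer sizes, and input variables are illustrative, not Google's actual architecture.

```python
import torch
import torch.nn as nn

class FlashFloodLSTM(nn.Module):
    """Toy flash flood predictor: an LSTM reads a sequence of forecast
    variables for one grid cell and emits a flood probability."""
    def __init__(self, n_features: int = 8, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, forecasts: torch.Tensor) -> torch.Tensor:
        # forecasts: (batch, timesteps, n_features), e.g. hourly forecast
        # precipitation, soil moisture, and temperature for the target cell.
        _, (h_n, _) = self.lstm(forecasts)
        # Final hidden state -> scalar probability of a flash flood.
        return torch.sigmoid(self.head(h_n[-1]))

model = FlashFloodLSTM()
week_of_hourly_forecasts = torch.randn(1, 168, 8)  # one region, 7 days x 24 h
print(model(week_of_hourly_forecasts))  # tensor of shape (1, 1), in [0, 1]
```

Trained against Groundsource labels, a model like this can be scored on whether high predicted probabilities line up with dates and places where news reports confirm a flood actually occurred.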
Google’s flash flood prediction models currently display risk for urban areas in 150 countries on its Flood Hub platform, and the company shares that data with emergency response agencies around the world. Antonio José Beleza, an emergency response officer with the Southern African Development Community who trialed the predictive model with Google, said it helped the organization respond more quickly to floods.
The model still has limitations. First, its resolution is fairly coarse: it identifies risk only at the level of areas roughly 20 square kilometers in size. Google’s models also don’t incorporate local radar data that would allow real-time tracking of precipitation, making them less accurate than the U.S. National Weather Service’s flood warning system.
Importantly, though, the project was designed for places where local governments cannot afford expensive weather-sensing infrastructure or lack extensive historical weather records.
Juliet Rosenberg, a program manager on Google’s Resilience team, told reporters this week that Groundsource datasets “really help us rebalance the map because we aggregate millions of reports. This allows us to extrapolate to other regions where we don’t have much information.”
Rosenberg said the team hopes this approach of using LLMs to build quantitative datasets from qualitative written sources can be applied to other short-lived but important-to-predict phenomena, such as heat waves and debris flows.
Marshall Moutenot, CEO of Upstream Tech, a company that uses similar deep learning models to predict river flows for customers such as hydropower companies, said Google’s contribution is part of a growing effort to collect data for deep learning-based weather prediction models. Moutenot co-founded dynamical.org, a group that maintains a collection of machine-learning-ready weather data for researchers and startups.
“Data scarcity is one of the toughest challenges in geophysics,” Moutenot said. “At the same time, we have so much Earth data and not enough ground truth to evaluate it against. This was a very creative approach to getting that data.”
