Peering into the sky, could you tell the difference between a rough-legged hawk and a northern harrier? Say you're walking through tall grass. Would you know the difference between a harmless gopher snake and a rattlesnake, by their markings? What about dragonflies—could you pick out a darner versus a meadowhawk if it landed on your arm?
These are the questions the iNaturalist app answers every day for its users. A joint initiative of the California Academy of Sciences and the National Geographic Society, the app is a top download in app stores and a go-to app for hikers, campers, and anyone who loves spending time outdoors. Snap a photo of any bit of flora or fauna and—if you've got a signal—iNaturalist will tell you what you're looking at and everything you might need to know about it. Amazon Web Services and some very precise machine learning models help power the app's image recognition and pattern matching algorithm. If you’re looking at one of the 50,000 more common species found on the planet, iNaturalist can probably tell you what it is.
The goal is to get as many people as possible taking pictures of nature to add to our shared scientific knowledge. To encourage that exchange of information, iNaturalist has made its datasets openly accessible via the Registry of Open Data on AWS. Now, researchers around the world can access the datasets, more than 160 terabytes' worth, without needing to pay to store their own copies of the data.
Co-Director Scott Loarie sees iNaturalist’s crowdsourced dataset as an important step toward advancing science and conservation. Loarie holds a master’s degree in biology from Stanford University and a doctorate in environmental sciences from Duke University. He believes that pairing hundreds of millions of crowdsourced images with machine learning tools that can find patterns in the data is ushering in a new era in organismal biology, evolution, and ecology, similar to what occurred when tools to amplify and sequence DNA became available.
Call it the "phenotypic revolution." Rather than an explosion of data around an animal's genetic code, there is an explosion of data around an animal's physical characteristics. With machine learning models analyzing millions of images of species from around the world, such as those in the iNaturalist dataset, scientists are uncovering patterns that were missed in the past. For instance, scientists examining iNaturalist datasets have started to notice that dragonflies of the same species take on darker or lighter tones depending on the climate in which they live. Discoveries like this depend on people to record the data as well as on systems that make the data easy to publish and access.
"It's easy to think we understand everything about the natural world now," Loarie said. "But as we gather more data about it, more questions appear."
Loarie points out that despite the technologies at our fingertips, so much of field biology still involves collecting specimens and storing them in museums. While this is a useful and necessary practice, it means a lot of our understanding of living things comes from examining dead things.
"Now we have the tools to get at the data and pull out those patterns from living specimens," Loarie said. "This can reveal a great deal of information about behavior and aspects of how a species lives in the world, which isn't preserved in specimens."
Loarie doesn't see the revolution as restricted to exploring physical differences across species populations. The ability to analyze large datasets affects our understanding of behaviors and life cycles, too. For instance, every spring, mountain goats shed their thick, warm coats. Now, with thousands of images of mountain goats in various stages of molting, crossed with data points like altitude, temperature, and other environmental factors, scientists can better understand how this seasonal shedding process will be affected by climate change.
Earth hosts millions of species of plants and animals, even if we exclude fungi, algae, and bacteria. To date, our knowledge of the world's flora and fauna is skewed toward animals, especially birds, reptiles, and mammals. Loarie hopes that as more users upload photos and more naturalists, both professionals and amateurs, share their data through iNaturalist, the digital picture of life on our planet will continue to round out.
"We've never had access to images of living things at this scale before," Loarie said. "Using machine learning to reveal new scientific insights from these new biodiversity image datasets is changing how we approach life sciences."