Consider a typical cup of yogurt. It’s vegetarian, contains live and active cultures, and delivers 15% of the recommended daily intake of calcium. But your aunt Lily is visiting, and she has celiac disease. So you want to know whether the yogurt is gluten-free before buying it on Amazon.
Ordinarily, you’d have to look up a yogurt by brand, and check the label to see if it was certified gluten-free. But what if there was a simpler way? What if you could search for certified gluten-free yogurts on Amazon? Or what if you could say, “Alexa, what yogurts are gluten-free?”
Now, with the product graph being developed by a team of scientists and engineers in Amazon’s consumer organization, you can.

What is a product graph?

Xin Luna Dong is a Principal Scientist on Amazon’s product graph team. Luna’s team is responsible for the incredibly complex task of associating every product on Amazon with concrete and abstract concepts. This would allow shoppers to search for the best products that fit their needs with search terms like “items for picnic,” “best toys for new-born baby,” “mid-century modern furniture,” or “longest-lasting watch battery.”
Needless to say, with a catalog comprising millions of items, attaching attributes to every product is a daunting task. It’s one that’s right up Luna’s alley.
“Our product graph will structure all of the world’s information as it relates to everything available on Amazon,” says Luna. “So for example, Party in the USA is a great song to play at a summer barbecue. An instant-read thermometer is also a great item to have for your barbecue party. So if you were to search for summer barbecue on Amazon, you would be given the option to both stream the song and buy the thermometer, even though they are from different product families.”
In this manner, Amazon’s product graph will describe every item using product and non-product concepts, and form links between different entities. In addition, the graph will help customers use greater variation in search terms when looking for items.
You say tomato, I say tomahto; you say “bathing suit,” I say “swimsuit.” No matter who says what, Amazon will help shoppers find the item that’s just right for them.
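To make the idea of linking items to concepts more concrete, here is a minimal sketch of a graph stored as (subject, relation, object) triples, with a small synonym table for query variants. The triples, relation names, synonym table, and function names are all hypothetical and chosen for illustration; they do not describe Amazon’s actual data model.

```python
from collections import defaultdict

# Hypothetical triples (subject, relation, object); not real catalog data.
TRIPLES = [
    ("Party in the USA", "is_a", "song"),
    ("Party in the USA", "good_for", "summer barbecue"),
    ("instant-read thermometer", "is_a", "kitchen tool"),
    ("instant-read thermometer", "good_for", "summer barbecue"),
    ("one-piece swimsuit", "is_a", "swimsuit"),
]

# Hypothetical synonym table mapping query variants to one canonical concept.
SYNONYMS = {"bathing suit": "swimsuit", "bbq": "summer barbecue"}


def build_index(triples):
    """Map each concept (the object of a triple) to the items linked to it."""
    index = defaultdict(set)
    for subject, _relation, obj in triples:
        index[obj].add(subject)
    return index


def search(index, query):
    """Normalize the query to a canonical concept, then return linked items."""
    normalized = query.lower().strip()
    concept = SYNONYMS.get(normalized, normalized)
    return sorted(index.get(concept, set()))


INDEX = build_index(TRIPLES)
print(search(INDEX, "summer barbecue"))  # items from two different product families
print(search(INDEX, "Bathing suit"))     # resolved through the synonym table
```

In this toy version, a song and a thermometer are returned together only because both are linked to the same concept node, which is the behavior the barbecue example describes.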

Building a product graph

The Product Graph team uses a variety of machine learning techniques to get product-related information from Amazon detail pages and the Internet at large. The obvious challenge is that product information across the Internet is largely unstructured. For example, not every website or blog would list Tom Hanks neatly under a pre-defined field, such as “actor”.
To overcome this, the product graph team uses distantly supervised learning techniques – where the algorithm is trained to identify actors from a smaller, more structured database, before being let loose on the larger web. Open IE techniques are then applied to form relationships between various concepts (Tom Hanks, actor, Forrest Gump).
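As a rough illustration of distant supervision, the sketch below uses a small seed knowledge base to label sentences automatically, keeps the textual contexts that recur around known actors, and then applies those contexts to unlabeled text. The seed facts, sentences, name pattern, and function names are assumptions made for this example; a production system would use far richer features and models.

```python
import re
from collections import Counter

# Hypothetical seed knowledge base of known (person, role) facts.
SEED_KB = {("Tom Hanks", "actor"), ("Meryl Streep", "actor")}

# Hypothetical unstructured training sentences.
TRAIN_TEXT = [
    "Tom Hanks stars in Forrest Gump.",
    "Meryl Streep stars in The Post.",
    "Tom Hanks visited a school on Tuesday.",
]

# Simplifying assumption: a name is two capitalized words.
NAME = r"([A-Z][a-z]+ [A-Z][a-z]+)"


def learn_patterns(sentences, kb, role, min_count=2):
    """Label sentences by matching KB names, then keep frequent contexts."""
    names = {person for person, kb_role in kb if kb_role == role}
    contexts = Counter()
    for sentence in sentences:
        for name in names:
            prefix = name + " "
            if sentence.startswith(prefix):
                # Use the two words that follow the name as the context.
                words = sentence[len(prefix):].split()[:2]
                contexts[" ".join(words)] += 1
    # Rare contexts are treated as labeling noise and dropped.
    return [context for context, n in contexts.items() if n >= min_count]


def extract(sentences, patterns):
    """Apply learned contexts to unlabeled text to find new names."""
    found = set()
    for pattern in patterns:
        regex = re.compile(NAME + " " + re.escape(pattern))
        for sentence in sentences:
            match = regex.match(sentence)
            if match:
                found.add(match.group(1))
    return found


patterns = learn_patterns(TRAIN_TEXT, SEED_KB, "actor")
new_actors = extract(["Robin Wright stars in Forrest Gump."], patterns)
print(patterns, new_actors)
```

The third training sentence shows why the labels are only "distant": it mentions a known actor without expressing the acting relation, so its context is filtered out by the frequency threshold.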
The team also applies knowledge linkage and cleaning to make sure that the data is reliable. One technique is to judge the validity of a piece of information based on its source. For example, a personal blog could list the release year of Forrest Gump as 1993, while IMDb lists it as 1994. In this case, the algorithm would know to use the information from the more trusted source, IMDb.
Finally, the team applies graph mining techniques to identify interesting hidden patterns. These patterns can be used to serve recommendations such as “People who bought A also bought B.” For example, the graph could find that shoppers who bought organic pet food were also more likely to be interested in a Fitbit, because people who are conscious of their pet’s health often pay attention to their own health.
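One simple way to picture source-based validation is a trust-weighted vote: each source’s claim counts in proportion to how reliable that source is believed to be. The sketch below assumes fixed, hand-picked trust scores and an extra hypothetical "fan forum" source; it illustrates the general idea rather than the team’s actual algorithm.

```python
from collections import defaultdict

# Hypothetical trust scores in [0, 1]; a real system would estimate these.
SOURCE_TRUST = {"IMDb": 0.95, "personal blog": 0.40, "fan forum": 0.35}

# Hypothetical conflicting claims: (source, entity, attribute, value).
CLAIMS = [
    ("personal blog", "Forrest Gump", "release_year", 1993),
    ("IMDb", "Forrest Gump", "release_year", 1994),
    ("fan forum", "Forrest Gump", "release_year", 1993),
]


def resolve(claims, trust, default_trust=0.1):
    """Pick, for each (entity, attribute), the value with the most trust."""
    scores = defaultdict(lambda: defaultdict(float))
    for source, entity, attribute, value in claims:
        # Unknown sources fall back to a low default trust.
        scores[(entity, attribute)][value] += trust.get(source, default_trust)
    return {
        key: max(values, key=values.get)
        for key, values in scores.items()
    }


RESOLVED = resolve(CLAIMS, SOURCE_TRUST)
print(RESOLVED[("Forrest Gump", "release_year")])
```

Here two low-trust sources agree on 1993, but their combined weight (0.75) is still below IMDb’s (0.95), so the trusted value wins even though it is outvoted by count.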
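A very small version of the “also bought” pattern can be sketched by counting how often pairs of items appear in the same order. The order data and function names below are invented for illustration, and raw co-occurrence counting is a deliberately simple stand-in for the graph mining techniques mentioned above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order baskets; not real purchase data.
ORDERS = [
    {"organic pet food", "fitness tracker"},
    {"organic pet food", "fitness tracker", "yoga mat"},
    {"organic pet food", "dog leash"},
    {"yoga mat", "fitness tracker"},
]


def co_purchase_counts(orders):
    """Count how many orders contain each unordered pair of items."""
    counts = Counter()
    for basket in orders:
        # Sorting gives each pair a single canonical (a, b) key.
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts


def also_bought(item, counts, top_n=1):
    """Return the items most often bought together with `item`."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == item:
            related[b] += n
        elif b == item:
            related[a] += n
    return related.most_common(top_n)


COUNTS = co_purchase_counts(ORDERS)
print(also_bought("organic pet food", COUNTS))
```

Raw counts favor popular items, so a more careful version would normalize by how often each item is bought overall before ranking.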
“What’s really different about Amazon is that we get to apply state-of-the-art machine learning techniques not because they are cool or exciting, but because they solve actual customer needs,” says Luna. “People don’t just come to Amazon to buy products. They visit Amazon to see what’s new or interesting, or to discover ways they can simplify and enrich their lives, and I’m delighted to have the opportunity to help them do that.”
Xin Luna Dong spoke about Amazon’s product graph at the KDD 2018 conference. In fact, several research and data scientists from Amazon presented papers and gave talks at the event. Luna was also recently recognized as a Distinguished Engineer by the Association for Computing Machinery (ACM).