Story Forest: Organizing Massive News Documents via AI and Natural Language Processing

Di Niu

Friday, May 4th, 2018, 11h, salle 26-00/428, Campus Jussieu

Slides

I will describe our recent experience of implementing a news content organization system in collaboration with Tencent that can discover hot events from vast streams of breaking news and connect events into stories for easy viewing. Our real-world system has distinct requirements in contrast to previous studies on document topic modeling and detection, in that 1) an event does not only contain articles of a similar topic, but is a cluster of documents that report exactly the same physical incidence; 2) we must evolve news stories in a logical and online manner. In solving these challenges, we propose Story Forest, a state-of-the-art news content organization system based on artificial intelligence and natural language processing. I will briefly describe the key enabling technologies in Story Forest, including identifying the relationship between text objects, e.g., whether they talk about the same event or whether one article is a follow-up of another, based on deep learning. Our system has been deployed in Tencent QQ Browser mobile app.