Abstract

Abstract—We propose a novel two-step mining and optimization framework for inferring the root cause of anomalies that appear in road traffic data. We model road traffic as a time dependent flow on a network formed by partitioning a city into regions bounded by major roads. In the first step we identify link anomalies based on their deviation from their historical traffic profile. However, link anomalies on their own shed very little light on what caused them to be anomalous. In the second step we take a generative approach by modeling the flow in a network in terms of the origin-destination (OD) matrix which physically relates the latent flow between origin and destination and the observable flow on the links. The key insight is that instead of using all of link traffic as the observable vector we only use the link anomaly vector. By solving an L1 inverse problem we infer the routes (the origin-destination pairs) which gave rise to the link anomalies. Experiments on a very large GPS data set consisting on nearly eight hundred million data points demonstrate that we can discover routes which can clearly explain the appearance of link anomalies. The use of optimization techniques to explain observable anomalies in a generative fashion is, to the best of our knowledge, entirely novel.