Datasets
This is a GPS trajectory dataset collected in the (Microsoft Research Asia) GeoLife project by 182 users over a period of more than five years (from April 2007 to August 2012). This trajectory dataset can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation. The following heat maps visualize its distribution in Beijing.


Please cite the following two papers when using this dataset.
[1] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321.
[2] Yu Zheng, Lizhu Zhang, Xing Xie, Wei-Ying Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of International conference on World Wide Web (WWW 2009), Madrid, Spain. ACM Press: 791-800.
This is a sample of the T-Drive taxi trajectory dataset, which was generated by over 10,000 taxis over a period of one week in Beijing.
Please cite the following two papers when using the dataset:
[1] Jing Yuan*, Yu Zheng, Chengyang Zhang, Wenlei Xie, Xing Xie, Guangzhong Sun, Yan Huang. T-Drive: Driving Directions Based on Taxi Trajectories. In Proceedings of ACM SIGSPATIAL Conference on Advances in Geographical Information Systems (ACM SIGSPATIAL GIS 2010),
[2] Jing Yuan*, Yu Zheng, Xing Xie, Guangzhong Sun. Driving with Knowledge from the Physical World. accepted by 17th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2011).
This is a portion of the GPS trajectory dataset collected in the (Microsoft Research Asia) GeoLife project. Each trajectory has a set of transportation mode labels, such as driving, taking a bus, riding a bike, and walking, which can support transportation mode learning.
Please cite the following three papers when using this GPS dataset.
[1] Yu Zheng, Like Liu, Longhao Wang, Xing Xie. Learning Transportation Mode from Raw GPS Data for Geographic Application on the Web. In Proceedings of International conference on World Wide Web (WWW 2008), Beijing, China. ACM Press: 247-256.
[2] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321.
[3] Yu Zheng, Yukun Chen, Quannan Li, Xing Xie, Wei-Ying Ma. Understanding transportation modes based on GPS data for Web applications. ACM Transactions on the Web. Volume 4, Issue 1, January, 2010. pp. 1-36.
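As a rough illustration of how the transportation mode labels described above might be joined to raw GPS points, the sketch below assumes each label is a (start time, end time, mode) interval and each point is a (timestamp, latitude, longitude) tuple. This layout, the function name `label_points`, and the sample values are assumptions made for illustration; the dataset's own user guide is the authority on the actual file format.

```python
from datetime import datetime

def label_points(points, labels):
    """Attach a transportation mode to each GPS point.

    points: list of (timestamp, lat, lon) tuples (assumed layout).
    labels: list of (start, end, mode) intervals (assumed layout).
    Points that fall outside every interval get mode None.
    """
    labeled = []
    for ts, lat, lon in points:
        # Take the first label interval that contains this timestamp.
        mode = next((m for start, end, m in labels if start <= ts <= end), None)
        labeled.append((ts, lat, lon, mode))
    return labeled

# Made-up sample values, not taken from the dataset.
labels = [
    (datetime(2008, 5, 1, 8, 0), datetime(2008, 5, 1, 8, 20), "walk"),
    (datetime(2008, 5, 1, 8, 21), datetime(2008, 5, 1, 9, 0), "bus"),
]
points = [
    (datetime(2008, 5, 1, 8, 10), 39.98, 116.31),
    (datetime(2008, 5, 1, 8, 30), 39.99, 116.33),
    (datetime(2008, 5, 1, 9, 30), 40.00, 116.35),
]
modes = [mode for _, _, _, mode in label_points(points, labels)]
```

Points outside every labeled interval are kept with a mode of `None` rather than dropped, so unlabeled stretches of a trajectory remain visible to later processing.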
This simulator can generate people’s requests for taxicabs on different road segments, using knowledge mined from large-scale real taxi trajectories. Each query consists of an origin, a destination, and a timestamp. Please cite the following paper when using the simulator.
[1] Shuo Ma, Yu Zheng, Ouri Wolfson. T-Share: A Large-Scale Dynamic Taxi Ridesharing Service. In Proceedings of the 29th IEEE International Conference on Data Engineering (ICDE 2013).
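The query structure described above (an origin, a destination, and a timestamp) could be modeled as follows. This is a minimal sketch, not the simulator's actual output format: the class name `TaxiRequest`, the use of string road-segment IDs for both endpoints, and the sample values are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class TaxiRequest:
    """One simulated request: where it starts, where it ends, and when."""
    origin_segment: str       # hypothetical road-segment ID
    destination_segment: str  # hypothetical road-segment ID
    timestamp: datetime

def requests_in_window(requests, start, end):
    """Return the requests with start <= timestamp < end, ordered by time."""
    selected = [r for r in requests if start <= r.timestamp < end]
    return sorted(selected, key=lambda r: r.timestamp)

# Made-up sample requests.
sample = [
    TaxiRequest("seg_12", "seg_40", datetime(2013, 1, 1, 8, 30)),
    TaxiRequest("seg_07", "seg_19", datetime(2013, 1, 1, 8, 5)),
    TaxiRequest("seg_33", "seg_02", datetime(2013, 1, 1, 9, 15)),
]
morning = requests_in_window(
    sample, datetime(2013, 1, 1, 8, 0), datetime(2013, 1, 1, 9, 0)
)
```

Replaying requests in timestamp order, one time window at a time, is one simple way to feed such queries to a dispatching experiment.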
The dataset consists of check-in data from New York City and Los Angeles as well as the social structure of the users. Each check-in includes a venue ID, the category of the venue, a timestamp, and a user ID. Please cite the following papers when using the dataset.
[1] Jie Bao, Yu Zheng, Mohamed F. Mokbel. Location-based and Preference-Aware Recommendation Using Sparse Geo-Social Networking Data. ACM SIGSPATIAL GIS 2012.
[2] Ling-Yin Wei, Yu Zheng, Wen-Chih Peng, Constructing Popular Routes from Uncertain Trajectories. 18th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2012).
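The four check-in fields listed above (venue ID, venue category, timestamp, and user ID) map naturally onto a small record type. The sketch below counts how often each user checks in to each venue category; the class and function names and the sample rows are hypothetical, and this simple count is only an illustration, not the method of the cited papers.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CheckIn:
    user_id: str
    venue_id: str
    category: str
    timestamp: datetime

def category_preferences(checkins):
    """Map each user ID to a Counter of the venue categories they visited."""
    prefs = {}
    for c in checkins:
        prefs.setdefault(c.user_id, Counter())[c.category] += 1
    return prefs

# Made-up sample rows, not taken from the dataset.
checkins = [
    CheckIn("u1", "v10", "Coffee Shop", datetime(2011, 3, 1, 9, 0)),
    CheckIn("u1", "v11", "Coffee Shop", datetime(2011, 3, 2, 9, 5)),
    CheckIn("u1", "v20", "Museum", datetime(2011, 3, 5, 14, 0)),
    CheckIn("u2", "v20", "Museum", datetime(2011, 3, 5, 15, 0)),
]
prefs = category_preferences(checkins)
```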
This dataset includes the concentration of three air pollutants, PM2.5, PM10, and NO2, from air quality monitoring stations in Beijing and Shanghai in the time span of 2013-2-8 to 2014-2-8. Please cite the following two papers when using the dataset.
[1] Yu Zheng, Furui Liu, Hsun-Ping Hsieh. U-Air: When Urban Air Quality Inference Meets Big Data. 19th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2013).
[2] Yu Zheng, Xuxu Chen, Qiwei Jin, Yubiao Chen, Xiangyun Qu, Xin Liu, Eric Chang, Wei-Ying Ma, Yong Rui, Weiwei Sun. A Cloud-Based Knowledge Discovery System for Monitoring Fine-Grained Air Quality. MSR-TR-2014-40.
The package is comprised of six parts of data that were extracted from the GPS trajectories of taxicabs, road networks, POIs of Beijing, and video clips recording real traffic on roads. Please cite the following two papers when using the dataset.
[1] Jingbo Shang*, Yu Zheng, Wenzhu Tong, Eric Chang. Inferring Gas Consumption and Pollution Emission of Vehicles throughout a City. In Proceedings of the 20th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2014).
[2] Yu Zheng, Licia Capra, Ouri Wolfson, Hai Yang. Urban Computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (ACM TIST). 5(3), 2014.
This package is comprised of three parts of data: 1) tensors representing the 311 complaints on urban noise; 2) geographical features of each region in NYC; 3) real noise levels of 36 locations in NYC. Please cite the following two papers when using the dataset.
[1] Yu Zheng, Tong Liu, Yilun Wang, Yanchi Liu, Yanmin Zhu, Eric Chang. Diagnosing New York City’s Noises with Ubiquitous Data. In Proc of UbiComp 2014.
[2] Wang, Y., Zheng, Y., Liu, T. A noise map of New York City. In Proc. of UbiComp 2014.
The dataset was used for air quality forecasting and real-time inference. It can also be used to test cross-domain data fusion methods. Please cite the following papers when using the dataset.
[1] Yu Zheng, Furui Liu, Hsun-Ping Hsieh. U-Air: When Urban Air Quality Inference Meets Big Data. 19th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2013).
[2] Yu Zheng, Xiuwen Yi, Ming Li, Ruiyuan Li, Zhangqing Shan, Eric Chang, Tianrui Li. Forecasting Fine-Grained Air Quality Based on Big Data. In Proceedings of the 21st SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2015).
The dataset contains bike usage (denoted by the number of check-outs and check-ins) at each bike sharing station in NYC and Chicago. Weather data for the period in which the bike sharing data was collected is also included. Please cite the following papers when using the dataset.
[1] Yexin Li, Yu Zheng, Huichu Zhang, Lei Chen. Traffic Prediction in a Bike Sharing System. In Proceedings of the 23rd ACM International Conference on Advances in Geographical Information Systems (ACM SIGSPATIAL 2015).
[2] Yu Zheng, Licia Capra, Ouri Wolfson, Hai Yang. Urban Computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (ACM TIST). 5(3), 2014.
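Bike usage as defined above (check-outs and check-ins per station) can be derived from individual trips when those are available. The sketch below assumes each trip is a hypothetical (start station, end station) pair; the dataset itself may already ship aggregated counts, so treat this only as an illustration of the definition.

```python
from collections import defaultdict

def station_usage(trips):
    """Count check-outs and check-ins per station.

    trips: iterable of (start_station, end_station) pairs (assumed layout).
    A trip is one check-out at its start station and one check-in at its
    end station.
    """
    usage = defaultdict(lambda: {"check_outs": 0, "check_ins": 0})
    for start, end in trips:
        usage[start]["check_outs"] += 1
        usage[end]["check_ins"] += 1
    return dict(usage)

# Made-up station IDs.
trips = [("A", "B"), ("A", "C"), ("B", "A")]
usage = station_usage(trips)
```

The difference between check-ins and check-outs at a station indicates whether it tends to fill up or empty out over the period counted.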
This dataset is comprised of five parts of data: taxi trip data, bike sharing data, 311 data, POIs, and road network data of NYC. Please cite the following papers when using the dataset.
[1] Yu Zheng, Huichu Zhang, Yong Yu. Detecting Collective Anomalies from Multiple Spatio-Temporal Datasets across Different Domains. In Proceedings of the 23rd ACM International Conference on Advances in Geographical Information Systems (ACM SIGSPATIAL 2015).
[2] Yu Zheng, Licia Capra, Ouri Wolfson, Hai Yang. Urban Computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (ACM TIST). 5(3), 2014.
This dataset consists of two types of crowd flows. One is five years of taxi flow in Beijing. The other is bike usage in a bike sharing system in New York City. Research on predicting crowd flows has been conducted based on this dataset. Please cite the following paper when using the dataset.
[1] Junbo Zhang, Yu Zheng, Dekang Qi. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. In Proceedings of the 31st AAAI Conference (AAAI 2017).
Chinese Bio
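Dr. Yu Zheng (born 1979), professor, is a native of Hengyang, Hunan, and an IEEE Fellow. He is Vice President of JD.com, President of JD iCity, President of the JD Intelligent Cities Research Institute, and Chief Data Scientist of JD Technology. He is also Chair of KDD China, a Leading Talent in Scientific and Technological Innovation under the national "Ten Thousand Talents Program", an expert receiving the State Council Special Allowance, and an Elsevier Highly Cited Chinese Researcher. He has eighteen years of management and product R&D experience at leading technology companies in China and the United States. Before joining JD.com, he worked at Microsoft Research Asia for 12 years, where he led the urban computing area. He is also a Chair Professor at Shanghai Jiao Tong University, a "Huashan Scholar" distinguished visiting professor at Xidian University, an adjunct professor at several well-known universities, including Nanjing University, the Hong Kong University of Science and Technology, and the Hong Kong Polytechnic University, and Dean of the Institute of Artificial Intelligence at Southwest Jiaotong University.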
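Dr. Zheng pioneered Urban Computing internationally as a research field and discipline and proposed its theoretical framework. He is a pioneer and founder of the urban computing field, as well as a leading figure and practitioner in big data and artificial intelligence. Since 2006, he has published more than one hundred CCF-A papers in urban computing and spatio-temporal data mining. His papers have been cited more than 62,000 times (H-index: 112), and according to Google Scholar rankings he ranks first in the world in both fields. The book he edited, "Computing with Spatial Trajectories", has been adopted as a textbook by universities in many countries and was named by Springer as one of the ten most popular computer science books (written by Chinese authors worldwide). His monograph "Urban Computing", published by MIT Press, is the first textbook in the field of urban computing internationally. Seven of his research results have stood the test of a decade in industry: he received the SIGKDD Test-of-Time Award, the highest award in data mining, twice, in 2022 and 2024 (the only scholar in China to do so); the SIGSPATIAL 10-Year Impact Award, the highest international award in spatio-temporal data, four times, in 2019, 2020, 2022, and 2023 (the only scholar in the world to do so); and the IEEE MDM 2023 Test-of-Time Award. He has been listed among the world's highly cited researchers and top scientists for many consecutive years. In 2021, according to the AI 2000 influence ranking, Dr. Zheng ranked first in China and eighth in the world in data mining.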
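Dr. Zheng served as Editor-in-Chief of ACM Transactions on Intelligent Systems and Technology, a top international journal in artificial intelligence, and was the first scholar from mainland China to serve as editor-in-chief of a top ACM (Association for Computing Machinery) academic journal. He also served as industry chair of ICDE 2014 and CIKM 2017, well-known international conferences in big data, and as industry chair of IJCAI 2019, a top international conference in artificial intelligence, promoting the integration of academia and industry in the field. In 2019, he became the first scholar from mainland China invited to deliver a keynote speech at AAAI, one of the most influential international conferences in artificial intelligence. He was also the first scholar from mainland China invited to speak on the KDD Plenary Keynote Panel, an industry keynote speaker at IJCAI 2019, and a keynote speaker at MDM 2021 and SSTD 2021.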
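He has led four national-level research projects, serves as chief scientist and overall lead of the Smart City and Internet of Things major special project under the National Key R&D Program, and has directed more than twenty large industry and government projects, each with a budget of over 100 million yuan. He chairs the IEEE standards group on intelligent city operating systems and is responsible for developing the related international standards. As Chair of the ACM SIGKDD China Chapter (KDD China), he effectively connects industry with academia and the data mining communities inside and outside China. His public service roles include member of the Beijing Smart City Expert Committee, member of the Digital Technology Expert Committee of the All-China Federation of Trade Unions, and deputy director of the Urban Spatial Information Working Committee of the China Association for Geospatial Industry and Sciences. He also serves as a director of companies including the Beijing Big Data Exchange (北京大数据交易所), Capital Exhibition Group (首都会展集团), Terminus (特斯联), and Shoulv Huike (首旅慧科).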
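He has extensive experience in applied research and in bringing projects into deployment, and holds more than 100 international, US, and Chinese invention patents. Many of his research results have been applied in Microsoft products, and he received the Microsoft technology transfer award three times. He led the development of urban big data systems such as GeoLife, T-Drive, Urban Air, and CityNoise, which have been covered many times by authoritative international media such as Technology Review. Among them, Urban Air was the first to use big data and artificial intelligence to monitor and forecast fine-grained air quality; the service covered more than 300 cities in China and was adopted by China's Ministry of Environmental Protection. In 2016, he led the design and implementation of an urban big data platform, which was successfully deployed in Guiyang, China's big data demonstration base.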
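He founded JD's intelligent city business segment, where he is responsible for building the team, setting the development strategy, creating technology products, developing core technologies, and coordinating organizational management. The business serves more than 70 cities across China and received the "CEO Special Award", the highest award of JD.com. The urban computing framework he proposed provides the theoretical basis for building Xiong'an as an intelligent city, and the urban operating system developed by his team has become Xiong'an's digital foundation and won the First Prize of the China Computer Federation (CCF) Science and Technology Progress Award. With national approval, Xiong'an's intelligent hub was named the "Xiong'an Urban Computing Center". In Nantong, he built China's first city-level governance command center, which became a national benchmark, and he subsequently led unified city management ("一网统管") projects in more than ten cities. His book 《城市治理一网统管》 (on unified city governance through a single network) has become study material for governments in many places. Together with the Beijing municipal government, his team launched "Jingban" (京办), China's first collaborative office system for government, which significantly improved government efficiency and played a vital role in Beijing's epidemic prevention and control and in support for major events. He built the technical service system for the Beijing International Big Data Exchange, opening a new chapter in data trading in China.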
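In 2013, Dr. Zheng was named one of the world's outstanding young innovators (MIT TR35) by MIT Technology Review for his contributions to urban computing. The award selects 35 top innovators under the age of 35 worldwide from fields including computing, communications, biology, medicine, and finance. In November 2013, he was featured in the US magazine Time as a representative of modern innovators. In 2014, because the urban computing work he leads has great commercial prospects and the potential to change the industry landscape, the US magazine Fortune named him one of China's 40 business elites under the age of 40. In 2016, he was named an ACM Distinguished Scientist for his contributions to urban computing. In November 2020, he was elected a Fellow of the Institute of Electrical and Electronics Engineers (IEEE Fellow) for his outstanding contributions to spatio-temporal data mining and urban computing.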
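In 2017, at the Wuzhen Internet Conference, he was named one of the ten technology innovation figures on the China AI Heroes list. In 2019, together with Academician Wu Hequan and Academician Li Deren, he was named one of the ten most influential figures in China's smart cities. In May 2021, he was awarded the Capital Labor Medal for his outstanding contributions to intelligent cities. In November 2021, the intelligent city operating system led by Dr. Zheng won the First Prize of the China Computer Federation Science and Technology Progress Award. In 2021, the China Electronics Chamber of Commerce awarded him the "China AI Golden Goose Award for Outstanding Achievement" (中国AI金雁奖卓越成就奖). In 2022, the Chinese Society for Geodesy, Photogrammetry and Cartography named him a "Leading Figure on the Smart City Pioneer List". In 2023, the Beijing Municipal Party Committee named him one of "Beijing's Scientific, Technical, and Managerial Talents with Outstanding Contributions".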