University of Wisconsin–Madison

Safety Big Data

Twitter

Dr. Zhu has successfully mined social media data to identify motor vehicle collisions with animals (Zhu, et al., 2011) and to study bullying (Xu, et al., 2014) [project page]. His methodology for spatio-temporal signal recovery from social media data was recognized with the Best Paper in Knowledge Discovery award at the 2012 ECML PKDD conference (Xu, et al., 2012). Dr. Lee is also conducting ongoing work to investigate driver distraction based on tweets (link).

Roadkill



Figure 1. Temporal distribution of roadkill for four species. [ref]


Figure 2. Spatial distribution of species in roadkill tweets. [ref]

Bullying


Click to see an animation of two years' bullying tweets in 40 seconds.



Figure 3. Venn diagram of bullying tweets. [ref]


Table 1. Number and percentage of author's role in bullying traces. [ref]


Figure 4. Temporal distribution of bullying. [ref]

Driver Distraction



Figure 5. Network graph of frequent terms possibly related to driving distraction and their associations. [ref]

References:

  • Xiaojin Zhu, Jun-Ming Xu, Christine M. Marsh, Megan K. Hines, and F. Joshua Dein. Machine learning for zoonotic emerging disease detection. In ICML 2011 Workshop on Machine Learning for Global Challenges, 2011. [pdf, poster]
  • Jun-Ming Xu, Hsun-Chih Huang, Amy Bellmore, and Xiaojin Zhu. School Bullying in Twitter and Weibo: a Comparative Study. In the Eighth International AAAI Conference on Weblogs and Social Media (ICWSM), 2014. [pdf]
  • Jun-Ming Xu, Aniruddha Bhargava, Robert Nowak, and Xiaojin Zhu. Socioscope: Spatio-temporal signal recovery from social media. In The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2012. [pdf]
  • Paper under review: Deciphering 140 Characters: Text Mining Tweets On #DriverDistraction [pdf]

SHRP2 Data Processing

The research team has extensive experience working with SHRP2 NDS data. Dr. Lee helped develop data reduction and analysis methods for SHRP2 data (Dozza, et al., 2012 and Boyle, et al., 2012) and is part of the team led by Chalmers University that is studying eye glances and distraction using SHRP2 NDS data (Yekhshatyan & Lee, 2013). Drs. Noyce and Lee are currently working on another EARP project, "Quantifying Driver Distraction and Engagement using Video Analytics," using SHRP2 NDS video data (link).

SHRP2 Data Processing



Figure 1. Comparison between conventional data reduction procedure and the proposed chunking procedure. [ref]


Figure 2. The dynamic relationships between driver, vehicle, roadway, and environment and the resulting safety consequences. [ref]


Figure 3. Data sampling strategies (figure not to scale). [ref]


Figure 4. Examples of dynamic and static factors that relate to driver, vehicle, roadway, and environmental variables at the trip level and event level. [ref]


Figure 5. Relationships among S05 contractor questions (five and eight themes). [ref]

References:

  • Dozza, M., Bärgman, J., & Lee, J. D. (2012). Chunking: A procedure to improve naturalistic data analysis. Accident Analysis & Prevention. [pdf]
  • Boyle, L. N., Hallmark, S., Lee, J. D., McGehee, D. V., Neyens, D. M., & Ward, N. J. (2012). Integration of Analysis Methods and Development of Analysis Plan. SHRP2 (Strategic Highway Research Program). [pdf]
  • Yekhshatyan, L., & Lee, J. D. (2013). Changes in the correlation between eye and steering movements indicate driver distraction. IEEE Transactions on Intelligent Transportation Systems. [pdf]

NHTSA VOQ

Dr. Lee used text mining to decipher free-response consumer complaints in the NHTSA vehicle owner's complaint database (Ghazizadeh, et al., 2014). The study showed that complaints increased just before the recalls for Toyota and Ford/Firestone.

VOQ Text Mining



Figure 1. Comparison of the Airbag clusters identified in fatal incidents and incidents involving injury. The vertical axis shows the frequency of each term relative to each level of severity, and the size and horizontal position of the terms reflect their average frequency. In plotting this graph, the term "airbag" was removed from both clusters, as it had a much higher frequency than the other terms and would make it difficult to see any other terms. In addition, those terms that occurred in fewer than 10% of the reports were removed to reduce clutter. [ref]


Figure 2. Comparison of the Contact clusters identified in fatal incidents, incidents involving injury, and minor incidents. The size of the terms reflects their average frequency. In plotting this graph, the term "contact" was removed from all three clusters, as it had a much higher frequency than the other terms and would make it difficult to see any other terms. In addition, those terms that occurred in fewer than 10% of the reports were removed to reduce clutter. [ref]
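
The two filtering steps described in these captions (dropping the cluster's own label term, then dropping terms that occur in fewer than 10% of reports) can be sketched as follows. This is a minimal illustration, not the procedure used in the cited paper: the function name `filter_terms`, the assumption that each report is already tokenized into a list of terms, and the toy data are all hypothetical.

```python
from collections import Counter


def filter_terms(reports, label_term, min_doc_fraction=0.10):
    """Return {term: fraction of reports containing it}.

    Drops the cluster's label term and any term that appears in fewer
    than `min_doc_fraction` of the reports (assumed threshold: 10%).
    """
    n_reports = len(reports)
    if n_reports == 0:
        return {}

    doc_freq = Counter()
    for tokens in reports:
        doc_freq.update(set(tokens))  # count each term once per report

    return {
        term: count / n_reports
        for term, count in doc_freq.items()
        if term != label_term and count / n_reports >= min_doc_fraction
    }


# Hypothetical toy data: 20 tokenized complaint reports.
reports = (
    [["airbag", "deploy"]] * 10
    + [["airbag", "light"]] * 9
    + [["airbag", "recall"]]
)
kept = filter_terms(reports, label_term="airbag")
print(sorted(kept))
```

In this toy data, "recall" appears in only 1 of 20 reports (5%) and is dropped along with the label term "airbag", which leaves the less dominant terms visible.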

References:

  • Ghazizadeh, M., McDonald, A.D. and Lee, J.D. (2014). Text mining to decipher free-response consumer complaints: Insights from the NHTSA vehicle owner's complaint database. [pdf]

Weather


Dr. Noyce and his team have integrated weather data with crash data using spatial statistics for ice-, snow-, and rain-related crashes (Khan, et al., 2008 and Khan, et al., 2009).

Safety Analyses Related to Weather



Figure 1. Relative rain crash rates by Wisconsin county 2000 to 2002. [ref]


Figure 2. Local Moran’s I analysis for ice-related crashes. [ref]
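
For readers unfamiliar with the statistic named in Figure 2, the sketch below computes a local Moran's I value for each area from a list of observations and a spatial weights matrix. It is a minimal illustration under stated assumptions, not the analysis from the cited papers: the function name `local_morans_i`, the binary adjacency weights, the row-standardization, and the toy values are all hypothetical, and significance testing is omitted.

```python
def local_morans_i(values, weights):
    """Local Moran's I for each location.

    values:  list of n observations (e.g., a crash rate per area).
    weights: n x n spatial weights (list of lists) with a zero diagonal.
             Rows are standardized here so neighbor weights sum to 1.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # second moment of deviations
    if m2 == 0:
        raise ValueError("values are constant; local Moran's I is undefined")

    result = []
    for i in range(n):
        row = weights[i]
        row_sum = sum(row)
        # Spatial lag: weighted average deviation of the neighbors.
        lag = sum(w * zj for w, zj in zip(row, z)) / row_sum if row_sum else 0.0
        result.append(z[i] * lag / m2)
    return result


# Hypothetical toy example: four areas along a line (0-1, 1-2, 2-3 adjacent).
values = [10.0, 9.0, 1.0, 2.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
local_i = local_morans_i(values, weights)
print(local_i)
```

A positive value indicates that an area resembles its neighbors (part of a high or low cluster), while a negative value flags an area that differs from its surroundings.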

References:

  • Khan, G., Qin, X., and Noyce, D. (2008). Spatial Analysis of Weather Crash Patterns. J. Transp. Eng., 134(5), 191−202. [pdf]
  • Khan, G., Santiago-Chaparro, K., Qin, X., and Noyce, D. Application and Integration of Lattice Data Analysis, Network K Functions, and Geographic Information System Software to Study Ice-Related Crashes. Transportation Research Record, v2136, pp. 67−76, 2009. [pdf]

Safety Data Integration


Dr. Noyce has developed a safety data integration framework that combines roadway infrastructure, pavement marking, signing, weather, and traffic information with crash data (Khan, et al., 2010). This framework has been used to analyze the safety performance of highway curves (Khan, et al., 2012 and Khan, et al., 2013). A recent publication by Dr. Noyce studied secondary crashes by integrating multiple data sources for an entire state and was recognized with a 2014 Outstanding Paper Award by the TRB Committee ANB20 (Zheng, et al., 2014).


Figure 1. Safety data integration.

Curve Safety



Figure 2. Development process and description of horizontal curve (H. curve) data set for crash prediction models (HCM = Highway Capacity Manual). [ref]


Figure 3. Details of horizontal curve data set on Wisconsin STN roads. [ref]

Secondary Crash Identification



Figure 4. Framework of large-scale secondary crash identification using integrated highway and crash data. [ref]

References:

  • Khan, G., Santiago-Chaparro, K., Chitturi, M., and Noyce, D. Development of Data Collection and Integration Framework for Road Inventory Data. Transportation Research Record: Journal of the Transportation Research Board, v2160, pp. 29−39, 2010. [pdf]
  • Khan, G., Bill, A., Chitturi, M., and Noyce, D. Horizontal Curves, Signs, and Safety. Transportation Research Record: Journal of the Transportation Research Board, v2279, pp. 124−131, 2012. [pdf]
  • Khan, G., Bill, A., Chitturi, M., and Noyce, D. Safety Evaluation of Horizontal Curves on Rural Undivided Roads. Transportation Research Record: Journal of the Transportation Research Board, v2386, pp. 147−157, 2013. [pdf]
  • Dongxi Zheng, Madhav Chitturi, Andrea Bill and David A. Noyce. Secondary Crash Identification on a Large-Scale Highway System. In Proceedings of the 2014 TRB Annual Meeting. Jan. 2014. Outstanding Paper Award of the Safety Data, Analysis and Evaluation (ANB20) Committee. [pdf]




 