Abstract
Data science comprises a variety of scientific methods and processes for extracting knowledge from data drawn from various sources. By integrating interdisciplinary fields such as mathematics, statistics, information science, and computer science, it affords techniques for analyzing large volumes of data to arrive at unique insights and make data-driven decisions in real time (Sinelnikov et al., 2015). These techniques lend themselves to applications across many domains, including hazard assessment, analysis of near-miss data, and identification of leading and lagging indicators from past accidents. Their benefits include greater efficiency due to improved data acquisition. Near-miss data represents an important source for identifying the conditions that lead to accidents and for developing strategies to prevent them, and near-miss data sets can be analyzed with various techniques. This paper explores the use of data science to mine accident reports, with a special emphasis on near misses, to uncover occurrences that were not initially identified in the documentation. Data science techniques such as text analysis facilitate searching large volumes of data to uncover patterns that support more informed decisions. For near-miss data, these techniques can be tested both for their ability to uncover new hazards or hazardous preconditions and for the accuracy of those findings. Alongside the benefits of processing large data sets and uncovering new hazards, the paper also considers the implications for safety culture.