University of Twente Student Theses


Automatic aviation safety reports classification

Torres Cano, Andrés Felipe (2019) Automatic aviation safety reports classification.

Abstract:In this master's thesis we present an approach to automatically classify aviation safety reports by applying supervised machine learning. Our proposed model is able to classify 7 different types of safety reports according to a taxonomy hierarchy of more than 800 labels, using a highly imbalanced dataset of 19 815 reports. The reports comprise numerical, categorical and text fields written in English and Dutch. Such reports are manually categorized by safety analysts in a multi-label setting. The reports are later aggregated according to these taxonomies and presented in dashboards that are used to monitor the safety of the organization and to identify emerging safety issues. Our model trains one classifier per report type, using a LightGBM base classifier in a binary relevance setting, and achieves a final macro-averaged F0.5 score of 50.22%. Additionally, we study the impact of different text representation techniques on the problem of classifying aviation safety reports, concluding that term frequency (TF) and term frequency-inverse document frequency (TF-IDF) are the best methods for our setting. We also address the imbalanced learning problem by using SMOTE oversampling, which improves our classifier performance by 41%. Furthermore, we evaluate the impact of hierarchical classification on our classifier, with results suggesting that it does not improve performance for our task. Finally, we run an experiment to measure the reliability of the manual classification process carried out by the analysts, using Krippendorff's mvα. We also present some suggestions to improve this process.
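The abstract reports a macro-averaged F0.5 score in a multi-label setting. As a rough illustration of how such a metric is commonly computed, the sketch below scores each label independently (as in binary relevance) and then averages the per-label scores with equal weight. This is not the thesis's own evaluation code: the function names `fbeta` and `macro_fbeta` are hypothetical, the toy data is invented for the example, and returning 0.0 for a label with no true or predicted positives is an assumed convention.

```python
from typing import Sequence


def fbeta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from raw counts. beta < 1 weights precision more than recall.

    Assumption: a label with no true or predicted positives scores 0.0.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def macro_fbeta(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    beta: float = 0.5,
) -> float:
    """Unweighted mean of per-label F-beta over 0/1 indicator rows."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        # Count outcomes for label j only, ignoring all other labels.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        scores.append(fbeta(tp, fp, fn, beta))
    return sum(scores) / n_labels


# Toy example: three reports, two labels (invented data).
y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [1, 1]]
score = macro_fbeta(y_true, y_pred)
```

Because every label contributes equally to the mean, rare labels influence a macro-averaged score as much as frequent ones, which is one reason this kind of average is often preferred on imbalanced label sets.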
Item Type:Essay (Master)
Clients:KLM, Schiphol, The Netherlands
Faculty:EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:54 computer science
Programme:Computer Science MSc (60300)
Link to this item:https://purl.utwente.nl/essays/79286

 
