Tsunami warning procedures adopted by national tsunami warning centres largely rely on the classical approach of earthquake location, magnitude determination, and subsequent modelling of possible tsunami waves. Although this approach is grounded in established physical theories of earthquake and tsunami generation, its main shortcoming may be the minimum amount of seismic data required to estimate those physical parameters. At least four seismic stations are needed to locate an earthquake, and approximately 10 minutes of seismic waveform observation are needed to reliably estimate the magnitude of a large earthquake such as the Mw 9.2 earthquake behind the 2004 Indian Ocean tsunami. Taking into account the possible saturation of nearby seismic stations, the total time to issue a tsunami warning could exceed half an hour. In an attempt to reduce the time to tsunami alert, a new approach is proposed that classifies earthquakes as tsunamigenic or non-tsunamigenic using speaker recognition techniques. A Tsunamigenic Dataset (TGDS) was compiled to promote the development of machine learning and pattern recognition techniques for seismic trace analysis and, in particular, tsunamigenic event detection, and to compare them with existing seismological methods. The TGDS contains 2314 trace sets (13884 individual traces) and covers 227 offshore events (87 tsunamigenic and 140 non-tsunamigenic earthquakes with M ≥ 6) from January 2000 to December 2011 inclusive. Only temporally isolated events were included in the dataset to simplify and expedite the initial development of suitable feature representations.
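The abstract does not state the criterion used to decide that an event is temporally isolated. A minimal sketch, assuming a hypothetical fixed time window and a catalogue reduced to event origin times, keeps an event only if no other catalogue event falls within that window before or after it. The function name `temporally_isolated` and the window value are illustrative assumptions, not the TGDS selection rule.

```python
from datetime import datetime, timedelta
from typing import List


def temporally_isolated(event_times: List[datetime], window: timedelta) -> List[datetime]:
    """Return the events that have no other event within +/- window.

    Hypothetical criterion: the source only says that temporally isolated
    events were kept; the fixed window is an assumption for illustration.
    """
    times = sorted(event_times)
    isolated = []
    for i, t in enumerate(times):
        # After sorting, checking the nearest neighbours on each side is
        # enough: if they are farther than `window`, every other event is too.
        prev_ok = i == 0 or t - times[i - 1] > window
        next_ok = i == len(times) - 1 or times[i + 1] - t > window
        if prev_ok and next_ok:
            isolated.append(t)
    return isolated


# Usage with a hypothetical 1-hour window: the first two events are
# 30 minutes apart, so only the last two are kept.
t0 = datetime(2005, 1, 1)
catalogue = [t0, t0 + timedelta(minutes=30), t0 + timedelta(hours=5), t0 + timedelta(hours=10)]
print(temporally_isolated(catalogue, timedelta(hours=1)))
```

In practice the window would be checked against a full earthquake catalogue (including smaller events and aftershocks), not only the M ≥ 6 events retained in the dataset.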
A Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel was applied to spectral features derived from 400-second frames of 3-component, 1 Hz broadband seismometer data. Ten-fold cross-validation was used during training to select the classifier parameters. Voting was applied to the per-station classifier predictions to form an overall prediction for each event. The F1 score (the harmonic mean of precision and recall) was chosen to rate each classifier because of the imbalance between the number of events in the tsunamigenic and non-tsunamigenic classes. The described classifier achieved an F1 score of 0.923, with tsunamigenic classification precision and recall (sensitivity) of 0.928 and 0.919, respectively. The system requires a minimum of 3 stations with ~400 seconds of data each to make a prediction, and its accuracy improves as further stations and data become available.
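The event-level decision and the evaluation metric described above can be sketched in standard-library Python. This is a sketch under stated assumptions: per-station predictions are 0/1 labels (1 = tsunamigenic), the vote is a simple majority, and ties are resolved as tsunamigenic. The source specifies only that voting is used and that at least 3 stations are required. The names `vote_event`, `precision_recall_f1` and `f1_from` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

MIN_STATIONS = 3  # from the text: at least 3 stations are needed for a prediction


def vote_event(station_preds: Sequence[int]) -> Optional[int]:
    """Combine per-station labels (1 = tsunamigenic, 0 = non-tsunamigenic)
    into one event-level label by simple majority vote.

    Returns None when fewer than MIN_STATIONS stations have reported.
    Assumption: a tied vote is resolved as tsunamigenic (1), a conservative
    choice for warning purposes that is not taken from the source.
    """
    if len(station_preds) < MIN_STATIONS:
        return None
    counts = Counter(station_preds)
    return 1 if counts[1] >= counts[0] else 0


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (tsunamigenic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

As a check on the reported figures, `f1_from(0.928, 0.919)` evaluates to approximately 0.923, matching the F1 score quoted for the classifier. F1 is preferred over plain accuracy here because, with 140 non-tsunamigenic versus 87 tsunamigenic events, a classifier biased toward the majority class could score well on accuracy while missing tsunamigenic events.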