The goal of the Kaggle Seizure Detection Challenge was to identify when a patient is having an epileptic seizure, given 1-second clips of signals from electrodes implanted in the patient's head. My Python code is here.
After the 15-second mark, seizures are characterized by large fluctuations in the voltages measured at the implanted electrodes. However, part of the challenge was to identify the seizure in the first 15 seconds, before it really got going. As I examined some plots of the raw data, I thought, "Hmmm, I don't really know what the doctors are looking for when they identify the early portions of seizures." Instead of investing time identifying useful features in the raw data (and FFTs of the raw data), I decided to see if I could get any useful predictions using nolearn, a Python library that provides access to a neural network pretrained for image classification and that performed well in the Kaggle Cats vs. Dogs image classification competition. I figured that if nolearn could distinguish cats from dogs, perhaps it could distinguish ictal (seizure) data from interictal (non-seizure) data. In the end, applying nolearn to the electrode signals and selecting the most predictive electrodes was not as effective as other methods that made use of manually identified features in the raw data and its power spectrum.
My strategy involved two steps. First, I used nolearn to create a seizure-classification model for each electrode in each patient. Second, I combined the nolearn prediction probabilities for the most useful channels using naive Bayes. To identify the most useful channels, I applied naive Bayes to half of the training subset and evaluated the predictions on the remaining half; this let me weed out channels for which overfitting was the biggest problem. The final prediction was a naive Bayes classification of the probabilities predicted by nolearn for each of the most predictive channels. To validate the neural-network strategy, I split the training data into a validation subset (1 seizure or 20% of the training seizures, whichever was more, plus 20% of the non-seizure data) and a training subset. Since nolearn took a while to process the data, and because most patients had only 2-5 seizure events in the training data, cross-validation wasn't very feasible.
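The channel-selection and combination step might look something like the sketch below. The simulated per-channel probabilities, the number of channels, and the choice of `GaussianNB` as the naive Bayes variant are my own illustrative assumptions, not details from the original code.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical per-channel nolearn outputs: rows are clips, columns are
# electrode channels, values are P(ictal) predicted for that channel.
rng = np.random.default_rng(0)
n_clips = 200
y = rng.integers(0, 2, n_clips)  # 1 = ictal, 0 = interictal
# Simulate channels whose probabilities loosely track the true label.
probs = np.clip(y[:, None] * 0.4 + rng.random((n_clips, 8)) * 0.6, 0, 1)

# Split the training subset in half: fit per-channel naive Bayes on one
# half, score on the other half to weed out overfit channels.
half = n_clips // 2
scores = []
for ch in range(probs.shape[1]):
    gnb = GaussianNB().fit(probs[:half, ch:ch + 1], y[:half])
    scores.append(gnb.score(probs[half:, ch:ch + 1], y[half:]))

# Keep the most predictive channels and fit the final combiner on them.
best = np.argsort(scores)[-4:]
combiner = GaussianNB().fit(probs[:half][:, best], y[:half])
final_probs = combiner.predict_proba(probs[half:][:, best])[:, 1]
```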
Since the FFT of the electrode signals was just white noise at frequencies above 250 Hz, I down-sampled the data to 250 Hz (human patients) or 200 Hz (dogs). Next, I converted each 1-second clip to a 250 x 250 pixel image (humans) or a 200 x 200 pixel image (dogs) by first subtracting the clip mean and then rescaling the signal by 3.0 times the standard deviation of all the clips for that electrode in the training data. Larger images took longer to process. In addition, the 500 x 500 pixel training data set for Patient 6 did not fit into the available memory on my laptop. Processing all the training data at 250 x 250 and 200 x 200 pixel resolution was doable overnight.
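A minimal sketch of such a clip-to-image conversion follows. The function name `clip_to_image` and the exact pixel mapping (one trace pixel per sample column, overshoots clipped to the image edge) are my own assumptions about how the rendering could work, not the author's actual implementation.

```python
import numpy as np

def clip_to_image(clip, electrode_std, size=250):
    """Render a 1-second, `size`-sample clip as a size x size trace image.

    clip: 1-D array of `size` samples (already down-sampled).
    electrode_std: standard deviation over all clips for this electrode,
    used as the per-electrode scale factor described in the text.
    """
    # Center the clip, then rescale so +/- 3 standard deviations spans
    # the full vertical extent of the image.
    scaled = (clip - clip.mean()) / (3.0 * electrode_std)
    # Map [-1, 1] to pixel rows [size-1, 0]; clip any overshoots.
    rows = np.clip(((1.0 - scaled) / 2.0 * (size - 1)).astype(int),
                   0, size - 1)
    img = np.zeros((size, size), dtype=np.uint8)
    img[rows, np.arange(size)] = 255  # one trace pixel per sample column
    return img
```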
The best part of this Kaggle challenge was trying out some novel ways to harness the power of the pre-trained neural net in nolearn.
Instead of feeding images of signals from a single channel to the nolearn classifier, I glued images of all the signals from each electrode together and fed the resulting composite image to nolearn. A cursory literature search indicated that existing seizure-detection algorithms had some success using the FFT of the signal instead of just the raw signal, but I found that feeding the signal FFT to nolearn did not produce better predictions than the raw signal alone. Likewise, ensembling predictions from the FFT and the raw signal across channels was no better than using the raw signal by itself. Other Kagglers seem to have had more success using a small frequency band in their models. I also tried training one model for all the electrodes of one patient, as well as appending signals from other patients to the training data, thinking that doing so would force nolearn to make better use of the most relevant features of the signal images. Unfortunately, the predictions did not improve.
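The gluing step above can be sketched as a simple vertical stack of the per-electrode trace images; the helper name `glue_channels` and the 16-electrode example are hypothetical.

```python
import numpy as np

def glue_channels(channel_images):
    """Stack per-electrode trace images vertically into one composite
    image, so the classifier sees every channel of a clip at once.

    channel_images: list of equally sized 2-D uint8 arrays.
    """
    return np.vstack(channel_images)

# e.g. 16 electrodes of 250 x 250 pixels -> one 4000 x 250 composite
composite = glue_channels([np.zeros((250, 250), np.uint8)
                           for _ in range(16)])
```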