Winning BrainHack ATL 2019
BrainHack ATL
A few weeks ago, my advisor suggested that some labmates and I enter a hackathon downtown called “BrainHack”. As I’m considering getting more into machine learning with applications to medicine, a hackathon whose professed purpose is:
Bringing together students, researchers, non-academic industry professionals, engineers, data scientists, and neuroscientists from a variety of fields and expertise to explore the multiple applications of neuroimaging tools from a cross-disciplinary perspective.
That seemed like a pretty good fit for me. BrainHackATL (http://brainhackatl.org/) is actually part of a much larger international series of hackathons, just called BrainHack (http://www.brainhack.org/global2019/), which exists to get computer scientists and engineers working on cool brain-related problems.
The Tracks
At this, the first ever BrainHackATL, there were two primary challenge tracks. The first was about speeding up fMRI processing and cleaning, a daunting task that seemed to involve learning a bunch of new software packages and pipelines. The second, which my team opted for, was a quality-control classification problem in fMRI processing: use machine learning to identify poorly processed images.
While the problem seems simple enough (classify good vs. bad images), it turns out there was more to it. When a neuroscientist says “image”, they mean “volume”, or “image with 212 channels”. Rather than the standard grayscale brain images we were expecting, my team got a bunch of files in a format we’d never seen before (.nii.gz). Each one turned out to be a (182, 212, 182) volume, hardly something we could handle by simply applying the usual computer vision models. There were also only about 3600 samples, making for a rather small dataset given how many features each sample had.
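If you’ve never touched this format before, here’s a minimal sketch of how you might inspect one of these files with the nibabel library (the filename here is made up for illustration):

```python
import nibabel as nib  # standard Python library for NIfTI (.nii / .nii.gz) files

# Hypothetical filename, just for illustration.
img = nib.load("subject_scan.nii.gz")
data = img.get_fdata()  # voxel intensities as a NumPy array

print(data.shape)  # (182, 212, 182): a full 3D volume, not a 2D image
```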
Our Solution
Since a standard image classification network wouldn’t work, my team opted for an adaptation of the ResNet [1] architecture to 3D. Even with some significant downsampling of the input data, a smaller version of the network, and a couple of very powerful GPUs, each pass through all of the data took over an hour. Turns out giant 3D data is hard.
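For a rough idea of what “ResNet in 3D” means, here’s a minimal sketch of a basic 3D residual block in PyTorch. This isn’t our exact network, just the general pattern of swapping 2D convolutions for their 3D counterparts:

```python
import torch.nn as nn

class BasicBlock3D(nn.Module):
    """A basic residual block using 3D convolutions instead of 2D ones."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv3d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(out_channels)
        self.conv2 = nn.Conv3d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Project the skip connection when the shape changes so the addition is valid.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv3d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm3d(out_channels),
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # the residual connection
```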
While the network itself yielded decent performance in our limited cross-validation testing, the moderators of BrainHackATL informed us that, beyond accuracy, they also wanted some notion of “confidence.” If a scan was definitely good or bad, the decision should be close to 1 or 0, respectively. However, if a scan was somewhere in the middle (maybe good, maybe bad, hard to tell), the decision should be closer to 0.5.
To this end, we also added some scikit-learn [2] classifiers. Instead of classifying straight from the network’s output, we extracted embeddings from its last layer and passed those to random forest, AdaBoost, and k-nearest neighbors classifiers. The predictions from all of these models were averaged together, giving us more of a “confidence” score, though not exactly a rigorous one.
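Roughly, the averaging looked something like the sketch below. The function name and hyperparameters here are illustrative, not our actual settings:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

def ensemble_confidence(train_emb, train_labels, test_emb):
    """Average P(good) across several classifiers fit on network embeddings."""
    classifiers = [
        RandomForestClassifier(n_estimators=100),
        AdaBoostClassifier(n_estimators=100),
        KNeighborsClassifier(n_neighbors=5),
    ]
    probs = []
    for clf in classifiers:
        clf.fit(train_emb, train_labels)                 # train on last-layer embeddings
        probs.append(clf.predict_proba(test_emb)[:, 1])  # probability of "good"
    # Averaging pushes ambiguous scans toward 0.5 and clear-cut ones toward 0 or 1.
    return np.mean(probs, axis=0)
```

Averaging probabilities this way isn’t a calibrated confidence measure, but it behaves the way the moderators asked for: when the classifiers agree, the score lands near 0 or 1; when they disagree, it lands near 0.5.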
The Results
We won! Our team tried three different approaches to deep learning for fMRIs, and two of them didn’t work due to difficulties with giant data and slow training. Our third approach was judged the best quality-control solution for fMRIs at BrainHackATL 2019, and our team was declared the winner! Huge thanks to Zac and Manisha from my lab, and to Dr. Danny Comer from Emory, who graciously came for a couple of days to lend his expertise and help out with our sklearn stuff!
The code for our winning approach is online here: https://github.com/Core-Collab/Brainhack-ATL-2019 and the failed approaches might make their way online someday too.
Acknowledgements
Huge thanks to my advisor, Dr. Matthew Gombolay, and to Dr. Comer’s advisor, Dr. Sarah Milla, for giving us a few days to compete in this hackathon and get some experience with medical machine learning, and to the coordinators of BrainHackATL, as well as the TReNDS center, for hosting!
References
[1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[2] Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.