Imagine your brain activity displayed on a computer screen: multiple bustling tabs open, some sparked by a fleeting thought, others driven by prior behaviors or underlying traits.
Now, imagine a scientist trying to make sense of that activity.
To understand what’s happening, how the brain developed, or whether it shows any defining features associated with disorders or diseases, the scientist essentially has to reconstruct your thoughts: mapping and describing billions of neural connections, then comparing that map with images of other brains to identify distinguishing features.
Making sense of the billions of connections measured in thousands of brains across hundreds of research centers is a difficult problem, representative of what is generally called “big data.” To handle this kind of work, researchers use supercomputers, such as those housed at the Texas Advanced Computing Center, and cloud systems that can ingest, process, and compare petabytes of data from brain scans.
“It takes the world to understand the brain,” says Franco Pestilli, an associate professor of psychology at The University of Texas at Austin. “Collaboration across laboratories is the only way to allow collating large enough data sets to make sense of such a complex problem as brain function.”
Scientists must collaborate and share data in ways that allow easy organization and intelligible matching of subjects and data types across laboratories and research centers. Yet without establishing best practices, matching data modalities or brain connections among datasets can feel like finding a needle in a haystack.
“Historically, data storage procedures have been an overlooked aspect of the full scientific lifecycle, left either to the mercy of the local IT experts or to individuals in a lab,” says Pestilli. “As a result, data sharing and re-use have been difficult, limiting not just scientific reproducibility but also the ability of investigators to ask bigger, more important questions.”
Enter BIDS, the Brain Imaging Data Structure, a community-driven data-sharing standard that lowers the barriers to effective data sharing in neuroscience research. The standard, backed by the National Institutes of Health’s flagship neuroscience program, the BRAIN Initiative, promotes simplicity and clarity in saving, storing, and sharing neuroscience data by establishing file formats and naming conventions that are both human- and machine-readable.
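To give a flavor of those conventions, BIDS filenames string together key-value “entities” (subject, session, task, and so on) in a predictable order inside a predictable folder layout. The short Python sketch below is purely illustrative, not part of the official BIDS tooling; the function name and parameters are hypothetical, but the resulting path follows the pattern the standard uses for functional MRI data.

```python
# Illustrative sketch (not official BIDS tooling): assemble a BIDS-style
# path for a functional MRI run from key-value "entities".
from pathlib import Path

def bids_func_path(root, subject, session, task, run=None,
                   suffix="bold", ext=".nii.gz"):
    """Build a path like sub-01/ses-01/func/sub-01_ses-01_task-rest_bold.nii.gz."""
    entities = [f"sub-{subject}", f"ses-{session}", f"task-{task}"]
    if run is not None:
        entities.append(f"run-{run}")
    filename = "_".join(entities) + f"_{suffix}{ext}"
    return Path(root) / f"sub-{subject}" / f"ses-{session}" / "func" / filename

print(bids_func_path("my_dataset", "01", "01", "rest", run=1))
# my_dataset/sub-01/ses-01/func/sub-01_ses-01_task-rest_run-1_bold.nii.gz
```

Because every lab that follows the convention produces the same structure, both a human skimming a folder and a script crawling thousands of datasets can locate any subject’s resting-state scan without lab-specific documentation.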
In this sense, BIDS positions itself as the language of big data neuroscience, with the potential to unlock answers to big questions about brain diseases such as Alzheimer’s, or to inform human-like advances in artificial intelligence.
So far, more than 130 researchers have contributed to the development of the standard, and its practices have been adopted by neuroimaging labs around the world. But Pestilli, along with Russ Poldrack of Stanford University, Ted Satterthwaite of the University of Pennsylvania, and Ariel Rokem of the University of Washington, hopes to use a new NIH grant to apply the standard to pre-processed data, making it more shareable.
“Breaking the barriers to sharing data across individual neuroscience labs can allow inquiry at the scale of populations to capture individual variability and the diversity of human brain biology,” Pestilli says. “Data sharing can effectively impact the understanding of health and disease by promoting data use for goals beyond those that initially supported the data collection process.”