Sharing Data to Solve the Riddle of Autism
Cover Story

Worldwide, at least one in 100 people has autism spectrum disorder (ASD). In the United States, the Centers for Disease Control and Prevention put the number at one in 68 (1). Despite the high prevalence of autism and increased awareness of it in recent years, its underlying mechanisms remain unclear.

In an attempt to shed much-needed light on this condition, labs from around the world are contributing resting-state functional magnetic resonance imaging (R-fMRI) data, which provide information about brain activity, along with anatomical and phenotypic data, to a grassroots effort called the Autism Brain Imaging Data Exchange (ABIDE) (2). The idea is to provide an open-access resource so that scientists from many disciplines can delve into the data to understand why autism spectrum disorder occurs in one person and not the next, and to gain insight into the considerable diversity within the disorder itself.

The first iteration of the effort—now known as ABIDE I—began gathering previously collected data in December 2011, and in August 2012, it released 1,112 datasets from 17 international sites. The datasets were nearly evenly split between individuals who had ASD and control individuals who did not. Since then, a second phase of ABIDE—ABIDE II—received funding from the National Institute of Mental Health to carry on the work, and as of June 2017, it has aggregated more than 1,100 additional datasets, which were released to the broader scientific community last summer.

To learn more about the initiative, IEEE Pulse spoke with Adriana Di Martino, who cofounded and now coordinates ABIDE, and Michael Milham, who is a member of the ABIDE team tasked with aggregating and organizing imaging data. Di Martino, M.D., is an associate professor in the Department of Child and Adolescent Psychiatry at New York University (NYU) Langone Medical Center. Milham, M.D., Ph.D., is director of the Center for the Developing Brain at the non-profit Child Mind Institute in New York City, and director of the Center for Biomedical Imaging and Neuromodulation at the Nathan S. Kline Institute for Psychiatric Research, in Orangeburg, New York.

IEEE Pulse: Tell me a bit about what we know — or don’t know — about autism spectrum disorder.

Adriana Di Martino, M.D., cofounded and now coordinates the Autism Brain Imaging Data Exchange (ABIDE) and is an associate professor in the Department of Child and Adolescent Psychiatry at New York University (NYU) Langone Medical Center. Photo courtesy of NYU Langone Medical Center.

Di Martino: What we know today is that autism is a neurodevelopmental condition characterized by impairment in social communication and by a pattern of restricted, stereotyped interests or behaviors that can be seen very early on. During play, for instance, a young child with autism may spend more time lining up toys in a particular order than engaging in pretend play. Later on, in individuals who are verbal and have typical intelligence, this might appear as an intense interest in specific topics, such as trains or maps, to the point that these interests can interfere with their learning and their ability to interact with others.

In addition, autism spectrum disorder affects the ability to have typical social interactions, including the ability to maintain typical friendships. This doesn’t mean that individuals with autism will not develop friendships at all; autism just makes it harder to navigate the social world. We also know that the earlier the diagnosis is made, the more impactful the interventions can be.

Michael Milham, M.D., Ph.D., is a member of the ABIDE team, director of the Center for the Developing Brain at the Child Mind Institute in New York City, and director of the Center for Biomedical Imaging and Neuromodulation at the Nathan S. Kline Institute for Psychiatric Research in Orangeburg, New York. Photo courtesy of the Child Mind Institute.

To date, however, we do not know the exact etiology or causes of autism spectrum disorder, and while we know that it is a neurodevelopmental condition, we don’t know the specific mechanisms involved.

Milham: In terms of mechanisms, there is general agreement that patterns of brain connectivity in autism, whether you’re looking structurally or functionally, differ from those in people without autism. The question now is, what is the nature of these differences? As you can imagine, the brain is highly complex and heterogeneous, so maybe one network or set of networks shows increased connectivity in autism, while another shows decreased connectivity.

IEEE Pulse: How does ABIDE fit into the quest for a greater understanding of autism spectrum disorder?

Di Martino: Although we don’t know the underlying mechanisms of autism, evidence from multiple disciplines, including genetics, pathology, and neuroimaging, suggests that the connectivity of brain circuits, which we call the connectome, is involved. The awareness that autism may be a dysconnection syndrome was one of the main motivations for the initial grassroots initiative. The other was the realization that very large-scale databanks and collaborations had proved feasible and successful for studying complex disorders, as shown in genetics and in earlier data-sharing initiatives, including one to investigate attention deficit hyperactivity disorder (ADHD) (3).

For autism, if you think about the millions of connections in the brain, combined with the remarkable heterogeneity of the disorder’s presentation, and likely of its underlying mechanisms, we face a formidable challenge. Before we started ABIDE, neuroimaging data in particular were collected by single labs, generally yielding studies with relatively small samples. So while each lab may have had great ideas on how to address important questions about the brain connectome in autism, the sample size each lab could afford was limited by the cost of these studies.

So we started with a group of colleagues who had already participated in data-sharing initiatives and began spreading the word and extending the invitation to other colleagues. We were pleased to find that they were ready to provide access to their data to accelerate the pace of discovery about the brain connectome in autism.

IEEE Pulse: What led up to ABIDE?

Milham: ABIDE is essentially part of a larger set of data-sharing initiatives, including the 1000 Functional Connectomes Project (4), which started back in 2009. More than 30 sites from around the world contributed R-fMRI datasets to that project. The idea was to collect preexisting datasets, aggregate them, and share them so researchers could perform functional connectivity analyses and various sorts of morphometry analyses. That was the beginning.

A year later, we founded the International Neuroimaging Data-Sharing Initiative, or INDI (5), which was basically the next phase of the 1000 Functional Connectomes Project. The idea there was to start shifting the community toward more clinically enriched datasets, meaning datasets that had more comprehensive phenotypic and psychiatric characterization and that also included clinical populations.

After that came the ADHD-200 data-sharing project Dr. Di Martino mentioned, which launched in 2011, and then ABIDE came along to focus on autism data. For ABIDE, Dr. Di Martino, Stewart Mostofsky (director of the Laboratory for Neurocognitive and Imaging Research, Kennedy Krieger Institute in Baltimore), and I worked to pull together previously collected autism imaging data, aggregate it, share it openly, and place minimal restrictions on users. For instance, people don’t have to register their analyses with us, but we ask them to cite the source of the data appropriately.

IEEE Pulse: With the success of ABIDE I in providing more than 1,100 datasets, why was it important to continue with the second iteration of ABIDE?

Milham: There are multiple answers to that. First, at the most basic level, you want both discovery datasets and replication datasets, so you need more and more data. Second, because these sites weren’t working together when they generated the data, there’s a lot of variation in the protocols used and in the age groups represented (some data came from adults, some from children), so again, more data help account for these variations. Beyond that, we know that autism, like most psychiatric disorders, is quite heterogeneous and has a range of presentations. So overall, the major driving force for data-sharing initiatives, whether ABIDE or something else, is to create large-scale datasets that capture that heterogeneity as well as the various sources of confounding artifacts, so that we can arrive at more meaningful scientific findings.

IEEE Pulse: What are you learning from ABIDE I and II in terms of data collection and analysis?

Di Martino: We learned pretty quickly from ABIDE I that even though it was an unprecedentedly large sample, the data were still not sufficient to parse the heterogeneity in autism spectrum disorder, and even with ABIDE II, we may find that a still larger dataset would help define more homogeneous subgroups within the disorder. As an obvious example, there are far more data from males than from females. While autism is more frequent in males, by a ratio of about 3–4:1, it is still extremely important to learn about the brain connectome in females so that we can understand the mechanisms underlying this sex difference. There are other things that we cannot see today but may be able to see with more data.

Overall, I have to say that ABIDE I and ABIDE II have been successful in many ways, not only because we showed that this kind of effort was feasible, but also because, as of June 2017, more than 77 peer-reviewed papers had already used ABIDE data. In addition, the initiative has opened up the data not only to experts in autism but also to applied mathematicians, statisticians, and others who generally do not have easy access to these data.

Milham: One of the interesting findings actually came from ABIDE collaborators Alexandre Abraham and Gaël Varoquaux (of INRIA Saclay-Île-de-France, Saclay, France), who have been looking into the challenges that arise from datasets collected using different protocols. Their group developed predictive classifiers for identifying the presence or absence of autism in an individual that were robust to the site at which the data were collected (6).

IEEE Pulse: In other words, even though these multiple sites collected the data using different protocols, the group still managed to tease out enough of the connectivity pattern to identify which subjects had autism and which were controls.

Milham: Yes. The prediction accuracy is still somewhere in the mid- to high-60 percent range, so there’s obviously more work to be done, but they showed it was possible. They also found that the more data were included, the better the classifiers performed. Many thought more data would mean more noise, because the datasets come from different sites, but instead, when the classifiers were presented with data from a site they had never seen before, they were able to extract enough signal to find commonalities and make a prediction.
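The evaluation Milham describes, scoring a classifier only on a site it never saw during training, can be set up as leave-one-site-out cross-validation. The Python sketch below shows only that splitting step, under the assumption that each subject carries a single site label; it is a hypothetical illustration, not the pipeline used by Abraham et al. (6), and the function name `leave_one_site_out` and the site labels are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_site_out(
    sites: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_site, train_indices, test_indices), one fold per site.

    All subjects scanned at the held-out site go into the test set, so a
    classifier trained on the remaining indices is always scored on a site
    it never saw during training.
    """
    # Group subject indices by the (hypothetical) site label of each scan.
    by_site: Dict[str, List[int]] = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)

    # One fold per site, in sorted order so the output is deterministic.
    for held_out in sorted(by_site):
        test = by_site[held_out]
        train = [i for i, s in enumerate(sites) if s != held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical site labels for five subjects.
    subject_sites = ["A", "A", "B", "C", "B"]
    for site, train, test in leave_one_site_out(subject_sites):
        print(f"hold out site {site}: train={train} test={test}")
```

With the five hypothetical subjects above, the first fold holds out site `A` (test indices 0 and 1) and trains on indices 2, 3, and 4. In practice, scikit-learn's `LeaveOneGroupOut` splitter performs the same kind of grouped split when the site labels are passed as `groups`.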

IEEE Pulse: Have ABIDE I and II contributed to our understanding of autism yet?

Milham: There has been a range of findings. One thing researchers are working to reconcile is the seemingly conflicting evidence about whether the brain is more or less connected in autism. Looking at autism more broadly in the initial ABIDE I dataset, we in the ABIDE consortium published a paper showing that in autism, cortical-cortical connectivity is decreased, whereas subcortical-cortical connectivity is increased (7).

Di Martino: A couple of other leads are emerging, suggesting that along with the usual suspects, such as cortical circuits involved in social interaction, circuits involved in sensory and motor processes are also affected in autism.

IEEE Pulse: What do you ultimately hope that ABIDE I and II will accomplish?

Di Martino: One area of great interest is identifying and clarifying mechanisms, particularly those underlying subtypes of autism. If this heterogeneity has a biological underpinning, perhaps ABIDE can contribute some of the needed insight.

Milham: My hope for ABIDE and the broader data-sharing initiatives is that they will advance our understanding of the phenomenology associated with autism and of the various brain differences involved, and that they will also foster more of a neurodevelopmental perspective by including increasingly young children.

My other hope is that ABIDE will push the scientific community to develop the innovative methodologies required for more sophisticated imaging-data analysis and predictive tools. As we develop predictive tools, we’ll be able not only to begin tackling the challenge of differentiating individuals into subtypes based on patterns of brain connectivity but also to start making predictions about prognosis and risk. That’s where we can start having an impact on early identification and intervention.


  1. Centers for Disease Control and Prevention, “Autism Spectrum Disorder (ASD): Data & Statistics.”
  2. ABIDE, “Welcome to the Autism Brain Imaging Data Exchange!”
  3. ADHD-200 Consortium. “The ADHD-200 Consortium: A model to advance the translational potential of neuroimaging in clinical neuroscience,” Front Syst Neurosci, vol. 6 (September 2012): 62.
  4. “1000 Functional Connectomes Project.”
  5. M. Mennes, B. B. Biswal, F. X. Castellanos, and M. P. Milham. “Making data sharing work: The FCP/INDI experience,” Neuroimage, vol. 82 (November 15, 2013): 683-691.
  6. A. Abraham, M. Milham, A. Di Martino, R. C. Craddock, D. Samaras, B. Thirion, and G. Varoquaux. “Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example,” Neuroimage, vol. 147 (February 15, 2017): 736-745.
  7. A. Di Martino, C. G. Yan, Q. Li, E. Denio, F. X. Castellanos, K. Alaerts, J. S. Anderson, M. Assaf, S. Y. Bookheimer, M. Dapretto, B. Deen, S. Delmonte, I. Dinstein, B. Ertl-Wagner, D. A. Fair, L. Gallagher, D. P. Kennedy, C. L. Keown, C. Keysers, J. E. Lainhart, C. Lord, B. Luna, V. Menon, N. J. Minshew, C. S. Monk, S. Mueller, R. A. Müller, M. B. Nebel, J. T. Nigg, K. O’Hearn, K. A. Pelphrey, S. J. Peltier, J. D. Rudie, S. Sunaert, M. Thioux, J. M. Tyszka, L. Q. Uddin, J. S. Verhoeven, N. Wenderoth, J. L. Wiggins, S. H. Mostofsky, and M. P. Milham. “The autism brain imaging data exchange: Towards a large-scale evaluation of the intrinsic brain architecture in autism,” Molecular Psychiatry, vol. 19, no. 6 (June 2014): 659-667.