Neuroinformatics
Sridevi Polavaram and Giorgio Ascoli (2015), Scholarpedia, 10(11):1312. doi:10.4249/scholarpedia.1312. Revision #152270.
Neuroinformatics is a highly interdisciplinary field employing methods and approaches from computer science, information systems, and integrative biology to identify, analyze, digest, simulate, and compute neuroscience data (Grant, 2003; Eisenstein, 2009; Walter et al., 2010; Akil et al., 2011; Burns et al., 2014). Although in the broadest sense neuroinformatics encompasses neuromorphic engineering and computational neuroscience, core areas of emphasis in the field pertain to software tools, ontologies, and databases related to integration, sharing, transformation, visualization, and quantification of neuroscience research. Despite pioneering developments starting from the 1990-2000 "Decade of the Human Brain" (Shepherd et al., 1998), the establishment of an informatics infrastructure for neuroscience still requires substantial progress. The tremendous heterogeneity of neuroscience data, reflecting multimodal and multi-scale investigations ultimately dictated by brain complexity and inter-species diversity, has slowed the emergence of an information management culture and of data sharing policies (Wertheim & Sidman, 1991; Kötter, 2001; Toga, 2002; Smith et al., 2004; Campbell, 2009). Nevertheless, these same challenges, namely scientific complexity and data heterogeneity, also make informatics essential for the eventual success of neuroscience.
Scope of this article
We surveyed the advances of the last five years to assess how the field matured to its current state of the art and to evaluate specific opportunities and growing trends. Careful characterization of a representative sample of 337 neuroinformatics resources along several dimensions enabled the distillation of summary analytics. Metadata annotation included, among others, animal species, resource type (software, atlas, database, etc.), scale (from molecules to whole brain), application (e.g. visualization, analysis, simulation), and measurement type (anatomical, functional, biochemical, and more). One of the most active areas of research and software development revealed by this overview concerns the visualization and analysis of human neuroanatomical data at the whole brain scale. The complete curated list of neuroinformatics resources that accompanies this article, available online at http://hdl.handle.net/1920/9150, constitutes a valuable tool in its own right for free browsing, search, and exploration. The curated data sample is also available for download in .csv format.
Resource identification
We identified a representative sample of 337 tools by querying four independent neuroinformatics sources (Figure 1): (1) the Neuroscience Information Framework (NIF) and its community-curated lexicon NeuroLex, (2) the Springer journal Neuroinformatics, (3) the Frontiers in Neuroscience journals, and (4) Scholarpedia’s expert-curated collection of encyclopedia entries. The search strategy was customized separately for each source by trial and error. Traditional journal publications were interrogated by full-text searches for the combined occurrence of the keywords "software," "tools," "resources," and "http." This query produced 404 and 401 hits from Neuroinformatics and Frontiers, respectively, over the five-year inclusive span 2010-2014. For Scholarpedia, a search restricted to a subset of the keywords, "software" and "tools," returned 36 articles. After adding the entire list of 262 resources federated with NIF to this initial pool, we removed out-of-scope entries (news items, blog posts, podcasts, people, grants, and non-neuroscience resources) as well as broken or clearly obsolete pointers. Moreover, when both parent and child links were present, we only included the most relevant ones. Lastly, we removed duplicates, resulting in a final resource count of 337 (Figure 1).
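The filtering and de-duplication steps described above can be pictured with a minimal Python sketch. This is not the curation code used for the article: the record fields (`kind`, `reachable`, `url`), the `OUT_OF_SCOPE` labels, and the URL normalization rule are all hypothetical, and the manual choice between parent and child links is not modeled.

```python
from urllib.parse import urlsplit

# Hypothetical labels for the entry types the article treats as out of scope.
OUT_OF_SCOPE = {"news", "blog", "podcast", "person", "grant", "non-neuroscience"}


def normalize_url(url):
    """Reduce a URL to a comparison key: lowercase host without 'www.', plus path."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def curate(candidates):
    """Drop out-of-scope and unreachable entries, then keep one entry per URL key."""
    seen = set()
    kept = []
    for entry in candidates:
        if entry["kind"] in OUT_OF_SCOPE or not entry["reachable"]:
            continue
        key = normalize_url(entry["url"])
        if key in seen:
            continue  # overlap between sources: keep the first occurrence only
        seen.add(key)
        kept.append(entry)
    return kept


# Toy pool mixing entries from different sources (illustrative values only).
pool = [
    {"url": "http://www.nitrc.org/", "kind": "resource", "reachable": True},
    {"url": "https://nitrc.org", "kind": "resource", "reachable": True},
    {"url": "http://example.org/blog/post", "kind": "blog", "reachable": True},
    {"url": "http://example.org/old-tool", "kind": "resource", "reachable": False},
]
curated = curate(pool)
```

In this toy pool only the first entry survives: the second is an overlap, the third is out of scope, and the fourth is a broken pointer.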
Resource categorization
We annotated each of the 337 resources according to its characteristics across a set of classification dimensions or 'features' (Figure 2): Scale, Species, Resource type, Measurement, Application, Area of study, User support, and Resource availability (the first six of these dimensions are included in Table 1 below). For every feature we chose a list of 'elements' suitable to describe the resource, clarifying their intended meaning with a series of self-explanatory keyword examples (Figure 2). For instance, the elements (and related representative examples) of Scale are: whole nervous system (non-invasive brain imaging with magnetic resonance), regional (cortical areas, specific fiber pathways), cellular (axonal trees, patch clamp), and macromolecular (gene sequences, protein expression). We informally defined the eight Features and 38 Elements with simple textual descriptors, which are hyperlinked to the corresponding terms in Table 1 and in the complete online table. This organizational layout, however, is emphatically not meant as a formal schema (let alone an ontology), but rather as a simple and practical attempt to group resources in a manner amenable to human comprehension. As such, we acknowledge that our choices are arbitrary, and several alternatives would be equally reasonable. Moreover, features are not orthogonal to or independent of each other. When selecting more than one element as adequate descriptors of a resource, we also always indicated one as primary (marked with * in Table 1 and in the complete online table) to facilitate subsequent analytics.
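For readers who want to process such annotations programmatically, the sketch below shows one way to turn a feature cell into a list of elements plus its primary element. Only the trailing `*` convention comes from the article; the semicolon separator, the function name, and the returned dictionary layout are assumptions, and the downloadable .csv file may be organized differently.

```python
def parse_feature_cell(cell, sep=";"):
    """Split a feature cell into its elements and identify the one marked '*'.

    The separator is an assumed convention; only the '*' primary marker
    is described in the article.
    """
    elements = []
    primary = None
    for raw in cell.split(sep):
        name = raw.strip()
        if not name:
            continue  # ignore empty fragments such as a trailing separator
        if name.endswith("*"):
            name = name[:-1].strip()
            if primary is not None:
                raise ValueError("more than one primary element in: " + cell)
            primary = name
        elements.append(name)
    return {"elements": elements, "primary": primary}


# Hypothetical Scale cell for a resource spanning two scales.
scale = parse_feature_cell("whole nervous system*; regional")
```

Rejecting a second `*` enforces the rule that exactly one element is indicated as primary when several are selected.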
Resource examples
We selected a small subset (<10%) of the identified resources to illustrate the collated information with a few examples (Table 1). Each of the 31 resources listed below and all entries in the comprehensive collection at http://hdl.handle.net/1920/9150 are hyperlinked to the respective home page. In addition to the six features reported in Table 1, the complete online table includes for each of the 337 resources a summary description, literature references or unique resource identifiers (PMID, PMCID, NIF ID), public availability (freeware, open source, etc.), support provided to users (e.g. manual, mailing lists, frequently asked questions), funding agencies, and institutional affiliations. The PubMed or PubMed Central identifiers refer to the recent article(s) citing the resource that were returned by the literature search described above under "Resource identification," and not necessarily to the original publications of the resource. The user support feature also includes a 'citation' element indicating that the tool has been used by at least one external party in peer-reviewed publications beyond its intended in-house purposes (a sign of resource utility and maturity).
A major neuroinformatics initiative pertaining to whole-brain non-invasive human imaging (Table 1) is the Human Connectome Project. High-resolution scanning of over 1200 healthy adults using cutting-edge methods such as diffusion and resting-state magnetic resonance is aimed at comprehensive mapping of neural circuitry, its relationship to behavior, and the contributions of genetic and environmental factors to individual differences. This "big data" project, including the development and free distribution of analysis and visualization tools, was supported by funding from the National Institutes of Health Neuroscience Blueprint to a selected consortium of major universities. A complementary grassroots effort of similar scope, included in the complete online table, is the 1000 Functional Connectomes Project, entailing the aggregation and full unrestricted public release (via www.nitrc.org) of over 1200 resting-state fMRI datasets collected from 33 sites.
Exploiting the progressive acceleration in computing power, spiking neural network simulations are continuously improving in terms of both scale and biological realism. A notable tool in this regard is CARLSim (Table 1), an efficient open source simulator written in C/C++ (Beyeler et al., 2014) that allows execution on both generic central processing units (CPUs) and standard off-the-shelf graphics processing units (GPUs). In particular, CARLSim implements the formalism of Izhikevich models, which generate realistically complex neuronal dynamics from a simple system of equations suitable for fast numerical integration (Izhikevich, 2006). The complete online table includes several other resources relevant for network simulations, including NEST (Gewaltig & Diesmann, 2007), Brian (Goodman, 2010), and Open Source Brain, a resource for sharing and collaboratively developing computational models that encourages the use of open standards to ensure transparency, modularity, accessibility, and cross-simulator portability.
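To illustrate why this formalism is inexpensive to integrate, the following Python sketch steps a single Izhikevich neuron with the forward Euler method. It is not CARLSim code and does not use its API. The function name, the time step, and the "regular spiking" parameter values (a = 0.02, b = 0.2, c = -65, d = 8) are assumptions taken from the commonly published form of the model, not from this article.

```python
def simulate_izhikevich(current, a=0.02, b=0.2, c=-65.0, d=8.0,
                        duration_ms=1000.0, dt=0.5):
    """Forward-Euler integration of one Izhikevich neuron.

    Model (commonly published form):
        dv/dt = 0.04*v**2 + 5*v + 140 - u + I
        du/dt = a*(b*v - u)
        if v >= 30 mV: v <- c, u <- u + d
    Returns the spike times in milliseconds.
    """
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable
    spike_times = []
    steps = int(duration_ms / dt)
    for step in range(steps):
        if v >= 30.0:                  # spike cutoff reached: record and reset
            spike_times.append(step * dt)
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
    return spike_times


# Constant input current (arbitrary illustrative values).
spikes_driven = simulate_izhikevich(current=10.0)
spikes_rest = simulate_izhikevich(current=0.0)
```

Each time step needs only a handful of multiplications and additions per neuron, which is what makes the model attractive for large networks on CPUs or GPUs. A production simulator would typically use a smaller or adaptive step than this sketch does.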
One of the most prominent neuroinformatics resources at the molecular and systems level is the Allen Brain Atlas (Lein et al., 2007), an annotated brain-wide, genome-wide gene expression map of the adult mouse obtained from in situ hybridization (Table 1). Each expression pattern is registered with a high-resolution 3D anatomical delineation of more than 600 nervous system areas from Nissl stain. The same portal also provides free access to similar data for the developing mouse and the human brain (including microarray experiments), together with regional connectivity maps from tract-tracing and a powerful informatics platform for query and analysis. A related resource in the complete online table is GENSAT (Heintz, 2004), a publicly available gene expression atlas of the developing and adult central nervous system in the mouse, using both in situ hybridization and transgenic techniques.
At the level of dendritic and axonal arbors, the widely used repository of digital reconstructions of neuronal morphology NeuroMorpho.Org (Table 1) is regarded as a success story in neuroscience data sharing (Parekh & Ascoli, 2014). This archive contains more than 30,000 neurons from two dozen species and hundreds of brain regions and cell types, contributed by over one hundred laboratories worldwide using a broad range of histological, visualization, imaging, and tracing techniques. A complementary resource included in the complete online table is ModelDB, a database for storing and efficiently retrieving computational neuroscience models, mostly in the popular NEURON simulation environment (Carnevale, 2007) using complex morphologies of single neurons such as those available at NeuroMorpho.Org.
Emerging trends
In order to gauge the prominence of distinct elements within each feature, we quantified the distribution of available resources across the main annotation dimensions. For every feature of Table 1 (scale, species, resource type, measurement, application, and area of study), we counted all resources in the complete online table that listed a given element as well as the subset of those that specifically indicated the element as primary. When all elements of a feature were marked, we added the resource to the "general" count for each of the elements but to the "primary" count for none of them. Moreover, we computed a weighted count in which the elements listed for a resource receive equal fractions that sum to one per resource. For example, if a resource listed primates, rodents, and insects as species, each of those elements would be counted as one-third. This normalization allows a fair assessment of element proportion in every feature across resources (Figure 3).
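The three counts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code behind Figure 3: the function name and the `(elements, primary)` tuple layout are hypothetical, and the sketch assumes that a resource marked with all elements still contributes equal fractions to the weighted count, which the text does not state explicitly.

```python
from collections import Counter
from fractions import Fraction


def tally_feature(annotations, all_elements):
    """Return (general, primary, weighted) counts for one feature.

    annotations: list of (elements, primary) pairs, one per resource, where
    `elements` is a non-empty list and `primary` is an element name or None.
    """
    general, primary_count, weighted = Counter(), Counter(), Counter()
    universe = set(all_elements)
    for elements, primary in annotations:
        share = Fraction(1, len(elements))   # equal shares summing to one
        for element in elements:
            general[element] += 1
            weighted[element] += share
        # A resource marked with every element adds to no primary count.
        if set(elements) != universe and primary is not None:
            primary_count[primary] += 1
    return general, primary_count, weighted


SPECIES = ["primates", "rodents", "other mammals", "insects", "others"]
toy = [
    (["primates"], "primates"),
    (["primates", "rodents", "insects"], "rodents"),
    (SPECIES, None),                         # all elements marked
]
general, primary, weighted = tally_feature(toy, SPECIES)
```

Exact fractions are used so that the weighted counts over all elements sum to the number of resources without rounding error. In the toy data, rodents receive 1/3 from the second resource and 1/5 from the third.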
With respect to scale, whole brain is the most represented element both in the general and primary counts (Figure 3a, bar charts) and in the weighted proportion (Figure 3a, pie charts), followed by regional, cellular, and molecular, in order from macroscopic to microscopic.
Primates (largely humans but also including monkeys) are clearly the dominant species, reflecting the prominence of whole brain scale (Figure 3b). More than twice as many resources have primates as the primary species (154) as all other elements in this feature combined (66), including rodents (mostly rats and mice), other mammals, insects (primarily flies), and other species (typically worm, fish, and birds). For humans the numbers are driven by non-invasive imaging in the areas of cognition and behavioral neuroscience, neuroanatomy, and clinical and developmental neuroscience. Resources dedicated to the phenotype-genotype relationship are particularly abundant for mice (e.g., Monarch Initiative and Mouse phenotype database) and fruit fly (flybase.org, flycircuit.tw, Bloomington Drosophila stock center). The representation of other species is fairly sparse and encompasses comparative neuroanatomy (ABCD, Braininfo), antibody databases (NeuroMab), and live imaging (ZFIN).
By far the most numerous resource type is software tools (Figure 3c), as exemplified by the popular open source program ImageJ. Next in representation are ontology and data management services (for example BioPortal) and databases (Alzheimer’s Disease Neuroimaging Initiative or ADNI). Ontologies and data management resources focus on data formats, standardization, terminologies, and machine readability, and thus often overlap with databases. In fact, several resources were annotated with all three of these most common type elements, as exemplified by Chemical Entities of Biological Interest (ChEBI), flybase.org, and Mouse Genome Informatics. The remaining resource types are much more sparsely represented. Topical portals are typically the least structured type, consisting of collaborative initiatives, open challenges, and other thematic resources. Instrumentation emphasizes hardware machinery for data acquisition, analysis, or lab operations. Lastly, atlases consist of standardized reference templates (e.g. Talairach coordinate space) with registered 2D/3D spatial information, such as gene expression, tract-tracing, or magnetic resonance imaging.
In terms of measurement, the most abundant resources pertain to the anatomical dimension (Figure 3d), followed by functional, physiological, and biochemical. The first two elements are again largely linked to human non-invasive whole brain imaging. Anatomical resources mostly refer to structural and diffusion magnetic resonance, while functional resources, as the name suggests, mostly refer to fMRI. Resources categorized as anatomical, however, also include a variety of other tools focusing on shape, size, location, and connectivity information (e.g., Automated reconstruction of complex curvilinear structures, Knife-Edge Scanning Microscopy Brain Atlas, SumDB, and NeuGen) across all scales from microscopic to macroscopic. Similarly, 'functional' resources (e.g., DICOM, resting-state fMRI, Protégé ontology editor) also include tools with an emphasis on mapping neural dynamics to brain states, including computational approaches (neural networks, information theoretical measures, non-linear time series analysis) as well as experimental ones (behavioral measures of attention, speech recognition, response monitoring, and more). The last two measurement elements together account for less than a quarter of resources: physiological, quantifying spatiotemporal data with electrical or optical recordings (or related model simulations), and biochemical, most commonly high-throughput analyses, such as gene expression profiling using microarray technologies (e.g., BrainSpan, ClinVar).
Relative to the application domain, visualization and data analysis are almost omnipresent in scientific studies, claiming the lion’s share of resources (Figure 3e). Visualization mainly refers to rendering of experimental data from e.g. microscopy, neuroimaging, physiology, and virtual reality. Data analysis plays an increasingly important role in large-scale automated analyses. These two applications are closely associated in resources for result quantification with statistical plots or graphing of parameter relationships (e.g., IGOR Pro, PRoNTo, MIPAV). Tools for annotation, such as electronic lab notebooks for metadata entry, are crucial to frame the data in proper context for reuse (e.g., CogPO Wiki, clinicaltrials.gov, Multimodal MRI reproducibility resource). Computational modeling resources include tools for simulating morphological growth, membrane biophysics, electrophysiological activity, and network dynamics (e.g., Scilab, Cellular Dynamic Simulator, neuRosim). Lastly, neurotechnology includes brain-machine interfaces, neuromorphic engineering, and other similar approaches and devices (e.g., Easycap, The Kilobot Project, BCI2000). The practical utility of available resources is ultimately determined by the user's creativity. In the domain of neuronal morphology, for example, diverse applications of L-Measure, Trees toolbox, and Neuromantic to visualization, analysis, and modeling have yielded many advances in neuronal characterization, as evidenced by their numerous citations.
Last but not least, resources were labeled across six broadly distributed areas of study (Figure 3f). The sheer number of neuroinformatics entries highlights the multidisciplinary nature of neuroscience data handling, reflected in a diverse umbrella of resources spanning from automated neuronal tracing to functional classification of genome sequencing and open communication standards in knowledge creation and sharing. Cognitive and behavioral neuroscience also includes a wide range of resources related to neuropsychology, learning, and movement tasks among many others (e.g., child language talkbank, brain machine interface platform, Independent component analysis on multimodal imaging data). Computational neuroscience provides modeling approaches, environments, frameworks, tools, and formats to facilitate neuronal and network simulations (biological models markup language, Self-Organizing maps, Genesis platform, etc.). Neuroanatomy is a central sub-field in the structure-function mapping of the nervous system, and is represented by numerous resources including electronic archives (Cell centered database), atlases (Virtual Honeybee Brain), tools for morphological conversion and display (Neuronland), and knowledge bases (temporal-lobe). Resources in the clinical and developmental area (many funded by NIH) include translational neuroscience research, surgical aid and disease-related tools, aging, and medical data analysis. Lastly, the cellular, molecular, and systems neuroscience element is a catch-all category encompassing resources as diverse as those for human brain immunohistochemistry, plasmid sharing, and cross-scale interactions in information processing.
Concluding remarks
This article provides a broad snapshot of recent neuroinformatics progress based on a curated sample of resources identified in the relevant scientific literature of the last five years. The comprehensiveness of this collection is necessarily constrained by our search and selection approach, and thus is not meant to reflect the actual number of all available digital resources in neuroscience. Instead, the purpose of this overview is to describe a representative variety of neuroinformatics resources across multiple features, in order to appreciate the broad diversity of technologies and methodologies used in the field.
One of the main observations that results from our analysis regards the prominence of human non-invasive whole brain imaging. Major neuroinformatics developments, however, are also transforming other domains of neuroscience, as exemplified by the comprehensive genomic profiles of the Allen Brain Atlas (Ng et al., 2012; Shen et al., 2012). With appropriate resources, even individual laboratories can successfully embark on "big data" initiatives, such as the tracing of a substantial proportion of the fly nervous system at single cell resolution (Chiang et al., 2011). At the same time, the impact and potential of many smaller but valuable resources should not be discounted, as they are indispensable to sustain the variety of research approaches that the complexity of the brain demands. Such grassroots distribution underscores the importance of community standards for data formatting, annotating, reporting, and sharing. We predict that the growing availability of data, metadata, and informatics tools will progressively increase the scientific impact of computational modeling and biologically realistic large-scale simulations.
Acknowledgements
We thank Mr. Sean Mackesey from University of California, Berkeley, CA for sharing the database ingestion code. We also thank Ms. Wendy Mann and Ms. Joanna Lee from George Mason University Libraries, Data Services Group and the Mason Archival Repository service for hosting the data on the web. Last but not least, we thank Dr. Diek Wheeler and Mr. David Hamilton for reviewing an earlier version of this article.
References
- Akil, H; Martone, M E and Van Essen, D C (2011). Challenges and opportunities in mining neuroscience data. Science 331(6018): 708-712. doi:10.1126/science.1199305.
- Beyeler, M; Richert, M; Dutt, N D and Krichmar, J L (2014). Efficient spiking neural network model of pattern motion selectivity in visual cortex. Neuroinformatics 12(3): 435-454. doi:10.1007/s12021-014-9220-y.
- Burns, R; Vogelstein, J T and Szalay, A S (2014). From Cosmos to Connectomes: The Evolution of Data-Intensive Science. Neuron 83(6): 1249-1252. doi:10.1016/j.neuron.2014.08.045.
- Campbell, P (2009). Data's shameful neglect. Nature 461(7261): 145. doi:10.1038/461145a.
- Chiang, A-S et al. (2011). Three-Dimensional Reconstruction of Brain-wide Wiring Networks in Drosophila at Single-Cell Resolution. Current Biology 21(1): 1-11. doi:10.1016/j.cub.2010.11.056.
- Eisenstein, M (2009). Neural circuits: putting neurons on the map. Nature 461(7267): 1149-1152. doi:10.1038/4611149a.
- Goodman, D F (2010). Code generation: a strategy for neural network simulators. Neuroinformatics 8(3): 183-196. doi:10.1007/s12021-010-9082-x.
- Grant, S G (2003). Systems biology in neuroscience: bridging genes to cognition. Current Opinion in Neurobiology 13(5): 577-582. doi:10.1016/j.conb.2003.09.016.
- Heintz, N (2004). Gene expression nervous system atlas (GENSAT). Nature Neuroscience 7(5): 483. doi:10.1038/nn0504-483.
- Kötter, R (2001). Neuroscience Databases: Tools for Exploring Brain Structure-Function Relationships. Philosophical Transactions of the Royal Society B: Biological Sciences 356(1412): 1111-1120. doi:10.1098/rstb.2001.0902.
- Lein, E S et al. (2007). Genome-wide atlas of gene expression in the adult mouse brain. Nature 445(7124): 168-176. doi:10.1038/nature05453.
- Ng, L L et al. (2012). Large-scale neuroinformatics for in situ hybridization data in the mouse brain. International Review of Neurobiology 104: 159-182. doi:10.1016/b978-0-12-398323-7.00007-0.
- Parekh, R and Ascoli, G A (2014). Quantitative Investigations of Axonal and Dendritic Arbors: Development, Structure, Function, and Pathology. The Neuroscientist: 1073858414540216. doi:10.1177/1073858414540216.
- Shen, E H; Overly, C C and Jones, A R (2012). The Allen Human Brain Atlas: comprehensive gene expression mapping of the human brain. Trends in Neurosciences 35(12): 711-714. doi:10.1016/j.tins.2012.09.005.
- Shepherd, G M et al. (1998). The Human Brain Project: neuroinformatics tools for integrating, searching and modeling multidisciplinary neuroscience data. Trends in Neurosciences 21(11): 460-468. doi:10.1016/s0166-2236(98)01300-9.
- Smith, K et al. (2004). Enabling the Sharing of Neuroimaging Data through Well-Defined Intermediate Levels of Visibility. NeuroImage 22(4): 1646-1656. doi:10.1016/j.neuroimage.2004.03.048.
- Walter, T et al. (2010). Visualization of image data from cells to organisms. Nature Methods 7(6): 479. doi:10.1038/nmeth0610-479a.
- Wertheim, S L and Sidman, R L (1991). Databases for Neuroscience. Nature 354(6348): 88-89. doi:10.1038/354088a0.
Internal references
- Carnevale, T (2007). Neuron simulation environment. Scholarpedia 2(6): 1378. doi:10.4249/scholarpedia.1378.
- Gewaltig, M-O and Diesmann, M (2007). NEST (neural simulation tool). Scholarpedia 2(4): 1430. doi:10.4249/scholarpedia.1430.
- Izhikevich, E (2006). Bursting. Scholarpedia 1(3): 1300. doi:10.4249/scholarpedia.1300.