Meta’s NeuralBench: A Unified Benchmark for EEG-Based NeuroAI Models

Introduction

The intersection of deep learning and neuroscience—often called NeuroAI—has grown rapidly. Researchers now adapt self-supervised learning from language and vision to build brain foundation models that can be fine-tuned for tasks like seizure detection or decoding visual perception. Yet evaluating these models has remained chaotic. Different teams use custom preprocessing, different datasets, and report results on narrow tasks, making fair comparisons nearly impossible. Meta AI’s new NeuralBench framework aims to bring order to this field.

Source: www.marktechpost.com

The Fragmented State of NeuroAI Benchmarks

Existing benchmarking efforts are scattered. The MOABB benchmark, for instance, covers over 148 brain-computer interface (BCI) datasets but evaluates only five downstream tasks. Other tools such as EEG-Bench, EEG-FM-Bench, and AdaBrain-Bench each have limited scope: some cover only a few datasets, others a single task family. For modalities like MEG and fMRI, no systematic benchmark exists at all. This fragmentation means claims of “generalizable” or “foundational” models often rest on cherry-picked tasks with no common reference point.

What NeuralBench Offers

NeuralBench v1.0, dubbed NeuralBench-EEG, is the largest open benchmark of its kind. It includes:

  • 36 downstream tasks covering clinical, cognitive, and BCI domains
  • 94 curated datasets from public repositories
  • 9,478 subjects and 13,603 hours of EEG recordings
  • 14 deep learning architectures evaluated under a unified interface

All models are tested using the same preprocessing pipelines, train/validation splits, and evaluation metrics, enabling direct comparison across tasks.
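Fixing the splits once, and applying them identically to every model, is what makes such comparisons meaningful. The announcement does not spell out how NeuralBench derives its splits, but one common approach in EEG benchmarking is deterministic subject-level partitioning, sketched below (illustrative only, not NeuralBench's actual logic):

```python
import hashlib

def subject_split(subject_ids, val_frac=0.1, test_frac=0.1):
    """Assign each subject to train/val/test deterministically.

    Hashing the subject ID (rather than shuffling) gives the same
    split on every machine, and keeping all recordings from one
    subject in a single partition avoids subject-level leakage.
    """
    splits = {"train": [], "val": [], "test": []}
    for sid in subject_ids:
        # Map the ID to a stable pseudo-uniform number in [0, 1).
        h = int(hashlib.sha256(sid.encode()).hexdigest(), 16)
        u = (h % 10_000) / 10_000
        if u < test_frac:
            splits["test"].append(sid)
        elif u < test_frac + val_frac:
            splits["val"].append(sid)
        else:
            splits["train"].append(sid)
    return splits

subjects = [f"sub-{i:03d}" for i in range(100)]
splits = subject_split(subjects)
```

Because the assignment depends only on the subject ID, adding a new model to the benchmark cannot silently change which subjects it is evaluated on.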

How NeuralBench Works

The framework is built on three modular Python packages, each handling a distinct stage of the pipeline.

NeuralFetch: Dataset Acquisition

This package handles downloading and curating data from public repositories like OpenNeuro, DANDI, and NEMAR. It ensures data is consistently formatted and versioned, removing the headache of manual collection.

NeuralSet: Data Preparation

Once raw data is fetched, NeuralSet prepares it as PyTorch-ready dataloaders. It wraps existing neuroscience tools like MNE-Python and nilearn for preprocessing, and integrates with Hugging Face to extract stimulus embeddings for tasks involving images, speech, or text.
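The announcement does not show NeuralSet's preprocessing code, but the central step, cutting a continuous recording into fixed-length training windows, can be sketched in plain Python (a real pipeline would use MNE-Python objects and emit PyTorch tensors):

```python
def epoch_signal(samples, sfreq, win_sec=2.0, stride_sec=1.0):
    """Cut a continuous single-channel EEG signal into overlapping
    fixed-length windows, the unit most deep models train on.

    samples    : list of float amplitudes
    sfreq      : sampling rate in Hz
    win_sec    : window length in seconds
    stride_sec : hop between window starts in seconds
    """
    win = int(win_sec * sfreq)
    stride = int(stride_sec * sfreq)
    windows = []
    for start in range(0, len(samples) - win + 1, stride):
        windows.append(samples[start:start + win])
    return windows

# A 10-second recording at 256 Hz yields 9 overlapping 2 s windows.
signal = [0.0] * (10 * 256)
wins = epoch_signal(signal, sfreq=256)
```

Each returned window would become one item in a PyTorch `Dataset`, paired with its label or stimulus embedding.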

NeuralTrain: Model Training and Evaluation

NeuralTrain provides modular training code built on PyTorch-Lightning, Pydantic, and the exca execution and caching library. It standardizes hyperparameters, training loops, and evaluation metrics across all models.
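NeuralTrain leans on Pydantic for typed configuration; a rough equivalent using stdlib dataclasses illustrates the idea of validated, standardized hyperparameters (the field names here are illustrative guesses, not NeuralTrain's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Typed, validated hyperparameters shared by every model run."""
    model: str
    lr: float = 1e-3
    batch_size: int = 64
    max_epochs: int = 50

    def __post_init__(self):
        # Fail fast on invalid settings instead of mid-training.
        if self.lr <= 0:
            raise ValueError("lr must be positive")
        if self.batch_size < 1:
            raise ValueError("batch_size must be >= 1")

cfg = TrainConfig(model="eegnet", lr=3e-4)
```

Catching bad values at construction time, rather than hours into a run, is exactly what Pydantic-style validation buys in a large benchmark.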

Using NeuralBench

After installation via pip install neuralbench, the framework is controlled through a command-line interface (CLI). Running a task involves three simple commands: download the data, prepare the cache, and execute. Every task is configured via a lightweight YAML file that specifies:

  • The data source
  • Train/validation/test splits
  • Preprocessing steps
  • Target processing
  • Training hyperparameters
  • Evaluation metrics
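The announcement does not reproduce a full config file, but a task YAML in this style might look roughly like the following (every key below is an illustrative guess, not NeuralBench's actual schema):

```yaml
# Hypothetical NeuralBench task config; actual keys may differ.
task: seizure_detection
data:
  source: openneuro          # public repository
  dataset: ds-placeholder    # dataset identifier (placeholder)
splits:
  level: subject             # split by subject to avoid leakage
  train: 0.8
  val: 0.1
  test: 0.1
preprocessing:
  sfreq: 256                 # resample rate, Hz
  bandpass: [0.5, 45.0]      # filter band, Hz
  window_sec: 2.0
target:
  type: binary_classification
training:
  lr: 0.0003
  batch_size: 64
  max_epochs: 50
evaluation:
  metrics: [balanced_accuracy, auroc]
```

Keeping the entire task definition in one declarative file is what lets a new dataset or model slot into the benchmark without touching training code.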

This standardization ensures reproducibility and makes it easy for researchers to extend the benchmark with new models or datasets.

Impact and Future Directions

By providing a unified evaluation framework, NeuralBench enables fair comparisons that were previously impossible. It will help identify which architectures truly generalize across EEG tasks, accelerate progress in clinical applications, and guide the development of next-generation brain-computer interfaces. Meta AI has open-sourced the framework under a permissive license, inviting the community to contribute new tasks, datasets, and models. In future releases, the team plans to extend coverage to MEG and fMRI, broadening the scope of NeuroAI benchmarking.

For more details, see the Meta AI research publication.
