# 2022 - Spike sorting benchmark datasets Spike sorting is the process by which one extracts the spikes information (times, waveforms) from the raw electrophysiology signals. In the case of our Brainwide map dataset, such electrophysiology signals are heterogeneous, changing from one brain region to another. By looking at recordings in widespread anatomical regions we discovered that the current spike sorting method has a limited range of applicability. Specifically, we uncovered two major issues. Firstly, there is no one-size-fit-all: an algorithm that is well tailored to extract signals for a given region or electrode type may perform poorly for another. Secondly, and most importantly, we had very limited means to conclude on the performance of such algorithms beyond eyes on the data and qualitative judgements. In order to facilitate the development of spike sorting algorithms, we aim to provide benchmarks datasets (for a full explanation, see our proposal [Spike Net](https://docs.google.com/document/d/1OA69Ptg58AQnGdmGi6UvZFrngwZDMixil1V7hJX6bNI/edit)). Here, we explain how to download these datasets, taken to represent various anatomical regions from our Brainwide map. ## Overview of the Data We have selected 13 recordings for our benchmarks. The insertion IDs are : ```python pids = [ '1a276285-8b0e-4cc9-9f0a-a3a002978724', '1e104bf4-7a24-4624-a5b2-c2c8289c0de7', '5d570bf6-a4c6-4bf1-a14b-2c878c84ef0e', '5f7766ce-8e2e-410c-9195-6bf089fea4fd', '6638cfb3-3831-4fc2-9327-194b76cf22e1', '749cb2b7-e57e-4453-a794-f6230e4d0226', 'd7ec0892-0a6c-4f4f-9d8f-72083692af5c', 'da8dfec1-d265-44e8-84ce-6ae9c109b8bd', 'dab512bd-a02d-4c1f-8dbc-9155a163efc0', 'dc7e9403-19f7-409f-9240-05ee57cb7aea', 'e8f9fba4-d151-4b00-bee7-447f0f3e752c', 'eebcaf65-7fa4-4118-869d-a084e84530e2', 'fe380793-8035-414e-b000-09bfe5ece92a', ] ``` (Coming soon) In 2023, we will provide 20-30min chunks of raw electrophysiology data processed in a standardised fashion, with manual annotations of spikes and quality metrics for spike detection recall. These will serve as benchmarks for spike sorting algorithm development. In the meantime, you can familiarise yourself with the data heterogeneity by looking at the whole recordings. ## View the data You can view the whole electrophysiology data: * [Through a specific AWS website](http://reveal.internationalbrainlab.org.s3-website-us-east-1.amazonaws.com/benchmarks.html) * [Through our main visualisation website](https://viz.internationalbrainlab.org/app) , inputting a PID in the search bar at the top * By downloading the electrophysiology data (see below), and using the GUI [viewephys](https://github.com/int-brain-lab/viewephys) to navigate through it ## Data structure and download The organisation of the data follows the standard IBL data structure. Please see * [These instructions](https://int-brain-lab.github.io/iblenv/notebooks_external/data_structure.html) to download an example dataset for one session, and get familiarised with the data structure * Note that probe insertion ids `pid` are provided here, not session ids `eid`. * Note that you will be most interested in the folders [raw_ephys_data](https://int-brain-lab.github.io/iblenv/notebooks_external/data_structure.html#Datasets-in-raw_ephys_data) / [probeXX](https://int-brain-lab.github.io/iblenv/notebooks_external/data_structure.html#Datasets-in-raw_ephys_data/probeXX) for the raw ephys data, and [alf/probeXX/pykilosort](https://int-brain-lab.github.io/iblenv/notebooks_external/data_structure.html#Datasets-in-alf/probeXX/pykilosort) if using the pyKilosort spike sorting output. * [These instructions](https://int-brain-lab.github.io/iblenv/notebooks_external/data_download.html) to learn how to use the ONE-api to search and download the released datasets * [These instructions](https://int-brain-lab.github.io/iblenv/loading_examples.html) to get familiarised with specific data loading functions * You will want to load in particular the [raw ephys data](https://int-brain-lab.github.io/iblenv/notebooks_external/loading_raw_ephys_data.html) and the [spike sorting data](https://int-brain-lab.github.io/iblenv/notebooks_external/loading_spikesorting_data.html) for a given `pid`