dartsort¶
dartsort is a modular spike sorter built around a statistical clustering model and a new approach to probe motion. It is also a toolkit of modules for building spike sorters and other analyses of electrophysiology data.
work in progress
¶
We do not currently recommend DARTsort for production spike sorting purposes. Please feel free to open an issue or a discussion if you run into problems.
Installation¶
Installing into an existing environment¶
If you already have a Python environment with PyTorch working and you just want to install dartsort there, use
If you want to run the test suite or use dartsort.vis, you can install the optional dependencies with pip install dartsort[test,vis].
Setting up a Python environment¶
If you need to set up Python or PyTorch, I find that a conda-forge-based distribution is the most reliable at installing the GPU dependencies which PyTorch needs (note: conda-forge is different from the non-free Anaconda).
You can use conda-forge to install Python, dartsort, and its dependencies as follows:
- Follow the
conda-forgeinstallation instructions for your platform at https://conda-forge.org/download/ - Create an environment with
This will create an environment called
dartsort, but you can change the name by adding-n othername. - Activate the environment:
- Install
dartsortand the rest of its dependencies by running the pip command above
Usage¶
For more detailed documentation of the main function parameters and configuration options, see the main API documentation.
As a Python function¶
dartsort can be run in Python with:
import dartsort
dartsort.dartsort(recording, output_dir)
# or, to set configuration options, something like...
dartsort_result = dartsort.dartsort(
recording,
output_dir,
cfg=dartsort.DARTsortUserConfig(
preprocessing="ibllike",
work_in_tmpdir=True,
copy_recording_to_tmpdir=True,
),
)
Please read the important configuration details section below for information on the parameters of the DARTsortUserConfig.
Some of them, like preprocessing, are not set by default and need your input!
(This will probably change.)
Here, recording is a SpikeInterface recording object (see their tutorial on reading various recording formats).
output_dir is the folder where dartsort will save its output.
There are more details on these and the rest of the arguments in the main API documentation.
Once you've run dartsort, you might want to check out the outputs and exporting section below.
Important configuration details¶
Before running dartsort, please be aware of the following important configuration options.
preprocessing: dartsort won't touch your data by default (preprocessing="none"), leaving you free to implement your own preprocessing in SpikeInterface or otherwise. However, dartsort expects standardized input, and therefore it will explode if you don't set this flag and your data is in its original state (for instance, the rawint16data off the probe).- For a cheap but sensible default, try
preprocessing="ibllikecmr", which applies a pipeline similar to that of the IBL but with global median common referencing instead of their spatial highpass filter.preprocessing="ibllike"will use their spatial highpass filter.
- For a cheap but sensible default, try
do_motion_estimation=Trueby default, but you may like to disable it if you know that there is (say) less than 5 microns of total drift in your recording, or if you have handled this in your own preprocessing.work_in_tmpdirandcopy_recording_to_tmpdircan be helpful in some cases where slow network drives are involved.
Outputs and exporting¶
The dartsort_result = dartsort(...) function returns a dictionary dartsort_result containing a DARTsortSorting object: sorting = dartsort_result["sorting"].
This object has all the spike train data attached (as arrays under property names .times_samples and .times_seconds for spike times in samples and seconds, .labels for unit labels, and many others; print(sorting) to see some more).
If you already ran dartsort and want to load the output spike trains, use dartsort.load(output_dir) to get the DARTsortSorting object.
This object can also export itself to other formats:
- For a SpikeInterface
NumpySortingobject, usesorting.to_numpy_sorting() - For a Pynapple
TsGroup, usesorting.to_tsgroup() - To export to Phy, we currently suggest bridging through SpikeInterface. Start with
sorting.to_numpy_sorting()and follow the instructions in SpikeInterface's documentation for first creating aSortingAnalyzerand then exporting that to Phy. - For a simple dictionary of numpy arrays, use
dict = sorting.spike_feature_dict. - For pandas, use
sorting.to_pandas().
dartsort also saves motion information, returned as dartsort_result["motion"] or loaded after the fact as dartsort.try_load_motion_info(output_dir).
The data is saved to output_dir in the following files:
- dartsort_sorting.npz: This NPZ file contains the final spike train under the keys times_samples, channels, and labels.
- matching1.h5: This HDF5 file contains spike features and other data from the last matching step. Amplitudes, localizations, and other features live in here; use h5ls on the command line to see what's in there. Be aware that the labels dataset in this HDF5 is not the same as what's saved in the dartsort_sorting.npz.
- motion_info.pkl is a pickled MotionInfo object.
- There may be a models/ folder containing PyTorch weights files with modeling quantities (for instance, featurization SVD bases, localization neural nets, Gaussian mixture model parameters). If you want to load these up, feel free to reach out for help.
Visualization¶
To make some basic visualizations of the sorting result with matplotlib, try:
import dartsort.vis as dartvis
# gather outputs from dartsort
dartsort_result = dartsort(recording, output_dir, ...)
sorting = dartsort_result["sorting"]
motion = dartsort_result["motion"]
# or, if you already ran it
sorting = dartsort.load(output_dir)
motion = dartsort.try_load_motion_info(output_dir)
dartvis.visualize_sorting(
recording,
sorting,
vis_save_dir,
motion=motion,
make_unit_summaries=False,
)
Set make_unit_summaries=True to create a summary plot for each unit.
Command-line interface¶
Try running
on your command line to see usage instructions; parameters can be configured on the command line or read from a TOML file.
Troubleshooting¶
Please let us know if you run into any issues. If you feel that the issue is a software bug, feel free to open an issue or a discussion on GitHub. If it's more of a data-related or methodology thing, feel free to use the email on my GitHub profile.
References¶
- SpikeInterface
- DREDge on PyPI and the DREDge paper (PDF link).