Main API: dartsort() and the DARTsortUserConfig
This page shows the main functions and objects that you'd run into when spike sorting with dartsort. If you're just getting started, the usage section on the front page might be good to read first.
For details on parameters you should think about before running the sorter, see the important configuration details note.
Main function: dartsort()
dartsort.dartsort ¶
dartsort(recording: BaseRecording, output_dir: str | Path, cfg: DARTsortUserConfig | str | Path | DeveloperConfig | DARTsortInternalConfig = default_dartsort_cfg, motion: MotionInfo | None = None, si_motion: Motion | None = None, dredge_motion_est: MotionEstimate | None = None, overwrite=False)
This function runs a spike sorter called dartsort.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| recording | BaseRecording | A SpikeInterface recording object. | required |
| output_dir | str or Path | Folder where outputs are stored. | required |
| cfg | DARTsortUserConfig or DARTsortInternalConfig or str or Path | Your settings, for example a DARTsortUserConfig object. | default_dartsort_cfg |
| si_motion | Motion | Allows users to pass their own external motion estimate. | None |
| dredge_motion_est | MotionEstimate | Allows users to pass their own external motion estimate. | None |
| overwrite | bool | Ignore and overwrite stored results, if any. Otherwise, dartsort will try to resume from the last step that ran, or if it had finished then it will do nothing. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| results | DARTsortReturn | Dictionary of sorting results (spike trains and motion information). |
The return value from the dartsort() function is a DARTsortReturn object, which is a dictionary containing spike trains and motion information:
dartsort.DARTsortReturn ¶
The spike trains and motion info are stored in some internal objects, described in the last section on this page. Up next, we'll discuss how to adjust dartsort's parameters.
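As a rough illustration of the call described in this section, here is a minimal sketch. It uses only the signature and config fields shown on this page, and it assumes that DARTsortUserConfig accepts its fields as keyword arguments. The helper name run_dartsort is hypothetical, and recording is assumed to be a SpikeInterface BaseRecording you've already loaded. The import sits inside the function so the helper can be defined without dartsort installed.

```python
from pathlib import Path


def run_dartsort(recording, output_dir, overwrite=False):
    """Hypothetical helper: sort `recording`, storing outputs in `output_dir`."""
    # Imported lazily so that defining this helper doesn't require dartsort.
    import dartsort

    # Assumption: config fields can be set via keyword arguments.
    cfg = dartsort.DARTsortUserConfig(
        do_motion_estimation=True,
        matching_iterations=1,
    )
    # Arguments follow the dartsort() signature documented on this page.
    return dartsort.dartsort(
        recording, Path(output_dir), cfg=cfg, overwrite=overwrite
    )
```

With overwrite left at False, a second call on the same output_dir should resume or do nothing, per the parameter table.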
Configuration
For details on parameters you should know about before running the sorter, see the important configuration details note. Here, we'll show a reference for all of the configuration options. Some of the important ones not mentioned in that note include the voltage threshold for initial spike detection and the "energy" thresholds for spike detection in the initial and template matching passes. You may also want to take a look at the parameters for
dartsort.DARTsortUserConfig ¶
User-facing configuration options
To change dartsort's behavior, set parameters here and pass the object
to the dartsort() function.
do_motion_estimation
class-attribute
instance-attribute
¶
do_motion_estimation: bool = True
Set this to False if your data is super stable or already motion-corrected.
preprocessing
class-attribute
instance-attribute
¶
If other than 'none', dartsort will apply some preprocessing to the
recording. Leave as 'none' if you are passing in an already-preprocessed
recording. If so, be aware that dartsort expects its input to be standardized on
each channel in addition to the usual highpass filtering, but that
whitening is handled internally. See util/preprocess_util.py if you're
curious about the details of the methods.
Options: 'ibllikecmr', 'ibllike', 'standardize', 'none'
preprocessing_dtype
class-attribute
instance-attribute
¶
preprocessing_dtype: Literal['float16', 'float32'] = 'float32'
If you have a lot of data and you're using a workflow where it is important
to save a preprocessed copy of the recording, float16 is a good option. Only
relevant if preprocessing != 'none'. If the recording isn't getting saved,
stick to float32.
subsampling_spikes
class-attribute
instance-attribute
¶
subsampling_spikes: int | None = 2048000
Detection steps before the final matching round will run until at least this many spikes are found or the whole recording is covered, to make sure that there is enough data for clustering. See also subsampling_presence. Set to None to disable subsampling.
subsampling_presence
class-attribute
instance-attribute
¶
subsampling_presence: float = 0.1
Early detection steps which have already found subsampling_spikes
spikes are only allowed to end early if they additionally cover this
fraction of the recording, to make sure there's good coverage of
conditions for template estimation.
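The interaction between subsampling_spikes and subsampling_presence can be sketched as a simple stopping rule. This is an illustration of the behavior described above, not dartsort's code. The function name may_stop_early and the way covered_fraction is measured are assumptions.

```python
def may_stop_early(
    n_spikes,
    covered_fraction,
    subsampling_spikes=2_048_000,
    subsampling_presence=0.1,
):
    """Can an early detection step end before covering the whole recording?

    covered_fraction is assumed to be the fraction of the recording
    (between 0 and 1) that the step has processed so far.
    """
    if subsampling_spikes is None:
        # Subsampling disabled: the step covers the whole recording.
        return False
    enough_spikes = n_spikes >= subsampling_spikes
    enough_coverage = covered_fraction >= subsampling_presence
    # Both conditions are needed to stop early.
    return enough_spikes and enough_coverage
```

For instance, a step that has found 3 million spikes in the first 5% of the recording keeps going until it has covered 10%.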
matching_iterations
class-attribute
instance-attribute
¶
matching_iterations: int = 1
By default, 1 template matching step is carried out using templates estimated from the initial detection round.
dredge_only
class-attribute
instance-attribute
¶
dredge_only: bool = False
Whether to stop after initial localization and motion tracking.
n_jobs_cpu
class-attribute
instance-attribute
¶
n_jobs_cpu: int = 0
Number of parallel workers to use when running on CPU. 0 means
everything runs on the main thread; negative means #cpu + (val + 1),
so that -1 is all cores, -2 is all less 1, etc.
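To make the negative-value convention concrete, here is a small sketch that reproduces the examples given above (-1 is all cores, -2 is all less 1). The helper name resolve_n_jobs is hypothetical, and clamping to at least one worker is an assumption rather than documented behavior.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Turn an n_jobs_cpu-style setting into a worker count."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs >= 0:
        # 0 means no workers: everything runs on the main thread.
        return n_jobs
    # Negative values count back from the number of cores.
    # Assumption: never go below one worker.
    return max(1, n_cpus + n_jobs + 1)


# On a hypothetical 8-core machine:
all_cores = resolve_n_jobs(-1, n_cpus=8)  # 8
all_but_one = resolve_n_jobs(-2, n_cpus=8)  # 7
```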
n_jobs_gpu
class-attribute
instance-attribute
¶
n_jobs_gpu: int = 0
Number of parallel workers to use when running on GPU.
n_jobs_small
class-attribute
instance-attribute
¶
n_jobs_small: int = -2
Max workers to use for small jobs.
n_jobs_small_gpu
class-attribute
instance-attribute
¶
n_jobs_small_gpu: int = 4
Max workers to use for small jobs running on GPU.
device
class-attribute
instance-attribute
¶
device: str | None = None
The name of the PyTorch device to use. For example, 'cpu' or 'cuda' or 'cuda:1'. If unset, dartsort uses n_jobs_gpu of your CUDA GPUs if you have several, the single GPU if you have just one, or else the CPU.
executor
class-attribute
instance-attribute
¶
executor: str = 'threading_unless_multigpu'
Choose: 'threading_unless_multigpu', 'ThreadPoolExecutor', 'ProcessPoolExecutor', or some others.
chunk_length_samples
class-attribute
instance-attribute
¶
chunk_length_samples: int = 30000
Batch size for data processing.
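At 30 kHz, the default of 30000 samples is one second of data per batch. The sketch below shows how a recording might be cut into batches of this size. The helper name chunk_bounds is hypothetical, and any overlap or margin that dartsort may add around each chunk is ignored here.

```python
def chunk_bounds(n_samples, chunk_length_samples=30_000):
    """Return (start, end) sample indices of consecutive chunks."""
    return [
        (start, min(start + chunk_length_samples, n_samples))
        for start in range(0, n_samples, chunk_length_samples)
    ]


# A 70000-sample recording gives two full chunks and one short one.
bounds = chunk_bounds(70_000)
```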
work_in_tmpdir
class-attribute
instance-attribute
¶
work_in_tmpdir: bool = False
If True, dartsort will store all temporary data in a scratch directory in tmpdir_parent or TMPDIR.
copy_recording_to_tmpdir
class-attribute
instance-attribute
¶
copy_recording_to_tmpdir: bool = False
Save a copy of the preprocessed recording to a tmpdir?
workdir_copier
class-attribute
instance-attribute
¶
workdir_copier: Literal['shutil', 'rsync'] = 'shutil'
'shutil' or 'rsync'
tmpdir_parent
class-attribute
instance-attribute
¶
tmpdir_parent: str | None = None
Control where tmpdirs are created.
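As a sketch of how work_in_tmpdir and tmpdir_parent could fit together, Python's tempfile module already has matching semantics: given an explicit parent it creates the directory there, and otherwise it falls back to TMPDIR or the platform default. The helper name and the "dartsort_" prefix are hypothetical, and this isn't necessarily how dartsort creates its scratch directories.

```python
import tempfile
from pathlib import Path


def make_scratch_dir(tmpdir_parent=None):
    """Create a scratch directory under tmpdir_parent, or TMPDIR if None."""
    # With dir=None, tempfile consults TMPDIR and then platform defaults.
    return Path(tempfile.mkdtemp(prefix="dartsort_", dir=tmpdir_parent))
```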
save_intermediates
class-attribute
instance-attribute
¶
save_intermediates: bool = False
Store all spike features from intermediate steps (for debugging)
save_final_features
class-attribute
instance-attribute
¶
save_final_features: bool = True
Store the spike features from the final step (instead of just basic spike train outputs).
ms_before
class-attribute
instance-attribute
¶
ms_before: float = 1.4
Length of time (ms) before trough (or peak) in waveform snippets. Default value corresponds to 42 samples at 30kHz.
ms_after
class-attribute
instance-attribute
¶
ms_after: float = 2.6 + 0.1 / 3
Length of time (ms) after trough (or peak) in waveform snippets. Default value corresponds to 79 samples at 30kHz.
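The sample counts quoted for ms_before and ms_after follow from a milliseconds-to-samples conversion at 30 kHz. Here's a quick check; the helper name ms_to_samples is hypothetical, and rounding to the nearest sample is an assumption.

```python
def ms_to_samples(ms, sampling_frequency=30_000.0):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * sampling_frequency / 1000.0)


samples_before = ms_to_samples(1.4)  # 42
samples_after = ms_to_samples(2.6 + 0.1 / 3)  # 79
```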
alignment_ms
class-attribute
instance-attribute
¶
alignment_ms: float = 1.5
Largest time shift allowed when re-aligning events.
peak_sign
class-attribute
instance-attribute
¶
peak_sign: Literal['neg', 'both', 'pos'] = 'both'
Allow only troughs or events of both signs when detecting threshold crossings during initialization. Or positive only, if that's your thing.
voltage_threshold
class-attribute
instance-attribute
¶
voltage_threshold: float = 3.0
Threshold in standardized (SNR) voltage units for initial detection; peaks or troughs larger than this value will be grabbed.
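To illustrate how peak_sign and voltage_threshold work together, here is a toy single-channel threshold detector on a standardized trace. It's only a sketch of the idea of threshold crossings. The function name is hypothetical, and dartsort's actual detection works across channels and uses more than this.

```python
def threshold_crossings(trace, voltage_threshold=3.0, peak_sign="both"):
    """Indices of local extrema whose size exceeds voltage_threshold."""
    hits = []
    for i in range(1, len(trace) - 1):
        prev, cur, nxt = trace[i - 1], trace[i], trace[i + 1]
        # A trough is a local minimum below -threshold.
        is_trough = cur < prev and cur <= nxt and cur < -voltage_threshold
        # A peak is a local maximum above +threshold.
        is_peak = cur > prev and cur >= nxt and cur > voltage_threshold
        want_trough = peak_sign in ("neg", "both")
        want_peak = peak_sign in ("pos", "both")
        if (is_trough and want_trough) or (is_peak and want_peak):
            hits.append(i)
    return hits


trace = [0, -1, -5, -1, 0, 4, 0, 2, 0]
both = threshold_crossings(trace, peak_sign="both")  # [2, 5]
```

The bump of height 2 at index 7 is below the default threshold of 3.0, so it's skipped for every peak_sign.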
matching_threshold
class-attribute
instance-attribute
¶
matching_threshold: float = 8.0
Template matching threshold. If subtracting a template decreases the norm of the residual by at least this much, that match will be used. This is in the same units as the corresponding threshold in Kilosort and other sorters, and it represents reduction in Euclidean norm of standardized data due to matching a new event.
initial_threshold
class-attribute
instance-attribute
¶
initial_threshold: float = 10.0
Initial detection's neural net matching threshold. Same as matching_threshold, except that a neural net is trying to guess the true waveforms here, rather than using cluster templates.
motion_voltage_threshold
class-attribute
instance-attribute
¶
motion_voltage_threshold: float = 4.0
If subsampling, a quick thresholding will be run at this voltage threshold to grab spikes for motion estimation purposes.
temporal_pca_rank
class-attribute
instance-attribute
¶
temporal_pca_rank: int = 8
Rank of temporal PCAs used in denoising and featurization.
feature_ms_before
class-attribute
instance-attribute
¶
feature_ms_before: float = 0.75
As ms_before, but used only when computing PCA features in clustering.
feature_ms_after
class-attribute
instance-attribute
¶
feature_ms_after: float = 1.25
As ms_after, but used only when computing PCA features in clustering.
subtraction_radius_um
class-attribute
instance-attribute
¶
subtraction_radius_um: float = 200.0
Radius of neighborhoods around spike events extracted when denoising and subtracting NN-denoised events.
deduplication_radius_um
class-attribute
instance-attribute
¶
deduplication_radius_um: float = 50.0
During initial detection, if two spike events occur at the same time within this radius, then the smaller of the two is ignored. The same goes for detections on all of the secondary channels of the larger event, which is important.
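The deduplication rule can be sketched in a few lines of plain Python. Events here are hypothetical (time, x, z, amplitude) tuples, and "at the same time" is taken literally as equal sample times, while a real implementation would likely allow a small time window. Treating the radius as inclusive is also an assumption.

```python
import math


def deduplicate(events, deduplication_radius_um=50.0):
    """Drop events that have a larger simultaneous neighbor within the radius."""
    kept = []
    for i, (t, x, z, amp) in enumerate(events):
        dominated = False
        for j, (t2, x2, z2, amp2) in enumerate(events):
            if i == j or t2 != t:
                continue
            close = math.hypot(x - x2, z - z2) <= deduplication_radius_um
            if close and amp2 > amp:
                # A bigger event nearby at the same time wins.
                dominated = True
                break
        if not dominated:
            kept.append((t, x, z, amp))
    return kept


events = [
    (100, 0.0, 0.0, 10.0),   # large event
    (100, 0.0, 20.0, 6.0),   # same time, 20 um away: ignored
    (100, 0.0, 200.0, 5.0),  # same time, but far away: kept
    (250, 0.0, 20.0, 4.0),   # different time: kept
]
kept = deduplicate(events)
```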
featurization_radius_um
class-attribute
instance-attribute
¶
featurization_radius_um: float = 100.0
Radius around detection channel or template peak channel used to extract spike features for clustering.
fit_radius_um
class-attribute
instance-attribute
¶
fit_radius_um: float = 75.0
Extraction radius when fitting features like PCA; smaller than other radii to include less noise.
localization_radius_um
class-attribute
instance-attribute
¶
localization_radius_um: float = 100.0
Radius around main channel used when localizing spikes.
nn_denoiser_class_name
class-attribute
instance-attribute
¶
nn_denoiser_class_name: Literal['SingleChannelWaveformDenoiser', 'Decollider'] = 'Decollider'
Which neural net to use in initial detection? Set to 'Decollider' (and set nn_denoiser_pretrained_path to None) to train a brand-new unsupervised denoiser.
nn_denoiser_pretrained_path
class-attribute
instance-attribute
¶
nn_denoiser_pretrained_path: str | None = None
Path to a pytorch saved model (.pt file as dumped by torch.save()). If this is None, a new model will be fit.
amplitude_scaling_stddev
class-attribute
instance-attribute
¶
amplitude_scaling_stddev: float = 0.01
Standard deviation of amplitude scaling regularization prior in template matching.
amplitude_scaling_boundary
class-attribute
instance-attribute
¶
amplitude_scaling_boundary: float = 1.0 / 3.0
Boundaries on the amount of scaling allowed.
temporal_upsamples
class-attribute
instance-attribute
¶
temporal_upsamples: int = 4
Upsampling of templates during matching to allow for temporal aliasing of waveforms.
rigid
class-attribute
instance-attribute
¶
rigid: bool = False
Use rigid registration and ignore the window parameters.
probe_boundary_padding_um
class-attribute
instance-attribute
¶
probe_boundary_padding_um: float = 100.0
speed_limit_um_per_s
class-attribute
instance-attribute
¶
speed_limit_um_per_s: float = 500.0
Motion bins exceeding this speed will be replaced by interpolation.
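One way to picture a speed limit on a motion trace: compute the speed implied by each time bin relative to the last trusted bin, and fill any bin that moves too fast by linear interpolation between its trusted neighbors. The sketch below does this for a single 1D displacement trace. It's an illustrative scheme under those assumptions, not the rule dartsort or DREDge actually uses, and the function name and bin_s parameter are hypothetical.

```python
def limit_speed(displacement_um, bin_s=1.0, speed_limit_um_per_s=500.0):
    """Replace too-fast bins of a 1D motion trace by linear interpolation."""
    n = len(displacement_um)
    good = [0]  # assumption: the first bin is trustworthy
    for i in range(1, n):
        last = good[-1]
        elapsed_s = (i - last) * bin_s
        speed = abs(displacement_um[i] - displacement_um[last]) / elapsed_s
        if speed <= speed_limit_um_per_s:
            good.append(i)
    out = list(displacement_um)
    # Fill gaps between consecutive trusted bins by linear interpolation.
    for a, b in zip(good, good[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = (1 - w) * displacement_um[a] + w * displacement_um[b]
    # Trailing untrusted bins just hold the last trusted value.
    for i in range(good[-1] + 1, n):
        out[i] = displacement_um[good[-1]]
    return out


# The 2000 um jump in the third bin implies a speed of 1990 um/s.
smoothed = limit_speed([0, 10, 2000, 30, 40])
```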
Data objects
dartsort uses some internal classes to represent spike trains and other data. Spike trains and motion are represented in the following objects.
dartsort.DARTsortSorting ¶
Class which holds spike times, channels, and labels
This class holds our algorithm state. Initially the sorter doesn't have unit labels, so these are optional. Export me to a SpikeInterface NumpySorting with .to_numpy_sorting().
When you instantiate this with from_peeling_hdf5, if the
flag load_simple_features is True (default), then additional
features of spikes will be loaded into memory -- like localizations,
which you can access like sorting.point_source_localizations[...].
__init__ ¶
__init__(*, times_samples: ndarray, channels: ndarray, labels: ndarray | None, parent_h5_path: str | Path | None = None, sampling_frequency: float | int = 30000.0, persistent_features: dict[str, ndarray] | None = None, ephemeral_features: dict[str, ndarray] | None = None)
Construct a DARTsortSorting directly from times, channels, labels, et cetera.
It's more common to construct from an HDF5 file with .from_peeling_hdf5() or from a .npz with .load().
to_numpy_sorting ¶
to_numpy_sorting() -> NumpySorting
Clean up and produce a spikeinterface NumpySorting object.
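A typical round trip, as described above, might load a sorting from a peeler's HDF5 file and hand it to SpikeInterface. The sketch below uses only from_peeling_hdf5() and to_numpy_sorting() as documented on this page. The helper name load_and_export is hypothetical, and the import is kept inside the function so the definition doesn't require dartsort.

```python
def load_and_export(h5_path):
    """Hypothetical helper: HDF5 from a peeler -> SpikeInterface NumpySorting."""
    # Imported lazily so that defining this helper doesn't require dartsort.
    from dartsort import DARTsortSorting

    # load_simple_features=True is the documented default; shown for clarity.
    sorting = DARTsortSorting.from_peeling_hdf5(
        h5_path, load_simple_features=True
    )
    return sorting.to_numpy_sorting()
```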
to_pandas ¶
Export to pandas DataFrame with some per-spike features.
to_tsgroup ¶
Export to pynapple.TsGroup.
If there is a weight_key feature, this will produce a TsGroup with Tsd entries. Else, regular Ts.
ephemeral_replace ¶
Return a shallow copy of self with certain datasets/features replaced by new_features.
add_ephemeral_feature ¶
add_ephemeral_feature(feature_name: str, feature: ndarray, check_shape: bool | None = None, overwrite=False)
Ephemeral features are accessible as properties and persisted to/from .npz, but not saved in the .h5.
add_feature ¶
Try to save a feature to h5, else register as ephemeral.
from_peeling_hdf5
classmethod
¶
from_peeling_hdf5(h5_path: str | Path, *, times_samples_dataset='times_samples', channels_dataset='channels', labels_dataset='labels', load_feature_names: Sequence[str] | None = None, load_simple_features=True, load_all_features=False, allow_missing=False) -> Self
Load sorting from .hdf5 format saved by peelers
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| load_feature_names | optional list of str | Load exactly these features, plus geom/channel index. | None |
| load_simple_features | bool | If load_feature_names unspecified, load all scalar or vector features (per spike), but no matrix-valued features like waveforms or multi-channel PCA features. | True |
| load_all_features | bool | | False |
save ¶
Save to npz (usually dartsort_sorting.npz)
Support persisting myself in non-h5-supportable cases. Cases:
- When there is no h5!
- When I have new labels.

This is done by saving to .npz, with a pointer (like a relative symlink) to the .h5 file if it exists.
load
classmethod
¶
load(sorting_npz, additional_persistent_features=None, load_ephemeral_feature_names=None, load_persistent_feature_names=None) -> Self
Load from npz (usually dartsort_sorting.npz).
drop_doubles ¶
Remove spikes detected at the exact same time assigned to the same unit.
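The idea behind drop_doubles can be shown with plain lists. This sketch returns the indices of the spikes to keep and retains the first of each duplicated (time, unit) pair. It's an illustration of the description above, not the method's implementation, and the function name is hypothetical.

```python
def drop_double_indices(times_samples, labels):
    """Indices of spikes to keep, dropping repeated (time, label) pairs."""
    seen = set()
    keep = []
    for i, key in enumerate(zip(times_samples, labels)):
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return keep


# Two spikes at sample 10 share unit 0, so the second one is dropped.
keep = drop_double_indices([10, 10, 10, 25], [0, 0, 1, 0])
```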
dartsort.MotionInfo ¶
Holds motion-related info and helper functions.
drifting
instance-attribute
¶
drifting: bool
Was do_motion_estimation set, or are we ignoring motion?
dredge_motion_est
instance-attribute
¶
Motion estimate from DREDge, if using.
si_motion
instance-attribute
¶
si_motion: Motion | None
Motion estimate from SpikeInterface, if using.
from_motion_est
classmethod
¶
from_motion_est(*, geom: ndarray | Tensor, dredge_motion_est: MotionEstimate | None = None, si_motion: Motion | None = None, rgeom: ndarray | Tensor | None = None) -> Self
Main constructor for MotionInfo objects
Precomputes and saves motion-related data structures for use through all of dartsort. Notably, the probe pitch, the min inter-channel distance, and the "registered geometry". Also, k-d trees which are used everywhere.
If neither dredge_motion_est nor si_motion is supplied, drifting is set to False and there is assumed to be no motion.
uncorrect_s ¶
Attempt to invert the motion estimate to un-register reg_depths_um.
pitch_shifts ¶
pitch_shifts(*, sorting: DARTsortSorting | None = None, times_s: ndarray | None = None, depths_um: ndarray | None = None, reg_depths_um: ndarray | None = None, shift_mode: Literal['round', 'floor'] = 'round', motion_depth_mode: Literal['channel', 'localization'] = 'channel', localizations_dset='point_source_localizations') -> tuple[ndarray, ndarray]
Figure out coarse pitch shifts based on spike positions
Determine the number of pitches the probe would need to shift in order to coarsely align a waveform to its registered position.
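For a single spike, the coarse shift boils down to dividing a displacement by the probe pitch and rounding or flooring, depending on shift_mode. The sketch below assumes the shift is measured from the observed depth toward the registered depth. The sign convention, the function name, and the scalar interface are all assumptions, since the real method works on arrays and returns a tuple.

```python
import math


def coarse_pitch_shift(depth_um, reg_depth_um, pitch_um, shift_mode="round"):
    """Whole number of pitches separating a depth from its registered depth."""
    shift_in_pitches = (reg_depth_um - depth_um) / pitch_um
    if shift_mode == "round":
        # Note: Python's round() sends exact halves to the nearest even integer.
        return round(shift_in_pitches)
    if shift_mode == "floor":
        return math.floor(shift_in_pitches)
    raise ValueError(f"Unknown shift_mode: {shift_mode!r}")


# 35 um of displacement on a probe with 20 um pitch is 1.75 pitches.
n_round = coarse_pitch_shift(100.0, 135.0, 20.0, shift_mode="round")  # 2
n_floor = coarse_pitch_shift(100.0, 135.0, 20.0, shift_mode="floor")  # 1
```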