Main API: dartsort() and the DARTsortUserConfig

This page shows the main functions and objects that you'd run into when spike sorting with dartsort. If you're just getting started, the usage section on the front page might be good to read first.

For details on parameters you should think about before running the sorter, see the important configuration details note.

Main function: dartsort()

dartsort.dartsort

dartsort(recording: BaseRecording, output_dir: str | Path, cfg: DARTsortUserConfig | str | Path | DeveloperConfig | DARTsortInternalConfig = default_dartsort_cfg, motion: MotionInfo | None = None, si_motion: Motion | None = None, dredge_motion_est: MotionEstimate | None = None, overwrite=False)

This function runs a spike sorter called dartsort.

Parameters:

recording: BaseRecording (required)

A SpikeInterface BaseRecording object.

output_dir: str | Path (required)

Folder where outputs are stored.

cfg: DARTsortUserConfig | DARTsortInternalConfig | str | Path = default_dartsort_cfg

Your settings. Either create a DARTsortUserConfig directly in code, or pass a string or Path pointing to a .toml file here.

si_motion: Motion = None

Allows users to pass their own external motion estimate, as a SpikeInterface Motion object.

dredge_motion_est: MotionEstimate = None

Allows users to pass their own external motion estimate, as a DREDge MotionEstimate object.

overwrite: bool = False

Ignore and overwrite stored results, if any. Otherwise, dartsort will try to resume from the last step that ran, or do nothing if it had already finished.

Returns:

results: DARTsortReturn

Dictionary of sorting results, with keys:

  • "sorting": DARTsortSorting
  • "motion": MotionInfo

The return value from the dartsort() function is a DARTsortReturn object, which is a dictionary containing spike trains and motion information:

dartsort.DARTsortReturn

sorting instance-attribute

sorting: DARTsortSorting

Output spike trains.

motion instance-attribute

motion: MotionInfo

Estimated motion.
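
As a rough sketch of how the pieces above fit together, the snippet below wraps a call to dartsort() and unpacks the two documented keys of the result. The helper name run_dartsort is hypothetical, and dartsort is imported inside the function so that the snippet can be loaded without the package installed. The recording is assumed to be a SpikeInterface BaseRecording that you have already loaded.

```python
from pathlib import Path


def run_dartsort(recording, output_dir, overwrite=False):
    """Hypothetical helper: sort `recording` and return (sorting, motion)."""
    # Imported lazily so this sketch loads even where dartsort is not installed.
    import dartsort

    # cfg is left at its default value here.
    results = dartsort.dartsort(recording, Path(output_dir), overwrite=overwrite)

    # The documented return value is a dictionary with these two keys.
    return results["sorting"], results["motion"]
```

Calling run_dartsort(recording, "my_sorting_output") a second time with overwrite=False should, per the description above, resume or do nothing rather than recompute.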

The spike trains and motion info are stored in some internal objects, described in the last section on this page. Up next, we'll discuss how to adjust dartsort's parameters.

Configuration

For details on parameters you should know about before running the sorter, see the important configuration details note. Here, we'll show a reference for all of the configuration options. Some of the important ones not mentioned in that note include: the voltage threshold for initial spike detection and the "energy" thresholds for spike detection in the initial and template matching passes. You may also want to take a look at the parameters for

dartsort.DARTsortUserConfig

User-facing configuration options

To change dartsort's behavior, set parameters here and pass the object to the dartsort() function.
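
A minimal sketch of that workflow follows. It assumes that DARTsortUserConfig accepts its documented fields as keyword arguments, which this page suggests but does not state outright. The helper name build_config and the particular override values are illustrative only.

```python
# Illustrative overrides; every key is a field documented in this reference.
overrides = {
    "do_motion_estimation": False,  # e.g. recording is already motion-corrected
    "n_jobs_cpu": -1,               # use all CPU cores
    "voltage_threshold": 3.5,       # slightly stricter initial detection
}


def build_config(overrides):
    """Hypothetical helper: build a DARTsortUserConfig from keyword overrides."""
    # Imported lazily so this sketch loads even where dartsort is not installed.
    from dartsort import DARTsortUserConfig

    # Assumption: the config takes its fields as keyword arguments.
    return DARTsortUserConfig(**overrides)
```

The resulting object would then be passed along as dartsort(recording, output_dir, cfg=build_config(overrides)).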

do_motion_estimation class-attribute instance-attribute

do_motion_estimation: bool = True

Set this to false if your data is super stable or already motion-corrected.

preprocessing class-attribute instance-attribute

preprocessing: PreprocessingStrategy = 'none'

If other than 'none', dartsort will apply some preprocessing to the recording. Leave as 'none' if you are passing in an already-preprocessed recording. In that case, be aware that dartsort expects its input to be standardized on each channel in addition to the usual highpass filtering, but that whitening is handled internally. See util/preprocess_util.py if you're curious about the details of the methods.

Options: 'ibllikecmr', 'ibllike', 'standardize', 'none'

preprocessing_dtype class-attribute instance-attribute

preprocessing_dtype: Literal['float16', 'float32'] = 'float32'

If you have a lot of data and you're using a workflow where it is important to save a preprocessed copy of the recording, float16 is a good option. Only relevant if preprocessing != 'none'. If the recording isn't getting saved, stick to float32.

subsampling_spikes class-attribute instance-attribute

subsampling_spikes: int | None = 2048000

Detection steps before the final matching round will run until at least this many spikes are found or the whole recording is covered, to make sure that there is enough data for clustering. See also subsampling_presence. Set to None to disable subsampling.

subsampling_presence class-attribute instance-attribute

subsampling_presence: float = 0.1

Early detection steps which have already found subsampling_spikes spikes are only allowed to end early if they additionally cover this fraction of the recording, to make sure there's good coverage of conditions for template estimation.
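
The interaction between subsampling_spikes and subsampling_presence can be read as a two-part stopping rule. The function below is only a sketch of that rule as described here, not dartsort's internal code; the name may_stop_early and its arguments are made up for illustration.

```python
def may_stop_early(
    n_spikes_found,
    fraction_covered,
    subsampling_spikes=2_048_000,
    subsampling_presence=0.1,
):
    """Sketch: may an early detection step stop before the recording ends?"""
    # With subsampling disabled, the whole recording is always processed.
    if subsampling_spikes is None:
        return False

    # Both conditions must hold: enough spikes AND enough of the recording seen.
    enough_spikes = n_spikes_found >= subsampling_spikes
    enough_coverage = fraction_covered >= subsampling_presence
    return enough_spikes and enough_coverage
```

For example, a very active recording that yields three million spikes in its first 5% would keep going until 10% of the recording had been covered.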

matching_iterations class-attribute instance-attribute

matching_iterations: int = 1

By default, 1 template matching step is carried out using templates estimated from the initial detection round.

dredge_only class-attribute instance-attribute

dredge_only: bool = False

Whether to stop after initial localization and motion tracking.

n_jobs_cpu class-attribute instance-attribute

n_jobs_cpu: int = 0

Number of parallel workers to use when running on CPU. 0 means everything runs on the main thread; negative means #cpu + (val + 1), so that -1 is all cores, -2 is all but one, etc.
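
The negative-value convention (-1 is all cores, -2 is all but one) is the same one used by tools like joblib. Here is a small sketch of how such a value could be resolved to a worker count. The function resolve_n_jobs is hypothetical and is not part of dartsort; the clamp to at least one worker is an added assumption.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Sketch: turn an n_jobs setting into a concrete worker count."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1

    if n_jobs < 0:
        # -1 -> n_cpus, -2 -> n_cpus - 1, and so on.
        # Clamping to at least 1 is an assumption made for this sketch.
        return max(1, n_cpus + n_jobs + 1)

    # 0 means "run on the main thread"; positive values are used as given.
    return n_jobs


print(resolve_n_jobs(-1, n_cpus=8), resolve_n_jobs(-2, n_cpus=8))
```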

n_jobs_gpu class-attribute instance-attribute

n_jobs_gpu: int = 0

Number of parallel workers to use when running on GPU.

n_jobs_small class-attribute instance-attribute

n_jobs_small: int = -2

Max workers to use for small jobs.

n_jobs_small_gpu class-attribute instance-attribute

n_jobs_small_gpu: int = 4

Max workers to use for small jobs running on GPU.

device class-attribute instance-attribute

device: str | None = None

The name of the PyTorch device to use. For example, 'cpu' or 'cuda' or 'cuda:1'. If unset, dartsort uses n_jobs_gpu of your CUDA GPUs if you have several, the single GPU if you have just one, and otherwise your CPU.

executor class-attribute instance-attribute

executor: str = 'threading_unless_multigpu'

Choose: 'threading_unless_multigpu', 'ThreadPoolExecutor', 'ProcessPoolExecutor', or some others.

chunk_length_samples class-attribute instance-attribute

chunk_length_samples: int = 30000

Batch size for data processing.

work_in_tmpdir class-attribute instance-attribute

work_in_tmpdir: bool = False

If True, dartsort will store all temporary data in a scratch directory in tmpdir_parent or TMPDIR.

copy_recording_to_tmpdir class-attribute instance-attribute

copy_recording_to_tmpdir: bool = False

Save a copy of the preprocessed recording to a tmpdir?

workdir_copier class-attribute instance-attribute

workdir_copier: Literal['shutil', 'rsync'] = 'shutil'

'shutil' or 'rsync'

workdir_follow_symlinks class-attribute instance-attribute

workdir_follow_symlinks: bool = False

tmpdir_parent class-attribute instance-attribute

tmpdir_parent: str | None = None

Control where tmpdirs are created.

save_intermediates class-attribute instance-attribute

save_intermediates: bool = False

Store all spike features from intermediate steps (for debugging)

save_final_features class-attribute instance-attribute

save_final_features: bool = True

Store the spike features from the final step (instead of just basic spike train outputs).

ms_before class-attribute instance-attribute

ms_before: float = 1.4

Length of time (ms) before trough (or peak) in waveform snippets. Default value corresponds to 42 samples at 30kHz.

ms_after class-attribute instance-attribute

ms_after: float = 2.6 + 0.1 / 3

Length of time (ms) after trough (or peak) in waveform snippets. Default value corresponds to 79 samples at 30kHz.
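
The sample counts quoted for ms_before and ms_after can be checked with a quick conversion. This is just arithmetic, sketched under the assumption that durations are rounded to the nearest sample; dartsort's own rounding rule may differ, and ms_to_samples is a made-up helper name.

```python
def ms_to_samples(duration_ms, sampling_frequency=30_000.0):
    """Convert a duration in milliseconds to a whole number of samples."""
    # Assumption: round to the nearest sample.
    return round(duration_ms * sampling_frequency / 1000.0)


ms_before = 1.4
ms_after = 2.6 + 0.1 / 3  # the default, written the same way as in the config

samples_before = ms_to_samples(ms_before)
samples_after = ms_to_samples(ms_after)
print(samples_before, samples_after)  # 42 79
```

At a different sampling rate the same millisecond settings give different sample counts, for example ms_to_samples(1.4, 20_000.0) is 28.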

alignment_ms class-attribute instance-attribute

alignment_ms: float = 1.5

Largest time shift allowed when re-aligning events.

peak_sign class-attribute instance-attribute

peak_sign: Literal['neg', 'both', 'pos'] = 'both'

Which events to allow when detecting threshold crossings during initialization: only troughs ('neg'), events of both signs ('both'), or positive only ('pos'), if that's your thing.

voltage_threshold class-attribute instance-attribute

voltage_threshold: float = 3.0

Threshold in standardized (SNR) voltage units for initial detection; peaks or troughs larger than this value will be grabbed.
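
To make the roles of voltage_threshold and peak_sign concrete, here is a toy single-channel detector on an already-standardized trace. It is a simplified sketch and not dartsort's detector, which works across channels and time. The function name and the example trace are invented for illustration.

```python
def detect_threshold_crossings(trace, voltage_threshold=3.0, peak_sign="both"):
    """Sketch: indices of local extrema whose size exceeds the threshold."""
    hits = []
    for i in range(1, len(trace) - 1):
        v = trace[i]
        # A trough is a local minimum that is more negative than -threshold.
        is_trough = v < trace[i - 1] and v <= trace[i + 1] and -v > voltage_threshold
        # A peak is a local maximum that is larger than +threshold.
        is_peak = v > trace[i - 1] and v >= trace[i + 1] and v > voltage_threshold

        if is_trough and peak_sign in ("neg", "both"):
            hits.append(i)
        elif is_peak and peak_sign in ("pos", "both"):
            hits.append(i)
    return hits


# Standardized units: noise has roughly unit standard deviation.
trace = [0.1, -0.5, -4.2, -1.0, 0.3, 3.5, 0.2, -2.0, 0.0]
print(detect_threshold_crossings(trace))                    # [2, 5]
print(detect_threshold_crossings(trace, peak_sign="neg"))   # [2]
```

The trough of size 2.0 at index 7 is skipped in both calls because it does not exceed the threshold of 3.0.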

matching_threshold class-attribute instance-attribute

matching_threshold: float = 8.0

Template matching threshold. If subtracting a template leads to at least this great of a decrease in the norm of the residual, that match will be used. This is in the same units as the corresponding threshold in Kilosort and other sorters, and it represents reduction in Euclidean norm of standardized data due to matching a new event.

initial_threshold class-attribute instance-attribute

initial_threshold: float = 10.0

Initial detection's neural net matching threshold. Same as matching_threshold, except that a neural net is trying to guess the true waveforms here, rather than using cluster templates.

motion_voltage_threshold class-attribute instance-attribute

motion_voltage_threshold: float = 4.0

If subsampling, a quick thresholding will be run at this voltage threshold to grab spikes for motion estimation purposes.

temporal_pca_rank class-attribute instance-attribute

temporal_pca_rank: int = 8

Rank of temporal PCAs used in denoising and featurization.

feature_ms_before class-attribute instance-attribute

feature_ms_before: float = 0.75

As ms_before, but used only when computing PCA features in clustering.

feature_ms_after class-attribute instance-attribute

feature_ms_after: float = 1.25

As ms_after, but used only when computing PCA features in clustering.

subtraction_radius_um class-attribute instance-attribute

subtraction_radius_um: float = 200.0

Radius of neighborhoods around spike events extracted when denoising and subtracting NN-denoised events.

deduplication_radius_um class-attribute instance-attribute

deduplication_radius_um: float = 50.0

During initial detection, if two spike events occur at the same time within this radius, then the smaller of the two is ignored. This also discards the detections on all of the secondary channels of the larger event, which is important.
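
The deduplication rule can be sketched as a greedy pass over events from largest to smallest. This is an illustration under simplifying assumptions, not dartsort's implementation: "same time" is taken to mean the exact same sample, and the event tuples and probe geometry are made up.

```python
import math


def deduplicate(events, geom, radius_um=50.0):
    """Sketch: keep only the largest event among simultaneous nearby events.

    events: list of (time_sample, channel, amplitude) tuples
    geom:   list of (x_um, y_um) channel positions, indexed by channel
    """
    kept = []
    # Visit the largest events first so that they win over smaller neighbors.
    for t, chan, amp in sorted(events, key=lambda e: -abs(e[2])):
        is_duplicate = any(
            t == kept_t and math.dist(geom[chan], geom[kept_chan]) <= radius_um
            for kept_t, kept_chan, _ in kept
        )
        if not is_duplicate:
            kept.append((t, chan, amp))
    return sorted(kept)


geom = [(0.0, 0.0), (0.0, 20.0), (0.0, 40.0), (0.0, 200.0)]
events = [(100, 0, 5.0), (100, 1, 9.0), (100, 3, 4.0), (250, 1, 6.0)]
print(deduplicate(events, geom))
```

Here the event on channel 0 at sample 100 is dropped as a secondary-channel copy of the larger event on channel 1, while the simultaneous event on channel 3 survives because it is 180 um away.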

featurization_radius_um class-attribute instance-attribute

featurization_radius_um: float = 100.0

Radius around detection channel or template peak channel used to extract spike features for clustering.

fit_radius_um class-attribute instance-attribute

fit_radius_um: float = 75.0

Extraction radius when fitting features like PCA; smaller than other radii to include less noise.

localization_radius_um class-attribute instance-attribute

localization_radius_um: float = 100.0

Radius around main channel used when localizing spikes.

nn_denoiser_class_name class-attribute instance-attribute

nn_denoiser_class_name: Literal['SingleChannelWaveformDenoiser', 'Decollider'] = 'Decollider'

Which neural net to use in initial detection? Set to Decollider (and set the pretrained path to None) to train a brand-new unsupervised denoiser.

nn_denoiser_pretrained_path class-attribute instance-attribute

nn_denoiser_pretrained_path: str | None = None

Path to a pytorch saved model (.pt file as dumped by torch.save()). If this is None, a new model will be fit.

amplitude_scaling_stddev class-attribute instance-attribute

amplitude_scaling_stddev: float = 0.01

Standard deviation of amplitude scaling regularization prior in template matching.

amplitude_scaling_boundary class-attribute instance-attribute

amplitude_scaling_boundary: float = 1.0 / 3.0

Boundaries on the amount of scaling allowed.

temporal_upsamples class-attribute instance-attribute

temporal_upsamples: int = 4

Upsampling of templates during matching to allow for temporal aliasing of waveforms.

rigid class-attribute instance-attribute

rigid: bool = False

Use rigid registration and ignore the window parameters.

probe_boundary_padding_um class-attribute instance-attribute

probe_boundary_padding_um: float = 100.0

spatial_bin_length_um class-attribute instance-attribute

spatial_bin_length_um: float = 1.0

temporal_bin_length_s class-attribute instance-attribute

temporal_bin_length_s: float = 1.0

smoothing_um class-attribute instance-attribute

smoothing_um: float | None = 3.0

smoothing_s class-attribute instance-attribute

smoothing_s: float | None = None

window_step_um class-attribute instance-attribute

window_step_um: float = 400.0

window_scale_um class-attribute instance-attribute

window_scale_um: float = 600.0

window_margin_um class-attribute instance-attribute

window_margin_um: float | None = None

max_dt_s class-attribute instance-attribute

max_dt_s: float = 500.0

max_disp_um class-attribute instance-attribute

max_disp_um: float | None = None

correlation_threshold class-attribute instance-attribute

correlation_threshold: float = 0.1

min_amplitude class-attribute instance-attribute

min_amplitude: float | None = None

speed_limit_um_per_s class-attribute instance-attribute

speed_limit_um_per_s: float = 500.0

Motion bins exceeding this speed will be replaced by interpolation.

max_dist_from_median_um class-attribute instance-attribute

max_dist_from_median_um: float = 250.0

Motion bins farther than this from the local median will be replaced by interpolation.

median_neighborhood_bins class-attribute instance-attribute

median_neighborhood_bins: int = 51
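
The outlier handling described for max_dist_from_median_um can be pictured with a small sketch. It assumes that median_neighborhood_bins is the width of the window used for the local median, which this page does not confirm, and it uses simple linear interpolation between the nearest good bins. The function name is hypothetical and this is not dartsort's code.

```python
from statistics import median


def replace_outlier_bins(displacement_um, max_dist_um=250.0, neighborhood_bins=51):
    """Sketch: interpolate over bins that are far from their local median."""
    n = len(displacement_um)
    half = neighborhood_bins // 2

    # Flag each bin as good if it is close enough to the median of its window.
    good = []
    for i, d in enumerate(displacement_um):
        window = displacement_um[max(0, i - half): i + half + 1]
        good.append(abs(d - median(window)) <= max_dist_um)
    good_idx = [i for i in range(n) if good[i]]

    out = list(displacement_um)
    for i in range(n):
        if good[i]:
            continue
        left = max((j for j in good_idx if j < i), default=None)
        right = min((j for j in good_idx if j > i), default=None)
        if left is None and right is None:
            continue  # nothing to interpolate from
        if left is None:
            out[i] = displacement_um[right]
        elif right is None:
            out[i] = displacement_um[left]
        else:
            # Linear interpolation between the nearest good neighbors.
            w = (i - left) / (right - left)
            out[i] = (1 - w) * displacement_um[left] + w * displacement_um[right]
    return out


trace = [0.0, 1.0, 2.0, 500.0, 4.0, 5.0, 6.0]
print(replace_outlier_bins(trace, neighborhood_bins=5))
```

In this toy trace only the 500 um jump is flagged, and it is replaced by the midpoint of its two neighbors.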

Data objects

dartsort uses some internal classes to represent spike trains and other data. Spike trains and motion are represented in the following objects.

dartsort.DARTsortSorting

Class which holds spike times, channels, and labels

This class holds our algorithm state. Initially the sorter doesn't have unit labels, so these are optional. Export me to a SpikeInterface NumpySorting with .to_numpy_sorting()

When you instantiate this with from_peeling_hdf5, if the flag load_simple_features is True (default), then additional features of spikes will be loaded into memory -- like localizations, which you can access like sorting.point_source_localizations[...].

__init__

__init__(*, times_samples: ndarray, channels: ndarray, labels: ndarray | None, parent_h5_path: str | Path | None = None, sampling_frequency: float | int = 30000.0, persistent_features: dict[str, ndarray] | None = None, ephemeral_features: dict[str, ndarray] | None = None)

Construct a DARTsortSorting directly from times, channels, labels, et cetera.

It's more common to construct from an HDF5 file with .from_peeling_hdf5() or from a .npz with .load().

to_numpy_sorting

to_numpy_sorting() -> NumpySorting

Clean up and produce a spikeinterface NumpySorting object.

to_pandas

to_pandas(include_1d_features=True, extract_location_if_possible=True)

Export to pandas DataFrame with some per-spike features.
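
As a sketch of a typical export path, the helper below loads a saved sorting and converts it with the two methods just described. The helper name export_sorting is hypothetical, and dartsort is imported lazily so that the snippet loads without the package installed. The path argument is assumed to point at a file written by .save(), usually dartsort_sorting.npz.

```python
def export_sorting(sorting_npz):
    """Hypothetical helper: load a saved sorting and export it two ways."""
    # Imported lazily so this sketch loads even where dartsort is not installed.
    from dartsort import DARTsortSorting

    sorting = DARTsortSorting.load(sorting_npz)

    # SpikeInterface object, for use with the rest of that ecosystem.
    numpy_sorting = sorting.to_numpy_sorting()

    # Per-spike table with the default feature options.
    spike_table = sorting.to_pandas()

    return numpy_sorting, spike_table
```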

to_tsgroup

to_tsgroup(metadata=None, add_feature_mean_metadata=True, weight_key='soft_assignment_weight')

Export to pynapple.TsGroup.

If there is a weight_key feature, this will produce a TsGroup with Tsd entries. Else, regular Ts.

copy

copy() -> Self

Shallow copy. Doesn't copy data, but copies references and internal state.

ephemeral_replace

ephemeral_replace(*, check_shapes=True, **new_features: ndarray) -> Self

Return a shallow copy of self with certain datasets/features replaced by new_features.

has_persistent_labels

has_persistent_labels() -> bool

Are my .labels those from the hdf5 file?

add_ephemeral_feature

add_ephemeral_feature(feature_name: str, feature: ndarray, check_shape: bool | None = None, overwrite=False)

Ephemeral features are accessible as properties and persisted to/from .npz, but not saved in the .h5.

add_feature

add_feature(feature_name: str, feature: ndarray, check_shape: bool | None = None)

Try to save a feature to h5, else register as ephemeral.

from_peeling_hdf5 classmethod

from_peeling_hdf5(h5_path: str | Path, *, times_samples_dataset='times_samples', channels_dataset='channels', labels_dataset='labels', load_feature_names: Sequence[str] | None = None, load_simple_features=True, load_all_features=False, allow_missing=False) -> Self

Load sorting from .hdf5 format saved by peelers

Parameters:

load_feature_names: optional list of str = None

Load exactly these features, plus geom/channel index.

load_simple_features: bool = True

If load_feature_names is unspecified, load all scalar or vector features (per spike), but no matrix-valued features like waveforms or multi-channel PCA features.

load_all_features: bool = False

save

save(sorting_npz: str | Path)

Save to npz (usually dartsort_sorting.npz)

Support persisting myself in non-h5-supportable cases. Cases:

  • When there is no h5!
  • When I have new labels.

This is done by saving to .npz, with a pointer (like a relative symlink) to the .h5 file if it exists.

load classmethod

load(sorting_npz, additional_persistent_features=None, load_ephemeral_feature_names=None, load_persistent_feature_names=None) -> Self

Load from npz (usually dartsort_sorting.npz).

drop_missing

drop_missing() -> Self

Remove spikes with -1 labels.

drop_doubles

drop_doubles()

Remove spikes detected at the exact same time assigned to the same unit.

flatten

flatten() -> Self

Flatten the unit IDs so that there are no gaps in the sorted unique label set.
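
To illustrate what the three cleanup methods above do to a spike train, here is a plain-Python sketch operating on lists of times and labels. These free functions only mimic the documented behavior; the real methods live on DARTsortSorting and return new sorting objects. In this sketch flatten is applied after drop_missing, so it never sees a -1 label.

```python
def drop_missing(times, labels):
    """Remove spikes whose label is -1."""
    keep = [i for i, label in enumerate(labels) if label != -1]
    return [times[i] for i in keep], [labels[i] for i in keep]


def drop_doubles(times, labels):
    """Remove repeated (time, unit) pairs, keeping the first occurrence."""
    seen = set()
    out_times, out_labels = [], []
    for t, label in zip(times, labels):
        if (t, label) in seen:
            continue
        seen.add((t, label))
        out_times.append(t)
        out_labels.append(label)
    return out_times, out_labels


def flatten(labels):
    """Relabel units as 0, 1, 2, ... in sorted order, leaving no gaps."""
    remap = {old: new for new, old in enumerate(sorted(set(labels)))}
    return [remap[label] for label in labels]


times = [10, 10, 25, 40, 40, 60]
labels = [2, 2, -1, 7, 5, 7]

times, labels = drop_missing(times, labels)
times, labels = drop_doubles(times, labels)
labels = flatten(labels)
print(times, labels)  # [10, 40, 40, 60] [0, 2, 1, 2]
```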

dartsort.MotionInfo

Holds motion-related info and helper functions.

drifting instance-attribute

drifting: bool

Was do_motion_estimation set, or are we ignoring motion?

geom instance-attribute

geom: ndarray

The recording's get_channel_locations() array.

rgeom instance-attribute

rgeom: ndarray

The extended "registered geometry" array

dredge_motion_est instance-attribute

dredge_motion_est: MotionEstimate | None

Motion estimate from DREDge, if using.

si_motion instance-attribute

si_motion: Motion | None

Motion estimate from SpikeInterface, if using.

geom_kdt instance-attribute

geom_kdt: KDTree

k-d tree for querying original geometry.

rgeom_kdt instance-attribute

rgeom_kdt: KDTree

k-d tree for querying registered geometry.

min_dist instance-attribute

min_dist: float

k-d tree query bound.

pitch instance-attribute

pitch: float

Vertical probe geometry period.

from_motion_est classmethod

from_motion_est(*, geom: ndarray | Tensor, dredge_motion_est: MotionEstimate | None = None, si_motion: Motion | None = None, rgeom: ndarray | Tensor | None = None) -> Self

Main constructor for MotionInfo objects

Precomputes and saves motion-related data structures for use through all of dartsort. Notably, the probe pitch, the min inter-channel distance, and the "registered geometry". Also, k-d trees which are used everywhere.

If neither dredge_motion_est nor si_motion is supplied, drifting is set to False and there is assumed to be no motion.

uncorrect_s

uncorrect_s(times_s: float | ndarray, reg_depths_um: ndarray) -> ndarray

Attempt to invert the motion estimate to un-register reg_depths_um.

pitch_shifts

pitch_shifts(*, sorting: DARTsortSorting | None = None, times_s: ndarray | None = None, depths_um: ndarray | None = None, reg_depths_um: ndarray | None = None, shift_mode: Literal['round', 'floor'] = 'round', motion_depth_mode: Literal['channel', 'localization'] = 'channel', localizations_dset='point_source_localizations') -> tuple[ndarray, ndarray]

Figure out coarse pitch shifts based on spike positions

Determine the number of pitches the probe would need to shift in order to coarsely align a waveform to its registered position.
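
The idea of a coarse pitch shift can be sketched for a single spike as the displacement between its observed and registered depth, measured in whole probe pitches. The sign convention and the helper name below are assumptions made for illustration; the real method works on arrays and returns a tuple of arrays.

```python
import math


def coarse_pitch_shift(depth_um, reg_depth_um, pitch_um, shift_mode="round"):
    """Sketch: number of whole pitches between observed and registered depth."""
    # Assumed sign convention: positive when the registered depth is larger.
    n_pitches = (reg_depth_um - depth_um) / pitch_um

    if shift_mode == "round":
        # Note: Python's round() sends exact halves to the nearest even integer.
        return round(n_pitches)
    if shift_mode == "floor":
        return math.floor(n_pitches)
    raise ValueError(f"Unknown shift_mode: {shift_mode!r}")


# A spike observed at 100 um whose registered depth is 165 um, 40 um pitch.
print(coarse_pitch_shift(100.0, 165.0, 40.0))                      # 2
print(coarse_pitch_shift(100.0, 165.0, 40.0, shift_mode="floor"))  # 1
```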