Main API: `dartsort()` and the `DARTsortUserConfig`¶

This page shows the main functions and objects that you'd run into when spike sorting with dartsort. If you're just getting started, the usage section on the front page might be good to read first.

For details on parameters you should think about before running the sorter, see the important configuration details note.

Main function: `dartsort()`¶

dartsort.dartsort ¶

dartsort(recording: BaseRecording, output_dir: str | Path, cfg: DARTsortUserConfig | str | Path | DeveloperConfig | DARTsortInternalConfig = default_dartsort_cfg, motion: MotionInfo | None = None, si_motion: Motion | None = None, dredge_motion_est: MotionEstimate | None = None, overwrite=False)

This function runs a spike sorter called dartsort.

Parameters:

Name	Type	Description	Default
`recording`	`BaseRecording`	A SpikeInterface `BaseRecording` object	required
`output_dir`	`str or Path`	Folder where outputs are stored	required
`cfg`	`DARTsortUserConfig or DARTsortInternalConfig or str or Path`	Your settings. Either create a `DARTsortUserConfig` directly in code, or you can pass a string or Path pointing to a .toml file here.	`default_dartsort_cfg`
`si_motion`	`Motion`	Allows users to pass their own external motion estimate as a SpikeInterface `Motion` object.	`None`
`dredge_motion_est`	`MotionEstimate`	Allows users to pass their own external motion estimate as a DREDge `MotionEstimate` object.	`None`
`overwrite`	`bool`	Ignore and overwrite stored results, if any. Otherwise, dartsort will try to resume from the last step that ran, or if it had finished then it will do nothing.	`False`

Returns:

Name	Type	Description
`results`	`DARTsortReturn`	Dictionary of sorting results, with keys "sorting" (a `DARTsortSorting`) and "motion" (a `MotionInfo`).
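
As a minimal sketch of how the call and its return value fit together, the snippet below wraps `dartsort()` in a small helper. The helper name `sort_recording` is hypothetical, and it assumes that dartsort is installed and that `recording` is a SpikeInterface `BaseRecording` you have already loaded. The import is deferred so that defining the helper does not itself require the package.

```python
from pathlib import Path


def sort_recording(recording, output_dir, overwrite=False):
    """Run dartsort with default settings and unpack the result (hypothetical helper)."""
    # Deferred import: only needed when the helper is actually called.
    from dartsort import dartsort

    results = dartsort(recording, Path(output_dir), overwrite=overwrite)

    # Per the Returns table, the result has "sorting" and "motion" entries.
    sorting = results["sorting"]  # DARTsortSorting
    motion = results["motion"]    # MotionInfo

    # Convert to a SpikeInterface NumpySorting for downstream tools.
    return sorting.to_numpy_sorting(), motion
```

With `overwrite=False`, a repeated call on the same `output_dir` should resume or do nothing, as described in the parameter table above.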

The return value from the dartsort() function is a DARTsortReturn object, which is a dictionary containing spike trains and motion information:

dartsort.DARTsortReturn ¶

sorting `instance-attribute` ¶

sorting: DARTsortSorting

Output spike trains.

motion `instance-attribute` ¶

motion: MotionInfo

Estimated motion.

The spike trains and motion info are stored in some internal objects, described in the last section on this page. Up next, we'll discuss how to adjust dartsort's parameters.

Configuration¶

For details on parameters you should know about before running the sorter, see the important configuration details note. Here, we'll show a reference for all of the configuration options. Some of the important ones not mentioned in that note include: the voltage threshold for initial spike detection and the "energy" thresholds for spike detection in the initial and template matching passes. You may also want to take a look at the parameters for

dartsort.DARTsortUserConfig ¶

User-facing configuration options

To change dartsort's behavior, set parameters here and pass the object to the dartsort() function.
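
The two ways of supplying settings can be sketched as follows. This is an illustration under assumptions, not a definitive recipe: it assumes `DARTsortUserConfig` accepts its documented fields as keyword arguments, and that a .toml config file uses flat `key = value` entries named after those fields. The names `USER_OVERRIDES`, `build_config`, and `to_toml` are hypothetical, and the chosen values are only examples.

```python
# Overrides for a stable, already-preprocessed recording (illustrative values).
USER_OVERRIDES = {
    "do_motion_estimation": False,
    "preprocessing": "none",
    "matching_threshold": 8.0,
    "n_jobs_cpu": -1,
}


def build_config(overrides=None):
    """Create a DARTsortUserConfig in code (assumes keyword construction works)."""
    from dartsort import DARTsortUserConfig  # deferred: needs dartsort installed

    return DARTsortUserConfig(**(USER_OVERRIDES if overrides is None else overrides))


def to_toml(overrides):
    """Serialize flat scalar settings as TOML `key = value` lines."""
    lines = []
    for key, value in overrides.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            rendered = "true" if value else "false"
        elif isinstance(value, str):
            rendered = f'"{value}"'
        else:
            rendered = repr(value)
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines) + "\n"


# Text that could be written to a .toml file and passed as cfg (assumed layout).
print(to_toml(USER_OVERRIDES))
```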

do_motion_estimation `class-attribute` `instance-attribute` ¶

do_motion_estimation: bool = True

Set this to false if your data is super stable or already motion-corrected.

preprocessing `class-attribute` `instance-attribute` ¶

preprocessing: PreprocessingStrategy = 'none'

If other than 'none', dartsort will apply some preprocessing to the recording. Leave as 'none' if you are passing in an already-preprocessed recording. If so, be aware that dartsort expects its input to be standardized on each channel in addition to the usual highpass filtering, but that whitening is handled internally. See util/preprocess_util.py if you're curious about the details of the methods.

Options: 'ibllikecmr', 'ibllike', 'standardize', 'none'

preprocessing_dtype `class-attribute` `instance-attribute` ¶

preprocessing_dtype: Literal['float16', 'float32'] = 'float32'

If you have a lot of data and you're using a workflow where it is important to save a preprocessed copy of the recording, float16 is a good option. Only relevant if preprocessing != 'none'. If the recording isn't getting saved, stick to float32.

subsampling_spikes `class-attribute` `instance-attribute` ¶

subsampling_spikes: int | None = 2048000

Detection steps before the final matching round will run until at least this many spikes are found or the whole recording is covered, to make sure that there is enough data for clustering. See also subsampling_fraction. Set to None to disable subsampling.

subsampling_presence `class-attribute` `instance-attribute` ¶

subsampling_presence: float = 0.1

Early detection steps which have already found subsampling_spikes spikes are only allowed to end early if they additionally cover this fraction of the recording, to make sure there's good coverage of conditions for template estimation.
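
One way to read the interaction between `subsampling_spikes` and `subsampling_presence` is as a two-part stopping rule. The function below is a sketch of that reading, not dartsort's implementation; the name `may_stop_early` and its arguments `n_spikes` and `fraction_covered` are hypothetical.

```python
def may_stop_early(n_spikes, fraction_covered,
                   subsampling_spikes=2_048_000, subsampling_presence=0.1):
    """Return True if an early detection step could end before the recording does."""
    if subsampling_spikes is None:
        # Subsampling disabled: the step covers the whole recording.
        return False
    enough_spikes = n_spikes >= subsampling_spikes
    enough_coverage = fraction_covered >= subsampling_presence
    return enough_spikes and enough_coverage


# Plenty of spikes, but only 5% of the recording seen so far: keep going.
print(may_stop_early(3_000_000, 0.05))
# Plenty of spikes and 20% coverage: allowed to stop.
print(may_stop_early(3_000_000, 0.20))
```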

matching_iterations `class-attribute` `instance-attribute` ¶

matching_iterations: int = 1

By default, 1 template matching step is carried out using templates estimated from the initial detection round.

dredge_only `class-attribute` `instance-attribute` ¶

dredge_only: bool = False

Whether to stop after initial localization and motion tracking.

n_jobs_cpu `class-attribute` `instance-attribute` ¶

n_jobs_cpu: int = 0

Number of parallel workers to use when running on CPU. 0 means everything runs on the main thread; negative means #cpu + (val + 1), so that -1 is all cores, -2 is all but one, etc.
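
The negative-value convention can be made concrete with a short sketch. It follows the examples given above (-1 is all cores, -2 is all but one) and is not taken from dartsort's source; the helper name `resolve_n_jobs` is hypothetical.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Translate an n_jobs setting into a worker count (0 = main thread only)."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs >= 0:
        # 0 keeps everything on the main thread; positive values are used as-is.
        return n_jobs
    # Negative values count back from the number of cores, never below 1.
    return max(1, n_cpus + n_jobs + 1)


for setting in (0, 4, -1, -2):
    print(setting, "->", resolve_n_jobs(setting, n_cpus=8))
```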

n_jobs_gpu `class-attribute` `instance-attribute` ¶

n_jobs_gpu: int = 0

Number of parallel workers to use when running on GPU.

n_jobs_small `class-attribute` `instance-attribute` ¶

n_jobs_small: int = -2

Max workers to use for small jobs.

n_jobs_small_gpu `class-attribute` `instance-attribute` ¶

n_jobs_small_gpu: int = 4

Max workers to use for small jobs running on GPU.

device `class-attribute` `instance-attribute` ¶

device: str | None = None

The name of the PyTorch device to use. For example, 'cpu' or 'cuda' or 'cuda:1'. If unset, uses n_jobs_gpu of your CUDA GPUs if you have multiple, or else just the one, or your CPU.

executor `class-attribute` `instance-attribute` ¶

executor: str = 'threading_unless_multigpu'

Choose: 'threading_unless_multigpu', 'ThreadPoolExecutor', 'ProcessPoolExecutor', or some others.

chunk_length_samples `class-attribute` `instance-attribute` ¶

chunk_length_samples: int = 30000

Batch size for data processing.

work_in_tmpdir `class-attribute` `instance-attribute` ¶

work_in_tmpdir: bool = False

If True, dartsort will store all temporary data in a scratch directory in tmpdir_parent or TMPDIR.

copy_recording_to_tmpdir `class-attribute` `instance-attribute` ¶

copy_recording_to_tmpdir: bool = False

Save a copy of the preprocessed recording to a tmpdir?

workdir_copier `class-attribute` `instance-attribute` ¶

workdir_copier: Literal['shutil', 'rsync'] = 'shutil'

'shutil' or 'rsync'

workdir_follow_symlinks `class-attribute` `instance-attribute` ¶

workdir_follow_symlinks: bool = False

tmpdir_parent `class-attribute` `instance-attribute` ¶

tmpdir_parent: str | None = None

Control where tmpdirs are created.

save_intermediates `class-attribute` `instance-attribute` ¶

save_intermediates: bool = False

Store all spike features from intermediate steps (for debugging)

save_final_features `class-attribute` `instance-attribute` ¶

save_final_features: bool = True

Store the spike features from the final step (instead of just basic spike train outputs).

ms_before `class-attribute` `instance-attribute` ¶

ms_before: float = 1.4

Length of time (ms) before trough (or peak) in waveform snippets. Default value corresponds to 42 samples at 30kHz.

ms_after `class-attribute` `instance-attribute` ¶

ms_after: float = 2.6 + 0.1 / 3

Length of time (ms) after trough (or peak) in waveform snippets. Default value corresponds to 79 samples at 30kHz.
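
The sample counts quoted for `ms_before` and `ms_after` follow from a simple unit conversion, sketched below. The helper `ms_to_samples` is hypothetical, and rounding to the nearest sample is an assumption; the rounding rule dartsort uses internally is not stated here.

```python
def ms_to_samples(ms, sampling_frequency=30_000.0):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * sampling_frequency / 1000.0))


MS_BEFORE = 1.4            # default ms_before
MS_AFTER = 2.6 + 0.1 / 3   # default ms_after

print(ms_to_samples(MS_BEFORE))  # 42
print(ms_to_samples(MS_AFTER))   # 79
```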

alignment_ms `class-attribute` `instance-attribute` ¶

alignment_ms: float = 1.5

Largest time shift allowed when re-aligning events.

peak_sign `class-attribute` `instance-attribute` ¶

peak_sign: Literal['neg', 'both', 'pos'] = 'both'

Allow only troughs or events of both signs when detecting threshold crossings during initialization. Or positive only, if that's your thing.

voltage_threshold `class-attribute` `instance-attribute` ¶

voltage_threshold: float = 3.0

Threshold in standardized (SNR) voltage units for initial detection; peaks or troughs larger than this value will be grabbed.
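
How `peak_sign` and `voltage_threshold` combine can be illustrated with a per-sample check. This is only a sketch of the rule as described: the function name `crosses_threshold` is hypothetical, and real detection also involves finding local extrema and deduplication, which are left out.

```python
def crosses_threshold(value, voltage_threshold=3.0, peak_sign="both"):
    """Check one standardized voltage value against the detection threshold."""
    if peak_sign == "neg":
        return -value > voltage_threshold       # troughs only
    if peak_sign == "pos":
        return value > voltage_threshold        # peaks only
    if peak_sign == "both":
        return abs(value) > voltage_threshold   # either sign
    raise ValueError(f"unknown peak_sign: {peak_sign!r}")


print(crosses_threshold(-4.0))                   # trough, default 'both'
print(crosses_threshold(-4.0, peak_sign="pos"))  # trough ignored when 'pos'
```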

matching_threshold `class-attribute` `instance-attribute` ¶

matching_threshold: float = 8.0

Template matching threshold. If subtracting a template decreases the norm of the residual by at least this much, that match will be used. This is in the same units as the corresponding threshold in Kilosort and other sorters, and it represents the reduction in Euclidean norm of standardized data due to matching a new event.

initial_threshold `class-attribute` `instance-attribute` ¶

initial_threshold: float = 10.0

Initial detection's neural net matching threshold. Same as matching_threshold, except that a neural net is trying to guess the true waveforms here, rather than using cluster templates.

motion_voltage_threshold `class-attribute` `instance-attribute` ¶

motion_voltage_threshold: float = 4.0

If subsampling, a quick thresholding will be run at this voltage threshold to grab spikes for motion estimation purposes.

temporal_pca_rank `class-attribute` `instance-attribute` ¶

temporal_pca_rank: int = 8

Rank of temporal PCAs used in denoising and featurization.

feature_ms_before `class-attribute` `instance-attribute` ¶

feature_ms_before: float = 0.75

As ms_before, but used only when computing PCA features in clustering.

feature_ms_after `class-attribute` `instance-attribute` ¶

feature_ms_after: float = 1.25

As ms_after, but used only when computing PCA features in clustering.

subtraction_radius_um `class-attribute` `instance-attribute` ¶

subtraction_radius_um: float = 200.0

Radius of neighborhoods around spike events extracted when denoising and subtracting NN-denoised events.

deduplication_radius_um `class-attribute` `instance-attribute` ¶

deduplication_radius_um: float = 50.0

During initial detection, if two spike events occur at the same time within this radius, then the smaller of the two is ignored. Detections on the secondary channels of the larger event are ignored as well, which is important.

featurization_radius_um `class-attribute` `instance-attribute` ¶

featurization_radius_um: float = 100.0

Radius around detection channel or template peak channel used to extract spike features for clustering.

fit_radius_um `class-attribute` `instance-attribute` ¶

fit_radius_um: float = 75.0

Extraction radius when fitting features like PCA; smaller than other radii to include less noise.

localization_radius_um `class-attribute` `instance-attribute` ¶

localization_radius_um: float = 100.0

Radius around main channel used when localizing spikes.

nn_denoiser_class_name `class-attribute` `instance-attribute` ¶

nn_denoiser_class_name: Literal['SingleChannelWaveformDenoiser', 'Decollider'] = 'Decollider'

Which neural net to use in initial detection? Set to Decollider (and set the pretrained path to None) to train a brand-new unsupervised denoiser.

nn_denoiser_pretrained_path `class-attribute` `instance-attribute` ¶

nn_denoiser_pretrained_path: str | None = None

Path to a pytorch saved model (.pt file as dumped by torch.save()). If this is None, a new model will be fit.

amplitude_scaling_stddev `class-attribute` `instance-attribute` ¶

amplitude_scaling_stddev: float = 0.01

Standard deviation of amplitude scaling regularization prior in template matching.

amplitude_scaling_boundary `class-attribute` `instance-attribute` ¶

amplitude_scaling_boundary: float = 1.0 / 3.0

Boundaries on the amount of scaling allowed.

temporal_upsamples `class-attribute` `instance-attribute` ¶

temporal_upsamples: int = 4

Upsampling of templates during matching to allow for temporal aliasing of waveforms.

rigid `class-attribute` `instance-attribute` ¶

rigid: bool = False

Use rigid registration and ignore the window parameters.

probe_boundary_padding_um `class-attribute` `instance-attribute` ¶

probe_boundary_padding_um: float = 100.0

spatial_bin_length_um `class-attribute` `instance-attribute` ¶

spatial_bin_length_um: float = 1.0

temporal_bin_length_s `class-attribute` `instance-attribute` ¶

temporal_bin_length_s: float = 1.0

smoothing_um `class-attribute` `instance-attribute` ¶

smoothing_um: float | None = 3.0

smoothing_s `class-attribute` `instance-attribute` ¶

smoothing_s: float | None = None

window_step_um `class-attribute` `instance-attribute` ¶

window_step_um: float = 400.0

window_scale_um `class-attribute` `instance-attribute` ¶

window_scale_um: float = 600.0

window_margin_um `class-attribute` `instance-attribute` ¶

window_margin_um: float | None = None

max_dt_s `class-attribute` `instance-attribute` ¶

max_dt_s: float = 500.0

max_disp_um `class-attribute` `instance-attribute` ¶

max_disp_um: float | None = None

correlation_threshold `class-attribute` `instance-attribute` ¶

correlation_threshold: float = 0.1

min_amplitude `class-attribute` `instance-attribute` ¶

min_amplitude: float | None = None

speed_limit_um_per_s `class-attribute` `instance-attribute` ¶

speed_limit_um_per_s: float = 500.0

Motion bins exceeding this speed will be replaced by interpolation.

max_dist_from_median_um `class-attribute` `instance-attribute` ¶

max_dist_from_median_um: float = 250.0

Motion bins farther than this from the local median will be replaced by interpolation.
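
The median-based cleanup can be pictured with a small one-dimensional sketch. It is an interpretation of the description above, assuming a sliding window of `median_neighborhood_bins` bins and linear interpolation between the nearest accepted bins; the function `replace_outlier_bins` is hypothetical and is not dartsort's code.

```python
from statistics import median


def replace_outlier_bins(disp_um, max_dist_from_median_um=250.0, neighborhood_bins=51):
    """Replace bins far from their local median by linear interpolation."""
    n = len(disp_um)
    half = neighborhood_bins // 2
    # A bin is kept if it lies close enough to the median of its neighborhood.
    keep = []
    for i in range(n):
        window = disp_um[max(0, i - half): i + half + 1]
        keep.append(abs(disp_um[i] - median(window)) <= max_dist_from_median_um)
    good = [i for i in range(n) if keep[i]]
    out = list(disp_um)
    if not good:
        return out  # nothing trustworthy to interpolate from
    for i in range(n):
        if keep[i]:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            out[i] = disp_um[right]   # no kept bin before: copy the next one
        elif right is None:
            out[i] = disp_um[left]    # no kept bin after: copy the previous one
        else:
            w = (i - left) / (right - left)
            out[i] = (1 - w) * disp_um[left] + w * disp_um[right]
    return out


trace = [0, 1, 2, 1000, 4, 5, 6]
print(replace_outlier_bins(trace, neighborhood_bins=5))
```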

median_neighborhood_bins `class-attribute` `instance-attribute` ¶

median_neighborhood_bins: int = 51

Data objects¶

dartsort uses some internal classes to represent spike trains and other data. Spike trains and motion are represented in the following objects.

dartsort.DARTsortSorting ¶

Class which holds spike times, channels, and labels

This class holds our algorithm state. Initially the sorter doesn't have unit labels, so these are optional. Export me to a SpikeInterface NumpySorting with .to_numpy_sorting().

When you instantiate this with from_peeling_hdf5, if the flag load_simple_features is True (default), then additional features of spikes will be loaded into memory -- like localizations, which you can access like sorting.point_source_localizations[...].

__init__ ¶

__init__(*, times_samples: ndarray, channels: ndarray, labels: ndarray | None, parent_h5_path: str | Path | None = None, sampling_frequency: float | int = 30000.0, persistent_features: dict[str, ndarray] | None = None, ephemeral_features: dict[str, ndarray] | None = None)

Construct a DARTsortSorting directly from times, channels, labels, et cetera.

It's more common to construct from an HDF5 file with .from_peeling_hdf5() or from a .npz with .load().

to_numpy_sorting ¶

to_numpy_sorting() -> NumpySorting

Clean up and produce a spikeinterface NumpySorting object.

to_pandas ¶

to_pandas(include_1d_features=True, extract_location_if_possible=True)

Export to pandas DataFrame with some per-spike features.

to_tsgroup ¶

to_tsgroup(metadata=None, add_feature_mean_metadata=True, weight_key='soft_assignment_weight')

Export to pynapple.TsGroup.

If there is a weight_key feature, this will produce a TsGroup with Tsd entries. Else, regular Ts.

copy ¶

copy() -> Self

Shallow copy. Doesn't copy data, but copies references and internal state.

ephemeral_replace ¶

ephemeral_replace(*, check_shapes=True, **new_features: ndarray) -> Self

Return a shallow copy of self with certain datasets/features replaced by new_features.

has_persistent_labels ¶

has_persistent_labels() -> bool

Are my .labels those from the hdf5 file?

add_ephemeral_feature ¶

add_ephemeral_feature(feature_name: str, feature: ndarray, check_shape: bool | None = None, overwrite=False)

Ephemeral features are accessible as properties and persisted to/from .npz, but not saved in the .h5.

add_feature ¶

add_feature(feature_name: str, feature: ndarray, check_shape: bool | None = None)

Try to save a feature to h5, else register as ephemeral.

from_peeling_hdf5 `classmethod` ¶

from_peeling_hdf5(h5_path: str | Path, *, times_samples_dataset='times_samples', channels_dataset='channels', labels_dataset='labels', load_feature_names: Sequence[str] | None = None, load_simple_features=True, load_all_features=False, allow_missing=False) -> Self

Load sorting from .hdf5 format saved by peelers

Parameters:

Name	Type	Description	Default
`load_feature_names`	`optional list of str`	Load exactly these features, plus geom/channel index.	`None`
`load_simple_features`	`bool`	If load_feature_names unspecified, load all scalar or vector features (per spike), but no matrix-valued features like waveforms or multi-channel PCA features.	`True`
`load_all_features`	`bool`		`False`
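
A minimal loading sketch, assuming dartsort is installed and that the HDF5 file contains a `point_source_localizations` dataset (the feature name mentioned in the class description above). The helper name `load_localizations` is hypothetical, and the import is deferred so the definition itself does not need the package.

```python
def load_localizations(h5_path):
    """Load a sorting from a peeler HDF5 file and return its localizations."""
    from dartsort import DARTsortSorting  # deferred: needs dartsort installed

    # Ask for exactly one per-spike feature rather than all simple features.
    sorting = DARTsortSorting.from_peeling_hdf5(
        h5_path,
        load_feature_names=["point_source_localizations"],
    )
    # Loaded features are exposed as attributes on the sorting object.
    return sorting.point_source_localizations
```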

save ¶

save(sorting_npz: str | Path)

Save to npz (usually dartsort_sorting.npz)

Support persisting myself in non-h5-supportable cases. Cases:

- When there is no h5!
- When I have new labels.

This is done by saving to .npz, with a pointer (like a relative symlink) to the .h5 file if it exists.

load `classmethod` ¶

load(sorting_npz, additional_persistent_features=None, load_ephemeral_feature_names=None, load_persistent_feature_names=None) -> Self

Load from npz (usually dartsort_sorting.npz).

drop_missing ¶

drop_missing() -> Self

Remove spikes with -1 labels.

drop_doubles ¶

drop_doubles()

Remove spikes detected at the exact same time assigned to the same unit.

flatten ¶

flatten() -> Self

Flatten the unit IDs so that there are no gaps in the sorted unique label set.
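
What `drop_missing`, `drop_doubles`, and `flatten` do to a spike train can be shown on plain lists. This is a pure-Python sketch of the described behavior, not the class's implementation: the real methods operate on a `DARTsortSorting`, and keeping the first of two duplicate spikes is an assumption.

```python
def drop_missing(times, labels):
    """Remove spikes whose label is -1."""
    kept = [(t, l) for t, l in zip(times, labels) if l != -1]
    return [t for t, _ in kept], [l for _, l in kept]


def drop_doubles(times, labels):
    """Remove repeated (time, label) pairs, keeping the first occurrence."""
    seen = set()
    out_times, out_labels = [], []
    for t, l in zip(times, labels):
        if (t, l) not in seen:
            seen.add((t, l))
            out_times.append(t)
            out_labels.append(l)
    return out_times, out_labels


def flatten(labels):
    """Relabel units as 0..K-1 in sorted order; -1 is left untouched (assumed)."""
    mapping = {old: new for new, old in enumerate(sorted({l for l in labels if l >= 0}))}
    return [mapping.get(l, l) for l in labels]


times, labels = [10, 10, 20, 30, 30], [5, 5, -1, 9, 2]
times, labels = drop_missing(times, labels)
times, labels = drop_doubles(times, labels)
print(times, flatten(labels))
```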

dartsort.MotionInfo ¶

Holds motion-related info and helper functions.

drifting `instance-attribute` ¶

drifting: bool

Was do_motion_estimation set, or are we ignoring motion?

geom `instance-attribute` ¶

geom: ndarray

The recording's get_channel_locations() array.

rgeom `instance-attribute` ¶

rgeom: ndarray

The extended "registered geometry" array

dredge_motion_est `instance-attribute` ¶

dredge_motion_est: MotionEstimate | None

Motion estimate from DREDge, if using.

si_motion `instance-attribute` ¶

si_motion: Motion | None

Motion estimate from SpikeInterface, if using.

geom_kdt `instance-attribute` ¶

geom_kdt: KDTree

k-d tree for querying original geometry.

rgeom_kdt `instance-attribute` ¶

rgeom_kdt: KDTree

k-d tree for querying registered geometry.

min_dist `instance-attribute` ¶

min_dist: float

k-d tree query bound.

pitch `instance-attribute` ¶

pitch: float

Vertical probe geometry period.

from_motion_est `classmethod` ¶

from_motion_est(*, geom: ndarray | Tensor, dredge_motion_est: MotionEstimate | None = None, si_motion: Motion | None = None, rgeom: ndarray | Tensor | None = None) -> Self

Main constructor for MotionInfo objects

Precomputes and saves motion-related data structures for use through all of dartsort. Notably, the probe pitch, the min inter-channel distance, and the "registered geometry". Also, k-d trees which are used everywhere.

If neither dredge_motion_est nor si_motion is supplied, drifting is set to False and there is assumed to be no motion.

uncorrect_s ¶

uncorrect_s(times_s: float | ndarray, reg_depths_um: ndarray) -> ndarray

Attempt to invert the motion estimate to un-register reg_depths_um.

pitch_shifts ¶

pitch_shifts(*, sorting: DARTsortSorting | None = None, times_s: ndarray | None = None, depths_um: ndarray | None = None, reg_depths_um: ndarray | None = None, shift_mode: Literal['round', 'floor'] = 'round', motion_depth_mode: Literal['channel', 'localization'] = 'channel', localizations_dset='point_source_localizations') -> tuple[ndarray, ndarray]

Figure out coarse pitch shifts based on spike positions

Determine the number of pitches the probe would need to shift in order to coarsely align a waveform to its registered position.
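
The idea of a coarse pitch shift can be sketched with scalar depths. The sign convention (registered depth minus observed depth) is an assumption, and the helper `coarse_pitch_shift` is hypothetical; the real method works on arrays and returns a tuple of arrays.

```python
import math


def coarse_pitch_shift(depth_um, reg_depth_um, pitch, shift_mode="round"):
    """Number of whole probe pitches between a depth and its registered depth."""
    n_pitches = (reg_depth_um - depth_um) / pitch
    if shift_mode == "round":
        # Note: Python's round() sends exact halves to the nearest even integer.
        return round(n_pitches)
    if shift_mode == "floor":
        return math.floor(n_pitches)
    raise ValueError(f"unknown shift_mode: {shift_mode!r}")


# A spike seen at 100 um whose registered depth is 130 um, on a 40 um pitch probe.
print(coarse_pitch_shift(100.0, 130.0, 40.0, shift_mode="round"))
print(coarse_pitch_shift(100.0, 130.0, 40.0, shift_mode="floor"))
```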
