PopulationCoding package

Submodules

PopulationCoding.corr module

Measuring and identifying significant correlations.

PopulationCoding.corr.corr(A, B)

Calculates the correlation matrix between variables in two arrays.

Parameters:
A : array_like of shape (m_features, t_observations)
B : array_like of shape (n_features, t_observations)
Returns:
C : ndarray of shape (m_features, n_features)

Correlation matrix
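
A minimal usage sketch (random data; the shapes follow the docstring above):

>>> import numpy as np
>>> from PopulationCoding.corr import corr
>>> A = np.random.randn(5, 100)   # 5 features, 100 observations
>>> B = np.random.randn(3, 100)   # 3 features, same observations
>>> C = corr(A, B)                # C.shape == (5, 3)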

PopulationCoding.corr.sig_stim_corr(X, y, thresh=3, nshuff=100, random=False)

Identify timeseries in X that are significantly correlated with a “stimulus” y.

Parameters:
X : array_like of shape (n_features, t_observations)

Timeseries data

y : array_like of shape (1, t_observations)

Stimulus vector

thresh : float, default=3

Number of standard deviations above the mean required for significance

nshuff : int, default=100

Number of shuffling tests

random : bool, default=False

If False, shuffling is done by circular permutations of y. If True, shuffling is done by generating new random arrangements of the repeated stimuli in y (assuming y takes either 0 or a single stimulus value)

Returns:
C : ndarray of shape (n_features,)

Correlations of each feature with the stimulus y

idx_sig : tuple

Indices of significantly correlated features in X, as a tuple where idx_sig[0] contains the positively correlated features and idx_sig[1] the negatively correlated ones

c_thresh : tuple

Thresholds (upper, lower) on correlation

C_shuff : ndarray of shape (n_features, nshuff)

Shuffled correlations
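
A hedged usage sketch on synthetic data (the four return values follow the Returns list above):

>>> import numpy as np
>>> from PopulationCoding.corr import sig_stim_corr
>>> rng = np.random.default_rng(0)
>>> y = (np.arange(1000) % 50 < 5).astype(float)[None, :]  # repeated stimulus
>>> X = rng.standard_normal((20, 1000))
>>> X[:3] += 2 * y                                         # stimulus-driven features
>>> C, idx_sig, c_thresh, C_shuff = sig_stim_corr(X, y, thresh=3, nshuff=100)
>>> idx_sig[0]   # expect the positively correlated features 0, 1, 2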

PopulationCoding.dimred module

Dimensionality reduction tools!

PopulationCoding.dimred.SVCA(X, ntrain=None, ntest=None, itrain=None, itest=None, n_randomized=None, shuffle=False, flip_traintest=False, prePCA=True, **kwargs)

Shared Variance Component Analysis (SVCA), originally described by Stringer et al. 2019 [1]. SVCA is essentially a cross-validated canonical covariance analysis between two sets of neuronal populations. New features here include shuffling approaches and prePCA.

Parameters:
X : array_like of shape (N, T)

Neural data matrix

n_randomized : int, optional

Number of SVCs to compute (if not None, uses a randomized SVD approximation)

ntrain : array_like, optional

Indices of the first neural subset (note this is not truly “training” data, as ntrain and ntest are both used to train and test SVCA)

ntest : array_like, optional

Indices of the second neural subset

itrain : array_like, optional

Indices of training timepoints

itest : array_like, optional

Indices of test timepoints

shuffle : bool, default=False

Whether to shuffle by circularly permuting each neuron’s timeseries (not recommended whenever # neurons > # timepoints!)

flip_traintest : bool, default=False

Whether to shuffle by swapping the train and test timepoints for one neural subset (recommended if not using the session permutation method [2])

prePCA : bool, default=True

Whether to perform PCA on each subset independently before computing SVCs. This is very useful when the len(ntrain) x len(ntest) covariance matrix does not fit into local memory and # neurons >> # timepoints. All PCs are kept, such that the resulting SVCA decomposition is mathematically equivalent to running SVCA on the original data matrix

Returns:
sneur : ndarray

Shared variance of each covariance component

vneur : ndarray

Total variance of each covariance component

u : ndarray

Left eigenvectors of the covariance matrix between ntrain and ntest during itrain timepoints

v : ndarray

Right eigenvectors of the covariance matrix between ntrain and ntest during itrain timepoints

pca : None | dict

If prePCA=True, a dictionary containing the projections and principal components for both neural sets, with the keys train_projs, test_projs, train_vecs, test_vecs

References

[1]

Stringer, C., Pachitariu, M., Steinmetz, N., Reddy, C.B., Carandini, M., & Harris, K.D. (2019). Spontaneous behaviors drive multidimensional, brainwide activity. Science, 364(6437). https://doi.org/10.1126/science.aav7893.

[2]

Harris, K. D. (2020). Nonsense correlations in neuroscience. bioRxiv, 2020-11. https://doi.org/10.1101/2020.11.29.402719.
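
A hedged sketch on random data (the five return values follow the Returns list above; with real data, ntrain and ntest are typically spatially interleaved neuron sets):

>>> import numpy as np
>>> from PopulationCoding.dimred import SVCA
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 1000))           # 200 neurons, 1000 timepoints
>>> nsplit = rng.permutation(200)
>>> ntrain, ntest = nsplit[:100], nsplit[100:]     # two disjoint neural subsets
>>> t = np.arange(1000)
>>> itrain, itest = t[(t // 100) % 2 == 0], t[(t // 100) % 2 == 1]  # chunked time split
>>> sneur, vneur, u, v, pca = SVCA(X, ntrain=ntrain, ntest=ntest,
...                                itrain=itrain, itest=itest)
>>> reliability = sneur / vneur                    # reliable variance fraction per SVC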

PopulationCoding.dimred.estimate_id_twonn(X, plot=False, X_is_dist=False)

TWO-NN method for estimating intrinsic dimensionality as described by Facco et al. 2017 [1]. This implementation is taken from https://github.com/jmmanley/two-nn-dimensionality-estimator.

Parameters:
X : array_like of shape (N, p)

Matrix of N p-dimensional samples (when X_is_dist=False)

plot : bool, default=False

Whether to plot the fit

X_is_dist : bool, default=False

Whether X is instead an (N, N) distance matrix

Returns:
d : float

TWO-NN estimate of intrinsic dimensionality

References

[1]

Facco, E., d’Errico, M., Rodriguez, A., & Laio, A. (2017). Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Scientific reports, 7(1), 12140. https://doi.org/10.1038/s41598-017-11873-y.
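
A minimal sketch (data with low intrinsic dimensionality embedded in a higher-dimensional space):

>>> import numpy as np
>>> from PopulationCoding.dimred import estimate_id_twonn
>>> rng = np.random.default_rng(0)
>>> Z = rng.standard_normal((2000, 3))       # intrinsically 3-dimensional
>>> X = Z @ rng.standard_normal((3, 50))     # linearly embedded in 50 dimensions
>>> d = estimate_id_twonn(X)                 # expect d close to 3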

PopulationCoding.dimred.lda(data, labels, d=None, classes=None)

Linear discriminant analysis (LDA)

LDA is a generalization of Fisher’s linear discriminant that finds a projection maximizing the Fisher discriminant criterion (the ratio of between-class to within-class scatter) among multiple classes. This technique requires continuous input data and a priori known classes.

ASSUMPTIONS:

  • For those familiar with MANOVA, the same assumptions apply here.

  • Independent variables are normally distributed within each level of the grouping variables (multivariate normality).

  • Covariances are equal across classes (homoscedasticity).

  • Samples are chosen independently.

  • Predictive power may decrease with increased correlations among predictor variables (multicollinearity).

For a more thorough overview of LDA, consult e.g. McLachlan, 2005 [1].

Parameters:
data : array_like of shape (n, m)

n x m data matrix, where columns represent m features and rows represent n samples

labels : array_like of shape (n,)

Vector of class labels for the given data

d : int, optional

Desired dimensionality after projection; d <= cardinality(labels) - 1. If None, d = cardinality(labels) - 1

classes : array_like of shape (cardinality(labels),), optional

Class names (as in labels)

Returns:
W : array_like of shape (m, d)

Projection matrix to the reduced-dimensional space, spanned by the top d generalized eigenvectors of S_b and S_w

proj_data : array_like of shape (n, d)

Data after projection under W

vals : array_like of shape (d,)

Eigenvalues corresponding to the eigenvectors in the columns of W

mu_c : array_like of shape (d, cardinality(classes))

Matrix where each column is a class mean after projection

S_c : array_like of shape (d, d, cardinality(classes))

Covariance matrices after projection

References

[1]

McLachlan, G. J. (2005). Discriminant analysis and statistical pattern recognition. John Wiley & Sons.
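
A minimal sketch on toy two-class data (shapes follow the docstring above):

>>> import numpy as np
>>> from PopulationCoding.dimred import lda
>>> rng = np.random.default_rng(0)
>>> data = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
>>> labels = np.repeat([0, 1], 50)
>>> W, proj_data, vals, mu_c, S_c = lda(data, labels)
>>> proj_data.shape   # (100, 1): d defaults to cardinality(labels) - 1 = 1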

PopulationCoding.dimred.optimize_latent_dim(X, y, model, dims, cv=3, scoring='neg_mean_squared_error')

Identify the optimal latent dimensionality via cross validation.

Parameters:
X : array_like of shape (n_observations, p)

Data matrix

y : array_like of shape (n_observations, q)

Targets

model : estimator

An sklearn model to try out; must accept an n_components argument

dims : array_like

List of integer dimensionalities to try out

cv : int or cross-validation generator, default=3

Number of cross-validation folds; can be an int, a KFold instance, etc.

scoring : str | callable, default='neg_mean_squared_error'

Metric to use to assess prediction quality in latent space

Returns:
d_opt : int

Latent dimensionality that maximizes cross-validated score

scores : array_like of shape (len(dims),)

Cross-validated score at each dimensionality
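
A hedged sketch using sklearn's PLSRegression, which exposes the required n_components argument (any estimator meeting that contract should work):

>>> import numpy as np
>>> from sklearn.cross_decomposition import PLSRegression
>>> from PopulationCoding.dimred import optimize_latent_dim
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 30))
>>> y = X[:, :5] @ rng.standard_normal((5, 3))   # targets driven by 5 latent dims
>>> d_opt, scores = optimize_latent_dim(X, y, PLSRegression(), dims=[1, 2, 5, 10])
>>> d_opt                                        # expect a value near 5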

PopulationCoding.dimred.plot_component_images(components, nplot=9, **kwargs)

After applying matrix decomposition to a set of images or other 2D data, plots the first nplot components.

Parameters:
components : array_like of shape (N, W, ncomp)

Matrix of weights to be plotted

nplot : int, default=9

Number of components to plot

kwargs : passed to plt.subplots
Returns:
fig : Figure
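
A minimal sketch (e.g., for visualizing spatial PCA weights; the figsize keyword is simply forwarded to plt.subplots):

>>> import numpy as np
>>> from PopulationCoding.dimred import plot_component_images
>>> comps = np.random.randn(64, 64, 9)   # 9 components over a 64 x 64 field
>>> fig = plot_component_images(comps, nplot=9, figsize=(9, 9))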

PopulationCoding.info module

Information metrics for neuronal populations.

PopulationCoding.info.conditional_cov(covX, covXY, covY=None)

Computes conditional covariance, given single and joint covariance matrices.
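
Presumably this implements the standard Gaussian identity cov(X | Y) = covX - covXY @ inv(covY) @ covXY.T (the argument convention here is inferred from the signature). A hedged sanity check:

>>> import numpy as np
>>> from PopulationCoding.info import conditional_cov
>>> covX, covY = np.eye(3), np.eye(2)
>>> covXY = 0.1 * np.ones((3, 2))        # cross-covariance between X and Y
>>> cond = conditional_cov(covX, covXY, covY)
>>> # expected: covX - covXY @ np.linalg.inv(covY) @ covXY.T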

PopulationCoding.info.dprimesq(X1, X2, shuffle=False, diagonal=False)

Calculates the squared sensitivity index (d’) between two distributions. Modeled after Rumyantsev et al. 2020 [1].

Parameters:
X1 : array_like of shape (n, t1_observations)
X2 : array_like of shape (n, t2_observations)
shuffle : bool, default=False

Whether to shuffle observations

diagonal : bool, default=False

Whether to assume a diagonal covariance matrix

Returns:
d2 : float

Value of squared sensitivity index

cov : array_like of shape (n, n)

Covariance matrix

References

[1]

Rumyantsev, O. I., Lecoq, J. A., Hernandez, O., Zhang, Y., Savall, J., Chrapkiewicz, R., … & Schnitzer, M. J. (2020). Fundamental bounds on the fidelity of sensory cortical coding. Nature, 580(7801), 100-105. https://doi.org/10.1038/s41586-020-2130-2.
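
A minimal sketch on two Gaussian populations with shifted means (shapes follow the docstring above):

>>> import numpy as np
>>> from PopulationCoding.info import dprimesq
>>> rng = np.random.default_rng(0)
>>> X1 = rng.standard_normal((10, 500))          # condition 1
>>> X2 = rng.standard_normal((10, 500)) + 0.5    # condition 2, shifted mean
>>> d2, cov = dprimesq(X1, X2)
>>> d2_null, _ = dprimesq(X1, X2, shuffle=True)  # chance-level baseline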

PopulationCoding.info.entropy(cov)

Computes entropy under the Gaussian assumption, given a covariance matrix.
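
For a Gaussian with covariance matrix cov, the differential entropy is 0.5 * logdet(2 * pi * e * cov); assuming that convention (in nats), a quick check:

>>> import numpy as np
>>> from PopulationCoding.info import entropy
>>> H = entropy(np.eye(3))
>>> # expected: 0.5 * np.linalg.slogdet(2 * np.pi * np.e * np.eye(3))[1]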

PopulationCoding.info.mutual_information(covX, covXY, covY)

Computes mutual information under the Gaussian assumption. Mutual information is given as the estimated reduction of uncertainty (entropy) about system X given system Y.
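
With Gaussian entropies this reduces to I(X; Y) = H(X) + H(Y) - H(X, Y); assuming covXY is the full joint covariance of (X, Y), independent systems should give zero:

>>> import numpy as np
>>> from PopulationCoding.info import mutual_information
>>> covX, covY = np.eye(2), np.eye(2)
>>> covXY = np.eye(4)                      # joint covariance of independent X and Y
>>> mutual_information(covX, covXY, covY)  # expect 0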

PopulationCoding.predict module

Fit predictive models between multi-dimensional datasets.

PopulationCoding.predict.canonical_cov(Y, X, lam, npc=512, **kwargs)

Canonical covariance analysis to predict Y from X, based on CanonCor2 originally by Stringer et al. 2019.

Fits a regularized (“sort of”) canonical covariance analysis between two sets of data X and Y. The approximation of Y based on the first n projections is given by a[:, :n] @ b[:, :n].T @ X.

Parameters:
Y : array_like of shape (m, t)

Data matrix

X : array_like of shape (n, t)

Data matrix

lam : float

Regularization parameter

npc : int, default=512

Number of projections to consider

Returns:
a : array_like of shape (m, npc)

Projections of Y

b : array_like of shape (n, npc)

Projections of X

R2 : array_like of shape (npc,)

Proportion of total variance of Y explained by each projection

v : array_like of shape (n, npc)

Actual value of each linear combination of X
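
A hedged sketch on random data (the low-rank reconstruction follows the convention described above):

>>> import numpy as np
>>> from PopulationCoding.predict import canonical_cov
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((50, 1000))                  # n = 50 features, t = 1000
>>> Y = rng.standard_normal((5, 50)) @ X                 # m = 5 features driven by X
>>> a, b, R2, v = canonical_cov(Y, X, lam=0.1, npc=16)
>>> Y_hat = a[:, :4] @ (b[:, :4].T @ X)                  # rank-4 approximation of Y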

PopulationCoding.utils module

Miscellaneous tools.

PopulationCoding.utils.bin2d(x, tbin, idim=0)

Bins the data x into bins of size tbin along dimension idim.

PopulationCoding.utils.classify_continuous(y, labels)

Take continuous input and classify it into the closest label.
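
A minimal sketch (assuming each entry of y is assigned the nearest value in labels):

>>> import numpy as np
>>> from PopulationCoding.utils import classify_continuous
>>> classify_continuous(np.array([0.1, 0.8, 0.45]), labels=np.array([0, 1]))
>>> # expected classes: 0, 1, 0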

PopulationCoding.utils.generate_gaussian_data(A, T=10000, mean_subtract=True)

Generates auto-regressive Gaussian test data given a connectivity matrix.

Parameters:
A : array_like of shape (N, N)

Desired connectivity matrix

T : int, default=10000

Number of time points to simulate

mean_subtract : bool, default=True

Whether or not to mean subtract the time series

Returns:
X : array_like of shape (N, T)

Timeseries data

cov : array_like of shape (N, N)

Empirical covariance matrix
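
A minimal sketch (scaling the connectivity spectrum inside the unit circle is an assumption made here to keep the autoregressive process stable):

>>> import numpy as np
>>> from PopulationCoding.utils import generate_gaussian_data
>>> rng = np.random.default_rng(0)
>>> A = rng.standard_normal((10, 10))
>>> A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius 0.9
>>> X, cov = generate_gaussian_data(A, T=5000)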

PopulationCoding.utils.get_consecutive_chunks(t)

Finds chunks of consecutive integers in a vector t.
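
A minimal sketch (the exact return container is an assumption; conceptually, consecutive runs are grouped):

>>> import numpy as np
>>> from PopulationCoding.utils import get_consecutive_chunks
>>> get_consecutive_chunks(np.array([1, 2, 3, 7, 8, 12]))
>>> # expected grouping: [1, 2, 3], [7, 8], [12]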

PopulationCoding.utils.logdet(X)

Computes the log of the determinant.

PopulationCoding.utils.train_test_split_idx(N, train_frac=0.5, time=False, interleave=0)

Generates train and test indices for N samples.

Module contents