PopulationCoding package
Submodules
PopulationCoding.corr module
Measuring and identifying significant correlations.
- PopulationCoding.corr.corr(A, B)
Calculates the correlation matrix between variables in two arrays.
- Parameters:
- A : array_like of shape (m_features, t_observations)
- B : array_like of shape (n_features, t_observations)
- Returns:
- C : ndarray of shape (m_features, n_features)
Correlation matrix
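A minimal usage sketch of corr, following the documented signature (the synthetic data and shapes are illustrative only):

```python
import numpy as np
from PopulationCoding.corr import corr

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 1000))   # (m_features, t_observations)
B = rng.standard_normal((3, 1000))   # (n_features, t_observations)

C = corr(A, B)   # correlation matrix between the two feature sets
print(C.shape)   # -> (5, 3)
```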
- PopulationCoding.corr.sig_stim_corr(X, y, thresh=3, nshuff=100, random=False)
Identify timeseries in X that are significantly correlated with a “stimulus” vector y.
- Parameters:
- X : array_like of shape (n_features, t_observations)
Timeseries data
- y : array_like of shape (1, t_observations)
Stimulus vector
- thresh : float, default=3
Number of standard deviations above the mean required for significance
- nshuff : int, default=100
Number of shuffling tests
- random : bool, default=False
If False, shuffling is done by circular permutation of y. If True, shuffling is done by randomly re-placing the repeated stimuli in y (assuming y is either 0 or a single stimulus value)
- Returns:
- C : ndarray of shape (n_features,)
Correlations of each feature with the stimulus y
- idx_sig : tuple
Indices of significantly correlated variables in X, given as a tuple where idx_sig[0] holds the positively correlated indices and idx_sig[1] the negatively correlated ones
- c_thresh : tuple
Thresholds (upper, lower) on the correlation
- C_shuff : ndarray of shape (n_features, nshuff)
Shuffled correlations
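A minimal usage sketch of sig_stim_corr, using synthetic data in which only the first few features are stimulus-driven (the construction of X and y is illustrative):

```python
import numpy as np
from PopulationCoding.corr import sig_stim_corr

rng = np.random.default_rng(0)
y = (rng.random((1, 2000)) > 0.9).astype(float)   # sparse binary stimulus vector
X = rng.standard_normal((50, 2000))
X[:5] += 2 * y                                    # make the first 5 features stimulus-driven

C, idx_sig, c_thresh, C_shuff = sig_stim_corr(X, y, thresh=3, nshuff=100)
print(idx_sig[0])   # indices of positively correlated features (expect 0-4)
```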
PopulationCoding.dimred module
Dimensionality reduction tools!
- PopulationCoding.dimred.SVCA(X, ntrain=None, ntest=None, itrain=None, itest=None, n_randomized=None, shuffle=False, flip_traintest=False, prePCA=True, **kwargs)
Shared Variance Component Analysis (SVCA), originally described by Stringer et al. 2019 [1]. SVCA is essentially a cross-validated canonical covariance analysis between two sets of neuronal populations. New features here include shuffling approaches and prePCA.
- Parameters:
- X : array_like of shape (N, T)
Neural data matrix
- ntrain : array_like, optional
Indices of the first neural subset (note this is not truly “training” data, as ntrain and ntest are both used to train and test SVCA)
- ntest : array_like, optional
Indices of the second neural subset
- itrain : array_like, optional
Indices of training timepoints
- itest : array_like, optional
Indices of test timepoints
- n_randomized : int, optional
Number of SVCs to compute (if not None, uses a randomized SVD approximation)
- shuffle : bool, default=False
Whether to shuffle by circularly permuting each neuron’s timeseries (not recommended whenever # neurons > # timepoints!)
- flip_traintest : bool, default=False
Whether to shuffle by swapping the train and test timepoints for one neural subset (recommended if not using the session permutation method [2])
- prePCA : bool, default=True
Whether to perform PCA on each subset independently before computing SVCs. This is very useful when the len(ntrain) x len(ntest) covariance matrix does not fit into local memory and # neurons >> # timepoints. All PCs are kept, so the resulting SVCA decomposition is mathematically equivalent to running SVCA on the original data matrix
- Returns:
- sneur : ndarray
Shared variance of each covariance component
- vneur : ndarray
Total variance of each covariance component
- u : ndarray
Left eigenvectors of the covariance matrix between ntrain and ntest during itrain timepoints
- v : ndarray
Right eigenvectors of the covariance matrix between ntrain and ntest during itrain timepoints
- pca : None | dict
If prePCA=True, a dictionary containing the projections and principal components for both neural sets, with the keys train_projs, test_projs, train_vecs, test_vecs; otherwise None
References
[1]Stringer, C., Pachitariu, M., Steinmetz, N., Reddy, C.B., Carandini, M., & Harris, K.D. (2019). Spontaneous behaviors drive multidimensional, brainwide activity. Science, 364(6437), https://doi.org/10.1126/science.aav7893.
[2]Harris, K. D. (2020). Nonsense correlations in neuroscience. bioRxiv, 2020-11. https://doi.org/10.1101/2020.11.29.402719.
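A minimal usage sketch of SVCA on synthetic data, assuming the five documented outputs are returned in the order listed above. Random train/test splits are used for brevity; chunked or interleaved timepoint splits are typically preferable for autocorrelated timeseries:

```python
import numpy as np
from PopulationCoding.dimred import SVCA

rng = np.random.default_rng(0)
N, T = 200, 1000
X = rng.standard_normal((N, T))   # neural data matrix (N neurons, T timepoints)

# Split neurons into two subsets and timepoints into train/test halves.
nperm = rng.permutation(N)
ntrain, ntest = nperm[: N // 2], nperm[N // 2:]
tperm = rng.permutation(T)
itrain, itest = np.sort(tperm[: T // 2]), np.sort(tperm[T // 2:])

sneur, vneur, u, v, pca = SVCA(X, ntrain=ntrain, ntest=ntest,
                               itrain=itrain, itest=itest)
reliability = sneur / vneur   # fraction of each SVC's variance that is shared
```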
- PopulationCoding.dimred.estimate_id_twonn(X, plot=False, X_is_dist=False)
TWO-NN method for estimating intrinsic dimensionality as described by Facco et al. 2017 [1]. This implementation is taken from https://github.com/jmmanley/two-nn-dimensionality-estimator.
- Parameters:
- X : array_like of shape (N, p)
Matrix of N p-dimensional samples (when X_is_dist=False)
- plot : bool, default=False
Whether to plot the fit
- X_is_dist : bool, default=False
Whether X is instead an (N, N) distance matrix
- Returns:
- d : float
TWO-NN estimate of intrinsic dimensionality
References
[1]Facco, E., d’Errico, M., Rodriguez, A., & Laio, A. (2017). Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Scientific reports, 7(1), 12140. https://doi.org/10.1038/s41598-017-11873-y.
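A minimal usage sketch of estimate_id_twonn: a 2D latent manifold is linearly embedded in 10 ambient dimensions, so the estimate should land near 2 (the data construction is illustrative):

```python
import numpy as np
from PopulationCoding.dimred import estimate_id_twonn

rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 2))      # 2D latent samples
X = latent @ rng.standard_normal((2, 10))    # embed in 10 dimensions

d = estimate_id_twonn(X)   # expect an estimate near 2
```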
- PopulationCoding.dimred.lda(data, labels, d=None, classes=None)
Linear discriminant analysis (LDA)
LDA is a generalization of Fisher’s linear discriminant that finds a projection maximizing the ratio of between-class to within-class scatter (the Fisher criterion) across multiple classes. This technique requires continuous input data and a priori known classes.
ASSUMPTIONS:
For those familiar with MANOVA, the same assumptions apply here.
- Independent variables are normally distributed within each level of the grouping variables (multivariate normality).
- Covariances are equal across classes (homoscedasticity).
- Samples are chosen independently.
- Predictive power may decrease with increased correlation among predictor variables (multicollinearity).
For a more thorough overview of LDA, consult e.g. McLachlan, 2005 [1].
- Parameters:
- data : array_like of shape (n, m)
n x m data matrix, where rows represent n samples and columns represent m features
- labels : array_like of shape (n,)
Vector of class labels for the given data
- d : int, optional
Desired dimensionality after projection; must satisfy d <= cardinality(labels) - 1. If None, d = cardinality(labels) - 1
- classes : array_like of shape (cardinality(labels),), optional
Class names (as in labels)
- Returns:
- W : array_like of shape (m, d)
Projection matrix to the reduced-dimensional space, spanned by the top cardinality(labels) - 1 generalized eigenvectors of S_b and S_w
- proj_data : array_like of shape (n, d)
Data after projection under W
- vals : array_like of shape (d,)
Eigenvalues corresponding to the eigenvectors in the columns of W
- mu_c : array_like of shape (d, cardinality(classes))
Matrix where each column is a class mean after projection
- S_c : array_like of shape (d, d, cardinality(classes))
Covariance matrices after projection
References
[1]McLachlan, G. J. (2005). Discriminant analysis and statistical pattern recognition. John Wiley & Sons.
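A minimal usage sketch of lda on three synthetic Gaussian classes (the class means and sample counts are illustrative). With d=None, the projection has cardinality(labels) - 1 = 2 dimensions:

```python
import numpy as np
from PopulationCoding.dimred import lda

rng = np.random.default_rng(0)
# Three Gaussian classes in 4 dimensions, 100 samples each
means = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]])
data = np.vstack([rng.standard_normal((100, 4)) + mu for mu in means])
labels = np.repeat([0, 1, 2], 100)

W, proj_data, vals, mu_c, S_c = lda(data, labels)
print(proj_data.shape)   # (300, 2)
```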
- PopulationCoding.dimred.optimize_latent_dim(X, y, model, dims, cv=3, scoring='neg_mean_squared_error')
Identify the optimal latent dimensionality via cross validation.
- Parameters:
- X : array_like of shape (n_observations, p)
Data matrix
- y : array_like of shape (n_observations, q)
Targets
- model : estimator
An sklearn model to try out; must take an n_components argument
- dims : array_like
List of integer dimensionalities to try out
- cv : int | cross-validation splitter, default=3
Number of cross-validation folds; can also be a KFold or similar splitter
- scoring : str | callable, default="neg_mean_squared_error"
Metric used to assess prediction quality in the latent space
- Returns:
- d_opt : int
Latent dimensionality that maximizes the cross-validated score
- scores : array_like of shape (len(dims),)
Cross-validated score at each dimensionality
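A minimal usage sketch of optimize_latent_dim with rank-3 synthetic targets. Whether model should be passed as an instance or a class is not specified above; an sklearn PLSRegression instance is assumed here:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from PopulationCoding.dimred import optimize_latent_dim

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 30))
B = rng.standard_normal((3, 2))
y = X[:, :3] @ B + 0.1 * rng.standard_normal((500, 2))   # rank-3 targets

d_opt, scores = optimize_latent_dim(X, y, PLSRegression(),
                                    dims=np.arange(1, 11), cv=3)
print(d_opt)   # expect a value near 3
```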
- PopulationCoding.dimred.plot_component_images(components, nplot=9, **kwargs)
After applying matrix decomposition to a set of images or other 2D data, plots the first nplot components.
- Parameters:
- components : array_like of shape (N, W, ncomp)
Matrix of weights to be plotted
- nplot : int, default=9
Number of components to plot
- **kwargs
Passed to plt.subplots
- Returns:
- fig : Figure
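A minimal usage sketch of plot_component_images, assuming the (N, W, ncomp) shape denotes image height x width x number of components (the random weights are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from PopulationCoding.dimred import plot_component_images

rng = np.random.default_rng(0)
components = rng.standard_normal((32, 32, 9))   # 9 components over 32 x 32 images

fig = plot_component_images(components, nplot=9, figsize=(6, 6))
plt.show()
```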
PopulationCoding.info module
Information metrics for neuronal populations.
- PopulationCoding.info.conditional_cov(covX, covXY, covY=None)
Computes conditional covariance, given single and joint covariance matrices.
- PopulationCoding.info.dprimesq(X1, X2, shuffle=False, diagonal=False)
Calculates the squared sensitivity index (d’) between two distributions. Modeled after Rumyantsev et al. 2020 [1].
- Parameters:
- X1 : array_like of shape (n, t1_observations)
- X2 : array_like of shape (n, t2_observations)
- shuffle : bool, default=False
Whether to shuffle observations
- diagonal : bool, default=False
Whether to assume a diagonal covariance matrix
- Returns:
- d2 : float
Value of the squared sensitivity index
- cov : array_like of shape (n, n)
Covariance matrix
References
[1]Rumyantsev, O. I., Lecoq, J. A., Hernandez, O., Zhang, Y., Savall, J., Chrapkiewicz, R., … & Schnitzer, M. J. (2020). Fundamental bounds on the fidelity of sensory cortical coding. Nature, 580(7801), 100-105. https://doi.org/10.1038/s41586-020-2130-2.
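A minimal usage sketch of dprimesq on two synthetic conditions that differ only by a mean shift (the data and shift size are illustrative):

```python
import numpy as np
from PopulationCoding.info import dprimesq

rng = np.random.default_rng(0)
X1 = rng.standard_normal((20, 500))         # condition 1: (n, t1_observations)
X2 = rng.standard_normal((20, 400)) + 0.5   # condition 2, mean-shifted

d2, cov = dprimesq(X1, X2)
d2_diag, _ = dprimesq(X1, X2, diagonal=True)   # ignore off-diagonal covariance
```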
- PopulationCoding.info.entropy(cov)
Computes entropy under the Gaussian assumption, given a covariance matrix.
- PopulationCoding.info.mutual_information(covX, covXY, covY)
Computes mutual information under the Gaussian assumption. Mutual information is given as the estimated reduction of uncertainty (entropy) about system X given system Y.
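A sketch tying the three info functions together. It assumes covXY denotes the joint covariance of the stacked system [X; Y], which is an interpretation not stated explicitly above:

```python
import numpy as np
from PopulationCoding.info import conditional_cov, entropy, mutual_information

rng = np.random.default_rng(0)
Z = rng.standard_normal((15, 2000))   # rows 0-9 form system X, rows 10-14 system Y
covXY = np.cov(Z)                     # assumed: joint covariance of the stacked system
covX, covY = covXY[:10, :10], covXY[10:, 10:]

H_X = entropy(covX)                               # Gaussian entropy of X
cov_X_given_Y = conditional_cov(covX, covXY, covY)  # covariance of X conditioned on Y
I_XY = mutual_information(covX, covXY, covY)      # reduction in uncertainty about X given Y
```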
PopulationCoding.predict module
Fit predictive models between multi-dimensional datasets.
- PopulationCoding.predict.canonical_cov(Y, X, lam, npc=512, **kwargs)
Canonical covariance analysis to predict Y from X, based on CanonCor2 originally by Stringer et al. 2019.
Fits a “sort of” canonical covariance analysis between two sets of data X and Y. The approximation of Y based on the first n projections is given by
a[:, :n] @ b[:, :n].T @ X
- Parameters:
- Y : array_like of shape (m, t)
Data matrix
- X : array_like of shape (n, t)
Data matrix
- lam : float
Regularization parameter
- npc : int, default=512
Number of projections to consider
- Returns:
- a : array_like of shape (m, npc)
Projections of Y
- b : array_like of shape (n, npc)
Projections of X
- R2 : array_like of shape (npc,)
Proportion of the total variance of Y explained by each projection
- v : array_like of shape (n, npc)
Actual value of each linear combination of X
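A minimal usage sketch of canonical_cov, where Y is constructed as a noisy linear readout of X so that a low-rank approximation recovers most of its variance (the construction and lam value are illustrative):

```python
import numpy as np
from PopulationCoding.predict import canonical_cov

rng = np.random.default_rng(0)
m, n, t = 50, 100, 2000
X = rng.standard_normal((n, t))
M = rng.standard_normal((m, n)) / np.sqrt(n)
Y = M @ X + 0.5 * rng.standard_normal((m, t))   # noisy linear readout of X

a, b, R2, v = canonical_cov(Y, X, lam=0.1, npc=32)
Y_hat = a[:, :10] @ b[:, :10].T @ X   # rank-10 approximation of Y
```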
PopulationCoding.utils module
Miscellaneous tools.
- PopulationCoding.utils.bin2d(x, tbin, idim=0)
Bins data.
- PopulationCoding.utils.classify_continuous(y, labels)
Take continuous input and classify it into the closest label.
- PopulationCoding.utils.generate_gaussian_data(A, T=10000, mean_subtract=True)
Generates auto-regressive Gaussian test data given a connectivity matrix.
- Parameters:
- A : array_like of shape (N, N)
Desired connectivity matrix
- T : int, default=10000
Number of time points to simulate
- mean_subtract : bool, default=True
Whether or not to mean-subtract the time series
- Returns:
- X : array_like of shape (N, T)
Timeseries data
- cov : array_like of shape (N, N)
Empirical covariance matrix
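A minimal usage sketch of generate_gaussian_data. Scaling A so its spectral radius stays below 1 keeps the autoregressive dynamics stable; that scaling is an assumption here, not a documented requirement:

```python
import numpy as np
from PopulationCoding.utils import generate_gaussian_data

rng = np.random.default_rng(0)
N = 10
A = rng.standard_normal((N, N)) / (2 * np.sqrt(N))   # keep dynamics stable

X, cov = generate_gaussian_data(A, T=5000)
print(X.shape, cov.shape)   # (10, 5000) (10, 10)
```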
- PopulationCoding.utils.get_consecutive_chunks(t)
Finds chunks of consecutive integers in a vector t.
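A small sketch of get_consecutive_chunks; the exact return format is not documented above, so this only illustrates the call and the expected grouping:

```python
import numpy as np
from PopulationCoding.utils import get_consecutive_chunks

t = np.array([1, 2, 3, 7, 8, 10])
chunks = get_consecutive_chunks(t)
# groups the consecutive runs [1, 2, 3], [7, 8], and [10]
```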
- PopulationCoding.utils.logdet(X)
Computes the log of the determinant.
- PopulationCoding.utils.train_test_split_idx(N, train_frac=0.5, time=False, interleave=0)