PopulationCoding package
Submodules
PopulationCoding.corr module
Measuring and identifying significant correlations.
- PopulationCoding.corr.corr(A, B)
Calculates the correlation matrix between variables in two arrays.
- Parameters:
- A : array_like of shape (m_features, t_observations)
- B : array_like of shape (n_features, t_observations)
- Returns:
- C : ndarray of shape (m_features, n_features)
Correlation matrix
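A minimal usage sketch of corr, following the documented signature (the synthetic data and shapes are illustrative only):

```python
import numpy as np
from PopulationCoding.corr import corr

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 1000))   # (m_features, t_observations)
B = rng.standard_normal((3, 1000))   # (n_features, t_observations)

C = corr(A, B)   # correlation matrix between the two feature sets
print(C.shape)   # -> (5, 3)
```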
- PopulationCoding.corr.sig_stim_corr(X, y, thresh=3, nshuff=100, random=False)
Identify timeseries in X that are significantly correlated with a “stimulus” vector y.
- Parameters:
- X : array_like of shape (n_features, t_observations)
Timeseries data
- y : array_like of shape (1, t_observations)
Stimulus vector
- thresh : float, default=3
Number of standard deviations above the mean required for significance
- nshuff : int, default=100
Number of shuffling tests
- random : bool, default=False
If False, shuffling is done by circular permutation of y. If True, shuffling is done by randomly re-placing the repeated stimuli in y (assuming y is either 0 or a single stimulus value)
- Returns:
- C : ndarray of shape (n_features,)
Correlations of each feature with the stimulus y
- idx_sig : tuple
Indices of significantly correlated variables in X, given as a tuple where idx_sig[0] holds the positively correlated indices and idx_sig[1] the negatively correlated ones
- c_thresh : tuple
Thresholds (upper, lower) on the correlation
- C_shuff : ndarray of shape (n_features, nshuff)
Shuffled correlations
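A minimal usage sketch of sig_stim_corr, using synthetic data in which only the first few features are stimulus-driven (the construction of X and y is illustrative):

```python
import numpy as np
from PopulationCoding.corr import sig_stim_corr

rng = np.random.default_rng(0)
y = (rng.random((1, 2000)) > 0.9).astype(float)   # sparse binary stimulus vector
X = rng.standard_normal((50, 2000))
X[:5] += 2 * y                                    # make the first 5 features stimulus-driven

C, idx_sig, c_thresh, C_shuff = sig_stim_corr(X, y, thresh=3, nshuff=100)
print(idx_sig[0])   # indices of positively correlated features (expect 0-4)
```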
PopulationCoding.dimred module
Dimensionality reduction tools!
- PopulationCoding.dimred.SVCA(X, ntrain=None, ntest=None, itrain=None, itest=None, n_randomized=None, shuffle=False, flip_traintest=False, prePCA=True, **kwargs)
Shared Variance Component Analysis (SVCA), originally described by Stringer et al. 2019 [1]. SVCA is essentially a cross-validated canonical covariance analysis between two sets of neuronal populations. New features here include shuffling approaches and prePCA.
- Parameters:
- X : array_like of shape (N, T)
Neural data matrix
- ntrain : array_like, optional
Indices of the first neural subset (note this is not truly “training” data, as ntrain and ntest are both used to train and test SVCA)
- ntest : array_like, optional
Indices of the second neural subset
- itrain : array_like, optional
Indices of training timepoints
- itest : array_like, optional
Indices of test timepoints
- n_randomized : int, optional
Number of SVCs to compute (if not None, uses a randomized SVD approximation)
- shuffle : bool, default=False
Whether to shuffle by circularly permuting each neuron’s timeseries (not recommended whenever # neurons > # timepoints!)
- flip_traintest : bool, default=False
Whether to shuffle by swapping the train and test timepoints for one neural subset (recommended if not using the session permutation method [2])
- prePCA : bool, default=True
Whether to perform PCA on each subset independently before computing SVCs. This is very useful when the len(ntrain) x len(ntest) covariance matrix does not fit into local memory and # neurons >> # timepoints. All PCs are kept, so the resulting SVCA decomposition is mathematically equivalent to running SVCA on the original data matrix
- Returns:
- sneur : ndarray
Shared variance of each covariance component
- vneur : ndarray
Total variance of each covariance component
- u : ndarray
Left eigenvectors of the covariance matrix between ntrain and ntest during itrain timepoints
- v : ndarray
Right eigenvectors of the covariance matrix between ntrain and ntest during itrain timepoints
- pca : None | dict
If prePCA=True, a dictionary containing the projections and principal components for both neural sets, with the keys train_projs, test_projs, train_vecs, test_vecs; otherwise None
References
[1]Stringer, C., Pachitariu, M., Steinmetz, N., Reddy, C.B., Carandini, M., & Harris, K.D. (2019). Spontaneous behaviors drive multidimensional, brainwide activity. Science, 364(6437), https://doi.org/10.1126/science.aav7893.
[2]Harris, K. D. (2020). Nonsense correlations in neuroscience. bioRxiv, 2020-11. https://doi.org/10.1101/2020.11.29.402719.
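A minimal usage sketch of SVCA on synthetic data, assuming the five documented outputs are returned in the order listed above. Random train/test splits are used for brevity; chunked or interleaved timepoint splits are typically preferable for autocorrelated timeseries:

```python
import numpy as np
from PopulationCoding.dimred import SVCA

rng = np.random.default_rng(0)
N, T = 200, 1000
X = rng.standard_normal((N, T))   # neural data matrix (N neurons, T timepoints)

# Split neurons into two subsets and timepoints into train/test halves.
nperm = rng.permutation(N)
ntrain, ntest = nperm[: N // 2], nperm[N // 2:]
tperm = rng.permutation(T)
itrain, itest = np.sort(tperm[: T // 2]), np.sort(tperm[T // 2:])

sneur, vneur, u, v, pca = SVCA(X, ntrain=ntrain, ntest=ntest,
                               itrain=itrain, itest=itest)
reliability = sneur / vneur   # fraction of each SVC's variance that is shared
```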
- PopulationCoding.dimred.estimate_id_twonn(X, plot=False, X_is_dist=False)
TWO-NN method for estimating intrinsic dimensionality as described by Facco et al. 2017 [1]. This implementation is taken from https://github.com/jmmanley/two-nn-dimensionality-estimator.
- Parameters:
- X : array_like of shape (N, p)
Matrix of N p-dimensional samples (when X_is_dist=False)
- plot : bool, default=False
Whether to plot the fit
- X_is_dist : bool, default=False
Whether X is instead an (N, N) distance matrix
- Returns:
- d : float
TWO-NN estimate of intrinsic dimensionality
References
[1]Facco, E., d’Errico, M., Rodriguez, A., & Laio, A. (2017). Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Scientific reports, 7(1), 12140. https://doi.org/10.1038/s41598-017-11873-y.
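A minimal usage sketch of estimate_id_twonn: a 2D latent manifold is linearly embedded in 10 ambient dimensions, so the estimate should land near 2 (the data construction is illustrative):

```python
import numpy as np
from PopulationCoding.dimred import estimate_id_twonn

rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 2))      # 2D latent samples
X = latent @ rng.standard_normal((2, 10))    # embed in 10 dimensions

d = estimate_id_twonn(X)   # expect an estimate near 2
```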
- PopulationCoding.dimred.lda(data, labels, d=None, classes=None)
Linear discriminant analysis (LDA)
LDA is a generalization of Fisher’s linear discriminant that finds a projection maximizing the ratio of between-class to within-class scatter (the Fisher criterion) across multiple classes. This technique requires continuous input data and a priori known classes.
ASSUMPTIONS:
For those familiar with MANOVA, the same assumptions apply here.
- Independent variables are normally distributed within each level of the grouping variables (multivariate normality).
- Covariances are equal across classes (homoscedasticity).
- Samples are chosen independently.
- Predictive power may decrease with increased correlation among predictor variables (multicollinearity).
For a more thorough overview of LDA, consult e.g. McLachlan, 2005 [1].
- Parameters:
- data : array_like of shape (n, m)
n x m data matrix, where rows represent n samples and columns represent m features
- labels : array_like of shape (n,)
Vector of class labels for the given data
- d : int, optional
Desired dimensionality after projection; must satisfy d <= cardinality(labels) - 1. If None, d = cardinality(labels) - 1
- classes : array_like of shape (cardinality(labels),), optional
Class names (as in labels)
- Returns:
- W : array_like of shape (m, d)
Projection matrix to the reduced-dimensional space, spanned by the top cardinality(labels) - 1 generalized eigenvectors of S_b and S_w
- proj_data : array_like of shape (n, d)
Data after projection under W
- vals : array_like of shape (d,)
Eigenvalues corresponding to the eigenvectors in the columns of W
- mu_c : array_like of shape (d, cardinality(classes))
Matrix where each column is a class mean after projection
- S_c : array_like of shape (d, d, cardinality(classes))
Covariance matrices after projection
References
[1]McLachlan, G. J. (2005). Discriminant analysis and statistical pattern recognition. John Wiley & Sons.
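A minimal usage sketch of lda on three synthetic Gaussian classes (the class means and sample counts are illustrative). With d=None, the projection has cardinality(labels) - 1 = 2 dimensions:

```python
import numpy as np
from PopulationCoding.dimred import lda

rng = np.random.default_rng(0)
# Three Gaussian classes in 4 dimensions, 100 samples each
means = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]])
data = np.vstack([rng.standard_normal((100, 4)) + mu for mu in means])
labels = np.repeat([0, 1, 2], 100)

W, proj_data, vals, mu_c, S_c = lda(data, labels)
print(proj_data.shape)   # (300, 2)
```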
- PopulationCoding.dimred.optimize_latent_dim(X, y, model, dims, cv=3, scoring='neg_mean_squared_error')
Identify the optimal latent dimensionality via cross validation.
- Parameters:
- X : array_like of shape (n_observations, p)
Data matrix
- y : array_like of shape (n_observations, q)
Targets
- model : estimator
An sklearn model to try out; must take an n_components argument
- dims : array_like
List of integer dimensionalities to try out
- cv : int | cross-validation splitter, default=3
Number of cross-validation folds; can also be a KFold or similar splitter
- scoring : str | callable, default="neg_mean_squared_error"
Metric used to assess prediction quality in the latent space
- Returns:
- d_opt : int
Latent dimensionality that maximizes the cross-validated score
- scores : array_like of shape (len(dims),)
Cross-validated score at each dimensionality
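A minimal usage sketch of optimize_latent_dim with rank-3 synthetic targets. Whether model should be passed as an instance or a class is not specified above; an sklearn PLSRegression instance is assumed here:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from PopulationCoding.dimred import optimize_latent_dim

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 30))
B = rng.standard_normal((3, 2))
y = X[:, :3] @ B + 0.1 * rng.standard_normal((500, 2))   # rank-3 targets

d_opt, scores = optimize_latent_dim(X, y, PLSRegression(),
                                    dims=np.arange(1, 11), cv=3)
print(d_opt)   # expect a value near 3
```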
- PopulationCoding.dimred.plot_component_images(components, nplot=9, **kwargs)
After applying matrix decomposition to a set of images or other 2D data, plots the first nplot components.
- Parameters:
- components : array_like of shape (N, W, ncomp)
Matrix of weights to be plotted
- nplot : int, default=9
Number of components to plot
- **kwargs
Passed to plt.subplots
- Returns:
- fig : Figure
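A minimal usage sketch of plot_component_images, assuming the (N, W, ncomp) shape denotes image height x width x number of components (the random weights are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from PopulationCoding.dimred import plot_component_images

rng = np.random.default_rng(0)
components = rng.standard_normal((32, 32, 9))   # 9 components over 32 x 32 images

fig = plot_component_images(components, nplot=9, figsize=(6, 6))
plt.show()
```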
PopulationCoding.info module
Information metrics for neuronal populations.
- PopulationCoding.info.conditional_cov(covX, covXY, covY=None)
Computes conditional covariance, given single and joint covariance matrices.
- PopulationCoding.info.dprimesq(X1, X2, shuffle=False, diagonal=False)
Calculates the squared sensitivity index (d’) between two distributions. Modeled after Rumyantsev et al. 2020 [1].
- Parameters:
- X1 : array_like of shape (n, t1_observations)
- X2 : array_like of shape (n, t2_observations)
- shuffle : bool, default=False
Whether to shuffle observations
- diagonal : bool, default=False
Whether to assume a diagonal covariance matrix
- Returns:
- d2 : float
Value of the squared sensitivity index
- cov : array_like of shape (n, n)
Covariance matrix
References
[1]Rumyantsev, O. I., Lecoq, J. A., Hernandez, O., Zhang, Y., Savall, J., Chrapkiewicz, R., … & Schnitzer, M. J. (2020). Fundamental bounds on the fidelity of sensory cortical coding. Nature, 580(7801), 100-105. https://doi.org/10.1038/s41586-020-2130-2.
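A minimal usage sketch of dprimesq on two synthetic conditions that differ only by a mean shift (the data and shift size are illustrative):

```python
import numpy as np
from PopulationCoding.info import dprimesq

rng = np.random.default_rng(0)
X1 = rng.standard_normal((20, 500))         # condition 1: (n, t1_observations)
X2 = rng.standard_normal((20, 400)) + 0.5   # condition 2, mean-shifted

d2, cov = dprimesq(X1, X2)
d2_diag, _ = dprimesq(X1, X2, diagonal=True)   # ignore off-diagonal covariance
```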
- PopulationCoding.info.entropy(cov)
Computes entropy under the Gaussian assumption, given a covariance matrix.
- PopulationCoding.info.mutual_information(covX, covXY, covY)
Computes mutual information under the Gaussian assumption. Mutual information is given as the estimated reduction of uncertainty (entropy) about system X given system Y.
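A sketch tying the three info functions together. It assumes covXY denotes the joint covariance of the stacked system [X; Y], which is an interpretation not stated explicitly above:

```python
import numpy as np
from PopulationCoding.info import conditional_cov, entropy, mutual_information

rng = np.random.default_rng(0)
Z = rng.standard_normal((15, 2000))   # rows 0-9 form system X, rows 10-14 system Y
covXY = np.cov(Z)                     # assumed: joint covariance of the stacked system
covX, covY = covXY[:10, :10], covXY[10:, 10:]

H_X = entropy(covX)                               # Gaussian entropy of X
cov_X_given_Y = conditional_cov(covX, covXY, covY)  # covariance of X conditioned on Y
I_XY = mutual_information(covX, covXY, covY)      # reduction in uncertainty about X given Y
```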
PopulationCoding.predict module
Fit predictive models between multi-dimensional datasets.
- PopulationCoding.predict.canonical_cov(Y, X, lam, npc=512, **kwargs)
Canonical covariance analysis to predict Y from X, based on CanonCor2 originally by Stringer et al. 2019.
Fits a “sort of” canonical covariance analysis between two sets of data X and Y. The approximation of Y based on the first n projections is given by
a[:, :n] @ b[:, :n].T @ X
- Parameters:
- Y : array_like of shape (m, t)
Data matrix
- X : array_like of shape (n, t)
Data matrix
- lam : float
Regularization parameter
- npc : int, default=512
Number of projections to consider
- Returns:
- a : array_like of shape (m, npc)
Projections of Y
- b : array_like of shape (n, npc)
Projections of X
- R2 : array_like of shape (npc,)
Proportion of the total variance of Y explained by each projection
- v : array_like of shape (n, npc)
Actual value of each linear combination of X
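A minimal usage sketch of canonical_cov, where Y is constructed as a noisy linear readout of X so that a low-rank approximation recovers most of its variance (the construction and lam value are illustrative):

```python
import numpy as np
from PopulationCoding.predict import canonical_cov

rng = np.random.default_rng(0)
m, n, t = 50, 100, 2000
X = rng.standard_normal((n, t))
M = rng.standard_normal((m, n)) / np.sqrt(n)
Y = M @ X + 0.5 * rng.standard_normal((m, t))   # noisy linear readout of X

a, b, R2, v = canonical_cov(Y, X, lam=0.1, npc=32)
Y_hat = a[:, :10] @ b[:, :10].T @ X   # rank-10 approximation of Y
```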
PopulationCoding.utils module
Miscellaneous tools.
- PopulationCoding.utils.bin2d(x, tbin, idim=0)
Bins data.
- PopulationCoding.utils.classify_continuous(y, labels)
Take continuous input and classify it into the closest label.
- PopulationCoding.utils.generate_gaussian_data(A, T=10000, mean_subtract=True)
Generates auto-regressive Gaussian test data given a connectivity matrix.
- Parameters:
- A : array_like of shape (N, N)
Desired connectivity matrix
- T : int, default=10000
Number of time points to simulate
- mean_subtract : bool, default=True
Whether or not to mean-subtract the time series
- Returns:
- X : array_like of shape (N, T)
Timeseries data
- cov : array_like of shape (N, N)
Empirical covariance matrix
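A minimal usage sketch of generate_gaussian_data. Scaling A so its spectral radius stays below 1 keeps the autoregressive dynamics stable; that scaling is an assumption here, not a documented requirement:

```python
import numpy as np
from PopulationCoding.utils import generate_gaussian_data

rng = np.random.default_rng(0)
N = 10
A = rng.standard_normal((N, N)) / (2 * np.sqrt(N))   # keep dynamics stable

X, cov = generate_gaussian_data(A, T=5000)
print(X.shape, cov.shape)   # (10, 5000) (10, 10)
```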
- PopulationCoding.utils.get_consecutive_chunks(t)
Finds chunks of consecutive integers in a vector t.
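A small sketch of get_consecutive_chunks; the exact return format is not documented above, so this only illustrates the call and the expected grouping:

```python
import numpy as np
from PopulationCoding.utils import get_consecutive_chunks

t = np.array([1, 2, 3, 7, 8, 10])
chunks = get_consecutive_chunks(t)
# groups the consecutive runs [1, 2, 3], [7, 8], and [10]
```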
- PopulationCoding.utils.logdet(X)
Computes the log of the determinant.
- PopulationCoding.utils.train_test_split_idx(N, train_frac=0.5, time=False, interleave=0)