Constructing a single-cell genomic resource for 388 individuals


26K .txt PEC2_sample_metadata.txt Clinical and demographic meta-data for each sample in the brainSCOPE resource. [Data S1]
19K .xlsx PEC2_sample_mapping.xlsx Mapping of uniform IDs for each sample across sub-cohorts and data modalities (snRNA-Seq, snATAC-Seq, and genotype data). [Data S2]
8.1G .txt [sample]-annotated_matrix.txt.gz Expression matrices for individual samples in our cohort, arranged into sub-directories by cohort. (Note that expression matrices for ROSMAP cohort samples are available only through the AMP-AD Knowledge portal.)
710M .BED [celltype].expr.bed.gz Pseudo-bulk snRNA-Seq expression matrices for 24 cell types, listing logCPM-normalized expression values for the subset of the 388 individuals who pass quality control for each cell type. These data were used as input for the scQTL analysis.
46K .tar PsychENCODE_scRNA_pipeline-main.tar.gz GZIP file containing code used for processing snRNA-Seq datasets. More details are available on the associated GitHub repository.
1.7G .RDS BICCN_mat.RDS.gz R data object containing the DLPFC cell-type annotation scheme from the BICCN Consortium.
2.1G .RDS Ma_Sestan_mat.rds.gz R data object containing the DLPFC cell-type annotation scheme from the Ma-Sestan study.
762K .RDS BICCN_meta_share.RDS R object containing the harmonized DLPFC cell-type annotation scheme used for analysis in the study.
3.9K .R Azimuth_mapping.R R script used to annotate cell types in an Azimuth object.
4.0K .py reconcile_annotations.py Python script used to generate the harmonized cell typing scheme from the input matrices.
817K .CSV cell_type_fraction_count_with_meta.csv Normalized cell fractions calculated from snRNA-Seq data. [Data S4]
120K .txt bisque.PEC_CMC.rev.txt Normalized cell fractions calculated from deconvolved bulk RNA-Seq data. [Data S5]
1.1K .txt cell_fraction_corr.txt Correlations between deconvolved and single-cell derived fractions per cell type. [Data S6]
33M .csv ASD_DEGcombined.csv Sets of differentially expressed genes between control individuals and individuals with ASD for 20 cell types. [Data S7]
39M .csv Bipolar_DEGcombined.csv Sets of differentially expressed genes between control individuals and individuals with bipolar disorder for 19 cell types. [Data S7]
21M .csv Schizophrenia_DEGcombined.csv Sets of differentially expressed genes between control individuals and individuals with schizophrenia for 21 cell types. [Data S7]
17M .csv *_ASD_table.csv Alternate version of the differentially expressed gene sets between control individuals and individuals with ASD for 20 cell types, used for LNCTP model comparisons.
19M .csv *_Bipolar_disorder_table.csv Alternate version of the differentially expressed gene sets between control individuals and individuals with bipolar disorder for 19 cell types, used for LNCTP model comparisons.
40M .csv *_Schizophrenia_table.csv Alternate version of the differentially expressed gene sets between control individuals and individuals with schizophrenia for 21 cell types, used for LNCTP model comparisons.
1.0K .tsv IT_neuron_trajectory_sig_gene_set.tsv Overlapping set of genes across cohorts identified as significantly varying (FDR<0.05) across IT neuron trajectories in each cohort. [Table S5]
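
The pseudo-bulk matrices listed above report logCPM-normalized values. As a rough illustration of what that unit means, the sketch below computes log2 counts-per-million for one sample from raw counts. The exact normalization used for these files is not stated here; the `prior` pseudocount and the function name `log_cpm` are assumptions for illustration only.

```python
import math

def log_cpm(counts, prior=1.0):
    """Return log2(CPM + prior) for one sample's raw gene counts.

    `prior` is an assumed pseudocount that keeps zero counts finite; it is
    not taken from the resource documentation.
    """
    library_size = sum(counts)
    if library_size == 0:
        raise ValueError("sample has no reads")
    # Scale each count to counts-per-million, then take log2.
    return [math.log2(c / library_size * 1e6 + prior) for c in counts]

# Toy counts for three genes in one pseudo-bulk sample (made-up numbers).
values = log_cpm([0, 500_000, 500_000])
```

With this definition a gene with zero counts maps to 0 when `prior` is 1, so values stay comparable across samples with different sequencing depth.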

Determining regulatory elements for cell types from snATAC-seq


43M .BED [celltype].PeakCalls.bed Files contain scCREs (i.e. single-cell candidate cis-regulatory regions) identified from snATAC-Seq and snMultiome data (7 files for individual cell types & 1 file for peaks across all cell types).
67M .BIGWIG [celltype].bigwig Files contain signal tracks for chromatin accessibility derived from samples in each of three snATAC-Seq and snMultiome cohorts assessed (7 files for individual cell types & 1 file for peaks across all cell types).
14M .bed adult_bcCREs.bed List of brain candidate cis-regulatory elements (b-cCREs). These elements can also be visualized on PsychScreen.
347K .csv ukbb-all-traits-pval.csv File containing the -log(p-value) of UK Biobank GWAS summary statistic LDSC enrichment for scCREs in seven cell types and adult b-cCRE regions, indexed by trait ID. [Data S9]
625K .txt ccre.pval.table.txt File containing the -log(p-value) of UK Biobank GWAS summary statistic LDSC enrichment for adult cCRE regions, indexed by trait ID.
6.5K .csv PGC_PASS-all-traits-pval.csv File containing the -log(p-value) of PGC and PASS-derived GWAS summary statistic LDSC enrichment for scCREs in seven cell types and adult b-cCRE regions, indexed by trait ID. [Data S10]
686K .txt IDtoTraitName.txt Matrix for converting each trait ID into the full UK Biobank trait name.
173K .txt cluster.trait.txt Matrix that assigns each UK Biobank trait to a Human Phenotype Ontology category.
156K .csv *tf_enrich.csv Enrichments of TF binding motifs in cell-type-specific scCREs for proximal and distal regions. [Data S11]
190K .bed starrseq_enhancers_merged.bed Lists of enhancer elements validated by STARR-Seq experiments.

Measuring transcriptome and epigenome variation across the cohort at the single-cell level


951K .csv variancePartition_output.csv Summary of VariancePartition results for all genes, including total variation and variation from individual factors (sample and cell type). [Data S12]
1.2M .csv variation-partition-092623.csv Summary of VariancePartition results for all genes, including total variation and variation from individual factors (sample and cell type) plus brain region. [Data S13]
876K .txt gene_map_df_02252023.txt Data frame mapping genes to their gene families (e.g., serotonin). [Data S14]
12M .bed gene_coding_conservation_phastcons.bed Coding gene conservation calculated using phastCons.
9.4M .bed bcCRE_conservation_phastcons.bed Conservation of b-cCREs using phastCons.
34M .bed scATAC_conservation_phastcons.bed Conservation of cell-type-specific snATAC-Seq peaks using phastCons.

Determining cell-type-specific eQTLs from single-cell data


168M .dat [celltype]_sig_QTLs.dat Primary set of cell-type-specific eQTLs identified for 17 cell types, processed with a standardized/conservative pipeline.
45K .dat [celltype]_sig_eGenes.dat Lists of significant eGenes identified for 17 cell types, processed with the standardized pipeline. [Data S15]
201M .dat [celltype].SNPs.dat.gz List of all variant IDs used for scQTL analysis.
8.6M .bed [celltype].cov.100_expr_PCs.bed Sample-level covariates used for QTL calculations in each cell type (i.e., biological sex, age at death, diagnosis, cohort, genotype and expression PCs).
7.7K .TAR core_scqtl_processing_code.tar.gz GZIP file containing code used for calculating scQTLs with the standardized pipeline.
7.3M .dat [celltype]_sig_QTLs.dat Cell-type-specific eQTLs identified for 17 cell types, processed with a standard pipeline utilizing linkage disequilibrium variant selection.
168M .DAT [aggregated_celltype]__sig_QTLs.dat Cell-type-specific eQTLs identified for three PFC cell type groupings (inhibitory, excitatory, and non-neuronal cells), processed with a standard pipeline.
230M .TXT conditional.[celltype].txt eGenes for scQTLs calculated using a conditional QTL identification pipeline.
1.9M .TXT conditional_top_variants_per_signal_[celltype].txt Top variants identified for scQTLs calculated using a conditional QTL identification pipeline.
6.0M .DAT bayesian_scQTLs.dat Set of cell-type-specific QTLs processed with a Bayesian model-based QTL identification pipeline (for recovery of scQTLs in rare cell types).
48K .TAR Bayesian_QTL_code.tar.gz GZIP file containing code used for calculating scQTLs with a Bayesian model-based pipeline.
1.2M .TSV full_dynamic_eqtl.tsv Full set of eQTLs used as input for the PME model (6,225 top eQTLs identified with the pseudo-bulk approach).
797K .TSV sig_dynamic_eqtl.tsv File listing the 4,186 significant dynamic eQTLs identified with the PME model (SNP term only). [Data S18]
325K .TSV sig_ptime_dynamic_eqtl.tsv File listing the 1,692 significant dynamic eQTLs identified with the PME model (SNP and interaction terms). [Data S19]
21M .TSV SZBD.ptime.tsv This file lists the pseudo-time values for all tested excitatory neuron cells from samples in the SZBDMulti-Seq cohort (generated via SCALEX and Slingshot).
198M .TXT metaQTLs.txt This file lists a combined set of QTLs discovered in all analyses, including analysis type, cell type, gene and SNP identifiers, and p-values and test statistics.
288K .txt [celltype]_permuted_fdr_correct.nominal05.txt List of isoGenes in cell-type-specific isoform usage QTLs for 22 cell types (permuted beta distribution-derived p<0.05, with FDR values for each isoGene).
31M .txt [celltype]_significant_sCQTLs_p0.05.txt isoSNPs for cell-type isoform QTLs, filtered for isoGene-specific nominal p-value thresholds (permuted beta distribution-derived p<0.05 filter) in 22 cell types. [Data S16]
839K .bed starrseq_enhancers_merged_sig_qtls_fdr_0.05.bed Panel design used in mut-STARRseq, generated by intersecting STARR-Seq-validated enhancers with cell-type-specific eQTLs.
41M .tsv MultiomeBrain_scASE_genes_anonymized.tsv List of genes with allele-specific expression in individual cell types, with sample identifiers removed.
38M .tsv MultiomeBrain_scASE_hetSNVs_anonymized.tsv List of heterozygous SNVs conferring allele-specific expression in individual cell types, with sample identifiers removed.
4.6K .txt scQTL_disease_overlap.txt List of eGenes found in any cell type from the primary analysis that are annotated for four brain-related diseases and traits. [Data S17]
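
Many entries in this listing use a bracketed placeholder such as `[celltype]` in the filename. A minimal sketch of how a downloaded filename could be matched against such a template to recover the placeholder value is shown below. The helper names `template_to_regex` and `extract_fields` and the cell-type labels `Astro` and `Micro` are hypothetical and chosen only for illustration; the actual labels used in the resource are not listed here.

```python
import re

def template_to_regex(template):
    """Compile a listing template such as '[celltype]_sig_QTLs.dat' into a
    regex with one named group per bracketed placeholder."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\[([A-Za-z_-]+)\]", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)          # literal filename text
        else:
            name = part.replace("-", "_")       # group names cannot contain '-'
            pattern += f"(?P<{name}>.+?)"       # placeholder value
    return re.compile(f"^{pattern}$")

def extract_fields(template, filename):
    """Return the placeholder values in `filename`, or None if it does not match."""
    match = template_to_regex(template).match(filename)
    return match.groupdict() if match else None

# Hypothetical filenames; real cell-type labels may differ.
primary = extract_fields("[celltype]_sig_QTLs.dat", "Astro_sig_QTLs.dat")
conditional = extract_fields("conditional.[celltype].txt", "conditional.Micro.txt")
```

Because several pipelines in this section share the same template (for example `[celltype]_sig_QTLs.dat`), the filename alone does not identify the pipeline; the containing directory would also need to be tracked.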

Building a gene regulatory network for each cell type


207M .txt [celltype]_GRN.txt Cell-type-specific gene regulatory networks constructed for 24 cell types.
267M .txt [celltype].txt Alternate version of the cell-type-specific gene regulatory networks used as input for the LNCTP model.
143K .CSV SCENIC_RSS.csv SCENIC-derived scores for regulons (TFs with all target genes) in 24 cell types, used as inputs for constructing the final GRNs. [Data S20]
34M .txt unified_GRN.txt Unified gene regulatory network built across all 24 cell types.
1.0M .txt [celltype].eqtl_edge.txt Cell-type-specific lists of scQTLs that overlap with enhancer/promoter elements in the GRNs. [Data S21]
9.5K .xlsx crispr_validation_results.xlsx List of GRN-predicted enhancers validated in targeted CRISPR KO experiments. [Data S22]
629B .txt inhubs.txt Lists of cell-type-specific "in-hub" genes in GRNs. [Data S23]
2.3K .txt outhubs.txt Lists of cell-type-specific "out-hub" genes in GRNs. [Data S23]
945B .txt bottlenecks.txt Lists of cell-type-specific "bottleneck" genes in GRNs. [Data S23]
5.4K .txt GO_enrichment_bottlenecks.txt Gene ontology enrichment among bottleneck genes in cell-type-specific GRNs. [Data S24]
78M .txt Unified_GRN_diffusion.txt List of diffusion scores between TFs and target genes for the unified GRN and each cell-type-specific GRN.
4.4M .csv gene_module_mappings.csv List of module memberships for genes within each cell-type-specific GRN.

Constructing a cell-to-cell communication network


1.7M .txt cellchat_C2C_network_[disorder].txt Networks of ligand-receptor signaling patterns across cell types in control, schizophrenia, and bipolar disorder individuals. [Data S25]
320K .txt cellchat_C2C_network_netP_[disorder].txt Networks of ligand-receptor patterns summarized by signaling pathway across cell types in control, schizophrenia, and bipolar disorder individuals. [Data S26]
1.9G .rds SZBD-Kellis_annotated-CON_cellchat.rds.gz CellChat object containing ligand-receptor signaling patterns across cell types in control individuals.
1.6G .rds SZBD-Kellis_annotated-SZ_cellchat.rds.gz CellChat object containing ligand-receptor signaling patterns across cell types in schizophrenia individuals.
1.4G .rds SZBD-Kellis_annotated-BD_cellchat.rds.gz CellChat object containing ligand-receptor signaling patterns across cell types in bipolar disorder individuals.
14G .h5s SZBD-Kellis_annotated-BD_CON.h5seurat.gz NicheNet object containing changes in cell-to-cell networks between individuals with bipolar disorder and control individuals.
15G .h5s SZBD-Kellis_annotated-SCZ_CON.h5seurat.gz NicheNet object containing changes in cell-to-cell networks between individuals with schizophrenia and control individuals.

Assessing cell-type-specific transcriptomic and epigenetic changes in aging


39M .csv Aging_DEGcombined.csv Sets of differentially expressed genes between older and younger control individuals for 20 cell types.
37M .csv Aging-schizophrenia_DEGcombined.csv Sets of differentially expressed genes between older and younger individuals with schizophrenia for 20 cell types.
6.0M .txt [celltype].txt Sets of genes differentially expressed with sample age, derived from STEM analysis for 17 cell types.
6.4M .csv [celltype]_shap_summary_stratify.csv Sample age prediction gene prioritization results per cell type, derived from the SHAP model.
3.8G .matrix bwaob_output_col5_[cell-type].matrix.gz Deconvolution of bulk ATAC signal to cell-type-specific signals for 7 cell types.
6.8K .txt bulk_ATAC_samples_ordered.cleaned.txt Column (sample) names for the deconvoluted signal matrices.
1.1M .csv pec_atacseq_metadata_08142021.csv Biosample metadata for bulk ATAC-Seq datasets.
1.7K .txt hybrid.admodel.auprc.txt AUPRC values for cell types and data modalities (rf.meth=methylation, rf.expr=expression) from AD model predictions. [Data S29]
9.2G .tar shap_xgboost.tar.gz TAR file containing source code and full results for the gene expression prediction model for aging.

Imputing gene expression and prioritizing disease genes across cell types with an integrative model


25K .XLSX LNCTP_prioritized_genes_celltypes.xlsx Tables of genes, cell types, and network elements prioritized by the LNCTP model for each disorder, including salience, coheritability, and p-values for each gene-cell type combination. [Data S30]
17M .XLS clue_significant.xls List of perturbagens identified through CLUE.io that elicit expression profiles opposing perturbations induced by key genes prioritized in LNCTP.
2.1G .TAR lnctp.tar.gz TAR file containing a Docker image of the source code used to perform the LNCTP model analysis. The image is also available to download from the Docker repository.
289M .TXT Metacells_Zscores_all.txt.gz Normalized MetaCell gene expression values for 7 cell types across individuals who pass quality control filters, used as input for GRNs and the LNCTP model.
1.4G .ZIP perturbed_expression.zip Zipped archive of *.csv files containing LNCTP forward perturbation results of key genes identified by LNCTP.
1.4G .ZIP perturbed_expression_reversed.zip Zipped archive of *.csv files containing LNCTP reverse perturbation results of key genes identified by LNCTP.
1.8G .ZIP perturbed_expression_background.zip Zipped archive of *.csv files containing LNCTP forward perturbation results of background genes.
1.8G .ZIP perturbed_expression_background_reversed.zip Zipped archive of *.csv files containing LNCTP reverse perturbation results of background genes.
1.4G .ZIP perturbed_expression_drugs.zip Zipped archive of *.csv files containing LNCTP forward perturbation results of drug target genes.
1.4G .ZIP perturbed_expression_drugs_reversed.zip Zipped archive of *.csv files containing LNCTP reverse perturbation results of drug target genes.
6.6G .ZIP perturbation_crispr.zip Zipped archive of *.csv files containing LNCTP perturbation results of CRISPR targets.
237M .ZIP perturbation_crispr_ref.zip Zipped archive of *.csv files containing reference expression data used in CRISPR perturbation experiments.