Stereo-seq Mouse Embryo Data Application

Below is the main code we used to apply ELLA to the Stereo-seq mouse embryo data. For each cell type, we split the genes into batches of 100 and ran the batches in parallel to reduce memory use and computation time.
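
The batching described above can be sketched as follows. This is a minimal illustration and not part of ELLA: the helper name `gene_batches` is hypothetical, and the gene count of 250 is a placeholder rather than a value from the data. Each pair is a half-open range (begin, end), so it can be used directly as a Python slice.

```python
def gene_batches(n_genes, batch_size=100):
    """Return half-open (begin, end) index pairs covering range(n_genes).

    Hypothetical helper for illustration; not provided by ELLA.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [(begin, min(begin + batch_size, n_genes))
            for begin in range(0, n_genes, batch_size)]


# Placeholder example: 250 genes split into batches of 100.
for begin, end in gene_batches(250):
    print(begin, end)
```

The last batch is simply shorter when the number of genes is not a multiple of the batch size.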

We first ran cell registration for both cell types; the results were saved automatically under an output folder.

from ELLA.ELLA import model_beta, model_null, loss_ll, ELLA
ella_stereoseq = ELLA(dataset='stereoseq', max_ntanbin=4)

# load data
ella_stereoseq.load_data(data_path='input/stereoseq_data_sub_dict.pkl')

# register all cells
ella_stereoseq.register_cells()

We then prepared the data for fitting the nonhomogeneous Poisson process (NHPP) model; the results were again saved automatically under the output folder.

from ELLA.ELLA import model_beta, model_null, loss_ll, ELLA
ella_stereoseq = ELLA(dataset='stereoseq', max_ntanbin=4)

# load data
ella_stereoseq.load_data(data_path='input/stereoseq_data_sub_dict.pkl')

# load registered cells
ella_stereoseq.load_registered_cells(registered_path='output/df_registered_saved.pkl')

# prepare data for NHPP fit (r, c0, n etc.)
ella_stereoseq.nhpp_prepare() 

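We next ran the NHPP fitting; again, the results were saved under the output folder. The script below takes three command-line arguments: the cell type, the begin index, and the end index of the gene batch. The end index is exclusive because the indices are used as a Python slice.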

#!/usr/bin/env python3

import sys
cell_type = sys.argv[1]
gene_idx_begin = int(sys.argv[2])
gene_idx_end = int(sys.argv[3])
print(f'cell_type {cell_type} gene_idx_begin {gene_idx_begin} gene_idx_end {gene_idx_end}')

from ELLA.ELLA import model_beta, model_null, loss_ll, ELLA
ella_stereoseq = ELLA(dataset='stereoseq', max_ntanbin=4)

# load data
print('load data')
ella_stereoseq.load_data(data_path='input/stereoseq_data_sub_dict.pkl')

# load registered cells
print('load registered cells')
ella_stereoseq.load_registered_cells(registered_path='output/df_registered_saved.pkl')

# load the data prepared for NHPP fit (r, c0, n etc.)
print('load prepared data for NHPP fit')
ella_stereoseq.load_nhpp_prepared(data_path='output/df_nhpp_prepared_saved.pkl')

# <<<<< the cell type of focus
t = cell_type
ella_stereoseq.type_list = [t]
print(ella_stereoseq.type_list)

# <<<<< choose a subset of genes of focus
gl_full = ella_stereoseq.gene_list_dict[t]
ella_stereoseq.gene_list_dict[t] = gl_full[gene_idx_begin:gene_idx_end]

# run nhpp fit
print('run nhpp fit')
ella_stereoseq.nhpp_fit(outfile=f'output/nhpp_fit_results_t{t}_g{gene_idx_begin}_{gene_idx_end}.pkl', ig_start=gene_idx_begin)
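
Because each batch writes its own file, the per-batch outputs need to be located before any downstream analysis. The sketch below is one possible way to do that and is not part of ELLA: the function names `find_batch_results` and `load_batch_results` are hypothetical, and the only thing taken from the script above is the file-name pattern `nhpp_fit_results_t{t}_g{begin}_{end}.pkl`. The structure of the pickled object is not described here, so it is returned unchanged.

```python
import pickle
import re
from pathlib import Path

# File-name pattern written by the fitting script above.
PATTERN = re.compile(
    r"nhpp_fit_results_t(?P<cell_type>.+)_g(?P<begin>\d+)_(?P<end>\d+)\.pkl$"
)


def find_batch_results(output_dir="output"):
    """Map (cell_type, begin, end) to the path of the matching result file."""
    found = {}
    for path in sorted(Path(output_dir).glob("nhpp_fit_results_t*_g*_*.pkl")):
        match = PATTERN.match(path.name)
        if match:
            key = (match["cell_type"], int(match["begin"]), int(match["end"]))
            found[key] = path
    return found


def load_batch_results(output_dir="output"):
    """Unpickle every batch file; each stored object is returned as-is.

    Only unpickle files that you produced yourself.
    """
    results = {}
    for key, path in find_batch_results(output_dir).items():
        with path.open("rb") as handle:
            results[key] = pickle.load(handle)
    return results
```

Calling `load_batch_results('output')` returns a dictionary keyed by cell type and gene index range, which makes it easy to check that no batch is missing before combining the results.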

Other scripts used for the mRNA characteristic analysis and for plotting are available in the GitHub repository.