A Minimum Demo


Here’s a minimum demo to get started with ELLA.

Install ELLA

Install ELLA follows the steps in Install ELLA if you haven’t done so yet.

The script and data that will be used in this demo should have already been downloaded (while cloning the ELLA repo). You should be able to find these at your local ELLA folder:

ELLA/scripts/demo/mini_demo/
├── lightning_logs
│   └── run1
├── log
├── mini_demo_data.pkl
├── prepared_data
│   ├── cells_center.json
│   ├── cells_point_infos.json
│   ├── cells_polygon.json
│   └── training_data.jsonl
└── run_mini_demo.sh

The input data (mini_demo_data.pkl) mainly contains a dictionary of three dataframes corresponding to gene expression, cell segmentation, and nucleus segmentation (optional). The data contains 5 cells and 4 genes, and its details are explained in ELLA’s Inputs.

ELLA Anlysis

1. We first process the input data. This step includes registering cells (computing relative positions) and restructuring the data for efficient cell-type and gene-level parallelization.

python -m ella.data.prepare_data -i your_dir/ELLA/scripts/demo/mini_demo/mini_demo_data.pkl -o your_dir/ELLA/scripts/demo/mini_demo/prepared_data

The outputs including cells_center.json, cells_point_infos.json, cells_polygon.json, and training_data.jsonl.

2a. We can run one gene on a local machine by training a ELLA model based a recipe e.g. ELLA/configs/mini_demo.yaml

ella-train --config-name mini_demo

2b. We can run multiple genes in parallel using scripts e.g. run_mini_demo.sh on a computing cluster.

bash run_mini_demo.sh

Check out ELLA’s results

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

red = '#c0362c'
lightorange = '#fabc2e'
lightgreen = '#93c572'
lightblue = '#5d8aa8'
darkgray ='#545454'
colors = [red, lightorange, lightgreen, lightblue]

nr = 1
nc = 4
ss_nr = 1.7
ss_nc = 2
fig = plt.figure(figsize=(nc*ss_nc, nr*ss_nr), dpi=300)
gs = fig.add_gridspec(nr, nc,
                   width_ratios=[1]*nc,
                   height_ratios=[1]*nr)
gs.update(wspace=0.3, hspace=0.5)

res_dir = 'your_dir/ELLA/scripts/demo/mini_demo/lightning_logs/run1'
p_cauchy = []
for i in range(4):
    path = f'{res_dir}/gene_{i}_estimation_result.json'
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            res = json.load(f)
            gene_id = res['gene_id']
            lam_est = res['lam_weighted']
            p_cauchy.append(res['p_cauchy'])
            
            ax = plt.subplot(gs[0,i])
            ax.set_title(gene_id)
            lam_std = (lam_est-np.min(lam_est))/(np.max(lam_est)-np.min(lam_est))
            ax.plot(np.linspace(0,1,len(lam_std)), lam_std, lw=2, color=colors[i])
            ax.set_xticks([0,0.5,1], [0,0.5,1])
            ax.set_yticks([0,0.5,1], [0,0.5,1])
            ax.set_xlabel('Relative Position')
            if i==0:
                ax.set_ylabel('Expression Intensity')            

Compute FDR-corrected P values

reject, p_fdr, _, _ = multipletests(p_cauchy, alpha=0.05, method='fdr_by')
print(np.sum(p_fdr<=0.05))
print(p_fdr)

Prints:

4
[2.31296463e-16 0.00000000e+00 4.62592927e-16 2.38025062e-04]

Here Slc38a2 looks like a nuclear localized genes as its estimated expression intensity is high when the relative position is near zero (corresponding to nuclear center); Col1a1 could be a nuclear edge localized gene as its expression intensity peaks around relative position 0.3; Actn1 should be a cytoplasmic localized gene as its expression intensity peak around 0.6; and Cyb5r3 should be a cell membrane localized gene as its expression intensity peaks near 1 (corresponding to the cell membrane).

More to plot: We can further plot the cells and genes to have a more intuitive sense of the localization patterns.

import alphashape

# load intput data
input_data = pd.read_pickle('your_dir/ELLA/scripts/demo/mini_demo/mini_demo_data.pkl')
cell_type = 'fibroblast'
genes = input_data['genes'][cell_type]
cells = input_data['cells'][cell_type]
expr = input_data['expr']
cell_seg = input_data['cell_seg']
nucleus_seg = input_data['nucleus_seg']

# plot
alphas = [0.6, 0.3, 0.5, 0.5]

nr = len(genes)
nc = len(cells)+1
ss_nr = 2
ss_nc = 2
fig = plt.figure(figsize=(nc*ss_nc, nr*ss_nr), dpi=300)
gs = fig.add_gridspec(nr, nc,
                      width_ratios=[1]*nc,
                      height_ratios=[1]*nr)
gs.update(wspace=0.0, hspace=0.0)

for j, g in enumerate(genes):
    for i, c in enumerate(cells):
        if i==0:
            ax = plt.subplot(gs[j,i])
            lam_est = lam_ests[j]
            lam_std = (lam_est-np.min(lam_est))/(np.max(lam_est)-np.min(lam_est)+1e-10)
            ax.plot(np.linspace(0,1,len(lam_std)), lam_std, lw=2, color=colors[j])
            ax.set_title(g)
            #ax.set_ylabel(f'{p_fdr[j]:.3f}')
            ax.set_xticks([-0.2,0.5,1.2], [-0.2,0.5,1.2])
            ax.set_yticks([-0.2,0.5,1.2], [-0.2,0.5,1.2])
            ax.spines['top'].set_visible(False)
            ax.spines['right'].set_visible(False)
            ax.spines['left'].set_visible(False)
            ax.spines['bottom'].set_visible(False)
            ax.set_xticks([])
            ax.set_yticks([])
        
        ax = plt.subplot(gs[j,i+1])

        cell_seg_c = cell_seg[cell_seg.cell==c]
        nucleus_seg_c = nucleus_seg[nucleus_seg.cell==c]
        expr_c = expr[expr.cell==c]

        # cell segmentation
        x_reduced = (cell_seg_c.x.values//10) * 10 # reduce resolution to speedup alphashape
        y_reduced = (cell_seg_c.y.values//10) * 10
        points = np.stack((x_reduced, y_reduced)).transpose()
        unique_points = np.unique(points, axis=0)
        alpha_shape_ = alphashape.alphashape(unique_points, 0.1)
        cb_x_, cb_y_ = alpha_shape_.exterior.xy
        ax.plot(cb_x_, cb_y_, 
                alpha=0.5,
                color=darkgray, lw=1, zorder=1)

        # nuclear segmentation
        x_reduced = (nucleus_seg_c.x.values//10) * 10 # reduce res to speedup alphashape
        y_reduced = (nucleus_seg_c.y.values//10) * 10
        points = np.stack((x_reduced, y_reduced)).transpose()
        unique_points = np.unique(points, axis=0)
        alpha_shape_ = alphashape.alphashape(unique_points, 0.1)
        cb_x_, cb_y_ = alpha_shape_.exterior.xy
        ax.plot(cb_x_, cb_y_, 
                alpha=0.5,
                color=darkgray, lw=1, zorder=1)

        # gene expr
        expr_c_g = expr_c[expr_c.gene==g]
        ax.scatter(expr_c_g.x,
                   expr_c_g.y,
                   s = 20,
                   edgecolor='none',
                   color=colors[j],
                   alpha=alphas[j],
                   zorder=2)

        # cell center
        xc = expr_c.centerX.iloc[0]
        yc = expr_c.centerY.iloc[0]
        ax.scatter(xc, yc, c=darkgray, marker='+',lw=1, s=60, zorder=3)

        ax.set_aspect('equal', adjustable='box')
        #ax.axis('off')
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.spines['left'].set_visible(False)
        ax.spines['bottom'].set_visible(False)
        ax.set_xticks([])
        ax.set_yticks([])
        
        if j==0:
            ax.set_title(c)