In this tutorial, we will visualize a 10x Genomics Visium dataset from start to finish with the Vitessce web app.
We will perform data conversion using Python 3 and save our converted dataset to a Zarr-based AnnData file type.
Before proceeding with this tutorial, make sure you have installed conda, which we will use to manage the Python environment. We will also use http-server, which can be installed with Homebrew on macOS.
First, create a new conda environment using the conda create command.
Activate the environment so that any packages you install are scoped to the new environment.
Then use conda install to install the required Python packages.
We will use SciPy to perform hierarchical clustering along the gene axis of the cell-by-gene matrix, which gives us an optimal ordering of genes for the heatmap visualization.
In a Python console (terminal, Jupyter notebook, etc.) or Python script, perform the following steps to access and pre-process the raw data.
First, import the Python dependencies.
Next, retrieve the AnnData object for the V1_Human_Lymph_Node dataset.
Run the following functions to pre-process the data in the adata object.
These steps have been adapted from the Scanpy spatial analysis tutorial.
As part of the previous steps, we ran the highly_variable_genes function, which updates the highly_variable column of the adata.var data frame, marking 300 genes with True to denote that they are highly variable.
Based on the 300 most highly variable genes, we want to:
- compute the optimal ordering of the 300 highly variable genes (for heatmap visualization),
- re-order the columns of the smaller cell-by-gene matrix based on the optimal ordering, and
- append a smaller cell-by-gene matrix (N cells by 300 genes) to the adata object.
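The three steps above can be sketched on a small synthetic matrix. This is a minimal illustration, not the tutorial's exact code: the random matrix stands in for adata.X, the boolean mask stands in for adata.var["highly_variable"], and the choice of average linkage is an assumption. It relies only on NumPy and the SciPy functions linkage and leaves_list.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

rng = np.random.default_rng(0)

# Synthetic stand-in for the cell-by-gene matrix: 50 cells by 20 genes.
n_cells, n_genes, n_top = 50, 20, 5
X = rng.poisson(lam=2.0, size=(n_cells, n_genes)).astype(float)

# Stand-in for adata.var["highly_variable"]: flag the n_top highest-variance genes.
highly_variable = np.zeros(n_genes, dtype=bool)
highly_variable[np.argsort(X.var(axis=0))[-n_top:]] = True

hvg_idx = np.flatnonzero(highly_variable)
other_idx = np.flatnonzero(~highly_variable)

# Cluster along the gene axis: each row passed to linkage is one gene.
# optimal_ordering=True reorders leaves so that adjacent genes are similar.
Z = linkage(X[:, hvg_idx].T, method="average", optimal_ordering=True)
leaf_order = leaves_list(Z)

# Full gene ordering: clustered highly variable genes first, then the rest.
gene_order = np.concatenate([hvg_idx[leaf_order], other_idx])

# Re-order the columns, then take the smaller N-cells-by-n_top matrix.
X_reordered = X[:, gene_order]
X_top = X_reordered[:, :n_top]
```

With a real AnnData object, the same index array could be used to subset the columns of adata, and the smaller matrix could be stored in adata.obsm.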
At this point, we have obtained the optimal ordering of the gene index.
We can run the following line to subset the AnnData object and replace our existing adata object with a new re-ordered object.
We can append the smaller 300-gene cell-by-gene matrix (with the optimal gene ordering) under a new key in adata.obsm.
Finally, we need to save the processed data as a Zarr store using the AnnData write_zarr function.
In this tutorial, we will use http-server to serve the processed Zarr store over HTTP on port 9000.
To test that the server is working as expected, try to access http://localhost:9000/V1_Human_Lymph_Node.zarr/.zgroup in a web browser.
This should download a new file called .zgroup to your Downloads folder.
When .zgroup is opened in a text editor, it should contain the following contents:
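For a Zarr version 2 store, the group metadata file is a small JSON document with a zarr_format field. The sketch below parses that document and, as an optional step, fetches it from the local server. The helper names parse_zgroup and fetch_zgroup are hypothetical, and the sketch assumes the store was written in the Zarr v2 format.

```python
import json
from urllib.request import urlopen


def parse_zgroup(text: str) -> dict:
    """Parse the contents of a .zgroup file and check the expected field."""
    meta = json.loads(text)
    if "zarr_format" not in meta:
        raise ValueError("not a Zarr v2 group metadata document")
    return meta


def fetch_zgroup(store_url: str) -> dict:
    """Fetch <store_url>/.zgroup over HTTP (requires the server to be running)."""
    with urlopen(f"{store_url.rstrip('/')}/.zgroup") as response:
        return parse_zgroup(response.read().decode("utf-8"))


# Contents of a Zarr v2 group metadata file.
expected = parse_zgroup('{"zarr_format": 2}')
```

With http-server running, fetch_zgroup("http://localhost:9000/V1_Human_Lymph_Node.zarr") should return the same dictionary as expected.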
Now that we have processed the data and started the local web server, we need to write a Vitessce configuration which specifies:
- the data that we want to visualize,
- the visualization types of interest, and
- linking of visualization parameters across views ("view coordinations").
We begin with a skeleton of a view config which lacks any visualizations or datasets.
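A minimal sketch of such a skeleton, written as a Python dictionary that serializes to JSON. The top-level keys follow the Vitessce view config schema; the version string, name, and description are placeholder assumptions and should be adjusted to match the Vitessce release you are using.

```python
import json

# Skeleton view config: no datasets, no coordination, no views yet.
view_config = {
    "version": "1.0.0",  # assumed schema version
    "name": "My config",  # placeholder
    "description": "Visium lymph node example",  # placeholder
    "datasets": [],
    "coordinationSpace": {},
    "layout": [],
    "initStrategy": "auto",
}

# The JSON text is what the Vitessce web app consumes.
view_config_json = json.dumps(view_config, indent=2)
```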
We want to visualize cell-level observations in scatterplots displaying the PCA and UMAP dimensionality reductions (which we computed with Scanpy earlier).
Cell positions for scatterplots and spatial plots must be defined in a file that has the cells data type.
The file definition for the cells data should look like:
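A hedged sketch of such a file definition as a Python dictionary. The array paths obsm/X_umap and obsm/X_pca are assumptions based on the default keys that Scanpy uses for its UMAP and PCA results; check your own Zarr store for the actual paths. The URL matches the server address used earlier in this tutorial.

```python
import json

cells_file = {
    "type": "cells",
    "fileType": "anndata-cells.zarr",
    "url": "http://localhost:9000/V1_Human_Lymph_Node.zarr",
    "options": {
        "mappings": {
            # Display name -> array in the store, plus the two columns to plot.
            "UMAP": {"key": "obsm/X_umap", "dims": [0, 1]},  # assumed path
            "PCA": {"key": "obsm/X_pca", "dims": [0, 1]},  # assumed path
        }
    },
}

cells_file_json = json.dumps(cells_file, indent=2)
```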
The options part of the file definition above is specific to the anndata-cells.zarr file type, and the mappings object tells Vitessce how to map names of dimensionality reductions to arrays in the Zarr store. The strings "UMAP" and "PCA" in the above example are arbitrary names that we are choosing, and these strings will appear in the Vitessce user interface. The value for "key" must be a key to an array in the Zarr store, and the value for "dims" must be a tuple that specifies which dimensions of the dimensionality reduction array to use for the [X, Y] axes of the scatterplot.
Before defining a scatterplot component, we need to set up the embeddingType coordination object in the coordination space.
"Embedding type" is a coordination type because it allows us to coordinate which scatterplots should be displaying the same dimensionality reduction (e.g., PCA or UMAP).
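A minimal sketch of the coordination space with two scopes for the embeddingType coordination type. Which scope is assigned to which embedding is an arbitrary choice made here for illustration; the values must match the names used in the mappings of the cells file definition.

```python
# Each scope name maps to the embedding that views in that scope display.
coordination_space = {
    "embeddingType": {
        "ET1": "UMAP",
        "ET2": "PCA",
    }
}

scope_names = sorted(coordination_space["embeddingType"])
```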
ET1 and ET2 above are arbitrary names for the two coordination scopes.
Next, we can define the visualization components:
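As a sketch, two scatterplot views can be placed side by side, each bound to one embeddingType scope. The x, y, w, and h values are an assumed arrangement on a 12-column grid, and the scope names ET1 and ET2 are expected to exist in the coordination space.

```python
layout = [
    {
        "component": "scatterplot",
        "coordinationScopes": {"embeddingType": "ET1"},
        "x": 0, "y": 0, "w": 6, "h": 12,  # left half
    },
    {
        "component": "scatterplot",
        "coordinationScopes": {"embeddingType": "ET2"},
        "x": 6, "y": 0, "w": 6, "h": 12,  # right half
    },
]

# Scopes referenced by the views, in layout order.
used_scopes = [view["coordinationScopes"]["embeddingType"] for view in layout]
```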
Putting this all together:
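One way to assemble the complete configuration is sketched below. The function names build_view_config and undefined_scopes are hypothetical helpers, and the version string, dataset uid, display names, array paths, and grid positions are all assumptions rather than values confirmed by this tutorial. The second helper checks that every coordination scope referenced by a view is defined in the coordination space.

```python
import json


def build_view_config(store_url: str) -> dict:
    """Assemble a view config with two scatterplots (UMAP and PCA)."""
    return {
        "version": "1.0.0",  # assumed schema version
        "name": "Visium lymph node",  # placeholder
        "description": "PCA and UMAP scatterplots",  # placeholder
        "datasets": [{
            "uid": "visium",  # hypothetical dataset id
            "name": "V1_Human_Lymph_Node",
            "files": [{
                "type": "cells",
                "fileType": "anndata-cells.zarr",
                "url": store_url,
                "options": {"mappings": {
                    "UMAP": {"key": "obsm/X_umap", "dims": [0, 1]},  # assumed path
                    "PCA": {"key": "obsm/X_pca", "dims": [0, 1]},  # assumed path
                }},
            }],
        }],
        "coordinationSpace": {"embeddingType": {"ET1": "UMAP", "ET2": "PCA"}},
        "layout": [
            {"component": "scatterplot",
             "coordinationScopes": {"embeddingType": "ET1"},
             "x": 0, "y": 0, "w": 6, "h": 12},
            {"component": "scatterplot",
             "coordinationScopes": {"embeddingType": "ET2"},
             "x": 6, "y": 0, "w": 6, "h": 12},
        ],
        "initStrategy": "auto",
    }


def undefined_scopes(config: dict) -> list:
    """Return (coordination type, scope) pairs used by views but not defined."""
    missing = []
    for view in config["layout"]:
        for ctype, scope in view.get("coordinationScopes", {}).items():
            if scope not in config["coordinationSpace"].get(ctype, {}):
                missing.append((ctype, scope))
    return missing


config = build_view_config("http://localhost:9000/V1_Human_Lymph_Node.zarr")
config_json = json.dumps(config, indent=2)
```

The resulting JSON text can be saved to a file or pasted into the Vitessce web app while the local server is running.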