File Types and File Formats

This page describes, for each file type, the configuration options and the file formatting requirements.

The file types and file formats listed here are natively supported by Vitessce; that is, they can be loaded directly without a conversion step. To use other file formats with Vitessce, there are two options: convert the data to one or more supported formats, or develop a plugin file type.

tip

If you encounter any issues, please check out our data troubleshooting page before opening an issue.

info

The JSON file definition snippets found on this page would be specified as objects in the array datasets[].files[] in the JSON view configuration.
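As a rough illustration of where these snippets live, the sketch below appends a file definition to datasets[].files[] in a Python dictionary and serializes it. The helper name add_file is hypothetical, and the skeleton contains only the nesting described above; a complete view configuration needs further properties that are omitted here.

```python
import json


def add_file(config, dataset_index, file_def):
    """Append a file definition to datasets[dataset_index].files (hypothetical helper)."""
    config["datasets"][dataset_index].setdefault("files", []).append(file_def)
    return config


# Skeleton with only the nesting mentioned on this page; other properties
# of a full view configuration are intentionally left out.
config = {"datasets": [{"files": []}]}

add_file(config, 0, {
    "fileType": "obsFeatureMatrix.csv",
    "url": "https://example.com/my_expression_matrix.csv",
    "coordinationValues": {
        "obsType": "cell",
        "featureType": "gene",
        "featureValueType": "expression",
    },
})

print(json.dumps(config, indent=2))
```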

CSV

obsFeatureMatrix.csv

An observation-by-feature matrix stored in a CSV file. Rows represent observations, columns represent features. The first column stores the observation index (unique ID for each observation). For example, the file contents might look like:

cell_id  CD33  MYC
cell_1   15.1  0.0
cell_2   0.0   21.4
cell_3   0.0   0.0

...,
{
"fileType": "obsFeatureMatrix.csv",
"url": "https://example.com/my_expression_matrix.csv",
"coordinationValues": {
"obsType": "cell",
"featureType": "gene",
"featureValueType": "expression"
}
},
...
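One possible way to produce a file with this layout is sketched below using only the Python standard library. The function name matrix_to_csv and its parameters are illustrative, not part of Vitessce; the values are the ones from the example matrix above.

```python
import csv
import io


def matrix_to_csv(obs_ids, feature_ids, rows, index_name="cell_id"):
    """Serialize an observation-by-feature matrix with the obs index in the first column."""
    if len(set(obs_ids)) != len(obs_ids):
        raise ValueError("observation IDs must be unique")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header: index column name followed by one column per feature.
    writer.writerow([index_name, *feature_ids])
    for obs_id, values in zip(obs_ids, rows):
        if len(values) != len(feature_ids):
            raise ValueError(f"row for {obs_id} does not match the number of features")
        writer.writerow([obs_id, *values])
    return buffer.getvalue()


csv_text = matrix_to_csv(
    ["cell_1", "cell_2", "cell_3"],
    ["CD33", "MYC"],
    [[15.1, 0.0], [0.0, 21.4], [0.0, 0.0]],
)
print(csv_text)
```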

obsEmbedding.csv

A three-column (minimum; the file may contain extra columns) CSV file. One column stores the observation index (unique ID for each observation) and the other two store 2D embedding coordinates. The column names are configurable. For example, the file contents might look like:

cell_id  UMAP_1  UMAP_2
cell_1   1.5     2.7
cell_2   3.1     1.2
...      ...     ...

...,
{
"fileType": "obsEmbedding.csv",
"url": "https://example.com/my_umap.csv",
"coordinationValues": {
"obsType": "cell",
"embeddingType": "UMAP"
},
"options": {
// The column containing the observation index.
"obsIndex": "cell_id",
// The two columns containing the embedding coordinates.
"obsEmbedding": ["UMAP_1", "UMAP_2"]
}
},
...
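Because the column names are configurable, a mismatch between the options and the CSV header is an easy mistake to make. The following sketch checks that every column named in an options object appears in the header. The helper missing_columns is hypothetical and assumes that each option value is either a column name or a list of column names, as in the CSV file types that use this three-column layout.

```python
import csv
import io


def missing_columns(csv_text, options):
    """Return the column names referenced in options that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    wanted = []
    for value in options.values():
        # An option is either a single column name or a list of column names.
        wanted.extend(value if isinstance(value, list) else [value])
    return [name for name in wanted if name not in header]


csv_text = "cell_id,UMAP_1,UMAP_2\ncell_1,1.5,2.7\ncell_2,3.1,1.2\n"
options = {"obsIndex": "cell_id", "obsEmbedding": ["UMAP_1", "UMAP_2"]}

print(missing_columns(csv_text, options))
```

The same check applies unchanged to options that name "obsPoints", "obsSpots", or "obsLocations" columns.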

obsPoints.csv

A three-column (minimum; the file may contain extra columns) CSV file. One column stores the observation index (unique ID for each observation) and the other two store (x, y) spatial coordinates. The column names are configurable. For example, the file contents might look like:

cell_id  X    Y
cell_1   1.5  2.7
cell_2   3.1  1.2
...      ...  ...

...,
{
"fileType": "obsPoints.csv",
"url": "https://example.com/my_cell_coordinates.csv",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// The column containing the observation index.
"obsIndex": "cell_id",
// The two columns containing the (x, y) coordinates.
"obsPoints": ["X", "Y"]
}
},
...

obsSpots.csv

A three-column (minimum; the file may contain extra columns) CSV file. One column stores the observation index (unique ID for each observation) and the other two store (x, y) spatial coordinates. The column names are configurable. For example, the file contents might look like:

cell_id  X    Y
cell_1   1.5  2.7
cell_2   3.1  1.2
...      ...  ...

...,
{
"fileType": "obsSpots.csv",
"url": "https://example.com/my_cell_coordinates.csv",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// The column containing the observation index.
"obsIndex": "cell_id",
// The two columns containing the (x, y) coordinates.
"obsSpots": ["X", "Y"]
}
},
...

obsLocations.csv

A three-column (minimum; the file may contain extra columns) CSV file. One column stores the observation index (unique ID for each observation) and the other two store (x, y) spatial coordinates. The column names are configurable. For example, the file contents might look like:

cell_id  X    Y
cell_1   1.5  2.7
cell_2   3.1  1.2
...      ...  ...

...,
{
"fileType": "obsLocations.csv",
"url": "https://example.com/my_cell_coordinates.csv",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// The column containing the observation index.
"obsIndex": "cell_id",
// The two columns containing the (x, y) coordinates.
"obsLocations": ["X", "Y"]
}
},
...

obsSets.csv

Maps each observation to membership in one or more sets. Typically used to assign cells to cell type labels or cell cluster IDs. To allow multiple groups of sets to be specified, options.obsSets takes an array.

If a group of sets is organized as a flat list, then "column" points to a column containing string labels. Alternatively, if organized as a hierarchy, then "column" can point to an array of columns, progressing from coarser to finer labels.

For example, the file contents might look like:

cell_id  leiden  cell_type_coarse  cell_type_fine     pred_cell_type     pred_score
cell_1   1       Immune            B cell             B cell             0.81
cell_2   2       Immune            T cell             T cell             0.99
cell_3   2       Immune            T cell             Macrophage         0.21
cell_4   3       Neuron            Excitatory neuron  Inhibitory neuron  0.25
...      ...     ...               ...                ...                ...

...,
{
"fileType": "obsSets.csv",
"url": "https://example.com/my_cell_set_membership.csv",
"coordinationValues": {
"obsType": "cell"
},
"options": {
"obsIndex": "cell_id",
"obsSets": [
{
"name": "Leiden Clustering",
"column": "leiden"
},
{
"name": "Cell Type Annotations",
"column": ["cell_type_coarse", "cell_type_fine"]
},
{
"name": "Predicted Cell Types",
"column": "pred_cell_type",
"scoreColumn": "pred_score"
}
]
}
},
...
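To make the flat-versus-hierarchical distinction concrete, the sketch below groups observation IDs by one or more columns, ordered from coarser to finer. It only illustrates the idea; build_hierarchy is a hypothetical helper and is not how Vitessce represents sets internally.

```python
import csv
import io


def build_hierarchy(csv_text, obs_index, columns):
    """Nest observation IDs by the given columns, from coarsest to finest."""
    tree = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        node = tree
        # Walk (and create) one nesting level per column except the last.
        for column in columns[:-1]:
            node = node.setdefault(row[column], {})
        # The finest level holds the list of member observations.
        node.setdefault(row[columns[-1]], []).append(row[obs_index])
    return tree


csv_text = """cell_id,leiden,cell_type_coarse,cell_type_fine
cell_1,1,Immune,B cell
cell_2,2,Immune,T cell
cell_3,2,Immune,T cell
cell_4,3,Neuron,Excitatory neuron
"""

flat = build_hierarchy(csv_text, "cell_id", ["leiden"])
nested = build_hierarchy(csv_text, "cell_id", ["cell_type_coarse", "cell_type_fine"])
print(flat)
print(nested)
```

A single column yields one level of sets, while two columns yield coarse sets that each contain finer sets.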

obsLabels.csv

A two-column (minimum; the file may contain extra columns) CSV file. One column stores the observation index (unique ID for each observation) and the other stores string labels. The column names are configurable. For example, the file contents might look like:

cell_id  alt_cell_id
cell_1   ATGC
cell_2   GTTA
...      ...

...,
{
"fileType": "obsLabels.csv",
"url": "https://example.com/my_cell_barcodes.csv",
"coordinationValues": {
"obsType": "cell",
"obsLabelsType": "Alternate cell ID"
},
"options": {
// The column containing the observation index.
"obsIndex": "cell_id",
// The column containing the string labels.
"obsLabels": "alt_cell_id"
}
},
...

featureLabels.csv

A two-column (minimum; the file may contain extra columns) CSV file. One column stores the feature index (unique ID for each feature) and the other stores string labels. The column names are configurable. For example, the file contents might look like:

ensembl_gene_id  gene_symbol
ENSG00000105383  CD33
ENSG00000136997  MYC
...              ...

...,
{
"fileType": "featureLabels.csv",
"url": "https://example.com/my_gene_symbols.csv",
"coordinationValues": {
"featureType": "gene",
"featureLabelsType": "Gene symbol"
},
"options": {
// The column containing the feature index.
"featureIndex": "ensembl_gene_id",
// The column containing the string labels.
"featureLabels": "gene_symbol"
}
},
...

sampleSets.csv

Maps each sample to membership in one or more sets.

If a group of sets is organized as a flat list, then "column" points to a column containing string labels. Alternatively, if organized as a hierarchy, then "column" can point to an array of columns, progressing from coarser to finer labels.

For example, the file contents might look like:

donor_id  disease_state
donor_1   Healthy reference
donor_2   Diabetes
donor_3   Diabetes
donor_4   Healthy reference
...       ...

...,
{
"fileType": "sampleSets.csv",
"url": "https://example.com/my_sample_set_membership.csv",
"coordinationValues": {
"sampleType": "donor"
},
"options": {
"sampleIndex": "donor_id",
"sampleSets": [
{
"name": "Disease state",
"column": "disease_state"
}
]
}
},
...

AnnData-Zarr

While Zarr is an efficient format for storing multidimensional arrays, it does not dictate how multiple individual arrays are organized in a larger data structure. AnnData fills this gap by defining a data structure for observation-by-feature matrices and many types of associated metadata. This works nicely for the single-cell use case: think of cells as observations (rows). AnnData objects can be saved to Zarr format.

For single-cell data visualization, we typically use the following fields of the AnnData object:

  • X: the observation-by-feature (e.g., cell-by-gene) expression matrix, stored as a 2D array
  • obs: a DataFrame where the rows match the rows of X (same number and ordering of rows in obs as rows in X)
  • var: a DataFrame where the rows match the columns of X (same number and ordering of rows in var as columns in X)
  • obsm: a dictionary storing named arrays
    • keys are strings, with the convention to begin with the prefix X_ (e.g., X_umap to store an array of UMAP coordinates)
    • values are multidimensional arrays where the rows (i.e., elements of the zeroth dimension) match the rows of X
  • layers: a dictionary storing named arrays
    • keys are strings, with the convention to begin with the prefix X_
    • values are 2D arrays with the same shape as X

To learn more, visit the AnnData documentation.
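The alignment rules in the list above can be summarized in a small check. This is a sketch that uses plain Python lists as stand-ins for the arrays and DataFrame indices so that it runs without dependencies; it does not use the AnnData API, and the function names and the third UMAP row are illustrative.

```python
def shape(array):
    """Return (n_rows, n_cols) for a rectangular list of lists."""
    return (len(array), len(array[0]) if array else 0)


def check_alignment(X, obs_index, var_index, obsm=None, layers=None):
    """Return a list of alignment problems (empty when everything matches X)."""
    problems = []
    n_obs, n_var = shape(X)
    if len(obs_index) != n_obs:
        problems.append(f"obs has {len(obs_index)} rows but X has {n_obs} rows")
    if len(var_index) != n_var:
        problems.append(f"var has {len(var_index)} rows but X has {n_var} columns")
    # obsm arrays only need to match X along the zeroth dimension.
    for key, array in (obsm or {}).items():
        if len(array) != n_obs:
            problems.append(f"obsm[{key!r}] has {len(array)} rows but X has {n_obs} rows")
    # layers must have exactly the same shape as X.
    for key, array in (layers or {}).items():
        if shape(array) != (n_obs, n_var):
            problems.append(f"layers[{key!r}] has shape {shape(array)}, expected {(n_obs, n_var)}")
    return problems


X = [[15.1, 0.0], [0.0, 21.4], [0.0, 0.0]]
problems = check_alignment(
    X,
    obs_index=["cell_1", "cell_2", "cell_3"],
    var_index=["CD33", "MYC"],
    obsm={"X_umap": [[1.5, 2.7], [3.1, 1.2], [0.4, 0.9]]},
    layers={"X_uint8": [[180, 0], [0, 255], [0, 0]]},
)
print(problems or "all fields are aligned with X")
```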

obsFeatureMatrix.anndata.zarr

An observation-by-feature matrix with observations along the obs axis (rows) and features along the var axis (columns). Typically stored in adata.X, but the "path" option allows pointing to any array within the AnnData object.

...,
{
"fileType": "obsFeatureMatrix.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell",
"featureType": "gene",
"featureValueType": "expression"
},
"options": {
// Should point to the observation-by-feature matrix
"path": "X"
}
},
...

Data types

Currently, Vitessce internally normalizes data to uint8 (sometimes abbreviated u1), to improve performance. (In the future, we hope to add the ability to perform multiple types of normalization on-the-fly within Vitessce.) If you would like full control over the normalization procedure, we recommend using the layers feature of AnnData to store a copy of adata.X that has been pre-normalized and cast to uint8, while keeping adata.X with its original dtype:

from vitessce.data_utils import to_uint8

# ...
adata.layers['X_uint8'] = to_uint8(adata.X, norm_along="global")
# ...
...,
{
"fileType": "obsFeatureMatrix.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell",
"featureType": "gene",
"featureValueType": "expression"
},
"options": {
// Should point to the observation-by-feature matrix
"path": "layers/X_uint8"
}
},
...
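For intuition about what a global normalization to the uint8 range involves, here is a dependency-free sketch of min-max scaling to integers in [0, 255]. It is an assumption about the general technique and is not claimed to reproduce the behavior of to_uint8 from vitessce.data_utils; the function name minmax_to_uint8_range is hypothetical.

```python
def minmax_to_uint8_range(matrix):
    """Scale all values to integers in [0, 255] using the global min and max."""
    values = [value for row in matrix for value in row]
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant matrix carries no contrast; map everything to 0.
        return [[0 for _ in row] for row in matrix]
    scale = 255 / (highest - lowest)
    return [[round((value - lowest) * scale) for value in row] for row in matrix]


X = [[15.1, 0.0], [0.0, 21.4], [0.0, 0.0]]
X_scaled = minmax_to_uint8_range(X)
print(X_scaled)
```

Because the minimum and maximum are taken over the whole matrix rather than per feature, relative magnitudes between features are preserved.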

Sub-matrix

By default, rendering an observation-by-feature matrix in a heatmap requires fetching the entire matrix over the network, which can result in a large network request and a long initial load time. There are two ways to alleviate this issue when using the obsFeatureMatrix.anndata.zarr file type:

  • Load (and therefore transfer over the network) only a subset of the matrix initially ("initialFeatureFilterPath")
  • Store a smaller matrix in an obsm array, and load that smaller matrix ("featureFilterPath")

Initialization-only filtering

import scanpy as sc

# ...
sc.pp.highly_variable_genes(adata, n_top_genes=200)
# ...
...,
{
"fileType": "obsFeatureMatrix.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell",
"featureType": "gene",
"featureValueType": "expression"
},
"options": {
// Should point to the observation-by-feature matrix
"path": "X",
// If you would like to limit the amount of data loaded
// initially (specifically in the heatmap),
// then "initialFeatureFilterPath" should point to a boolean array
// that indicates which features to load initially.
"initialFeatureFilterPath": "var/highly_variable"
}
},
...

Always filtering

import scanpy as sc

# ...
sc.pp.highly_variable_genes(adata, n_top_genes=200)
adata.obsm['X_subset'] = adata[:, adata.var['highly_variable']].X
# ...
...,
{
"fileType": "obsFeatureMatrix.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell",
"featureType": "gene",
"featureValueType": "expression"
},
"options": {
// Should point to the observation-by-feature matrix
"path": "obsm/X_subset",
// If the matrix specified in "path" is a subset of X,
// then "featureFilterPath" must point to a boolean array
// that indicates which features are contained in the subsetted matrix.
"featureFilterPath": "var/highly_variable"
}
},
...
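The relationship between the boolean array and the subsetted matrix can be sketched without any dependencies: the mask has one entry per feature in var, and the subset keeps exactly the columns where the mask is true. The helper subset_features and the third feature name are illustrative only.

```python
def subset_features(X, var_index, mask):
    """Keep only the columns of X (and the feature IDs) where mask is True."""
    if len(var_index) != len(mask):
        raise ValueError("mask must have one entry per feature")
    kept_ids = [feature for feature, keep in zip(var_index, mask) if keep]
    kept_X = []
    for row in X:
        if len(row) != len(mask):
            raise ValueError("each row of X must have one value per feature")
        kept_X.append([value for value, keep in zip(row, mask) if keep])
    return kept_X, kept_ids


var_index = ["CD33", "MYC", "GENE_3"]  # "GENE_3" is a made-up feature name
highly_variable = [True, False, True]
X = [[15.1, 0.0, 3.0], [0.0, 21.4, 1.0]]

X_subset, subset_ids = subset_features(X, var_index, highly_variable)
print(subset_ids)
print(X_subset)
```

The number of true entries in the mask equals the number of columns in the subsetted matrix, which is why a subset matrix needs the mask to recover its feature names.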

obsEmbedding.anndata.zarr

A two-column array with entries along the obs axis. The two columns store 2D embedding coordinates. For example, the contents of adata.obsm['X_umap'] might look like:

array([[ 3.1402664 , -7.1668797 ],
[-3.105793 , -3.2035291 ],
[ 6.1815314 , 3.4141443 ],
...,
[ 6.922351 , -6.529349 ],
[ 4.714882 , -4.027811 ],
[ 0.75445884, -4.2975116 ]], dtype=float32)
...,
{
"fileType": "obsEmbedding.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell",
"embeddingType": "UMAP"
},
"options": {
// Should point to an array of (d1, d2) coordinate pairs, one coordinate pair per obs/cell.
"path": "obsm/X_umap",
// Dimension indices are optional. By default, [0, 1].
"dims": [0, 1]
}
},
...
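The "dims" option matters when the array has more than two columns, as is common for PCA. The sketch below shows the kind of column selection that the option describes; select_dims is a hypothetical helper and the numbers are made up.

```python
def select_dims(array, dims=(0, 1)):
    """Pick two columns of a multi-column array as (d1, d2) coordinate pairs."""
    first, second = dims
    return [(row[first], row[second]) for row in array]


# Two observations with six principal components each (illustrative values).
pca = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
]

print(select_dims(pca))           # first two components (the default)
print(select_dims(pca, (4, 5)))   # fifth and sixth components
```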

obsPoints.anndata.zarr

A two-column array with entries along the obs axis. The two columns store (x, y) spatial coordinates. For example, the contents of adata.obsm['X_spatial'] might look like:

array([[ 3.1402664 , -7.1668797 ],
[-3.105793 , -3.2035291 ],
[ 6.1815314 , 3.4141443 ],
...,
[ 6.922351 , -6.529349 ],
[ 4.714882 , -4.027811 ],
[ 0.75445884, -4.2975116 ]], dtype=float32)
...,
{
"fileType": "obsPoints.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "molecule"
},
"options": {
// Should point to an array of (x, y) coordinate pairs, one coordinate pair per obs/cell.
"path": "obsm/X_spatial"
}
},
...

obsSpots.anndata.zarr

A two-column array with entries along the obs axis. The two columns store (x, y) spatial coordinates. For example, the contents of adata.obsm['X_spatial'] might look like:

array([[ 3.1402664 , -7.1668797 ],
[-3.105793 , -3.2035291 ],
[ 6.1815314 , 3.4141443 ],
...,
[ 6.922351 , -6.529349 ],
[ 4.714882 , -4.027811 ],
[ 0.75445884, -4.2975116 ]], dtype=float32)
...,
{
"fileType": "obsSpots.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "bead"
},
"options": {
// Should point to an array of (x, y) coordinate pairs, one coordinate pair per obs/cell.
"path": "obsm/X_spatial"
}
},
...

obsSets.anndata.zarr

Maps each observation to membership in one or more sets. Typically used to assign cells to cell type labels or cell cluster IDs. To allow multiple groups of sets to be specified, options takes an array.

If a group of sets is organized as a flat list, then "path" points to a column containing string labels. Alternatively, if organized as a hierarchy, then "path" can point to an array of columns, progressing from coarser to finer labels.

For example, the contents of adata.obs might look like:

index   leiden  cell_type_coarse  cell_type_fine     pred_cell_type     pred_score
cell_1  1       Immune            B cell             B cell             0.81
cell_2  2       Immune            T cell             T cell             0.99
cell_3  2       Immune            T cell             Macrophage         0.21
cell_4  3       Neuron            Excitatory neuron  Inhibitory neuron  0.25
...     ...     ...               ...                ...                ...

...,
{
"fileType": "obsSets.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": [
{
"name": "Leiden Clustering",
"path": "obs/leiden"
},
{
"name": "Cell Type Annotations",
"path": ["obs/cell_type_coarse", "obs/cell_type_fine"]
},
{
"name": "Predicted Cell Types",
"path": "obs/pred_cell_type",
"scorePath": "obs/pred_score"
}
]
},
...
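The options here mirror those of obsSets.csv, with "path" and "scorePath" taking the place of "column" and "scoreColumn". As a sketch, the following converts one form into the other, assuming the CSV columns were stored as identically named columns of adata.obs. The helper csv_obs_sets_to_anndata is hypothetical.

```python
def csv_obs_sets_to_anndata(obs_sets, prefix="obs"):
    """Translate obsSets.csv-style entries into obsSets.anndata.zarr-style entries."""
    converted = []
    for group in obs_sets:
        column = group["column"]
        if isinstance(column, list):
            # Hierarchical group: one path per level, coarser to finer.
            path = [f"{prefix}/{name}" for name in column]
        else:
            path = f"{prefix}/{column}"
        entry = {"name": group["name"], "path": path}
        if "scoreColumn" in group:
            entry["scorePath"] = f"{prefix}/{group['scoreColumn']}"
        converted.append(entry)
    return converted


csv_style = [
    {"name": "Leiden Clustering", "column": "leiden"},
    {"name": "Cell Type Annotations", "column": ["cell_type_coarse", "cell_type_fine"]},
    {"name": "Predicted Cell Types", "column": "pred_cell_type", "scoreColumn": "pred_score"},
]

anndata_style = csv_obs_sets_to_anndata(csv_style)
print(anndata_style)
```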

obsSegmentations.anndata.zarr

...,
{
"fileType": "obsSegmentations.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// Should point to an array of polygon vertices, one polygon per obs/cell.
"path": "obsm/X_segmentations"
}
},
...

obsLabels.anndata.zarr

A column containing string labels along the obs axis. For example, the contents of adata.obs['alt_cell_id'] might look like:

index
cell_1 ATCGC
cell_2 TCGGC
cell_3 TTTCA
Name: alt_cell_id, dtype: object
...,
{
"fileType": "obsLabels.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell",
"obsLabelsType": "Alternate cell ID"
},
"options": {
// Should point to a string column
"path": "obs/alt_cell_id"
}
},
...

obsLocations.anndata.zarr

A two-column array with entries along the obs axis. The two columns store (x, y) spatial coordinates. For example, the contents of adata.obsm['X_spatial'] might look like:

array([[ 3.1402664 , -7.1668797 ],
[-3.105793 , -3.2035291 ],
[ 6.1815314 , 3.4141443 ],
...,
[ 6.922351 , -6.529349 ],
[ 4.714882 , -4.027811 ],
[ 0.75445884, -4.2975116 ]], dtype=float32)
...,
{
"fileType": "obsLocations.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// Should point to an array of (x, y) coordinate pairs, one coordinate pair per obs/cell.
"path": "obsm/X_spatial"
}
},
...

featureLabels.anndata.zarr

A column containing string labels along the var axis. For example, the contents of adata.var['gene_symbol'] might look like:

index
ENSG00000152128 TMEM163
ENSG00000153086 ACMSD
ENSG00000082258 CCNT2
ENSG00000176601 MAP3K19
ENSG00000115839 RAB3GAP1
Name: gene_symbol, dtype: object
...,
{
"fileType": "featureLabels.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"featureType": "gene",
"featureLabelsType": "Gene symbol"
},
"options": {
// Should point to a string column
"path": "var/gene_symbol"
}
},
...

sampleEdges.anndata.zarr

A column containing string labels along the obs axis, which maps observations to samples. For example, the contents of adata.obs['donor_id'] might look like:

index
cell_1 donor_1
cell_2 donor_1
cell_3 donor_2
Name: donor_id, dtype: object
...,
{
"fileType": "sampleEdges.anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell",
"sampleType": "donor"
},
"options": {
"path": "obs/donor_id"
}
},
...

anndata.zarr

Defines an AnnData object that has been written to a Zarr store. This is a joint file type.

...,
{
"fileType": "anndata.zarr",
"url": "https://example.com/my_adata.zarr",
"coordinationValues": {
"obsType": "cell",
"featureType": "gene",
"featureValueType": "expression"
},
"options": {
"obsPoints": {
// Accepts the same options as obsPoints.anndata.zarr
"path": "obsm/X_spatial"
},
"obsSpots": {
// Accepts the same options as obsSpots.anndata.zarr
"path": "obsm/X_spatial"
},
"obsSegmentations": {
// Accepts the same options as obsSegmentations.anndata.zarr
"path": "obsm/X_segmentations"
},
"obsLocations": {
// Accepts the same options as obsLocations.anndata.zarr
"path": "obsm/X_centroids"
},
"obsEmbedding": [
{
// Accepts a superset of the options from obsEmbedding.anndata.zarr
// Should point to an array of (d1, d2) coordinate pairs, one coordinate pair per obs/cell.
"path": "obsm/X_umap",
// An embeddingType must be specified to distinguish between multiple embedding arrays.
"embeddingType": "UMAP"
},
{
"path": "obsm/X_pca",
"dims": [4, 5],
"embeddingType": "PCA"
}
],
"obsLabels": [
{
// Accepts a superset of the options from obsLabels.anndata.zarr
"path": "obs/alt_cell_id",
// An obsLabelsType must be specified to distinguish between multiple label columns.
"obsLabelsType": "Alternate cell ID"
}
],
"obsSets": [
// Accepts the same options as obsSets.anndata.zarr
{
"name": "Cell Type Annotations",
"path": ["obs/cell_type_coarse", "obs/cell_type_fine"]
}
],
"obsFeatureMatrix": {
// Accepts the same options as obsFeatureMatrix.anndata.zarr
// Should point to the observation-by-feature matrix
"path": "X"
}
}
},
...

MuData-Zarr

MuData is the multi-modal analog of AnnData. A MuData object is a container data structure for multiple named AnnData objects. Like AnnData objects, MuData objects can be saved to Zarr format.

Note that MuData objects have both "global" and per-modality obs and var indices. In Vitessce, when loading an array from within a modality (i.e., with a path prefixed by mod/), the modality-specific indices will be used. In contrast, when loading a "global" array, the corresponding global indices will be used.
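The path convention can be sketched as two small helpers: one that prefixes an AnnData-style path with a modality, and one that reports whether a path falls inside a modality (and would therefore use the modality-specific indices, per the note above). Both function names are hypothetical.

```python
def modality_path(modality, path):
    """Prefix an AnnData-style path (e.g. "obsm/X_umap") with a MuData modality."""
    return f"mod/{modality}/{path.lstrip('/')}"


def is_modality_path(path):
    """Return True when the path points inside a modality (prefixed by mod/)."""
    parts = path.strip("/").split("/")
    return len(parts) >= 2 and parts[0] == "mod"


print(modality_path("rna", "obsm/X_umap"))
print(is_modality_path("mod/rna/obsm/X_umap"))
print(is_modality_path("obsm/X_umap"))
```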

obsFeatureMatrix.mudata.zarr

An observation-by-feature matrix with observations along the obs axis (rows) and features along the var axis (columns). For some rna modality, this would typically be stored in mdata.mod['rna'].X, but the "path" option allows pointing to any array within the MuData object.

The data types and sub-matrix information from obsFeatureMatrix.anndata.zarr apply.

...,
{
"fileType": "obsFeatureMatrix.mudata.zarr",
"url": "https://example.com/my_mdata.zarr",
"coordinationValues": {
"obsType": "cell",
"featureType": "gene",
"featureValueType": "expression"
},
"options": {
// Should point to an observation-by-feature matrix
"path": "mod/rna/X"
}
},
...

obsEmbedding.mudata.zarr

A two-column array with entries along the obs axis. The two columns store 2D embedding coordinates. For example, the contents of mdata.mod['rna'].obsm['X_umap'] might look like:

array([[ 3.1402664 , -7.1668797 ],
[-3.105793 , -3.2035291 ],
[ 6.1815314 , 3.4141443 ],
...,
[ 6.922351 , -6.529349 ],
[ 4.714882 , -4.027811 ],
[ 0.75445884, -4.2975116 ]], dtype=float32)
...,
{
"fileType": "obsEmbedding.mudata.zarr",
"url": "https://example.com/my_mdata.zarr",
"coordinationValues": {
"obsType": "cell",
"embeddingType": "UMAP"
},
"options": {
// Should point to an array of (d1, d2) coordinate pairs, one coordinate pair per obs/cell.
"path": "mod/rna/obsm/X_umap",
// Dimension indices are optional. By default, [0, 1].
"dims": [0, 1]
}
},
...

obsPoints.mudata.zarr

A two-column array with entries along the obs axis. The two columns store (x, y) spatial coordinates. For example, the contents of mdata.mod['rna'].obsm['X_spatial'] might look like:

array([[ 3.1402664 , -7.1668797 ],
[-3.105793 , -3.2035291 ],
[ 6.1815314 , 3.4141443 ],
...,
[ 6.922351 , -6.529349 ],
[ 4.714882 , -4.027811 ],
[ 0.75445884, -4.2975116 ]], dtype=float32)
...,
{
"fileType": "obsPoints.mudata.zarr",
"url": "https://example.com/my_mdata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// Should point to an array of (x, y) coordinate pairs, one coordinate pair per obs/cell.
"path": "mod/rna/obsm/X_spatial"
}
},
...

obsSpots.mudata.zarr

A two-column array with entries along the obs axis. The two columns store (x, y) spatial coordinates. For example, the contents of mdata.mod['rna'].obsm['X_spatial'] might look like:

array([[ 3.1402664 , -7.1668797 ],
[-3.105793 , -3.2035291 ],
[ 6.1815314 , 3.4141443 ],
...,
[ 6.922351 , -6.529349 ],
[ 4.714882 , -4.027811 ],
[ 0.75445884, -4.2975116 ]], dtype=float32)
...,
{
"fileType": "obsSpots.mudata.zarr",
"url": "https://example.com/my_mdata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// Should point to an array of (x, y) coordinate pairs, one coordinate pair per obs/cell.
"path": "mod/rna/obsm/X_spatial"
}
},
...

obsSets.mudata.zarr

Maps each observation to membership in one or more sets. Typically used to assign cells to cell type labels or cell cluster IDs. To allow multiple groups of sets to be specified, options takes an array.

If a group of sets is organized as a flat list, then "path" points to a column containing string labels. Alternatively, if organized as a hierarchy, then "path" can point to an array of columns, progressing from coarser to finer labels.

For example, the contents of mdata.mod['rna'].obs might look like:

index   leiden  cell_type_coarse  cell_type_fine     pred_cell_type     pred_score
cell_1  1       Immune            B cell             B cell             0.81
cell_2  2       Immune            T cell             T cell             0.99
cell_3  2       Immune            T cell             Macrophage         0.21
cell_4  3       Neuron            Excitatory neuron  Inhibitory neuron  0.25
...     ...     ...               ...                ...                ...

...,
{
"fileType": "obsSets.mudata.zarr",
"url": "https://example.com/my_mdata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": [
{
"name": "Leiden Clustering",
"path": "mod/rna/obs/leiden"
},
{
"name": "Cell Type Annotations",
"path": ["mod/rna/obs/cell_type_coarse", "mod/rna/obs/cell_type_fine"]
},
{
"name": "Predicted Cell Types",
"path": "mod/rna/obs/pred_cell_type",
"scorePath": "mod/rna/obs/pred_score"
}
]
},
...

obsSegmentations.mudata.zarr

...,
{
"fileType": "obsSegmentations.mudata.zarr",
"url": "https://example.com/my_mdata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// Should point to an array of polygon vertices, one polygon per obs/cell.
"path": "mod/rna/obsm/X_segmentations"
}
},
...

obsLabels.mudata.zarr

A column containing string labels along the obs axis. For example, the contents of mdata.mod['rna'].obs['alt_cell_id'] might look like:

index
cell_1 ATCGC
cell_2 TCGGC
cell_3 TTTCA
Name: alt_cell_id, dtype: object
...,
{
"fileType": "obsLabels.mudata.zarr",
"url": "https://example.com/my_mdata.zarr",
"coordinationValues": {
"obsType": "cell",
"obsLabelsType": "Alternate cell ID"
},
"options": {
// Should point to a string column
"path": "mod/rna/obs/alt_cell_id"
}
},
...

obsLocations.mudata.zarr

A two-column array with entries along the obs axis. The two columns store (x, y) spatial coordinates. For example, the contents of mdata.mod['rna'].obsm['X_spatial'] might look like:

array([[ 3.1402664 , -7.1668797 ],
[-3.105793 , -3.2035291 ],
[ 6.1815314 , 3.4141443 ],
...,
[ 6.922351 , -6.529349 ],
[ 4.714882 , -4.027811 ],
[ 0.75445884, -4.2975116 ]], dtype=float32)
...,
{
"fileType": "obsLocations.mudata.zarr",
"url": "https://example.com/my_mdata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// Should point to an array of (x, y) coordinate pairs, one coordinate pair per obs/cell.
"path": "mod/rna/obsm/X_spatial"
}
},
...

featureLabels.mudata.zarr

A column containing string labels along the var axis. For example, the contents of mdata.mod['rna'].var['gene_symbol'] might look like:

index
ENSG00000152128 TMEM163
ENSG00000153086 ACMSD
ENSG00000082258 CCNT2
ENSG00000176601 MAP3K19
ENSG00000115839 RAB3GAP1
Name: gene_symbol, dtype: object
...,
{
"fileType": "featureLabels.mudata.zarr",
"url": "https://example.com/my_mdata.zarr",
"coordinationValues": {
"featureType": "gene",
"featureLabelsType": "Gene symbol"
},
"options": {
// Should point to a string column
"path": "mod/rna/var/gene_symbol"
}
},
...

SpatialData

SpatialData is a data structure for spatial omics data. It uses a Zarr-based on-disk format which is a container for AnnData-Zarr tables and OME-Zarr images.

The on-disk format is relatively stable but still young, so we follow changes to the on-disk representation closely in order to support the most up-to-date version in Vitessce. For example, support for multiple tables is currently under discussion, coordinate transformations use a proposed format that is not yet incorporated into OME-NGFF, and channel metadata is temporarily using a non-NGFF property.

SpatialData defines several spatial elements: points, shapes, labels, images, and tables. These can be mapped to Vitessce data types:

SpatialData SpatialElement  Vitessce DataType
points                      obsPoints
shapes (circles)            obsSpots
shapes (polygons)           obsSegmentations
labels                      obsSegmentations
images                      image
tables                      obsFeatureMatrix, obsSets, obsLabels, obsEmbedding

obsSpots.spatialdata.zarr

...,
{
"fileType": "obsSpots.spatialdata.zarr",
"url": "https://example.com/my_sdata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// Should point to the table containing the circular shape coordinates.
"path": "shapes/some_region_shapes",
// Should point to the table which annotates the specified shapes. Optional.
"tablePath": "table/table",
// Region value to use for filtering the rows of the table.
"region": "some_region_shapes"
}
},
...

labels.spatialdata.zarr

SpatialData label images represent segmentation bitmasks for observations.

...,
{
"fileType": "labels.spatialdata.zarr",
"url": "https://example.com/my_sdata.zarr",
"coordinationValues": {
"fileUid": "cell-bitmask",
"obsType": "cell"
},
"options": {
// Should point to a label image.
"path": "labels/my_cell_bitmask"
}
},
...

image.spatialdata.zarr

...,
{
"fileType": "image.spatialdata.zarr",
"url": "https://example.com/my_sdata.zarr",
"coordinationValues": {
"fileUid": "histology-image"
},
"options": {
// Should point to an image.
"path": "images/some_image"
}
},
...

obsFeatureMatrix.spatialdata.zarr

...,
{
"fileType": "obsFeatureMatrix.spatialdata.zarr",
"url": "https://example.com/my_sdata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// Should point to a 2D array within the AnnData table object.
"path": "table/table/X"
}
},
...

obsSets.spatialdata.zarr

Maps each observation to membership in one or more sets. Typically used to assign cells to cell type labels or cell cluster IDs. To allow multiple groups of sets to be specified, options.obsSets takes an array.

If a group of sets is organized as a flat list, then "path" points to a column containing string labels. Alternatively, if organized as a hierarchy, then "path" can point to an array of columns, progressing from coarser to finer labels.

For example, the contents of sdata.table.obs might look like:

index   leiden  cell_type_coarse  cell_type_fine     pred_cell_type     pred_score
cell_1  1       Immune            B cell             B cell             0.81
cell_2  2       Immune            T cell             T cell             0.99
cell_3  2       Immune            T cell             Macrophage         0.21
cell_4  3       Neuron            Excitatory neuron  Inhibitory neuron  0.25
...     ...     ...               ...                ...                ...

...,
{
"fileType": "obsSets.spatialdata.zarr",
"url": "https://example.com/my_sdata.zarr",
"coordinationValues": {
"obsType": "cell"
},
"options": {
// Array of { name, path, ... } objects specifying columns containing set values.
"obsSets": [
{
"name": "Leiden Clustering",
"path": "table/table/obs/leiden"
},
{
"name": "Cell Type Annotations",
"path": ["table/table/obs/cell_type_coarse", "table/table/obs/cell_type_fine"]
},
{
"name": "Predicted Cell Types",
"path": "table/table/obs/pred_cell_type",
"scorePath": "table/table/obs/pred_score"
}
],
// Region value to filter the rows of the table. Optional.
"region": "some_region",
// Path to table containing the index that is aligned to the set columns. Optional.
"tablePath": "table/table"
}
},
...

JSON

obsSets.json

Stores sets of observations in a tree data structure. If this tree has a uniform height within each top-level group, then it may be more straightforward to use the obsSets.csv or obsSets.anndata.zarr file types. See the JSON schema and an example for reference.

...,
{
"fileType": "obsSets.json",
"url": "https://example.com/my_cell_sets.json",
"coordinationValues": {
"obsType": "cell"
}
},
...

obsSegmentations.json

Storage of per-observation segmentation polygons, where each polygon is represented as an array of vertices. File contents might look like:

{
"cell_1": [
[6668, 26182],
[6668, 26296],
[6873, 26501],
[6932, 26501],
[6955, 26478],
[6955, 26260],
[6838, 26143],
[6707, 26143]
],
"cell_2": [
[5047, 44428],
[5047, 44553],
[5065, 44571],
[5125, 44571],
[5284, 44412],
[5284, 44368],
[5239, 44323],
[5152, 44323]
],
...
}
...,
{
"fileType": "obsSegmentations.json",
"url": "https://example.com/my_cell_segmentations.json",
"coordinationValues": {
"obsType": "cell"
}
},
...
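Before serving such a file, it can help to confirm that it has the expected shape: an object that maps each observation ID to an array of [x, y] vertices. The check below is a sketch; validate_segmentations is a hypothetical helper, and the requirement of at least three vertices per polygon is our assumption rather than a documented rule.

```python
import json


def validate_segmentations(segmentations):
    """Return a list of problems found in an obs-ID-to-polygon mapping."""
    problems = []
    for obs_id, polygon in segmentations.items():
        # Assumption: a polygon needs at least three vertices.
        if len(polygon) < 3:
            problems.append(f"{obs_id}: fewer than 3 vertices")
        for vertex in polygon:
            is_pair = len(vertex) == 2
            is_numeric = all(isinstance(c, (int, float)) for c in vertex)
            if not (is_pair and is_numeric):
                problems.append(f"{obs_id}: invalid vertex {vertex!r}")
    return problems


segmentations = {
    "cell_1": [[6668, 26182], [6668, 26296], [6873, 26501], [6932, 26501]],
    "cell_2": [[5047, 44428], [5047, 44553], [5065, 44571], [5125, 44571]],
}

# Round-trip through JSON text, as the data would be stored on disk.
text = json.dumps(segmentations)
problems = validate_segmentations(json.loads(text))
print(problems or "segmentations look well-formed")
```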

obsSegmentations.raster.json

Points to one or more segmentation bitmasks in OME-TIFF format. See the options JSON schema for reference. Note that for this file type, the top-level "url" property is not required (URLs are specified for each image in options.images[].url instead).

...,
{
"fileType": "obsSegmentations.raster.json",
"coordinationValues": {
"obsType": "cell"
},
"options": {
"renderLayers": ["My OME-TIFF Mask"],
"schemaVersion": "0.0.2",
"images": [
{
"name": "My OME-TIFF Mask",
"url": "http://example.com/my_mask.ome.tif",
"type": "ome-tiff"
}
]
}
},
...

obsSegmentations.ome-tiff

Points to a label image ("bitmask") in OME-TIFF format. Pixel values are integers that correspond to segmented observations, with 0 representing background.

...,
{
"fileType": "obsSegmentations.ome-tiff",
"url": "https://example.com/my_cell_segmentations.ome.tif"
},
...
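To illustrate the pixel-value convention, the sketch below treats a tiny label image as a list of rows and recovers which observations it contains and how many pixels each one covers, skipping the background value 0. The helper names are hypothetical, and reading an actual OME-TIFF file is out of scope here.

```python
from collections import Counter


def observation_ids(label_image):
    """Return the sorted non-background pixel values found in a label image."""
    return sorted({value for row in label_image for value in row if value != 0})


def pixels_per_observation(label_image):
    """Count how many pixels belong to each segmented observation."""
    return Counter(value for row in label_image for value in row if value != 0)


label_image = [
    [0, 0, 1],
    [2, 2, 1],
    [0, 2, 0],
]

print(observation_ids(label_image))
print(dict(pixels_per_observation(label_image)))
```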

image.ome-tiff

Points to an image in OME-TIFF format.

If the OME-XML metadata contains PhysicalSizeX, PhysicalSizeXUnit, PhysicalSizeY, and PhysicalSizeYUnit, then the physical size will be used for scaling.

Optionally, options.coordinateTransformations can be defined, which will be interpreted according to the OME-NGFF v0.4 coordinateTransformations spec. The dimensions of these transformations must correspond to the DimensionOrder from OME-XML.

...,
{
"fileType": "image.ome-tiff",
"url": "https://example.com/my_image.ome.tif"
},
...
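To check whether a file carries the physical size attributes mentioned above, the OME-XML header can be inspected directly. This is a minimal sketch using only the Python standard library; the function name and the sample XML are made up for illustration, and it does not reproduce how Vitessce itself reads the metadata.

```python
import xml.etree.ElementTree as ET

SIZE_KEYS = (
    "PhysicalSizeX", "PhysicalSizeXUnit",
    "PhysicalSizeY", "PhysicalSizeYUnit",
)


def physical_pixel_size(ome_xml):
    """Return {"x": (size, unit), "y": (size, unit)} from the first Pixels
    element, or None if any of the four attributes is missing."""
    root = ET.fromstring(ome_xml)
    for element in root.iter():
        # Compare local names so any OME schema namespace version matches.
        if element.tag.rsplit("}", 1)[-1] == "Pixels":
            attrs = element.attrib
            if not all(key in attrs for key in SIZE_KEYS):
                return None
            return {
                "x": (float(attrs["PhysicalSizeX"]), attrs["PhysicalSizeXUnit"]),
                "y": (float(attrs["PhysicalSizeY"]), attrs["PhysicalSizeYUnit"]),
            }
    return None


# Made-up sample header for demonstration only.
SAMPLE = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0">
    <Pixels ID="Pixels:0" DimensionOrder="XYZCT" Type="uint16"
            SizeX="4" SizeY="4" SizeZ="1" SizeC="1" SizeT="1"
            PhysicalSizeX="0.5" PhysicalSizeXUnit="µm"
            PhysicalSizeY="0.25" PhysicalSizeYUnit="µm"/>
  </Image>
</OME>"""

if __name__ == "__main__":
    print(physical_pixel_size(SAMPLE))
```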

obsSegmentations.ome-zarr

Points to a label image ("bitmask") in OME-NGFF format. Pixel values are integers that correspond to segmented observations, with 0 representing background.

...,
{
"fileType": "obsSegmentations.ome-zarr",
"url": "https://example.com/my_cell_segmentations.ome.zarr"
},
...

image.ome-zarr

Points to an image in OME-NGFF format that has been saved to a Zarr store. See OME-NGFF data troubleshooting for more details.

Optionally, options.coordinateTransformations can be defined according to the OME-NGFF v0.4 coordinateTransformations spec. These will be applied after any coordinateTransformations contained within the Zarr store's OME-NGFF metadata.

...,
{
"fileType": "image.ome-zarr",
"url": "https://example.com/my_image.ome.zarr"
},
...

Example with coordinateTransformations provided via options:

...,
{
"fileType": "image.ome-zarr",
"url": "https://example.com/my_image.ome.zarr",
"options": {
"coordinateTransformations": [
{
"type": "translation",
"translation": [0, 0, 1, 1]
},
{
"type": "scale",
"scale": [1, 0.5, 0.5, 0.5]
}
]
}
},
...
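To see what such a list does to a coordinate, the sketch below applies the transformations one after another in list order, which is how the OME-NGFF v0.4 spec describes a coordinateTransformations list. The function name is hypothetical, only the "translation" and "scale" types are handled, and the sample point is arbitrary. It does not model how transformations from the Zarr store's own metadata are combined with those given in options.

```python
def apply_coordinate_transformations(point, transformations):
    """Apply 'translation' and 'scale' transformations to a point, in order."""
    coords = list(point)
    for transformation in transformations:
        kind = transformation["type"]
        if kind not in ("translation", "scale"):
            raise ValueError(f"unsupported transformation type: {kind}")
        values = transformation[kind]
        if len(values) != len(coords):
            raise ValueError("transformation and point dimensions differ")
        if kind == "translation":
            coords = [c + v for c, v in zip(coords, values)]
        else:
            coords = [c * v for c, v in zip(coords, values)]
    return coords


if __name__ == "__main__":
    transformations = [
        {"type": "translation", "translation": [0, 0, 1, 1]},
        {"type": "scale", "scale": [1, 0.5, 0.5, 0.5]},
    ]
    # Translation is applied first, then scale.
    print(apply_coordinate_transformations([0, 0, 10, 20], transformations))
```

Because the steps are applied sequentially, swapping the two entries generally produces a different result.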

image.raster.json

Points to one or more images in OME-TIFF or Bioformats-Zarr format. See the options JSON schema for reference. Note that for this file type, the top-level "url" property is not required (URLs are specified for each image in options.images[].url instead).

...,
{
"fileType": "image.raster.json",
"options": {
"renderLayers": ["My OME-TIFF Image"],
"schemaVersion": "0.0.2",
"images": [
{
"name": "My OME-TIFF Image",
"url": "http://example.com/my_image.ome.tif",
"type": "ome-tiff",
"metadata": {
"transform": {
// An optional transformation matrix
// in column-major order.
"matrix": [
0.81915098, -0.57357901, 0, 3264.76514684,
0.57357502, 0.819152, 0, 556.50440621,
0, 0, 1, 0,
0, 0, 0, 1
]
}
}
}
]
}
},
...

Example with a Zarr store:

...,
{
"fileType": "image.raster.json",
"options": {
"schemaVersion": "0.0.2",
"images": [
{
"name": "My Bioformats-Zarr Image",
"url": "http://example.com/my_image.zarr",
"type": "zarr",
"metadata": {
"dimensions": [
{
"field": "channel",
"type": "nominal",
"values": [
"DAPI - Hoechst (nuclei)",
"FITC - Laminin (basement membrane)",
"Cy3 - Synaptopodin (glomerular)",
"Cy5 - THP (thick limb)"
]
},
{
"field": "y",
"type": "quantitative",
"values": null
},
{
"field": "x",
"type": "quantitative",
"values": null
}
],
"isPyramid": true,
"transform": {
"translate": {
"y": 0,
"x": 0
},
"scale": 1
}
}
}
]
}
},
...

genomic-profiles.zarr

Points to a Zarr store containing cluster-level quantitative genomic profiles.

...,
{
"fileType": "genomic-profiles.zarr",
"url": "https://example.com/my_genomic_profiles.zarr"
},
...

Other File Formats

Other file formats must be converted to one or more of the file types listed above prior to being used with Vitessce. Here we provide tips for conversion from common single-cell file formats.

AnnData as h5ad

Convert to Zarr

Use AnnData's read_h5ad function to load the file as an AnnData object, then use the .write_zarr function to convert to a Zarr store.

from anndata import read_h5ad
import zarr

adata = read_h5ad('path/to/my_dataset.h5ad')
adata.write_zarr('my_store.zarr')

Converted outputs can be used with the AnnData as Zarr family of native file types.

note

The ids in the obs part of the AnnData store must match the ids used in any other data files that you wish to coordinate with it. For example, if you have a bitmask that you wish to use with an AnnData store, the ids in obs need to be the same integers that label each segmentation in the bitmask.
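A quick consistency check before conversion can catch mismatched ids early. The sketch below compares obs ids with the non-zero labels of a bitmask; the function name is hypothetical, and comparing ids as strings is an assumption made for this example.

```python
def unmatched_ids(obs_ids, label_values):
    """Report ids present on only one side.

    obs_ids: ids from the AnnData obs index.
    label_values: integer pixel labels found in the bitmask (0 = background).
    Both are compared as strings (an assumption for this sketch).
    """
    obs = {str(obs_id) for obs_id in obs_ids}
    labels = {str(value) for value in label_values if value != 0}
    return {
        "missing_from_bitmask": sorted(obs - labels),
        "missing_from_obs": sorted(labels - obs),
    }


if __name__ == "__main__":
    print(unmatched_ids(["1", "2", "cell_3"], [0, 1, 2, 4]))
```

Two empty lists indicate that every observation has a matching segmentation and vice versa.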

Use or Store a subset of X

When the full expression matrix adata.X is large, there may be performance costs if Vitessce tries to load the full matrix for visualization, whether it be a heatmap or just loading genes to overlay on a spatial or scatterplot view. To offset this there are two things you can do:

  1. Use CSC format or chunk the Zarr store efficiently (the latter is currently recommended; see below) so that the UI remains responsive when selecting a gene to load into the client. Every time a gene is selected (or the heatmap is loaded), the client uses Zarr to fetch all the "cell x gene" information needed for rendering. A poor chunking strategy can result in too much data being loaded (and then not used). To remedy this, we recommend passing the chunks argument to write_zarr so that remote clients (such as browsers) fetch only the genes (for all cells) needed for display. The chunk shape is usually something like [num_cells, small_number], so that every chunk contains all the cells but only a few genes. That way, when you select a gene, only a small chunk of data is fetched for rendering and little is wasted. Ideally, at most one small request is made per selection. You are welcome to try other chunking strategies as you see fit.
  2. If only interested in a subset of the expression matrix for a heatmap, a filter (matrixGeneFilter in the view config) for the matrix can be stored as a boolean array in var. In this case, it is the highly_variable key from the sc.pp.highly_variable_genes call below. This will not alter the genes displayed in the Genes view (use geneFilter for that in the view config).
import scanpy as sc
from anndata import read_h5ad
from scipy import sparse

zarr_path = 'my_store.zarr'
# Number of genes per chunk; should be something small like 10.
VAR_CHUNK_SIZE = 10

adata = read_h5ad('path/to/my_dataset.h5ad')

# Adds the `highly_variable` key to `var`
sc.pp.highly_variable_genes(adata, n_top_genes=200)
# If the matrix is sparse, it's best for performance to
# use non-sparse formats + chunking to keep the UI responsive.
# In the future, we should be able to use CSC sparse data natively
# and get equal performance with chunking:
# https://github.com/theislab/anndata/issues/524
# but for now, it is still not as good (although not unusable).
if sparse.issparse(adata.X):
    adata.X = adata.X.toarray() # Or adata.X.tocsc() if you need to.
# Each chunk contains all cells but only VAR_CHUNK_SIZE genes.
adata.write_zarr(zarr_path, chunks=[adata.shape[0], VAR_CHUNK_SIZE])
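The effect of the chunk shape can be estimated with simple arithmetic. The sketch below computes, for a dense two-dimensional "cell x gene" array, how many chunks must be requested to read one gene and how many uncompressed bytes that transfers. The function name, the 4-byte item size, and the example matrix shape are assumptions, and compression is ignored.

```python
import math


def fetch_cost_for_one_gene(shape, chunks, itemsize=4):
    """Return (num_requests, bytes_fetched, bytes_needed) for one gene column.

    shape:  (num_cells, num_genes) of the matrix.
    chunks: (cells_per_chunk, genes_per_chunk).
    Sizes are uncompressed and assume every chunk is stored at full size.
    """
    num_cells, _num_genes = shape
    chunk_rows, chunk_cols = chunks
    # One gene column intersects exactly one chunk per block of rows.
    num_requests = math.ceil(num_cells / chunk_rows)
    bytes_fetched = num_requests * chunk_rows * chunk_cols * itemsize
    bytes_needed = num_cells * itemsize
    return num_requests, bytes_fetched, bytes_needed


if __name__ == "__main__":
    shape = (100_000, 20_000)
    for chunks in [(100_000, 10), (1_000, 1_000)]:
        print(chunks, fetch_cost_for_one_gene(shape, chunks))
```

Under these assumptions, the [num_cells, 10] layout needs a single request of about 4 MB, while square 1000 x 1000 chunks need 100 requests totalling about 400 MB to obtain the same 400 KB of values.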

Alternatively, a smaller matrix can be stored as a multi-dimensional observation array in adata.obsm and used in conjunction with the geneFilter part of the view config.

sc.pp.highly_variable_genes(adata, n_top_genes=200)
adata.obsm['X_top_200_genes'] = adata[:, adata.var['highly_variable']].X.copy()
adata.write_zarr('my_store.zarr')

Converted outputs can be used with the AnnData as Zarr family of native file types. Both dense and sparse expression matrices are supported.

Loom

Convert to Zarr via AnnData

Use AnnData's read_loom function to load the Loom file as an AnnData object, then use the .write_zarr function to convert to a Zarr store.

from anndata import read_loom

adata = read_loom(
'path/to/my_dataset.loom',
obsm_names={ "tSNE": ["_tSNE_1", "_tSNE_2"], "spatial": ["X", "Y"] }
)
adata.write_zarr('my_store.zarr')

Converted outputs can be used with the AnnData as Zarr family of native file types.

Seurat

The Vitessce R package can be used to convert Seurat objects to the cells.json and cell-sets.json file types.

SnapATAC

The Vitessce Python package can be used to convert SnapATAC outputs to the genomic-profiles.zarr, cells.json, and cell-sets.json file types.

TIFF and Proprietary Image Formats

The Bio-Formats suite of tools can be used to convert from proprietary image formats to one of the open standard OME file formats supported by Vitessce.

Bio-Formats can also convert TIFF to OME-TIFF.

tip

The Data Preparation section of the Viv documentation is a helpful resource for learning about converting to OME formats.

Conversion to OME-TIFF

OME-TIFF images are supported via the image.ome-tiff file type.

Conversion to OME-NGFF

OME-NGFF images saved as Zarr stores are supported via the image.ome-zarr file type.