Supported File Types

This page describes the file types supported by Vitessce: those that can be loaded natively (by specifying file URLs in the view config) and those that can be converted to native file types in a straightforward way. Use the tabs on the page to toggle between code snippets for JSON- and JavaScript API-based view configs.

Native File Types

Native file types are those which Vitessce can read directly from a static web server via a loader class. We welcome pull requests which implement loader classes to support additional file types natively.

AnnData as Zarr

Once your AnnData object has been written to a Zarr store, columns and keys in the original object (such as adata.obs["leiden"] or adata.obsm["X_umap"]) become relative file paths such as obs/leiden and obsm/X_umap. In the options property for file definitions in the Vitessce view config, you must specify which columns and keys will be used for visualization using POSIX-style paths.
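The mapping from AnnData keys to store paths can be sketched in a few lines of Python. The helper name `anndata_key_to_zarr_path` is hypothetical (it is not part of Vitessce or AnnData); it only illustrates the POSIX-style joining described above.

```python
import posixpath


def anndata_key_to_zarr_path(attribute, key):
    """Join an AnnData attribute name and a key into a POSIX-style store path.

    Hypothetical helper for illustration; not part of Vitessce or AnnData.
    """
    return posixpath.join(attribute, key)


# adata.obs["leiden"] is addressed as "obs/leiden" in the options property.
leiden_path = anndata_key_to_zarr_path("obs", "leiden")
# adata.obsm["X_umap"] is addressed as "obsm/X_umap".
umap_path = anndata_key_to_zarr_path("obsm", "X_umap")
print(leiden_path, umap_path)
```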

note

The same Zarr store URL can be used for defining multiple files in the view config, for different data types and file types.
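As a rough illustration of this note, the sketch below builds three file definitions that share one store URL. The URL is the placeholder used throughout this page, and the option values are abbreviated from the snippets on this page, so treat them as assumptions rather than a complete config.

```python
import json

STORE_URL = "http://example.com/my_store.zarr"  # placeholder URL from this page

# Three file definitions that point at the same store but differ in
# data type, file type, and options (options abbreviated for illustration).
files = [
    {
        "url": STORE_URL,
        "type": "cells",
        "fileType": "anndata-cells.zarr",
        "options": {"factors": ["obs/leiden"]},
    },
    {
        "url": STORE_URL,
        "type": "cell-sets",
        "fileType": "anndata-cell-sets.zarr",
        "options": [{"groupName": "Leiden", "setName": "obs/leiden"}],
    },
    {
        "url": STORE_URL,
        "type": "expression-matrix",
        "fileType": "anndata-expression-matrix.zarr",
        "options": {"matrix": "X"},
    },
]

print(json.dumps(files, indent=2))
```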

caution

The Zarr file types are not yet compatible with the Vitessce R package because Zarr is not yet supported in R.

anndata-cells.zarr

  • View config file definition snippet:

    ...,
    {
      "url": "http://example.com/my_store.zarr",
      "type": "cells",
      "fileType": "anndata-cells.zarr",
      "options": {
        // XY values represent spatial centroids, so values point to an array of tuples, one per observation/cell.
        "xy": "obsm/centroids",
        // Polygon values represent spatial segmentations, so values point to an array of arrays, one per observation/cell.
        "poly": "obsm/polygons",
        // Mappings define coordinates for scatterplot points -
        // the original arrays may contain more than two dimensions per observation/cell,
        // so the dims property must slice these down to tuples.
        // This allows comparing the fifth and sixth principal components (zero-based indices 4 and 5), for example.
        // The key immediately under mappings must be used in the coordination scopes.
        "mappings": {
          "UMAP": {
            "key": "obsm/umap",
            "dims": [ 0, 1 ]
          },
          "PCA": {
            "key": "obsm/pca",
            "dims": [ 4, 5 ]
          }
        },
        // Factors define per-observation annotations, like clustering results, to display in the popover.
        "factors": [
          "obs/leiden"
        ]
      }
    },
    ...
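To make the role of dims concrete, here is a conceptual sketch in plain Python. It is not Vitessce's loader code; `select_dims` is a hypothetical name, the data are toy values, and the indices are assumed to be zero-based (as the UMAP entry's `[ 0, 1 ]` suggests).

```python
def select_dims(rows, dims):
    """Reduce each observation's coordinate row to the columns listed in dims."""
    return [tuple(row[i] for i in dims) for row in rows]


# Toy stand-in for obsm/pca: two cells, six principal components each.
pca = [
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
]

# "dims": [4, 5] keeps the columns at zero-based indices 4 and 5.
points = select_dims(pca, [4, 5])
print(points)  # [(0.4, 0.5), (1.4, 1.5)]
```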

anndata-cell-sets.zarr

  • View config file definition snippet:

    {
      "url": "http://example.com/my_store.zarr",
      "type": "cell-sets",
      "fileType": "anndata-cell-sets.zarr",
      // Options defines which columns contain cell sets (clustering results) in the cell sets component.
      // The groupName is the display name and the setName is the path within the Zarr store.
      "options": [
        {
          "groupName": "Leiden",
          "setName": "obs/leiden"
        },
        {
          "groupName": "Cell Type",
          "setName": "obs/cell_type"
        }
      ]
    }
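When several obs columns hold cell sets, the options array can be generated rather than written by hand. The sketch below assumes every set lives under obs/ in the store; `cell_set_options` is a hypothetical helper, not part of the Vitessce packages.

```python
def cell_set_options(columns):
    """Build the options array from a {display name: obs column} mapping."""
    return [
        {"groupName": group_name, "setName": f"obs/{column}"}
        for group_name, column in columns.items()
    ]


options = cell_set_options({"Leiden": "leiden", "Cell Type": "cell_type"})
print(options)
```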

anndata-expression-matrix.zarr

  • View config file definition snippet:

    {
      "url": "http://example.com/my_store.zarr",
      "type": "expression-matrix",
      "fileType": "anndata-expression-matrix.zarr",
      "options": {
        "matrix": "X"
      }
    }

    Note that the expression matrix file loader fetches the entire matrix and stores it in memory. If this causes performance issues, you may add a subset of the matrix to the Zarr store (see "Store a subset of X").

    The following snippet uses a Zarr store in which obsm/X_top_200_genes contains a 200-gene subset of X:

    {
      "url": "http://example.com/my_store.zarr",
      "type": "expression-matrix",
      "fileType": "anndata-expression-matrix.zarr",
      "options": {
        // Matrix provides the location of an
        // obs-by-var (cell-by-gene) matrix to load into memory.
        "matrix": "obsm/X_top_200_genes",
        // Genes filter is a boolean list which defines
        // the subset of genes contained in the matrix,
        // and must be defined if the matrix is a subset of AnnData.X
        "geneFilter": "var/highly_variable"
      }
    }
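The relationship between the subset matrix and the boolean gene filter can be sketched with toy data in plain Python. The gene names and values below are invented for illustration; the point is only that the number of True entries in the filter matches the number of columns in the subset, which lets each column be matched back to a gene.

```python
# Toy stand-ins for adata.var_names and adata.var["highly_variable"].
gene_names = ["GeneA", "GeneB", "GeneC", "GeneD"]
highly_variable = [True, False, True, False]

# Toy stand-in for adata.X: two cells by four genes.
full_matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]

# The subset matrix keeps only the columns where the filter is True.
subset_matrix = [
    [value for value, keep in zip(row, highly_variable) if keep]
    for row in full_matrix
]
# The same filter recovers the gene name for each remaining column.
subset_genes = [name for name, keep in zip(gene_names, highly_variable) if keep]

print(subset_genes)   # ['GeneA', 'GeneC']
print(subset_matrix)  # [[1, 3], [5, 7]]
```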

cells.json

  • JSON schema

  • JSON schema fixture

  • Example file

  • View config file definition snippet:

    {
      "url": "http://example.com/my_cells.json",
      "type": "cells",
      "fileType": "cells.json"
    }

cell-sets.json

  • JSON schema

  • JSON schema fixture

  • Example file

  • View config file definition snippet:

    {
      "url": "http://example.com/my_cell_sets.json",
      "type": "cell-sets",
      "fileType": "cell-sets.json"
    }

molecules.json

  • JSON schema

  • JSON schema fixture

  • Example file

  • View config file definition snippet:

    {
      "url": "http://example.com/my_molecules.json",
      "type": "molecules",
      "fileType": "molecules.json"
    }

genes.json

tip

The genes.json format produces large files. For large expression matrices, we recommend the more compact Zarr-based expression-matrix.zarr or anndata-expression-matrix.zarr formats.

  • JSON schema

  • JSON schema fixture

  • Example file

  • View config file definition snippet:

    {
      "url": "http://example.com/my_matrix_a.json",
      "type": "expression-matrix",
      "fileType": "genes.json"
    }

clusters.json

note

The name clusters.json is misleading; this file type is not intended to store clustering results (see cell-sets.json for storing clustering results). clusters.json is meant to store cell-by-gene expression matrices.

tip

The clusters.json format produces large files. For large expression matrices, we recommend the more compact Zarr-based expression-matrix.zarr or anndata-expression-matrix.zarr formats.

  • JSON schema

  • JSON schema fixture

  • Example file

  • View config file definition snippet:

    {
      "url": "http://example.com/my_matrix_b.json",
      "type": "expression-matrix",
      "fileType": "clusters.json"
    }

expression-matrix.zarr

  • View config file definition snippet:

    {
      "url": "http://example.com/my_matrix.zarr",
      "type": "expression-matrix",
      "fileType": "expression-matrix.zarr"
    }

raster.json

  • JSON schema for options

    note

    When defining image data with the raster.json file type, the main url property is not used. Instead, image URLs may be specified in the images array in the options property.

  • View config file definition snippet with an OME-TIFF:

    {
      "type": "raster",
      "fileType": "raster.json",
      "options": {
        "schemaVersion": "0.0.2",
        "images": [
          {
            "name": "My OME-TIFF Image",
            "url": "http://example.com/my_image.ome.tif",
            "type": "ome-tiff",
            "metadata": {
              "transform": {
                // An optional transformation matrix
                // in column-major order.
                "matrix": [
                  0.81915098, -0.57357901, 0, 3264.76514684,
                  0.57357502, 0.819152, 0, 556.50440621,
                  0, 0, 1, 0,
                  0, 0, 0, 1
                ]
              }
            }
          }
        ]
      }
    }

  • View config file definition snippet with a Zarr store:

    {
      "type": "raster",
      "fileType": "raster.json",
      "options": {
        "schemaVersion": "0.0.2",
        "images": [
          {
            "name": "My Bioformats-Zarr Image",
            "url": "http://example.com/my_image.zarr",
            "type": "zarr",
            "metadata": {
              "dimensions": [
                {
                  "field": "channel",
                  "type": "nominal",
                  "values": [
                    "DAPI - Hoechst (nuclei)",
                    "FITC - Laminin (basement membrane)",
                    "Cy3 - Synaptopodin (glomerular)",
                    "Cy5 - THP (thick limb)"
                  ]
                },
                {
                  "field": "y",
                  "type": "quantitative",
                  "values": null
                },
                {
                  "field": "x",
                  "type": "quantitative",
                  "values": null
                }
              ],
              "isPyramid": true,
              "transform": {
                "translate": {
                  "y": 0,
                  "x": 0
                },
                "scale": 1
              }
            }
          }
        ]
      }
    }
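Because raster.json definitions omit the top-level url and carry image URLs inside options, it can help to assemble them programmatically. The following is a minimal sketch; `raster_file_definition` is a hypothetical helper, and the check that every image has name, url, and type is an assumption based on the snippets above, not on the JSON schema itself.

```python
import json


def raster_file_definition(images, schema_version="0.0.2"):
    """Build a raster.json file definition; image URLs live under options.images.

    Hypothetical helper for illustration; not part of the Vitessce packages.
    """
    for image in images:
        # Assumed required keys, inferred from the snippets on this page.
        missing = {"name", "url", "type"} - set(image)
        if missing:
            raise ValueError(f"image is missing keys: {sorted(missing)}")
    return {
        "type": "raster",
        "fileType": "raster.json",
        "options": {"schemaVersion": schema_version, "images": list(images)},
    }


definition = raster_file_definition([
    {
        "name": "My OME-TIFF Image",
        "url": "http://example.com/my_image.ome.tif",
        "type": "ome-tiff",
    }
])
print(json.dumps(definition, indent=2))
```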

neighborhoods.json

  • JSON schema

  • JSON schema fixture

  • Example file

  • View config file definition snippet:

    {
      "url": "http://example.com/my_neighborhoods.json",
      "type": "neighborhoods",
      "fileType": "neighborhoods.json"
    }

genomic-profiles.zarr

  • View config file definition snippet:

    {
      "url": "http://example.com/my_genomic_profiles.zarr",
      "type": "genomic-profiles",
      "fileType": "genomic-profiles.zarr"
    }

Other File Types

Other file types must be converted to native file types prior to being used with Vitessce. Here we provide tips for conversion from common single-cell file types.

AnnData as h5ad

Convert to Zarr

Use AnnData's read_h5ad function to load the file as an AnnData object, then call the object's .write_zarr method to write it to a Zarr store.

from anndata import read_h5ad
adata = read_h5ad('path/to/my_dataset.h5ad')
adata.write_zarr('my_store.zarr')

Converted outputs can be used with the AnnData as Zarr family of native file types.

Store a subset of X

When the full expression matrix adata.X is large, there may be performance costs if Vitessce tries to load the full matrix for visualization. If you are only interested in a subset of the expression matrix, you can store a smaller matrix as a multi-dimensional observation array in adata.obsm.

import scanpy as sc
from anndata import read_h5ad
adata = read_h5ad('path/to/my_dataset.h5ad')
sc.pp.highly_variable_genes(adata, n_top_genes=200)
adata.obsm['X_top_200_genes'] = adata[:, adata.var['highly_variable']].X.copy()
adata.write_zarr('my_store.zarr')

Converted outputs can be used with the AnnData as Zarr family of native file types. Both dense and sparse expression matrices are supported.

Loom

Convert to Zarr via AnnData

Use AnnData's read_loom function to load the Loom file as an AnnData object, then call the object's .write_zarr method to write it to a Zarr store.

from anndata import read_loom
adata = read_loom(
    'path/to/my_dataset.loom',
    obsm_names={ "tSNE": ["_tSNE_1", "_tSNE_2"], "spatial": ["X", "Y"] }
)
adata.write_zarr('my_store.zarr')

Converted outputs can be used with the AnnData as Zarr family of native file types.

Seurat

The Vitessce R package can be used to convert Seurat objects to the cells.json and cell-sets.json file types.

SnapATAC

The Vitessce Python package can be used to convert SnapATAC outputs to the genomic-profiles.zarr, cells.json, and cell-sets.json file types.