SpaHDmap.data.prepare_stdata

SpaHDmap.data.prepare_stdata(section_name=None, st_path=None, image_path=None, adata=None, select_hvgs=True, scale_rate=1, radius=None, swap_coord=True, create_mask=True, image_type=None, color_norm=False, gene_list=None, **kwargs)

Prepare an STData object from various data sources, with a specific loading priority.

This function orchestrates the loading and preprocessing of spatial transcriptomics data to create a unified STData object. It can handle several input formats, including a pre-saved STData object, an AnnData object, 10x Visium data directories, or separate files for expression, coordinates, and imaging.

The function follows a specific priority for loading the gene expression data:

  • st_path: If provided, it will first attempt to load a serialized STData object.

  • adata: If st_path is not given or loading from it fails, it will use a provided AnnData object.

  • visium_path: If adata is not provided, it will look for a 10x Visium data directory.

  • spot_coord_path & spot_exp_path: If none of the above are available, it will load the data from separate coordinate and expression files.
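The priority order above can be sketched as a small resolver. This is only an illustration of the documented order, not SpaHDmap's implementation: the function name resolve_source, the returned labels, and the example paths are hypothetical, and the sketch does not model the fallback that occurs when loading from st_path fails.

```python
from typing import Any, Optional


def resolve_source(
    st_path: Optional[str] = None,
    adata: Optional[Any] = None,
    visium_path: Optional[str] = None,
    spot_coord_path: Optional[str] = None,
    spot_exp_path: Optional[str] = None,
) -> str:
    """Return a label for the source that wins under the documented priority.

    Hypothetical helper: it only inspects which arguments are set and
    never touches the file system.
    """
    if st_path is not None:
        return "st_path"
    if adata is not None:
        return "adata"
    if visium_path is not None:
        return "visium_path"
    # The last scheme needs both files, so both paths must be given.
    if spot_coord_path is not None and spot_exp_path is not None:
        return "spot_coord_path & spot_exp_path"
    raise ValueError("No usable data source was provided.")


# A Visium directory outranks separate files (paths are placeholders).
print(resolve_source(visium_path="data/visium", spot_exp_path="exp.h5"))
# visium_path
```

Because the checks run top to bottom, supplying several sources at once is harmless in this sketch: the highest-priority one is chosen and the rest are ignored.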

Internal processing steps include:

  • Data Reading: Loads data based on the priority scheme.

  • Gene Expression Processing: Normalizes and log-transforms the expression data. Optionally, it selects spatially variable genes (SVGs).

  • Image Processing: Reads the high-resolution image, creates a tissue mask, and can apply color normalization for H&E images.

  • Coordinate Handling: Adjusts spot coordinates based on the scale rate and can swap row/column coordinates if needed. The swap is usually required for 10x Visium data.
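To make the coordinate-handling step concrete, here is a minimal sketch in plain Python. It assumes that swapping simply exchanges the two coordinate columns and that scaling multiplies both by scale_rate; the helper adjust_coords is hypothetical, and the order of operations and any rounding inside SpaHDmap may differ.

```python
from typing import List, Tuple

Coord = Tuple[float, float]


def adjust_coords(
    coords: List[Coord],
    scale_rate: float = 1.0,
    swap_coord: bool = True,
) -> List[Coord]:
    """Optionally swap the two coordinate columns, then scale both by scale_rate.

    Hypothetical helper for illustration; not part of SpaHDmap.
    """
    adjusted = []
    for first, second in coords:
        if swap_coord:
            # Exchange the two columns (for example, x/y into row/column order).
            first, second = second, first
        adjusted.append((first * scale_rate, second * scale_rate))
    return adjusted


# Two made-up spot positions in full-resolution pixel units.
spots = [(100.0, 250.0), (400.0, 50.0)]
print(adjust_coords(spots, scale_rate=0.5, swap_coord=True))
# [(125.0, 50.0), (25.0, 200.0)]
```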

Parameters:
  • section_name (Optional[str]) – The name for the tissue section. Although the signature defaults to None, a value must be supplied.

  • st_path (Optional[str]) – Path to a saved .st file to load a pre-existing STData object.

  • image_path (Optional[str]) – Path to the high-resolution tissue image file. Required unless loading from st_path.

  • adata (Optional[AnnData]) – An AnnData object containing expression data and spatial coordinates.

  • select_hvgs (bool) – Whether to select highly variable genes (HVGs).

  • scale_rate (float) – The factor by which to scale the input image and coordinates. This is always interpreted relative to image_path. For example, if you want a target resolution of 0.5 um/px, image_path should point to the original full-resolution image and scale_rate should be computed against that image’s native microns-per-pixel value.

  • radius (Optional[float]) – The radius of the spots in the original, unscaled image. This is required when loading data from spot_coord_path and spot_exp_path.

  • swap_coord (bool) – Whether to swap the row and column coordinates.

  • create_mask (bool) – Whether to create a binary mask of the tissue from the image.

  • image_type (Optional[str]) – The type of imaging data, either ‘HE’ or ‘Immunofluorescence’. If None, it will be auto-detected.

  • color_norm (bool) – Whether to apply Reinhard color normalization. This is only applicable to H&E images.

  • gene_list (Optional[List[str]]) – A specific list of genes to use. If provided, select_hvgs is ignored.

  • **kwargs

    Additional keyword arguments for different loading schemes.

    • visium_path (str): Path to a 10x Visium data directory.

    • spot_coord_path (str): Path to the spot coordinates file (e.g., .csv).

    • spot_exp_path (str): Path to the gene expression file (e.g., .h5).
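As a worked example of the scale_rate guidance, the sketch below derives scale_rate from a native and a target resolution and assembles keyword arguments for the separate-files loading scheme. It assumes that scale_rate multiplies pixel dimensions, so scale_rate equals the native microns-per-pixel value divided by the target value. The helper compute_scale_rate, the 0.25 um/px native resolution, the section name, the file paths, and the radius are all hypothetical.

```python
def compute_scale_rate(native_mpp: float, target_mpp: float) -> float:
    """Return the factor that rescales an image from native_mpp to target_mpp.

    Assumption: scale_rate multiplies pixel dimensions, so a coarser target
    resolution (larger um/px) gives a factor below 1.
    """
    if native_mpp <= 0 or target_mpp <= 0:
        raise ValueError("Resolutions must be positive.")
    return native_mpp / target_mpp


# Hypothetical full-resolution image at 0.25 um/px, target of 0.5 um/px.
scale_rate = compute_scale_rate(native_mpp=0.25, target_mpp=0.5)
print(scale_rate)  # 0.5

# Keyword arguments for loading from separate coordinate and expression
# files. radius is given in pixels of the original, unscaled image.
call_kwargs = dict(
    section_name="section_A",                          # placeholder name
    image_path="data/section_A/image.tif",             # placeholder path
    scale_rate=scale_rate,
    radius=45,                                         # placeholder value
    spot_coord_path="data/section_A/spot_coords.csv",  # placeholder path
    spot_exp_path="data/section_A/expression.h5",      # placeholder path
)

# With SpaHDmap installed and the files in place, the call would be:
# stdata = SpaHDmap.data.prepare_stdata(**call_kwargs)
```

Note that image_path points at the original full-resolution image here, since scale_rate is always interpreted relative to that file.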

Returns:

A fully prepared STData object ready for analysis.

Return type:

STData