SpaHDmap.data.prepare_stdata
- SpaHDmap.data.prepare_stdata(section_name=None, st_path=None, image_path=None, adata=None, select_hvgs=True, scale_rate=1, radius=None, swap_coord=True, create_mask=True, image_type=None, color_norm=False, gene_list=None, **kwargs)
Prepare an STData object from various data sources, with a specific loading priority.
This function orchestrates the loading and preprocessing of spatial transcriptomics data to create a unified STData object. It can handle several input formats, including a pre-saved STData object, an AnnData object, 10x Visium data directories, or separate files for expression, coordinates, and imaging.
The function follows a specific priority for loading the gene expression data:
1. st_path: If provided, it will first attempt to load a serialized STData object.
2. adata: If st_path is not given or loading from it fails, it will use the provided AnnData object.
3. visium_path: If adata is not provided, it will look for a 10x Visium data directory.
4. spot_coord_path & spot_exp_path: If none of the above are available, it will load the data from separate coordinate and expression files.
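The loading priority above can be sketched as a plain dispatch function. This is a minimal illustration, not SpaHDmap's implementation: the name pick_source and the st_loaded flag (standing in for "the .st file was loaded successfully") are hypothetical, and the real function performs the loading itself rather than just reporting which source it would use.

```python
from typing import Optional


def pick_source(st_path: Optional[str] = None,
                adata: Optional[object] = None,
                visium_path: Optional[str] = None,
                spot_coord_path: Optional[str] = None,
                spot_exp_path: Optional[str] = None,
                st_loaded: bool = True) -> str:
    """Return the name of the first usable expression source (hypothetical helper).

    `st_loaded` is a stand-in for "loading the .st file succeeded"; when it is
    False, the function falls through to the next source, mirroring the
    documented fallback.
    """
    # 1. A serialized STData object takes precedence if it loads.
    if st_path is not None and st_loaded:
        return "st_path"
    # 2. Otherwise use an in-memory AnnData object.
    if adata is not None:
        return "adata"
    # 3. Otherwise look for a 10x Visium directory.
    if visium_path is not None:
        return "visium_path"
    # 4. Finally, separate coordinate and expression files (both are needed).
    if spot_coord_path is not None and spot_exp_path is not None:
        return "spot_coord_path & spot_exp_path"
    raise ValueError("No usable expression source was provided.")
```

For example, pick_source(st_path="a.st", adata=my_adata) selects "st_path", while the same call with st_loaded=False falls back to "adata".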
Internal processing steps include:
Data Reading: Loads data based on the priority scheme.
Gene Expression Processing: Normalizes and log-transforms the expression data. Optionally, it selects spatially variable genes (SVGs).
Image Processing: Reads the high-resolution image, creates a tissue mask, and can apply color normalization for H&E images.
Coordinate Handling: Scales the spot coordinates by scale_rate and, if needed, swaps the row and column coordinates; the swap is usually required for 10x Visium data.
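Three of these steps can be roughly sketched in NumPy, under stated assumptions rather than as SpaHDmap's actual code. All function names are hypothetical. normalize_log1p assumes per-spot library-size normalization to a target of 10,000 counts (a common Scanpy-style default, not confirmed for SpaHDmap). reinhard_transfer shows only the per-channel mean and standard-deviation matching at the core of Reinhard normalization; Reinhard's method applies it in a perceptual color space (lαβ, or CIELAB in many histology implementations), and that conversion is left to the caller here. adjust_coords assumes coordinates are multiplied by the same factor used to resize the image.

```python
import numpy as np


def normalize_log1p(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Scale each spot (row) to `target_sum` total counts, then apply log(1 + x)."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1  # leave empty spots at zero instead of dividing by zero
    return np.log1p(counts / totals * target_sum)


def reinhard_transfer(image: np.ndarray, target_mean, target_std) -> np.ndarray:
    """Match each channel's mean and standard deviation to target statistics.

    `image` is (H, W, C) and assumed to already be in the color space where
    the transfer should happen (e.g. CIELAB); no conversion is done here.
    """
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    std[std == 0] = 1.0  # constant channels stay constant
    out = (pixels - mean) / std * np.asarray(target_std) + np.asarray(target_mean)
    return out.reshape(image.shape)


def adjust_coords(coords: np.ndarray, factor: float, swap: bool) -> np.ndarray:
    """Rescale (N, 2) spot coordinates and optionally swap their two columns.

    Assumes coordinates are multiplied by the image resize factor; if
    scale_rate is used as a divisor instead, pass 1 / scale_rate.
    """
    out = np.asarray(coords, dtype=np.float64) * factor
    return out[:, ::-1] if swap else out
```

The three helpers are kept independent so each step can be checked in isolation; in the real pipeline they run on the data loaded by the priority scheme above.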
- Parameters:
section_name (Optional[str]) – The name of the tissue section. Although it defaults to None, this parameter is required.
st_path (Optional[str]) – Path to a saved .st file from which to load a pre-existing STData object.
image_path (Optional[str]) – Path to the high-resolution tissue image file. Required unless loading from st_path.
adata (Optional[AnnData]) – An AnnData object containing expression data and spatial coordinates.
select_hvgs (bool) – Whether to select highly variable genes (HVGs).
scale_rate (float) – The factor by which to scale the input image and coordinates. It is always interpreted relative to image_path. For example, for a target resolution of 0.5 um/px, image_path should point to the original full-resolution image, and scale_rate should be computed from that image’s native microns-per-pixel value.
radius (Optional[float]) – The radius of the spots in the original, unscaled image. Required when loading data from spot_coord_path and spot_exp_path.
swap_coord (bool) – Whether to swap the row and column coordinates.
create_mask (bool) – Whether to create a binary tissue mask from the image.
image_type (Optional[str]) – The type of imaging data, either ‘HE’ or ‘Immunofluorescence’. If None, it is auto-detected.
color_norm (bool) – Whether to apply Reinhard color normalization. Only applicable to H&E images.
gene_list (Optional[List[str]]) – A specific list of genes to use. If provided, select_hvgs is ignored.
**kwargs – Additional keyword arguments for the different loading schemes:
visium_path (str): Path to a 10x Visium data directory.
spot_coord_path (str): Path to the spot coordinates file (e.g., .csv).
spot_exp_path (str): Path to the gene expression file (e.g., .h5).
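The scale_rate and radius descriptions imply a small calculation against the native resolution of image_path. The sketch below works it through with hypothetical numbers: a native resolution of 0.25 um/px (an assumed value) and a Visium spot, whose diameter is 55 um, so its radius of 27.5 um spans 110 native pixels. The helper names are hypothetical. Note an important assumption: resolution_ratio returns how many native pixels fold into one output pixel (2.0 here). If prepare_stdata treats scale_rate as a downsampling divisor, that is the value to pass; if it multiplies the image size, pass the reciprocal (0.5). The parameter description does not settle which, so check it against the source before relying on either.

```python
def resolution_ratio(native_um_per_px: float, target_um_per_px: float) -> float:
    """Number of native pixels spanned by one target pixel along each axis."""
    if native_um_per_px <= 0 or target_um_per_px <= 0:
        raise ValueError("Resolutions must be positive.")
    return target_um_per_px / native_um_per_px


def scaled_radius(radius_native_px: float, ratio: float) -> float:
    """Convert a spot radius from native pixels to target pixels."""
    return radius_native_px / ratio


# Hypothetical example: 0.25 um/px native image, 0.5 um/px target.
ratio = resolution_ratio(0.25, 0.5)   # 2.0 native pixels per target pixel
radius = scaled_radius(110.0, ratio)  # 27.5 um spot radius -> 55.0 target pixels
```

Because radius is documented in the original, unscaled image, only the ratio needs to change when a different target resolution is chosen.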
- Returns:
A fully prepared STData object ready for analysis.
- Return type: