medigan package

Submodules

medigan.config_manager module

Config manager class that downloads, ingests, parses, and prepares the config information for all models.

class medigan.config_manager.ConfigManager(config_dict: Optional[dict] = None, is_new_download_forced: bool = False)[source]

Bases: object

ConfigManager class: Downloads, loads, and parses medigan’s config JSON as a dictionary.

Parameters

config_dict (dict) – Optionally provides the config dictionary if already loaded and parsed in a previous process.
is_new_download_forced (bool) – Flags, if True, that a new config file should be downloaded from the config link instead of parsing an existing file.

config_dict

Optionally provides the config dictionary if already loaded and parsed in a previous process.

Type: dict

is_new_download_forced

Flags, if True, that a new config file should be downloaded from the config link instead of parsing an existing file.

Type: bool

model_ids

Lists the unique ids of the generative models specified in the config_dict

Type: list

is_config_loaded

Flags if the loading and parsing of the config file was successful (True) or not (False).

Type: bool

add_model_to_config(model_id: str, metadata: dict, is_local_model: bool = True, overwrite_existing_metadata: bool = False, store_new_config: bool = True) → bool[source]

Adds or updates a model entry in the global metadata.

Parameters

model_id (str) – The generative model’s unique id
metadata (dict) – The model’s corresponding metadata
is_local_model (bool) – flag indicating whether the tested model is a new local user model, i.e. not yet part of medigan’s official models
overwrite_existing_metadata (bool) – in case of is_local_model, flag indicating whether existing metadata for this model in medigan’s config/global.json should be overwritten.
store_new_config (bool) – flag indicating whether the current model metadata should be stored on disk i.e. in config/

Returns

Flag indicating whether model metadata update was successfully concluded

Return type

bool
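
The interplay of overwrite_existing_metadata and the returned flag can be sketched as follows. This is a minimal illustration on a plain dictionary, not medigan’s implementation: the helper name add_or_update_entry is hypothetical, and validation and storing to disk are left out.

```python
def add_or_update_entry(
    config: dict,
    model_id: str,
    metadata: dict,
    overwrite_existing_metadata: bool = False,
) -> bool:
    """Add `metadata` under `model_id` (hypothetical helper).

    An existing entry is only replaced if overwriting is explicitly allowed.
    """
    if model_id in config and not overwrite_existing_metadata:
        return False  # existing entry is kept untouched
    config[model_id] = metadata
    return True


config = {"existing_model": {"execution": {}}}
was_added = add_or_update_entry(config, "new_model", {"execution": {}})
was_replaced = add_or_update_entry(config, "existing_model", {"selection": {}})
print(was_added, was_replaced)
```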

get_config_by_id(model_id: str, config_key: Optional[str] = None) → dict[source]

From config_manager, get and return the part of the config below a config_key for a specific model_id.

The key param can contain ‘.’ (dot) separations to allow for retrieval of nested config keys such as ‘execution.generator.name’

Parameters

model_id (str) – The generative model’s unique id
config_key (str) – A key of interest present in the config dict

Returns

a dictionary from the part of the config file corresponding to model_id and config_key.

Return type

dict
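
The dot-separated lookup described above can be sketched on a plain dictionary. This is an illustration only, not medigan’s code: the helper name get_nested and the sample config values are hypothetical.

```python
from functools import reduce
from typing import Any, Optional


def get_nested(config: dict, config_key: Optional[str] = None) -> Any:
    """Return the part of `config` below a dot-separated key (hypothetical helper)."""
    if config_key is None:
        return config  # no key given: return the whole model config
    # Walk one level deeper for every dot-separated part of the key.
    return reduce(lambda node, part: node[part], config_key.split("."), config)


# Hypothetical model config using the nesting from the example key above.
model_config = {"execution": {"generator": {"name": "my_weights"}}}
generator_name = get_nested(model_config, "execution.generator.name")
print(generator_name)
```

A missing key raises a KeyError in this sketch; how medigan itself reports a missing key is not covered here.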

is_model_in_config(model_id: str) → bool[source]

Checks whether a model_id is present in the global model metadata file

Parameters: model_id (str) – The generative model’s unique id
Returns: Flag indicating whether a model_id is present in global model metadata
Return type: bool

is_model_metadata_valid(model_id: str, metadata: dict, is_local_model: bool = True) → bool[source]

Checks whether a model’s corresponding metadata is valid.

Specific fields in the model’s metadata are mandatory. This method asserts that these key-value pairs are present.

Parameters

model_id (str) – The generative model’s unique id
metadata (dict) – The model’s corresponding metadata
is_local_model (bool) – flag indicating whether the tested model is a new local user model, i.e. not yet part of medigan’s official models

Returns

Flag indicating whether the specific model’s metadata format and fields are valid

Return type

bool
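
A check for mandatory fields could look like the sketch below. This section does not list which fields are mandatory, so the key set used here is an assumption chosen for illustration (three top-level keys that appear among the constants), and the helper name has_mandatory_keys is hypothetical.

```python
from typing import Iterable

# Assumption: these top-level keys are treated as mandatory for illustration only.
ASSUMED_MANDATORY_KEYS = ("execution", "selection", "description")


def has_mandatory_keys(
    metadata: dict,
    mandatory_keys: Iterable[str] = ASSUMED_MANDATORY_KEYS,
) -> bool:
    """Return True only if every assumed mandatory key is present in `metadata`."""
    missing = [key for key in mandatory_keys if key not in metadata]
    return not missing


complete = {"execution": {}, "selection": {}, "description": {}}
incomplete = {"execution": {}}
print(has_mandatory_keys(complete), has_mandatory_keys(incomplete))
```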

load_config_file(is_new_download_forced: bool = False) → bool[source]

Load a config file and return boolean flag indicating success of loading process.

If the config file is not present in medigan.CONSTANTS.CONFIG_FILE_FOLDER, it is downloaded by default from the web resource specified in medigan.CONSTANTS.CONFIG_FILE_URL.

Parameters: is_new_download_forced (bool) – Forces new download of config file even if the file has been downloaded before.
Returns: a boolean flag that is True only if the config file was loaded successfully.
Return type: bool
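
The load-or-download behaviour can be sketched as follows. This is not medigan’s implementation: the function names load_config and fetch_text are hypothetical, the sketch returns the parsed dictionary instead of a boolean flag, and the download step is passed in as a parameter so that it can be replaced. The constant values are copied from medigan.constants as listed in this reference.

```python
import json
from pathlib import Path
from typing import Callable
from urllib.request import urlopen

# Values as listed for medigan.constants in this reference.
CONFIG_FILE_FOLDER = "config"
CONFIG_FILE_NAME_AND_EXTENSION = "global.json"
CONFIG_FILE_URL = (
    "https://raw.githubusercontent.com/RichardObi/medigan/main/config/global.json"
)


def fetch_text(url: str) -> str:
    """Download a text resource; only called when a download is needed."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def load_config(
    folder: str = CONFIG_FILE_FOLDER,
    is_new_download_forced: bool = False,
    fetch: Callable[[str], str] = fetch_text,
) -> dict:
    """Parse the local config file, downloading it first if missing or forced."""
    path = Path(folder) / CONFIG_FILE_NAME_AND_EXTENSION
    if is_new_download_forced or not path.is_file():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(fetch(CONFIG_FILE_URL), encoding="utf-8")
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling load_config() twice triggers at most one download in this sketch, unless is_new_download_forced is set to True.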

match_model_id(provided_model_id: str) → bool[source]

Replaces a model_id abbreviation (e.g. 00005 or 5) with the unique model_id present in the model metadata

Parameters: provided_model_id (str) – The user-provided model_id that might be shorter (e.g. “00005” or “5”) than the real unique model id
Returns: If matched, returns the unique model_id present in global model metadata.
Return type: str
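
One way to resolve an abbreviated id is sketched below. The matching rule is an assumption made for illustration (full ids are taken to start with a zero-padded five-digit number followed by an underscore); the helper name match_abbreviated_id and the example ids are hypothetical, and medigan’s own matching logic may differ.

```python
from typing import List, Optional


def match_abbreviated_id(provided_model_id: str, model_ids: List[str]) -> Optional[str]:
    """Map a short id such as "5" or "00005" to a full model id (illustrative only)."""
    if provided_model_id in model_ids:
        return provided_model_id  # already a full, unique id
    if not provided_model_id.isdigit():
        return None  # only numeric abbreviations are handled in this sketch
    # Assumption: full ids begin with a zero-padded five-digit number.
    prefix = provided_model_id.zfill(5)
    matches = [m for m in model_ids if m.split("_")[0] == prefix]
    # Only an unambiguous match is returned.
    return matches[0] if len(matches) == 1 else None


known_ids = ["00005_EXAMPLE_MODEL", "00012_OTHER_MODEL"]  # hypothetical ids
print(match_abbreviated_id("5", known_ids))
```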

medigan.constants module

Global constants of the medigan library

medigan.constants.CONFIG_FILE_FOLDER = 'config': Folder path that will be created to locally store the config file.

medigan.constants.CONFIG_FILE_KEY_DEPENDENCIES = 'dependencies': Below the execution dict, the key under which the dependencies dictionary of a model is nested in the config file.

medigan.constants.CONFIG_FILE_KEY_DESCRIPTION = 'description': The key under which the description dictionary of a model is nested in the config file.

medigan.constants.CONFIG_FILE_KEY_EXECUTION = 'execution': The key under which the execution dictionary of a model is nested in the config file.

medigan.constants.CONFIG_FILE_KEY_GENERATE = 'generate_method': Below the execution dict, the key under which a nested dict with info on the model’s generate() function is present.

medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS = 'args': Below the execution dict, the key under which a nested dict with info on the arguments of a model’s generate() function is present.

medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_BASE = 'base': Below the execution dict, the key under which an array of mandatory base arguments of any model’s generate() function is present.

medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_CUSTOM = 'custom': Below the execution dict, the key under which a nested dict of key-value pairs of model-specific custom arguments of a model’s generate() function is present.

medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_INPUT_LATENT_VECTOR_SIZE = 'input_latent_vector_size': Below the execution dict, the key under which the random input_latent_vector_size argument value of a model’s generate() function is present.

medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_MODEL_FILE = 'model_file': Below the execution dict, the key under which the model_file argument value of any model’s generate() function is present.

medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_NUM_SAMPLES = 'num_samples': Below the execution dict, the key under which the num_samples argument value of any model’s generate() function is present.

medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_OUTPUT_PATH = 'output_path': Below the execution dict, the key under which the output_path argument value of any model’s generate() function is present.

medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_SAVE_IMAGES = 'save_images': Below the execution dict, the key under which the save images boolean flag argument value of any model’s generate() function is present.

medigan.constants.CONFIG_FILE_KEY_GENERATE_NAME = 'name': Below the execution dict, the key under which the exact name of a model’s generate() function is present.

medigan.constants.CONFIG_FILE_KEY_GENERATOR = 'generator': Below the execution dict, the key under which a model’s generator is present in the config file.

medigan.constants.CONFIG_FILE_KEY_GENERATOR_NAME = 'name': Below the execution dict, the key under which the name of a model’s generator is present in the config file.

medigan.constants.CONFIG_FILE_KEY_IMAGE_SIZE = 'image_size': Below the execution dict, the key under which a model’s image_size is present in the config file.

medigan.constants.CONFIG_FILE_KEY_MODEL_EXTENSION = 'extension': Below the execution dict, the key under which the extension of a model is present in the config file.

medigan.constants.CONFIG_FILE_KEY_MODEL_NAME = 'model_name': Below the execution dict, the key under which a model’s name is present in the config file. This is the name of the weights file!

medigan.constants.CONFIG_FILE_KEY_PACKAGE_LINK = 'package_link': Below the execution dict, the key under which the package link of a model is present in the config file. Note: The model packages are by convention stored on Zenodo, where they receive a static DOI. Because content on Zenodo is static and non-modifiable, this avoids security issues. Zenodo also helps to maintain clarity about who the owners and contributors of each generative model (and its IP) in medigan are.

medigan.constants.CONFIG_FILE_KEY_PACKAGE_NAME = 'package_name': Below the execution dict, the key under which the package_name of a model is present in the config file.

medigan.constants.CONFIG_FILE_KEY_PERFORMANCE = 'performance': Below the selection dict, the key under which the performance dictionary of a model is nested in the config file.

medigan.constants.CONFIG_FILE_KEY_SELECTION = 'selection': The key under which the selection dictionary of a model is nested in the config file.

medigan.constants.CONFIG_FILE_KEY_TAGS = 'tags': Below the selection dict, the key under which the tags (list of strings) are present.

medigan.constants.CONFIG_FILE_NAME_AND_EXTENSION = 'global.json': Name and extension of the config file.

medigan.constants.CONFIG_FILE_URL = 'https://raw.githubusercontent.com/RichardObi/medigan/main/config/global.json': Static link to the config of medigan. Note: To add a model, please create a pull request in this GitHub repo.

medigan.constants.CONFIG_TEMPLATE_FILE_NAME_AND_EXTENSION = 'template.json': Name and extension of the template of the config file.

medigan.constants.CONFIG_TEMPLATE_FILE_URL = 'https://raw.githubusercontent.com/RichardObi/medigan/main/templates/template.json': Download link to the template.json file.

medigan.constants.DEFAULT_OUTPUT_FOLDER = 'output': The default path to a folder under which the outputs of the medigan package (i.e. generated samples) are stored.

medigan.constants.GITHUB_REPO = 'RichardObi/medigan': The repository of the GitHub issue when adding a model to medigan

medigan.constants.GITHUB_TITLE = 'Model Integration Request for medigan': The title of the GitHub issue when adding a model to medigan

medigan.constants.INIT_PY_FILE = '__init__.py': The folder containing an __init__.py file is a python module.

medigan.constants.MODEL_FOLDER = 'models'

medigan.constants.MODEL_ID = 'model_id': The string describing a model’s unique id in medigan’s data structures.

medigan.constants.PACKAGE_EXTENSION = '.zip': The filetype of any of the generative model’s python packages after download and before unpacking.

medigan.constants.TEMPLATE_FOLDER = 'templates'

medigan.constants.ZENODO_API_URL = 'https://zenodo.org/api/deposit/depositions': The REST API to interact with Zenodo

medigan.constants.ZENODO_GENERIC_MODEL_DESCRIPTION = "Usage: This GAN is used as part of the medigan library. This GANs metadata is therefore stored in and retrieved from medigan's <a href='https://raw.githubusercontent.com/RichardObi/medigan/main/config/global.json'>config file</a>. medigan is an open-source Python library on <a href='https://github.com/RichardObi/medigan'>Github</a> that allows developers and researchers to easily add synthetic imaging data into their model training pipelines. medigan is documented <a href='https://readthedocs.org/projects/medigan/'>here</a> and can be used via pip install: <pre><code class='language-python'>pip install medigan</code></pre> To run this model in medigan, use the following commands. <pre> <code class='language-python'> from medigan import Generators </code></pre><pre> <code class='language-python'> generators = Generators() </code></pre><pre> <code class='language-python'> generators.generate(model_id='YOUR_MODEL_ID',num_samples=10)</code></pre> ": A generic description appended to model uploads that are automatically uploaded to Zenodo via a Zenodo API call in medigan

medigan.constants.ZENODO_HEADERS = {'Content-Type': 'application/json'}: The HEADER for Zenodo REST API requests

medigan.constants.ZENODO_LINE_BREAK = ' ': The line break in the Zenodo description that appears together with the pushed model on Zenodo
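
The key constants are meant to be combined when navigating a model’s config entry. The sketch below uses the constant values listed above on a hypothetical entry. The exact nesting (generate_method below execution, with name and args below it) and all entry values are assumptions made for illustration, not a copy of global.json.

```python
# Values copied from the constants listed in this reference.
CONFIG_FILE_KEY_EXECUTION = "execution"
CONFIG_FILE_KEY_GENERATE = "generate_method"
CONFIG_FILE_KEY_GENERATE_NAME = "name"
CONFIG_FILE_KEY_GENERATE_ARGS = "args"
CONFIG_FILE_KEY_GENERATE_ARGS_BASE = "base"
CONFIG_FILE_KEY_GENERATE_ARGS_CUSTOM = "custom"

# Hypothetical config entry; layout and values are assumed for illustration.
entry = {
    CONFIG_FILE_KEY_EXECUTION: {
        CONFIG_FILE_KEY_GENERATE: {
            CONFIG_FILE_KEY_GENERATE_NAME: "generate",
            CONFIG_FILE_KEY_GENERATE_ARGS: {
                CONFIG_FILE_KEY_GENERATE_ARGS_BASE: [
                    "model_file",
                    "num_samples",
                    "output_path",
                    "save_images",
                ],
                CONFIG_FILE_KEY_GENERATE_ARGS_CUSTOM: {"some_custom_arg": 1},
            },
        }
    }
}

# Navigate with the constants instead of hard-coded strings.
generate_cfg = entry[CONFIG_FILE_KEY_EXECUTION][CONFIG_FILE_KEY_GENERATE]
method_name = generate_cfg[CONFIG_FILE_KEY_GENERATE_NAME]
base_args = generate_cfg[CONFIG_FILE_KEY_GENERATE_ARGS][CONFIG_FILE_KEY_GENERATE_ARGS_BASE]
print(method_name, base_args)
```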

medigan.exceptions module

Custom exceptions to handle module-specific errors and facilitate bug fixes and debugging.

medigan.generators module

Base class providing user-library interaction methods for config management, and model selection and execution.

class medigan.generators.Generators(config_manager: Optional[medigan.config_manager.ConfigManager] = None, model_selector: Optional[medigan.select_model.model_selector.ModelSelector] = None, model_executors: Optional[list] = None, model_contributors: Optional[list] = None, initialize_all_models: bool = False)[source]

Bases: object

Generators class: Contains medigan’s public methods to facilitate users’ automated sample generation workflows.

Parameters

config_manager (ConfigManager) – Provides the config dictionary, based on which model_ids are retrieved and models are selected and executed
model_selector (ModelSelector) – Provides model comparison, search, and selection based on keys/values in the selection part of the config dict
model_executors (list) – List of initialized ModelExecutor instances that handle model package download, init, and sample generation
initialize_all_models (bool) – Flag indicating, if True, that one ModelExecutor for each model_id in the config dict is initialized when the Generators class instance is created. If False, the Generators class initializes a ModelExecutor only on demand, i.e. when the generate method for the respective model is called.

config_manager

Provides the config dictionary, based on which model_ids are retrieved and models are selected and executed

Type: ConfigManager

model_selector

Provides model comparison, search, and selection based on keys/values in the selection part of the config dict

Type: ModelSelector

model_executors

List of initialized ModelExecutor instances that handle model package download, init, and sample generation

Type: list

add_all_model_executors()[source]

Add ModelExecutor class instances for all models available in the config.

Return type: None

add_metadata_from_file(model_id: str, metadata_file_path: str) → dict[source]

Read and parse the metadata of a local model, identified by model_id, from a metadata file in json format.

Parameters

model_id (str) – The generative model’s unique id
metadata_file_path (str) – the path pointing to the metadata file

Returns

Returns a dict containing the contents of parsed metadata json file.

Return type

dict

add_metadata_from_input(model_id: str, model_weights_name: str, model_weights_extension: str, generate_method_name: str, dependencies: list, fill_more_fields_interactively: bool = True, output_path: str = 'config') → dict[source]

Create a metadata dict for a local model, identified by model_id, given the necessary minimum metadata contents.

Parameters

model_id (str) – The generative model’s unique id
model_weights_name (str) – the name of the checkpoint file containing the model’s weights
model_weights_extension (str) – the extension (e.g. .pt) of the checkpoint file containing the model’s weights
generate_method_name (str) – the name of the sample generation method inside the model’s __init__.py file
dependencies (list) – the list of dependencies that need to be installed via pip to run the model
fill_more_fields_interactively (bool) – flag indicating whether a user will be interactively asked via command line for further input to fill out missing metadata content
output_path (str) – the path where the created metadata json file will be stored

Returns

Returns a dict containing the contents of the metadata json file.

Return type

dict

add_model_contributor(model_id: str, init_py_path: Optional[str] = None) → medigan.contribute_model.model_contributor.ModelContributor[source]

Add a ModelContributor instance of this model_id to the self.model_contributors list.

Parameters

model_id (str) – The generative model’s unique id
init_py_path (str) – The path to the local model’s __init__.py file needed for importing and running this model.

Returns

ModelContributor class instance corresponding to the model_id

Return type

ModelContributor

add_model_executor(model_id: str, install_dependencies: bool = False)[source]

Add one ModelExecutor class instance corresponding to the specified model_id.

Parameters

model_id (str) – The generative model’s unique id
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.

Return type

None

add_model_to_config(model_id: str, metadata: dict, is_local_model: Optional[bool] = None, overwrite_existing_metadata: bool = False, store_new_config: bool = True) → bool[source]

Adds or updates a model entry in the global metadata.

Parameters

model_id (str) – The generative model’s unique id
metadata (dict) – The model’s corresponding metadata
is_local_model (bool) – flag indicating whether the tested model is a new local user model, i.e. not yet part of medigan’s official models
overwrite_existing_metadata (bool) – in case of is_local_model, flag indicating whether existing metadata for this model in medigan’s config/global.json should be overwritten.
store_new_config (bool) – flag indicating whether the current model metadata should be stored on disk i.e. in config/

Returns

Flag indicating whether model metadata update was successfully concluded

Return type

bool

contribute(model_id: str, init_py_path: str, github_access_token: str, zenodo_access_token: str, metadata_file_path: Optional[str] = None, model_weights_name: Optional[str] = None, model_weights_extension: Optional[str] = None, generate_method_name: Optional[str] = None, dependencies: Optional[list] = None, fill_more_fields_interactively: bool = True, overwrite_existing_metadata: bool = False, output_path: str = 'config', creator_name: str = 'unknown name', creator_affiliation: str = 'unknown affiliation', model_description: str = '', install_dependencies: bool = False)[source]

Implements the full model contribution workflow including model metadata generation, model test, model Zenodo upload, and medigan github issue creation.

Parameters

model_id (str) – The generative model’s unique id
init_py_path (str) – The path to the local model’s __init__.py file needed for importing and running this model.
github_access_token (str) – a personal access token linked to your github user account, used as means of authentication
zenodo_access_token (str) – a personal access token in Zenodo linked to a user account for authentication
metadata_file_path (str) – the path pointing to the metadata file
model_weights_name (str) – the name of the checkpoint file containing the model’s weights
model_weights_extension (str) – the extension (e.g. .pt) of the checkpoint file containing the model’s weights
generate_method_name (str) – the name of the sample generation method inside the model’s __init__.py file
dependencies (list) – the list of dependencies that need to be installed via pip to run the model
fill_more_fields_interactively (bool) – flag indicating whether a user will be interactively asked via command line for further input to fill out missing metadata content
overwrite_existing_metadata (bool) – flag indicating whether existing metadata for this model in medigan’s config/global.json should be overwritten.
output_path (str) – the path where the created metadata json file will be stored
creator_name (str) – the creator name that will appear on the corresponding github issue
creator_affiliation (str) – the creator affiliation that will appear on the corresponding github issue
model_description (str) – the model_description that will appear on the corresponding github issue
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.

Returns

Returns the url pointing to the corresponding issue on github

Return type

str

find_matching_models_by_values(values: list, target_values_operator: str = 'AND', are_keys_also_matched: bool = False, is_case_sensitive: bool = False) → list[source]

Search for values (and keys) in model configs and return a list of each matching ModelMatchCandidate.

This function calls an identically named function in a ModelSelector instance.

Parameters

values (list) – list of values used to search and find models corresponding to these values
target_values_operator (str) – the operator indicating the relationship between values in the evaluation of model search results. Should be either “AND”, “OR”, or “XOR”.
are_keys_also_matched (bool) – flag indicating whether, apart from values, the keys in the model config should also be searchable
is_case_sensitive (bool) – flag indicating whether the search for values (and keys) in the model config should be case-sensitive.

Returns

a list of ModelMatchCandidate class instances each of which was successfully matched against the search values.

Return type

list
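
How the operator and the two flags interact can be sketched on a single nested config. This is an illustration, not the ModelSelector implementation: the helper names iter_strings and is_match are hypothetical, values are compared by exact string equality, and “XOR” is interpreted as exactly one of the search values matching. Both are assumptions.

```python
from typing import Any, Iterator, List


def iter_strings(node: Any, include_keys: bool) -> Iterator[str]:
    """Yield every leaf value (and optionally every key) of a nested config as a string."""
    if isinstance(node, dict):
        for key, value in node.items():
            if include_keys:
                yield str(key)
            yield from iter_strings(value, include_keys)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_strings(item, include_keys)
    else:
        yield str(node)


def is_match(
    config: dict,
    values: List[str],
    target_values_operator: str = "AND",
    are_keys_also_matched: bool = False,
    is_case_sensitive: bool = False,
) -> bool:
    """Decide whether one model config matches the search values."""
    haystack = list(iter_strings(config, are_keys_also_matched))
    needles = [str(value) for value in values]
    if not is_case_sensitive:
        haystack = [entry.lower() for entry in haystack]
        needles = [needle.lower() for needle in needles]
    hits = sum(1 for needle in needles if needle in haystack)
    if target_values_operator == "AND":
        return hits == len(needles)
    if target_values_operator == "OR":
        return hits >= 1
    if target_values_operator == "XOR":
        return hits == 1  # assumption: exactly one search value matches
    raise ValueError('target_values_operator must be "AND", "OR", or "XOR"')


# Hypothetical config of one model.
example_config = {"selection": {"modality": "Mammography", "tags": ["Breast", "DCGAN"]}}
print(is_match(example_config, ["mammography", "dcgan"]))
```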

find_model_and_generate(values: list, target_values_operator: str = 'AND', are_keys_also_matched: bool = False, is_case_sensitive: bool = False, num_samples: int = 30, output_path: Optional[str] = None, is_gen_function_returned: bool = False, install_dependencies: bool = False, **kwargs)[source]

Search for values (and keys) in model configs to generate samples with the found model.

Note that exactly one model must be found. Otherwise, no samples are generated and an error is logged to the console.

Parameters

values (list) – list of values used to search and find models corresponding to these values
target_values_operator (str) – the operator indicating the relationship between values in the evaluation of model search results. Should be either “AND”, “OR”, or “XOR”.
are_keys_also_matched (bool) – flag indicating whether, apart from values, the keys in the model config should also be searchable
is_case_sensitive (bool) – flag indicating whether the search for values (and keys) in the model config should be case-sensitive.
num_samples (int) – the number of samples that will be generated
output_path (str) – the path as str to the output folder where the generated samples will be stored
is_gen_function_returned (bool) – flag indicating whether, instead of generating samples, the sample generation function will be returned
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function

Returns

None by default. If is_gen_function_returned is True, the internal generate function of the model is returned instead.

Return type

None

find_model_executor_by_id(model_id: str) → medigan.execute_model.model_executor.ModelExecutor[source]

Find and return the ModelExecutor instance of this model_id in the self.model_executors list.

Parameters: model_id (str) – The generative model’s unique id
Returns: ModelExecutor class instance corresponding to the model_id
Return type: ModelExecutor

find_models_and_rank(values: list, target_values_operator: str = 'AND', are_keys_also_matched: bool = False, is_case_sensitive: bool = False, metric: str = 'SSIM', order: str = 'asc') → list[source]

Search for values (and keys) in model configs, rank results and return sorted list of model dicts.

This function calls an identically named function in a ModelSelector instance.

Parameters

values (list) – list of values used to search and find models corresponding to these values
target_values_operator (str) – the operator indicating the relationship between values in the evaluation of model search results. Should be either “AND”, “OR”, or “XOR”.
are_keys_also_matched (bool) – flag indicating whether, apart from values, the keys in the model config should also be searchable
is_case_sensitive (bool) – flag indicating whether the search for values (and keys) in the model config should be case-sensitive.
metric (str) – The key in the selection dict that corresponds to the metric of interest
order (str) – the sorting order of the ranked results. Should be either “asc” (ascending) or “desc” (descending)

Returns

a list of the searched and matched model dictionaries containing metric and model_id, sorted by metric.

Return type

list
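
The ranking step can be sketched as a sort over the matched model dictionaries. This is an illustration only: the helper name rank_models and the example entries are hypothetical, and dropping entries that lack the metric is an assumption.

```python
from typing import List


def rank_models(matches: List[dict], metric: str = "SSIM", order: str = "asc") -> List[dict]:
    """Sort matched model dicts by `metric` in ascending or descending order."""
    if order not in ("asc", "desc"):
        raise ValueError('order must be "asc" or "desc"')
    # Assumption: entries without a value for the metric are left out of the ranking.
    scored = [match for match in matches if match.get(metric) is not None]
    return sorted(scored, key=lambda match: match[metric], reverse=(order == "desc"))


# Hypothetical search results containing model_id and metric.
found = [
    {"model_id": "model_a", "SSIM": 0.4},
    {"model_id": "model_b", "SSIM": 0.9},
    {"model_id": "model_c"},
]
best_first = rank_models(found, metric="SSIM", order="desc")
print([match["model_id"] for match in best_first])
```

For a metric where higher is better, order="desc" puts the strongest model first; with the default "asc" it appears last.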

find_models_rank_and_generate(values: list, target_values_operator: str = 'AND', are_keys_also_matched: bool = False, is_case_sensitive: bool = False, metric: str = 'SSIM', order: str = 'asc', num_samples: int = 30, output_path: Optional[str] = None, is_gen_function_returned: bool = False, install_dependencies: bool = False, **kwargs)[source]

Search for values (and keys) in model configs, rank results to generate samples with highest ranked model.

Parameters

values (list) – list of values used to search and find models corresponding to these values
target_values_operator (str) – the operator indicating the relationship between values in the evaluation of model search results. Should be either “AND”, “OR”, or “XOR”.
are_keys_also_matched (bool) – flag indicating whether, apart from values, the keys in the model config should also be searchable
is_case_sensitive (bool) – flag indicating whether the search for values (and keys) in the model config should be case-sensitive.
metric (str) – The key in the selection dict that corresponds to the metric of interest
order (str) – the sorting order of the ranked results. Should be either “asc” (ascending) or “desc” (descending)
num_samples (int) – the number of samples that will be generated
output_path (str) – the path as str to the output folder where the generated samples will be stored
is_gen_function_returned (bool) – flag indicating whether, instead of generating samples, the sample generation function will be returned
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function

Returns

None by default. If is_gen_function_returned is True, the internal generate function of the model is returned instead.

Return type

None

generate(model_id: str, num_samples: int = 30, output_path: Optional[str] = None, save_images: bool = True, is_gen_function_returned: bool = False, install_dependencies: bool = False, **kwargs)[source]

Generate samples with the model corresponding to the model_id or return the model’s generate function.

Parameters

model_id (str) – The generative model’s unique id
num_samples (int) – the number of samples that will be generated
output_path (str) – the path as str to the output folder where the generated samples will be stored
save_images (bool) – flag indicating whether generated samples are stored in the file system (i.e. in output_path) if True, or returned (i.e. as a list of numpy arrays) if False
is_gen_function_returned (bool) – flag indicating whether, instead of generating samples, the sample generation function will be returned
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function

Returns

Returns images as list of numpy arrays if save_images is False. However, if is_gen_function_returned is True, it returns the internal generate function of the model.

Return type

list
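
A typical call that returns the samples in memory could look like the sketch below. The wrapper name generate_in_memory is hypothetical; it accepts an already created Generators instance (or any object with a compatible generate method) so that the sketch can be tried without medigan installed. The model id in the usage comment is a placeholder.

```python
from typing import Any, Optional


def generate_in_memory(
    model_id: str,
    num_samples: int = 10,
    generators: Optional[Any] = None,
) -> list:
    """Return generated samples as a list instead of writing them to disk."""
    if generators is None:
        # Imported lazily so that the sketch can be read without medigan installed.
        from medigan import Generators

        generators = Generators()
    # save_images=False asks for the samples to be returned rather than stored.
    return generators.generate(
        model_id=model_id, num_samples=num_samples, save_images=False
    )


# Usage with medigan installed (placeholder model id):
# samples = generate_in_memory("YOUR_MODEL_ID", num_samples=10)
```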

get_as_torch_dataloader(dataset=None, model_id: Optional[str] = None, num_samples: int = 1000, install_dependencies: bool = False, transform=None, batch_size=None, shuffle=None, sampler=None, batch_sampler=None, num_workers=None, collate_fn=None, pin_memory=None, drop_last=None, timeout=None, worker_init_fn=None, prefetch_factor: Optional[int] = None, persistent_workers: Optional[bool] = None, pin_memory_device: Optional[str] = None, **kwargs) → torch.utils.data.dataloader.DataLoader[source]

Get torch Dataloader sampling synthetic data from medigan model.

Dataloader combines a dataset and a sampler, and provides an iterable over the given torch dataset. The Dataloader is created for synthetic data from the specified medigan model. PyTorch-native parameters default to None. Only parameters that are not None are passed to the Dataloader() initialization function.

Parameters

dataset (Dataset) – dataset from which to load the data.
model_id (str) – The generative model’s unique id
num_samples (int) – the number of samples that will be generated
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function (e.g. the input path for image-to-image translation models in medigan).
transform – the torch data transformation functions to be applied to the data in the dataset.
batch_size (int, optional) – how many samples per batch to load (default: None).
shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: None).
sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified. (default: None)
batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last. (default: None)
num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: None)
collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset. (default: None)
pin_memory (bool, optional) – If True, the data loader will copy Tensors into CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the PyTorch DataLoader documentation. (default: None)
drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: None)
timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: None)
worker_init_fn (callable, optional) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
prefetch_factor (int, optional, keyword-only arg) – Number of batches loaded in advance by each worker. 2 means there will be a total of 2 * num_workers batches prefetched across all workers. (default: None).
persistent_workers (bool, optional) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers’ Dataset instances alive. (default: None)
pin_memory_device (str, optional) – the device to pin memory to if pin_memory is True (default: None).

Returns

a torch.utils.data.DataLoader object with data generated by model corresponding to inputted Dataset or model_id.

Return type

DataLoader
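
Passing on only the parameters that are not None can be sketched with a small filter. The helper name drop_unset is hypothetical, and the sketch runs without torch installed.

```python
from typing import Any, Dict


def drop_unset(**params: Any) -> Dict[str, Any]:
    """Keep only the keyword arguments whose value is not None."""
    return {name: value for name, value in params.items() if value is not None}


loader_kwargs = drop_unset(batch_size=8, shuffle=True, sampler=None, num_workers=0)
print(loader_kwargs)  # {'batch_size': 8, 'shuffle': True, 'num_workers': 0}
```

The resulting dictionary can then be unpacked into the constructor, e.g. DataLoader(dataset, **loader_kwargs), so that PyTorch’s own defaults apply to every parameter that was left as None. Note that falsy values such as 0 or False are kept, since only None is filtered out.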

get_as_torch_dataset(model_id: str, num_samples: int = 100, install_dependencies: bool = False, transform=None, **kwargs) → torch.utils.data.dataset.Dataset[source]

Get synthetic data in a torch Dataset for specified medigan model.

The dataset returns a dict with keys sample (== image), labels (== condition), and mask (== segmentation mask). While the key ‘sample’ is mandatory, the other key-value pairs are only returned if applicable to the generative model.

Parameters

model_id (str) – The generative model’s unique id
num_samples (int) – the number of samples that will be generated
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Else error is raised if missing dependencies are detected.
transform – the torch data transformation functions to be applied to the data in the dataset.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function (e.g. the input path for image-to-image translation models in medigan).

Returns

a torch.utils.data.Dataset object with data generated by model corresponding to model_id.

Return type

Dataset
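
As a rough illustration of the per-item dict described above, the following torch-free sketch mimics a map-style dataset (__len__ plus __getitem__). The class name SyntheticDictDataset and its constructor arguments are hypothetical and not part of medigan; only the keys sample, labels, and mask come from the description above. Applying the transform to the sample alone is likewise an assumption.

```python
from typing import Any, Callable, Dict, List, Optional


class SyntheticDictDataset:
    """Hypothetical stand-in for a map-style dataset; not medigan's implementation."""

    def __init__(
        self,
        samples: List[Any],
        labels: Optional[List[Any]] = None,
        masks: Optional[List[Any]] = None,
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.samples = samples
        self.labels = labels
        self.masks = masks
        self.transform = transform

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        sample = self.samples[index]
        # Assumption: the transform is applied to the sample only.
        if self.transform is not None:
            sample = self.transform(sample)
        # 'sample' is always present; the other keys only if data for them exists.
        item = {"sample": sample}
        if self.labels is not None:
            item["labels"] = self.labels[index]
        if self.masks is not None:
            item["mask"] = self.masks[index]
        return item


# Toy data: plain lists stand in for image arrays.
unconditional = SyntheticDictDataset(samples=[[0, 1], [2, 3]])
conditional = SyntheticDictDataset(
    samples=[[0, 1]],
    labels=["benign"],
    transform=lambda s: [v * 2 for v in s],
)
```

Code that consumes such items should therefore check for the optional keys (for example with "mask" in item) rather than assume that they exist.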

get_config_by_id(model_id: str, config_key: Optional[str] = None) → dict[source]

Get and return the part of the config below a config_key for a specific model_id.

The config_key parameter can contain ‘.’ (dot) separators to allow for retrieval of nested config keys, e.g., ‘execution.generator.name’

This function calls an identically named function in a ConfigManager instance.

Parameters

model_id (str) – The generative model’s unique id
config_key (str) – A key of interest present in the config dict

Returns

a dictionary from the part of the config file corresponding to model_id and config_key.

Return type

dict
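
The dot-separated lookup described above can be sketched as follows. This is a minimal illustration, not medigan’s code: the config fragment is made up, and returning None for a missing key is an assumption of this sketch.

```python
from functools import reduce
from typing import Any


def deep_get(base_dict: dict, key: str) -> Any:
    """Follow a dot-separated key through nested dicts.

    Assumption of this sketch: a missing part yields None instead of raising.
    """
    def step(current: Any, part: str) -> Any:
        return current.get(part) if isinstance(current, dict) else None

    return reduce(step, key.split("."), base_dict)


# Hypothetical config fragment, for illustration only.
config = {"execution": {"generator": {"name": "my_generator"}}}

name = deep_get(config, "execution.generator.name")
subtree = deep_get(config, "execution.generator")
missing = deep_get(config, "execution.unknown.name")
```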

get_generate_function(model_id: str, num_samples: int = 30, output_path: Optional[str] = None, install_dependencies: bool = False, **kwargs)[source]

Return the model’s generate function.

Relies on the self.generate function.

Parameters

model_id (str) – The generative model’s unique id
num_samples (int) – the number of samples that will be generated
output_path (str) – the path as str to the output folder where the generated samples will be stored
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Else error is raised if missing dependencies are detected.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function

Returns

The internal reusable generate function of the generative model.

Return type

function

get_model_contributor_by_id(model_id: str) → medigan.contribute_model.model_contributor.ModelContributor[source]

Find and return the ModelContributor instance of this model_id in the self.model_contributors list.

Parameters: model_id (str) – The generative model’s unique id
Returns: ModelContributor class instance corresponding to the model_id
Return type: ModelContributor

get_model_executor(model_id: str, install_dependencies: bool = False) → medigan.execute_model.model_executor.ModelExecutor[source]

Add and return the ModelExecutor instance of this model_id from the self.model_executors list.

Relies on self.add_model_executor and self.find_model_executor_by_id functions.

Parameters

model_id (str) – The generative model’s unique id
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Else error is raised if missing dependencies are detected.

Returns

ModelExecutor class instance corresponding to the model_id

Return type

ModelExecutor

get_models_by_key_value_pair(key1: str, value1: str, is_case_sensitive: bool = False) → list[source]

Get and return a list of model_id dicts that contain the specified key value pair in their selection config.

The key param can contain ‘.’ (dot) separations to allow for retrieval of nested config keys such as ‘execution.generator.name’

This function calls an identically named function in a ModelSelector instance.

Parameters

key1 (str) – The key in the selection dict
value1 (str) – The value in the selection dict that corresponds to key1
is_case_sensitive (bool) – flag to evaluate keys and values with case sensitivity if set to True

Returns

a list of dictionaries, each containing a model’s id and the key-value pair found in that model’s config

Return type

list
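
A minimal sketch of such a key-value search, assuming the selection configs are available as a plain dict keyed by model id. The function name find_models, the model ids, and the config values are hypothetical. For brevity, only values are compared case-insensitively here, whereas the flag above is described as covering keys as well.

```python
from typing import Any, Dict, List


def find_models(
    selection: Dict[str, dict],
    key: str,
    value: str,
    is_case_sensitive: bool = False,
) -> List[dict]:
    """Return one dict per model whose (possibly nested) key matches value."""
    matches = []
    for model_id, criteria in selection.items():
        found: Any = criteria
        # Walk the dot-separated key through the nested selection dict.
        for part in key.split("."):
            found = found.get(part) if isinstance(found, dict) else None
        if found is None:
            continue
        left, right = str(found), str(value)
        if not is_case_sensitive:
            left, right = left.lower(), right.lower()
        if left == right:
            matches.append({"model_id": model_id, key: found})
    return matches


# Hypothetical selection configs, for illustration only.
selection = {
    "model_a": {"modality": "Mammography"},
    "model_b": {"modality": "MRI"},
    "model_c": {"generates": {"type": "mri"}},
}

hits = find_models(selection, "modality", "mammography")
strict_hits = find_models(selection, "modality", "mammography", is_case_sensitive=True)
nested_hits = find_models(selection, "generates.type", "MRI")
```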

get_selection_criteria_by_id(model_id: str, is_model_id_removed: bool = True) → dict[source]

Get and return the selection config dict for a specific model_id.

This function calls an identically named function in a ModelSelector instance.

Parameters

model_id (str) – The generative model’s unique id
is_model_id_removed (bool) – flag to remove the model_ids from first level of dictionary.

Returns

a dictionary corresponding to the selection config of a model

Return type

dict

get_selection_criteria_by_ids(model_ids: Optional[list] = None, are_model_ids_removed: bool = True) → list[source]

Get and return a list of selection config dicts for each of the specified model_ids.

This function calls an identically named function in a ModelSelector instance.

Parameters

model_ids (list) – A list of generative models’ unique ids or ids abbreviated as integers (e.g. 1, 2, .. 21)
are_model_ids_removed (bool) – flag to remove the model_ids from first level of dictionary.

Returns

a list of dictionaries each corresponding to the selection config of a model

Return type

list

get_selection_keys(model_id: Optional[str] = None) → list[source]

Get and return all first level keys from the selection config dict for a specific model_id.

This function calls an identically named function in a ModelSelector instance.

Parameters: model_id (str) – The generative model’s unique id
Returns: a list containing the keys as strings of the selection config of the model_id.
Return type: list

get_selection_values_for_key(key: str, model_id: Optional[str] = None) → list[source]

Get and return the value of a specified key of the selection dict in the config for a specific model_id.

The key param can contain ‘.’ (dot) separations to allow for retrieval of nested config keys such as ‘execution.generator.name’

This function calls an identically named function in a ModelSelector instance.

Parameters

key (str) – The key in the selection dict
model_id (str) – The generative model’s unique id

Returns

a list of the values that correspond to the key in the selection config of the model_id.

Return type

list

is_model_executor_already_added(model_id) → bool[source]

Check whether the ModelExecutor instance of this model_id is already in self.model_executors list.

Parameters: model_id (str) – The generative model’s unique id
Returns: Flag indicating whether this ModelExecutor has already been added to self.model_executors
Return type: bool

is_model_metadata_valid(model_id: str, metadata: dict, is_local_model: bool = True) → bool[source]

Checking if a model’s corresponding metadata is valid.

Specific fields in the model’s metadata are mandatory. This function asserts that these key-value pairs are present.

Parameters

model_id (str) – The generative model’s unique id
metadata (dict) – The model’s corresponding metadata
is_local_model (bool) – flag indicating whether the tested model is a new local user model i.e not yet part of medigan’s official models

Returns

Flag indicating whether the specific model’s metadata format and fields are valid

Return type

bool

list_models() → list[source]

Return the list of model_ids as strings based on config.

Return type: list

push_to_github(model_id: str, github_access_token: str, package_link: Optional[str] = None, creator_name: str = '', creator_affiliation: str = '', model_description: str = '')[source]

Upload the model’s metadata inside a github issue to the medigan github repository.

To add your model to medigan, your metadata will be reviewed on GitHub and added to medigan’s official model metadata.

The medigan repository issues page: https://github.com/RichardObi/medigan/issues

Get your Github access token here: https://github.com/settings/tokens

Parameters

model_id (str) – The generative model’s unique id
github_access_token (str) – a personal access token linked to your github user account, used as means of authentication
package_link (str) – a package link
creator_name (str) – the creator name that will appear on the corresponding github issue
creator_affiliation (str) – the creator affiliation that will appear on the corresponding github issue
model_description (str) – the model_description that will appear on the corresponding github issue

Returns

Returns the url pointing to the corresponding issue on github

Return type

str

push_to_zenodo(model_id: str, zenodo_access_token: str, creator_name: str = 'unknown name', creator_affiliation: str = 'unknown affiliation', model_description: str = '') → str[source]

Upload the model files as zip archive to a public Zenodo repository where the model will be persistently stored.

Get your Zenodo access token here: https://zenodo.org/account/settings/applications/tokens/new/ (Enable scopes deposit:actions and deposit:write)

Parameters

model_id (str) – The generative model’s unique id
zenodo_access_token (str) – a personal access token in Zenodo linked to a user account for authentication
creator_name (str) – the creator name that will appear on the corresponding Zenodo model upload homepage
creator_affiliation (str) – the creator affiliation that will appear on the corresponding Zenodo model upload homepage
model_description (str) – the model_description that will appear on the corresponding Zenodo model upload homepage

Returns

Returns the url pointing to the corresponding Zenodo model upload homepage

Return type

str

rank_models_by_performance(model_ids: Optional[list] = None, metric: str = 'SSIM', order: str = 'asc') → list[source]

Rank models based on a provided metric and return a sorted list of model dicts.

The metric param can contain ‘.’ (dot) separations to allow for retrieval of nested metric config keys such as ‘downstream_task.CLF.accuracy’

This function calls an identically named function in a ModelSelector instance.

Parameters

model_ids (list) – only evaluate the model_ids in this list. If none, evaluate all available model_ids
metric (str) – The key in the selection dict that corresponds to the metric of interest
order (str) – the sorting order of the ranked results. Should be either “asc” (ascending) or “desc” (descending)

Returns

a list of model dictionaries containing metric and model_id, sorted by metric.

Return type

list

test_model(model_id: str, is_local_model: bool = True, overwrite_existing_metadata: bool = False, store_new_config: bool = True, num_samples: int = 3, install_dependencies: bool = False)[source]

Test if a model generates and returns a specific number of samples in the correct format.

Parameters

model_id (str) – The generative model’s unique id
is_local_model (bool) – flag indicating whether the tested model is a new local user model i.e not yet part of medigan’s official models
overwrite_existing_metadata (bool) – in case of is_local_model, flag indicating whether existing metadata for this model in medigan’s config/global.json should be overwritten.
store_new_config (bool) – flag indicating whether the current model metadata should be stored on disk i.e. in config/
num_samples (int) – the number of samples that will be generated
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Else error is raised if missing dependencies are detected.

visualize(model_id: str, slider_grouper: int = 10, auto_close: bool = False, install_dependencies: bool = False) → None[source]

Initialize and run the ModelVisualizer of this model_id if it is available. It visualizes a sample from the model’s output. A UI window pops up that lets the user control the generation parameters (conditional and unconditional ones).

Parameters

model_id (str) – The generative model’s unique id to visualize.
slider_grouper (int) – Number of input parameters to group together within one slider.
auto_close (bool) – Flag for closing the user interface automatically after some time. Used while testing.
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Else error is raised if missing dependencies are detected.

medigan.model_visualizer module

ModelVisualizer class that visualizes how changes in the model input change the corresponding model output.

class medigan.model_visualizer.ModelVisualizer(model_executor, config: None)[source]

Bases: object

ModelVisualizer class: Visualises synthetic data through a user interface. Depending on a model, it is possible to control the input latent vector values and conditional input.

Parameters

model_executor (ModelExecutor) – The generative model’s executor object
config (dict) – The config dict containing the model metadata

model_executor

The generative model’s executor object

Type: ModelExecutor

input_latent_vector_size

Size of the latent vector used as an input for generation

Type: int

conditional

Flag for models with conditional input

Type: bool

condition

Value of the conditional input to the model

Type: Union[int, float]

max_input_value

Absolute value used for setting latent values input range

Type: float

visualize(slider_grouper: int = 10, auto_close=False)[source]

Visualize the model’s output. This method is called by the user. It opens up a user interface with available controls.

Parameters

slider_grouper (int) – Number of input parameters to group together within one slider.
auto_close (bool) – Flag for closing the user interface automatically after some time. Used while testing.

Return type

None

medigan.utils module

Utils class providing generalized reusable functions for I/O, parsing, sorting, type conversions, etc.

class medigan.utils.Utils[source]

Bases: object

Utils class containing reusable static methods.

static call_without_removable_params(my_callable, removable_param_values: list = [None], **params)[source]: call a callable without passing parameters that contain any of the removable_param_values as value.
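
A minimal sketch of the idea behind this helper: keyword arguments whose value is one of the removable values are dropped, so the callee falls back to its own defaults. This is an illustration under assumptions, not medigan’s code: values are compared by identity (which suits sentinels such as None), and make_loader is a hypothetical callee.

```python
from typing import Any, Callable, Optional


def call_without_removable_params(
    my_callable: Callable[..., Any],
    removable_param_values: Optional[list] = None,
    **params: Any,
) -> Any:
    """Call my_callable, omitting params whose value is a removable value."""
    if removable_param_values is None:
        removable_param_values = [None]
    kept = {
        name: value
        for name, value in params.items()
        # Assumption: identity comparison, adequate for sentinels like None.
        if not any(value is removable for removable in removable_param_values)
    }
    return my_callable(**kept)


def make_loader(batch_size: int = 1, shuffle: bool = False) -> dict:
    """Hypothetical callee that simply reports the arguments it received."""
    return {"batch_size": batch_size, "shuffle": shuffle}


# shuffle=None is dropped, so make_loader uses its own default (False).
result = call_without_removable_params(make_loader, batch_size=8, shuffle=None)
```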

static copy(source_path: pathlib.Path, target_path: str = './')[source]: copy a folder or file from source_path to target_path

static deep_get(base_dict: dict, key: str)[source]: Split the key by “.” to get value in nested dictionary.

static dict_to_lowercase(target_dict: dict, string_conversion: bool = True) → dict[source]

transform values and keys in dict to lowercase, optionally with string conversion of the values.

Warning: Does not convert nested dicts in the target_dict, but rather removes them from return object.
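
The behaviour in the warning above can be sketched as follows. This is an illustration, not medigan’s implementation: converting keys with str() and leaving non-string values untouched when string_conversion is False are assumptions of the sketch.

```python
def dict_to_lowercase(target_dict: dict, string_conversion: bool = True) -> dict:
    """Lower-case keys and values of a flat dict; nested dicts are dropped."""
    lowered = {}
    for key, value in target_dict.items():
        if isinstance(value, dict):
            # Nested dicts are removed from the result, as the warning states.
            continue
        if string_conversion:
            value = str(value)
        if isinstance(value, str):
            value = value.lower()
        lowered[str(key).lower()] = value
    return lowered


# Hypothetical metadata, for illustration only.
metadata = {"Modality": "MRI", "Year": 2021, "Nested": {"A": 1}}

as_strings = dict_to_lowercase(metadata)
typed = dict_to_lowercase(metadata, string_conversion=False)
```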

static download_file(download_link: str, path_as_string: str, file_extension: str = '.json')[source]: download a file using the requests lib and store in path_as_string

static has_more_than_n_diff_pixel_values(img: numpy.ndarray, n: int = 4) → bool[source]

This function checks whether an image contains more than n different pixel values.

This helps to differentiate between segmentation masks and actual images.
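
The check amounts to counting distinct pixel values. To stay dependency-free, the sketch below takes nested lists rather than a numpy.ndarray, which is an assumption; the function name and the toy data are hypothetical as well.

```python
from itertools import chain
from typing import List


def has_more_than_n_diff_values(rows: List[List[int]], n: int = 4) -> bool:
    """Return True if the 2D pixel grid holds more than n distinct values."""
    return len(set(chain.from_iterable(rows))) > n


# A binary mask has few distinct values; an image typically has many.
mask = [[0, 0, 1], [1, 0, 1]]
image = [[12, 40, 7], [255, 3, 90]]

mask_looks_like_image = has_more_than_n_diff_values(mask)
image_looks_like_image = has_more_than_n_diff_values(image)
```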

static is_file_in(folder_path: str, filename: str) → bool[source]: Checks if a file is inside a folder

static is_file_located_or_downloaded(path_as_string: str, download_if_not_found: bool = True, download_link: Optional[str] = None, is_new_download_forced: bool = False, allow_local_path_as_url: bool = True) → bool[source]: check if a file is located in path_as_string and optionally download the file (again).

static is_url_valid(the_url: str) → bool[source]: Checks if a url is valid using urllib.parse.urlparse
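
A common way to validate a URL with urllib.parse.urlparse is to require both a scheme and a network location. Which criteria medigan applies is not stated here, so the sketch below is an assumption rather than the library’s logic.

```python
from urllib.parse import urlparse


def is_url_valid(the_url: str) -> bool:
    """Treat a URL as valid if it has both a scheme and a network location."""
    try:
        parsed = urlparse(the_url)
    except ValueError:
        # urlparse can raise ValueError on malformed input such as bad IPv6 literals.
        return False
    return bool(parsed.scheme and parsed.netloc)


remote = is_url_valid("https://github.com/RichardObi/medigan/issues")
local_path = is_url_valid("config/global.json")
```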

static list_to_lowercase(target_list: list) → list[source]

string conversion and lower-casing of values in list.

Trade-off: string conversion for increased robustness is prioritized over type failure detection.

static mkdirs(path_as_string: str) → bool[source]: create folder in path_as_string if not already created.

static order_dict_by_value(dict_list, key: str, order: str = 'asc', sort_algorithm='bubbleSort') → list[source]

Sorting a list of dicts by the values of a specific key in the dict using a sorting algorithm.

This function is deprecated. You may use Python List sort() with key=lambda function instead.
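
The suggested replacement with Python’s built-in sorting might look like this. The entries and the "SSIM" key are hypothetical and serve only as illustration.

```python
# Hypothetical entries; "SSIM" is used as the sort key purely for illustration.
dict_list = [
    {"model_id": "model_a", "SSIM": 0.71},
    {"model_id": "model_b", "SSIM": 0.42},
    {"model_id": "model_c", "SSIM": 0.88},
]

# In-place ascending sort, the counterpart of order='asc'.
dict_list.sort(key=lambda entry: entry["SSIM"])
ascending_ids = [entry["model_id"] for entry in dict_list]

# New list in descending order, the counterpart of order='desc'.
descending = sorted(dict_list, key=lambda entry: entry["SSIM"], reverse=True)
descending_ids = [entry["model_id"] for entry in descending]
```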

static read_in_json(path_as_string) → dict[source]: read a .json file and return as dict

static split_images_and_masks_no_ordering(data: list, num_samples: int, max_nested_arrays: int = 2) → [<class 'numpy.ndarray'>, <class 'numpy.ndarray'>][source]

Extracts and separates the masks from the images if a model returns both in the same np.ndarray.

This extendable function assumes that, in data, a mask follows the image that it corresponds to or vice versa.

This function is deprecated. Please use split_images_masks_and_labels instead.

static split_images_masks_and_labels(data: list, num_samples: int, max_nested_arrays: int = 2) → [<class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>][source]

Separates the data (sample, mask, other_imaging_data, label) returned by a generative model

This function expects a list of tuples as input data and assumes that each tuple contains sample, mask, other_imaging_data, label at index positions [0], [1], [2], and [3] respectively.

samples, masks, and imaging data are expected to be of type np.ndarray and labels of type “str”.

For example, this extendable function assumes that, in data, a mask follows the image that it corresponds to or vice versa.
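
The tuple layout described above can be unpacked as in the following sketch. It assumes every tuple has exactly four entries, with None where an entry does not apply, and it omits the num_samples and max_nested_arrays parameters. The function name split_tuples and the toy data are hypothetical.

```python
from typing import Any, List, Tuple


def split_tuples(data: List[tuple]) -> Tuple[List[Any], List[Any], List[Any], List[Any]]:
    """Split (sample, mask, other_imaging_data, label) tuples into four lists."""
    samples, masks, other_imaging_data, labels = [], [], [], []
    for sample, mask, other, label in data:
        samples.append(sample)
        masks.append(mask)
        other_imaging_data.append(other)
        labels.append(label)
    return samples, masks, other_imaging_data, labels


# Toy data: nested lists stand in for np.ndarray images and masks.
data = [
    ([[1]], [[0]], None, "benign"),
    ([[2]], [[1]], None, "malignant"),
]

samples, masks, other_imaging_data, labels = split_tuples(data)
```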

static store_dict_as(dictionary, extension: str = '.json', output_path: str = 'config/', filename: str = 'metadata.json')[source]: store a Python dictionary in file system as variable filetype.

static unzip_and_return_unzipped_path(package_path: str)[source]: if not already dir, unzip an archive with Utils.unzip_archive. Return path to unzipped dir/file

static unzip_archive(source_path: pathlib.Path, target_path: str = './')[source]: unzip a .zip archive in the target_path

Module contents

medigan is a modular Python library for automating synthetic dataset generation.
