medigan package
Subpackages
Submodules
medigan.config_manager module
Config manager class that downloads, ingests, parses, and prepares the config information for all models.
- class medigan.config_manager.ConfigManager(config_dict: Optional[dict] = None, is_new_download_forced: bool = False)[source]
Bases: object
ConfigManager class: Downloads, loads and parses medigan’s config JSON file as a dictionary.
- Parameters
config_dict (dict) – Optionally provides the config dictionary if already loaded and parsed in a previous process.
is_new_download_forced (bool) – Flag indicating, if True, that a new config file should be downloaded from the config link instead of parsing an existing file.
- config_dict
Optionally provides the config dictionary if already loaded and parsed in a previous process.
- Type
dict
- is_new_download_forced
Flag indicating, if True, that a new config file should be downloaded from the config link instead of parsing an existing file.
- Type
bool
- model_ids
Lists the unique ids of the generative models specified in the config_dict
- Type
list
- is_config_loaded
Flags if the loading and parsing of the config file was successful (True) or not (False).
- Type
bool
- add_model_to_config(model_id: str, metadata: dict, is_local_model: bool = True, overwrite_existing_metadata: bool = False, store_new_config: bool = True) bool [source]
Adding or updating a model entry in the global metadata.
- Parameters
model_id (str) – The generative model’s unique id
metadata (dict) – The model’s corresponding metadata
is_local_model (bool) – flag indicating whether the tested model is a new local user model, i.e. not yet part of medigan’s official models
overwrite_existing_metadata (bool) – in case of is_local_model, flag indicating whether existing metadata for this model in medigan’s config/global.json should be overwritten.
store_new_config (bool) – flag indicating whether the current model metadata should be stored on disk i.e. in config/
- Returns
Flag indicating whether model metadata update was successfully concluded
- Return type
bool
- get_config_by_id(model_id: str, config_key: Optional[str] = None) dict [source]
From config_manager, get and return the part of the config below a config_key for a specific model_id.
The config_key parameter can contain ‘.’ (dot) separators to allow retrieval of nested config keys such as ‘execution.generator.name’.
- Parameters
model_id (str) – The generative model’s unique id
config_key (str) – A key of interest present in the config dict
- Returns
a dictionary from the part of the config file corresponding to model_id and config_key.
- Return type
dict
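The dot-separated lookup described above can be sketched in plain Python. This is a minimal sketch, not medigan’s implementation: get_nested_value and the toy config (model id and values) are hypothetical, and only the key path ‘execution.generator.name’ comes from the text.

```python
from functools import reduce
from typing import Any, Optional


def get_nested_value(config: dict, model_id: str, config_key: Optional[str] = None) -> Any:
    """Return the part of a model's config below a dot-separated config_key.

    Hypothetical helper mirroring the documented behaviour of get_config_by_id.
    """
    model_config = config[model_id]
    if config_key is None:
        # Without a key, the whole model entry is returned.
        return model_config
    # Descend one nesting level per dot-separated key, e.g. execution -> generator -> name.
    return reduce(lambda node, key: node[key], config_key.split("."), model_config)


# Toy config; the model id and all values are illustrative only.
toy_config = {
    "00001_EXAMPLE_MODEL": {
        "execution": {"generator": {"name": "generate"}},
        "selection": {"tags": ["Mammography"]},
    }
}

print(get_nested_value(toy_config, "00001_EXAMPLE_MODEL", "execution.generator.name"))  # generate
```

Note that a leaf key returns the stored value itself (here a string), while a key pointing at an inner node returns the nested dict.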
- is_model_in_config(model_id: str) bool [source]
Checking if a model_id is present in the global model metadata file
- Parameters
model_id (str) – The generative model’s unique id
- Returns
Flag indicating whether a model_id is present in global model metadata
- Return type
bool
- is_model_metadata_valid(model_id: str, metadata: dict, is_local_model: bool = True) bool [source]
Checking if a model’s corresponding metadata is valid.
Specific fields in the model’s metadata are mandatory; the method checks whether these key-value pairs are present.
- Parameters
model_id (str) – The generative model’s unique id
metadata (dict) – The model’s corresponding metadata
is_local_model (bool) – flag indicating whether the tested model is a new local user model, i.e. not yet part of medigan’s official models
- Returns
Flag indicating whether the specific model’s metadata format and fields are valid
- Return type
bool
- load_config_file(is_new_download_forced: bool = False) bool [source]
Load a config file and return a boolean flag indicating whether loading succeeded.
If the config file is not present in medigan.constants.CONFIG_FILE_FOLDER, it is downloaded by default from the web resource specified in medigan.constants.CONFIG_FILE_URL.
- Parameters
is_new_download_forced (bool) – Forces new download of config file even if the file has been downloaded before.
- Returns
a boolean flag that is True only if the config file was loaded successfully.
- Return type
bool
- match_model_id(provided_model_id: str) bool [source]
Replacing an abbreviated model_id (e.g. 00005 or 5) with the unique model_id present in the model metadata
- Parameters
provided_model_id (str) – The user-provided model_id that might be shorter (e.g. “00005” or “5”) than the real unique model id
- Returns
If matched, returning the unique model_id present in global model metadata.
- Return type
str
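A minimal sketch of how an abbreviated id could be resolved, assuming (hypothetically) that full model ids start with a five-digit, zero-padded number followed by an underscore. The helper and the ids below are illustrative, not medigan’s actual matching logic.

```python
from typing import List, Optional


def match_model_id(provided_model_id: str, model_ids: List[str]) -> Optional[str]:
    """Resolve a short id such as "5" or "00005" to a full model id (hypothetical sketch)."""
    if provided_model_id in model_ids:
        return provided_model_id  # already a full, exact id
    if provided_model_id.isdigit():
        # Assumption: full ids begin with a zero-padded five-digit number and "_".
        prefix = provided_model_id.zfill(5) + "_"
        for model_id in model_ids:
            if model_id.startswith(prefix):
                return model_id
    return None  # no match found


ids = ["00001_EXAMPLE_A", "00005_EXAMPLE_B"]
print(match_model_id("5", ids))      # 00005_EXAMPLE_B
print(match_model_id("00005", ids))  # 00005_EXAMPLE_B
```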
medigan.constants module
Global constants of the medigan library
- medigan.constants.CONFIG_FILE_FOLDER = 'config'
Folder path that will be created to locally store the config file.
- medigan.constants.CONFIG_FILE_KEY_DEPENDENCIES = 'dependencies'
Below the execution dict, the key under which the dependencies dictionary of a model is nested in the config file.
- medigan.constants.CONFIG_FILE_KEY_DESCRIPTION = 'description'
The key under which the description dictionary of a model is nested in the config file.
- medigan.constants.CONFIG_FILE_KEY_EXECUTION = 'execution'
The key under which the execution dictionary of a model is nested in the config file.
- medigan.constants.CONFIG_FILE_KEY_GENERATE = 'generate_method'
Below the execution dict, the key under which a nested dict with info on the model’s generate() function is present.
- medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS = 'args'
Below the execution dict, the key under which a nested dict with info on the arguments of a model’s generate() function is present.
- medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_BASE = 'base'
Below the execution dict, the key under which an array of mandatory base arguments of any model’s generate() function is present.
- medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_CUSTOM = 'custom'
Below the execution dict, the key under which a nested dict of key-value pairs of model-specific custom arguments of a model’s generate() function is present.
- medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_INPUT_LATENT_VECTOR_SIZE = 'input_latent_vector_size'
Below the execution dict, the key under which the input_latent_vector_size argument value (the size of the random input latent vector) of a model’s generate() function is present.
- medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_MODEL_FILE = 'model_file'
Below the execution dict, the key under which the model_file argument value of any model’s generate() function is present.
- medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_NUM_SAMPLES = 'num_samples'
Below the execution dict, the key under which the num_samples argument value of any model’s generate() function is present.
- medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_OUTPUT_PATH = 'output_path'
Below the execution dict, the key under which the output_path argument value of any model’s generate() function is present.
- medigan.constants.CONFIG_FILE_KEY_GENERATE_ARGS_SAVE_IMAGES = 'save_images'
Below the execution dict, the key under which the save_images boolean flag argument value of any model’s generate() function is present.
- medigan.constants.CONFIG_FILE_KEY_GENERATE_NAME = 'name'
Below the execution dict, the key under which the exact name of a model’s generate() function is present.
- medigan.constants.CONFIG_FILE_KEY_GENERATOR = 'generator'
Below the execution dict, the key under which a model’s generator is present in the config file.
- medigan.constants.CONFIG_FILE_KEY_GENERATOR_NAME = 'name'
Below the execution dict, the key under which the name of a model’s generator is present in the config file.
- medigan.constants.CONFIG_FILE_KEY_IMAGE_SIZE = 'image_size'
Below the execution dict, the key under which a model’s image_size is present in the config file.
- medigan.constants.CONFIG_FILE_KEY_MODEL_EXTENSION = 'extension'
Below the execution dict, the key under which the extension of a model is present in the config file.
- medigan.constants.CONFIG_FILE_KEY_MODEL_NAME = 'model_name'
Below the execution dict, the key under which a model’s name is present in the config file. This is the name of the weights file.
- medigan.constants.CONFIG_FILE_KEY_PACKAGE_LINK = 'package_link'
Below the execution dict, the key under which the package link of a model is present in the config file. Note: By convention, the model packages are stored on Zenodo, where they receive a static DOI. Because content on Zenodo is static and cannot be modified, this avoids security issues. Zenodo also helps to make clear who the owners and contributors of each generative model (and its IP) in medigan are.
- medigan.constants.CONFIG_FILE_KEY_PACKAGE_NAME = 'package_name'
Below the execution dict, the key under which the package_name of a model is present in the config file.
- medigan.constants.CONFIG_FILE_KEY_PERFORMANCE = 'performance'
Below the selection dict, the key under which the performance dictionary of a model is nested in the config file.
- medigan.constants.CONFIG_FILE_KEY_SELECTION = 'selection'
The key under which the selection dictionary of a model is nested in the config file.
- medigan.constants.CONFIG_FILE_KEY_TAGS = 'tags'
Below the selection dict, the key under which the tags (a list of strings) are present.
- medigan.constants.CONFIG_FILE_NAME_AND_EXTENSION = 'global.json'
Name and extension of the config file.
- medigan.constants.CONFIG_FILE_URL = 'https://raw.githubusercontent.com/RichardObi/medigan/main/config/global.json'
Static link to the config of medigan. Note: To add a model, please create a pull request in this GitHub repo.
- medigan.constants.CONFIG_TEMPLATE_FILE_NAME_AND_EXTENSION = 'template.json'
Name and extension of the template of the config file.
- medigan.constants.CONFIG_TEMPLATE_FILE_URL = 'https://raw.githubusercontent.com/RichardObi/medigan/main/templates/template.json'
Download link to the template.json file.
- medigan.constants.DEFAULT_OUTPUT_FOLDER = 'output'
The default path to a folder under which the outputs of the medigan package (i.e. generated samples) are stored.
- medigan.constants.GITHUB_REPO = 'RichardObi/medigan'
The repository of the GitHub issue when adding a model to medigan.
- medigan.constants.GITHUB_TITLE = 'Model Integration Request for medigan'
The title of the GitHub issue when adding a model to medigan.
- medigan.constants.INIT_PY_FILE = '__init__.py'
Name of the file that makes a folder a Python module: a folder containing an __init__.py file is a Python module.
- medigan.constants.MODEL_FOLDER = 'models'
- medigan.constants.MODEL_ID = 'model_id'
The string describing a model’s unique id in medigan’s data structures.
- medigan.constants.PACKAGE_EXTENSION = '.zip'
The filetype of any of the generative models’ Python packages after download and before unpacking.
- medigan.constants.TEMPLATE_FOLDER = 'templates'
Name of the folder containing the template of the config file.
- medigan.constants.ZENODO_API_URL = 'https://zenodo.org/api/deposit/depositions'
The REST API to interact with Zenodo.
- medigan.constants.ZENODO_GENERIC_MODEL_DESCRIPTION = "<p><strong>Usage:</strong></p> <p>This GAN is used as part of the <strong><em>medigan</em></strong> library. This GANs metadata is therefore stored in and retrieved from <em>medigan's</em> <a href='https://raw.githubusercontent.com/RichardObi/medigan/main/config/global.json'>config file</a>. <em>medigan </em>is an open-source Python library on <a href='https://github.com/RichardObi/medigan'>Github</a> that allows developers and researchers to easily add synthetic imaging data into their model training pipelines. <em>medigan</em> is documented <a href='https://readthedocs.org/projects/medigan/'>here</a> and can be used via pip install:</p> <pre><code class='language-python'>pip install medigan</code></pre> <p>To run this model in medigan, use the following commands.</p> <pre> <code class='language-python'> from medigan import Generators </code></pre><pre> <code class='language-python'> generators = Generators() </code></pre><pre> <code class='language-python'> generators.generate(model_id='YOUR_MODEL_ID',num_samples=10)</code></pre><p> </p>"
A generic description appended to models that are automatically uploaded to Zenodo via a Zenodo API call in medigan.
- medigan.constants.ZENODO_HEADERS = {'Content-Type': 'application/json'}
The header for Zenodo REST API requests.
- medigan.constants.ZENODO_LINE_BREAK = '<p> </p>'
The line break in the Zenodo description that appears together with the pushed model on Zenodo.
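To illustrate how the config-key constants of this module could fit together, here is a hypothetical model entry. The constant names and values are taken from this module; the nesting is inferred from the "Below the ... dict" descriptions, and every value in the entry is made up, so the real config/global.json may be laid out differently.

```python
from functools import reduce

# Key names as listed in medigan.constants.
CONFIG_FILE_KEY_EXECUTION = "execution"
CONFIG_FILE_KEY_SELECTION = "selection"
CONFIG_FILE_KEY_DEPENDENCIES = "dependencies"
CONFIG_FILE_KEY_MODEL_NAME = "model_name"
CONFIG_FILE_KEY_MODEL_EXTENSION = "extension"
CONFIG_FILE_KEY_GENERATE = "generate_method"
CONFIG_FILE_KEY_GENERATE_NAME = "name"
CONFIG_FILE_KEY_GENERATE_ARGS = "args"
CONFIG_FILE_KEY_GENERATE_ARGS_BASE = "base"
CONFIG_FILE_KEY_GENERATE_ARGS_CUSTOM = "custom"
CONFIG_FILE_KEY_TAGS = "tags"

# Hypothetical model entry: all values are made up, and placing "args" inside
# "generate_method" is an assumption inferred from the descriptions.
entry = {
    CONFIG_FILE_KEY_EXECUTION: {
        CONFIG_FILE_KEY_MODEL_NAME: "weights",
        CONFIG_FILE_KEY_MODEL_EXTENSION: ".pt",
        CONFIG_FILE_KEY_DEPENDENCIES: ["torch"],
        CONFIG_FILE_KEY_GENERATE: {
            CONFIG_FILE_KEY_GENERATE_NAME: "generate",
            CONFIG_FILE_KEY_GENERATE_ARGS: {
                CONFIG_FILE_KEY_GENERATE_ARGS_BASE: ["model_file", "num_samples", "output_path", "save_images"],
                CONFIG_FILE_KEY_GENERATE_ARGS_CUSTOM: {},
            },
        },
    },
    CONFIG_FILE_KEY_SELECTION: {CONFIG_FILE_KEY_TAGS: ["Mammography"]},
}

# Joining constants with dots yields a nested key path in the dotted form get_config_by_id accepts.
path = ".".join([CONFIG_FILE_KEY_EXECUTION, CONFIG_FILE_KEY_GENERATE, CONFIG_FILE_KEY_GENERATE_NAME])
print(path)  # execution.generate_method.name
print(reduce(lambda node, key: node[key], path.split("."), entry))  # generate
```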
medigan.exceptions module
Custom exceptions to handle module-specific errors and facilitate bug fixes and debugging.
medigan.generators module
Base class providing user-library interaction methods for config management, and model selection and execution.
- class medigan.generators.Generators(config_manager: Optional[medigan.config_manager.ConfigManager] = None, model_selector: Optional[medigan.select_model.model_selector.ModelSelector] = None, model_executors: Optional[list] = None, model_contributors: Optional[list] = None, initialize_all_models: bool = False)[source]
Bases: object
Generators class: Contains medigan’s public methods to facilitate users’ automated sample generation workflows.
- Parameters
config_manager (ConfigManager) – Provides the config dictionary, based on which model_ids are retrieved and models are selected and executed
model_selector (ModelSelector) – Provides model comparison, search, and selection based on keys/values in the selection part of the config dict
model_executors (list) – List of initialized ModelExecutor instances that handle model package download, init, and sample generation
model_contributors (list) – List of initialized ModelContributor instances that handle the contribution of local models to medigan
initialize_all_models (bool) – Flag indicating, if True, that one ModelExecutor for each model_id in the config dict should be initialized when the Generators class instance is created. Note that, if False, the Generators class will only initialize a ModelExecutor on the fly when needed, i.e. when the generate method for the respective model is called.
- config_manager
Provides the config dictionary, based on which model_ids are retrieved and models are selected and executed
- Type
ConfigManager
- model_selector
Provides model comparison, search, and selection based on keys/values in the selection part of the config dict
- Type
ModelSelector
- model_executors
List of initialized ModelExecutor instances that handle model package download, init, and sample generation
- Type
list
- add_all_model_executors()[source]
Add ModelExecutor class instances for all models available in the config.
- Return type
None
- add_metadata_from_file(model_id: str, metadata_file_path: str) dict [source]
Read and parse the metadata of a local model, identified by model_id, from a metadata file in json format.
- Parameters
model_id (str) – The generative model’s unique id
metadata_file_path (str) – the path pointing to the metadata file
- Returns
Returns a dict containing the contents of parsed metadata json file.
- Return type
dict
- add_metadata_from_input(model_id: str, model_weights_name: str, model_weights_extension: str, generate_method_name: str, dependencies: list, fill_more_fields_interactively: bool = True, output_path: str = 'config') dict [source]
Create a metadata dict for a local model, identified by model_id, given the necessary minimum metadata contents.
- Parameters
model_id (str) – The generative model’s unique id
model_weights_name (str) – the name of the checkpoint file containing the model’s weights
model_weights_extension (str) – the extension (e.g. .pt) of the checkpoint file containing the model’s weights
generate_method_name (str) – the name of the sample generation method inside the model’s __init__.py file
dependencies (list) – the list of dependencies that need to be installed via pip to run the model
fill_more_fields_interactively (bool) – flag indicating whether a user will be interactively asked via command line for further input to fill out missing metadata content
output_path (str) – the path where the created metadata json file will be stored
- Returns
Returns a dict containing the contents of the metadata json file.
- Return type
dict
- add_model_contributor(model_id: str, init_py_path: Optional[str] = None) medigan.contribute_model.model_contributor.ModelContributor [source]
Add a ModelContributor instance of this model_id to the self.model_contributors list.
- Parameters
model_id (str) – The generative model’s unique id
init_py_path (str) – The path to the local model’s __init__.py file needed for importing and running this model.
- Returns
ModelContributor class instance corresponding to the model_id
- Return type
ModelContributor
- add_model_executor(model_id: str, install_dependencies: bool = False)[source]
Add one ModelExecutor class instance corresponding to the specified model_id.
- Parameters
model_id (str) – The generative model’s unique id
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. If False, an error is raised if missing dependencies are detected.
- Return type
None
- add_model_to_config(model_id: str, metadata: dict, is_local_model: Optional[bool] = None, overwrite_existing_metadata: bool = False, store_new_config: bool = True) bool [source]
Adding or updating a model entry in the global metadata.
- Parameters
model_id (str) – The generative model’s unique id
metadata (dict) – The model’s corresponding metadata
is_local_model (bool) – flag indicating whether the tested model is a new local user model, i.e. not yet part of medigan’s official models
overwrite_existing_metadata (bool) – in case of is_local_model, flag indicating whether existing metadata for this model in medigan’s config/global.json should be overwritten.
store_new_config (bool) – flag indicating whether the current model metadata should be stored on disk i.e. in config/
- Returns
Flag indicating whether model metadata update was successfully concluded
- Return type
bool
- contribute(model_id: str, init_py_path: str, github_access_token: str, zenodo_access_token: str, metadata_file_path: Optional[str] = None, model_weights_name: Optional[str] = None, model_weights_extension: Optional[str] = None, generate_method_name: Optional[str] = None, dependencies: Optional[list] = None, fill_more_fields_interactively: bool = True, overwrite_existing_metadata: bool = False, output_path: str = 'config', creator_name: str = 'unknown name', creator_affiliation: str = 'unknown affiliation', model_description: str = '', install_dependencies: bool = False)[source]
Implements the full model contribution workflow including model metadata generation, model test, model Zenodo upload, and medigan github issue creation.
- Parameters
model_id (str) – The generative model’s unique id
init_py_path (str) – The path to the local model’s __init__.py file needed for importing and running this model.
github_access_token (str) – a personal access token linked to your github user account, used as means of authentication
zenodo_access_token (str) – a personal access token in Zenodo linked to a user account for authentication
metadata_file_path (str) – the path pointing to the metadata file
model_weights_name (str) – the name of the checkpoint file containing the model’s weights
model_weights_extension (str) – the extension (e.g. .pt) of the checkpoint file containing the model’s weights
generate_method_name (str) – the name of the sample generation method inside the model’s __init__.py file
dependencies (list) – the list of dependencies that need to be installed via pip to run the model
fill_more_fields_interactively (bool) – flag indicating whether a user will be interactively asked via command line for further input to fill out missing metadata content
overwrite_existing_metadata (bool) – flag indicating whether existing metadata for this model in medigan’s config/global.json should be overwritten.
output_path (str) – the path where the created metadata json file will be stored
creator_name (str) – the creator name that will appear on the corresponding github issue
creator_affiliation (str) – the creator affiliation that will appear on the corresponding github issue
model_description (str) – the model_description that will appear on the corresponding github issue
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. If False, an error is raised if missing dependencies are detected.
- Returns
Returns the url pointing to the corresponding issue on github
- Return type
str
- find_matching_models_by_values(values: list, target_values_operator: str = 'AND', are_keys_also_matched: bool = False, is_case_sensitive: bool = False) list [source]
Search for values (and keys) in model configs and return a list of each matching ModelMatchCandidate.
This function calls an identically named function in a ModelSelector instance.
- Parameters
values (list) – list of values used to search and find models corresponding to these values
target_values_operator (str) – the operator indicating the relationship between values in the evaluation of model search results. Should be either “AND”, “OR”, or “XOR”.
are_keys_also_matched (bool) – flag indicating whether, apart from values, the keys in the model config should also be searchable
is_case_sensitive (bool) – flag indicating whether the search for values (and keys) in the model config should be case-sensitive.
- Returns
a list of ModelMatchCandidate class instances each of which was successfully matched against the search values.
- Return type
list
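The AND/OR/XOR operators can be illustrated with a simplified matcher over model configs. This sketch is hypothetical: find_matching_models only matches values (there is no are_keys_also_matched), returns plain model ids instead of ModelMatchCandidate instances, and assumes that XOR means exactly one search value matches.

```python
from typing import Any, Dict, List


def flatten_values(node: Any) -> List[str]:
    """Collect every leaf value of a nested config as a string."""
    if isinstance(node, dict):
        return [leaf for value in node.values() for leaf in flatten_values(value)]
    if isinstance(node, list):
        return [leaf for value in node for leaf in flatten_values(value)]
    return [str(node)]


def find_matching_models(configs: Dict[str, dict], values: List[str],
                         target_values_operator: str = "AND",
                         is_case_sensitive: bool = False) -> List[str]:
    """Return the ids of models whose config values satisfy the operator (hypothetical)."""
    def normalize(text: str) -> str:
        return text if is_case_sensitive else text.lower()

    matched_ids = []
    for model_id, config in configs.items():
        leaves = {normalize(leaf) for leaf in flatten_values(config)}
        hits = sum(normalize(value) in leaves for value in values)  # number of matched search values
        if target_values_operator == "AND":
            is_match = hits == len(values)
        elif target_values_operator == "OR":
            is_match = hits >= 1
        elif target_values_operator == "XOR":
            is_match = hits == 1  # assumption: exactly one search value matches
        else:
            raise ValueError(f"Unsupported operator: {target_values_operator}")
        if is_match:
            matched_ids.append(model_id)
    return matched_ids


toy_configs = {  # hypothetical model ids and tags
    "00001_MODEL_A": {"selection": {"tags": ["Mammography", "Calcification"]}},
    "00002_MODEL_B": {"selection": {"tags": ["Mammography", "Mass"]}},
}
print(find_matching_models(toy_configs, ["mammography", "mass"], "AND"))  # ['00002_MODEL_B']
print(find_matching_models(toy_configs, ["mammography", "mass"], "OR"))   # ['00001_MODEL_A', '00002_MODEL_B']
print(find_matching_models(toy_configs, ["mammography", "mass"], "XOR"))  # ['00001_MODEL_A']
```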
- find_model_and_generate(values: list, target_values_operator: str = 'AND', are_keys_also_matched: bool = False, is_case_sensitive: bool = False, num_samples: int = 30, output_path: Optional[str] = None, is_gen_function_returned: bool = False, install_dependencies: bool = False, **kwargs)[source]
Search for values (and keys) in model configs to generate samples with the found model.
Note that exactly one model should be found. Otherwise, no samples are generated and an error is logged to the console.
- Parameters
values (list) – list of values used to search and find models corresponding to these values
target_values_operator (str) – the operator indicating the relationship between values in the evaluation of model search results. Should be either “AND”, “OR”, or “XOR”.
are_keys_also_matched (bool) – flag indicating whether, apart from values, the keys in the model config should also be searchable
is_case_sensitive (bool) – flag indicating whether the search for values (and keys) in the model config should be case-sensitive.
num_samples (int) – the number of samples that will be generated
output_path (str) – the path as str to the output folder where the generated samples will be stored
is_gen_function_returned (bool) – flag indicating whether, instead of generating samples, the sample generation function will be returned
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. If False, an error is raised if missing dependencies are detected.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function
- Returns
None. However, if is_gen_function_returned is True, it returns the internal generate function of the model.
- Return type
None
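The exactly-one-match rule of find_model_and_generate can be sketched as a small guard. generate_if_unique and the toy generate callable are hypothetical stand-ins, not medigan functions.

```python
import logging
from typing import Callable, List, Optional


def generate_if_unique(matched_ids: List[str], generate: Callable[[str], list]) -> Optional[list]:
    """Run generate() only when the search matched exactly one model (hypothetical helper)."""
    if len(matched_ids) != 1:
        # Zero or several matches: log an error and generate nothing.
        logging.error("Expected exactly one matching model, found %d: %s", len(matched_ids), matched_ids)
        return None
    return generate(matched_ids[0])


print(generate_if_unique(["00001_MODEL_A"], lambda model_id: [f"sample from {model_id}"]))
```

Requiring a unique match avoids silently picking one of several candidates; when ranking is the desired tie-breaker, find_models_rank_and_generate is the documented alternative.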
- find_model_executor_by_id(model_id: str) medigan.execute_model.model_executor.ModelExecutor [source]
Find and return the ModelExecutor instance of this model_id in the self.model_executors list.
- Parameters
model_id (str) – The generative model’s unique id
- Returns
ModelExecutor class instance corresponding to the model_id
- Return type
ModelExecutor
- find_models_and_rank(values: list, target_values_operator: str = 'AND', are_keys_also_matched: bool = False, is_case_sensitive: bool = False, metric: str = 'SSIM', order: str = 'asc') list [source]
Search for values (and keys) in model configs, rank results and return sorted list of model dicts.
This function calls an identically named function in a ModelSelector instance.
- Parameters
values (list) – list of values used to search and find models corresponding to these values
target_values_operator (str) – the operator indicating the relationship between values in the evaluation of model search results. Should be either “AND”, “OR”, or “XOR”.
are_keys_also_matched (bool) – flag indicating whether, apart from values, the keys in the model config should also be searchable
is_case_sensitive (bool) – flag indicating whether the search for values (and keys) in the model config should be case-sensitive.
metric (str) – The key in the selection dict that corresponds to the metric of interest
order (str) – the sorting order of the ranked results. Should be either “asc” (ascending) or “desc” (descending)
- Returns
a list of the searched and matched model dictionaries containing metric and model_id, sorted by metric.
- Return type
list
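Ranking by a metric amounts to sorting the matched models on one selection key. The sketch below is hypothetical: rank_models, the model ids, and the scores are illustrative, and leaving out models that lack the metric is an assumption.

```python
from typing import Dict, List


def rank_models(candidates: List[Dict], metric: str = "SSIM", order: str = "asc") -> List[Dict]:
    """Sort matched models by a performance metric (hypothetical helper)."""
    if order not in ("asc", "desc"):
        raise ValueError("order should be either 'asc' or 'desc'")
    # Assumption: models that do not report the metric are left out of the ranking.
    ranked = [c for c in candidates if metric in c]
    return sorted(ranked, key=lambda c: c[metric], reverse=(order == "desc"))


candidates = [  # hypothetical ids and scores
    {"model_id": "00001_MODEL_A", "SSIM": 0.71},
    {"model_id": "00002_MODEL_B", "SSIM": 0.85},
    {"model_id": "00003_MODEL_C", "FID": 20.1},
]
best_first = rank_models(candidates, metric="SSIM", order="desc")
print([c["model_id"] for c in best_first])  # ['00002_MODEL_B', '00001_MODEL_A']
```

Since a higher SSIM indicates greater similarity, "desc" places the best model first for this metric; for a lower-is-better metric such as FID, "asc" would do so.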
- find_models_rank_and_generate(values: list, target_values_operator: str = 'AND', are_keys_also_matched: bool = False, is_case_sensitive: bool = False, metric: str = 'SSIM', order: str = 'asc', num_samples: int = 30, output_path: Optional[str] = None, is_gen_function_returned: bool = False, install_dependencies: bool = False, **kwargs)[source]
Search for values (and keys) in model configs, rank the results, and generate samples with the highest-ranked model.
- Parameters
values (list) – list of values used to search and find models corresponding to these values
target_values_operator (str) – the operator indicating the relationship between values in the evaluation of model search results. Should be either “AND”, “OR”, or “XOR”.
are_keys_also_matched (bool) – flag indicating whether, apart from values, the keys in the model config should also be searchable
is_case_sensitive (bool) – flag indicating whether the search for values (and keys) in the model config should be case-sensitive.
metric (str) – The key in the selection dict that corresponds to the metric of interest
order (str) – the sorting order of the ranked results. Should be either “asc” (ascending) or “desc” (descending)
num_samples (int) – the number of samples that will be generated
output_path (str) – the path as str to the output folder where the generated samples will be stored
is_gen_function_returned (bool) – flag indicating whether, instead of generating samples, the sample generation function will be returned
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. If False, an error is raised if missing dependencies are detected.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function
- Returns
None. However, if is_gen_function_returned is True, it returns the internal generate function of the model.
- Return type
None
- generate(model_id: str, num_samples: int = 30, output_path: Optional[str] = None, save_images: bool = True, is_gen_function_returned: bool = False, install_dependencies: bool = False, **kwargs)[source]
Generate samples with the model corresponding to the model_id or return the model’s generate function.
- Parameters
model_id (str) – The generative model’s unique id
num_samples (int) – the number of samples that will be generated
output_path (str) – the path as str to the output folder where the generated samples will be stored
save_images (bool) – flag indicating whether generated samples are stored in the file system (i.e. in output_path) if True, or returned as a list of numpy arrays if False
is_gen_function_returned (bool) – flag indicating whether, instead of generating samples, the sample generation function will be returned
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. If False, an error is raised if missing dependencies are detected.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function
- Returns
Returns images as a list of numpy arrays if save_images is False. However, if is_gen_function_returned is True, it returns the internal generate function of the model.
- Return type
list
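The interplay of save_images and is_gen_function_returned can be sketched with functools.partial. toy_model_generate and this generate wrapper are hypothetical stand-ins (plain nested lists replace numpy arrays); the sketch only mirrors the documented precedence: a returned function wins over generation, and samples are returned only when save_images is False.

```python
from functools import partial
from typing import Callable, List, Optional, Union

Image = List[List[int]]


def toy_model_generate(model_file: str, num_samples: int, output_path: Optional[str],
                       save_images: bool) -> Optional[List[Image]]:
    """Hypothetical stand-in for a model's generate(); produces blank 4x4 'images'."""
    samples = [[[0] * 4 for _ in range(4)] for _ in range(num_samples)]
    if save_images:
        # A real model would write the samples to output_path here.
        return None
    return samples


def generate(num_samples: int = 30, output_path: Optional[str] = None, save_images: bool = True,
             is_gen_function_returned: bool = False) -> Union[None, List[Image], Callable]:
    # Bind all arguments; callers of a returned function may still override them by keyword.
    gen = partial(toy_model_generate, model_file="weights.pt", num_samples=num_samples,
                  output_path=output_path, save_images=save_images)
    if is_gen_function_returned:
        return gen  # returning the function takes precedence over generating samples
    return gen()


print(len(generate(num_samples=3, save_images=False)))  # 3
```

Returning a pre-bound function lets a caller (for example a data loader) draw fresh samples later without re-initializing the model.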
- get_as_torch_dataloader(dataset=None, model_id: Optional[str] = None, num_samples: int = 1000, install_dependencies: bool = False, transform=None, batch_size=None, shuffle=None, sampler=None, batch_sampler=None, num_workers=None, collate_fn=None, pin_memory=None, drop_last=None, timeout=None, worker_init_fn=None, prefetch_factor: Optional[int] = None, persistent_workers: Optional[bool] = None, pin_memory_device: Optional[str] = None, **kwargs) torch.utils.data.dataloader.DataLoader [source]
Get a torch DataLoader sampling synthetic data from a medigan model.
The DataLoader combines a dataset and a sampler, and provides an iterable over the given torch dataset. It is created for synthetic data of the specified medigan model. PyTorch-native parameters are set to None by default. Only those parameters that are not None are passed to the DataLoader() initialization function.
- Parameters
dataset (Dataset) – dataset from which to load the data.
model_id (str) – The generative model’s unique id
num_samples (int) – the number of samples that will be generated
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. If False, an error is raised if missing dependencies are detected.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function (e.g. the input path for image-to-image translation models in medigan).
transform – the torch data transformation functions to be applied to the data in the dataset.
batch_size (int, optional) – how many samples per batch to load (default: None).
shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: None).
sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified. (default: None)
batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last. (default: None)
num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: None)
collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset. (default: None)
pin_memory (bool, optional) – If True, the data loader will copy Tensors into CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the memory pinning section of the PyTorch DataLoader documentation. (default: None)
drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of the dataset is not divisible by the batch size, then the last batch will be smaller. (default: None)
timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: None)
worker_init_fn (callable, optional) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
prefetch_factor (int, optional, keyword-only arg) – Number of batches loaded in advance by each worker. 2 means there will be a total of 2 * num_workers batches prefetched across all workers. (default: None)
persistent_workers (bool, optional) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers’ Dataset instances alive. (default: None)
pin_memory_device (str, optional) – the device to pin memory to if pin_memory is True. (default: None)
- Returns
a torch.utils.data.DataLoader object with data from the provided Dataset or generated by the model corresponding to model_id.
- Return type
DataLoader
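The rule that only non-None PyTorch parameters reach DataLoader() can be sketched as a keyword filter. non_none_kwargs is a hypothetical helper and the sketch does not import torch; note that num_workers=0 is kept, because only None counts as "not set".

```python
from typing import Any, Dict


def non_none_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Keep only keyword arguments that were explicitly set, i.e. are not None (hypothetical)."""
    return {key: value for key, value in kwargs.items() if value is not None}


loader_kwargs = non_none_kwargs(batch_size=16, shuffle=True, sampler=None, num_workers=0,
                                collate_fn=None, pin_memory=None, drop_last=None)
print(loader_kwargs)  # {'batch_size': 16, 'shuffle': True, 'num_workers': 0}
# With PyTorch installed, these would be forwarded as DataLoader(dataset, **loader_kwargs),
# so PyTorch's own defaults apply to every parameter that was left out.
```

Filtering instead of forwarding None also avoids overriding PyTorch defaults that are not None, such as shuffle=False or num_workers=0.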
- get_as_torch_dataset(model_id: str, num_samples: int = 100, install_dependencies: bool = False, transform=None, **kwargs) torch.utils.data.dataset.Dataset [source]
Get synthetic data in a torch Dataset for specified medigan model.
The dataset returns a dict with the keys sample (== image), labels (== condition), and mask (== segmentation mask). While the key ‘sample’ is mandatory, the other key-value pairs are only returned if applicable to the generative model.
- Parameters
model_id (str) – The generative model’s unique id
num_samples (int) – the number of samples that will be generated
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.
transform – the torch data transformation functions to be applied to the data in the dataset.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function (e.g. the input path for image-to-image translation models in medigan).
- Returns
a torch.utils.data.Dataset object with data generated by model corresponding to model_id.
- Return type
Dataset
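The dict layout described above can be illustrated with a minimal, framework-free sketch. The class SyntheticDictDataset and its constructor arguments are hypothetical and not part of medigan; the class only mimics the map-style Dataset protocol (__len__ and __getitem__) so that it runs without torch. It adds the optional labels and mask keys only when such data exists. Applying transform to the sample alone is an assumption of this sketch.

```python
from typing import Any, Callable, Optional, Sequence


class SyntheticDictDataset:
    """Hypothetical map-style dataset mimicking the dict layout described above."""

    def __init__(
        self,
        samples: Sequence[Any],
        labels: Optional[Sequence[Any]] = None,
        masks: Optional[Sequence[Any]] = None,
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.samples = samples
        self.labels = labels
        self.masks = masks
        self.transform = transform

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> dict:
        sample = self.samples[idx]
        if self.transform is not None:
            # Assumption of this sketch: the transform is applied to the sample only.
            sample = self.transform(sample)
        item = {"sample": sample}  # 'sample' is always present
        # Optional keys are only added if the model returned such data.
        if self.labels is not None:
            item["labels"] = self.labels[idx]
        if self.masks is not None:
            item["mask"] = self.masks[idx]
        return item


dataset = SyntheticDictDataset(samples=[[0, 1], [2, 3]], labels=["benign", "malignant"])
print(dataset[1])  # {'sample': [2, 3], 'labels': 'malignant'}
```

Because absent keys are simply omitted, downstream code may want to use item.get("mask") rather than item["mask"].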
- get_config_by_id(model_id: str, config_key: Optional[str] = None) dict [source]
Get and return the part of the config below a config_key for a specific model_id.
The config_key can contain ‘.’ (dot) separators to allow retrieval of nested config keys, e.g., ‘execution.generator.name’.
This function calls an identically named function in a ConfigManager instance.
- Parameters
model_id (str) – The generative model’s unique id
config_key (str) – A key of interest present in the config dict
- Returns
a dictionary from the part of the config file corresponding to model_id and config_key.
- Return type
dict
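The dot-separated lookup described above can be sketched as a walk over nested dicts. This is a hypothetical stand-in operating on a plain dict, not medigan’s ConfigManager; the model id "00001_EXAMPLE_MODEL" and the config contents are made up for illustration. In this sketch a missing key raises KeyError, and a dotted key that ends at a leaf returns that leaf value rather than a dict.

```python
from functools import reduce
from typing import Any, Optional


def get_config_by_id(config: dict, model_id: str, config_key: Optional[str] = None) -> Any:
    """Hypothetical stand-in: return a model's config, optionally narrowed by a dotted key."""
    model_config = config[model_id]
    if config_key is None:
        return model_config
    # 'execution.generator.name' -> ['execution', 'generator', 'name'], then index step by step.
    return reduce(lambda node, key: node[key], config_key.split("."), model_config)


config = {
    "00001_EXAMPLE_MODEL": {
        "execution": {"generator": {"name": "generate"}},
        "selection": {"performance": {"SSIM": 0.5}},
    }
}
print(get_config_by_id(config, "00001_EXAMPLE_MODEL", "execution.generator.name"))  # generate
```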
- get_generate_function(model_id: str, num_samples: int = 30, output_path: Optional[str] = None, install_dependencies: bool = False, **kwargs)[source]
Return the model’s generate function.
Relies on the self.generate function.
- Parameters
model_id (str) – The generative model’s unique id
num_samples (int) – the number of samples that will be generated
output_path (str) – the path as str to the output folder where the generated samples will be stored
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.
**kwargs – arbitrary number of keyword arguments passed to the model’s sample generation function
- Returns
The internal reusable generate function of the generative model.
- Return type
function
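One plausible way to produce such a reusable function is to pre-bind the arguments with functools.partial. This is only a sketch under that assumption: generate below is a hypothetical stand-in that returns fake sample names, and medigan’s actual mechanism may differ.

```python
from functools import partial
from typing import Callable, List, Optional


def generate(model_id: str, num_samples: int = 30, output_path: Optional[str] = None,
             install_dependencies: bool = False, **kwargs) -> List[str]:
    """Hypothetical stand-in for a model's generate function; returns fake sample names."""
    return [f"{model_id}_sample_{i}" for i in range(num_samples)]


def get_generate_function(model_id: str, num_samples: int = 30, output_path: Optional[str] = None,
                          install_dependencies: bool = False, **kwargs) -> Callable[..., List[str]]:
    # Bind the arguments once so the returned callable can be invoked repeatedly.
    return partial(generate, model_id, num_samples=num_samples, output_path=output_path,
                   install_dependencies=install_dependencies, **kwargs)


gen = get_generate_function("00001_EXAMPLE_MODEL", num_samples=2)
print(gen())  # ['00001_EXAMPLE_MODEL_sample_0', '00001_EXAMPLE_MODEL_sample_1']
print(len(gen(num_samples=5)))  # 5
```

With partial, keyword arguments given at call time override the pre-bound ones, so a caller can still change num_samples per call.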
- get_model_contributor_by_id(model_id: str) medigan.contribute_model.model_contributor.ModelContributor [source]
Find and return the ModelContributor instance of this model_id in the self.model_contributors list.
- Parameters
model_id (str) – The generative model’s unique id
- Returns
ModelContributor class instance corresponding to the model_id
- Return type
ModelContributor
- get_model_executor(model_id: str, install_dependencies: bool = False) medigan.execute_model.model_executor.ModelExecutor [source]
Add and return the ModelExecutor instance of this model_id from the self.model_executors list.
Relies on self.add_model_executor and self.find_model_executor_by_id functions.
- Parameters
model_id (str) – The generative model’s unique id
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.
- Returns
ModelExecutor class instance corresponding to the model_id
- Return type
ModelExecutor
- get_models_by_key_value_pair(key1: str, value1: str, is_case_sensitive: bool = False) list [source]
Get and return a list of model_id dicts that contain the specified key value pair in their selection config.
The key param can contain ‘.’ (dot) separations to allow for retrieval of nested config keys such as ‘execution.generator.name’
This function calls an identically named function in a ModelSelector instance.
- Parameters
key1 (str) – The key in the selection dict
value1 (str) – The value in the selection dict that corresponds to key1
is_case_sensitive (bool) – flag to evaluate keys and values with case sensitivity if set to True
- Returns
a list of dictionaries, each containing a model’s id and the found key-value pair in the model’s config
- Return type
list
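The key-value search described above can be sketched over a plain dict that maps model ids to selection configs. Everything here is hypothetical: the model ids, the selection contents, the local deep_get helper (a stand-in, not medigan’s Utils.deep_get), and the shape of the returned dicts. In this sketch is_case_sensitive only affects value comparison, and a list-valued entry matches if any element matches.

```python
from typing import Any, Dict, List


def deep_get(base_dict: dict, key: str) -> Any:
    """Follow a '.'-separated key through nested dicts; return None if a part is missing."""
    node: Any = base_dict
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def get_models_by_key_value_pair(selection: Dict[str, dict], key1: str, value1: str,
                                 is_case_sensitive: bool = False) -> List[dict]:
    """Hypothetical stand-in: find models whose selection config holds key1 == value1."""
    matches = []
    target = value1 if is_case_sensitive else value1.lower()
    for model_id, criteria in selection.items():
        found = deep_get(criteria, key1)
        if found is None:
            continue
        # A list value matches if any element matches; a scalar must match directly.
        candidates = found if isinstance(found, list) else [found]
        for candidate in candidates:
            text = str(candidate) if is_case_sensitive else str(candidate).lower()
            if text == target:
                matches.append({"model_id": model_id, key1: found})
                break
    return matches


selection = {
    "00001_EXAMPLE_A": {"modality": "MRI", "tags": ["Breast", "GAN"]},
    "00002_EXAMPLE_B": {"modality": "CT", "tags": ["Lung"]},
}
print(get_models_by_key_value_pair(selection, "modality", "mri"))
# [{'model_id': '00001_EXAMPLE_A', 'modality': 'MRI'}]
```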
- get_selection_criteria_by_id(model_id: str, is_model_id_removed: bool = True) dict [source]
Get and return the selection config dict for a specific model_id.
This function calls an identically named function in a ModelSelector instance.
- Parameters
model_id (str) – The generative model’s unique id
is_model_id_removed (bool) – flag to remove the model_ids from the first level of the dictionary.
- Returns
a dictionary corresponding to the selection config of a model
- Return type
dict
- get_selection_criteria_by_ids(model_ids: Optional[list] = None, are_model_ids_removed: bool = True) list [source]
Get and return a list of selection config dicts for each of the specified model_ids.
This function calls an identically named function in a ModelSelector instance.
- Parameters
model_ids (list) – A list of generative models’ unique ids or ids abbreviated as integers (e.g. 1, 2, .. 21)
are_model_ids_removed (bool) – flag to remove the model_ids from first level of dictionary.
- Returns
a list of dictionaries each corresponding to the selection config of a model
- Return type
list
- get_selection_keys(model_id: Optional[str] = None) list [source]
Get and return all first level keys from the selection config dict for a specific model_id.
This function calls an identically named function in a ModelSelector instance.
- Parameters
model_id (str) – The generative model’s unique id
- Returns
a list containing the keys as strings of the selection config of the model_id.
- Return type
list
- get_selection_values_for_key(key: str, model_id: Optional[str] = None) list [source]
Get and return the value of a specified key of the selection dict in the config for a specific model_id.
The key param can contain ‘.’ (dot) separations to allow for retrieval of nested config keys such as ‘execution.generator.name’
This function calls an identically named function in a ModelSelector instance.
- Parameters
key (str) – The key in the selection dict
model_id (str) – The generative model’s unique id
- Returns
a list of the values that correspond to the key in the selection config of the model_id.
- Return type
list
- is_model_executor_already_added(model_id) bool [source]
Check whether the ModelExecutor instance of this model_id is already in self.model_executors list.
- Parameters
model_id (str) – The generative model’s unique id
- Returns
Flag indicating whether this ModelExecutor has already been added to self.model_executors
- Return type
bool
- is_model_metadata_valid(model_id: str, metadata: dict, is_local_model: bool = True) bool [source]
Checking if a model’s corresponding metadata is valid.
Specific fields in the model’s metadata are mandatory. The method asserts that these key-value pairs are present.
- Parameters
model_id (str) – The generative model’s unique id
metadata (dict) – The model’s corresponding metadata
is_local_model (bool) – flag indicating whether the tested model is a new local user model, i.e. not yet part of medigan’s official models
- Returns
Flag indicating whether the specific model’s metadata format and fields are valid
- Return type
bool
- list_models() list [source]
Return the list of model_ids as strings based on config.
- Return type
list
- push_to_github(model_id: str, github_access_token: str, package_link: Optional[str] = None, creator_name: str = '', creator_affiliation: str = '', model_description: str = '')[source]
Upload the model’s metadata inside a GitHub issue to the medigan GitHub repository.
To add your model to medigan, your metadata will be reviewed on GitHub and added to medigan’s official model metadata.
The medigan repository issues page: https://github.com/RichardObi/medigan/issues
Get your GitHub access token here: https://github.com/settings/tokens
- Parameters
model_id (str) – The generative model’s unique id
github_access_token (str) – a personal access token linked to your GitHub user account, used as a means of authentication
package_link (str) – an optional link to the model package
creator_name (str) – the creator name that will appear on the corresponding GitHub issue
creator_affiliation (str) – the creator affiliation that will appear on the corresponding GitHub issue
model_description (str) – the model description that will appear on the corresponding GitHub issue
- Returns
The URL pointing to the corresponding issue on GitHub
- Return type
str
- push_to_zenodo(model_id: str, zenodo_access_token: str, creator_name: str = 'unknown name', creator_affiliation: str = 'unknown affiliation', model_description: str = '') str [source]
Upload the model files as zip archive to a public Zenodo repository where the model will be persistently stored.
Get your Zenodo access token here: https://zenodo.org/account/settings/applications/tokens/new/ (Enable scopes deposit:actions and deposit:write)
- Parameters
model_id (str) – The generative model’s unique id
zenodo_access_token (str) – a personal access token in Zenodo linked to a user account for authentication
creator_name (str) – the creator name that will appear on the corresponding Zenodo model upload homepage
creator_affiliation (str) – the creator affiliation that will appear on the corresponding Zenodo model upload homepage
model_description (str) – the model description that will appear on the corresponding Zenodo model upload homepage
- Returns
The URL pointing to the corresponding Zenodo model upload homepage
- Return type
str
- rank_models_by_performance(model_ids: Optional[list] = None, metric: str = 'SSIM', order: str = 'asc') list [source]
Rank models based on a provided metric and return a sorted list of model dicts.
The metric param can contain ‘.’ (dot) separations to allow for retrieval of nested metric config keys such as ‘downstream_task.CLF.accuracy’
This function calls an identically named function in a ModelSelector instance.
- Parameters
model_ids (list) – only evaluate the model_ids in this list. If None, evaluate all available model_ids
metric (str) – The key in the selection dict that corresponds to the metric of interest
order (str) – the sorting order of the ranked results. Should be either “asc” (ascending) or “desc” (descending)
- Returns
a list of model dictionaries containing metric and model_id, sorted by metric.
- Return type
list
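The ranking described above can be sketched as a nested lookup followed by a sort. This is a hypothetical stand-in, not medigan’s ModelSelector: it assumes a plain dict mapping each model id to its metrics, uses made-up model ids and metric values, and skips models that do not report the requested metric.

```python
from typing import Any, List, Optional


def lookup(node: Any, dotted_key: str) -> Any:
    """Resolve a '.'-separated key in nested dicts; None if any part is missing."""
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def rank_models_by_performance(performance: dict, model_ids: Optional[list] = None,
                               metric: str = "SSIM", order: str = "asc") -> List[dict]:
    """Hypothetical stand-in: rank models by one (possibly nested) metric."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    ids = list(performance) if model_ids is None else model_ids
    ranked = []
    for model_id in ids:
        value = lookup(performance[model_id], metric)
        if value is not None:  # models that do not report the metric are skipped
            ranked.append({"model_id": model_id, metric: value})
    ranked.sort(key=lambda entry: entry[metric], reverse=(order == "desc"))
    return ranked


performance = {
    "00001_EXAMPLE_A": {"SSIM": 0.61, "downstream_task": {"CLF": {"accuracy": 0.83}}},
    "00002_EXAMPLE_B": {"SSIM": 0.74},
    "00003_EXAMPLE_C": {"SSIM": 0.55, "downstream_task": {"CLF": {"accuracy": 0.79}}},
}
print([m["model_id"] for m in rank_models_by_performance(performance, order="desc")])
# ['00002_EXAMPLE_B', '00001_EXAMPLE_A', '00003_EXAMPLE_C']
```

Sorting with list.sort and a key function is also the replacement the docs suggest for the deprecated Utils.order_dict_by_value.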
- test_model(model_id: str, is_local_model: bool = True, overwrite_existing_metadata: bool = False, store_new_config: bool = True, num_samples: int = 3, install_dependencies: bool = False)[source]
Test whether a model generates and returns a specific number of samples in the correct format.
- Parameters
model_id (str) – The generative model’s unique id
is_local_model (bool) – flag indicating whether the tested model is a new local user model, i.e. not yet part of medigan’s official models
overwrite_existing_metadata (bool) – in case of is_local_model, flag indicating whether existing metadata for this model in medigan’s config/global.json should be overwritten.
store_new_config (bool) – flag indicating whether the current model metadata should be stored on disk i.e. in config/
num_samples (int) – the number of samples that will be generated
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.
- visualize(model_id: str, slider_grouper: int = 10, auto_close: bool = False, install_dependencies: bool = False) None [source]
Initialize and run the ModelVisualizer of this model_id if it is available. It visualizes a sample from the model’s output: a UI window pops up that lets the user control the generation parameters (conditional and unconditional ones).
- Parameters
model_id (str) – The generative model’s unique id to visualize.
slider_grouper (int) – Number of input parameters to group together within one slider.
auto_close (bool) – Flag for automatically closing the user interface after some time. Used while testing.
install_dependencies (bool) – flag indicating whether a generative model’s dependencies are automatically installed. Otherwise, an error is raised if missing dependencies are detected.
medigan.model_visualizer module
ModelVisualizer class for visualizing how changes to a model’s input affect its output.
- class medigan.model_visualizer.ModelVisualizer(model_executor, config: None)[source]
Bases:
object
ModelVisualizer class: Visualises synthetic data through a user interface. Depending on a model, it is possible to control the input latent vector values and conditional input.
- Parameters
model_executor (ModelExecutor) – The generative model’s executor object
config (dict) – The config dict containing the model metadata
- model_executor
The generative model’s executor object
- Type
ModelExecutor
- input_latent_vector_size
Size of the latent vector used as an input for generation
- Type
int
- conditional
Flag for models with conditional input
- Type
bool
- condition
Value of the conditional input to the model
- Type
Union[int, float]
- max_input_value
Absolute value used to set the input range of the latent values
- Type
float
- visualize(slider_grouper: int = 10, auto_close=False)[source]
Visualize the model’s output. This method is called by the user. It opens up a user interface with available controls.
- Parameters
slider_grouper (int) – Number of input parameters to group together within one slider.
auto_close (bool) – Flag for automatically closing the user interface after some time. Used while testing.
- Return type
None
medigan.utils module
Utils class providing generalized reusable functions for I/O, parsing, sorting, type conversions, etc.
- class medigan.utils.Utils[source]
Bases:
object
Utils class containing reusable static methods.
- static call_without_removable_params(my_callable, removable_param_values: list = [None], **params)[source]
Call a callable without passing the parameters whose value is one of removable_param_values.
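A minimal sketch of this filtering, assuming removal is decided per keyword argument. It is a hypothetical re-implementation, not medigan’s code: it uses a tuple default instead of the mutable list default in the documented signature, compares by identity (well suited to sentinels such as None), and make_loader is a made-up callee. One plausible use, not confirmed here, is dropping arguments documented above as "(default: None)" so that the callee’s own defaults apply.

```python
from typing import Any, Callable, Sequence


def call_without_removable_params(my_callable: Callable[..., Any],
                                  removable_param_values: Sequence[Any] = (None,),
                                  **params: Any) -> Any:
    """Hypothetical stand-in: drop kwargs whose value is a removable sentinel, then call."""
    kept = {
        name: value
        for name, value in params.items()
        # Identity comparison avoids surprises such as 0 == False or array equality.
        if not any(value is removable for removable in removable_param_values)
    }
    return my_callable(**kept)


def make_loader(batch_size: int = 1, shuffle: bool = False) -> dict:
    """Toy callee whose own defaults apply when an argument is dropped."""
    return {"batch_size": batch_size, "shuffle": shuffle}


print(call_without_removable_params(make_loader, batch_size=8, shuffle=None))
# {'batch_size': 8, 'shuffle': False}
```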
- static copy(source_path: pathlib.Path, target_path: str = './')[source]
Copy a folder or file from source_path to target_path.
- static deep_get(base_dict: dict, key: str)[source]
Split the key by “.” to get value in nested dictionary.
- static dict_to_lowercase(target_dict: dict, string_conversion: bool = True) dict [source]
Transform keys and values in the dict to lowercase, optionally with string conversion of the values.
Warning: Does not convert nested dicts in the target_dict, but rather removes them from the returned object.
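The warning above is easiest to see in a small sketch. This is a hypothetical re-implementation, not medigan’s code; it lowercases string keys, lowercases (and, with string_conversion, stringifies) values, and drops nested dicts instead of converting them.

```python
def dict_to_lowercase(target_dict: dict, string_conversion: bool = True) -> dict:
    """Hypothetical stand-in: lowercase keys and values, dropping nested dicts."""
    result = {}
    for key, value in target_dict.items():
        if isinstance(value, dict):
            continue  # mirrors the warning: nested dicts are removed, not converted
        if string_conversion:
            value = str(value).lower()
        elif isinstance(value, str):
            value = value.lower()
        new_key = key.lower() if isinstance(key, str) else key
        result[new_key] = value
    return result


print(dict_to_lowercase({"Modality": "MRI", "Nested": {"A": 1}, "Size": 256}))
# {'modality': 'mri', 'size': '256'}
```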
- static download_file(download_link: str, path_as_string: str, file_extension: str = '.json')[source]
Download a file using the requests library and store it in path_as_string.
- static has_more_than_n_diff_pixel_values(img: numpy.ndarray, n: int = 4) bool [source]
This function checks whether an image contains more than n different pixel values.
This helps to differentiate between segmentation masks and actual images.
- static is_file_in(folder_path: str, filename: str) bool [source]
Checks if a file is inside a folder
- static is_file_located_or_downloaded(path_as_string: str, download_if_not_found: bool = True, download_link: Optional[str] = None, is_new_download_forced: bool = False, allow_local_path_as_url: bool = True) bool [source]
Check whether the file in path_as_string exists and optionally download it (again).
- static is_url_valid(the_url: str) bool [source]
Checks if a url is valid using urllib.parse.urlparse
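A minimal sketch of a urlparse-based check, assuming that a valid URL must have both a scheme and a network location. The exact criteria medigan applies may differ; this only shows how urllib.parse.urlparse can be used for such a test.

```python
from urllib.parse import urlparse


def is_url_valid(the_url: str) -> bool:
    """Hypothetical stand-in: accept URLs that have both a scheme and a host."""
    try:
        parsed = urlparse(the_url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs, e.g. an unclosed IPv6 bracket.
        return False
    # Require both a scheme (e.g. 'https') and a network location (e.g. 'github.com').
    return bool(parsed.scheme) and bool(parsed.netloc)


print(is_url_valid("https://github.com/RichardObi/medigan"))  # True
print(is_url_valid("config/global.json"))  # False
```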
- static list_to_lowercase(target_list: list) list [source]
String conversion and lower-casing of the values in a list.
Trade-off: string conversion favors robustness over type-failure detection.
- static mkdirs(path_as_string: str) bool [source]
Create the folder in path_as_string if it does not already exist.
- static order_dict_by_value(dict_list, key: str, order: str = 'asc', sort_algorithm='bubbleSort') list [source]
Sort a list of dicts by the values of a specific key in the dicts using a sorting algorithm.
This function is deprecated. You may use Python List sort() with key=lambda function instead.
- static split_images_and_masks_no_ordering(data: list, num_samples: int, max_nested_arrays: int = 2) [<class 'numpy.ndarray'>, <class 'numpy.ndarray'>] [source]
Extracts and separates the masks from the images if a model returns both in the same np.ndarray.
This extendable function assumes that, in data, a mask follows the image that it corresponds to or vice versa.
This function is deprecated. Please use split_images_masks_and_labels instead.
- static split_images_masks_and_labels(data: list, num_samples: int, max_nested_arrays: int = 2) [<class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>] [source]
Separates the data (sample, mask, other_imaging_data, label) returned by a generative model.
This function expects a list of tuples as input data and assumes that each tuple contains sample, mask, other_imaging_data, and label at index positions [0], [1], [2], and [3], respectively.
Samples, masks, and imaging data are expected to be of type np.ndarray and labels of type “str”.
For example, this extendable function assumes that, in data, a mask follows the image that it corresponds to or vice versa.
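The tuple layout described above can be unpacked with a short sketch. This is a hypothetical re-implementation that omits the num_samples and max_nested_arrays parameters; strings stand in for np.ndarray objects so it runs without numpy, and shorter tuples are padded with None, which is an assumption of this sketch.

```python
from typing import Any, List, Sequence, Tuple


def split_images_masks_and_labels(
    data: Sequence[Sequence[Any]],
) -> Tuple[List[Any], List[Any], List[Any], List[Any]]:
    """Hypothetical stand-in: unzip (sample, mask, other_imaging_data, label) tuples."""
    samples, masks, other_imaging_data, labels = [], [], [], []
    for entry in data:
        # Pad short tuples with None so the four output lists stay index-aligned.
        sample, mask, other, label = (list(entry) + [None] * 4)[:4]
        samples.append(sample)
        masks.append(mask)
        other_imaging_data.append(other)
        labels.append(label)
    return samples, masks, other_imaging_data, labels


data = [("img0", "mask0", None, "benign"), ("img1", "mask1")]
samples, masks, others, labels = split_images_masks_and_labels(data)
print(labels)  # ['benign', None]
```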
- static store_dict_as(dictionary, extension: str = '.json', output_path: str = 'config/', filename: str = 'metadata.json')[source]
Store a Python dictionary in the file system using the file type given by extension.
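A minimal sketch of storing a dict as JSON, assuming only the default ‘.json’ extension is needed. This is a hypothetical stand-in, not medigan’s implementation: it creates the output folder if necessary, writes filename as given, and returns the written path for convenience, which the documented method may not do.

```python
import json
import tempfile
from pathlib import Path


def store_dict_as(dictionary: dict, extension: str = ".json", output_path: str = "config/",
                  filename: str = "metadata.json") -> Path:
    """Hypothetical stand-in that only supports JSON output."""
    if extension != ".json":
        raise NotImplementedError(f"this sketch only handles '.json', got {extension!r}")
    target_dir = Path(output_path)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target_file = target_dir / filename
    with target_file.open("w", encoding="utf-8") as f:
        json.dump(dictionary, f, indent=4)
    return target_file


with tempfile.TemporaryDirectory() as tmp:
    path = store_dict_as({"model_id": "00001_EXAMPLE_MODEL"}, output_path=tmp)
    print(json.loads(path.read_text(encoding="utf-8")))  # {'model_id': '00001_EXAMPLE_MODEL'}
```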
Module contents
medigan is a modular Python library for automating synthetic dataset generation.