biothings.hub.autoupdate
biothings.hub.autoupdate.dumper
- class biothings.hub.autoupdate.dumper.BiothingsDumper(*args, **kwargs)
Bases:
HTTPDumper
This dumper keeps a BioThings API up to date. BioThings data is available either as an ElasticSearch snapshot (for full updates) or as a collection of diff files (for incremental updates). The dumper either downloads incremental updates and applies the diffs, or triggers an ElasticSearch restore if the latest version is a full update. It can also be configured with precedence rules: when both a full and an incremental update are available, rules can be set so that the full update is preferred over the incremental one (size can also be considered when selecting the preferred update path).
- AUTO_UPLOAD = False
- AWS_ACCESS_KEY_ID = None
- AWS_SECRET_ACCESS_KEY = None
- SRC_NAME = None
- SRC_ROOT_FOLDER = None
- TARGET_BACKEND = None
- VERSION_URL = None
- property base_url
- choose_best_version(versions)
Out of all compatible versions, choose the best:
  1. choose incremental vs. full according to preferences
  2. version must be the highest (most up-to-date)
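The two selection rules above can be sketched as follows. This is a minimal illustration, not the library's implementation: the standalone function, the `prefer_full` flag, and the assumption that `build_version` values are fixed-width date strings (so that string comparison matches chronological order) are all hypothetical. Only the `build_version` and `type` keys come from the version metadata shown in this page.

```python
def choose_best_version(versions, prefer_full=False):
    """Hypothetical stand-in for the selection rules described above.

    `versions` is a list of build metadata dicts, each with at least
    'build_version' and 'type' ('full' or 'incremental').
    """
    if not versions:
        return None
    # Rule 1: narrow down to the preferred update type, if any is available.
    preferred_type = "full" if prefer_full else "incremental"
    preferred = [v for v in versions if v["type"] == preferred_type]
    candidates = preferred or versions
    # Rule 2: among the remaining candidates, take the highest version.
    # Assumes fixed-width date strings such as '20171003'.
    return max(candidates, key=lambda v: v["build_version"])


versions = [
    {"build_version": "20171003", "type": "full"},
    {"build_version": "20171010", "type": "incremental"},
    {"build_version": "20171017", "type": "full"},
]
best_incremental = choose_best_version(versions)
best_full = choose_best_version(versions, prefer_full=True)
```

Falling back to all candidates when the preferred type is absent is a design choice of this sketch; the actual precedence rules (including size considerations) may differ.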
- compare_remote_local(remote_version, local_version, orig_remote_version, orig_local_version)
- create_todump_list(force=False, version='latest', url=None)
Fill the self.to_dump list with dict("remote": remote_path, "local": local_path) elements. This is the to-do list for the dumper, and a good place to check whether a file needs to be downloaded. If 'force' is True, however, all files are considered for download.
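To illustrate the shape of the to-do list, here is a minimal sketch under stated assumptions. The helper name `build_to_dump`, its parameters, and the "skip if the local file already exists" check are hypothetical; the real method works from version metadata and may decide differently. Only the `remote`/`local` dict layout and the meaning of `force` come from the description above.

```python
import os


def build_to_dump(remote_urls, local_dir, force=False):
    """Hypothetical helper: build a list of {'remote': ..., 'local': ...} dicts."""
    to_dump = []
    for remote in remote_urls:
        # Use the last URL path segment as the local file name (assumption).
        local = os.path.join(local_dir, remote.rsplit("/", 1)[-1])
        # With force=True every file is scheduled; otherwise only missing ones.
        if force or not os.path.exists(local):
            to_dump.append({"remote": remote, "local": local})
    return to_dump
```

Calling `build_to_dump(urls, folder, force=True)` schedules every URL regardless of what is already present in `folder`.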
- download(remoteurl, localfile, headers=None)
Download the remote file to the local location defined by 'localfile'. Return relevant information about the remote file (depends on the actual client).
- find_update_path(version, backend_version=None)
Explore available versions and find the path to update the hub up to "version", starting from the given backend_version (typically the current version found in the ES index). If backend_version is None (typically when there is no index yet), a complete path is returned, from the last compatible "full" release up to the latest "diff" update. Returns a list of dicts, where each dict is a build metadata element containing information about one update (see versions.json). The order of the list is the order in which the updates should be performed.
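The path search described above can be pictured as walking backwards through the release metadata. The following is a rough sketch, not the library's code: the standalone function signature and the simplification that at most one release produces each `target_version` are assumptions. The `target_version`, `require_version`, and `type` keys are the ones shown in the versions() example on this page.

```python
def find_update_path(versions, target, backend_version=None):
    """Hypothetical sketch: walk back from `target` until reaching either
    `backend_version` or a full release, then return the steps in apply order."""
    # Simplification: assume one release per target version.
    by_target = {v["target_version"]: v for v in versions}
    path, seen = [], set()
    current = target
    while backend_version is None or current != backend_version:
        meta = by_target.get(current)
        if meta is None or current in seen:
            raise ValueError(f"no usable release chain for version {current!r}")
        seen.add(current)
        path.append(meta)
        if meta["type"] == "full":
            break  # a full release needs no predecessor
        current = meta["require_version"]
    path.reverse()  # oldest update first
    return path


releases = [
    {"target_version": "20200101", "require_version": None, "type": "full"},
    {"target_version": "20200108", "require_version": "20200101", "type": "incremental"},
    {"target_version": "20200115", "require_version": "20200108", "type": "incremental"},
]
full_path = [r["target_version"] for r in find_update_path(releases, "20200115")]
short_path = [r["target_version"] for r in find_update_path(releases, "20200115", "20200108")]
```

With no backend version, the sketch returns the full release followed by both incremental updates; starting from `20200108`, only the last incremental update remains.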
- async get_target_backend()
Example:
[{
    'host': 'es6.mygene.info:9200',
    'index': 'mygene_allspecies_20200823_ufkwdv79',
    'index_alias': 'mygene_allspecies',
    'version': '20200906',
    'count': 38729977
}]
- async info(version='latest')
Display version information (release note, etc.) for the given version:
{
    "info": …
    "release_note": …
}
- post_dump(*args, **kwargs)
Placeholder to add a custom process once the whole resource has been dumped. Optional.
- prepare_client()
Depending on the presence of credentials, inject authentication into client.get().
- remote_is_better(remotefile, localfile)
Determine whether the remote file is better than the local one.
Override if necessary.
- property target_backend
- async versions()
Display all available versions. Example:
[{
    'build_version': '20171003',
    'url': 'https://biothings-releases.s3.amazonaws.com:443/mygene.info/20171003.json',
    'release_date': '2017-10-06T11:58:39.749357',
    'require_version': None,
    'target_version': '20171003',
    'type': 'full'
}, …]
biothings.hub.autoupdate.uploader
- class biothings.hub.autoupdate.uploader.BiothingsUploader(*args, **kwargs)
Bases:
BaseSourceUploader
db_conn_info is a database connection info tuple (host, port) used to fetch/store information about the datasource's state.
- AUTO_PURGE_INDEX = False
- SYNCER_FUNC = None
- TARGET_BACKEND = None
- get_snapshot_repository_config(build_meta)
Return a (name, config) tuple from build_meta, where name is the repo name and config is the repo config.
- async load(*args, **kwargs)
Main resource load process; reads data from doc_c in chunks of size batch_size. steps defines the different processes used to load the resource:
  - "data": stores the actual data into single collections
  - "post": performs post-data-load operations
  - "master": registers the master document in src_master
- name = None
- property syncer_func
- property target_backend