biothings.hub.datarelease
biothings.hub.datarelease.publisher
- class biothings.hub.datarelease.publisher.BasePublisher(envconf, log_folder, es_backups_folder, *args, **kwargs)
Bases: BaseManager, BaseStatusRegisterer
- property category
- clean_stale_status()
During startup, search for actions in progress that were interrupted and change their state to “canceled”. For example, some downloading processes could have been interrupted; at startup, their “downloading” status should be changed to “canceled” so as to reflect the actual state of these datasources. This must be overridden in a subclass.
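The startup cleanup described above can be sketched over plain dictionaries. This is a minimal illustration, not the hub's implementation: the status document layout ({"_id", "status"}) and the set of in-progress state names are assumptions, and the real hub stores status in its own collection, with each subclass deciding which states to reset.

```python
# Hypothetical in-progress state names; the real hub may use different ones.
IN_PROGRESS_STATES = {"downloading", "uploading", "publishing"}


def clean_stale_status(status_docs):
    """Mark interrupted actions as canceled (sketch over plain dicts).

    Each doc is assumed to look like {"_id": ..., "status": ...}.
    Returns the list of _ids whose status was changed.
    """
    changed = []
    for doc in status_docs:
        if doc.get("status") in IN_PROGRESS_STATES:
            doc["status"] = "canceled"
            changed.append(doc["_id"])
    return changed


docs = [
    {"_id": "src_a", "status": "downloading"},
    {"_id": "src_b", "status": "success"},
]
print(clean_stale_status(docs))  # ['src_a']
```

Only documents left in an in-progress state are touched; completed actions keep their status.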
- property collection
Return the collection object used to fetch the document in which status is stored.
- get_pinfo()
Return a dict containing information about the current process (used for reporting in the hub).
- get_pre_post_previous_result(build_doc, key_value)
To start a pre- or post-publish pipeline, an initial “previous result” must be defined; it is passed to the first step, and each step’s result is fed along to the next one. What this initial value is depends on the type of publisher.
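The “previous result” chaining described above works like a fold over the configured steps. The sketch below is hypothetical: `run_pipeline`, `archive` and `announce` are illustrative names, not biothings functions, and the idea that a diff publisher would start from a folder path is an assumption made only for the example.

```python
from typing import Any, Callable, Iterable

Step = Callable[[Any], Any]


def run_pipeline(first_previous_result: Any, steps: Iterable[Step]) -> Any:
    """Feed a result along a chain of steps (hypothetical sketch).

    The first step receives `first_previous_result`; every later step
    receives whatever the preceding step returned.
    """
    result = first_previous_result
    for step in steps:
        result = step(result)
    return result


# Illustrative steps only: nothing is archived or uploaded here.
def archive(path: str) -> str:
    return path + ".tar.gz"


def announce(archive_name: str) -> str:
    return f"uploaded {archive_name}"


print(run_pipeline("/data/diff/old-new", [archive, announce]))
# uploaded /data/diff/old-new.tar.gz
```

Because each publisher type supplies its own starting value, the same step chain can be reused regardless of what the first step needs.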
- publish_release_notes(release_folder, build_version, s3_release_folder, s3_release_bucket, aws_key, aws_secret, prefix='release_')
- class biothings.hub.datarelease.publisher.DiffPublisher(diff_manager, *args, **kwargs)
Bases: BasePublisher
- get_pre_post_previous_result(build_doc, key_value)
To start a pre- or post-publish pipeline, an initial “previous result” must be defined; it is passed to the first step, and each step’s result is fed along to the next one. What this initial value is depends on the type of publisher.
- post_publish(build_name, repo_conf, build_doc)
Post-publish hook: runs the steps declared in the config, plus any additional logic defined in a subclass.
- pre_publish(previous_build_name, repo_conf, build_doc)
Pre-publish hook: runs the steps declared in the config, plus any additional logic defined in a subclass.
- publish(build_name, previous_build=None, steps=('pre', 'reset', 'upload', 'meta', 'post'))
Publish diff files and metadata about them (release note, etc.) to S3. Using build_name, a src_build document is fetched and a diff release is searched for. If more than one diff release is found, “previous_build” must be specified to pick the correct one.
steps:
  - pre/post: optional steps processed as the first and last steps.
  - reset: highly recommended; resets the synced flag in diff files so they won’t get skipped when used…
  - upload: upload diff_folder content to S3.
  - meta: publish/register the version as available for auto-updating hubs.
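A minimal sketch of the selection rule and step ordering described above, assuming a hypothetical layout where diff releases are keyed by previous build name and each step is a plain callable. `pick_diff_release`, `run_steps` and the handler dict are illustrative, not the DiffPublisher API; running requested steps in the default order is a design choice of this sketch.

```python
from typing import Callable, Dict, Optional, Sequence

# Default order taken from the publish() signature above.
DEFAULT_STEPS = ("pre", "reset", "upload", "meta", "post")


def pick_diff_release(diff_releases: Dict[str, dict],
                      previous_build: Optional[str] = None) -> dict:
    """Select a single diff release (hypothetical data layout).

    `diff_releases` is assumed to map a previous build name to its diff info.
    """
    if previous_build is not None:
        return diff_releases[previous_build]
    if len(diff_releases) != 1:
        raise ValueError(
            f"{len(diff_releases)} diff releases found; specify previous_build")
    return next(iter(diff_releases.values()))


def run_steps(handlers: Dict[str, Callable[[dict], None]], diff: dict,
              steps: Sequence[str] = DEFAULT_STEPS) -> list:
    """Run requested steps in default order; names outside DEFAULT_STEPS are ignored."""
    done = []
    for name in DEFAULT_STEPS:
        if name in steps:
            handlers[name](diff)
            done.append(name)
    return done


# Usage with made-up build names and folders.
log = []
handlers = {name: (lambda d, n=name: log.append((n, d["folder"])))
            for name in DEFAULT_STEPS}
releases = {"build_v1": {"folder": "diff/v1-v2"},
            "build_v0": {"folder": "diff/v0-v2"}}
diff = pick_diff_release(releases, previous_build="build_v1")
print(run_steps(handlers, diff, steps=("reset", "upload", "meta")))
# ['reset', 'upload', 'meta']
```

Iterating over the default order, rather than the caller’s tuple, keeps “pre” first and “post” last even if the steps are passed in a different order.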
- class biothings.hub.datarelease.publisher.ReleaseManager(diff_manager, snapshot_manager, poll_schedule=None, *args, **kwargs)
Bases: BaseManager, BaseStatusRegisterer
- DEFAULT_DIFF_PUBLISHER_CLASS
alias of DiffPublisher
- DEFAULT_SNAPSHOT_PUBLISHER_CLASS
alias of SnapshotPublisher
- build_release_note(old_colname, new_colname, note=None) → ReleaseNoteSource
Build a release note containing the most significant changes between builds “old_colname” and “new_colname”. An optional end note can be added to give more specific information about the release.
Return a dictionary containing significant changes.
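One kind of “significant change” a release note can report is which datasources were added, removed, or updated between two builds. The sketch below assumes each build is summarized as a mapping of datasource name to version string (similar in spirit to a datasource_versions mapping, but that correspondence is an assumption); `datasource_version_changes` is a hypothetical helper, and the versions are made up.

```python
def datasource_version_changes(old_versions: dict, new_versions: dict) -> dict:
    """Summarize datasource-level changes between two builds (sketch).

    Inputs are assumed to map datasource name -> version string.
    """
    old_keys, new_keys = set(old_versions), set(new_versions)
    return {
        "added": sorted(new_keys - old_keys),
        "deleted": sorted(old_keys - new_keys),
        "updated": {
            name: {"old": old_versions[name], "new": new_versions[name]}
            for name in sorted(old_keys & new_keys)
            if old_versions[name] != new_versions[name]
        },
    }


old = {"clinvar": "2023-01", "dbsnp": "155"}
new = {"clinvar": "2023-06", "dbsnp": "155", "gnomad": "4.0"}
print(datasource_version_changes(old, new))
# {'added': ['gnomad'], 'deleted': [], 'updated': {'clinvar': {'old': '2023-01', 'new': '2023-06'}}}
```

Unchanged datasources (dbsnp here) are left out, so the summary only carries what differs between the two builds.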
- clean_stale_status()
During startup, search for actions in progress that were interrupted and change their state to “canceled”. For example, some downloading processes could have been interrupted; at startup, their “downloading” status should be changed to “canceled” so as to reflect the actual state of these datasources. This must be overridden in a subclass.
- property collection
Return the collection object used to fetch the document in which status is stored.
- configure(release_confdict)
Configure the manager with the release “confdict”. See config_hub.py in the API for the format.
- create_release_note(old, new, filename=None, note=None, format='txt')
Generate release note files, in TXT and JSON format, containing a summary of significant changes between target collections old and new. Output files are stored in a diff folder obtained from generate_folder(old, new).
‘filename’ can optionally be specified, though this is not recommended: the publishing pipeline that uses these files expects a naming convention.
‘note’ is optional free text appended at the end of the release note.
‘txt’ is the only supported ‘format’ for now.
- get_pinfo()
Return a dict containing information about the current process (used for reporting in the hub).
- poll(state, func)
Search collection ‘col’ for sources whose pending flag list contains ‘state’, and call ‘func’ for each document found (with the document as the only parameter).
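The polling behavior described above reduces to a filter-and-callback loop. In this hypothetical sketch the documents are passed in explicitly instead of being read from the collection, the “pending” key name is assumed, and the “release_note” state is made up for the example.

```python
from typing import Callable, Iterable


def poll(docs: Iterable[dict], state: str, func: Callable[[dict], None]) -> int:
    """Call func(doc) for each doc whose "pending" list contains state.

    Unlike the real method, the documents are passed in explicitly and the
    "pending" key name is an assumption. Returns how many docs were processed.
    """
    processed = 0
    for doc in docs:
        if state in doc.get("pending", []):
            func(doc)
            processed += 1
    return processed


docs = [
    {"_id": "build_a", "pending": ["release_note"]},
    {"_id": "build_b", "pending": []},
    {"_id": "build_c"},
]
seen = []
print(poll(docs, "release_note", seen.append))  # 1
print([d["_id"] for d in seen])  # ['build_a']
```

Documents without a pending list are simply skipped rather than raising an error.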
- publish_diff(publisher_env, build_name, previous_build=None, steps=('pre', 'reset', 'upload', 'meta', 'post'))
- publish_snapshot(publisher_env, snapshot, build_name=None, previous_build=None, steps=('pre', 'meta', 'post'))
- reset_synced(old, new)
Reset the synced flags of diff files produced between the “old” and “new” builds. Once a diff has been applied, its diff files are flagged as synced so a subsequent run won’t apply them twice (this is an optimization, not a safeguard against data corruption, since diff files can safely be applied multiple times). If the diff needs to be applied again, the diff files’ synced flags must be reset first.
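The interplay between the synced flag and a reset can be sketched with a list of file entries. This is an illustration under assumptions: the real reset_synced takes build names (old, new) rather than a file list, and `apply_pending`, the entry layout and the file names are hypothetical.

```python
def apply_pending(diff_files: list, apply) -> None:
    """Apply only files not yet flagged as synced, then flag them."""
    for entry in diff_files:
        if not entry.get("synced"):
            apply(entry["name"])
            entry["synced"] = True


def reset_synced(diff_files: list) -> int:
    """Clear every synced flag so the files will be applied again."""
    reset = 0
    for entry in diff_files:
        if entry.get("synced"):
            entry["synced"] = False
            reset += 1
    return reset


files = [{"name": "diff_1.json", "synced": True},
         {"name": "diff_2.json", "synced": False}]
applied = []
apply_pending(files, applied.append)
print(applied)              # ['diff_2.json'] (diff_1 was skipped)
print(reset_synced(files))  # 2
apply_pending(files, applied.append)
print(applied)              # ['diff_2.json', 'diff_1.json', 'diff_2.json']
```

Since applying a diff twice is safe, the flag only saves work; resetting it just forces the files to be applied again.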
- class biothings.hub.datarelease.publisher.SnapshotPublisher(snapshot_manager, *args, **kwargs)
Bases: BasePublisher
- get_pre_post_previous_result(build_doc, key_value)
To start a pre- or post-publish pipeline, an initial “previous result” must be defined; it is passed to the first step, and each step’s result is fed along to the next one. What this initial value is depends on the type of publisher.
- post_publish(snapshot_name, repo_conf, build_doc)
Post-publish hook: runs the steps declared in the config, plus any additional logic defined in a subclass.
- pre_publish(snapshot_name, repo_conf, build_doc)
Pre-publish hook: runs the steps declared in the config, plus any additional logic defined in a subclass.
- publish(snapshot, build_name=None, previous_build=None, steps=('pre', 'meta', 'post'))
Publish snapshot metadata to S3. If the snapshot repository is of type “s3”, the data itself isn’t uploaded/published since it’s already on S3. If it is of type “fs”, “pre” steps can be added to the RELEASE_CONFIG parameter to archive the snapshot and upload it to S3. Metadata about the snapshot, the release note, etc. are then uploaded to the correct buckets as defined in the config, and “post” steps can be run afterward.
Though snapshots don’t need any previous version to be applied on, a release note with significant changes between the current snapshot and a previous version may have been generated. By default, the snapshot name is used to pick a single build document, from which the release note information is retrieved.
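The repository-type distinction above can be expressed as a small planning function. This is a hypothetical sketch, not SnapshotPublisher’s logic: the action names are made up, and the warning entry for an “fs” repository without pre steps is purely illustrative of the fact that, in that case, nothing would move the data to S3.

```python
def plan_snapshot_publish(repo_type: str, pre_steps=()) -> list:
    """Return an ordered list of illustrative actions for publishing a snapshot.

    - "s3": data already lives on S3, so only metadata needs publishing.
    - "fs": configured "pre" steps (e.g. archive, upload) move data to S3 first.
    """
    if repo_type not in ("s3", "fs"):
        raise ValueError(f"unsupported repository type: {repo_type!r}")
    actions = [f"pre:{step}" for step in pre_steps]
    if repo_type == "fs" and not pre_steps:
        # Nothing here copies fs data to S3; only metadata would be published.
        actions.append("warn:data_not_on_s3")
    actions += ["meta", "post"]
    return actions


print(plan_snapshot_publish("s3"))                       # ['meta', 'post']
print(plan_snapshot_publish("fs", ("archive", "upload")))
# ['pre:archive', 'pre:upload', 'meta', 'post']
```

Keeping the plan separate from execution makes it easy to check which steps a given RELEASE_CONFIG would trigger before anything is uploaded.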
biothings.hub.datarelease.releasenote
- class biothings.hub.datarelease.releasenote.ReleaseNoteSource(old_src_build_reader: ReleaseNoteSrcBuildReader, new_src_build_reader: ReleaseNoteSrcBuildReader, diff_stats_from_metadata_file: dict, addon_note: str)
Bases: object
- class biothings.hub.datarelease.releasenote.ReleaseNoteSrcBuildReader(src_build_doc: dict)
Bases: object
- attach_cold_src_build_reader(other: ReleaseNoteSrcBuildReader)
Attach a cold src_build reader.
self must be a hot src_build reader, and other must be a cold one.
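The hot/cold precondition above can be enforced with two explicit checks. The class below is a hypothetical stand-in, not ReleaseNoteSrcBuildReader: it uses an explicit `is_cold` flag because how the real reader determines hot versus cold is not described here, and the build names are made up.

```python
class SrcBuildReaderSketch:
    """Hypothetical stand-in for a src_build reader; not the biothings class."""

    def __init__(self, name: str, is_cold: bool):
        self.name = name
        self.is_cold = is_cold
        self.cold_reader = None

    def attach_cold_src_build_reader(self, other: "SrcBuildReaderSketch") -> None:
        # Enforce the documented precondition: self is hot, other is cold.
        if self.is_cold:
            raise ValueError(f"{self.name} is cold; expected a hot reader")
        if not other.is_cold:
            raise ValueError(f"{other.name} is hot; expected a cold reader")
        self.cold_reader = other


hot = SrcBuildReaderSketch("mybuild_hot", is_cold=False)
cold = SrcBuildReaderSketch("mybuild_cold", is_cold=True)
hot.attach_cold_src_build_reader(cold)

try:
    cold.attach_cold_src_build_reader(hot)
except ValueError as err:
    print(err)  # mybuild_cold is cold; expected a hot reader
```

Failing fast on a reversed call avoids silently building a release note from mismatched build documents.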
- property build_id: str
- property build_stats: dict
- property build_version: str
- property cold_collection_name: str
- property datasource_mapping: dict
- property datasource_stats: dict
- property datasource_versions: dict
- class biothings.hub.datarelease.releasenote.ReleaseNoteSrcBuildReaderAdapter(src_build_reader: ReleaseNoteSrcBuildReader)
Bases: object
- property build_stats
- property datasource_info
- class biothings.hub.datarelease.releasenote.ReleaseNoteTxt(source: ReleaseNoteSource)
Bases: object