BIOBLEND(1) | BioBlend | BIOBLEND(1) |
bioblend - BioBlend Documentation
BioBlend is a Python library for interacting with CloudMan and Galaxy's API.
BioBlend is supported and tested on:
Conceptually, it makes it possible to script and automate the process of cloud infrastructure provisioning and scaling via CloudMan, and running of analyses via Galaxy. In reality, it makes it possible to do things like this:
from bioblend.cloudman import CloudManConfig
from bioblend.cloudman import CloudManInstance
cfg = CloudManConfig('<your cloud access key>', '<your cloud secret key>', 'My CloudMan', 'ami-<ID>', 'm1.small', '<password>')
cmi = CloudManInstance.launch_instance(cfg)
cmi.get_status()
from bioblend.cloudman import CloudManInstance
cmi = CloudManInstance("<instance IP>", "<password>")
cmi.add_nodes(3)
cluster_status = cmi.get_status()
cmi.remove_nodes(2)
from bioblend.galaxy import GalaxyInstance
gi = GalaxyInstance('<Galaxy IP>', key='your API key')
libs = gi.libraries.get_libraries()
gi.workflows.show_workflow('workflow ID')
gi.workflows.run_workflow('workflow ID', input_dataset_map)
from bioblend.galaxy.objects import GalaxyInstance
gi = GalaxyInstance("URL", "API_KEY")
wf = gi.workflows.list()[0]
hist = gi.histories.list()[0]
inputs = hist.get_datasets()[:2]
input_map = dict(zip(wf.input_labels, inputs))
params = {"Paste1": {"delimiter": "U"}}
wf.run(input_map, "wf_output", params=params)
Stable releases of BioBlend are best installed via pip or easy_install from PyPI using something like:
$ pip install bioblend
Alternatively, you may install the most current source code from our Git repository, or fork the project on GitHub. To install from source, do the following:
# Clone the repository to a local directory
$ git clone https://github.com/galaxyproject/bioblend.git
# Install the library
$ cd bioblend
$ python setup.py install
After installing the library, you will be able to simply import it into your Python environment with import bioblend. For details on the available functionality, see the API documentation.
BioBlend requires a number of Python libraries. These libraries are installed automatically when BioBlend itself is installed, regardless of whether it is installed from PyPI or by running the python setup.py install command. The current list of required libraries is always available from setup.py in the source code repository.
If you also want to run tests locally, some extra libraries are required. To install them, run:
$ python setup.py test
To get started using BioBlend, install the library as described above. Once the library is available on your system, you can develop scripts against it; these scripts do not need to reside in any particular location on the system.
It is probably best to take a look at the example scripts in docs/examples source directory and browse the API documentation. Beyond that, it's up to your creativity :).
Anyone interested in contributing to or tweaking the library is more than welcome to do so. To start, simply fork the Git repository on GitHub and start playing with it. Then, issue pull requests.
BioBlend's API focuses on and mirrors the services it wraps. Thus, there are two top-level sets of APIs, each corresponding to a separate service and a corresponding step in the automation process. Note that each of the service APIs can be used completely independently of one another.
Effort has been made to keep the structure and naming of those APIs consistent across the library, but because they bridge different services, some discrepancies may exist. Feel free to point those out and/or provide fixes.
For Galaxy, an alternative object-oriented API is also available. This API provides an explicit modeling of server-side Galaxy instances and their relationships, providing higher-level methods to perform operations such as retrieving all datasets for a given history. Note that, at the moment, the object-oriented API is still incomplete, providing access to a more restricted set of Galaxy modules than the standard one.
API used to manipulate the instantiated infrastructure. For example, scale the size of the compute cluster, get infrastructure status, get service status.
This page describes some sample use cases for CloudMan API and provides examples for these API calls. In addition to this page, there are functional examples of complete scripts in docs/examples directory of the BioBlend source code repository.
CloudMan supports Amazon, OpenStack, OpenNebula, and Eucalyptus based clouds, and BioBlend can be used to programmatically manipulate CloudMan on any of those clouds. Once launched, the API calls to CloudMan are the same irrespective of the cloud. In order to launch an instance on a given cloud, cloud properties need to be provided to CloudManLauncher. If cloud properties are not specified, CloudManLauncher will default to Amazon cloud properties.
If we want to use a different cloud provider, we need to specify additional cloud properties when creating an instance of the CloudManLauncher class. For example, if we wanted to create a connection to NeCTAR, Australia's national research cloud, we would use the following properties:
from bioblend.util import Bunch
nectar = Bunch(
name='NeCTAR',
cloud_type='openstack',
bucket_default='cloudman-os',
region_name='NeCTAR',
region_endpoint='nova.rc.nectar.org.au',
ec2_port=8773,
ec2_conn_path='/services/Cloud',
cidr_range='115.146.92.0/22',
is_secure=True,
s3_host='swift.rc.nectar.org.au',
s3_port=8888,
s3_conn_path='/')
In order to launch a CloudMan cluster on a chosen cloud, we do the following (continuing from the previous example):
from bioblend.cloudman import CloudManConfig
from bioblend.cloudman import CloudManInstance
cmc = CloudManConfig('<your AWS access key>', '<your AWS secret key>', 'Cluster name',
                     'ami-<ID>', 'm1.medium', 'choose_a_password_here', nectar)
cmi = CloudManInstance.launch_instance(cmc)
NOTE:
If you already have a running CloudMan instance, you can instead connect to it directly by providing its IP address and the CloudMan password:

cmi = CloudManInstance('http://115.146.92.174', 'your_UD_password')
We now have a CloudManInstance object that allows us to manage the created CloudMan instance via the API. Once launched, it will take a few minutes for the instance to boot and for CloudMan to start. To check on the status of the machine, (repeatedly) run the following command:
>>> cmi.get_machine_status()
{'error': '',
 'instance_state': u'pending',
 'placement': '',
 'public_ip': ''}
>>> cmi.get_machine_status()
{'error': '',
 'instance_state': u'running',
 'placement': u'melbourne-qh2',
 'public_ip': u'115.146.86.29'}
Once the instance is ready, although it may still take a few moments for CloudMan to start, it is possible to start interacting with the application.
NOTE:
The CloudManInstance object is a local representation of the remote CloudMan instance; to refresh its fields with the instance's current state, call:

>>> cmi.update()
Having a reference to a CloudManInstance object, we can manage it via the available CloudMan instance API methods:
>>> cmi.initialized
False
>>> cmi.initialize('SGE')
>>> cmi.get_status()
{u'all_fs': [],
 u'app_status': u'yellow',
 u'autoscaling': {u'as_max': u'N/A',
                  u'as_min': u'N/A',
                  u'use_autoscaling': False},
 u'cluster_status': u'STARTING',
 u'data_status': u'green',
 u'disk_usage': {u'pct': u'0%', u'total': u'0', u'used': u'0'},
 u'dns': u'#',
 u'instance_status': {u'available': u'0', u'idle': u'0', u'requested': u'0'},
 u'snapshot': {u'progress': u'None', u'status': u'None'}}
>>> cmi.get_cluster_size()
1
>>> cmi.get_nodes()
[{u'id': u'i-00006016',
  u'instance_type': u'm1.medium',
  u'ld': u'0.0 0.025 0.065',
  u'public_ip': u'115.146.86.29',
  u'time_in_state': u'2268'}]
>>> cmi.add_nodes(2)
{u'all_fs': [],
 u'app_status': u'green',
 u'autoscaling': {u'as_max': u'N/A',
                  u'as_min': u'N/A',
                  u'use_autoscaling': False},
 u'cluster_status': u'READY',
 u'data_status': u'green',
 u'disk_usage': {u'pct': u'0%', u'total': u'0', u'used': u'0'},
 u'dns': u'#',
 u'instance_status': {u'available': u'0', u'idle': u'0', u'requested': u'2'},
 u'snapshot': {u'progress': u'None', u'status': u'None'}}
>>> cmi.get_cluster_size()
3
API used to manipulate genomic analyses within Galaxy, including data management and workflow execution.
After you have created a GalaxyInstance object, access various modules via its class fields (see the source for the most up-to-date list): libraries, histories, workflows, datasets, and users are the minimum set supported. For example, to work with histories and get a list of all the user's histories, the following should be done:
from bioblend import galaxy
gi = galaxy.GalaxyInstance(url='http://127.0.0.1:8000', key='your_api_key')
hl = gi.histories.get_histories()
----
Contains possible interactions dealing with Galaxy configuration.
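The dictionary below is the kind of value returned when querying the server configuration. A minimal sketch of the call, assuming gi is an authenticated GalaxyInstance object:

from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url='http://127.0.0.1:8080', key='your_api_key')
# Retrieve the configuration of the connected Galaxy server as a dictionary
config = gi.config.get_config()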
{u'allow_library_path_paste': False,
u'allow_user_creation': True,
u'allow_user_dataset_purge': True,
u'allow_user_deletion': False,
u'enable_unique_workflow_defaults': False,
u'ftp_upload_dir': u'/SOMEWHERE/galaxy/ftp_dir',
u'ftp_upload_site': u'galaxy.com',
u'library_import_dir': u'None',
u'logo_url': None,
u'support_url': u'http://wiki.g2.bx.psu.edu/Support',
u'terms_url': None,
u'user_library_import_dir': None,
u'wiki_url': u'http://g2.trac.bx.psu.edu/'}
----
Contains possible interactions with the Galaxy Datasets
----
Contains possible interactions with the Galaxy Datatype
[u'snpmatrix',
u'snptest',
u'tabular',
u'taxonomy',
u'twobit',
u'txt',
u'vcf',
u'wig',
u'xgmml',
u'xml']
[u'galaxy.datatypes.tabular:Vcf',
u'galaxy.datatypes.binary:TwoBit',
u'galaxy.datatypes.binary:Bam',
u'galaxy.datatypes.binary:Sff',
u'galaxy.datatypes.xml:Phyloxml',
u'galaxy.datatypes.xml:GenericXml',
u'galaxy.datatypes.sequence:Maf',
u'galaxy.datatypes.sequence:Lav',
u'galaxy.datatypes.sequence:csFasta']
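The two lists above are the kinds of values returned when listing the datatype extensions and the datatype sniffers registered on the server. A minimal sketch of the calls, assuming gi is an authenticated GalaxyInstance:

# List datatype extensions and datatype sniffers known to the server
extensions = gi.datatypes.get_datatypes()
sniffers = gi.datatypes.get_sniffers()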
----
Contains possible interactions with the Galaxy library folders
----
Contains possible interactions with the Galaxy Forms
[{u'id': u'f2db41e1fa331b3e',
u'model_class': u'FormDefinition',
u'name': u'First form',
u'url': u'/api/forms/f2db41e1fa331b3e'},
{u'id': u'ebfb8f50c6abde6d',
u'model_class': u'FormDefinition',
u'name': u'second form',
u'url': u'/api/forms/ebfb8f50c6abde6d'}]
{u'desc': u'here it is ',
u'fields': [],
u'form_definition_current_id': u'f2db41e1fa331b3e',
u'id': u'f2db41e1fa331b3e',
u'layout': [],
u'model_class': u'FormDefinition',
u'name': u'First form',
u'url': u'/api/forms/f2db41e1fa331b3e'}
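A minimal sketch of listing form definitions and showing one of them, assuming gi is an authenticated GalaxyInstance; the form id is taken from the example output above:

# List all form definitions, then fetch the details of a single form
forms = gi.forms.get_forms()
first_form = gi.forms.show_form('f2db41e1fa331b3e')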
----
Contains possible interactions with the Galaxy FTP Files
----
Contains possible interactions with the Galaxy Histories
Contains possible interactions with the Galaxy Groups
[{u'id': u'7c9636938c3e83bf',
u'model_class': u'Group',
u'name': u'My Group Name',
u'url': u'/api/groups/7c9636938c3e83bf'}]
[ {"name": "Listeria", "url": "/api/groups/33abac023ff186c2", "model_class": "Group", "id": "33abac023ff186c2"}, {"name": "LPN", "url": "/api/groups/73187219cd372cf8", "model_class": "Group", "id": "73187219cd372cf8"} ]
{"roles_url": "/api/groups/33abac023ff186c2/roles", "name": "Listeria", "url": "/api/groups/33abac023ff186c2", "users_url": "/api/groups/33abac023ff186c2/users", "model_class": "Group", "id": "33abac023ff186c2"}
----
Contains possible interactions with the Galaxy Histories
{'model_class':'HistoryTagAssociation', 'user_tname': 'NGS_PE_RUN', 'id': 'f792763bee8d277a', 'user_value': None}
NOTE:
Refer to bioblend.galaxy.datasets.DatasetClient.download_dataset() for the other available parameters.
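A hedged sketch of downloading a dataset via the datasets client; the dataset id and target directory are placeholders:

# Download a dataset into /tmp/, keeping the server-side file name
gi.datasets.download_dataset('<dataset_id>', file_path='/tmp/',
                             use_default_filename=True)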
Just an alias for get_most_recently_used_history().
If deleted is set to True, return histories that have been deleted.
----
Contains possible interactions with the Galaxy Jobs
[{u'create_time': u'2014-03-01T16:16:48.640550',
u'exit_code': 0,
u'id': u'ebfb8f50c6abde6d',
u'model_class': u'Job',
u'state': u'ok',
u'tool_id': u'fasta2tab',
u'update_time': u'2014-03-01T16:16:50.657399'},
{u'create_time': u'2014-03-01T16:05:34.851246',
u'exit_code': 0,
u'id': u'1cd8e2f6b131e891',
u'model_class': u'Job',
u'state': u'ok',
u'tool_id': u'upload1',
u'update_time': u'2014-03-01T16:05:39.558458'}]
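The list above is the kind of value returned when listing jobs. A minimal sketch of listing jobs and inspecting the most recent one, assuming gi is an authenticated GalaxyInstance:

# List the current user's jobs, then show the details of the newest one
jobs = gi.jobs.get_jobs()
job_details = gi.jobs.show_job(jobs[0]['id'])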
New in version 0.5.3.
This method is designed to scan the list of previously run jobs and find records of jobs that had the exact same input parameters and datasets. This can be used to minimize the amount of repeated work by simply recycling the old results.
{u'create_time': u'2014-03-01T16:17:29.828624',
u'exit_code': 0,
u'id': u'a799d38679e985db',
u'inputs': {u'input': {u'id': u'ebfb8f50c6abde6d',
u'src': u'hda'}},
u'model_class': u'Job',
u'outputs': {u'output': {u'id': u'a799d38679e985db',
u'src': u'hda'}},
u'params': {u'chromInfo': u'"/opt/galaxy-central/tool-data/shared/ucsc/chrom/?.len"',
u'dbkey': u'"?"',
u'seq_col': u'"2"',
u'title_col': u'["1"]'},
u'state': u'ok',
u'tool_id': u'tab2fasta',
u'update_time': u'2014-03-01T16:17:31.930728'}
----
Contains possible interactions with the Galaxy Data Libraries
:type name: str
:param name: Name of the new data library
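A minimal sketch of creating a data library with this parameter, assuming gi is connected with an admin API key; the description and synopsis values are placeholders:

# Create a new data library (admin only)
new_lib = gi.libraries.create_library('Sequencing runs',
                                      description='Raw FASTQ files',
                                      synopsis='Example library')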
{u'deleted': True, u'id': u'60e680a037f41974'}
----
Contains possible interactions with the Galaxy Quota
[{ u'id': u'0604c8a56abe9a50', u'model_class': u'Quota', u'name': u'test ', u'url': u'/api/quotas/0604c8a56abe9a50'}, { u'id': u'1ee267091d0190af', u'model_class': u'Quota', u'name': u'workshop', u'url': u'/api/quotas/1ee267091d0190af'}]
{ u'bytes': 107374182400, u'default': [], u'description': u'just testing', u'display_amount': u'100.0 GB', u'groups': [], u'id': u'0604c8a56abe9a50', u'model_class': u'Quota', u'name': u'test ', u'operation': u'=', u'users': []}
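A minimal sketch of listing quotas and showing one of them, assuming gi is connected with an admin API key; the quota id is taken from the example above:

# List quotas, then fetch the details of a single quota
quotas = gi.quotas.get_quotas()
quota_details = gi.quotas.show_quota('0604c8a56abe9a50')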
----
Contains possible interactions with the Galaxy Roles
[ {"id": "f2db41e1fa331b3e", "model_class": "Role", "name": "Foo", "url": "/api/roles/f2db41e1fa331b3e"}, {"id": "f597429621d6eb2b", "model_class": "Role", "name": "Bar", "url": "/api/roles/f597429621d6eb2b"} ]
{"description": "Private Role for Foo", "id": "f2db41e1fa331b3e", "model_class": "Role", "name": "Foo", "type": "private", "url": "/api/roles/f2db41e1fa331b3e"}
----
Contains possible interactions dealing with Galaxy tools.
If name is set and multiple names match the given name, all the tools matching the argument will be returned.
See upload_file() for the optional parameters (except file_name).
See upload_file() for the optional parameters (except file_name).
The tool_inputs dict should contain input datasets and parameters in the (largely undocumented) format used by the Galaxy API. Some examples can be found at https://bitbucket.org/galaxy/galaxy-central/src/tip/test/api/test_tools.py.
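A hedged sketch of assembling a tool_inputs dict in that format and running a tool; the tool id, parameter name and dataset id are placeholders rather than a documented tool interface:

# Map the tool's input dataset and one (hypothetical) parameter, then run it
tool_inputs = {
    'input': {'src': 'hda', 'id': '<dataset_id>'},  # dataset taken from a history
    'lineNum': '5',                                 # hypothetical tool parameter
}
result = gi.tools.run_tool('<history_id>', '<tool_id>', tool_inputs)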
See upload_file() for the optional parameters.
----
Contains possible interactions with the Galaxy Tool data tables
[{"model_class": "TabularToolDataTable", "name": "fasta_indexes"},
{"model_class": "TabularToolDataTable", "name": "bwa_indexes"}]
{"columns": ["value", "dbkey", "name", "path"],
"fields": [["test id",
"test",
"test name",
"/opt/galaxy-dist/tool-data/test/seq/test id.fa"]],
"model_class": "TabularToolDataTable",
"name": "all_fasta"}
----
Interaction with a Galaxy Tool Shed
[{u'changeset_revision': u'4afe13ac23b6',
u'deleted': False,
u'dist_to_shed': False,
u'error_message': u'',
u'name': u'velvet_toolsuite',
u'owner': u'edward-kirton',
u'status': u'Installed'}]
Changed in version 0.4.1: Changed method name from get_tools to get_repositories to better align with the Tool Shed concepts
Installing the repository into an existing tool panel section requires the tool panel config file (e.g., tool_conf.xml, shed_tool_conf.xml, etc) to contain the given tool panel section:
{u'changeset_revision': u'b17455fb6222',
u'ctx_rev': u'8',
u'owner': u'aaron',
u'status': u'Installed',
u'url': u'/api/tool_shed_repositories/82de4a4c7135b20a'}
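A hedged sketch of installing a repository revision from the main Tool Shed into Galaxy; the repository coordinates and tool panel section id are placeholders, and this assumes the Tool Shed client is exposed as gi.toolshed:

# Install a specific repository revision, placing its tools into an
# existing tool panel section (admin only)
gi.toolshed.install_repository_revision(
    tool_shed_url='https://toolshed.g2.bx.psu.edu',
    name='<repository name>',
    owner='<repository owner>',
    changeset_revision='<changeset revision>',
    install_tool_dependencies=True,
    install_repository_dependencies=True,
    tool_panel_section_id='<existing section id>')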
Changed in version 0.4.1: Changed method name from show_tool to show_repository to better align with the Tool Shed concepts
----
Contains possible interactions dealing with Galaxy users.
These methods must be executed by a registered Galaxy admin user.
Just an alias for create_remote_user().
[{u'email': u'a_user@example.com',
u'id': u'dda47097d9189f15',
u'url': u'/api/users/dda47097d9189f15'}]
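A minimal sketch of listing users and creating a new local user, assuming gi is connected with an admin API key; the account details are placeholders:

# List registered users, then create a new local user account (admin only)
users = gi.users.get_users()
new_user = gi.users.create_local_user('a_user', 'a_user@example.com', 'a_secret_password')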
----
Contains possible interactions with the Galaxy visualization
[{u'dbkey': u'eschColi_K12',
u'id': u'df1c7c96fc427c2d',
u'title': u'AVTest1',
u'type': u'trackster',
u'url': u'/api/visualizations/df1c7c96fc427c2d'},
{u'dbkey': u'mm9',
u'id': u'a669f50f8bf55b02',
u'title': u'Bam to Bigwig',
u'type': u'trackster',
u'url': u'/api/visualizations/a669f50f8bf55b02'}]
{u'annotation': None,
u'dbkey': u'mm9',
u'id': u'18df9134ea75e49c',
u'latest_revision': { ... },
u'model_class': u'Visualization',
u'revisions': [u'aa90649bb3ec7dcb', u'20622bc6249c0c71'],
u'slug': u'visualization-for-grant-1',
u'title': u'Visualization For Grant',
u'type': u'trackster',
u'url': u'/u/azaron/v/visualization-for-grant-1',
u'user_id': u'21e4aed91386ca8b'}
----
Contains possible interactions with the Galaxy Workflows
[{u'update_time': u'2015-10-31T22:00:22',
u'uuid': u'c8aa2b1c-801a-11e5-a9e5-8ca98228593c',
u'history_id': u'2f94e8ae9edff68a',
u'workflow_id': u'03501d7626bd192f',
u'state': u'new',
u'model_class': u'WorkflowInvocation',
u'id': u'df7a1f0c02a5b08e'} ]
[{u'id': u'92c56938c2f9b315',
u'name': u'Simple',
u'url': u'/api/workflows/92c56938c2f9b315'}]
{u'id': u'ee0e2b4b696d9092',
u'model_class': u'StoredWorkflow',
u'name': u'Super workflow that solves everything!',
u'published': False,
u'tags': [],
u'url': u'/api/workflows/ee0e2b4b696d9092'}
A mapping of workflow inputs to datasets and dataset collections. The datasets source can be a LibraryDatasetDatasetAssociation (ldda), LibraryDataset (ld), HistoryDatasetAssociation (hda), or HistoryDatasetCollectionAssociation (hdca).
The map must be in the following format: {'<input_index>': {'id': <encoded dataset ID>, 'src': '[ldda, ld, hda, hdca]'}} (e.g. {'2': {'id': '29beef4fadeed09f', 'src': 'hda'}})
This map may also be indexed by the UUIDs of the workflow steps, as indicated by the uuid property of steps returned from the Galaxy API.
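A hedged sketch of building such a map and invoking a workflow with it; the workflow and dataset ids are placeholders:

# Map workflow input 0 to an existing history dataset, then invoke the workflow
inputs = {'0': {'id': '<encoded dataset id>', 'src': 'hda'}}
invocation = gi.workflows.invoke_workflow('<workflow_id>', inputs=inputs,
                                          history_name='Invocation example')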
{u'inputs': {u'0': {u'src': u'hda', u'id': u'a7db2fac67043c7e', u'uuid': u'7932ffe0-2340-4952-8857-dbaa50f1f46a'}},
u'update_time': u'2015-10-31T22:00:26',
u'uuid': u'c8aa2b1c-801a-11e5-a9e5-8ca98228593c',
u'history_id': u'2f94e8ae9edff68a',
u'workflow_id': u'03501d7626bd192f',
u'state': u'ready',
u'steps': [{u'workflow_step_uuid': u'b81250fd-3278-4e6a-b269-56a1f01ef485',
u'update_time': u'2015-10-31T22:00:26',
u'job_id': None,
u'state': None,
u'workflow_step_label': None,
u'order_index': 0,
u'action': None,
u'model_class': u'WorkflowInvocationStep',
u'workflow_step_id': u'cbbbf59e8f08c98c',
u'id': u'd413a19dec13d11e'},
{u'workflow_step_uuid': u'e62440b8-e911-408b-b124-e05435d3125e',
u'update_time': u'2015-10-31T22:00:26',
u'job_id': u'e89067bb68bee7a0',
u'state': u'new',
u'workflow_step_label':None,
u'order_index': 1,
u'action': None,
u'model_class': u'WorkflowInvocationStep',
u'workflow_step_id': u'964b37715ec9bd22',
u'id': u'2f94e8ae9edff68a'},
],
u'model_class': u'WorkflowInvocation',
u'id': u'df7a1f0c02a5b08e' }
The replacement_params dict should map parameter names in post-job actions (PJAs) to their runtime values. For instance, if the final step has a PJA like the following:
{u'RenameDatasetActionout_file1': {
u'action_arguments': {u'newname': u'${output}'},
u'action_type': u'RenameDatasetAction',
u'output_name': u'out_file1'}}
then the following renames the output dataset to 'foo':
replacement_params = {'output': 'foo'}
see also this email thread.
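A hedged sketch of passing such a replacement_params mapping when running the workflow; the workflow id, input index and dataset id are placeholders:

# datamap maps the workflow input to a history dataset (placeholder ids)
datamap = {'<input_index>': {'id': '<encoded dataset id>', 'src': 'hda'}}
# Rename the PJA-controlled output dataset to 'foo' at runtime
gi.workflows.run_workflow('<workflow_id>', datamap,
                          history_name='PJA example',
                          replacement_params={'output': 'foo'})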
{u'history': u'64177123325c9cfd',
u'outputs': [u'aa4d3084af404259']}
The replacement_params dict should map parameter names in post-job actions (PJAs) to their runtime values. For instance, if the final step has a PJA like the following:
{u'RenameDatasetActionout_file1': {
u'action_arguments': {u'newname': u'${output}'},
u'action_type': u'RenameDatasetAction',
u'output_name': u'out_file1'}}
then the following renames the output dataset to 'foo':
replacement_params = {'output': 'foo'}
see also this email thread.
{u'inputs': {u'0': {u'src': u'hda', u'id': u'a7db2fac67043c7e', u'uuid': u'7932ffe0-2340-4952-8857-dbaa50f1f46a'}},
u'update_time': u'2015-10-31T22:00:26',
u'uuid': u'c8aa2b1c-801a-11e5-a9e5-8ca98228593c',
u'history_id': u'2f94e8ae9edff68a',
u'workflow_id': u'03501d7626bd192f',
u'state': u'ready',
u'steps': [{u'workflow_step_uuid': u'b81250fd-3278-4e6a-b269-56a1f01ef485',
u'update_time': u'2015-10-31T22:00:26',
u'job_id': None,
u'state': None,
u'workflow_step_label': None,
u'order_index': 0,
u'action': None,
u'model_class': u'WorkflowInvocationStep',
u'workflow_step_id': u'cbbbf59e8f08c98c',
u'id': u'd413a19dec13d11e'},
{u'workflow_step_uuid': u'e62440b8-e911-408b-b124-e05435d3125e',
u'update_time': u'2015-10-31T22:00:26',
u'job_id': u'e89067bb68bee7a0',
u'state': u'new',
u'workflow_step_label':None,
u'order_index': 1,
u'action': None,
u'model_class': u'WorkflowInvocationStep',
u'workflow_step_id': u'964b37715ec9bd22',
u'id': u'2f94e8ae9edff68a'},
],
u'model_class': u'WorkflowInvocation',
u'id': u'df7a1f0c02a5b08e' }
{u'workflow_step_uuid': u'4060554c-1dd5-4287-9040-8b4f281cf9dc',
u'update_time': u'2015-10-31T22:11:14',
u'job_id': None,
u'state': None,
u'workflow_step_label': None,
u'order_index': 2,
u'action': None,
u'model_class': u'WorkflowInvocationStep',
u'workflow_step_id': u'52e496b945151ee8',
u'id': u'63cd3858d057a6d1'}
{u'id': u'92c56938c2f9b315',
u'inputs': {u'23': {u'label': u'Input Dataset', u'value': u''}},
u'name': u'Simple',
u'url': u'/api/workflows/92c56938c2f9b315'}
This is actually a factory class which instantiates the entity-specific clients.
Example: get a list of all histories for a user with API key 'foo':
from bioblend.galaxy.objects import *
gi = GalaxyInstance('http://127.0.0.1:8080', 'foo')
histories = gi.histories.list()
Clients for interacting with specific Galaxy entity types.
Classes in this module should not be instantiated directly, but used via their handles in GalaxyInstance.
Previews entity summaries provided by REST collection URIs, e.g. http://host:port/api/libraries. Being the most lightweight objects associated with the various entities, these are the ones that should be used to retrieve their basic info.
This method first gets the entity summaries, then gets the complete description for each entity with an additional GET call, so may be slow.
Note that the same name can map to multiple histories.
Note that the same name can map to multiple libraries.
Note that the same name can map to multiple workflows.
A basic object-oriented interface for Galaxy entities.
Wrapper instances wrap deserialized JSON dictionaries such as the ones obtained by the Galaxy web API, converting key-based access to attribute-based access (e.g., library['name'] -> library.name).
Dict keys that are converted to attributes are listed in the BASE_ATTRS class variable: this is the 'stable' interface. Note that the wrapped dictionary is accessible via the wrapped attribute.
Steps are the main building blocks of a Galaxy workflow. A step can be: an input (type 'data_collection_input' or 'data_input'), a computational tool (type 'tool') or a pause (type 'pause').
A workflow defines a sequence of steps that produce one or more results from an input dataset.
A workflow is considered runnable on a Galaxy instance if all of the tools it uses are installed in that instance.
The params dict should be structured as follows:
PARAMS = {STEP_ID: PARAM_DICT, ...} PARAM_DICT = {NAME: VALUE, ...}
For backwards compatibility, the following (deprecated) format is also supported:
PARAMS = {TOOL_ID: PARAM_DICT, ...}
in which case PARAM_DICT affects all steps with the given tool id. If both by-tool-id and by-step-id specifications are used, the latter takes precedence.
Finally (again, for backwards compatibility), PARAM_DICT can also be specified as:
PARAM_DICT = {'param': NAME, 'value': VALUE}
Note that this format allows only one parameter to be set per step.
Example: set 'a' to 1 for the third workflow step:
params = {workflow.steps[2].id: {'a': 1}}
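A hedged sketch of passing that params dict to the object-oriented run method; the workflow, input map and output history name follow the overview example near the top of this documentation:

# Run the workflow with the per-step parameter override defined above
outputs = workflow.run(input_map, 'wf_output', params=params)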
Returns: self
See upload_file() for the optional parameters (except file_name).
See upload_file() for the optional parameters.
See upload_file() for the optional parameters.
See upload_file() for the optional parameters.
See upload_data() for info on other params.
Optional keyword arguments: file_type, dbkey.
See upload_data() for info on other params.
See upload_data() for info on other params.
See upload_data() for info on other params.
Id of the folder container. Use container.id instead.
Returns: self
Id of the dataset container. Use container.id instead.
See get_stream() for info on other params.
See get_stream() for param info.
See get_stream() for param info.
Returns: self
The inputs dict should contain input datasets and parameters in the (largely undocumented) format used by the Galaxy API. Some examples can be found in Galaxy's API test suite. The value of an input dataset can also be a Dataset object, which will be automatically converted to the needed format.
Classes derived from this one model the short summaries returned by global getters such as /api/libraries.
Instances of this class wrap dictionaries obtained by getting /api/libraries from Galaxy.
Instances of this class wrap dictionaries obtained by getting /api/histories from Galaxy.
Instances of this class wrap dictionaries obtained by getting /api/workflows from Galaxy.
This page describes some sample use cases for the Galaxy API and provides examples for these API calls. In addition to this page, there are functional examples of complete scripts in the docs/examples directory of the BioBlend source code repository.
To connect to a running Galaxy server, you will need an account on that Galaxy instance and an API key for the account. Instructions on getting an API key can be found at http://wiki.galaxyproject.org/Learn/API .
To open a connection call:
from bioblend.galaxy import GalaxyInstance
gi = GalaxyInstance(url='http://example.galaxy.url', key='your-API-key')
We now have a GalaxyInstance object which allows us to interact with the Galaxy server under our account, and access our data. If the account is a Galaxy admin account we also will be able to use this connection to carry out admin actions.
Methods for accessing histories and datasets are grouped under GalaxyInstance.histories.* and GalaxyInstance.datasets.* respectively.
To get information on the Histories currently in your account, call:
>>> gi.histories.get_histories()
[{u'id': u'f3c2b0f3ecac9f02',
u'name': u'RNAseq_DGE_BASIC_Prep',
u'url': u'/api/histories/f3c2b0f3ecac9f02'},
{u'id': u'8a91dcf1866a80c2',
u'name': u'June demo',
u'url': u'/api/histories/8a91dcf1866a80c2'}]
This returns a list of dictionaries containing basic metadata, including the id and name of each History. In this case, we have two existing Histories in our account, 'RNAseq_DGE_BASIC_Prep' and 'June demo'. To get more detailed information about a History we can pass its id to the show_history method:
>>> gi.histories.show_history('f3c2b0f3ecac9f02', contents=False)
{u'annotation': u'',
u'contents_url': u'/api/histories/f3c2b0f3ecac9f02/contents',
u'id': u'f3c2b0f3ecac9f02',
u'name': u'RNAseq_DGE_BASIC_Prep',
u'nice_size': u'93.5 MB',
u'state': u'ok',
u'state_details': {u'discarded': 0,
u'empty': 0,
u'error': 0,
u'failed_metadata': 0,
u'new': 0,
u'ok': 7,
u'paused': 0,
u'queued': 0,
u'running': 0,
u'setting_metadata': 0,
u'upload': 0 },
u'state_ids': {u'discarded': [],
u'empty': [],
u'error': [],
u'failed_metadata': [],
u'new': [],
u'ok': [u'd6842fb08a76e351',
u'10a4b652da44e82a',
u'81c601a2549966a0',
u'a154f05e3bcee26b',
u'1352fe19ddce0400',
u'06d549c52d753e53',
u'9ec54455d6279cc7'],
u'paused': [],
u'queued': [],
u'running': [],
u'setting_metadata': [],
u'upload': []
}
}
This gives us a dictionary containing the History's metadata. With contents=False (the default), we only get a list of ids of the datasets contained within the History; with contents=True we would get metadata on each dataset. We can also directly access more detailed information on a particular dataset by passing its id to the show_dataset method:
>>> gi.datasets.show_dataset('10a4b652da44e82a')
{u'data_type': u'fastqsanger',
u'deleted': False,
u'file_size': 16527060,
u'genome_build': u'dm3',
u'id': 17499,
u'metadata_data_lines': None,
u'metadata_dbkey': u'dm3',
u'metadata_sequences': None,
u'misc_blurb': u'15.8 MB',
u'misc_info': u'Noneuploaded fastqsanger file',
u'model_class': u'HistoryDatasetAssociation',
u'name': u'C1_R2_1.chr4.fq',
u'purged': False,
u'state': u'ok',
u'visible': True}
To upload a local file to a Galaxy server, you can run the upload_file method, supplying the path to a local file:
>>> gi.tools.upload_file('test.txt', 'f3c2b0f3ecac9f02')
{u'implicit_collections': [],
u'jobs': [{u'create_time': u'2015-07-28T17:52:39.756488',
u'exit_code': None,
u'id': u'9752b387803d3e1e',
u'model_class': u'Job',
u'state': u'new',
u'tool_id': u'upload1',
u'update_time': u'2015-07-28T17:52:39.987509'}],
u'output_collections': [],
u'outputs': [{u'create_time': u'2015-07-28T17:52:39.331176',
u'data_type': u'galaxy.datatypes.data.Text',
u'deleted': False,
u'file_ext': u'auto',
u'file_size': 0,
u'genome_build': u'?',
u'hda_ldda': u'hda',
u'hid': 16,
u'history_content_type': u'dataset',
u'history_id': u'f3c2b0f3ecac9f02',
u'id': u'59c76a119581e190',
u'metadata_data_lines': None,
u'metadata_dbkey': u'?',
u'misc_blurb': None,
u'misc_info': None,
u'model_class': u'HistoryDatasetAssociation',
u'name': u'test.txt',
u'output_name': u'output0',
u'peek': u'<table cellspacing="0" cellpadding="3"></table>',
u'purged': False,
u'state': u'queued',
u'tags': [],
u'update_time': u'2015-07-28T17:52:39.611887',
u'uuid': u'ff0ee99b-7542-4125-802d-7a193f388e7e',
u'visible': True}]}
If files are greater than 2GB in size, they will need to be uploaded via FTP. Importing files from the user's FTP folder can be done by running the upload tool again:
>>> gi.tools.upload_from_ftp('test.txt', 'f3c2b0f3ecac9f02')
{u'implicit_collections': [],
u'jobs': [{u'create_time': u'2015-07-28T17:57:43.704394',
u'exit_code': None,
u'id': u'82b264d8c3d11790',
u'model_class': u'Job',
u'state': u'new',
u'tool_id': u'upload1',
u'update_time': u'2015-07-28T17:57:43.910958'}],
u'output_collections': [],
u'outputs': [{u'create_time': u'2015-07-28T17:57:43.209041',
u'data_type': u'galaxy.datatypes.data.Text',
u'deleted': False,
u'file_ext': u'auto',
u'file_size': 0,
u'genome_build': u'?',
u'hda_ldda': u'hda',
u'hid': 17,
u'history_content_type': u'dataset',
u'history_id': u'f3c2b0f3ecac9f02',
u'id': u'a676e8f07209a3be',
u'metadata_data_lines': None,
u'metadata_dbkey': u'?',
u'misc_blurb': None,
u'misc_info': None,
u'model_class': u'HistoryDatasetAssociation',
u'name': u'test.txt',
u'output_name': u'output0',
u'peek': u'<table cellspacing="0" cellpadding="3"></table>',
u'purged': False,
u'state': u'queued',
u'tags': [],
u'update_time': u'2015-07-28T17:57:43.544407',
u'uuid': u'2cbe8f0a-4019-47c4-87e2-005ce35b8449',
u'visible': True}]}
Methods for accessing Data Libraries are grouped under GalaxyInstance.libraries.*. Most Data Library methods are available to all users, but as only administrators can create new Data Libraries within Galaxy, the create_folder and create_library methods can only be called using an API key belonging to an admin account.
We can view the Data Libraries available to our account using:
>>> gi.libraries.get_libraries()
[{u'id': u'8e6f930d00d123ea',
u'name': u'RNA-seq workshop data',
u'url': u'/api/libraries/8e6f930d00d123ea'},
{u'id': u'f740ab636b360a70',
u'name': u'1000 genomes',
u'url': u'/api/libraries/f740ab636b360a70'}]
This gives a list of metadata dictionaries with basic information on each library. We can get more information on a particular Data Library by passing its id to the show_library method:
>>> gi.libraries.show_library('8e6f930d00d123ea')
{u'contents_url': u'/api/libraries/8e6f930d00d123ea/contents',
u'description': u'RNA-Seq workshop data',
u'name': u'RNA-Seq',
u'synopsis': u'Data for the RNA-Seq tutorial'}
We can get files into Data Libraries in several ways: by uploading from our local machine, by retrieving from a URL, by passing the new file content directly into the method, or by importing a file from the filesystem on the Galaxy server.
For instance, to upload a file from our machine we might call:
>>> gi.libraries.upload_file_from_local_path('8e6f930d00d123ea', '/local/path/to/mydata.fastq', file_type='fastqsanger')
Note that we have provided the id of the destination Data Library, and in this case we have specified the type that Galaxy should assign to the new dataset. The default value for file_type is 'auto', in which case Galaxy will attempt to guess the dataset type.
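For the other upload routes mentioned above, a hedged sketch follows; the URL and pasted content are placeholders, and the destination library id is the one from the example:

# Upload by retrieving a file from a URL
gi.libraries.upload_file_from_url('8e6f930d00d123ea',
                                  'https://example.org/mydata.fastq',
                                  file_type='fastqsanger')
# Upload by passing the new file content directly
gi.libraries.upload_file_contents('8e6f930d00d123ea',
                                  '>seq1\nACGTACGT\n',
                                  file_type='fasta')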
Methods for accessing workflows are grouped under GalaxyInstance.workflows.*.
To get information on the Workflows currently in your account, use:
>>> gi.workflows.get_workflows()
[{u'id': u'e8b85ad72aefca86',
u'name': u"TopHat + cufflinks part 1",
u'url': u'/api/workflows/e8b85ad72aefca86'},
{u'id': u'b0631c44aa74526d',
u'name': u'CuffDiff',
u'url': u'/api/workflows/b0631c44aa74526d'}]
This returns a list of metadata dictionaries. We can get the details of a particular Workflow, including its steps, by passing its id to the show_workflow method:
>>> gi.workflows.show_workflow('e8b85ad72aefca86')
{u'id': u'e8b85ad72aefca86',
u'inputs':
{u'252':
{u'label': u'Input RNA-seq fastq',
u'value': u''
}
},
u'name': u"TopHat + cufflinks part 1",
u'steps':
{u'250':
{u'id': 250,
u'input_steps':
{u'input1':
{u'source_step': 252,
u'step_output': u'output'
}
},
u'tool_id': u'tophat',
u'type': u'tool'
},
u'251':
{u'id': 251,
u'input_steps':
{u'input':
{u'source_step': 250,
u'step_output': u'accepted_hits'
}
},
u'tool_id': u'cufflinks',
u'type': u'tool'
},
u'252':
{u'id': 252,
u'input_steps': {},
u'tool_id': None,
u'type': u'data_input'
}
},
u'url': u'/api/workflows/e8b85ad72aefca86'
}
Workflows can be exported from or imported into Galaxy as JSON. This makes it possible to archive Workflows, or to move them between Galaxy instances.
To export a workflow, we can call:
>>> workflow_string = gi.workflows.export_workflow_json('e8b85ad72aefca86')
This gives us a (rather long) string with a JSON-encoded representation of the Workflow. We can import this string as a new Workflow with:
>>> gi.workflows.import_workflow_json(workflow_string)
{u'id': u'c0bacafdfe211f9a',
u'name': u'TopHat + cufflinks part 1 (imported from API)',
u'url': u'/api/workflows/c0bacafdfe211f9a'}
This call returns a dictionary containing basic metadata on the new Workflow object. Since in this case we have imported the JSON string into the original Galaxy instance, we now have a duplicate of the original Workflow in our account:
>>> gi.workflows.get_workflows()
[{u'id': u'c0bacafdfe211f9a',
u'name': u'TopHat + cufflinks part 1 (imported from API)',
u'url': u'/api/workflows/c0bacafdfe211f9a'},
{u'id': u'e8b85ad72aefca86',
u'name': u"TopHat + cufflinks part 1",
u'url': u'/api/workflows/e8b85ad72aefca86'},
{u'id': u'b0631c44aa74526d',
u'name': u'CuffDiff',
u'url': u'/api/workflows/b0631c44aa74526d'}]
Instead of using JSON strings directly, Workflows can be exported to or imported from files on the local disk using the export_workflow_to_local_path and import_workflow_from_local_path methods. See the API reference for details.
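A hedged sketch of the file-based variants; the export directory and import path are placeholders:

# Write the workflow JSON to a local file (named after the workflow by default),
# then re-import a workflow file as a new workflow
gi.workflows.export_workflow_to_local_path('e8b85ad72aefca86', '/tmp/',
                                           use_default_filename=True)
gi.workflows.import_workflow_from_local_path('/tmp/exported_workflow.ga')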
To run a Workflow, we need to tell Galaxy which datasets to use for which workflow inputs. We can use datasets from Histories or Data Libraries.
Examine the Workflow above. We can see that it takes only one input file. That is:
>>> wf = gi.workflows.show_workflow('e8b85ad72aefca86')
>>> wf['inputs']
{u'252': {u'label': u'Input RNA-seq fastq',
          u'value': u''}}
There is one input, labelled 'Input RNA-seq fastq'. This input is passed to the Tophat tool and should be a fastq file. We will use the dataset we examined above, under View Histories and Datasets, which had name 'C1_R2_1.chr4.fq' and id '10a4b652da44e82a'.
To specify the inputs, we build a data map and pass this to the run_workflow method. This data map is a nested dictionary object which maps inputs to datasets. We call:
>>> datamap = dict()
>>> datamap['252'] = {'src': 'hda', 'id': '10a4b652da44e82a'}
>>> gi.workflows.run_workflow('e8b85ad72aefca86', datamap, history_name='New output history')
{u'history': u'0a7b7992a7cabaec',
u'outputs': [u'33be8ad9917d9207',
u'fbee1c2dc793c114',
u'85866441984f9e28',
u'1c51aa78d3742386',
u'a68e8770e52d03b4',
u'c54baf809e3036ac',
u'ba0db8ce6cd1fe8f',
u'c019e4cf08b2ac94'
] }
In this case the only input id is '252' and the corresponding dataset id is '10a4b652da44e82a'. We have specified the dataset source to be 'hda' (HistoryDatasetAssociation) since the dataset is stored in a History. See the API reference for allowed dataset specifications. We have also requested that a new History be created and used to store the results of the run, by setting history_name='New output history'.
The run_workflow call submits all the jobs which need to be run to the Galaxy workflow engine, with the appropriate dependencies so that they will run in order. The call returns immediately, so we can continue to submit new jobs while waiting for this workflow to execute. run_workflow returns the id of the output History and of the datasets that will be created as a result of this run. Note that these dataset ids are valid immediately, so we can specify these datasets as inputs to new jobs even before the files have been created, and the new jobs will be added to the queue with the appropriate dependencies.
If we view the output History immediately after calling run_workflow, we will see something like:
>>> gi.histories.show_history('0a7b7992a7cabaec')
{u'annotation': u'',
u'contents_url': u'/api/histories/0a7b7992a7cabaec/contents',
u'id': u'0a7b7992a7cabaec',
u'name': u'New output history',
u'nice_size': u'0 bytes',
u'state': u'queued',
u'state_details': {u'discarded': 0,
u'empty': 0,
u'error': 0,
u'failed_metadata': 0,
u'new': 0,
u'ok': 0,
u'paused': 0,
u'queued': 8,
u'running': 0,
u'setting_metadata': 0,
u'upload': 0},
u'state_ids': {u'discarded': [],
u'empty': [],
u'error': [],
u'failed_metadata': [],
u'new': [],
u'ok': [],
u'paused': [],
u'queued': [u'33be8ad9917d9207',
u'fbee1c2dc793c114',
u'85866441984f9e28',
u'1c51aa78d3742386',
u'a68e8770e52d03b4',
u'c54baf809e3036ac',
u'ba0db8ce6cd1fe8f',
u'c019e4cf08b2ac94'],
u'running': [],
u'setting_metadata': [],
u'upload': []
} }
In this case, because the submitted jobs have not had time to run, the output History contains 8 datasets in the 'queued' state and has a total size of 0 bytes. If we make this call again later we should instead see completed output files.
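A hedged sketch of waiting for the run to finish by polling the output History's state; the history id is from the example above and the polling interval is arbitrary:

import time

# Poll until every dataset in the output history has been processed
while gi.histories.show_history('0a7b7992a7cabaec')['state'] not in ('ok', 'error'):
    time.sleep(10)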
Methods for managing users are grouped under GalaxyInstance.users.*. User management is only available to Galaxy administrators, that is, the API key used to connect to Galaxy must be that of an admin account.
To get a list of users, call:
>>> gi.users.get_users()
[{u'email': u'userA@unimelb.edu.au',
u'id': u'975a9ce09b49502a',
u'quota_percent': None,
u'url': u'/api/users/975a9ce09b49502a'},
{u'email': u'userB@student.unimelb.edu.au',
u'id': u'0193a95acf427d2c',
u'quota_percent': None,
u'url': u'/api/users/0193a95acf427d2c'}]
API used to interact with the Galaxy Toolshed, including repository management.
After you have created a ToolShedInstance object, access various modules via its class fields (see the source for the most up-to-date list): repositories is the minimum set supported. For example, to work with repositories and get a list of all the public repositories, the following should be done:
from bioblend import toolshed
ts = toolshed.ToolShedInstance(url='http://testtoolshed.g2.bx.psu.edu')
rl = ts.repositories.get_repositories()
tools = ts.tools.search_tools('fastq')
Interaction with a Tool Shed instance repositories
{
"deleted": false,
"deprecated": false,
"description": "new_synopsis",
"homepage_url": "https://github.com/galaxyproject/",
"id": "8cf91205f2f737f4",
"long_description": "this is some repository",
"model_class": "Repository",
"name": "new_repo_17",
"owner": "qqqqqq",
"private": false,
"remote_repository_url": "https://github.com/galaxyproject/tools-devteam",
"times_downloaded": 0,
"type": "unrestricted",
"user_id": "adb5f5c93f827949" }
[{u'deleted': False,
u'description': u'Tools for manipulating data',
u'id': u'175812cd7caaf439',
u'model_class': u'Category',
u'name': u'Text Manipulation',
u'url': u'/api/categories/175812cd7caaf439'},]
New in version 0.5.2.
[{u'times_downloaded': 0, u'user_id': u'5cefd48bc04af6d4', u'description': u'Order Contigs', u'deleted': False, u'deprecated': False, u'private': False, u'url': u'/api/repositories/287bd69f724b99ce', u'owner': u'billybob', u'id': u'287bd69f724b99ce', u'name': u'best_tool_ever'}]
Changed in version 0.4.1: Changed method name from get_tools to get_repositories to better align with the Tool Shed concepts
For example:
[{u'times_downloaded': 269, u'user_id': u'1de29d50c3c44272', u'description': u'Galaxy Freebayes Bayesian genetic variant detector tool', u'deleted': False, u'deprecated': False, u'private': False, u'long_description': u'Galaxy Freebayes Bayesian genetic variant detector tool originally included in the Galaxy code distribution but migrated to the tool shed.', u'url': u'/api/repositories/491b7a3fddf9366f', u'owner': u'devteam', u'id': u'491b7a3fddf9366f', u'name': u'freebayes'}, {u'repository_id': u'491b7a3fddf9366f', u'has_repository_dependencies': False, u'includes_tools_for_display_in_tool_panel': True, u'url': u'/api/repository_revisions/504be8aaa652c154', u'malicious': False, u'includes_workflows': False, u'downloadable': True, u'includes_tools': True, u'changeset_revision': u'd291dc763c4c', u'id': u'504be8aaa652c154', u'includes_tool_dependencies': True, u'includes_datatypes': False}, {u'freebayes': [u'Galaxy Freebayes Bayesian genetic variant detector tool', u'http://takadonet@toolshed.g2.bx.psu.edu/repos/devteam/freebayes', u'd291dc763c4c', u'9', u'devteam', {}, {u'freebayes/0.9.6_9608597d12e127c847ae03aa03440ab63992fedf': {u'repository_name': u'freebayes', u'name': u'freebayes', u'readme': u'FreeBayes requires g++ and the standard C and C++ development libraries. Additionally, cmake is required for building the BamTools API.', u'version': u'0.9.6_9608597d12e127c847ae03aa03440ab63992fedf', u'repository_owner': u'devteam', u'changeset_revision': u'd291dc763c4c', u'type': u'package'}, u'samtools/0.1.18': {u'repository_name': u'freebayes', u'name': u'samtools', u'readme': u'Compiling SAMtools requires the ncurses and zlib development libraries.', u'version': u'0.1.18', u'repository_owner': u'devteam', u'changeset_revision': u'd291dc763c4c', u'type': u'package'}}]}]
[{u'repository_id': u'78f2604ff5e65707', u'has_repository_dependencies': False, u'includes_tools_for_display_in_tool_panel': True, u'url': u'/api/repository_revisions/92250afff777a169', u'malicious': False, u'includes_workflows': False, u'downloadable': True, u'includes_tools': True, u'changeset_revision': u'6e26c5a48e9a', u'id': u'92250afff777a169', u'includes_tool_dependencies': False, u'includes_datatypes': False}, {u'repository_id': u'f9662009da7bfce0', u'has_repository_dependencies': False, u'includes_tools_for_display_in_tool_panel': True, u'url': u'/api/repository_revisions/d3823c748ae2205d', u'malicious': False, u'includes_workflows': False, u'downloadable': True, u'includes_tools': True, u'changeset_revision': u'15a54fa11ad7', u'id': u'd3823c748ae2205d', u'includes_tool_dependencies': False, u'includes_datatypes': False}]
(truncated example output: a repository search result listing the matching repositories with relevance scores of 4.92 and 4.14, together with u'hostname': u'https://testtoolshed.g2.bx.psu.edu/', u'page': u'1', u'page_size': u'2', u'total_results': u'64')
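A hedged sketch of the search call that produces output like the truncated example above, assuming ts is a ToolShedInstance as created earlier; the query string and page size are placeholders:

# Free-text search for repositories in the Tool Shed, two results per page
results = ts.repositories.search_repositories('fasta', page_size=2)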
{u'times_downloaded': 0, u'user_id': u'5cefd48bc04af6d4', u'description': u'Order Contigs', u'deleted': False, u'deprecated': False, u'private': False, u'url': u'/api/repositories/287bd69f724b99ce', u'owner': u'billybob', u'id': u'287bd69f724b99ce', u'name': u'best_tool_ever'}
Changed in version 0.4.1: Changed method name from show_tool to show_repository to better align with the Tool Shed concepts
{u'repository_id': u'491b7a3fddf9366f',
u'has_repository_dependencies': False,
u'includes_tools_for_display_in_tool_panel': True,
u'test_install_error': False,
u'url': u'/api/repository_revisions/504be8aaa652c154',
u'malicious': False,
u'includes_workflows': False,
u'id': u'504be8aaa652c154',
u'do_not_test': False,
u'downloadable': True,
u'includes_tools': True,
u'tool_test_results': {u'missing_test_components': [], ...},
u'includes_datatypes': False}
For example, a successful upload will look like:
{u'content_alert': u'', u'message': u''}
New in version 0.5.2.
BioBlend allows library-wide configuration to be set in external files. These configuration files can be used to specify access keys, for example.
If you'd like to do more than just a mock test, you'll want to point BioBlend to an instance of Galaxy. Do so by exporting the following two variables:
$ export BIOBLEND_GALAXY_URL=http://127.0.0.1:8080
$ export BIOBLEND_GALAXY_API_KEY=<API key>
The unit tests, stored in the tests folder, can be run using nose. From the project root:
$ nosetests
If you've run into issues, found a bug, or can't seem to find an answer to your question regarding the use and functionality of BioBlend, please use the GitHub Issues page to ask your question.
Links to other documentation and libraries relevant to this library:
Enis Afgan
2012-2016, Enis Afgan
July 6, 2016 | 0.7.0 |