Astroquery API Specification¶
Service Class¶
The query tools will be implemented as class methods, so that the standard approach for a given web service (e.g., IRSA, UKIDSS, SIMBAD) will be
from astroquery.service import Service
result = Service.query_object('M 31')
for services that do not require login, and
from astroquery.service import Service
S = Service(user='username',password='password')
result = S.query_object('M 31')
for services that do.
Query Methods¶
The classes will have the following methods where appropriate:
query_object(objectname, ...)
query_region(coordinate, radius=, width=)
get_images(coordinate)
They may also have other methods for querying non-standard data types
(e.g., ADS queries that may return a bibtex
text block).
query_object¶
query_object
is only needed for services that are capable of parsing an
object name (e.g., SIMBAD, Vizier, NED), otherwise query_region
is an
adequate approach, as any name can be converted to a coordinate via the SIMBAD
name parser.
query_region¶
Query a region around a coordinate.
One of these keywords must be specified (no default is assumed):
radius - an astropy Quantity object, or a string that can be parsed into one.
e.g., '1 degree' or 1*u.degree.
If radius is specified, the shape is assumed to be a circle
width - a Quantity. Specifies the edge length of a square box
height - a Quantity. Specifies the height of a rectangular box. Must be passed with width.
Returns a Table
.
get_images¶
Perform a coordinate-based query to acquire images.
Returns a list of HDUList
objects.
Shape keywords are optional - some query services allow searches for images that overlap with a specified coordinate.
(query)_async¶
Includes get_images_async
, query_region_async
, query_object_async
Same as the above query tools, but returns a list of readable file objects instead of a parsed
object so that the data is not downloaded until result.get_data()
is run.
Common Keywords¶
These keywords are common to all query methods:
return_query_payload - Return the POST data that will be submitted as a dictionary
savename - [optional - see discussion below] File path to save the downloaded query to
timeout - timeout in seconds
Asynchronous Queries¶
Some services require asynchronous query submission & download, e.g. Besancon, the NRAO Archive, the Fermi archive, etc. The data needs to be “staged” on the remote server before it can be downloaded. For these queries, the approach is
result = Service.query_region_async(coordinate)
data = result.get_data()
# this will periodically check whether the data is available at the specified URL
Additionally, any service can be queried asynchronously - get_images_async
will return readable objects that can be downloaded at a later time.
Outline of an Example Module¶
Directory Structure:
module/
module/__init__.py
module/core.py
module/tests/test_module.py
__init__.py
contains:
from astropy import config as _config
class Conf(_config.ConfigNamespace):
"""
Configuration parameters for `astroquery.template_module`.
"""
server = _config.ConfigItem(
['http://dummy_server_mirror_1',
'http://dummy_server_mirror_2',
'http://dummy_server_mirror_n'],
'Name of the template_module server to use.'
)
timeout = _config.ConfigItem(
30,
'Time limit for connecting to template_module server.'
)
conf = Conf()
from .core import QueryClass
__all__ = ['QueryClass', 'conf']
core.py
contains:
from ..utils.class_or_instance import class_or_instance
from ..utils import async_to_sync
from . import conf
__all__ = ['QueryClass'] # specifies what to import
@async_to_sync
class QueryClass(astroquery.query.BaseQuery):
server = conf.server
def __init__(self, *args):
""" set some parameters """
# do login here
pass
@class_or_instance
def query_region_async(self, *args, get_query_payload=False):
request_payload = self._args_to_payload(*args)
response = self._request(method="POST", url=self.server,
data=request_payload, timeout=TIMEOUT)
# primarily for debug purposes, but also useful if you want to send
# someone a URL linking directly to the data
if get_query_payload:
return request_payload
return response
@class_or_instance
def get_images_async(self, *args):
image_urls = self.get_image_list(*args)
return [get_readable_fileobj(U) for U in image_urls]
# get_readable_fileobj returns need a "get_data()" method?
@class_or_instance
def get_image_list(self, *args):
request_payload = self._args_to_payload(*args)
response = self._request(method="POST", url=self.server,
data=request_payload, timeout=TIMEOUT)
return self.extract_image_urls(response.text)
def _parse_result(self, result):
# do something, probably with regexp's
return astropy.table.Table(tabular_data)
def _args_to_payload(self, *args):
# convert arguments to a valid requests payload
return dict
Parallel Queries¶
For multiple parallel queries logged in to the same object, you could do:
from astroquery.module import QueryClass
QC = QueryClass(login_information)
results = parallel_map(QC.query_object, ['m31', 'm51', 'm17'],
radius=['1"', '1"', '1"'])
results = [QC.query_object_async(obj, radius=r)
for obj,r in zip(['m31', 'm51', 'm17'], ['1"', '1"', '1"'])]
Here parallel_map()
is a parallel implementation of some map function.
Exceptions¶
What errors should be thrown if queries fail? Failed queries should raise a custom Exception that will include the full html (or xml) of the failure, but where possible should parse the web page’s error message into something useful.
How should timeouts be handled? Timeouts should raise a
TimeoutError
.
Examples¶
Standard usage should be along these lines:
from astroquery.simbad import Simbad
result = Simbad.query_object("M 31")
# returns astropy.Table object
from astroquery.ipac.irsa import Irsa
images = Irsa.get_images("M 31","5 arcmin")
# searches for images in a 5-arcminute circle around M 31
# returns list of HDU objects
images = Irsa.get_images("M 31")
# searches for images overlapping with the SIMBAD position of M 31, if supported by the service?
# returns list of HDU objects
from astroquery.ukidss import Ukidss
Ukidss.login(username, password)
result = Ukidss.query_region("5.0 0.0 gal", catalog='GPS')
# FAILS: no radius specified!
result = Ukidss.query_region("5.0 0.0 gal", catalog='GPS', radius=1)
# FAILS: no assumed units!
result = Ukidss.query_region("5.0 0.0 gal", catalog='GPS', radius='1 arcmin')
# SUCCEEDS! returns an astropy.Table
from astropy.coordinates import SkyCoord
import astropy.units as u
result = Ukidss.query_region(
SkyCoord(5,0,unit=('deg','deg'), frame='galactic'),
catalog='GPS', region='circle', radius=5*u.arcmin)
# returns an astropy.Table
from astroquery.nist import Nist
hydrogen = Nist.query(4000*u.AA, 7000*u.AA, linename='H I', energy_level_unit='eV')
# returns an astropy.Table
For tools in which multiple catalogs can be queried, e.g. as in the UKIDSS
examples, they must be specified. There should also be a list_catalogs
function that returns a list
of catalog name strings:
print(Ukidss.list_catalogs())
Unparseable Data¶
If data cannot be parsed into its expected form (Table
, astropy.io.fits.PrimaryHDU
),
the raw unparsed data will be returned and a Warning
issued.