Querying data¶
Querying sequences to determine allele identity¶
Sequence queries are performed in the sequence definition database. Click ‘Sequence query’ from the contents page.
Paste your sequence in to the box - there is no need to trim. Normally, you can leave the locus setting on ‘All loci’ - the software should identify the correct locus based on your sequence. Sometimes, it may be quicker, however, to select the specific locus or scheme (e.g. MLST) that a locus belongs to.
Note
If the locus you are querying is a shorter version of another, e.g. an MLST fragment of a gene where the full length gene is also defined, you will need to select the specific locus or the scheme from the dropdown box. Leaving the selection on ‘All loci’ will return a match to the longer sequence in preference to the shorter one.
Click ‘Submit’.
If an exact match is found, this will be indicated along with the start position of the locus within your sequence.
If only a partial match is found, the most similar allele is identified along with any nucleotide differences. The varying nucleotide positions are numbered both relative to the pasted in sequence and to the reference sequence. The start position of the locus within your sequence is also indicated.
As an alternative to pasting a sequence in to the box, you can also choose to upload sequences in FASTA format by clicking the file browse button.
Querying whole genome data¶
The sequence query is not limited to single genes. You can also paste or upload whole genomes - these can be in multiple contigs. If you select a specific scheme from the dropdown box, all loci belonging to that scheme will be checked (although only exact matches are reported for a locus if one of the other loci has an exact match). If all loci are matched, scheme fields will also be returned if these are defined. This, for example, enables you to identify the MLST sequence type of a genome in one step.
Searching for specific allele definitions¶
To retrieve specific allele designations, click ‘Sequence attribute search’ on a sequence definition database contents page.
Enter your query using the dropdown search box - additional terms can be added by clicking the ‘+’ button.
Designations can be queried using standard operators.
Click submit.
Click the hyperlinked results to display allele records.
Various search criteria can also be selected by combining with filters. Click the filter heading to display these.
Locus-specific attributes¶
Some loci have extended attribute fields. To query these, you need to navigate to a locus-specific query page, by clicking the ‘locus-specific query’ link on the sequence attribute search page.
Pick the required locus from the dropdown box.
The fields specific for that locus will be added to the dropdown query boxes.
Browsing scheme profile definitions¶
If a sequence definition database has schemes defined that include a primary key field, i.e. collections of loci that together create profiles, e.g. for MLST, click the link to ‘Browse profiles’.
Choose the field to order the results by, the number of results per page to display, and click ‘Browse all records’.
Clicking the hyperlink for any profile will display full information about the profile.
Querying scheme profile definitions¶
click the link to ‘Search profiles’ for the appropriate scheme on the main contents page.
Enter the search criteria you wish to search on. You may also see some drop-down list boxes that allow further filtering of results. You can add search criteria by clicking the ‘+’ button in the ‘Locus/scheme fields’ section. These can be combined using ‘AND’ or ‘OR’.
Each field can be queried using standard operators.
Clicking the hyperlink for any profile will display full information about the profile.
Investigating allele differences¶
Sequence similarity¶
To find sequences most similar to a selected allele within a sequence definition database, click ‘Sequence similarity’ on the contents page.
Enter the locus and allele identifer of the sequence to investigate and the number of nearest matches you’d like to see, then press submit.
A list of nearest alleles will be displayed, along with the percentage identity and number of gaps between the sequences.
Click the appropriate ‘Compare’ button to display a list of nucleotide differences and/or a sequence alignment.
Sequence comparison¶
To directly compare two sequences click ‘Sequence comparison’ from the contents page of a sequence definition database.
Enter the locus and two allele identifiers to compare. Press submit.
A list of nucleotide differences and/or an alignment will be displayed.
See also
Querying isolate data¶
The ‘Search database’ page of an isolate database allows you to search by combinations of provenance criteria, scheme and locus data, and more.
To start with, only one provenance field search box is displayed but more can be added by clicking the ‘+’ button (highlighted). These can be linked together by ‘and’ or ‘or’.
After the search has been submitted, the results will be displayed in a table.
Each field can be queried using standard operators.
More search features are available by clicking the ‘Modify form options’ tab on the right-hand side of the screen.
A tab will be displayed. Different options will be available here depending on the database. Queries will be combined from the values entered in all form sections. Possible options are:
Allele designations/scheme field values
Search by combination of allele designations and/or scheme fields e.g. ST, clonal complex information.
Allele designation status
Search by whether allele designation status is confirmed or provisional.
Tagged sequence status
Search by whether tagged sequence data is available for a locus. You can also search by sequence flags.
Filters
Various filters may be available, including
Publications
Projects
MLST profile completion status
Clonal complex
Sequence bin size
Inclusion/exclusion of old versions
If the interface is modified, a button to ‘Save options’ becomes available within the tab. If this is clicked, the modified form will be displayed the next time you go to the query page.
Query by allele_designation/scheme field¶
Queries can be combined with allele designation/scheme field values.
Make sure that the allele designation/scheme field values fieldset is displayed by selecting it in the ‘Modify form options’ tab.
Designations can be queried using standard operators.
Additional search terms can be combined using the ‘+’ button.
Add your search terms and click ‘Submit’. Allele designation/scheme field queries will be combined with terms entered in other sections.
Query by allele designation status¶
Allele designations can be queried based on their status, i.e. whether they are confirmed or provisional. Queries will be combined from the values entered in all form sections.
Make sure that the allele designation staus fieldset is displayed by selecting it in the ‘Modify form options’ tab.
Select a locus from the dropdown box and either ‘provisional’ or ‘confirmed’. Additional query fields can be displayed by clicking the ‘+’ button. Click ‘Submit’.
Provisional allele designations are marked within the results tables with a pink background. Any scheme field designations that depend on the allele in question, e.g. a MLST ST, will also be marked as provisional.
Query by tagged sequence status¶
Sequence tags identify the region of a contig within an isolate’s sequence bin entries that correspond to a particular locus. The presence or absence of these tags can be queried as can whether or not the sequence has an a flag associated with. These flags designate specific characteristics of the sequences. Queries will be combined from the values entered in all form sections.
Make sure that the tagged sequences status fieldset is displayed by selecting it in the ‘Modify form options’ tab.
Select a specific locus in the dropdown box (or alternatively ‘any locus’) and a status. Available status values are:
untagged
The locus has not been tagged within the sequence bin.
tagged
The locus has been tagged within the sequence bin.
complete
The locus sequence is complete.
incomplete
The locus sequence is incomplete - normally because it continues beyond the end of a contig.
flagged: any
The sequence for the locus has a flag set.
flagged: none
The sequence for the locus does not have a flag set.
flagged: <specific flag>
The sequence for the locus has the specific flag chosen.
See also
Query filters¶
There are various filters that can additionally be applied to queries, or the filters can be applied solely on their own so that they filter the entire database.
Make sure that the filters fieldset is displayed by selecting it in the ‘Modify form options’ tab.
The filters displayed will depend on the database and what has been defined within it. Common filters are:
Publication - Select one or more publication that has been linked to isolate records.
Project - Select one or more project that isolates belong to.
Profile completion - This is commonly displayed for MLST schemes. Available options are:
complete - All loci of the scheme have alleles designated.
incomplete - One or more loci have not yet been designated.
partial - The scheme is incomplete, but at least one locus has an allele designated.
started - At least one locus has an allele designated. The scheme mat be complete or partial.
not started - The scheme has no loci with alleles designated.
Sequence bin - Specify whether any sequence data has been associated with a record. Specific threshold values may be selected if these have been set up for the database.
Provenance fields - Dropdown list boxes of values for specific provenance fields may be present if set for the database. Users can choose to add additional filters.
Querying by allelic profile¶
If a scheme, such as MLST, has been defined for either an isolate database it is possible to query the database against complete or partial allelic profiles. Even if no scheme is defined, queries can be made against all loci. This can also be done in sequence definition databases if the scheme has a primary key field defined.
On the index page, click ‘Search by combinations of loci (profiles)’ for any defined scheme. Enter either a partial (any combination of loci) or complete profile.
If multiple schemes are defined, you may have to select the scheme you wish to query in the ‘Schemes’ dropdown box and click ‘Select’.
Enter the combination of alleles that you want to query for. Fields can be left blank.
Alternatively, for scheme profiles, you can enter a primary key value (e.g. ST) and select ‘Autofill’ to automatically fill in the associated profile.
Select the number of loci that you’d like to match in the options dropdown box. Available options are:
Exact or nearest match
Exact match only
x or more matches
y or more matches
z or more matches
Where x,y, and z will range from n-1 to 1 where n is the number of loci in the scheme.
Click ‘Submit’.
Retrieving list of isolates or profiles¶
Both isolate and sequence definition databases can be queried against a list of values matching any criteria (isolate provenace fields, alleles, or scheme fields).
Click ‘List query’ on the main contents page.
Select the attribute you wish to search against in the drop-down list box and enter the list of attributes in the box (one per line). Click ‘Submit’.
Retrieving isolates by linked publication¶
Click ‘Publications’ in the Breakdown section of the contents page.
A list of publications linked by isolates within the database will be displayed.
These can be filtered by author and/or year, and the sort order changed.
To display the isolate records for any of the displayed publications, click the button to the right of the citation.
The abstract of the paper will be displayed (if available), along with all isolates linked to it.
User-configurable options¶
The BIGSdb user interface is configurable in a number of ways. Choices made are remembered between sessions. If the database requires you to log on, the options are associated with your user account, whereas if it is a public database, that you haven’t logged in to, the options are associated with a browser cookie so they will be remembered if you connect from the same computer (using the same browser).
Most options are set by clicking the ‘Set general options’ link on the database contents page. Most of the available options are visible for isolate databases, whereas sequence definition databases have fewer available.
General options¶
The general options tab is displayed by default. If another tab is being shown, click the ‘General options’ header.
The general tab allows the following options to be modified:
Records per page
Page bar position
Nucleotides per line - Some analyses display sequence alignments. This option allows you to set the width of these alignments so suit your display.
Flanking sequence length - This sets the length of flanking sequence upstream and downstream of a particular locus that is included whenever a sequence is displayed. Flanking sequences are displayed fainter that the locus sequence.
Locus aliases - Loci can have multiple names (aliases). Setting this option will display all alternative names in results tables.
Tooltips (beginner’s mode) - Most query forms have help available in the form of information tooltips. These can be switched on/off here. They can also be toggled off by clicking the Toggle: ‘i’ button at the top-right of the display of some pages.
Click ‘Set options’ to remember any changes you make.
Main results table¶
The ‘main results table’ tab contains options for the display of paged results following a query.
Click the ‘Main results table’ header to display the tab.
The ‘main results table’ tab will scroll up.
This tab allows the following options to be modified:
Hyperlink allele designations - Hyperlinks point to an information page about the particular allele definition. Depending on the locus, these may exist on a different website.
Differentiate provisional allele designations - Allele designations can be set as confirmed or provisional, usually depending on the method of assignment. Selecting this option will display provisional designations in a different colour to confirmed designations.
Sequence bin records - Displays a tooltip linking to the sequence tag if available.
Sequence bin size - Displays the size of the sum of all contigs associated with each isolate record.
Isolate record display¶
The ‘isolate record display’ tab contains options for the display of a full isolate record.
Click the ‘Isolate record display’ tab to display the tab.
The ‘Isolate record display’ tab will scroll up.
This tab allows the following options to be modified:
Differentiate provisional allele designations - Allele designations can be set as confirmed or provisional, usually depending on the method of assignment. Selecting this option will display provisional designations in a different colour to confirmed designations.
Display sender, curator and last updated records - Displays a tooltip containing sender information next to each allele designation.
Sequence bin information - Displays a tooltip with information about the position of the sequence if tagged within the sequence bin.
Allele flags - Displays information about whether alleles have flags defined in sequence definition databases.
Display full information about sample records - Used when the database is used as part of a basic laboratory information management system (LIMS). This option will display records of samples available for the displayed isolate.
Provenance field display¶
The ‘provenance field display’ tab contains checkboxes for fields to display in the main results table.
Click the ‘Provenance field display’ tab to display the tab.
The ‘Provenance field display’ tab will scroll up.
Some fields will be checked by default - these are defined during database setup (maindisplay option).
Check any fields that you wish to be displayed and then click ‘Set options’. You can return to the default selection by clicking ‘Default’ followed by ‘Set options’.
Query filters¶
The ‘query filters’ tab contains checkboxes for provenance fields and scheme completion status. Checking these results in drop-down list box filters appearing in the query page filters fieldset.
Click the ‘Query filters’ tab to display the tab.
The ‘Query filters’ tab will scroll up.
A list of possible filters appears. Click any checkbox for a filter you would like to make available. Click ‘Set options’ when done. You can return to the default selection by clicking ‘Default’ followed by ‘Set options’.
Modifying locus and scheme display options¶
Whether or not loci, schemes or scheme fields are displayed in result tables, isolate records, or within query dropdown boxes can all be set with default options when first defined. These attributes can, however, be overridden by a user, and these selections will be remembered between sessions.
The procedure to modify these attributes is the same for locus, schemes or scheme fields, so the steps for loci will be demonstrated only.
Click the appropriate link on the isolate contents page.
Either select the locus id by querying for it directly.
Designations can be queried using standard operators.
Alternatively, you can search by filtering loci by schemes. Click the ‘Filter query by’ header and select the scheme in the dropdown box.
Once loci have been selected, click Customize ‘locus options’.
You can then choose to add or remove individual loci from the selection by clicking the appropriate checkboxes. At the bottom of the page are a number of attributes that you can change - clicking ‘Change’ will affect all selected loci.
Possible options for loci are:
isolate_display - Sets how the locus is displayed within an isolate record:
allele only - display only identifier
sequence - display the full sequence
hide - don’t show at all
main_display - Sets whether the locus is displayed in the main results table following a query.
query_field - Sets whether the locus appears in dropdown list boxes to be used within queries.
analysis - Sets whether the locus can be used in data analysis functions.
Note
Settings for loci can be overridden by those set for schemes that they are members of. For example, if you set a locus to be displayed within a main results table, but that locus is a member of a scheme and you set that scheme not to be displayed, then the locus will not be shown. Conversely, if you set a scheme to be displayed, but set its member locus not to be shown, then that locus will not be displayed (but other loci and scheme fields may be, depending on their independent settings).