##############
Database setup
##############

There are two types of BIGSdb database:

* sequence definition databases, containing

  * allele sequences and their identifiers
  * scheme data, e.g. MLST profile definitions

* isolate databases, containing

  * isolate provenance metadata
  * genome sequences
  * allele designations for loci defined in sequence definition databases.

These two types of database are independent but linked. A single isolate
database can communicate with multiple sequence definition databases and vice
versa. Different access restrictions can be placed on different databases.

Databases are described in XML files telling BIGSdb everything it needs to know
about them. Isolate databases can have any fields defined for the isolate
table, allowing customisation of metadata - these fields are described in the
XML file (config.xml) and must match the fields defined in the database itself.

******************
Creating databases
******************

There are templates available for the sequence definition and isolate
databases. These are SQL scripts found in the sql directory. To create a
database, you will need to log in as the postgres user and use these templates.

For example, to create a new sequence definition database called
bigsdb_test_seqdef, navigate to the sql directory and log in as the postgres
user, e.g. ::

  sudo su postgres

then ::

  createdb bigsdb_test_seqdef
  psql -f seqdef.sql bigsdb_test_seqdef

Create an isolate database the same way: ::

  createdb bigsdb_test_isolates
  psql -f isolatedb.sql bigsdb_test_isolates

The standard fields in the isolate table are limited to essential fields
required by the system. To add new fields, you need to log in to the database
and alter this table.
For example, to add fields for country and year, first log in to the newly
created isolate database as the postgres user: ::

  psql bigsdb_test_isolates

and alter the isolate table: ::

  ALTER TABLE isolates ADD country text;
  ALTER TABLE isolates ADD year int;

Remember that any fields added to the table need to be described in the
config.xml file for this database.

The xml directory of the software archive contains example XML files for
sequence definition and isolate databases (rename these to config.xml). The
isolates_config.xml file contains the minimum required isolate table fields and
matches the isolate table that will be generated using the isolatedb.sql SQL
script.

*******************************
Database-specific configuration
*******************************

Each BIGSdb database on a system has its own configuration directory, by
default in /etc/bigsdb/dbases. The database has a short configuration name used
to specify it in a web query and this matches the name of the configuration
sub-directory, e.g.
http://pubmlst.org/cgi-bin/bigsdb/bigsdb.pl?db=pubmlst_neisseria_isolates is
the URL of the front page of the PubMLST Neisseria isolate database whose
configuration settings are stored in
/etc/bigsdb/dbases/pubmlst_neisseria_isolates. This database sub-directory
contains a number of files (hyperlinks lead to the files used on the Neisseria
database):

* :download:`config.xml ` - the database configuration file. Fields defined
  here correspond to fields in the isolate table of the database.
* banner.html - optional file containing text that will appear as a banner
  within the database index pages. HTML markup can be used within this text.
* header.html - HTML markup that is inserted at the top of all pages. This can
  be used to set up site-specific menubars and logos.
* footer.html - HTML markup that is inserted at the bottom of all pages.
* curate_header.html - HTML markup that is inserted at the top of all curator's
  interface pages.
* curate_footer.html - HTML markup that is inserted at the bottom of all
  curator's interface pages.
* profile_submit.html - HTML markup for text that is inserted into the
  submission interface prior to profile submission finalization. This can be
  used to add specific instructions such as the requirement to make an isolate
  submission.
* allele_submit.html - HTML markup for text that is inserted into the
  submission interface prior to allele submission finalization. This can be
  used to add specific instructions such as the requirement to attach Sanger
  trace files.
* registration_success.txt - Text file containing message content to be used
  in an automated E-mail when granting access to a user who has requested
  access to the database using the site-wide account system (where
  auto-registration is not enabled).

The header and footer files can alternatively be placed in the root directory
of the web site, or in /etc/bigsdb, for site-wide use. If files exist in
multiple locations, they are used in the following order of preference:
database config directory > web root directory > /etc/bigsdb.

There are four additional files, site_header.html, site_footer.html,
curate_site_header.html and curate_site_footer.html which are used when either
bigsdb.pl or bigscurate.pl is called without a database configuration. These
should be placed in the root directory of the web site or in /etc/bigsdb.

You can also add HTML meta attributes (such as a favicon) by including a file
called meta.html in the database configuration directory. For example, to set a
favicon this file can contain something like the following: ::

These attributes will appear in the ``<head>`` section of the HTML page.

.. _xml:

***********************************************
XML configuration attributes used in config.xml
***********************************************

The following lists describe the attributes used in the config.xml file that
is used to describe databases.

.. _isolate_xml:

Isolate database XML attributes
===============================

Please note that the database structure described by the field elements must
match the physical structure of the database isolate table. Required attributes
are in **bold**::

  <db>

Top level element. Contains child elements: system and field.::

  <system>

Any value set here can be overridden in a :ref:`system.overrides file`.

* **authentication**

  * Method of authentication: either 'builtin' or 'apache'. See
    :ref:`user authentication `.

* **db**

  * Name of database on system.

* **dbtype**

  * Type of database: either 'isolates' or 'sequences'.

* **description**

  * Description of database used throughout interface (see also
    'formatted_description').

* align_limit

  * Overrides the sequence export record alignment limit in the Sequence Export
    plugin. Default: '200'.

* all_plugins

  * Enable all appropriate plugins for database: either 'yes' or 'no', default
    'no'.

* annotation

  * Semi-colon separated list of accession numbers with descriptions
    (separated by a \|), e.g.
    'AL157959|Z2491;AM421808|FAM18;NC_002946|FA 1090;NC_011035|NCCP11945;NC_014752|020-06'.
    Currently used only by Genome Comparator plugin.

* BLAST

  * Enable Blast plugin: either 'yes' or 'no'. If no value is set then the
    plugin will not be available unless the all_plugins attribute is set to
    'yes'. If the all_plugins attribute is set to 'yes', the Blast plugin can
    be disabled by setting this attribute to 'no'.

* BURST

  * Enable BURST plugin: either 'yes' or 'no'. If no value is set then the
    plugin will not be available unless the all_plugins attribute is set to
    'yes'. If the all_plugins attribute is set to 'yes', the BURST plugin can
    be disabled by setting this attribute to 'no'.

* cache_schemes

  * Enable automatic refreshing of scheme field caches when batch adding new
    isolates: either 'yes' or 'no', default 'no'.
  * See :ref:`scheme caching`.

* CodonUsage

  * Enable Codon Usage plugin: either 'yes' or 'no'. If no value is set then
    the plugin will not be available unless the all_plugins attribute is set to
    'yes'. If the all_plugins attribute is set to 'yes', the Codon Usage plugin
    can be disabled by setting this attribute to 'no'.

* codon_usage_limit

  * Overrides the record limit for the Codon Usage plugin. Default: '500'.

* contig_analysis_limit

  * Overrides the isolate number limit for the Contig Export plugin. Default:
    '1000'.

* ContigExport

  * Enable contig export plugin: either 'yes' or 'no'. If no value is set then
    the plugin will not be available unless the all_plugins attribute is set to
    'yes'. If the all_plugins attribute is set to 'yes', the contig export
    plugin can be disabled by setting this attribute to 'no'.

* curate_config

  * The database configuration that should be used for curation if different
    from the current configuration. This is used when the submission system is
    being used so that curation links in the 'Manage submissions' pages for
    curators load the correct database configuration.

* curate_link

  * URL to curator's interface, which can be relative or absolute. This will be
    used to create a link in the public interface dropdown menu.

* curate_path_includes

  * Partial path of the bigscurate.pl script used to curate the database. See
    user authentication.

* curate_script

  * Relative web path to curation script. Default 'bigscurate.pl' (version
    1.11+).
  * This is only needed if automated submissions are enabled. If bigscurate.pl
    is in a different directory from bigsdb.pl, you need to include the whole
    web path, e.g. /cgi-bin/private/bigsdb/bigscurate.pl.

* curators_only

  * Set to 'yes' to prevent ordinary authenticated users having access to
    database configuration. This is only effective if read_access is set to
    'authenticated_users'. This may be useful if you have different
    configurations for curation and querying with some data hidden in the
    configuration used by standard users. Default 'no'.
* daily_pending_submissions

  * Overrides the daily limit on pending submissions that a user can submit via
    the web submission system. Default: '15'.

* daily_rest_submissions_limit

  * Overrides the limit on number of submissions that can be made to the
    database via the RESTful interface. This is useful to prevent flooding of
    the submission system by aberrant scripts. Default: '100'.

* default_access

  * The default access to the database configuration, either 'allow' or 'deny'.
    If 'allow', then specific users can be denied access by creating a file
    called 'users.deny' containing usernames (one per line) in the
    configuration directory. If 'deny' then specific users can be allowed by
    creating a file called 'users.allow' containing usernames (one per line) in
    the configuration directory. See :ref:`default access `.

* default_private_records

  * The default number of private isolate records that a user can upload. The
    user account must have a status of either 'submitter', 'curator', or
    'admin'. This value is used to set the private_quota field when creating a
    new user record (which can be overridden for individual users). Changing it
    will not affect the quotas of existing users. Default: '0'.

* default_seqdef_config

  * Isolate databases only: Name of the default seqdef database configuration
    used with this database. Used to automatically fill in details when adding
    new loci.

* default_seqdef_dbase

  * Isolate databases only: Name of the default seqdef database used with this
    database. Used to automatically fill in details when adding new loci.

* default_seqdef_script

  * Isolate databases only: URL of BIGSdb script running the seqdef database
    (default: '/cgi-bin/bigsdb/bigsdb.pl').

* delete_retire_only

  * Set to 'yes' to retire the id of any isolate that is deleted. This prevents
    re-use of ids. This setting will override the global setting in
    bigsdb.conf.

* disable_updates

  * Set to 'yes' to prevent updates. This is useful when moving databases or
    temporarily running on a backup server.

* disable_update_message

  * Message shown when updates are disabled.

* eav_fields

  * Name to call sparsely-populated fields. Default: 'secondary metadata'.

* eav_field_icon

  * Icon class from FontAwesome to use on isolate info page for
    sparsely-populated fields. Default 'fas fa-microscope'.

* eav_groups

  * Comma-separated list of category names that sparsely-populated fields can
    be grouped into. If this value is set, a category drop-down list will
    appear when adding or updating sparsely-populated fields. You can add an
    icon to appear by following the name with a pipe symbol (|) and an icon
    class from the FontAwesome library, e.g.
    'Vaccine reactivity|fas fa-syringe,Risk factors|fas fa-smoking'.

* export_limit

  * Overrides the default allowed number of data points (isolates x columns) to
    export. Default: '25000000'.

* fast_scan

  * Sets whether fast mode scanning is enabled via the web interface. This will
    scan all loci together, using exemplar sequences. In cases where multiple
    loci are being scanned this should be significantly faster than the
    standard locus-by-locus scan, but it will take longer for the first results
    to appear. :ref:`Allele exemplars` should be defined if you enable this
    option. Set to 'yes' to enable. Default: 'no'.

* fieldgroup1 - fieldgroup10

  * Allows multiple fields to be queried as a group. Value should be the name
    of the group followed by a colon (:) followed by a comma-separated list of
    fields to group, e.g. identifiers:id,strain,other_name.

* formatted_description

  * Markdown formatted description of database. If set, this will be used
    throughout the HTML interface wherever formatting can be applied (main body
    of text) and overrides the value set in 'db_description'. Currently only
    supports *\*italics\** and **\*\*bold\*\***.

* genepresence_record_limit

  * Overrides the record number limit (isolates x loci) for the Gene Presence
    plugin. Default: 500000 (this can also be set globally in bigsdb.conf).

* genepresence_taxa_limit

  * Overrides the isolate limit for the Gene Presence plugin. Default: 10000
    (this can also be set globally in bigsdb.conf).

* GenomeComparator

  * Enable Genome Comparator plugin: either 'yes' or 'no'. If no value is set
    then the plugin will not be available unless the all_plugins attribute is
    set to 'yes'. If the all_plugins attribute is set to 'yes', the Genome
    Comparator plugin can be disabled by setting this attribute to 'no'.

* genome_comparator_limit

  * Overrides the isolate number limit for the Genome Comparator plugin.
    Default: 1000 (this can also be set globally in bigsdb.conf).

* genome_comparator_max_ref_loci

  * Overrides the limit on number of loci allowed in a reference genome.
    Default: 10000.

* genome_comparator_threads

  * The number of threads to use for data gathering (BLAST, database queries)
    to populate data structure for Genome Comparator analysis. You should not
    set this to less than 2 as this will prevent job cancelling due to the way
    isolates are queued. Default: '2'.

* hide_unused_schemes

  * Sets whether a scheme is shown in a main results table if none of the
    isolates on that page have any data for the specific scheme: either 'yes'
    or 'no', default 'no'.

* host

  * Host name/IP address of machine hosting isolate database, default
    'localhost'.

* itol_record_limit

  * Overrides the maximum number of records that can be included in an ITOL
    job. Default: 2000 (this can also be set globally in bigsdb.conf).

* itol_seq_limit

  * Overrides the maximum number of sequences (records x loci) that can be
    included in an ITOL job. Default: 100,000 (this can also be set globally in
    bigsdb.conf).

* job_priority

  * Integer with default job priority for offline jobs (default: 5).

* job_quota

  * Integer with number of offline jobs that can be queued or currently running
    for this database.

* labelfield

  * Field that is used to describe record in isolate info page, default
    'isolate'.
* locus_aliases

  * Display locus aliases and use them in dropdown lists by default: must be
    either 'yes' or 'no', default 'no'. This option can be overridden by a user
    preference.

* locus_superscript_prefix

  * Superscript the first letter of a locus name if it is immediately followed
    by an underscore, e.g. f_abcZ would be displayed as :sup:`f`\ abcZ within
    the interface: must be either 'yes' or 'no', default 'no'. This can be used
    to designate gene fragments (or any other meaning you like).

* maindisplay_aliases

  * Default setting for whether isolate aliases are displayed in main results
    tables: either 'yes' or 'no', default 'no'. This setting can be overridden
    by individual user preferences.

* Microreact

  * Enable Microreact plugin: either 'yes' or 'no'. If no value is set then the
    plugin will not be available unless the all_plugins attribute is set to
    'yes'. If the all_plugins attribute is set to 'yes', the Microreact plugin
    can be disabled by setting this attribute to 'no'. Note that for the plugin
    to be active, a country field containing a defined list of allowed values
    and an integer year field must be defined in the isolates table.

* microreact_record_limit

  * Overrides the maximum number of records that can be included in a
    Microreact job. Default: 2000 (this can also be set globally in
    bigsdb.conf).

* microreact_seq_limit

  * Overrides the maximum number of sequences (records x loci) that can be
    included in a Microreact job. Default: 100,000 (this can also be set
    globally in bigsdb.conf).

* new_version

  * Set to 'no' to prevent copying field value when creating a new version of
    the isolate record.

* noshow

  * Comma-separated list of fields not to use in breakdown statistic plugins.

* no_publication_filter

  * Isolate databases only: Switches off display of publication filter in
    isolate query form by default: either 'yes' or 'no', default 'no'.
* only_sets

  * Don't allow option to view the 'whole database' - only list sets that have
    been defined: either 'yes' or 'no', default 'no'.

* password

  * Password for access to isolates database, default 'remote'.

* pcr_limit

  * Overrides the isolate number limit for the in silico PCR plugin. Default:
    '10000'.

* PhyloViz

  * Enable third party PhyloViz plugin: either 'yes' or 'no'. If no value is
    set then the plugin will not be available unless the all_plugins attribute
    is set to 'yes'. If the all_plugins attribute is set to 'yes', the PhyloViz
    plugin can be disabled by setting this attribute to 'no'.

* port

  * Port number that the isolate host is listening on, default '5432'.

* privacy

  * Displays E-mail address for sender in isolate information page if set to
    'no'. Default 'yes'.

* public_login

  * Optionally allow users to log in to a public database - this is useful as
    any jobs will be associated with the user and their preferences will also
    be linked to the account. Set to 'no' to disable. Default 'yes'.

* query_script

  * Relative web path to bigsdb script. Default 'bigsdb.pl' (version 1.11+).
  * This is only needed if automated submissions are enabled. If bigsdb.pl is
    in a different directory from bigscurate.pl, you need to include the whole
    web path, e.g. /cgi-bin/bigsdb/bigsdb.pl.

* read_access

  * Describes who can view data: either 'public' for everybody or
    'authenticated_users' for anybody who has been able to log in. Default
    'public'.

* recommended_schemes

  * Comma-separated list of recommended schemes to suggest to Genome Comparator
    users. If lots of schemes are defined, a user may be tempted to click 'All
    loci' and this may not be the best option. Populating this attribute
    results in an additional list of preferred schemes that can be chosen.

* related_databases

  * Semi-colon separated list of links to related BIGSdb databases on the
    system. This should be in the form of database configuration name followed
    by a '|' and the description, e.g.
    'pubmlst_neisseria_seqdef|Sequence and profile definitions'. This is used
    to populate the dropdown menu.

* remote_contigs

  * Optionally allow the use of remote contigs. These are stored in a remote
    BIGSdb database, accessible via the RESTful API. Set to 'yes' to enable.

* rest_kiosk

  * If 'kiosk' attribute is set, then the REST interface will be disabled for
    the configuration unless a value is set here. The only supported value
    currently is 'sequenceQuery' which will enable API routes for querying
    sequences.

* rMLSTSpecies

  * Enable rMLST Species identifier plugin: either 'yes' or 'no'. If no value
    is set then the plugin will not be available unless the all_plugins
    attribute is set to 'yes'. If the all_plugins attribute is set to 'yes',
    the plugin can be disabled by setting this attribute to 'no'. Note that for
    the plugin to be active, a country field containing a defined list of
    allowed values and an integer year field must be defined in the isolates
    table.

* script_path_includes

  * Partial path of the bigsdb.pl script used to access the database. See
    :ref:`user authentication `.

* SeqbinBreakdown

  * Enable Sequence bin breakdown plugin: either 'yes' or 'no'. If no value is
    set then the plugin will not be available unless the all_plugins attribute
    is set to 'yes'. If the all_plugins attribute is set to 'yes', the plugin
    can be disabled by setting this attribute to 'no'. Note that for the plugin
    to be active, a country field containing a defined list of allowed values
    and an integer year field must be defined in the isolates table.

* seqbin_size_threshold

  * Sets the size values in Mbp to enable for the :ref:`seqbin filter `.
  * Example: seqbin_size_threshold="0.5,1,2,4".

* seq_export_limit

  * Overrides the sequence export limit (records x loci) in the Sequence Export
    plugin. Default: '1000000'.

* sets

  * Use :ref:`sets `: either 'yes' or 'no', default 'no'.
* set_id

  * Force the use of a specific set when accessing database via this XML
    configuration: Value is the name of the set.

* start_id

  * Defines the minimum record id to be used when uploading new isolate
    records. This can be useful when it is anticipated that two databases may
    be merged and it would be easier to do so if the id numbers in the two
    databases were different. Default: '1'.

* submissions

  * Enable automated submission system: either 'yes' or 'no', default 'no'
    (version 1.11+).
  * The curate_script and query_script paths should also be set, either in the
    bigsdb.conf file (for site-wide configuration) or within the system
    attribute of config.xml.

* submissions_deleted_days

  * Overrides the default number of days before closed submissions are deleted
    from the system. Default: '90'.

* TagStatus

  * Enable Tag status plugin: either 'yes' or 'no'. If no value is set then the
    plugin will not be available unless the all_plugins attribute is set to
    'yes'. If the all_plugins attribute is set to 'yes', the plugin can be
    disabled by setting this attribute to 'no'. Note that for the plugin to be
    active, a country field containing a defined list of allowed values and an
    integer year field must be defined in the isolates table.

* tblastx_tagging

  * Sets whether tagging can be performed using TBLASTX: either 'yes' or 'no',
    default 'no'.

* total_pending_submissions

  * Overrides the total limit on pending submissions that a user can submit via
    the web submission system. Default: '20'.

* user

  * Username for access to isolates database, default 'apache'.

* user_job_quota

  * Integer with number of offline jobs that can be queued or currently running
    for this database by any specific user - this parameter is only effective
    if users have to log in.

* user_projects

  * Sets whether authenticated users can create their own projects in order to
    group isolates: either 'yes' or 'no', default 'no'.

* view

  * Database view containing isolate data, default 'isolates'.
* views

  * Comma-separated list of views of the isolate table defined in the database.
    This is used to set a view for a set, or to restrict loci or schemes to a
    subset of isolate data.

* webroot

  * URL of web root, which can be relative or absolute. This is used to provide
    a hyperlinked item in the dropdown menu. Default '/'.

* webroot_label

  * Label text for the breadcrumb link defined by the webroot value. This can
    be formatted using Markdown. Currently only supports *\*italics\** and
    **\*\*bold\*\***.

.. _isolate_xml_field:

::

  <field>

Element content: Field name + optional list of allowed values, e.g.::

  epidemiology

* **type**

  * Data type: int, text, float, bool, or date.

* comments

  * optional
  * Comments about the field. These will be displayed in the field description
    plugin and as tooltips within the curation interface.

* curate_only

  * Set to 'yes' to hide field unless logged-in user is a curator or admin.

* default

  * Default value. This will be entered automatically in the web form but can
    be overridden.

* dropdown

  * Select if you want this field to have its own dropdown filter box on the
    query page. If the field has an option list it will use the values in it,
    otherwise all values defined in the database will be included: 'yes' or
    'no', default 'no'. This setting can be overridden by individual user
    preferences.

* length

  * Length of field, default 12.

* log_delete

  * Sets if the field value will be recorded in the log table if the isolate is
    deleted. Set to 'yes' or 'no', default is 'no'. The id and isolate name are
    always recorded if deletion is logged.

* maindisplay

  * Sets if field is displayed in the main table after a database search, 'yes'
    or 'no', default 'yes'. This setting can be overridden by individual user
    preferences.

* max

  * Maximum value for integer and date types. Special values such as
    CURRENT_YEAR and CURRENT_DATE can be used.

* min

  * Minimum value for integer and date types.
* multiple

  * Sets if field allows multiple values to be set for it, 'yes' or 'no',
    default 'no'. If set to 'yes', then the underlying field in the database
    must be an ARRAY type, e.g. text[].

* no_curate

  * Setting this will hide the field in the curator interface and prevent it
    from being manually modified. This is useful for fields that are populated
    by automated scripts or database triggers. Can be 'yes' or 'no', default
    'no'.

* no_submissions

  * Setting this will hide the field in the submission template. The field is
    still available if it is added back to the template manually.

* optlist

  * Sets if this field has a list of allowed values, default 'no'. Surround
    each option with an