GMOD_GFF3_PREPROCESSOR(1p) User Contributed Perl Documentation GMOD_GFF3_PREPROCESSOR(1p)

gmod_gff3_preprocessor.pl - Prepares a GFF3 file for bulk loading into a chado database.

  % gmod_gff3_preprocessor.pl [options] --gfffile <filename>

 --gfffile        The file containing GFF3 (optional, can read
                     from stdin)
 --outfile        The base name (kernel) that will be used for naming result files
 --splitfile      Split the files into more manageable chunks, providing
                     an argument to control splitting
 --onlysplit      Split the files and then quit (i.e., don't sort)
 --nosplit        Don't split the files (i.e., only sort)
 --hasrefseq      Set this if the file contains a reference sequence line
                     (Only needed if not splitting files)
 --dbprofile      Specify a gmod.conf profile name (otherwise use default)
 --inheritance_tiers How many levels of inheritance you expect this file
                     to have (default: 3)

splitfile -- Just setting this flag to 1 will cause the file to be split by reference sequence. If you provide an optional argument, it will be further split according to these rules:

 source=1     Splits files according to the value in the source column
 source=a,b,c Puts lines with sources that match (via regular expression)
                     'a', 'b', or 'c' in a separate file
 type=a,b,c   Puts lines with types that match 'a', 'b', or 'c' in a
                     separate file

For example, if you wanted all of your analysis results to go in a separate file, you could specify '--splitfile type=match', and all cDNA_match, EST_match and cross_genome_match features would go into separate files (separated by reference sequence).
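
The matching rule in this example can be sketched in Python. This is only an illustration of the behavior described above, not the script's actual Perl code. The helper name bucket_for, the sample lines and the bucket labels are hypothetical, and the sketch assumes that all types matching one pattern share a file per reference sequence; the real output file names are not covered here.

```python
import re
from collections import defaultdict

def bucket_for(line, type_patterns):
    """Return a (reference sequence, pattern) bucket for one GFF3 feature line.

    Hypothetical helper: the pattern is None when no pattern matches
    the type column (column 3).
    """
    cols = line.rstrip("\n").split("\t")
    seqid, ftype = cols[0], cols[2]
    for pat in type_patterns:
        # Regular-expression match, so 'match' also hits 'EST_match'.
        if re.search(pat, ftype):
            return (seqid, pat)
    return (seqid, None)

# Made-up sample features (tab-separated, nine columns).
sample = [
    "chr1\tgenbank\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\test2genome\tEST_match\t120\t300\t.\t+\t.\tID=m1",
    "chr2\tblat\tcDNA_match\t50\t400\t.\t-\t.\tID=m2",
    "chr2\tblastz\tcross_genome_match\t10\t80\t.\t+\t.\tID=m3",
]

# Equivalent in spirit to '--splitfile type=match'.
buckets = defaultdict(list)
for line in sample:
    buckets[bucket_for(line, ["match"])].append(line.split("\t")[8])
```

Here the gene on chr1 lands in the ("chr1", None) bucket, while the three *_match features are grouped under the 'match' pattern, one bucket per reference sequence.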

inheritance_tiers -- The number of levels of inheritance this file has. For example, if the file has "central dogma" genes in it (gene/mRNA/exon,polypeptide), then it has 3. Up to 4 is supported, but the higher the number, the slower the script runs. If you don't know, 3 is a reasonable guess.
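
If you are unsure how many tiers a file has, one way to estimate it is to follow the Parent attributes in column 9. The Python sketch below is an assumption-laden illustration, not part of chado-utils: the function name count_tiers is hypothetical, and it assumes well-formed feature lines, no FASTA section in the input, and no cyclic Parent references.

```python
def count_tiers(gff3_lines):
    """Estimate the number of inheritance levels from ID/Parent attributes."""
    parents = {}  # feature ID -> list of parent IDs
    no_id = []    # parent lists of features that carry no ID themselves
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        plist = attrs["Parent"].split(",") if "Parent" in attrs else []
        if "ID" in attrs:
            parents[attrs["ID"]] = plist
        else:
            no_id.append(plist)

    memo = {}

    def tier(feature_id):
        # A feature without parents (or with an unknown parent) is tier 1.
        if feature_id not in memo:
            plist = parents.get(feature_id, [])
            memo[feature_id] = 1 + max((tier(p) for p in plist), default=0)
        return memo[feature_id]

    depths = [tier(fid) for fid in parents]
    depths += [1 + max((tier(p) for p in plist), default=0) for plist in no_id]
    return max(depths, default=0)

# Made-up "central dogma" gene: gene -> mRNA -> exon / polypeptide.
central_dogma = [
    "##gff-version 3",
    "chr1\t.\tgene\t1\t900\t.\t+\t.\tID=g1",
    "chr1\t.\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1",
    "chr1\t.\texon\t1\t300\t.\t+\t.\tParent=m1",
    "chr1\t.\tpolypeptide\t10\t290\t.\t+\t.\tID=p1;Parent=m1",
]
tiers = count_tiers(central_dogma)  # 3 for this sample
```

The result could then be passed as the value of --inheritance_tiers.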

FASTA sequence

If the GFF3 file contains FASTA sequence at the end, the sequence will be placed in a separate file with the extension '.fasta'. This fasta file can be loaded separately after the split and/or sorted GFF3 files are loaded, using the command:

  gmod_bulk_load_gff3.pl -g <fasta file name>
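
The separation of the FASTA section can be pictured with the following Python sketch. It is not the preprocessor's own code: the function name split_fasta is hypothetical, and the sketch assumes the sequence section is introduced by the standard '##FASTA' directive. Whether the real script keeps or drops the directive line itself is not stated here; this sketch drops it.

```python
def split_fasta(lines):
    """Split GFF3 lines into (feature_lines, fasta_lines) at '##FASTA'."""
    features, fasta = [], []
    in_fasta = False
    for line in lines:
        if not in_fasta and line.strip() == "##FASTA":
            in_fasta = True
            continue  # the directive itself goes to neither output
        (fasta if in_fasta else features).append(line)
    return features, fasta

# Made-up input: one feature followed by its reference sequence.
gff3 = [
    "##gff-version 3\n",
    "chr1\t.\tgene\t1\t9\t.\t+\t.\tID=g1\n",
    "##FASTA\n",
    ">chr1\n",
    "ACGTACGTA\n",
]
feature_lines, fasta_lines = split_fasta(gff3)
```

In this sample, feature_lines holds the version directive and the gene line, and fasta_lines holds the '>chr1' header and its sequence, which is the part that would end up in the '.fasta' file.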

Scott Cain <cain@cshl.org>

Copyright (c) 2006-2007

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

2019-12-05 perl v5.30.0