| OBIUNIQ(1) | OBITools | OBIUNIQ(1) |
obiuniq - description of obiuniq
The obiuniq command is loosely analogous to the standard Unix uniq -c command.
Instead of working line by line on text as the standard Unix tool does, it processes sequence records.
A sequence record is a complex object composed of an identifier, a set of attributes (key=value), a definition, and the sequence itself.
The obiuniq command groups sequence records together, then prints one sequence record for each group.
A group is defined by the sequence and optionally by the values of a set of attributes specified with the -c option.
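The grouping rule can be illustrated with a minimal Python sketch. This is not the obiuniq implementation: the dictionary layout of a record and the names group_records and category_keys are assumptions made for the illustration only.

```python
from collections import defaultdict

def group_records(records, category_keys=()):
    """Group records by sequence plus the values of the given attributes.

    Hypothetical helper: category_keys plays the role of the attributes
    named with -c; records are plain dicts, not OBITools objects.
    """
    groups = defaultdict(list)
    for rec in records:
        # The group key is the sequence, extended by one value per category attribute.
        key = (rec["sequence"],) + tuple(
            rec["attributes"].get(k) for k in category_keys
        )
        groups[key].append(rec)
    return dict(groups)

# Toy records (invented data): identifier, attributes, definition, sequence.
records = [
    {"id": "r1", "attributes": {"sample": "A"}, "definition": "", "sequence": "acgt"},
    {"id": "r2", "attributes": {"sample": "B"}, "definition": "", "sequence": "acgt"},
    {"id": "r3", "attributes": {"sample": "A"}, "definition": "", "sequence": "ttga"},
]

by_sequence = group_records(records)
by_sequence_and_sample = group_records(records, category_keys=("sample",))

print(len(by_sequence))             # groups defined by sequence only
print(len(by_sequence_and_sample))  # groups defined by sequence and sample
```

With the sequence alone as key, r1 and r2 fall into the same group; adding the sample attribute to the key separates them.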
Because the identifiers, attribute sets (key=value), and definitions of the sequence records grouped together may differ, two options (-m and -i) refine how these parts of the records are reported.
When a taxonomy is loaded (-d or -t options), the merged_taxid attribute is created and records the number of times each taxid has been found in the group (it may be empty if no sequence record in the group has a taxid attribute). In addition, a set of taxonomy-related attributes is generated for each group having at least one sequence record with a taxid attribute. The taxid attribute of the sequence group is set to the last common ancestor of the taxids of the group. All other taxonomy-related attributes created (species, genus, family, species_name, genus_name, family_name, rank, scientific_name) give information on the last common ancestor.
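The last common ancestor computation can be sketched as follows, assuming the taxonomy is available as a simple child-to-parent mapping in which the root is its own parent. The taxids, the parent table, and the function names are invented for the illustration and do not reflect how OBITools stores or queries a taxonomy.

```python
from collections import Counter

def lineage(taxid, parent):
    """Return the path from taxid up to the root, most specific node first."""
    path = [taxid]
    while parent[taxid] != taxid:  # the root is its own parent (assumption)
        taxid = parent[taxid]
        path.append(taxid)
    return path

def last_common_ancestor(taxids, parent):
    """Return the deepest node shared by the lineages of all taxids."""
    taxids = list(taxids)
    common = lineage(taxids[0], parent)
    for taxid in taxids[1:]:
        other = set(lineage(taxid, parent))
        # Keep only shared nodes; the order (specific to root) is preserved.
        common = [node for node in common if node in other]
    return common[0]

# Toy taxonomy (invented taxids): 1 = root, 10 = family,
# 100 and 200 = genera, 1001, 1002 and 2001 = species.
parent = {1: 1, 10: 1, 100: 10, 200: 10, 1001: 100, 1002: 100, 2001: 200}

group_taxids = [1001, 1002, 1001]
merged_taxid = dict(Counter(group_taxids))         # occurrences of each taxid
group_taxid = last_common_ancestor(group_taxids, parent)

print(merged_taxid)
print(group_taxid)
```

Two species of the same genus resolve to that genus; adding a species from another genus of the same family would move the group taxid up to the family.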
Example:
> obiuniq -m sample seq1.fasta > seq2.fasta
Dereplicates sequences and keeps the value distribution of the sample attribute in the new attribute merged_sample.
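As a rough illustration of what a value distribution looks like, the Python sketch below tallies the sample values seen for each distinct sequence. It assumes every input record counts once and uses plain dicts for records; merge_attribute is a hypothetical helper, not an OBITools function, and the real merged_sample attribute may be computed differently.

```python
from collections import Counter, defaultdict

def merge_attribute(records, key):
    """Tally, per distinct sequence, how often each value of `key` occurs."""
    merged = defaultdict(Counter)
    for rec in records:
        if key in rec["attributes"]:
            merged[rec["sequence"]][rec["attributes"][key]] += 1
    return {seq: dict(counts) for seq, counts in merged.items()}

# Toy input (invented data).
records = [
    {"id": "r1", "attributes": {"sample": "A"}, "sequence": "acgt"},
    {"id": "r2", "attributes": {"sample": "B"}, "sequence": "acgt"},
    {"id": "r3", "attributes": {"sample": "A"}, "sequence": "acgt"},
    {"id": "r4", "attributes": {"sample": "A"}, "sequence": "ttga"},
]

merged_sample = merge_attribute(records, "sample")
print(merged_sample)
```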
Example:
> obiuniq -c sample seq1.fasta > seq2.fasta
Dereplicates sequences within each sample.
The OBITools Development Team - LECA
2019 - 2015, OBITools Development Team
| January 28, 2019 | 1.02 12 |