samtools-import(1) | Bioinformatics tools | samtools-import(1) |
samtools-import - converts FASTQ files to unmapped SAM/BAM/CRAM
samtools import [options] [ fastq_file ... ]
Reads one or more FASTQ files and converts them to unmapped SAM, BAM or CRAM. The input files may be automatically decompressed if they have a .gz extension.
The simplest usage in the absence of any other command line options is to provide one or two input files.
If a single file is given, it will be interpreted as single-ended sequencing data unless the read names end with /1 and /2, in which case the reads will be labelled as PAIRED with the READ1 or READ2 BAM flag set. If a pair of filenames is given, the files will be read alternately to produce an interleaved output file, which also sets the PAIRED and READ1 / READ2 flags.
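The /1 and /2 naming rule can be illustrated with a short awk check. `classify_fastq_names` below is a hypothetical helper, not part of samtools. It assumes plain four-line FASTQ records and simply reports, per record, which flags the naming rule described above would imply.

```shell
#!/bin/sh
# Hypothetical helper (not part of samtools): for each record in a 4-line
# FASTQ file, report which flags the /1 and /2 naming rule would imply.
classify_fastq_names() {
    awk 'NR % 4 == 1 {
        # The first word of the header line, minus the leading "@", is the name.
        name = substr($1, 2)
        if (name ~ /\/1$/)      print name, "PAIRED,READ1"
        else if (name ~ /\/2$/) print name, "PAIRED,READ2"
        else                    print name, "UNPAIRED"
    }' "$1"
}

# Usage sketch: a tiny interleaved file with /1 and /2 suffixes.
printf '@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGA\n+\nIIII\n' > demo.fq
classify_fastq_names demo.fq
```

This only mirrors the documented rule; it makes no claim about how samtools handles mixed or malformed inputs.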
The filenames may be explicitly labelled using -1 and -2 for READ1 and READ2 data files, -s for an interleaved paired file (or one half of a paired-end run), -0 for unpaired data, and --i1 and --i2 for explicit index files. These correspond to the typical output produced by Illumina bcl2fastq and match the output from samtools fastq. The index files set both the BC barcode tag and its associated QT quality tag.
The Illumina CASAVA identifiers may also be processed when the -i option is given. The identifier is used to set READ1 / READ2, to mark reads that failed processing (QCFAIL flag), and to supply the barcode sequence, which is added to the BC tag. This can be an alternative to explicitly specifying the index files, although note that it will not fill out the barcode quality tag (QT).
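For reference, a CASAVA 1.8-style identifier is the second word of the FASTQ header line and has the form `read:filtered:control:barcode`, for example `1:N:0:ATCACG`, where `Y` in the second field marks a read that failed filtering. The hypothetical helper `parse_casava` below is a sketch of that format only, not samtools' own parser; it splits the field into the three pieces that -i is described as using.

```shell
#!/bin/sh
# Hypothetical helper (not part of samtools): split a CASAVA 1.8-style
# identifier such as "1:Y:0:ATCACG" into read number, QC status and barcode.
parse_casava() {
    # $1 is the second whitespace-separated word of the FASTQ header line.
    echo "$1" | awk -F: '{
        r  = ($1 == "1") ? "READ1" : ($1 == "2") ? "READ2" : "?"
        qc = ($2 == "Y") ? "QCFAIL" : "PASS"   # Y means the read was filtered
        print r, qc, "BC:Z:" $4
    }'
}

parse_casava "1:N:0:ATCACG"   # READ1 PASS BC:Z:ATCACG
parse_casava "2:Y:0:ATCACG"   # READ2 QCFAIL BC:Z:ATCACG
```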
Operationally there is no difference between the -s and -0 options: given an interleaved file with /1 and /2 read name endings, both will correctly set the PAIRED, READ1 and READ2 flags, and given data with no suffixes and no CASAVA identifiers being processed, both will leave the data as unpaired. They are provided to make command lines more descriptive and to improve the header comment describing the samtools fastq decode command.
When a read group line is given with -r, specifying -r multiple times appends to the @RG line, automatically adding tabs between invocations.
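The tab-joining behaviour can be mimicked in plain shell to see what repeated -r arguments add up to. `join_rg` is a hypothetical function, not samtools code. It assumes, as the three equivalent -r forms in the examples suggest, that the leading `@RG` component is optional and is supplied when absent.

```shell
#!/bin/sh
# Hypothetical helper (not part of samtools): join repeated -r fragments
# with tabs and make sure the result starts with "@RG".
join_rg() (
    tab=$(printf '\t')
    line=""
    for frag in "$@"; do
        if [ -z "$line" ]; then line=$frag; else line="$line$tab$frag"; fi
    done
    case $line in
        "@RG$tab"*) ;;                 # prefix already present
        *) line="@RG$tab$line" ;;      # add the missing @RG component
    esac
    printf '%s\n' "$line"
)

# The three -r styles collapse to the same header line (tabs shown as |).
join_rg ID:xyz PL:ILLUMINA | tr '\t' '|'                     # @RG|ID:xyz|PL:ILLUMINA
join_rg "$(printf 'ID:xyz\tPL:ILLUMINA')" | tr '\t' '|'       # @RG|ID:xyz|PL:ILLUMINA
join_rg "$(printf '@RG\tID:xyz\tPL:ILLUMINA')" | tr '\t' '|'  # @RG|ID:xyz|PL:ILLUMINA
```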
Convert a single-ended fastq file to an unmapped CRAM. Both of these commands perform the same action.
samtools import -0 in.fastq -o out.cram
samtools import in.fastq > out.cram
Convert a pair of Illumina fastqs containing CASAVA identifiers to BAM, adding the barcode information to the BC auxiliary tag.
samtools import -i -1 in_1.fastq -2 in_2.fastq -o out.bam
samtools import -i in_[12].fastq > out.bam
Specify the read group. These commands are equivalent:
samtools import -r "$(echo -e 'ID:xyz\tPL:ILLUMINA')" in.fq
samtools import -r "$(echo -e '@RG\tID:xyz\tPL:ILLUMINA')" in.fq
samtools import -r ID:xyz -r PL:ILLUMINA in.fq
Create an unmapped BAM file from a set of 4 Illumina fastqs produced by bcl2fastq, consisting of two read files and two index files. The CASAVA identifier is used only for setting the QC pass / failure status.
samtools import -i -1 R1.fq -2 R2.fq --i1 I1.fq --i2 I2.fq -o out.bam
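When many samples are processed, the command above can be generated from a file-name prefix. `import_cmd_for` is a hypothetical wrapper, not part of samtools, and it assumes the common bcl2fastq naming pattern `<prefix>_R1_001.fastq.gz`, `_R2_`, `_I1_` and `_I2_`; adjust the suffixes if your files are named differently. It only prints the command, so it can be inspected before running.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of samtools): print the samtools import
# command for one sample, assuming bcl2fastq-style file names.
import_cmd_for() {
    p=$1      # shared file-name prefix, e.g. Sample_S1_L001
    out=$2    # output BAM name
    echo samtools import -i \
        -1 "${p}_R1_001.fastq.gz" -2 "${p}_R2_001.fastq.gz" \
        --i1 "${p}_I1_001.fastq.gz" --i2 "${p}_I2_001.fastq.gz" \
        -o "$out"
}

import_cmd_for Sample_S1_L001 out.bam
```

Printing rather than executing keeps the sketch safe to try; piping its output to `sh` would run the command.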
Convert a pair of CASAVA barcoded fastq files to unmapped CRAM with an incremental record counter (the ro tag), then sort by minimiser in order to reduce file size. The reverse process, restoring the original order with samtools sort -t ro and converting back with samtools fastq, is also shown.
samtools import -i in_1.fq in_2.fq --order ro -O bam,level=0 | \
    samtools sort -@4 -M -o out.srt.cram -

samtools sort -@4 -O bam -u -t ro out.srt.cram | \
    samtools fastq -1 out_1.fq -2 out_2.fq -i --index-format "i*i*"
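Sorting by minimiser places reads that share sequence content next to each other, which tends to help compression. As a rough illustration only, the hypothetical helper `min_kmer_order` below takes the lexicographically smallest k-mer of each read and sorts on it. samtools sort -M uses its own minimiser scheme (see samtools-sort(1)), so this is a simplified stand-in, not its algorithm.

```shell
#!/bin/sh
# Hypothetical helper (not samtools' algorithm): print the lexicographically
# smallest k-mer of each sequence in a 4-line FASTQ, then sort on it so that
# reads sharing that k-mer end up adjacent.
min_kmer_order() {
    k=$1
    awk -v k="$k" 'NR % 4 == 1 { name = substr($1, 2) }
    NR % 4 == 2 {
        best = ""
        for (i = 1; i + k - 1 <= length($0); i++) {
            kmer = substr($0, i, k)
            if (best == "" || kmer < best) best = kmer
        }
        print best, name
    }' "$2" | LC_ALL=C sort
}

printf '@a\nGGGTTT\n+\nIIIIII\n@b\nCCCAAA\n+\nIIIIII\n@c\nTTTGGG\n+\nIIIIII\n' > mm.fq
min_kmer_order 3 mm.fq
```

Reads a and c share the k-mer GGG and so sort together, while b (smallest k-mer AAA) moves to the front.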
Written by James Bonfield of the Wellcome Sanger Institute.
samtools(1), samtools-fastq(1)
Samtools website: <http://www.htslib.org/>
2 September 2022 | samtools-1.16.1 |