SCF(3pm) | User Contributed Perl Documentation | SCF(3pm) |
Bio::SCF - Perl extension for reading and writing SCF sequence files
use Bio::SCF;
# tied interface
tie %hash,'Bio::SCF','my_scf_file.scf';

my $sequence_length = $hash{bases_length};
my $chromatogram_sample_length = $hash{samples_length};
my $third_base = $hash{bases}[2];
my $quality_score = $hash{$third_base}[2];
my $sample_A_at_time_1400 = $hash{samples}{A}[1400];

# change the third base and write out new file
$hash{bases}[2] = 'C';
tied (%hash)->write('new.scf');

# object-oriented interface
my $scf = Bio::SCF->new('my_scf_file.scf');

my $sequence_length = $scf->bases_length;
my $chromatogram_sample_length = $scf->samples_length;
my $third_base = $scf->bases(2);
my $quality_score = $scf->score(2);
my $sample_A_at_time_1400 = $scf->sample('A',1400);

# change the third base and write out new file
$scf->bases(2,'C');
$scf->write('new.scf');
This module provides a Perl interface to SCF DNA sequencing files. It has both a tied-hash and an object-oriented interface. It can read fields from SCF files, and it has a limited ability to modify them and write them back.
Key             Value
---             -----
bases_length    Number of called bases in the sequence (read-only)
samples_length  Number of samples in the file (read-only)
version         SCF version (read-only)
code_set        Code set used to code bases (read-only)
comments        Structured comments (read-only)
bases           Array reference to a list of the base calls
index           Array reference to a list of the sample position
                for each of the base calls (e.g. the position of
                the base calling peak)
A               An array reference that can be used to determine the
                probability that the base in position $i is an "A".
G               An array reference that can be used to determine the
                probability that the base in position $i is a "G".
C               An array reference that can be used to determine the
                probability that the base in position $i is a "C".
T               An array reference that can be used to determine the
                probability that the base in position $i is a "T".
samples         A hash reference with keys "A", "C", "G" and "T". The
                value for each key is an array reference to the list
                of intensity values for each sample.
To get the length of the called sequence: $scf{bases_length}
To get the value of the called sequence at position 3: $scf{bases}[3]
To get the sample position at which base 3 was called: $scf{index}[3]
To get the value of the "C" curve under base 3: $scf{samples}{C}[$scf{index}[3]]
To get the probability that base 3 is a "C": $scf{C}[3]
To print out the chromatogram as a four-column list:
my $samples = $scf{samples};
for (my $i = 0; $i<$scf{samples_length}; $i++) {
    print join "\t",$samples->{C}[$i],$samples->{G}[$i],
                    $samples->{A}[$i],$samples->{T}[$i],"\n";
}
$samples->{C}[500] = 0;
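The lookups above can be combined to read all four trace intensities under one called base. The following is only a sketch: the %scf hash here is a plain stand-in filled with made-up numbers so the code runs without an SCF file, and peak_intensities is a hypothetical helper, not a function provided by Bio::SCF. With a real file you would tie %scf to Bio::SCF as shown earlier and pass a reference to it.

```perl
use strict;
use warnings;

# Stand-in for a tied Bio::SCF hash (assumption: same keys as documented
# above, but the values are invented for illustration).
my %scf = (
    bases   => [qw(A C G)],
    index   => [1, 3, 5],
    samples => {
        A => [0, 90, 10,  5,  0,  2],
        C => [0,  5, 20, 80, 10,  1],
        G => [0,  1,  2,  6, 30, 70],
        T => [0,  3,  4,  2,  1,  0],
    },
);

# Hypothetical helper: return channel => intensity pairs at the sample
# position where base number $base_no was called.
sub peak_intensities {
    my ($scf, $base_no) = @_;
    my $pos = $scf->{index}[$base_no];
    return map { ($_ => $scf->{samples}{$_}[$pos]) } qw(A C G T);
}

my %peak = peak_intensities(\%scf, 1);
print join("\t", map { "$_=$peak{$_}" } qw(A C G T)), "\n";
```

Taking a hash reference rather than reading a global keeps the helper usable with either a tied hash or test data.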
my $sample_index = $scf->index(5);
my ($g,$a,$t,$c) = map { $scf->sample($_,$sample_index) } qw(G A T C);
If you provide a new value for the sample index, it will be updated.
my ($g,$a,$t,$c) = map { $scf->base_score($_,5) } qw(G A T C);
If you provide a new value for the base probability score, it will be updated.
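One use of the per-base scores is to find which of the four bases scores highest at a position. The sketch below assumes that a higher score means a more likely base, which this page does not state explicitly. best_scoring_base is a hypothetical helper, not part of Bio::SCF, and the %fake table holds invented scores so the code runs without an SCF file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the base with the highest score at $base_no.
# $score_of is a code ref called as $score_of->($base, $base_no).
sub best_scoring_base {
    my ($score_of, $base_no) = @_;
    my ($best, $best_score);
    for my $base (qw(G A T C)) {
        my $score = $score_of->($base, $base_no);
        ($best, $best_score) = ($base, $score)
            if !defined $best_score || $score > $best_score;
    }
    return ($best, $best_score);
}

# Invented scores standing in for a real file (assumption).
my %fake = (G => [3, 40], A => [55, 2], T => [1, 1], C => [0, 7]);

my ($base, $score) = best_scoring_base(sub { $fake{$_[0]}[$_[1]] }, 1);
print "$base ($score)\n";
```

With a Bio::SCF object, the callback would presumably be sub { $scf->base_score(@_) }, matching the argument order used in the example above.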
Reading information from a preexisting file:
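tie %scf, 'Bio::SCF', "data.scf";

# bases_length is the number of called bases; bases is the array itself
print "Base calls:\n";
for ( my $i=0; $i<$scf{bases_length}; $i++ ){
    print "$scf{bases}[$i] ";
}
print "\n";

# samples_length is the number of samples in each trace
print "Intensity values for the A curve\n";
for ( my $i=0; $i<$scf{samples_length}; $i++ ){
    print "$scf{samples}{A}[$i] ";
}
print "\n";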
Another example, where we set all bases to "A", indexes to 10 and write the file back:
my $obj = tie %scf,'Bio::SCF','data.scf';
for (0 .. $#{$scf{bases}}){
    $scf{bases}[$_] = "A";
    $obj->set('index', $_, 10);
}
$obj->write('data.scf');
Dmitri Priimak, priimak@cshl.org (1999)
with some cleanups by Lincoln Stein, lstein@cshl.edu (2006)
This package and its accompanying libraries are free software; you can redistribute them and/or modify them under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0. Refer to LICENSE for the full license text. In addition, please see DISCLAIMER for disclaimers of warranty.
2022-10-19 | perl v5.36.0 |