Release Notes
Version 1.3.0: September 29, 2023
Backwards Incompatible Changes:
- Some functions now require the locus parameter: makeChangeoClone now requires it, and in groupGenes, locus was previously required only for single-cell data but is now also required for bulk data.
General:
- Updated dependencies to ggplot2 >= 3.4.0, airr >= 1.4.1, igraph >= 1.5.0.
- Updated the example data ExampleTrees to use the igraph 1.5.0 format. See https://r.igraph.org/news/index.html#igraph-150 for details.
- Performance improvements in collapseDuplicates.
Diversity:
- Fixed a bug in plotDiversityCurve and plotAbundanceCurve where limits were not being applied correctly when zooming in on the plots.
Gene:
- Fixed a bug in groupGenes where TCR chains were not being considered when detecting heavy chain sequences prior to subsetting.
Version 1.2.1: September 19, 2022
General:
- Fixed bug in parsing of TCR gene names.
- Fixed missing import of ape::read.fastq.
Version 1.2.0: October 31, 2021
General:
- Updated dependencies to R >= 4.0 and ggplot2 >= 3.3.4.
- Removed lazyeval dependency.
- Added junctionAlignment, which counts the number of nucleotides in the reference germline not present in the alignment, and the number of V and J nucleotides in the CDR3.
Gene Usage:
- Fixed a bug in getFamily where temporary designation gene names were not being correctly subset to the cluster (family) level.
Lineage:
- Fixed a bug in runPhylip that caused buildPhylipLineage to fail when run on Windows.
Version 1.1.0: February 6, 2021
General:
- Added readFastqDb, which reads a repertoire’s .fastq file and imports the sequencing quality scores for sequence_alignment. Added maskPositionsByQuality, which masks positions with a sequencing quality score lower than the specified threshold. The convenience function getPositionQuality creates a data.frame with quality scores per position.
- Added a vignette describing how to read/write Change-O and AIRR Rearrangement formatted files.
- Increased dplyr dependency to v1.0.
- Added the Bioconductor dependencies Biostrings, GenomicAlignments, and IRanges.
- In padSeqEnds, the argument mod3=TRUE has been added so that sequences are padded to a length that is a multiple of 3.
- Fixed a bug in translateDNA where NA values weren’t being translated properly.
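A minimal sketch of the mod3 padding behavior follows. It assumes alakazam >= 1.1.0 is installed, and the sequences are made-up illustration values. The longest input is 8 nt, so with mod3=TRUE every sequence is expected to be padded with N at the 3' end up to the next multiple of 3 (9 nt here).

```r
library(alakazam)

# Illustrative sequences of unequal length (longest is 8 nt)
seqs <- c("ATGCATGC", "ATGCA", "ATG")

# Pad ends with "N" to a common length rounded up to a multiple of 3
padded <- padSeqEnds(seqs, pad_char = "N", mod3 = TRUE)
print(padded)
print(nchar(padded))
```

With mod3=FALSE, the common length would simply be that of the longest input sequence.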
Amino Acid Analysis:
- Fixed a conflict in the default argument settings of aminoAcidProperties, which will now default to nt=TRUE.
Diversity:
- Added a parameter to countClones (remove_na) that will remove all rows with NA values in the clone column if TRUE (default) and issue a warning with how many were removed. If FALSE, those rows will be kept instead.
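The remove_na behavior can be illustrated with a small sketch. It assumes alakazam >= 1.1.0, and the toy table is illustrative data, not package example data. One row has no clone assignment. By default it should be dropped with a warning. With remove_na=FALSE it is kept and counted as its own NA group.

```r
library(alakazam)

# Toy AIRR-style table; the last row has no clone assignment (illustrative)
db <- data.frame(sequence_id = paste0("seq", 1:5),
                 clone_id = c("1", "1", "2", "3", NA),
                 stringsAsFactors = FALSE)

# Default remove_na = TRUE: the NA row is dropped and a warning is issued
clones <- countClones(db, clone = "clone_id")

# remove_na = FALSE: the NA row is kept as a separate group
clones_na <- countClones(db, clone = "clone_id", remove_na = FALSE)
print(clones)
print(clones_na)
```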
Gene Usage:
- Added the function getLocus to extract the locus information from the segment call.
- Added the function getChain to define the chain from the segment or locus call.
- Changed the check for empty columns in countGenes to give a warning instead of an error so as not to disrupt running workflows.
- Fixed a bug in getSegment where filtering of non-localized genes was not being applied when called from getFamily, because the “NL” part of the name was removed before the filtering step.
- Updated regular expressions in getAllele, getGene, getFamily and getLocus to parse constant region gene names correctly.
- Updated regular expressions in getSegment to parse constant region gene names correctly and to not remove the “D” from “IGHD” when strip_d=TRUE.
Lineage:
- Added the branch_length argument to buildPhylipLineage, and augmented graphToPhylo and phyloToGraph to track intermediate sequences in nodes for phylo objects.
- Added a parameter to countGenes (remove_na) that will remove all rows with NA values in the gene column if TRUE (default) and issue a warning with how many were removed. If FALSE, those rows will be kept instead.
Version 1.0.2: July 17, 2020
Diversity:
- Fixed a bug in plotDiversityTest that caused all values of q to appear on the plot rather than just the specified one.
Gene Usage:
- Fixed a major bug in the single-cell mode of groupGenes where the v_call column was being used instead of the j_call column for J gene grouping.
- Added support for TCR genes to groupGenes.
- Renamed the only_igh argument of groupGenes to only_heavy.
Version 1.0.1: May 8, 2020
Backwards Incompatible Changes:
- Changed the default expected data format from the Change-O data format to the AIRR Rearrangement standard. For example, where functions previously used the column name V_CALL (Change-O) as the default to identify the field that stores the V gene calls, they now use v_call (AIRR). Scripts that relied on the previous default values (v_call="V_CALL") will now fail unless the function calls are updated to reflect the correct column names for the data. If data are in the Change-O format, the current default value v_call="v_call" will fail to identify the column with the V gene calls, because the column v_call doesn’t exist. In this case, v_call="V_CALL" needs to be specified in the function call.
- ExampleDb converted to the AIRR Rearrangement standard and examples updated accordingly. The legacy Change-O version is available as ExampleDbChangeo.
- For consistency with the style of the new data format default, other field names have been updated to use the same capitalization. This change affects:
  - amino acid physicochemical properties (e.g., GRAVY to gravy);
  - countGenes, countClones (e.g., SEQ_COUNT to seq_count);
  - estimateAbundance (e.g., RANK to rank);
  - groupGenes (e.g., VJ_GROUP to vj_group);
  - collapseDuplicates and makeChangeoClone (e.g., SEQUENCE_ID to sequence_id, COLLAPSE_COUNT to collapse_count);
  - lineage tree functions (summarizeTrees, getPathLengths, getMRCA, tableEdges, testEdges), which also return columns in lower case (e.g., parent, child, outdegree, steps, annotation, pvalue).
- IG_COLOR names converted to official C region identifiers (IGHA, IGHD, IGHE, IGHG, IGHM, IGHK, IGHL).
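A minimal migration sketch follows. It uses countClones and its clone argument rather than v_call, but the same pattern applies to every column-name argument. It assumes a current alakazam release and that ExampleDbChangeo stores clonal group IDs in a CLONE column, following the Change-O naming convention.

```r
library(alakazam)

# AIRR-formatted example data: the default clone = "clone_id" matches
data(ExampleDb)
airr_clones <- countClones(ExampleDb)

# Legacy Change-O example data uses upper-case column names,
# so the clone column must be given explicitly
data(ExampleDbChangeo)
changeo_clones <- countClones(ExampleDbChangeo, clone = "CLONE")

# Relying on the AIRR default with Change-O data is expected to fail,
# because ExampleDbChangeo has no "clone_id" column
default_result <- tryCatch(countClones(ExampleDbChangeo),
                           error = function(e) e)
inherits(default_result, "error")
```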
General:
- License changed to AGPL-3.
- baseTheme now looks consistent across sizing options.
- cpuCount will now return 1 if the core count cannot be determined.
- Fixed a bug in padSeqEnds wherein the pad_char argument was being ignored.
Diversity:
- Fixed documentation error in diversity vignette for viewing test results.
- The estimateAbundance slot clone_by now contains the name of the column with the clonal group identifier, as specified in the function call. For example, if the function was called with clone="clone_id", then the clone_by slot will be clone_id.
Lineage:
- Renamed the buildPhylipLineage arguments vcall, jcall and dnapars_exec to v_call, j_call and phylip_exec, respectively.
Version 0.3.0: July 17, 2019
Deprecated:
- rarefyDiversity is deprecated in favor of alphaDiversity, which includes the same functionality.
- testDiversity is deprecated. The test calculations have been added to the normal output of alphaDiversity.
General:
- Added ape and tibble dependencies.
Lineage:
- Added readIgphyml to read in IgPhyML output and combineIgphyml to combine parameter estimates across samples.
- Added graphToPhylo and phyloToGraph to allow conversion between graph and phylo formats.
Diversity:
- Fixed a bug in estimateAbundance where setting the clone column to a non-default value produced an error.
- Added rarefaction options to estimateAbundance through the min_n, max_n, and uniform arguments.
- Moved the rarefaction calculation for the diversity functions into estimateAbundance. alphaDiversity will call estimateAbundance for bootstrapping if not provided an existing AbundanceCurve object.
- Restructured the DiversityCurve and AbundanceCurve objects to accommodate the new diversity methods.
Gene Usage:
- groupGenes now supports grouping by V gene, J gene, and junction length (junc_len), in addition to grouping by V gene and J gene without junction length. Also added support for single-cell input data with the new arguments cell_id, locus, and only_igh.
Version 0.2.11: September 12, 2018
General:
- Added the nonsquareDist function to calculate the non-square distance matrix of sequences.
- Exported some internal utility functions to make them available to dependent packages: progressBar, baseTheme, checkColumns and cpuCount.
Diversity:
- estimateAbundance and plotAbundanceCurve now allow group=NULL to be specified to perform abundance calculations on ungrouped data.
Gene Usage:
- Added the fill argument to countGenes. When set to TRUE, this adds zeroes for the group pairs that do not exist in the data.
- Added the new function groupGenes to group sequences sharing the same V and J genes.
Topology Analysis:
- Fixed a bug in tableEdges causing it to fail when no parent/child relationships exist when specifying indirect=TRUE.
- makeChangeoClone will now issue an error and terminate, instead of continuing with a warning, when the sequences are not all the same length.
Version 0.2.10: March 30, 2018
General:
- Fixed a bug in IUPAC_AA wherein X was not properly matching against Q.
- Changed behavior in getAAMatrix to treat * (stop codon) as a mismatch.
Version 0.2.9: March 21, 2018
General:
- Added explicit type casting for known columns to readChangeoDb.
- Added the padSeqEnds function, which pads sequences with Ns to make them equal in length.
- Added verification of unique sequence IDs to collapseDuplicates.
Diversity:
- Added the uniform argument to rarefyDiversity, allowing users to toggle between uniform and non-uniform sampling.
- Renamed plotAbundance to plotAbundanceCurve.
- Changed the estimateAbundance return object from a data.frame to a new AbundanceCurve custom class.
- Set the default plot method for AbundanceCurve objects to plotAbundanceCurve.
- Added the annotate argument from plotDiversityCurve to plotAbundanceCurve.
- Added the score argument to plotDiversityCurve to toggle between plotting diversity or evenness.
- Added the function plotDiversityTest to generate a simple plot of DiversityTest object summaries.
Gene Usage:
- Added the omit_nl argument to getAllele, getGene and getFamily to allow optional filtering of non-localized (NL) genes.
Lineage:
- Fixed a bug in makeChangeoClone preventing it from interpreting the id argument correctly.
- Added the pad_end argument to makeChangeoClone to allow automatic padding of ends to make sequences the same length.
Version 0.2.8: September 21, 2017
General:
- Updated Rcpp dependency to 0.12.12.
- Added the dry argument to collapseDuplicates, which annotates duplicate sequences but does not remove them when set to TRUE.
- Fixed a bug where collapseDuplicates was returning one sequence if all sequences were considered ambiguous.
Lineage:
- Added the ability to change the masking character and distance matrix used in makeChangeoClone and buildPhylipLineage for the purpose of (optionally) treating indels as mismatches.
- Fixed a bug in buildPhylipLineage when PHYLIP doesn’t generate inferred sequences and has only one block.
Version 0.2.7: June 12, 2017
General:
- Fixed a bug in readChangeoDb causing the select argument to do nothing.
- Added progress package dependency.
- Internal changes to support Rcpp 0.12.11.
Gene Usage:
- Renamed the count/frequency columns output by countGenes when the clone argument is specified to CLONE_COUNT/CLONE_FREQ.
- Added a vignette describing basic gene usage analysis.
Version 0.2.6: March 21, 2017
General:
- License changed to Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
- Removed data.table dependency and added readr dependency.
- Performance improvements in readChangeoDb and writeChangeoDb.
Version 0.2.5: August 5, 2016
General:
- Fixed a bug in seqDist() wherein distance was not properly calculated for some sequences containing gap characters.
- Added stop and gap characters to the getAAMatrix() return matrix.
Version 0.2.4: July 20, 2016
General:
- Added Rcpp and data.table dependencies.
- Modified readChangeoDb() to wrap data.table::fread() instead of utils::read.table() if the input file is not compressed.
- Ported testSeqEqual(), getSeqDistance() and getSeqMatrix() to C++ to improve performance of collapseDuplicates() and other dependent functions.
- Renamed testSeqEqual(), getSeqDistance() and getSeqMatrix() to seqEqual(), seqDist() and pairwiseDist(), respectively.
- Added pairwiseEqual(), which creates a logical sequence distance matrix: TRUE if sequences are identical, FALSE if not, excluding Ns and gaps.
- Added translation of ambiguous and gap characters to X in translateDNA().
- Fixed bug in collapseDuplicates() wherein the input data type sanity check would cause the vignette to fail to build under R 3.3.
- Replaced the ExampleDb.gz file with a larger, more clonal, ExampleDb data object.
- Replaced ExampleTrees with a larger set of trees.
- Renamed multiggplot() to gridPlot().
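The sequence comparison functions named above can be exercised with a short sketch. It assumes a current alakazam release and uses made-up sequences. Sequence c differs from a only by an ambiguous N. That position is expected to contribute no distance in seqDist and to be ignored by seqEqual.

```r
library(alakazam)

# Illustrative DNA sequences; "c" differs from "a" only by an ambiguous N
seqs <- c(a = "ATGGC", b = "ATGGG", c = "ATGGN")

# Pairwise distance between two sequences (N is treated as ambiguous)
d_ab <- seqDist("ATGGC", "ATGGG")
d_ac <- seqDist("ATGGC", "ATGGN")

# Equality test that ignores N and gap characters by default
eq_ab <- seqEqual("ATGGC", "ATGGG")
eq_ac <- seqEqual("ATGGC", "ATGGN")

# All-vs-all distance and equality matrices for a vector of sequences
dist_mat <- pairwiseDist(seqs)
equal_mat <- pairwiseEqual(seqs)
print(dist_mat)
print(equal_mat)
```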
Amino Acid Analysis:
- Set the default to normalize=FALSE for charge calculations to be more consistent with previously published repertoire sequencing results.
Diversity Analysis:
- Added a progress argument to rarefyDiversity() and testDiversity() to enable the (previously default) progress bar.
- Fixed a bug in estimateAbundance() where the function would fail if there was only a single input sequence per group.
- Changed column names in the data and summary slots of DiversityTest to uppercase for consistency with other tools.
- Added dispatching of plot to plotDiversityCurve for DiversityCurve objects.
Gene Usage:
- Added the sortGenes() function to sort V(D)J genes by name or locus position.
- Added the clone argument to countGenes() to allow restriction of gene abundance to one gene per clone.
Topology Analysis:
- Added a set of functions for lineage tree topology analysis.
- Added a vignette showing basic tree topology analysis.
Version 0.2.3: February 22, 2016
General:
- Fixed a bug wherein the package would not build on R < 3.2.0 due to changes in base::nchar().
- Changed R dependency to R >= 3.1.2.
Version 0.2.2: January 29, 2016
General:
- Updated license from CC BY-NC-SA 3.0 to CC BY-NC-SA 4.0.
- Internal changes to conform to CRAN policies.
Amino Acid Analysis:
- Fixed bug where arguments for the aliphatic() function were not being passed through the ellipsis argument of aminoAcidProperties().
- Improved amino acid analysis vignette.
- Added check for correctness of amino acid sequences to aminoAcidProperties().
- Renamed AA_TRANS to ABBREV_AA.
Diversity:
- Added evenness and bootstrap standard deviation to rarefyDiversity() output.
Lineage:
- Added ExampleTrees data with example output from buildPhylipLineage().
Version 0.2.1: December 18, 2015
General:
- Removed plyr dependency.
- Added dplyr, lazyeval and stringi dependencies.
- Added strict requirement for igraph version >= 1.0.0.
- Renamed getDNADistMatrix() and getAADistMatrix() to getDNAMatrix() and getAAMatrix(), respectively.
- Added getSeqMatrix(), which calculates a pairwise distance matrix for a set of sequences.
- Modified default plot sizing to be more appropriate for export to PDF figures with 7-8 inch width.
- Added the multiggplot() function for performing multiple panel plots.
Amino Acid Analysis:
- Migrated amino acid property analysis from Change-O CTL to alakazam. Includes the new functions gravy(), bulk(), aliphatic(), polar(), charge(), countPatterns() and aminoAcidProperties().
Annotation:
- Added support for unusual TCR gene names, such as ‘TRGVA*01’.
- Added removal of the ‘D’ label (gene duplication) from gene names when parsed with getSegment(), getAllele(), getGene() and getFamily(). This may be disabled by providing the argument strip_d=FALSE.
- Added countGenes() to tabulate V(D)J allele, gene and family usage.
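The strip_d behavior can be seen in a brief sketch. It assumes a current alakazam release, and the kappa V gene call is an illustrative value. With the default strip_d=TRUE, the duplicate-gene ‘D’ label in IGKV1D-39 is expected to be removed. With strip_d=FALSE it should be retained.

```r
library(alakazam)

# Kappa V gene call carrying the duplicate-gene "D" label (illustrative)
v_call <- "IGKV1D-39*01"

gene_default <- getGene(v_call)                   # default strip_d = TRUE
gene_keep_d  <- getGene(v_call, strip_d = FALSE)  # keep the "D" label
allele_default <- getAllele(v_call)
print(c(gene_default, gene_keep_d, allele_default))
```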
Diversity:
- Added several functions related to analysis of clone size distributions, including countClones(), estimateAbundance() and plotAbundance().
- Renamed resampleDiversity() to rarefyDiversity() and changed many of the internals. Bootstrapping is now performed on an inferred complete relative abundance distribution.
- Added support for inclusion of copy number in clone size determination within rarefyDiversity() and testDiversity().
- Diversity scores and confidence intervals within rarefyDiversity() and testDiversity() are now calculated using the mean and standard deviation of the bootstrap realizations, rather than the median and upper/lower quantiles.
- Added the ability to add counts to the legend in plotDiversityCurve().
Version 0.2.0: June 15, 2015
Initial public release.
General:
- Added citations for the citation("alakazam") command.
Version 0.2.0.beta-2015-05-30: May 30, 2015
Lineage:
- Added more error checking to buildPhylipLineage().
Version 0.2.0.beta-2015-05-26: May 26, 2015
Lineage:
- Fixed issue where buildPhylipLineage() would hang on R 3.2 due to R change request PR#15508.
Version 0.2.0.beta-2015-05-05: May 05, 2015
Prerelease for review.