Chemistry::Mol - Molecule object toolkit
use Chemistry::Mol;
$mol = Chemistry::Mol->new(id => "mol_id", name => "my molecule");
$c = $mol->new_atom(symbol => "C", coords => [0,0,0]);
$o = $mol->new_atom(symbol => "O", coords => [0,0,1.23]);
$mol->new_bond(atoms => [$c, $o], order => 3);
print $mol->print;
This package, along with Chemistry::Atom and Chemistry::Bond,
includes basic objects and methods to describe molecules.
The core methods try not to enforce a particular convention. This
means that only a minimal set of attributes is provided by default, and some
attributes have very loosely defined meaning. This is because each program
and file type has a different idea of what each concept (such as bond and atom
type) means. Bonds are defined as a list of atoms (typically two) with an
arbitrary type. Atoms are defined by a symbol and a Z, and may have 3D and
internal coordinates (2D coming soon).
See also Chemistry::Obj for generic attributes.
- Chemistry::Mol->new(name => value, ...)
- Create a new Mol object with the specified attributes.
$mol = Chemistry::Mol->new(id => 'm123', name => 'my mol');
is the same as
$mol = Chemistry::Mol->new();
$mol->id('m123');
$mol->name('my mol');
- $mol->add_atom($atom, ...)
- Add one or more Atom objects to the molecule. Returns the last atom
added.
- $mol->atom_class
- Returns the atom class that a molecule or molecule class expects to use by
default. Chemistry::Mol objects return "Chemistry::Atom", but
subclasses will likely override this method.
- $mol->new_atom(name => value, ...)
- Shorthand for
"$mol->add_atom($mol->atom_class->new(name => value, ...))".
- $mol->delete_atom($atom, ...)
- Deletes an atom from the molecule. It automatically deletes all the bonds
in which the atom participates as well. $atom
should be a Chemistry::Atom reference. This method also accepts the atom
index, but this use is deprecated (and buggy if multiple indices are
given, unless they are in descending order).
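The following is a minimal sketch of the bond cleanup described above, assuming Chemistry::Mol is installed; the molecule and variable names are only illustrative. It builds carbon monoxide, deletes the oxygen by passing the atom object (not an index), and counts what is left.

```perl
use strict;
use warnings;
use Chemistry::Mol;

# Build a two-atom molecule with one bond.
my $mol = Chemistry::Mol->new(name => 'carbon monoxide');
my $c   = $mol->new_atom(symbol => 'C', coords => [0, 0, 0]);
my $o   = $mol->new_atom(symbol => 'O', coords => [0, 0, 1.23]);
$mol->new_bond(atoms => [$c, $o], order => 3);

# Delete the oxygen by object reference; the C-O bond goes with it.
$mol->delete_atom($o);

my @atoms = $mol->atoms;
my @bonds = $mol->bonds;
print scalar(@atoms), " atom(s), ", scalar(@bonds), " bond(s) left\n";
```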
- $mol->add_bond($bond, ...)
- Add one or more Bond objects to the molecule. Returns the last bond
added.
- $mol->bond_class
- Returns the bond class that a molecule or molecule class expects to use by
default. Chemistry::Mol objects return "Chemistry::Bond", but
subclasses will likely override this method.
- $mol->new_bond(name => value, ...)
- Shorthand for
"$mol->add_bond($mol->bond_class->new(name => value, ...))".
- $mol->delete_bond($bond, ...)
- Deletes a bond from the molecule. $bond should be
a Chemistry::Bond object.
- $mol->by_id($id)
- Return the atom or bond object with the corresponding id.
- $mol->atoms($n1, ...)
- Returns the atoms with the given indices, or all by default. Indices start
from one, not from zero.
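As a small illustration of the one-based indexing, the sketch below (assuming Chemistry::Mol is installed; names are illustrative) fetches atoms by index. List assignment is used so that the result does not depend on how the method behaves in scalar context.

```perl
use strict;
use warnings;
use Chemistry::Mol;

# Atoms are numbered in the order they were added, starting at 1.
my $mol = Chemistry::Mol->new(name => 'carbon monoxide');
$mol->new_atom(symbol => 'C', coords => [0, 0, 0]);
$mol->new_atom(symbol => 'O', coords => [0, 0, 1.23]);

my ($first)  = $mol->atoms(1);   # the carbon, not the oxygen
my ($second) = $mol->atoms(2);
my @all      = $mol->atoms;      # no indices: every atom

print "atom 1 is ", $first->symbol, ", atom 2 is ", $second->symbol, "\n";
```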
- $mol->atoms_by_name($name)
- Returns the atoms with the given name (treated as an anchored regular
expression).
- $mol->sort_atoms($sub_ref)
- Sort the atoms in the molecule by using the comparison function given in
$sub_ref. This function should take two atoms as
parameters and return -1, 0, or 1 depending on whether the first atom
should go before, at the same position as, or after the second atom. For example, to sort by
atomic number, you could use the following:
$mol->sort_atoms( sub { $_[0]->Z <=> $_[1]->Z } );
Note that the atoms are passed as parameters and not as the
package variables $a and $b, as the core sort function does. This is
because $mol->sort_atoms will likely be called from
another package and we don't want to play with another package's symbol
table.
- $mol->bonds($n1, ...)
- Returns the bonds with the given indices, or all by default. Indices start
from one, not from zero.
- $mol->print(option => value...)
- Convert the molecule to a string representation. If no options are given,
a default YAML-like format is used (this may change in the future).
Otherwise, the format should be specified by using the
"format" option.
- $s = $mol->sprintf($format)
- Format interesting molecular information in a concise way, as specified by
a printf-like format.
%n - name
%f - formula
%f{formula with format} - (note: right braces within
the format should be escaped with a backslash)
%s - SMILES representation
%S - canonical SMILES representation
%m - mass
%8.3m - mass, formatted as %8.3f with core sprintf
%q - formal charge
%a - atom count
%b - bond count
%t - type
%i - id
%% - %
For example, if you want just about everything:
$mol->sprintf("%s - %n (%f). %a atoms, %b bonds; "
. "mass=%m; charge =%q; type=%t; id=%i");
Note that you have to "use Chemistry::File::SMILES" before using
%s or %S in "$mol->sprintf".
- $mol->printf($format)
- Same as "$mol->sprintf", but prints
to standard output automatically. Used for quick and dirty molecular
information dumping.
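A short sketch of the format codes in use, assuming Chemistry::Mol is installed. It deliberately avoids %s and %S so that Chemistry::File::SMILES is not needed; the molecule itself is only an illustration.

```perl
use strict;
use warnings;
use Chemistry::Mol;

my $mol = Chemistry::Mol->new(name => 'carbon monoxide');
my $c   = $mol->new_atom(symbol => 'C', coords => [0, 0, 0]);
my $o   = $mol->new_atom(symbol => 'O', coords => [0, 0, 1.23]);
$mol->new_bond(atoms => [$c, $o], order => 3);

# %n = name, %a = atom count, %b = bond count, %8.3m = mass as %8.3f
my $summary = $mol->sprintf("%n: %a atoms, %b bonds; mass=%8.3m");
print "$summary\n";
```

Calling "$mol->printf" with the same format string would print the summary directly instead of returning it.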
- Chemistry::Mol->parse($string, option => value...)
- Parse the molecule encoded in $string. The format
should be specified with the "format" option; otherwise, it will be
guessed.
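As a hedged illustration, the sketch below parses a formula string with the "formula" format. It assumes that Chemistry::File::Formula (distributed with Chemistry::Mol) registers that format name when loaded and that its reader creates one atom per element count, without bonds; check that module's documentation before relying on either assumption.

```perl
use strict;
use warnings;
use Chemistry::Mol;
use Chemistry::File::Formula;   # assumed to register the "formula" format

# Name the format explicitly rather than relying on guessing.
my $water   = Chemistry::Mol->parse('H2O', format => 'formula');
my $formula = $water->formula_hash;

print join(", ", map { "$_ => $formula->{$_}" } sort keys %$formula), "\n";
```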
- Chemistry::Mol->read($fname, option => value ...)
- Read a file and return a list of Mol objects, or croak if there was a
problem. The type of file will be guessed if not specified via the
"format" option.
Note that only registered file readers will be used. Readers
may be registered using
"register_format()"; modules that
include readers (such as Chemistry::File::PDB) usually register them
automatically when they are loaded.
Automatic decompression of gzipped files is supported if the
Compress::Zlib module is installed. Files ending in .gz are assumed to
be compressed; otherwise it is possible to force decompression by
passing the gzip => 1 option (or no decompression with gzip =>
0).
- $mol->write($fname, option => value ...)
- Write a molecule file, or croak if there was a problem. The type of file
will be guessed if not specified via the
"format" option.
Note that only registered file formats will be used.
Automatic gzip compression is supported if the IO::Zlib module
is installed. Files ending in .gz are assumed to be compressed;
otherwise it is possible to force compression by passing the gzip =>
1 option (or no compression with gzip => 0). Specific compression
levels between 2 (fastest) and 9 (most compressed) may also be used
(e.g., gzip => 9).
- Chemistry::Mol->file($file, option => value ...)
- Create a Chemistry::File-derived object for reading or writing to a file.
The object can then be used to read the molecules or other information in
the file.
This has more flexibility than calling
"Chemistry::Mol->read" when dealing
with multi-molecule files or files that have higher structure or that
have information that does not belong to the molecules themselves. For
example, a reaction file may have a list of molecules, but also general
information like the reaction name, yield, etc. as well as the
classification of the molecules as reactants or products. The exact
information that is available will depend on the file reader class that
is being used. The following is a hypothetical example for reading MDL
rxnfiles.
# assuming this module existed...
use Chemistry::File::Rxn;
my $rxn = Chemistry::Mol->file('test.rxn');
$rxn->read;
$name = $rxn->name;
@reactants = $rxn->reactants; # mol objects
@products = $rxn->products;
$yield = $rxn->yield; # a number
Note that only registered file readers will be used. Readers
may be registered using register_format(); modules that include
readers (such as Chemistry::File::PDB) usually register them
automatically.
- Chemistry::Mol->register_format($name, $ref)
- Register a file type. The identifier $name must be
unique. $ref is either a class name (a package) or
an object that complies with the Chemistry::File interface (e.g., a
subclass of Chemistry::File). If $ref is omitted,
the calling package is used automatically. More than one format can be
registered at a time, but then $ref must be
included for each format (e.g., Chemistry::Mol->register_format(format1
=> "package1", format2 => "package2")).
The typical user doesn't have to care about this function. It
is used automatically by molecule file I/O modules.
- Chemistry::Mol->formats
- Returns a list of the file formats that have been registered with
register_format().
- $mol->mass
- Return the molar mass. This is just the sum of the masses of the atoms.
See Chemistry::Atom::mass for details such as the handling of
isotopes.
- $mol->charge
- Return the charge of the molecule. By default it returns the sum of the
formal charges of the atoms. However, it is possible to set an arbitrary
charge by calling "$mol->charge($new_charge)".
- $mol->formula_hash
- Returns a hash reference describing the molecular formula. For methane it
would return { C => 1, H => 4 }.
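The methane case can be reproduced with the sketch below, assuming Chemistry::Mol is installed. The hydrogens are added as explicit atoms, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use Chemistry::Mol;

# Methane with four explicit hydrogens bonded to one carbon.
my $mol = Chemistry::Mol->new(name => 'methane');
my $c   = $mol->new_atom(symbol => 'C');
for (1 .. 4) {
    my $h = $mol->new_atom(symbol => 'H');
    $mol->new_bond(atoms => [$c, $h]);
}

# formula_hash returns a hash reference keyed by element symbol.
my $counts = $mol->formula_hash;
print join(", ", map { "$_ => $counts->{$_}" } sort keys %$counts), "\n";
```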
- $mol->formula($format)
- Returns a string with the formula. The format can be specified as a
printf-like string with the control sequences specified in the
Chemistry::File::Formula documentation.
- my $mol2 = $mol->clone;
- Makes a copy of a molecule. Note that this is a deep copy; if your
molecule has a pointer to the rest of the universe, the entire universe
will be cloned!
By default, clone() uses Storable to copy the Perl data
structure. Clone can be used instead by setting variable
$Chemistry::Mol::clone_backend to
"Clone" (default is
"Storable"). The documentation of
Storable claims Clone is less memory-intensive.
- my $mol2 = $mol->safe_clone;
- Like clone, it makes a deep copy of a molecule. The difference is that the
copy is not "exact" in that the new molecule and its atoms and bonds
get assigned new IDs. This makes it safe to combine cloned molecules. For
example, this is an error:
# XXX don't try this at home!
my $mol2 = Chemistry::Mol->combine($mol1, $mol1);
# the atoms in $mol1 will clash
But this is ok:
# the "safe clone" of $mol1 will have new IDs
my $mol2 = Chemistry::Mol->combine($mol1, $mol1->safe_clone);
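A runnable version of the safe case might look like the following sketch, assuming Chemistry::Mol is installed; the names $dimer and %seen_ids are illustrative. It checks that the combined molecule has twice as many atoms and that no atom ID is repeated.

```perl
use strict;
use warnings;
use Chemistry::Mol;

my $mol1 = Chemistry::Mol->new(name => 'carbon monoxide');
my $c    = $mol1->new_atom(symbol => 'C', coords => [0, 0, 0]);
my $o    = $mol1->new_atom(symbol => 'O', coords => [0, 0, 1.23]);
$mol1->new_bond(atoms => [$c, $o], order => 3);

# Class-method form: $mol1 itself is left untouched.
my $dimer = Chemistry::Mol->combine($mol1, $mol1->safe_clone);

my @atoms    = $dimer->atoms;
my %seen_ids = map { $_->id => 1 } @atoms;
print scalar(@atoms), " atoms, ", scalar(keys %seen_ids), " distinct IDs\n";
```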
- ($distance, $atom_here, $atom_there) = $mol->distance($obj)
- Returns the minimum distance to $obj, which can be
an atom, a molecule, or a vector. In scalar context it returns only the
distance; in list context it also returns the atoms involved. The current
implementation for calculating the minimum distance between two molecules
compares every possible pair of atoms, so it's not efficient for large
molecules.
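The list-context form can be sketched as follows, assuming Chemistry::Mol is installed. Both molecules lie along the z axis, so the closest pair is easy to verify by hand; the atom names are only labels for this example.

```perl
use strict;
use warnings;
use Chemistry::Mol;

# Two small "molecules" placed along the z axis.
my $mol_a = Chemistry::Mol->new(name => 'A');
$mol_a->new_atom(symbol => 'C', name => 'a1', coords => [0, 0, 0]);
$mol_a->new_atom(symbol => 'C', name => 'a2', coords => [0, 0, 1]);

my $mol_b = Chemistry::Mol->new(name => 'B');
$mol_b->new_atom(symbol => 'O', name => 'b1', coords => [0, 0, 3]);
$mol_b->new_atom(symbol => 'O', name => 'b2', coords => [0, 0, 5]);

# List context: distance plus the closest atom in each molecule.
my ($d, $here, $there) = $mol_a->distance($mol_b);
print "closest pair: ", $here->name, " - ", $there->name, " at $d\n";

# Scalar context: only the distance.
my $d_only = $mol_a->distance($mol_b);
```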
- my $bigmol = Chemistry::Mol->combine($mol1, $mol2, ...)
- $mol1->combine($mol2, $mol3, ...)
- Combines several molecules into one bigger molecule. If called as a class
method, as in the first example, it returns a new combined molecule
without altering any of the parameters. If called as an instance method,
as in the second example, all molecules are combined into
$mol1 (but $mol2, $mol3, ... are not altered). Note: make
sure you don't combine molecules which contain atoms with duplicate IDs
(for example, if they were cloned).
- my @mols = $mol->separate
- Separates a molecule into "connected fragments". The original
object is not modified; the fragments are clones of the original ones.
Example: if you have ethane (H3CCH3) and you delete the C-C bond, you have
two CH3 radicals within one molecule object ($mol). When you call
$mol->separate you get two molecules, each one
with a CH3.
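The ethane example can be written out as the sketch below, assuming Chemistry::Mol is installed. Coordinates are omitted because only connectivity matters here; the variable names are illustrative.

```perl
use strict;
use warnings;
use Chemistry::Mol;

# Build ethane: two carbons, one C-C bond, three hydrogens per carbon.
my $mol     = Chemistry::Mol->new(name => 'ethane');
my @carbons = map { $mol->new_atom(symbol => 'C') } 1 .. 2;
my $cc_bond = $mol->new_bond(atoms => [@carbons]);
for my $carbon (@carbons) {
    for (1 .. 3) {
        my $h = $mol->new_atom(symbol => 'H');
        $mol->new_bond(atoms => [$carbon, $h]);
    }
}

# Break the C-C bond, then split into connected fragments.
$mol->delete_bond($cc_bond);
my @fragments = $mol->separate;

my @sizes = map { scalar(my @atoms = $_->atoms) } @fragments;
print scalar(@fragments), " fragments with sizes: @sizes\n";
```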
- $mol->sprout_hydrogens
- Convert all the implicit hydrogen atoms in the molecule to explicit atoms.
It does not generate coordinates for the atoms.
- $mol->collapse_hydrogens
- Convert all the explicit hydrogen atoms in the molecule to implicit
hydrogens. (Exception: hydrogen atoms that are adjacent to a hydrogen atom
are not collapsed.)
- $mol->add_implicit_hydrogens
- Use heuristics to figure out how many implicit hydrogens each atom
in the molecule should have to satisfy its normal "organic"
valence.
- Chemistry::Mol->register_descriptor($name => $sub_ref)
- Adds a callback that can be used to add functionality to the molecule
class (originally meant to add custom molecule descriptors). A descriptor
is a function that takes a molecule object as its only argument and
returns a value or values. For example, to add a descriptor function that
computes the number of atoms:
Chemistry::Mol->register_descriptor(
number_of_atoms => sub {
my $mol = shift;
return scalar $mol->atoms;
}
);
The descriptor is accessed by name via the
"descriptor" instance method:
my $n = $mol->descriptor('number_of_atoms');
- my $value = $mol->descriptor($descriptor_name)
- Calls a previously registered descriptor function giving it
$mol as an argument, as shown above for
"register_descriptor".
<https://github.com/perlmol/Chemistry-Mol>
Chemistry::Atom, Chemistry::Bond, Chemistry::File,
Chemistry::Tutorial
Ivan Tubert-Brohman <itub@cpan.org>
Copyright (c) 2005 Ivan Tubert-Brohman. All rights reserved. This
program is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.