Image::MetaData::JPEG(3pm) | User Contributed Perl Documentation | Image::MetaData::JPEG(3pm) |
Image::MetaData::JPEG - Perl extension for showing/modifying JPEG (meta)data.
use Image::MetaData::JPEG;

# Create a new JPEG file structure object
my $image = new Image::MetaData::JPEG('somepicture.jpg');
die 'Error: ' . Image::MetaData::JPEG::Error() unless $image;

# Get a list of references to comment segments
my @segments = $image->get_segments('COM', 'INDEXES');

# Get the JPEG picture dimensions
my ($dim_x, $dim_y) = $image->get_dimensions();

# Show all JPEG segments and their content
print $image->get_description();

# Retrieve a specific value from Exif meta-data
my $image_data = $image->get_Exif_data('IMAGE_DATA', 'TEXTUAL');
print $image_data->{DateTimeOriginal}->[0], "\n";

# Modify the DateTime tag for the main image
$image->set_Exif_data({'DateTime' => '1994:07:23 12:14:51'}, 'IMAGE_DATA', 'ADD');

# Delete all meta-data segments (please, don't)
$image->drop_segments('METADATA');

# Rewrite file to disk after your modifications
$image->save('new_file_name.jpg');

# ... and a lot more methods for viewing/modifying meta-data, which
# are accessed through the $image or $segments[$index] references.
The purpose of this module is to read/modify/rewrite meta-data segments in JPEG (Joint Photographic Experts Group format) files, which can contain comments, thumbnails, Exif information (photographic parameters), IPTC information (editorial parameters) and similar data.
Each JPEG file is made of consecutive segments (tagged data blocks) and the actual raw picture data. Most of these segments specify parameters for decoding the picture data into a bitmap; some of them, namely the COMment and APPlication segments, contain meta-data instead, i.e., information about how the photo was shot (usually added by a digital camera) and additional notes from the photographer. These additional pieces of information are especially valuable for picture databases, since the meta-data can be saved together with the picture without resorting to additional database structures. See the appendix about the structure of JPEG files for technical details.
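The segment layout can be illustrated with a small pure-Perl sketch that walks the marker/length structure of an in-memory JPEG stream. It does not use this module: the function name list_segments is hypothetical, it assumes a well-formed stream (no garbage between segments), and it stops at the first Start Of Scan, because the compressed picture data that follows is not organised as length-prefixed segments.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Image::MetaData::JPEG): return a list of
# [marker_byte, payload_length] pairs for the segments before the picture data.
sub list_segments {
    my ($jpeg) = @_;
    die "not a JPEG stream\n" unless substr($jpeg, 0, 2) eq "\xFF\xD8";  # SOI
    my ($pos, @segments) = (2);
    while ($pos + 4 <= length($jpeg)) {
        my ($ff, $marker, $len) = unpack 'C C n', substr($jpeg, $pos, 4);
        die "marker expected at offset $pos\n" unless $ff == 0xFF;
        # the 16-bit big-endian length counts itself, but not the marker
        push @segments, [ $marker, $len - 2 ];
        last if $marker == 0xDA;    # SOS: compressed picture data follows
        $pos += 2 + $len;
    }
    return @segments;
}
```

Each returned pair corresponds to one Segment object in the terminology of this module.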
This module works by breaking a JPEG file into individual segments. Each file is associated with an Image::MetaData::JPEG structure object, which contains one Image::MetaData::JPEG::Segment object for each segment. Segments with a known format are then parsed, and their content can be accessed in a structured way for display. Some of them can even be modified and then rewritten to disk.
$Image::MetaData::JPEG::show_warnings = undef;
my $file = new Image::MetaData::JPEG('a_file_name.jpg');
my $file = new Image::MetaData::JPEG(\ $a_JPEG_stream);
The constructor method accepts two optional arguments, a regular expression and an option string. If the regular expression is present, it is matched against segment names, and only those segments with a positive match are parsed (the others are nonetheless stored); this allows for some speed-up if you just need partial information, but be sure not to miss something necessary; e.g., SOF segments are needed for reading the picture dimensions. For instance, if you just want to manipulate the comments, you could set the string to 'COM'.
my $file = new Image::MetaData::JPEG('a_file_name.jpg', 'COM');
The third optional argument is an option string. If it matches the string 'FASTREADONLY', only the segments matching the regular expression are actually stored; also, everything found after a Start Of Scan is completely ignored. This allows for very large speed-ups, but, obviously, you cannot rebuild the file afterwards, so this is only for getting information fast, e.g., when doing a directory scan.
my $file = new Image::MetaData::JPEG('a_file.jpg', 'COM', 'FASTREADONLY');
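To make the regular-expression filter concrete, here is a hypothetical sketch of how marker bytes could be mapped to the segment names used in this document (APPn, COM, SOFn, SOS, ...) and matched against a pattern such as 'COM'. The mapping covers only common markers and is an illustration, not this module's internal table.

```perl
use strict;
use warnings;

# Hypothetical marker-to-name table for some non-SOF, non-APP markers.
my %NAMES = (0xC4 => 'DHT', 0xC8 => 'JPG', 0xCC => 'DAC', 0xD8 => 'SOI',
             0xD9 => 'EOI', 0xDA => 'SOS', 0xDB => 'DQT', 0xDD => 'DRI',
             0xFE => 'COM');

sub segment_name {
    my ($marker) = @_;
    return $NAMES{$marker}          if exists $NAMES{$marker};
    return 'APP' . ($marker - 0xE0) if $marker >= 0xE0 && $marker <= 0xEF;
    return 'SOF' . ($marker - 0xC0) if $marker >= 0xC0 && $marker <= 0xCF;
    return sprintf 'UNKNOWN_%02X', $marker;
}

# Keep only the markers whose name matches the user-supplied regex.
sub select_segments {
    my ($regex, @markers) = @_;
    return grep { segment_name($_) =~ /$regex/ } @markers;
}
```

With 'COM' as the pattern, only comment segments would be selected; an SOF segment, needed for the picture dimensions, would be skipped.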
Nota bene: an old version of "Arles Image Web Page Creator" had a bug which caused the application to generate JPEGs with illegal comment segments, reportedly due to a bug in the Intel JPEG library the developers used at that time (these segments had two 0x00 bytes appended). It is true that a JPEG file with garbage between segments is to be considered invalid, but some libraries like IJG's try to forgive, so this module tries to forgive too, if the amount of garbage isn't too large (only a warning is printed).
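The tolerance towards garbage between segments can be sketched as follows: before reading the next marker, skip non-0xFF bytes, but give up if too many of them must be skipped. The function name and the limit passed by the caller are hypothetical; the threshold actually used by this module is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset of the next 0xFF byte at or after
# $pos, warning about skipped garbage; die if more than $max_garbage bytes
# would have to be skipped.
sub next_marker_offset {
    my ($data, $pos, $max_garbage) = @_;
    my $found = index($data, "\xFF", $pos);
    die "no marker found\n" if $found < 0;
    my $skipped = $found - $pos;
    die "too much garbage ($skipped bytes)\n" if $skipped > $max_garbage;
    warn "skipping $skipped garbage byte(s) at offset $pos\n" if $skipped;
    return $found;
}
```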
die 'Error: ' . Image::MetaData::JPEG::Error() unless $file;
my @segments = $file->get_segments($regex, $do_indexes);
$file->drop_segments('^APP1$');
$file->insert_segments([$my_comment_1, $my_comment_2], 3, 1);
print $file->get_description(); my ($dim_x, $dim_y) = $file->get_dimensions();
my $new_position = $file->find_new_app_segment_position('APP2');
print "Creation of $newJPEG failed!" unless $file->save($newJPEG);
An example of how to proficiently use the in-memory feature to read the content of a JPEG thumbnail is the following (see later for get_Exif_data, and also do some error checking!):
my $thumbnail = $file->get_Exif_data('THUMBNAIL'); print Image::MetaData::JPEG->new($thumbnail)->get_description();
printf "Invalid %s!\n", $segment->{name} if $segment->{error};
my $records = $segment->{records}; printf '%s has %d records\n', $segment->{name}, scalar @$records;
my $segments = $file->get_segments('APP0'); my $segment = $$segments[0]; print "I found it!\n" if $segment->search_record('Identifier');
If you are interested only in the Record's value, you can use the search_record_value method, a simple wrapper around search_record(): it returns the record value (with "JPEG::Record::get_value") if the search is successful, undef otherwise.
print "Its value is: ", $segment->search_record_value('Identifier');
Nota bene: the returned record is initialised with a "fake" $REFERENCE record pointing to the records member of the current segment; this record is therefore returned if search_record is invoked without arguments. For the same reason, search_record_value invoked without arguments returns the records member:
$segment->search_record_value() eq $segment->{records} || print "error!";
Note that this method preliminarily saves a reference to the old segment data area and restores it if the update process fails (if this happens, a warning is generated). One wonders whether there are cleverer ways to handle this case (any suggestion is welcome). It is however better to have a corrupt object in memory than a corrupt object written over the original. Currently, this is restricted to the possibility that an updated segment becomes too large.
$segment->update();
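The "too large" condition comes from the JPEG format itself: the segment length field is an unsigned 16-bit integer that also counts its own two bytes, so a segment payload cannot exceed 65533 bytes. Below is a minimal sketch of the save-and-restore idea, with a hypothetical function name and a plain hash standing in for a Segment object.

```perl
use strict;
use warnings;

use constant MAX_PAYLOAD => 0xFFFF - 2;   # 65533 bytes of data per segment

# Hypothetical sketch: $segment is a plain hash standing in for a Segment;
# the new data is kept only if it fits, otherwise the old data is restored.
sub try_update {
    my ($segment, $new_data) = @_;
    my $old_data = $segment->{data};
    $segment->{data} = $new_data;
    if (length($new_data) > MAX_PAYLOAD) {
        warn "segment too large, old data restored\n";
        $segment->{data} = $old_data;
        return 0;
    }
    return 1;
}
```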
for my $segment ($file->get_segments('APP13')) {
    # reparse a broken APP13 segment which is really an ICC profile
    $segment->reparse_as('APP2')
        if $segment->{error} &&
           $segment->search_record_value('Identifier') =~ /ICC_PROFILE/;
    $segment->update();
}
eval { $segment->output_segment_data($output_handle) } || print "A terrible output error occurred! Help me.\n";
print $segment->get_description(); print 'Size is 4 + ' . $segment->size();
printf 'The numeric key 0x%04x means %s', $record->{key}, JPEG_lookup('APP13@Photoshop_RECORDS', $record->{key});
printf "This record contains %d values\n", scalar @{$record->{values}};
A Record's type can be one among the following predefined constants:
 0  $NIBBLES     Two 4-bit unsigned integers (private)
 1  $BYTE        An 8-bit unsigned integer
 2  $ASCII       A variable length ASCII string
 3  $SHORT       A 16-bit unsigned integer
 4  $LONG        A 32-bit unsigned integer
 5  $RATIONAL    Two LONGs (numerator and denominator)
 6  $SBYTE       An 8-bit signed integer
 7  $UNDEF       A generic variable length string
 8  $SSHORT      A 16-bit signed integer
 9  $SLONG       A 32-bit signed integer (2's complement)
10  $SRATIONAL   Two SLONGs (numerator and denominator)
11  $FLOAT       A 32-bit float (a single float)
12  $DOUBLE      A 64-bit float (a double float)
13  $REFERENCE   A Perl list reference (internal)
$UNDEF is used for not-better-specified binary data. A record of a numeric type can have multiple elements in its @{values} list ($NIBBLES implies an even number); an $UNDEF or $ASCII type record instead has only one element, but its length can vary. Last, a $REFERENCE record holds a single Perl reference to another record list: this allows for the construction of a sort of directory tree in a Segment.
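As an illustration of how these types map onto bytes, the following hypothetical sketch decodes some of them with Perl's unpack, for either byte order ('MM' big-endian or 'II' little-endian, as in the Exif 'Endianness' record). The type codes are those of the table above; $NIBBLES, $FLOAT, $DOUBLE and $REFERENCE are left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical unpack templates for some record types (type code => template).
# $ASCII (2) and $UNDEF (7) are handled as a single string value.
my %TEMPLATE = (
    1 => 'C',  3 => 'S',  4 => 'L',  5 => 'LL',
    6 => 'c',  8 => 's',  9 => 'l', 10 => 'll',
);

sub decode_values {
    my ($type, $count, $bytes, $endianness) = @_;
    return substr($bytes, 0, $count) if $type == 2 || $type == 7;
    my $tpl = $TEMPLATE{$type} // die "unsupported type $type\n";
    my $order = $endianness eq 'MM' ? '>' : '<';   # 'MM' = big endian
    $tpl =~ s/([SsLl])/$1$order/g;                  # byte order for 16/32 bits
    return unpack "($tpl)$count", $bytes;
}
```

A $RATIONAL such as an XResolution of 72/1 thus comes out as the two-element list (72, 1), which is exactly how such values appear in the hashes returned by the getters.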
for my $record (@{$segment->{records}}) { print "Subdir found\n" if $record->get_category() eq 'p'; }
print $record->get_description($names);
sub show_directory {
    my ($segment, $records, $names) = @_;
    my @subdirs = ();
    for my $record (@$records) {
        print $record->get_description($names);
        push @subdirs, $record if $record->get_category() eq 'p';
    }
    foreach my $subdir (@subdirs) {
        my $directory = $subdir->get_value();
        push @$names, $subdir->{key};
        printf "Subdir %s (%d records)\n", join('@', @$names), scalar @$directory;
        show_directory($segment, $directory, $names);
        pop @$names;
    }
}
show_directory($segment, $segment->{records}, [ $segment->{name} ]);
my ($key, $type, $count, $dataref) = $record->get();
my $number = $file->get_number_of_comments(); my @comments = $file->get_comments();
$file->add_comment('a' x 100000);
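A comment this long cannot fit in a single COM segment, because the 16-bit segment length limits each payload to 65533 bytes. The following hypothetical sketch builds raw COM segments from a long string by splitting it into chunks; how this module itself handles long comments is not described here, so take it as an illustration of the format constraint only.

```perl
use strict;
use warnings;

# Hypothetical helper: build one or more raw COM segments (marker 0xFFFE,
# 16-bit big-endian length including itself, then the text).
sub build_com_segments {
    my ($text, $max) = @_;
    $max //= 65533;
    my @segments;
    for (my $off = 0; $off < length($text); $off += $max) {
        my $chunk = substr $text, $off, $max;
        push @segments, "\xFF\xFE" . pack('n', length($chunk) + 2) . $chunk;
    }
    return @segments;
}
```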
$file->set_comment(0, 'This is the new comment');
$file->remove_comment(0); $file->remove_all_comments();
$file->join_comments('---', 2, 5, 8);
APP0 Segments are written by older cameras adopting the JFIF (JPEG File Interchange Format), or one of its extensions, for storing images. JFIF files use the APP0 application Segment for inserting configuration data and a JPEG or RGB packed thumbnail image. The format is described in the appendix about the APP0 structure, including the names of all possible tags. It is of course possible to access each APP0 Segment individually by means of the get_segments and search_record_value methods. A snippet of code for doing this is the following:
for my $segment ($file->get_segments('APP0')) {
    my $iden = $segment->search_record_value('Identifier');
    my $xdim = $segment->search_record_value('Xthumbnail');
    my $ydim = $segment->search_record_value('Ythumbnail');
    printf "Segment type: %s; dimensions: %dx%d\n", substr($iden, 0, -1), $xdim, $ydim;
    printf "%15s => %s\n", $_->{key}, $_->get_value() for @{$segment->{records}};
}
my $data = $file->get_app0_data();
printf "%15s => %s\n", $_, (($_ =~ /..Thumbnail/) ? '...' : $$data{$_}) for sort keys %$data;
The DCT Exif (Exchangeable Image File format) standard provides photographic meta-data in the APP1 section. Various tag-value pairs are stored in groups called IFDs (Image File Directories), where each group refers to a different kind of information; one can find data about how the photo was shot, GPS data, thumbnail data and so on (see the appendix about the APP1 segment structure for more details). This module provides a number of methods for managing Exif data without dealing with the details of the low level representation. Note that, given the complicated structure of an Exif APP1 segment (where extensive use of "pointers" is made), some digital cameras and graphic programs decide to leave some unused space in the JPEG file. The dump routines of this module, on the other hand, leave no unused space, so just calling update() on an Exif APP1 segment even without modifying its content can give you a smaller file (some tens of kilobytes can be saved).
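Behind the IFD terminology there is a TIFF-like layout: after the "Exif\0\0" identifier comes a header with the byte order ('II' or 'MM'), the number 42 and the offset of IFD0; each IFD then starts with a 16-bit entry count followed by 12-byte entries (tag, type, count, value or offset). A minimal pure-Perl sketch that reads the header and the tags of IFD0 follows; the function name is hypothetical, and sub-IFDs and out-of-line values are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: $tiff is the APP1 payload after "Exif\0\0".
sub read_ifd0_tags {
    my ($tiff) = @_;
    my $order = substr $tiff, 0, 2;
    die "bad byte order\n" unless $order eq 'II' || $order eq 'MM';
    my ($s, $l) = $order eq 'MM' ? ('n', 'N') : ('v', 'V');
    my ($magic, $ifd0) = unpack "$s $l", substr($tiff, 2, 6);
    die "bad TIFF signature\n" unless $magic == 42;
    my $entries = unpack $s, substr($tiff, $ifd0, 2);
    my @tags;
    for my $i (0 .. $entries - 1) {
        # each entry: tag (16), type (16), count (32), value/offset (32)
        my ($tag, $type, $count) = unpack "$s $s $l",
            substr($tiff, $ifd0 + 2 + 12 * $i, 8);
        push @tags, [ $tag, $type, $count ];
    }
    return ($order, @tags);
}
```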
my $num = $file->retrieve_app1_Exif_segment(-1); my $ref = $file->retrieve_app1_Exif_segment($num - 1);
my $ref = $file->provide_app1_Exif_segment();
$file->remove_app1_Exif_info(-1);
How to inspect your Exif data
All Exif records are natively identified by numeric tags (keys), which can be "translated" into a human-readable form by using the Exif standard docs; only a few fields in the Exif APP1 preamble (they are not Exif records) are always identified by this module by means of textual tags. The $type argument selects the output format for the record keys (tags):
* NUMERIC: record tags are native numeric keys
* TEXTUAL: record tags are human-readable (default)
Of course, record values are never translated. If a numeric Exif tag is not known, a custom textual key is created with "Unknown_tag_" followed by its numerical value (this solves problems with non-standard tags). The subset of Exif tags returned by this method is determined by the value of $what, which can be one of:
$what           returned info                           returned type
---------------------------------------------------------------------------
ALL (default)   everything but THUMBNAIL                ref. to hash of hashes
IMAGE_DATA      a merge of IFD0_DATA and SUBIFD_DATA    ref. to flat hash
THUMB_DATA      this is an alias for IFD1_DATA          ref. to flat hash
THUMBNAIL       the actual (un)compressed thumbnail     ref. to scalar
ROOT_DATA       header records (TIFF and similar)       ref. to flat hash
IFD0_DATA       primary image TIFF tags                 ref. to flat hash
SUBIFD_DATA     Exif private tags                       ref. to flat hash
MAKERNOTE_DATA  MakerNote tags (if struct. is known)    ref. to flat hash
GPS_DATA        GPS data of the primary image           ref. to flat hash
INTEROP_DATA    interoperability data                   ref. to flat hash
IFD1_DATA       thumbnail-related TIFF tags             ref. to flat hash
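The key translation rule can be pictured with a small lookup table. The sketch below is hypothetical and lists only a handful of IFD0 tags (their numeric values are those of the TIFF/Exif standard); any other tag falls back to the "Unknown_tag_" form described above, here assumed to use the decimal value.

```perl
use strict;
use warnings;

# A few IFD0 tags from the TIFF/Exif standard (illustrative, not exhaustive).
my %IFD0_NAMES = (
    0x010F => 'Make',        0x0110 => 'Model',
    0x0112 => 'Orientation', 0x011A => 'XResolution',
    0x011B => 'YResolution', 0x0128 => 'ResolutionUnit',
    0x0132 => 'DateTime',
);

# Translate a hash keyed by numeric tags into one keyed by textual tags;
# values are copied untouched. The decimal "Unknown_tag_" form is assumed.
sub to_textual {
    my ($numeric) = @_;
    my %textual;
    for my $tag (keys %$numeric) {
        my $name = $IFD0_NAMES{$tag} // "Unknown_tag_$tag";
        $textual{$name} = $numeric->{$tag};
    }
    return \%textual;
}
```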
Setting $what equal to 'ALL' returns a reference to a hash of hashes, whose top-level hash contains the following keys: ROOT_DATA, IFD0_DATA, SUBIFD_DATA, GPS_DATA, INTEROP_DATA, MAKERNOTE_DATA and IFD1_DATA; each key corresponds to a second-level hash containing a copy of all Exif records present in the IFD (sub)directory corresponding to the key (if this directory is not present or contains no records, the second-level hash exists and is empty). Note that the Exif record values' format is not checked to be valid according to the Exif standard. This is, in some sense, consistent with the fact that also "unknown" tags are included in the output. This complicated structure is more easily explained by showing an example (see also the section about valid Exif tags for details on possible records):
my $hash_ref = $segment->get_Exif_data('ALL', 'TEXTUAL');

can give

$hash_ref = {
    'ROOT_DATA' => {
        'Signature'     => [ 42 ],
        'Endianness'    => [ 'MM' ],
        'Identifier'    => [ "Exif\000\000" ],
        'ThumbnailData' => [ ... image ... ],
    },
    'IFD1_DATA' => {
        'ResolutionUnit'              => [ 2 ],
        'JPEGInterchangeFormatLength' => [ 3922 ],
        'JPEGInterchangeFormat'       => [ 2204 ],
        'Orientation'                 => [ 1 ],
        'XResolution'                 => [ 72, 1 ],
        'Compression'                 => [ 6 ],
        'YResolution'                 => [ 72, 1 ],
    },
    'SUBIFD_DATA' => {
        'ApertureValue'   => [ 35, 10 ],
        'PixelXDimension' => [ 2160 ],
        etc., etc. ....
        'ExifVersion'     => [ '0210' ],
    },
    'MAKERNOTE_DATA' => {},
    'IFD0_DATA' => {
        'Model'          => [ "KODAK DX3900 ZOOM DIGITAL CAMERA\000" ],
        'ResolutionUnit' => [ 2 ],
        etc., etc. ...
        'YResolution'    => [ 230, 1 ],
    },
    'GPS_DATA' => {},
    'INTEROP_DATA' => {
        'InteroperabilityVersion' => [ '0100' ],
        'InteroperabilityIndex'   => [ "R98\000" ],
    },
};
Setting $what equal to '*_DATA' returns a reference to a flat hash, corresponding to one or more IFD (sub)dirs. For instance, 'IMAGE_DATA' is a merge of 'IFD0_DATA' and 'SUBIFD_DATA': this interface is simpler for the end-user, because there is only one dereference level; also, he/she does not need to be aware of the partition of records related to the main image into two IFDs. If the (sub)directory is not present or contains no records, the returned hash exists and is empty. With reference to the previous example:
my $hash_ref = $segment->get_Exif_data('IMAGE_DATA', 'TEXTUAL');

gives (records from IFD0_DATA and SUBIFD_DATA, but none from IFD1_DATA)

$hash_ref = {
    'Model'           => [ "KODAK DX3900 ZOOM DIGITAL CAMERA\000" ],
    'ResolutionUnit'  => [ 2 ],
    etc., etc. ...
    'YResolution'     => [ 230, 1 ],
    'ApertureValue'   => [ 35, 10 ],
    'PixelXDimension' => [ 2160 ],
    etc., etc. ....
    'ExifVersion'     => [ '0210' ],
};
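In terms of plain Perl data, the 'IMAGE_DATA' view is the union of two second-level hashes of the 'ALL' view. The following hypothetical sketch performs that merge on a hand-made structure; it is not the module's code, and if a key occurred in both directories the SubIFD value would simply win here.

```perl
use strict;
use warnings;

# Hypothetical sketch: build an IMAGE_DATA-style flat hash from the
# IFD0_DATA and SUBIFD_DATA parts of an 'ALL'-style hash of hashes.
sub image_data_view {
    my ($all) = @_;
    my %flat = (%{ $all->{IFD0_DATA} || {} }, %{ $all->{SUBIFD_DATA} || {} });
    return \%flat;   # on a key clash, the SubIFD value wins here
}
```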
Last, setting $what to 'THUMBNAIL' returns a reference to a copy of the actual Exif thumbnail image (this is not included in the set returned by 'THUMB_DATA'); if there is no thumbnail, a reference to the empty string is returned (the undefined value cannot be used, because here it is assumed to correspond to an error condition). Note that the pointed scalar may be quite large (of the order of ten kilobytes). If the thumbnail is in JPEG format (this corresponds to the 'Compression' property, in IFD1, set to 6), you can create another JPEG picture object from it, like in the following example:
my $data_ref = $segment->get_Exif_data('THUMBNAIL'); my $thumb = new Image::MetaData::JPEG($data_ref); print $thumb->get_description();
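Before handing the thumbnail to the constructor, a caller may want to check that it really is a JPEG stream. The hypothetical helper below takes the flat hash returned for 'THUMB_DATA' and the scalar reference returned for 'THUMBNAIL', and checks both the 'Compression' value (6 means JPEG) and the leading Start Of Image bytes; it works on plain data, so it can be tried without a picture.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the thumbnail looks like a JPEG stream.
sub is_jpeg_thumbnail {
    my ($thumb_data, $thumb_ref) = @_;
    return 0 unless defined $thumb_ref && length($$thumb_ref);   # no thumbnail
    my $compression = $thumb_data->{Compression};
    return 0 unless $compression && $compression->[0] == 6;      # 6 = JPEG
    return substr($$thumb_ref, 0, 2) eq "\xFF\xD8";              # SOI marker
}
```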
If you are only interested in reading Exif data in a standard configuration, you can skip the segment-search calls and directly use JPEG::get_Exif_data (a method of the JPEG class, so you only need a JPEG structure object). This is an interface to the method with the same name in the Segment class, acting on the first Exif APP1 Segment (if no such segment is present, the undefined value is returned) and passing the arguments through. Note that most JPEG files with Exif data contain at most one Exif APP1 segment, so you are not going to lose anything here. A snippet of code for visualising Exif data looks like this:
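# fetch the data once: calling get_Exif_data inside "each" would return
# a fresh copy at every iteration and never terminate
my $exif = $image->get_Exif_data('ALL');
while (my ($d, $h) = each %$exif) {
    while (my ($t, $vals) = each %$h) {
        printf "%-25s\t%-25s\t-> ", $d, $t;
        for my $v (map { $_ } @$vals) {
            # show control and non-ASCII bytes as \xx hex codes
            $v =~ s/([\000-\037\177-\377])/sprintf '\\%02x', ord($1)/ge;
            # truncate long values (e.g., binary blobs)
            $v = substr($v, 0, 30) . ' ... ' if length($v) > 30;
            printf '%-5s', $v;
        }
        print "\n";
    }
}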
How to modify your Exif data
Similarly to the getter case, there is a set_Exif_data method callable from a picture object, which does nothing more than look for the first Exif APP1 segment (creating it, if there is none) and invoke the method with the same name in the Segment class, passing its arguments through. So, the remainder of this section will concentrate on the Segment method. The problem of setting a new thumbnail or erasing it is dealt with in the last paragraphs of this section. (The APP1 Exif structure is quite complicated, and the number of different possible cases when trying to modify it is very large; therefore, designing a clean and intuitive interface for this task is not trivial. Feel free to suggest improvements and cleaner interfaces.)
Exif records are usually characterised by a numeric key (a tag); this was already discussed in the "getter" section. Since these keys, for valid records, can be translated from numeric to textual form and back, the end user has the freedom to use whichever form better fits his needs. The two forms can even be mixed in the same "setter" call: the method will take care to translate textual tags to numeric tags when possible, and reject the others; then, it will proceed as if all tags were numeric from the very beginning. Records with unknown textual or numeric tags are always rejected.
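The key normalisation step can be sketched as follows. The name table is hypothetical and limited to two tags, and the return convention (accepted records keyed by numeric tag, rejected records as supplied) only mirrors the behaviour described in this paragraph.

```perl
use strict;
use warnings;

my %TAG_NUMBER = (DateTime => 0x0132, Model => 0x0110);   # illustrative subset
my %TAG_NAME   = reverse %TAG_NUMBER;

# Hypothetical helper: convert mixed textual/numeric keys to numeric ones;
# keys that are unknown in either form are returned as rejected.
sub normalise_keys {
    my ($data) = @_;
    my (%accepted, %rejected);
    while (my ($key, $value) = each %$data) {
        my $num = $key =~ /^\d+$/ ? $key : $TAG_NUMBER{$key};
        if (defined $num && exists $TAG_NAME{$num}) { $accepted{$num} = $value }
        else                                        { $rejected{$key} = $value }
    }
    return (\%accepted, \%rejected);
}
```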
The arguments to set_Exif_data are $data, $what and $action. The $data argument must be a hash reference to a flat hash, containing the key - record values pairs supplied by the user. The "value" part of each hash element can be an array reference (containing a list of values for the record, remember that some records are multi-valued) or a single scalar (this is internally converted to a reference to an array containing only the supplied scalar). If a record value is supposed to be a null terminated string, the user can supply a Perl scalar without the final null character (it will be inserted automatically).
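The normalisation of user-supplied values can be sketched like this. It is hypothetical: in particular, whether a record is a null-terminated string is decided here by a caller-supplied flag, while the module relies on its own knowledge of each tag's type.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a scalar into a one-element array reference and,
# for string records, append the terminating null character if missing.
sub normalise_value {
    my ($value, $is_ascii) = @_;
    my @values = ref($value) eq 'ARRAY' ? @$value : ($value);
    $values[0] .= "\0" if $is_ascii && $values[0] !~ /\0\z/;
    return \@values;
}
```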
The $what argument must be a scalar, and it selects the portion of the Exif APP1 segment concerned by the set_Exif_data call. So, obviously, the end user can modify only one section at a time; this is a simplification (for the developer of course) but also for the end user, because trying to set all Exif-like values in one go would require an offensively complicated data structure to specify the destination of each record (note that some records in different sections can have the same numerical tag, so a plain hash would not trivially work). Valid values for $what are (MakerNote data are not currently modifiable):
$what          modifies ...                             $data type
------------------------------------------------------------------------
IMAGE_DATA     as IFD0_DATA and SUBIFD_DATA             ref. to flat hash
THUMB_DATA     this is an alias for IFD1_DATA           ref. to flat hash
THUMBNAIL      the actual (un)compressed thumbnail      ref. to scalar/object
ROOT_DATA      header records (endianness)              ref. to flat hash
IFD0_DATA      primary image TIFF tags                  ref. to flat hash
SUBIFD_DATA    Exif private tags                        ref. to flat hash
GPS_DATA       GPS data of the primary image            ref. to flat hash
INTEROP_DATA   interoperability data in SubIFD          ref. to flat hash
IFD1_DATA      thumbnail-related TIFF tags              ref. to flat hash
The $action argument controls whether the setter adds ($action = 'ADD') records to a given data directory or replaces ($action = 'REPLACE') them. In the first case, each user-supplied record replaces the existing version of that record if present, and simply inserts the record if it was not already present; however, existing records with no counterpart in the user supplied $data hash remain untouched. In the second case, the record directory is cleared before inserting user data. Note that, since Exif and Exif-like records are non-repeatable in nature, there is no need of an 'UPDATE' action, like for IPTC (see the IPTC section).
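On plain hashes (numeric tag => value list), the two actions amount to the following hypothetical sketch; mandatory records and consistency checks are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical sketch of the two Exif setter actions on a record directory.
sub apply_exif_action {
    my ($dir, $data, $action) = @_;
    $action //= 'REPLACE';                        # documented default
    die "invalid action $action\n" unless $action =~ /^(ADD|REPLACE)\z/;
    %$dir = () if $action eq 'REPLACE';           # clear the directory first
    $dir->{$_} = $data->{$_} for keys %$data;     # insert or overwrite
    return $dir;
}
```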
The set_Exif_data routine first checks that the concerned segment is of the appropriate type (Exif APP1), that $data is a hash reference (a scalar reference for the thumbnail), and that $action and $what are valid. If $action is undefined, it defaults to 'REPLACE'. Then, an appropriate (sub)IFD is created, if absent, and all user-supplied records are checked for consistency (have a look at the appendixes for this). Last, records are set in increasing (numerical) tag order, and mandatory data are added, if not present. The return value of the setter routine is always a hash reference; in general it contains records rejected by the specialised routines. If an error occurs in a very early stage of the setter, this reference contains a single entry with key='ERROR' and value set to some meaningful error message. So, returning a reference to an empty hash means that everything was OK. An example, concerning the very popular task of changing the DateTime record, follows:
my $dt = '1994:07:23 12:14:51';
my $hash = $image->set_Exif_data({'DateTime' => $dt}, 'IMAGE_DATA', 'ADD');
print "DateTime record rejected\n" if %$hash;
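Since a rejected DateTime is usually due to a malformed string, it can help to check the value beforehand. The Exif standard stores date/time values as 'YYYY:MM:DD HH:MM:SS' (19 characters plus the terminating null, i.e., 20 bytes). The helper below is a hypothetical pre-check, not the module's own regular expression, which may be stricter or looser.

```perl
use strict;
use warnings;

# Hypothetical pre-check for Exif-style date/time strings.
sub looks_like_exif_datetime {
    my ($dt) = @_;
    my ($y, $mo, $d, $h, $mi, $s) =
        $dt =~ /^(\d{4}):(\d\d):(\d\d) (\d\d):(\d\d):(\d\d)\z/ or return 0;
    return $mo >= 1 && $mo <= 12 && $d >= 1 && $d <= 31
        && $h <= 23 && $mi <= 59 && $s <= 59;
}
```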
Depending on $what, some of the following notes apply:
my $image = new Image::MetaData::JPEG('original_image.jpg');
my $thumb = new Image::MetaData::JPEG('some_thumbnail.jpg');
# set a new thumbnail and save
$image->set_Exif_data($thumb, 'THUMBNAIL');
$image->save('modified_image.jpg');
# erase the thumbnail and save again
$image->set_Exif_data(\ '', 'THUMBNAIL');
$image->save('thumbless_image.jpg');
XMP (eXtensible Metadata Platform) is a technology, conceived by Adobe Systems, to tag graphic files with metadata, and to manage them during a lifetime made of multiple processing steps. Its serialisation (the actual way metadata are saved in the file) is based on RDF (Resource Description Framework) implemented as an application of XML. Its flexibility allows it to accommodate existing, future and private metadata schemas. In a JPEG file, XMP information is included alongside Exif and IPTC data, and is stored in an APP1 segment of its own, starting with the XMP namespace URI and followed by the actual XMP packet (see XMP APP1 segment structure for more details).
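Since both Exif and XMP live in APP1 segments, a reader has to look at the start of the payload to tell them apart. A hypothetical sketch using the identifier "Exif\0\0" and the XMP namespace URI "http://ns.adobe.com/xap/1.0/" followed by a null byte:

```perl
use strict;
use warnings;

my $EXIF_ID = "Exif\0\0";
my $XMP_ID  = "http://ns.adobe.com/xap/1.0/\0";

# Hypothetical helper: classify an APP1 payload by its leading identifier.
sub app1_kind {
    my ($payload) = @_;
    return 'Exif' if substr($payload, 0, length $EXIF_ID) eq $EXIF_ID;
    return 'XMP'  if substr($payload, 0, length $XMP_ID)  eq $XMP_ID;
    return 'unknown';
}
```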
XMP was introduced in 2001 as part of Adobe Acrobat version 5.01. Adobe has a trademark on XMP, and retains control over its specification. Source code for the XMP software-development kit was released by Adobe, but with a custom license, whose compatibility with the GNU General Public License, and whose open-source nature altogether, have been questioned.
Adobe's Photoshop program, a de-facto standard for image manipulation, has for a long time used the APP13 segment for storing non-graphical information, such as layers, paths, etc., including editorial information modelled on IPTC/NAA recommendations. This module provides a number of methods for managing Photoshop/IPTC data without dealing with the details of the low level representation (although sometimes this means taking some decisions for the end user ....). The structure of the IPTC data block(s) is managed in detail and separately from the rest, although this block is a sort of "sub-case" of Photoshop information. The interface is intentionally similar to that for Exif data.
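For orientation, a Photoshop APP13 payload commonly starts with the identifier "Photoshop 3.0" plus a null byte, followed by resource data blocks; each block has a four-byte type ('8BIM'), a 16-bit identifier (0x0404 being the IPTC/NAA block), a Pascal-style name padded to an even length, a 32-bit size and the data, also padded to an even length. The parser below is a hypothetical sketch of that layout, not this module's implementation.

```perl
use strict;
use warnings;

# Hypothetical parser for the resource blocks following "Photoshop 3.0\0".
sub parse_resource_blocks {
    my ($payload) = @_;
    my $id = "Photoshop 3.0\0";
    die "not a Photoshop APP13 payload\n"
        unless substr($payload, 0, length $id) eq $id;
    my ($pos, @blocks) = (length $id);
    while ($pos + 12 <= length($payload)) {   # 12 = smallest block size
        my ($type, $tag, $nlen) = unpack 'a4 n C', substr($payload, $pos, 7);
        my $name = substr $payload, $pos + 7, $nlen;
        $pos += 6 + (($nlen + 2) & ~1);        # length byte + name, even-padded
        my $size = unpack 'N', substr($payload, $pos, 4);
        my $data = substr $payload, $pos + 4, $size;
        $pos += 4 + (($size + 1) & ~1);        # data, even-padded
        push @blocks, { type => $type, tag => $tag, name => $name, data => $data };
    }
    return @blocks;
}
```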
All public methods have a $what argument selecting which part of the APP13 segment you are working with. The default is 'IPTC'. If $what is invalid, an exception is always raised. The kind of information you can access with different values of $what is explained in the following (have a look at the appendices about valid Photoshop-style and IPTC tags for further details):
$what:         Concerned pieces of information:
-----------    --------------------------------
'IPTC' or      Editorial information like caption, abstract, author,
'IPTC_2'       copyright notice, byline, shot site, user defined keywords,
               and many more; in practice, all that is covered by the IPTC
               Application Record 2. This is the most common option; the
               default value of $what, 'IPTC', is a synonym for 'IPTC_2'
               for backward compatibility (NOT a merge of 'IPTC_1/2').
'IPTC_1'       This refers to more obscure pieces of information, contained
               in the IPTC Envelope Record 1. One is rarely interested in
               this, except for the "Coded Character Set" tag, which is
               necessary to define a character set different from ASCII
               (i.e., when you don't write or read in English).
'PHOTOSHOP'    Alpha channels, colour information, transfer functions, and
or 'PS_8BIM'   many other details concerning the visual rendering of the
or 'PS_8BPS'   picture. These fields are most often only modified by an
or 'PS_PHUT'   image manipulation program, and not directly by the user.
               Recent versions of Photoshop (>= 4.0) use a resource data
               block type equal to '8BIM', and this is the default in this
               module (so, 'PHOTOSHOP' and 'PS_8BIM' are synonyms). However,
               some other older or undocumented resource data block types
               are also allowed.
my $num_IPTC = $file->retrieve_app13_segment(-1, 'IPTC');
my $ref_IPTC = $file->retrieve_app13_segment($num_IPTC - 1, 'IPTC');
my $ref_Photoshop = $file->provide_app13_segment('PHOTOSHOP');
$file->remove_app13_info(3, 'PHOTOSHOP'); $file->remove_app13_info(-1, 'IPTC'); $file->remove_app13_info(0, 'IPTC_1');
How to inspect and modify your IPTC data
Once you have a Segment reference pointing to your favourite IPTC-enabled APP13 Segment, you may want to have a look at the records it contains. Use the get_app13_data method for this: its behaviour is controlled by the $type and $what arguments (here, $what is 'IPTC_1' or 'IPTC_2' alias 'IPTC', of course). It returns a reference to a hash containing a copy of the list of the appropriate IPTC records, if present, undef otherwise: each element of the hash is a pair (key, arrayref), where arrayref points to an array with the real values (some IPTC records are repeatable so multiple values are possible). The record keys can be the native numeric keys ($type eq 'NUMERIC') or translated textual keys ($type eq 'TEXTUAL', default); in any case, the record values are untranslated. If a numeric key stored in the JPEG file is unknown, and a textual translation is requested, the name of the key becomes "Unknown_tag_$tag". Note that there is no check on the validity of IPTC records' values: their format is not checked and one or multiple values can be attached to a single tag independently of its repeatability. This is, in some sense, consistent with the fact that also "unknown" tags are included in the output. If $type or $what is invalid, an exception is thrown. An example of how to extract and display IPTC data is given here:
my $hash_ref = $segment->get_app13_data('TEXTUAL', 'IPTC');
while (my ($key, $vals) = each %$hash_ref) {
    printf "# %20s =", $key;
    print " '$_'" for @$vals;
    print "\n";
}

### This could print:
# DateCreated = '19890207'
# ByLine = 'Interesting picture' 'really'
# Category = 'POL'
# Keywords = 'key-1' 'key-2' 'key-99'
# OriginatingProgram = 'Mapivi'
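Inside the IPTC/NAA resource block, each dataset is stored in the IPTC Information Interchange Model (IIM) format: a 0x1C tag marker, the record number (1 for the Envelope, 2 for the Application record), the dataset number and a 16-bit big-endian length, followed by the value (e.g., 2:25 is Keywords and 2:05 is ObjectName). Repeatable datasets simply appear several times, which is why values come as lists. The decoder below is a hypothetical sketch: it does not handle the extended-length form (length with the high bit set) and uses numeric "record:dataset" keys.

```perl
use strict;
use warnings;

# Hypothetical IIM decoder: returns { "record:dataset" => [ values ] }.
sub decode_iim {
    my ($bytes) = @_;
    my %datasets;
    my $pos = 0;
    while ($pos + 5 <= length($bytes)) {
        my ($marker, $record, $dataset, $len) =
            unpack 'C C C n', substr($bytes, $pos, 5);
        last unless $marker == 0x1C;                 # not a dataset: stop
        die "extended-length dataset not handled\n" if $len & 0x8000;
        push @{ $datasets{"$record:$dataset"} }, substr($bytes, $pos + 5, $len);
        $pos += 5 + $len;
    }
    return \%datasets;
}
```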
- ADD : new records are added and nothing is deleted; however, if you try to add a non-repeatable record which is already present, the newly supplied value ejects (replaces) the pre-existing value.
- UPDATE : new records replace those characterised by the same tags, but the others are preserved. This makes it possible to modify some repeatable IPTC records without deleting the other tags.
- REPLACE : all records present in the IPTC subdirectory are deleted before inserting the new ones (this is the default action).
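The three actions can be sketched on a plain hash of value lists. In this hypothetical sketch the set of repeatable tags is passed in by the caller (in the module it comes from the IPTC tag tables), and mandatory datasets are not added.

```perl
use strict;
use warnings;

# Hypothetical sketch of ADD / UPDATE / REPLACE on { tag => [ values ] }.
sub apply_iptc_action {
    my ($dir, $data, $action, $repeatable) = @_;
    $action //= 'REPLACE';
    die "invalid action $action\n" unless $action =~ /^(ADD|UPDATE|REPLACE)\z/;
    %$dir = () if $action eq 'REPLACE';
    for my $tag (keys %$data) {
        my @new = @{ $data->{$tag} };
        if ($action eq 'ADD' && $repeatable->{$tag} && $dir->{$tag}) {
            push @{ $dir->{$tag} }, @new;   # repeatable: append
        } else {
            $dir->{$tag} = [ @new ];        # UPDATE/REPLACE, or non-repeatable ADD
        }
    }
    return $dir;
}
```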
If, after implementing the changes required by $action, any mandatory dataset (according to the IPTC standard), is still undefined, it is added automatically. This often concerns version datasets, with numeric index 0.
The return value is a reference to a hash containing the rejected key-values entries. The entries of %$data are not modified. An entry in the %$data hash can be rejected for various reasons (you might want to have a look at the appendix about valid IPTC tags for further information): a) the tag is undefined or not known; b) the entry value is undefined or points to an empty array; c) the non-repeatability constraint is violated; d) the tag is marked as invalid; e) a value is undefined; f) the length of a value is invalid; g) a value does not match its mandatory regular expression.
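A condensed, hypothetical version of some of these checks (b, c, e, f and g) for a single entry follows. The tag description is supplied by the caller instead of coming from the module's tables, and its field names 'repeatable', 'maxlen' and 'regex' are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical validator: returns a reason string, or undef if acceptable.
sub reject_reason {
    my ($spec, $values) = @_;
    return 'no values' unless ref($values) eq 'ARRAY' && @$values;        # b
    return 'not repeatable' if @$values > 1 && !$spec->{repeatable};      # c
    for my $v (@$values) {
        return 'undefined value' unless defined $v;                       # e
        return 'bad length' if length($v) > $spec->{maxlen};              # f
        return 'bad format' if $spec->{regex} && $v !~ $spec->{regex};    # g
    }
    return;
}
```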
$segment->set_app13_data($additional_data, 'ADD', 'IPTC');
A snippet of code for changing IPTC data looks like this:
my $segment = $file->provide_app13_segment('IPTC');
my $hashref_1 = { CodedCharacterSet => "\033\045G" }; # UTF-8
my $hashref_2 = {
    ObjectName           => 'prova',
    ByLine               => 'ciao',
    Keywords             => [ 'donald', 'duck' ],
    SupplementalCategory => [ 'arte', 'scienza', 'diporto' ],
};
$segment->set_app13_data($hashref_2, 'REPLACE', 'IPTC');
$segment->provide_app13_subdir('IPTC_1');
$segment->set_app13_data($hashref_1, 'ADD', 'IPTC_1');
my $hashref = $file->get_app13_data('TEXTUAL', 'IPTC'); while (my ($tag, $val_arrayref) = each %$hashref) { printf '%25s --> ', $tag; print "$_ " for @$val_arrayref; print "\n"; }
$file->set_app13_data($hashref, 'UPDATE', 'IPTC');
How to inspect and modify your Photoshop data
The procedure of inspecting and modifying Photoshop data (i.e., non-IPTC data in a Photoshop-style APP13 segment) is analogous to that for IPTC data, but with $what set to 'PHOTOSHOP' (alias 'PS_8BIM'), or to the seldom used 'PS_8BPS' and 'PS_PHUT'. The whole description will not be repeated here, have a look at the IPTC section for it: this section takes only care to point out differences. If you are not acquainted with the structure of an APP13 segment and its terminology (e.g., "resource data block"), have a look at the Photoshop-style tags' section.
About get_app13_data, it should only be pointed out that resource block names are appended to the list of values for each tag (even if they are undefined), so the list length is always even. Things are more complicated for set_app13_data: non-IPTC Photoshop specifications are less uniform than IPTC ones, and checking the correctness of user supplied data would be an enumerative task. Currently, this module does not perform any syntax check on non-IPTC data, but this could change in the future (any contribution is welcome); only tags (or, as they are called in this case, "resource block identifiers") are checked for being in the allowed tags list (see the Photoshop-style tags' table for details). The IPTC/NAA tag is of course rejected: IPTC data must be inserted with $what set to 'IPTC' or its siblings.
Although not explicitly stated, it seems that non-IPTC Photoshop tags are non-repeatable (let me know if not so), so two resource blocks with the same tag shouldn't exist. For this reason, the 'UPDATE' action is changed internally to 'ADD'. Moreover, since the resource block structure is not explored, all resource blocks are treated as single-valued and the value type is $UNDEF. So, in the user-supplied data hash, if a tag key returns a data array reference, only the first element (which cannot be undefined) of the array is used as resource block value: if a second element is present, it is used as resource block name (which is otherwise set to the null string). Supplying more than two elements is an error and causes the record to be rejected.
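The rule for user-supplied Photoshop values can be sketched as a small hypothetical normalisation function, returning a [value, name] pair, or undef when the entry must be rejected:

```perl
use strict;
use warnings;

# Hypothetical helper: scalar -> [value, ''], [value] -> [value, ''],
# [value, name] -> [value, name]; anything else is rejected (undef).
sub normalise_ps_value {
    my ($v) = @_;
    my @list = ref($v) eq 'ARRAY' ? @$v : ($v);
    return if @list == 0 || @list > 2 || !defined $list[0];
    return [ $list[0], $list[1] // '' ];
}
```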
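    # $file is an Image::MetaData::JPEG object (created as in the SYNOPSIS)
    my $segment = $file->provide_app13_segment('PHOTOSHOP');
    my $hashref = {
        GlobalAngle    => pack('N', 0x1e),  # 4-byte big-endian integer (30)
        GlobalAltitude => pack('N', 0x1e),  # 4-byte big-endian integer (30)
        CopyrightFlag  => "\001",           # a single byte
        # array reference: [ value, resource block name ]
        IDsBaseValue   => [ pack('N', 1), 'Layer ID Generator Base' ],
    };
    $segment->set_app13_data($hashref, 'ADD', 'PHOTOSHOP');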
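The value rules for set_app13_data described above can be restated as a small Perl sketch. The helper name normalize_ps_record is hypothetical and is not part of this module's interface; it only mimics the documented behaviour, assuming that a plain scalar is a value with an empty name, that an array reference holds a mandatory defined value plus an optional name, and that anything longer is rejected. It does not reproduce the module's internal code.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Image::MetaData::JPEG): turn one
# user-supplied value for a non-IPTC Photoshop tag into a (value, name)
# pair, following the rules described above. Returns an empty list if
# the record must be rejected.
sub normalize_ps_record {
    my ($data) = @_;
    return unless defined $data;
    # assumption: a plain scalar is a value whose block name is the null string
    return ($data, '') unless ref $data;
    return unless ref $data eq 'ARRAY';
    my @items = @$data;
    # zero or more than two elements, or an undefined value, is an error
    return if @items < 1 || @items > 2 || !defined $items[0];
    # the optional second element is the resource block name
    my $name = defined $items[1] ? $items[1] : '';
    return ($items[0], $name);
}
```

With the hash of the previous example, "\001" becomes ("\001", '') and the IDsBaseValue array becomes the packed value plus the name 'Layer ID Generator Base'.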
There are currently eight fields meant to store a date in a JPEG picture, namely 'DateTime', 'DateTimeOriginal' and 'DateTimeDigitized' (in IFD0/1 or SubIFD), 'GPSDateStamp' (in the GPS section), and 'ReleaseDate', 'ExpirationDate', 'DateCreated' and 'DigitalCreationDate' (in the IPTC section). Most of these dates refer to some electronic treatment of images, a kind of process which was not available before the late twentieth century. Two of them are release and expiration dates in the IPTC standard, and should therefore not be set to a date before the introduction of the standard itself. However, some users want to use some of these fields in a non-conventional way, to refer to dates when analog but not digital photography was available. For this reason, all these tags but one can be written with a year starting from 1800 (and not from 1900 as in earlier releases). Users are, however, advised to check the "specifications" for these tags before setting the date, and to take responsibility for their non-conventional use.
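As a hypothetical illustration of the year limit described above, the following sketch checks an Exif-style date string (the 'YYYY:MM:DD HH:MM:SS' layout used for DateTime in the SYNOPSIS) and refuses years before 1800. The function name and the exact pattern are assumptions; they are not the regular expressions used internally by the module, and the day check is deliberately coarse (it does not know month lengths).

```perl
use strict;
use warnings;

# Hypothetical check (NOT the module's own regular expression): accept an
# Exif-style 'YYYY:MM:DD HH:MM:SS' string whose year is 1800 or later.
sub is_valid_exif_datetime {
    my ($string) = @_;
    return 0 unless defined $string;
    my ($year, $month, $day, $hour, $min, $sec) =
        $string =~ /^(\d{4}):(\d{2}):(\d{2}) (\d{2}):(\d{2}):(\d{2})\z/
        or return 0;                       # wrong layout
    return 0 if $year < 1800;              # lower bound discussed above
    return 0 if $month < 1 || $month > 12 || $day < 1 || $day > 31;
    return 0 if $hour > 23 || $min > 59 || $sec > 59;
    return 1;
}
```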
There is one notable exception to the previous considerations: the IPTC 'DateCreated' dataset, which should explicitly refer to the creation date of the object represented in the picture, a date that can be many centuries in the past. For this dataset a special regular expression is provided which allows a date in the full ISO-8601 YYYY-MM-DD format (note, however, that even ISO-8601 does not allow a date before 0 AD, so not all masterworks from ancient Greece can be tagged in this way ... let me know if I am wrong). I am, of course, still open to suggestions and reconsiderations on this subject.
A widespread problem with Exif maker notes is that there is no common standard for parsing and rewriting the information in the MakerNote data area. This is why most programs dealing with Exif JPEG files corrupt the MakerNote on saving, or drop it altogether (be aware that some programs have been known to hang when trying to read a corrupt maker note).
In fact, many maker notes contain a non-standard IFD structure, with some tags storing file offsets (see the documentation page describing the IFD structure). Therefore, saving a maker note without regard for internal offsets' adjustment reduces the note mostly to garbage. Re-dumping a maker note after changing the Exif APP1 segment endianness incurs the same problem, because no internal byte-swap is performed.
A few countermeasures have been introduced in this package to try to cure some maker note problems. The first one concerns the byte order (the endianness, which is not always the same as that of the Exif segment), which need not be known in advance: it is determined from the fact that, if the note is IFD-like (even if non-standard), the number of tags is always in the range [1,255], so the two-byte tag count always has its most significant byte set to zero and its least significant byte set to a non-zero value.
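The byte-order test can be sketched as follows. The function name is hypothetical and this is not the module's code; it returns the conventional TIFF byte-order markers 'MM' (big-endian) and 'II' (little-endian), and gives up when the first two bytes do not discriminate between the two orders.

```perl
use strict;
use warnings;

# Hypothetical sketch: guess the byte order of an IFD-like maker note from
# its first two bytes (the tag count, assumed to lie in [1,255]). In the
# right byte order the most significant byte is zero and the other is not.
# Returns 'MM' (big-endian), 'II' (little-endian) or undef if undecidable.
sub guess_makernote_endianness {
    my ($note) = @_;
    return unless defined $note && length($note) >= 2;
    my ($first, $second) = unpack('C2', $note);
    return 'MM' if $first == 0 && $second != 0;   # e.g. "\x00\x0c": 12 tags
    return 'II' if $first != 0 && $second == 0;   # e.g. "\x0c\x00": 12 tags
    return;                                       # both zero or both non-zero
}
```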
There is also a prediction and correction mechanism for the offsets in the interoperability arrays, based on the simple assumption that the absolute values of the offsets can be wrong, but their differences are always right; so it is enough to get the first one right, and a good bet for it is the address of the byte immediately following the next_IFD link (or the tag list, if this link is absent). If the parsing process does not end successfully, this mechanism is enabled, and its "corrected" findings are stored instead of the original ones if it is able to cure the problems (i.e., if the second attempt at parsing the note succeeds).
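A minimal sketch of the offset shift, assuming a standard IFD layout (a 2-byte tag count, 12 bytes per entry and an optional 4-byte next_IFD link) and taking the smallest stored offset as "the first one". The function name and argument list are hypothetical; the second parsing attempt, which decides whether the corrected values are kept, is not shown.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical sketch of the correction idea: keep the differences between
# the stored offsets, but shift all of them so that the smallest one points
# to the byte right after the IFD (count + entries + optional next_IFD link).
sub correct_offsets {
    my ($ifd_start, $num_tags, $has_next_link, @offsets) = @_;
    return () unless @offsets;
    my $expected = $ifd_start + 2 + 12 * $num_tags + ($has_next_link ? 4 : 0);
    my $shift    = $expected - min(@offsets);
    return map { $_ + $shift } @offsets;
}
```

For instance, an IFD starting at 8 with 3 tags and a next_IFD link ends at 8 + 2 + 36 + 4 = 50, so stored offsets 1000, 1016 and 1040 are moved to 50, 66 and 90.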
Many other routines for modifying other meta-data could be added in the future. The following is a list of the current status of the various meta-data segments (APP and COM segments only).
    Segment  Possible content             Status
  * COM      User comments                parse/read/write
  * APP0     JFIF data (+ thumbnail)      parse/read
  * APP1     Exif or XMP data             parse/read[Exif]/write[Exif]
  * APP1     Maker notes                  parse/read
  * APP2     FPXR data or ICC profiles    parse
  * APP3     additional Exif-like data    parse
  * APP4     HPSC                         nothing
  * APP12    PreExif ASCII meta           parse
  * APP13    IPTC and PhotoShop data      parse/read/write
  * APP14    Adobe tags                   parse
USE WITH CAUTION! THIS IS EXPERIMENTAL SOFTWARE!
This module is still experimental and not yet finished. In particular, it is far from being well tested, and some interfaces could change depending on user feedback. The ability to modify maker notes is not yet implemented (see also the MakerNote appendix for a general note on the problem of MakerNote corruption). APP13 data spanning multiple segments are not correctly read/written. Most APP12 segments do not fit the structure parsed by parse_app12(); there is probably some standard I don't know.
Other packages are available in the free software arena, with a feature set showing a large overlap with that found in this package; a probably incomplete list follows. However, none of them is (or was) completely satisfactory with respect to the package's objectives, which are: being a single package dealing with all types of meta-information in read/write mode in a JPEG (and possibly TIFF) file; depending on the least possible number of non-standard packages and/or external programs or libraries; being open-source and written in Perl. Of course, most of these objectives are still far from being reached ...
Stefano Bettelli, bettelli@cpan.org
Copyright (C) 2004,2005,2006 by Stefano Bettelli
This library is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License. See the COPYING and LICENSE files for the license terms.
Have a look at the technical appendixes of the Image::MetaData::JPEG module [M in the following], packaged as separate documents: they contain a description of segment structures [M::Structures], and lists of valid tags [M::TagLists], including a tentative description of some MakerNote formats [M::MakerNotes]. See also your current perl(1) documentation, an explanation for the General Public License and the manual pages of the following optional Perl modules: Image::ExifTool(3pm), Image::IPTCInfo(3pm), JPEG::JFIF(3pm), Image::EXIF(3pm) and Image::Info(3pm).
2022-03-07 | perl v5.34.0 |