Biblio::COUNTER - COUNTER Codes of Practice report processing
# --- Process a report
# (1) Using Biblio::COUNTER
$report = Biblio::COUNTER->report(\*STDIN)->process;
$report = Biblio::COUNTER->report($file)->process;
# (2) Using Biblio::COUNTER::Processor or a subclass
$processor = Biblio::COUNTER::Processor::Default->new;\
$report = $processor->run(\*STDIN);
$report = $processor->run($file);
$report = $processor->run(Biblio::COUNTER->new(\*STDIN));
# --- Access information in a processed report
warn "Invalid report" unless $report->is_valid;
# Access data in the report
$name = $report->name;
# e.g., "Database Report 2 (R2)"
$descrip = $report->description;
# e.g., "Turnaways by Month and Database"
$date_run = $report->date_run;
# e.g., "2008-04-11"
$criteria = $report->criteria;
$publisher = $report->publisher;
$platform = $report->platform;
$periods = $report->periods;
# e.g., [ "2008-01", "2008-02", "2008-03" ]
foreach $rec ($report->records) {
$title = $rec->{title};
$publisher = $rec->{publisher};
$platform = $rec->{platform};
$count = $rec->{count};
foreach $period (@periods) {
$period_count = $count->{$period};
while (($metric, $n) = each %$period_count) {
# e.g., ("turnaways", 3)
}
}
}
Because the COUNTER Codes of Practice are so poorly written and
documented, with incomplete specifications and inconsistent terminology, it
has been necessary to make certain assumptions and normalizations in the
code and documentation of this module.
First, all reports must be in plain text, tab- or comma-delimited
format; Excel spreadsheets are not allowed. (To convert an Excel spreadsheet
to tab-delimited text, consider using Spreadsheet::ParseExcel::Simple.
(XML formats may be handled in a future version of this
module.)
Some terminology notes are in order:
- name
- The name of a report fully denotes the report's type and the
version of the COUNTER Codes of Practice that defines it. For example,
"Journal Report 1 (R2)". COUNTER
sometimes refers to this as the report title.
- description
- This is the phrase, also defined by the COUNTER Codes of Practice, that
describes the contents of a report. For example, Journal Report 1
is described as "Number of Successful Full-Text
Article Requests by Month and
Journal".
- code
- This is the term I use for the short name that identifies the type, but
not the version, of a COUNTER report. For example,
"JR1" is the code for Journal Report
1 reports.
- metric
- A metric is a particular measure of usage (or non-usage), including the
number of searches or sessions in a database or the number of full-text
articles in a journal downloaded successfully.
- report($how,
[%args])
- Create a new report instance.
Set $how to a
glob ref (e.g., "\*STDIN") to specify
a filehandle from which an existing report will be read.
Specify a report name in
$how (e.g.,
"Database Report 2 (R2)") to
instantiate a new, empty report. (Report generation is not yet
implemented.)
%args may contain
any of the following:
- treat_blank_counts_as_zero
- If set to a true value, a blank cell where a count was expected is treated
as if it contained a zero; otherwise, blank counts are silently ignored
(the default).
- change_not_available_to_blank
- If set to a true value (the default), the value
"n/a") in a count field will be changed
to the empty string. It will never be treated as if it were zero,
regardless of the treat_blank_counts_as_zero setting.
- callback
- Get or set a reference to a hash of
($event,
$code) pairs specifying what
to do for each of the events described under CALLBACKS.
Each $code must
be a coderef, not the name of a function or method.
If an event is not specified in this hash, then the default
action for the event will be taken.
- name
- Get or set the report's name: this is the official name defined by the
COUNTER codes of practice. See "REPORTS SUPPORTED" for a
complete list of the reports supported by this version of
Biblio::COUNTER.
- code
- Get or set the report's code: this is the short string (e.g.,
"JR1") that identifies the type, but
not the version, of a COUNTER report.
- description
- Get or set the report's description: this is the official description
defined by the COUNTER codes of practice. For example, the
"Journal Report 1
(R2)" report has the description "Number
of Successful Full-Text Article Requests by Month
and Journal".
- date_run
- Get or set the date on which the report was run. The date, if valid, is in
the ISO8601 standard form
"YYYY-MM-DD".
- criteria
- Get or set the report criteria. This is a free text field.
- periods
- Get or set the periods for which the report contains counts. To simplify
things, periods are returned (and must be set) in the ISO 8601 standard
form "YYYY-MM".
- publisher
- Get or set the publisher common to all of the resources in the
report.
- platform
- Get or set the platform common to all of the resources in the report.
While processing a report, a number of different events
occur. For example, a fixed event occurs when a field whose value is
invalid is corrected. For event different kind of event, a callback
may be specified that is triggered each time the event occurs; see the
report method for how to specify a callback.
Callbacks must be coderefs, not function or method names.
For example, the following callbacks may be used to provide an
indication of the progress in processing it:
$record_number = 0;
%callbacks = (
'begin_report' => sub {
print STDERR "Beginning report: ";
},
'end_header' => sub {
my ($report, $header) = @_;
print STDERR $report->name, "\n";
}
'end_record' => sub {
my ($report, $record) = @_;
++$record_number;
print STDERR "$record_number "
if $record_number % 20 == 0;
print STDERR "\n"
if $record_number % 200 == 0;
},
'end_report' => sub {
my ($report) = @_;
if ($report->is_valid) {
print STDERR "OK\n";
}
else {
print STDERR "INVALID\n";
}
},
);
By default, the only callback defined is for output; it
prints each line of input (corrected, if there were correctable problems) to
standard output. (Spurious blank lines are not printed.)
Events fall into four broad categories: structure,
validation, tracing, and data.
Logically, a COUNTER report has the following structure:
report
header
body
record
record
...
- begin_file($report,
$file)
- Parsing of the given file is beginning. This is always the first event
triggered. At the time this callback is invoked, the report has not yet
been identified.
- end_file($report,
$file)
- Parsing of the given file has ended. This is always the last event
triggered.
- begin_report($report)
- Processing of the report is beginning. At the time this callback is
invoked, the report has not yet been identified.
- end_report($report)
- Processing of the report has ended. This is always the last event
triggered.
- Processing of the report's header is beginning. The header is everything
before the first data row.
$header is a
reference to an empty hash; the callback code may, if it wishes, put
something into this hash.
- Processing of the report's header is complete.
$header is a
reference to the same hash referenced in the begin_header
callback, but which now contains one or more of the elements listed
below. (These elements are described under METHODS above):
- date_run
- The date on which the report was run, in the ISO8601 standard form
"YYYY-MM-DD".
- criteria
- The report criteria.
- description
- The report's description (e.g., "Turnaways by Month
and Database").
- periods
- The periods for which the report contains counts, in the ISO 8601 standard
form "YYYY-MM".
- publisher
- The publisher common to all of the resources in the report.
- platform
- The platform common to all of the resources in the report.
- begin_body($report)
- Processing of the report's body is beginning. The body is the part of the
report that contains data rows.
- end_body($report)
- Processing of the report's body is complete.
- begin_record($report,
$record)
- Processing of a new record is beginning. (In some COUNTER reports, a
record occupies more than one row.)
$record is a
reference to a hash, which is empty at the time the event is
triggered.
- end_record($report,
$record)
- $record is a reference to a
hash that contains the data found in the record (title, publisher, counts,
etc.). Fields that are invalid and uncorrectable will not be
represented in the hash -- e.g., if a title is blank then there will be no
title element in the hash.
Each of these events is triggered when a cell (or, in the case of
skip_blank_row, a row) is validated.
The cell's row and column (e.g.,
"D7") may be retrieved by calling
"$report->current_position".
Note that a single cell may trigger more than one validation event
-- e.g., a cell may be trimmed and then deleted -- and there is no guarantee
that these events will occur in any particular order.
- ok($report,
$field_name,
$value)
- A cell's value is valid.
- trimmed($report,
$field_name,
$value)
- Whitespace has been trimmed from the beginning and/or end of a cell.
- fixed($report,
$field_name,
$old_value,
$new_value)
- A cell's value was invalid but has been corrected.
- cant_fix($report,
$field_name,
$value,
$expected)
- A cell's value is invalid and cannot be corrected. The expected value may
be an exact string (e.g., "EBSCOhost")
or merely a general hint (e.g.,
"<issn>").
- deleted($report,
$value)
- A spurious cell has been deleted. (A this time, this only occurs for blank
cells at the end of a row.)
- skip_blank_row($report)
- A blank row that doesn't belong here has been skipped.
- input($line)
- A line of input has been read.
- line($line_number)
- The report processor has moved to the next line of input.
- output($line)
- A new line of output is ready. The default is to print the line to
standard output. Both valid and invalid lines, including invalid lines
that could not be corrected as well as those that could be corrected,
trigger an output event. Blank lines that have been skipped do not.
- count($report,
$scope,
$metric,
$period,
$value)
- A valid count has been identified within the report.
$scope is either
"report" (for summary counts that
appear at the top of the report) or
"record" (for counts that occur within
the body of the report).
$metric is the
type of event being counted, and is always one of the following:
- "requests"
- "searches"
- "sessions"
- "turnaways"
$period is a year and
month, in the ISO8601 form "YYYY-MM".
$value is the number
of requests (or searches, or whatever).
count events are not triggered for blank counts
unless the treat_blank_counts_as_zero option was set to a true value
when the report was instantiated.
- count_ytd($report,
$scope,
$metric,
$value) =item
count_ytd_html($report,
$scope,
$metric,
$value) =item
count_ytd_pdf($report,
$scope,
$metric,
$value)
- A valid YTD count has been identified.
$scope is either
"report" (for summary counts that
appear at the top of the report) or
"record" (for counts that occur within
the body of the report).
Biblio::COUNTER implements processing of text-format (comma- or
tab-delimited) COUNTER reports only. XML formats are not supported at this
time.
The following is a list of COUNTER reports, with full name and
description, that are supported by this version of Biblio::COUNTER:
- "Journal Report 1 (R2)"
- Number of Successful Full-Text Article Requests by Month and Journal
- "Journal Report 1a (R2)"
- Number of Successful Full-Text Article Requests from an Archive by Month
and Journal
- "Journal Report 2 (R2)"
- Turnaways by Month and Journal
- "Database Report 1 (R2)"
- Total Searches and Sessions by Month and Database
- "Database Report 2 (R2)"
- Turnaways by Month and Database
- "Database Report 3 (R2)"
- Total Searches and Sessions by Month and Service
Other reports, including Release 3 reports, will be supported in
the future.
<http://www.projectcounter.org/>
Paul Hoffman (nkuitse AT cpan DOT org).
Copyright 2008 Paul M. Hoffman.
This is free software, and is made available under the same terms
as Perl itself.