News::Scan - gather and report Usenet newsgroup statistics
use News::Scan;
my $scan = News::Scan->new;
This module provides a class whose objects can be used to gather
and report Usenet newsgroup statistics.
- new ( [ OPTIONS ] )
- "OPTIONS" is a list of named parameters
(i.e. given in key-value pairs). Valid options are
- Group
- The value of this option is the name of the newsgroup you wish to
scan.
- From
- The value of this option should be either 'spool'
or 'NNTP' (case is not significant). Any other
value will produce an error (see the
"error" method description below). A
value of 'spool' indicates that you would like to
scan articles in a spool (see the Spool option below). A value of
'NNTP' indicates that articles should be retrieved
from your NNTP server (see the NNTPServer option below).
- Spool
- The value of this option should be the path to the spool directory that
contains the articles you would like to scan. This option is ignored
unless the value of From is 'spool'.
- NNTPServer
- The value of this option (in the form server:port, with both
being optional--see Net::NNTP for the semantics of omitting one or both of
these parameters) indicates the NNTP server from which to retrieve
articles. This option is ignored unless From is
'NNTP'. See the description of the
NNTPAuthLogin and NNTPAuthPasswd options below.
- NNTPAuthLogin
- The value of this option should be a valid NNTP authentication login for
your NNTP server. This option is only necessary if your NNTP server
requires authentication.
- NNTPAuthPasswd
- The value of this option should be the password corresponding to the login
in NNTPAuthLogin. Having this hardcoded in a script is evil, and
there should be a much better way.
- Period
- The value of this option is the length (in days) of the period immediately
preceding the invocation of the program; only articles posted within that
period are scanned. The default period is seven (7) days.
- QuoteRE
- The value of this option is a Perl regular expression that accepts quoted
lines and rejects unquoted or original lines. The default regular
expression is
"^\s{0,3}(?:>|:|\S+>|\+\+)".
- Exclude
- The value of this option should be a reference to an array containing
regular expressions that accept email addresses of posters whose articles
you wish to ignore.
- Aliases
- The value of this option should be a reference to a hash whose keys are
email addresses that should be transformed into the email addresses that
are their corresponding values, i.e. "alias => 'real@address'".
- configure ( [ OPTIONS ] )
- "OPTIONS" is a list of named parameters
identical to those accepted by "new".
Re-"configure"-ing an object after
scanning is probably a bad idea. This method returns
"undef" if it encounters an error.
The following methods are the actual underlying methods used to
set and retrieve the configuration options of the same name (modulo
case):
- name ( [ NEWSGROUP-NAME ] )
- spool ( [ SPOOL-DIRECTORY ] )
- period ( [ INTERVAL-LENGTH ] )
- aliases ( [ ALIASES-HASHREF ] )
- from ( 'NNTP' | 'spool' )
- quote_re ( [ QUOTE-REGEX-ARRAYREF ] )
- exclude ( [ EXCLUSION-REGEX-ARRAYREF ] )
- nntp_server ( [ [ NNTP-SERVER ]:[ NNTP-PORT ] ] )
- nntp_auth_login ( [ LOGIN ] )
- nntp_auth_passwd ( [ PASSWORD ] )
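As a rough illustration of how the options and accessors above fit together, here is a minimal sketch. The option names come from this page, but every value (group name, spool path, addresses) is a made-up placeholder. The sketch also assumes that an accessor called without arguments returns the current setting, and that "new" may or may not return an object when configuration fails, so it checks both cases.

```perl
use strict;
use warnings;

# Option names are taken from the documentation above. Every value is a
# placeholder: substitute your own group, spool path, and addresses.
my %options = (
    Group   => 'comp.lang.perl.misc',
    From    => 'spool',
    Spool   => '/var/spool/news/comp/lang/perl/misc',
    Period  => 7,
    Exclude => [ '^postmaster@', 'autoreply' ],
    Aliases => { 'old@example.com' => 'current@example.com' },
);

# Load News::Scan only if it is installed, so the sketch runs either way.
if ( eval { require News::Scan; 1 } ) {
    my $scan = News::Scan->new(%options);

    if ( !$scan ) {
        warn "News::Scan->new did not return an object\n";
    }
    elsif ( my $problem = $scan->error ) {
        # error() is documented to return 0 after a successful call.
        warn "configuration problem: $problem\n";
    }
    else {
        # Assumption: accessors set a value when given one and return
        # the current value when called without arguments.
        $scan->period(14);
        printf "Scanning %s over the last %d days\n",
            scalar $scan->name, scalar $scan->period;
    }
}
else {
    print "News::Scan is not installed; options prepared only.\n";
}
```

Checking "error" after construction, rather than relying on the return value of "new", follows the error-reporting convention described for the "error" method.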
These methods can be used to retrieve information from the
"News::Scan" object or ask it to perform
some action.
- error ( [ MESSAGE ] )
- Use this method to determine whether an object has encountered an error
condition. The return value of "error"
is guaranteed to be 0 after any method completes
successfully (except "error"). (Keep in
mind that this will also overwrite any previous error message.) If there
has been an error, this method should return some useful message.
If provided, "MESSAGE" sets
the object's error message.
- articles
- Returns the number of articles accounted for.
- volume
- Returns the volume of traffic (in bytes) to the newsgroup in the
period.
- header_volume
- Returns the volume (in bytes) generated by headers.
- header_lines
- Returns the number of lines consumed by headers.
- body_volume
- Returns the volume (in bytes) generated by message bodies.
- body_lines
- Returns the number of lines consumed by message bodies.
- orig_volume
- Returns the volume (in bytes) of text which has been determined to be
original (see QuoteRE). Note that original traffic is a subset of
body traffic.
- orig_lines
- Returns the number of lines that are determined to be original.
- signatures
- Returns the number of messages that had a cutline (/^-- $/).
- sig_volume
- Returns the volume (in bytes) generated by signatures.
- sig_lines
- Returns the number of lines consumed by signatures.
- earliest ( [ TIME ] )
- Use this method to determine the date (in seconds since the Epoch) that
the oldest article found within the period was posted to Usenet.
If "TIME" is given, it is
treated as a candidate for the earliest article. If
"TIME" is successful (i.e. is less
than the previous earliest), this method returns
1, else 0.
- latest ( [ TIME ] )
- Use this method to determine the date (in seconds since the Epoch) that
the youngest article found within the period was posted to Usenet.
If "TIME" is given, it is
treated as a candidate for the latest article. If
"TIME" is successful (i.e. is greater
than the previous latest), this method returns
1, else 0.
- excludes
- Returns the list of regular expressions used to determine whether an
article from a given email address should be ignored.
- posters
- Returns a reference to a hash whose keys are email addresses and whose
values are "News::Scan::Poster" objects
corresponding to those email addresses. See News::Scan::Poster.
- threads
- Returns a reference to a hash whose keys are subjects and whose values are
"News::Scan::Thread" objects
corresponding to those subjects. See News::Scan::Thread.
- crossposts
- Returns a reference to a hash whose keys are newsgroup names and whose
values are the number of times the corresponding groups have been
crossposted to.
- collect
- Use this method to mirror the articles from the specified NNTP server to
the specified spool. Please be kind to the NNTP server.
- scan
- Instruct the object to gather information about the newsgroup.
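The figures returned by "orig_volume" and "orig_lines" depend on how QuoteRE splits body lines into quoted and original text. The sketch below shows that split on a handful of sample lines, taking the default QuoteRE to be "^\s{0,3}(?:>|:|\S+>|\+\+)". The helper "count_original" and the sample body are hypothetical. This is not the module's own code, which may treat blank lines and signatures differently.

```perl
use strict;
use warnings;

# Assumed default QuoteRE: up to three leading whitespace characters,
# followed by ">", ":", a "name>" style attribution prefix, or "++".
my $quote_re = qr/^\s{0,3}(?:>|:|\S+>|\+\+)/;

# Hypothetical helper: count lines the regex rejects (original) and
# lines it accepts (quoted).
sub count_original {
    my ( $re, @lines ) = @_;
    my ( $original, $quoted ) = ( 0, 0 );
    for my $line (@lines) {
        if   ( $line =~ $re ) { $quoted++ }
        else                  { $original++ }
    }
    return ( $original, $quoted );
}

my @body = (
    '> What does QuoteRE do?',
    'Greg> It matches quoted lines.',
    ': an older quoting style',
    'This line is original text.',
    '    > indented four spaces, so the pattern does not match',
);

my ( $original, $quoted ) = count_original( $quote_re, @body );
print "original: $original, quoted: $quoted\n";
# prints "original: 2, quoted: 3"
```

A group that uses an unusual quoting prefix would need a QuoteRE that accepts it, otherwise quoted text is counted as original.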
See the eg/ directory in the News-Scan distribution,
available from the CPAN--http://www.perl.com/CPAN/.
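For readers without the distribution at hand, the following is a minimal sketch of a scan-and-report run built only from the methods documented on this page. It is not taken from the eg/ directory. The group name and spool path are placeholders, "percent" and "report" are hypothetical helpers, and the sketch assumes that "earliest" and "latest" return epoch times when called without arguments.

```perl
use strict;
use warnings;

# Hypothetical helper: format $part as a percentage of $whole,
# guarding against division by zero for an empty group.
sub percent {
    my ( $part, $whole ) = @_;
    return $whole ? sprintf( '%.1f%%', 100 * $part / $whole ) : 'n/a';
}

# Hypothetical helper: print a short summary from a scanned object.
sub report {
    my ($scan) = @_;
    my $articles = $scan->articles;

    printf "Articles: %d\n",       $articles;
    printf "Volume:   %d bytes\n", scalar $scan->volume;
    printf "Original: %s of body volume\n",
        percent( scalar $scan->orig_volume, scalar $scan->body_volume );
    printf "Signed:   %s of articles\n",
        percent( scalar $scan->signatures, $articles );
    printf "Posters:  %d\n", scalar keys %{ $scan->posters };
    printf "Threads:  %d\n", scalar keys %{ $scan->threads };

    # Only print the time span if something was actually counted.
    if ($articles) {
        printf "Span:     %s to %s\n",
            scalar localtime( $scan->earliest ),
            scalar localtime( $scan->latest );
    }

    # List crossposted groups, most frequent first.
    my $crossposts = $scan->crossposts;
    for my $group ( sort { $crossposts->{$b} <=> $crossposts->{$a} }
                    keys %$crossposts ) {
        printf "  %-30s %d\n", $group, $crossposts->{$group};
    }
}

# Load News::Scan only if it is installed, so the sketch runs either way.
if ( eval { require News::Scan; 1 } ) {
    my $scan = News::Scan->new(
        Group => 'comp.lang.perl.misc',                   # placeholder
        From  => 'spool',
        Spool => '/var/spool/news/comp/lang/perl/misc',   # placeholder
    );
    if ( $scan && !$scan->error ) {
        $scan->scan;
        if ( my $problem = $scan->error ) {
            warn "scan failed: $problem\n";
        }
        else {
            report($scan);
        }
    }
    else {
        warn "could not configure News::Scan\n";
    }
}
else {
    print "News::Scan is not installed; skipping the scan.\n";
}
```

When From is 'NNTP', a call to "collect" before "scan" would mirror the articles into the spool first, as described under "collect".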
perlre, News::Scan::Poster, News::Scan::Thread,
News::Scan::Article, Net::NNTP
Greg Bacon <gbacon@cs.uah.edu>
Copyright (c) 1997 Greg Bacon. All Rights Reserved. This library
is free software. You may distribute and/or modify it under the same terms
as Perl itself.