pullnews - Pull news from multiple news servers and feed it to
another
pullnews [-BhnOqRx] [-a hashfeed] [-b fraction] [-c config]
[-C width] [-d level] [-f fraction] [-F fakehop] [-g groups]
[-G newsgroups] [-H headers] [-k checkpt] [-l logfile] [-L size]
[-m header_pats] [-M num] [-N timeout] [-p port] [-P hop_limit]
[-Q level] [-r file] [-s to-server[:port][_tlsmode]] [-S max-run]
[-t retries] [-T connect-pause] [-w num] [-z article-pause]
[-Z group-pause] [from-server ...]
The "Net::NNTP" module must be
installed. This module is available as part of the libnet distribution and
comes with recent versions of Perl. For older versions of Perl, you can
download it from <http://www.cpan.org/>.
pullnews reads a config file named pullnews.marks,
and connects to the upstream servers given there as a reader client. This
file is looked for in pathdb when pullnews is run as the user
set in runasuser in inn.conf (which is by default the
"news" user); otherwise, this file is
looked for in the running user's home directory.
By default, pullnews connects to all servers listed in the
configuration file, but you can limit pullnews to specific servers by
listing them on the command line as a whitespace-separated list of server
names (each one is a from-server argument in the synopsis). For each
server it connects to, it pulls over articles and feeds them to the
destination server via the IHAVE or POST commands. This means that the
system pullnews is run on must have feeding access to the destination
news server.
pullnews is designed for very small sites that do not want
to bother setting up traditional peering and is not meant for handling large
feeds.
- -a hashfeed
- This option is a deterministic way to control the flow of articles and to
split a feed. The hashfeed parameter must be in the form "value/mod" or
"start-end/mod". The Message-ID of each article is hashed using MD5, which
results in a 128-bit hash. The lowest 32 bits are then taken by default as
the hashfeed value (which is an integer). If the hashfeed value modulo
"mod", plus one, equals "value" or lies between "start" and "end"
(inclusive), pullnews will feed the article. All these numbers must be
integers.
For instance:
pullnews -a 1/2       Feeds about 50% of all articles.
pullnews -a 2/2       Feeds the other 50% of all articles.
Another example:
pullnews -a 1-3/10    Feeds about 30% of all articles.
pullnews -a 4-5/10    Feeds about 20% of all articles.
pullnews -a 6-10/10   Feeds about 50% of all articles.
You can use an extended syntax of the form "value/mod:offset" or
"start-end/mod:offset" (an underscore "_" is also recognized in place of
the colon ":"). As MD5 generates a 128-bit return value, it is possible to
specify from which byte offset the 32-bit integer used by hashfeed starts.
The default value for "offset" is ":0", and thirteen overlapping values
from ":0" to ":12" can be used. Only up to four totally independent values
exist: ":0", ":4", ":8" and ":12".
The offset therefore allows a second level of deterministic
distribution: if pullnews feeds "1/2", that half can be split further
with "1-3/9:4", for instance. Up to four levels of deterministic
distribution can be used.
The algorithm is compatible with the one used by
Diablo 5.1 and up.
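The selection rule for -a can be sketched in Python as follows. This is an
illustration only, not the code pullnews runs: the function names are
hypothetical, and the sketch assumes that the "lowest 32 bits" are the last
four bytes of the MD5 digest read in big-endian order, with the offset
counting bytes back from the end of the digest. Check the script itself
before relying on the exact articles selected.

```python
import hashlib
import re

# Accepts "value/mod" or "start-end/mod", optionally followed by
# ":offset" or "_offset".
_SPEC_RE = re.compile(r"^(\d+)(?:-(\d+))?/(\d+)(?:[:_](\d+))?$")


def parse_hashfeed(spec):
    """Return (start, end, mod, offset) as integers."""
    m = _SPEC_RE.match(spec)
    if m is None:
        raise ValueError(f"bad hashfeed spec: {spec!r}")
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) is not None else start
    mod = int(m.group(3))
    offset = int(m.group(4)) if m.group(4) is not None else 0
    if mod < 1 or offset > 12:
        raise ValueError(f"bad hashfeed spec: {spec!r}")
    return start, end, mod, offset


def hashfeed_value(message_id, offset=0):
    """32-bit integer taken from the MD5 digest of the Message-ID."""
    digest = hashlib.md5(message_id.encode("utf-8")).digest()  # 16 bytes
    # Assumption: offset 0 selects the last four bytes of the digest and
    # larger offsets move the 4-byte window toward the front.
    stop = 16 - offset
    return int.from_bytes(digest[stop - 4:stop], "big")


def hashfeed_match(message_id, spec):
    """True when the article falls in the slot(s) selected by spec."""
    start, end, mod, offset = parse_hashfeed(spec)
    slot = hashfeed_value(message_id, offset) % mod + 1  # 1..mod
    return start <= slot <= end
```

Whatever the byte-order assumption, the slot is always in the range 1 to
mod, so "1/2" and "2/2" never select the same article, and "1-3/10",
"4-5/10" and "6-10/10" together cover every article exactly once.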
- -b fraction
- Backtrack on server numbering reset. Specify the proportion
(0.0 to 1.0) of a group's
articles to pull when the server's article number is less than our high
for that group. When fraction is 1.0, pull
all the articles on a renumbered server. The default is to do
nothing.
- -B
- Feed is header-only, that is to say pullnews only feeds the headers
of the articles, plus one blank line. It adds the Bytes header field if
the article does not already have one, and keeps the body only if the
article is a control article.
- -c config
- Normally, the config file is stored in pullnews.marks in
pathdb when pullnews is run as the news user, or otherwise
in the running user's home directory. If -c is given, config
will be used as the config file instead. This is useful if you're running
pullnews as a system user on an automated basis out of cron or as
an individual user, rather than the news user.
See "CONFIG FILE" below for the format of this
file.
- -C width
- Use width characters per line for the progress table. The default
value is 50.
- -d level
- Set the debugging level to the integer level (up to
4); more debugging output will be logged as this
increases. The default value is 0.
- -f fraction
- This changes the proportion of articles to get from each group to
fraction and should be in the range 0.0 to
1.0 (1.0 being the
default).
- -F fakehop
- Prepend fakehop as a host to the Path header field body of articles
fed.
- -g groups
- Specify a collection of groups to get. groups is a list of
newsgroups separated by commas (only commas, no spaces). Each group must
be defined in the config file, and only the remote hosts that carry those
groups will be contacted. Note that this is a simple list of groups, not a
wildmat expression, and wildcards are not supported.
- -G newsgroups
- Add the comma-separated list of groups newsgroups to each server in
the configuration file (see also -g and -w).
- -h
- Print a usage message and exit.
- -H headers
- Remove these named header fields (colon-separated list) from fed
articles.
- -k checkpt
- Checkpoint (save) the config file every checkpt articles (default
is 0, that is to say at the end of the
session).
- -l logfile
- Log progress/stats to logfile (default is
"stdout").
- -L size
- Specify the largest wanted article size in bytes. The default is to
download all articles, whatever their size. When this option is used,
pullnews will first retrieve overview data (if available) of each
newsgroup to process so as to obtain article sizes, before deciding which
articles to actually download.
- -m header_pats
- Feed an article based on header field body matching. The argument is a
number of whitespace-separated tuples (each tuple being a colon-separated
header field name and regular expression). For instance:
-m "Hdr1:regexp1 !Hdr2:regexp2 #Hdr3:regexp3 !#Hdr4:regexp4"
specifies that the article will be passed only if the
"Hdr1" header field body matches
"regexp1" and the
"Hdr2" header field body does not
match "regexp2". Besides, if the
"Hdr3" header field body matches
"regexp3", that header is removed; and
if the "Hdr4" header field body does
not match "regexp4", that header is
removed.
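The four tuple forms accepted by -m can be sketched in Python as follows.
The function name is hypothetical, and several details are assumptions of
this sketch rather than documented behaviour: header field names are
compared exactly, a missing header field counts as not matching, and
Python's re module stands in for Perl regular expressions.

```python
import re


def apply_header_pats(headers, header_pats):
    """Return (feed, headers_out) for one article.

    headers maps header field names to their bodies; header_pats is the
    whitespace-separated -m argument.
    """
    feed = True
    out = dict(headers)
    for item in header_pats.split():
        negate = item.startswith("!")
        if negate:
            item = item[1:]
        remove = item.startswith("#")
        if remove:
            item = item[1:]
        # Split on the first colon only, so the regexp may contain colons.
        name, _, pattern = item.partition(":")
        body = headers.get(name)
        # Assumption: an absent header field never matches.
        matched = body is not None and re.search(pattern, body) is not None
        hit = matched != negate  # a leading "!" inverts the test
        if remove:
            if hit:
                out.pop(name, None)  # "#" tuples only strip a header field
        elif not hit:
            feed = False  # a failed plain tuple blocks the article
    return feed, out
```

With the example argument above, an article is fed only when the "Hdr1"
and "Hdr2" tests both pass, while the "#" tuples never block an article
and only decide whether "Hdr3" and "Hdr4" are stripped.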
- -M num
- Specify the maximum number of articles (per group) to process. The default
is to process all new articles. See also -f.
- -n
- Do nothing but read articles: do not feed articles downstream, write no
rnews file, and do not update the config file.
- -N timeout
- Specify the timeout length, as timeout seconds, when establishing
an NNTP connection.
- -O
- Use an optimized mode: pullnews checks whether the article already
exists on the downstream server, before downloading it. It may help for
huge articles or a slow link to upstream hosts.
- -p port
- Connect to the destination news server on a port other than the default of
119. This option does not change the port used to
connect to the source news servers.
- -P hop_limit
- Restrict feeding an article based on the number of hops it has already
made. Count the hops in the Path header field body (hop_count),
feeding the article only when hop_limit is
"+num" and hop_count is more than
num; or hop_limit is
"-num" and hop_count is less than
num.
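The -P test can be sketched in Python as follows. The function names are
hypothetical, and the sketch assumes that every "!"-separated entry of the
Path header field body counts as one hop (including a trailing entry such
as "not-for-mail"); the script may count hops differently. Rejecting a
hop_limit without a sign is a choice of this sketch.

```python
def hop_count(path_body):
    """Count the non-empty '!'-separated entries of a Path body."""
    return len([hop for hop in path_body.strip().split("!") if hop])


def passes_hop_limit(path_body, hop_limit):
    """hop_limit is the -P argument, '+num' or '-num'."""
    sign, num = hop_limit[0], int(hop_limit[1:])
    count = hop_count(path_body)
    if sign == "+":
        return count > num  # feed only articles with more than num hops
    if sign == "-":
        return count < num  # feed only articles with fewer than num hops
    raise ValueError(f"hop_limit must start with + or -: {hop_limit!r}")
```

Both comparisons are strict, so an article whose hop count equals num is
not fed with either "+num" or "-num".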
- -q
- Print out less status information while running.
- -Q level
- Set the quietness level ("-Q 2" is
equivalent to "-q"). The higher this
value, the less gets logged. The default is
0.
- -r file
- Rather than feeding the downloaded articles to a destination server,
instead create a batch file that can later be fed to a server using
rnews. See rnews(1) for more information about the batch
file format.
- -R
- Be a reader (use MODE READER and POST commands) to the downstream server.
Some posts will then be rejected because of unexpected injection header
fields, obsolete or incorrectly formatted header fields, or a date
too far in the past. You may then want to set artcutoff to
0 in inn.conf, and use the -H flag
to strip unwanted header fields. Even with that, a few articles may still
be rejected.
The default is to behave like a feeder and use the IHAVE
command. (You'll have to allow in incoming.conf the connections
from pullnews so that it is recognized as a feeder.)
- -s to-server[:port][_tlsmode]
- Normally, pullnews will feed the articles it retrieves to the news
server running on localhost. To connect to a different host, specify a
server with the -s flag. You can also specify the port with this
same flag or use -p. The default port is 119.
The connection is by default unencrypted. To negotiate a TLS
encryption layer, you can set tlsmode to
"TLS" for implicit TLS (negotiated
immediately upon connection on a dedicated port) or
"STARTTLS" for explicit TLS (the
appropriate command will be sent before authenticating or feeding
messages). Examples of use are:
pullnews -s news.server.com
pullnews -s news.server.com_STARTTLS
pullnews -s news.server.com:433_TLS
Note that not all NNTP servers implement TLS for feeding
articles.
- -S max-run
- Specify the maximum time max-run in seconds for pullnews to
run.
- -t retries
- The maximum number (retries) of attempts to connect to a server or
reconnect to a server if the socket is unexpectedly closed (see also
-T). The default is 0.
- -T connect-pause
- Pause connect-pause seconds between connection retries (see also
-t). The default is 1.
- -w num
- Set each group's high water mark (last received article number) to
num. If num is negative, calculate Current+num
instead (i.e. get the last num articles). Therefore, a num
of 0 will re-get all articles on the server;
whereas a num of "-0" will get no
old articles, setting the water mark to Current (the most recent
article on the server).
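The arithmetic behind -w can be sketched in Python as follows. The function
name is hypothetical. The argument is kept as a string because "0" and
"-0" are the same integer but mean different things here.

```python
def new_high_water_mark(num, current):
    """Return the high water mark selected by -w.

    num is the raw -w argument as typed; current is the number of the
    most recent article on the server.
    """
    value = int(num)
    if num.strip().startswith("-"):
        # Negative form (including "-0"): count back from Current.
        return current + value
    # Plain form: use the number as given.
    return value
```

For a group whose most recent article is number 500, "-10" yields 490 (the
last 10 articles are fetched), "-0" yields 500 (no old articles), and "0"
yields 0 (everything is fetched again).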
- -x
- If the -x flag is used, an Xref header field is added to any
article that lacks one. It can be useful for instance if articles are fed
to a news server which has xrefslave set in inn.conf.
- -z article-pause
- Sleep article-pause seconds between articles. The default is
0.
- -Z group-pause
- Sleep group-pause seconds between groups. The default is
0.
The config file for pullnews is divided into blocks, one
block for each remote server to connect to. A block begins with the host
line (which must have no leading whitespace) and contains just the hostname
of the remote server with optional port and TLS mode (with the same
semantics as the -s flag), optionally followed by authentication
details (username and password for that server). Note that authentication
details can also be provided for the downstream server (a host line for
"localhost" or the hostname specified with
the -s flag could be added for it in the configuration file, with no
newsgroup to fetch).
Following the host line should be one or more newsgroup lines
which start with whitespace followed by the name of a newsgroup to retrieve.
Only one newsgroup should be listed on each line.
pullnews will update the config file to include the time
the group was last checked and the highest numbered article successfully
retrieved and transferred to the destination server. It uses this data to
avoid doing duplicate work the next time it runs.
The full syntax is:
<host>[:<port>][_<tlsmode>] [<username> <password>]
    <group> [<time> <high>]
    <group> [<time> <high>]
where the <host> line must not have leading whitespace and
the <group> lines must.
A typical configuration file would be:
# Format: group date high
data.pa.vix.com
    rec.bicycles.racing 908086612 783
    rec.humor.funny 908086613 18
    comp.programming.threads
nnrp.vix.com pull sekret
    comp.std.lisp
news.server.com:563_TLS joe password
    news.software.nntp
Note that an earlier run of pullnews has filled in details
about the last articles downloaded from the two rec.* groups. The two comp.*
groups and the news.* group were just added by the user and have not yet
been checked.
The nnrp.vix.com server requires authentication, and
pullnews will use the username
"pull" and the password
"sekret" (without any encryption
layer).
The connection to news.server.com will be encrypted with implicit
TLS on port 563. Joe's password won't be sent in plaintext.
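The format described above can be read with a short Python sketch such as
the following. It is not the parser used by pullnews: the function name
and the dictionary layout are hypothetical, lines starting with "#" are
treated as comments only because the example file begins with one, and
the TLS mode is assumed to be spelled in upper case as in the examples.

```python
import re

# <host>[:<port>][_<tlsmode>], with tlsmode being TLS or STARTTLS.
_HOST_RE = re.compile(
    r"^(?P<host>.+?)(?::(?P<port>\d+))?(?:_(?P<tls>TLS|STARTTLS))?$"
)


def parse_marks(text):
    """Return a list of server blocks parsed from config file text."""
    servers = []
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # blank line or (assumed) comment
        fields = raw.split()
        if raw[0].isspace():
            # Group line: leading whitespace, then <group> [<time> <high>].
            if not servers:
                raise ValueError("group line before any host line")
            group = {"name": fields[0], "time": None, "high": None}
            if len(fields) >= 3:
                group["time"] = int(fields[1])
                group["high"] = int(fields[2])
            servers[-1]["groups"].append(group)
        else:
            # Host line: no leading whitespace, optional username/password.
            m = _HOST_RE.match(fields[0])
            has_auth = len(fields) >= 3
            servers.append({
                "host": m.group("host"),
                "port": int(m.group("port")) if m.group("port") else None,
                "tlsmode": m.group("tls"),
                "username": fields[1] if has_auth else None,
                "password": fields[2] if has_auth else None,
                "groups": [],
            })
    return servers
```

Applied to the typical configuration file shown above, this yields three
server blocks: one with three groups (two of them carrying a time and a
high mark), one with the "pull" username, and one on port 563 with the
"TLS" mode.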
- pathbin/pullnews
- The Perl script itself used to pull news from upstream servers and feed it
to another news server.
- pathdb/pullnews.marks or ~/pullnews.marks
- The default config file. It is stored in pullnews.marks in
pathdb when pullnews is run as the news user, or otherwise
in the running user's home directory.
pullnews was written by James Brister for INN. The
documentation was rewritten in POD by Russ Allbery
<eagle@eyrie.org>.
Geraint A. Edwards greatly improved pullnews, adding
no fewer than 16 new recognized flags, fixing some bugs and
integrating the backupfeed contrib script by Kai Henningsen, which added
6 more flags.