| MediaWiki::DumpFile::Compat::Revisions(3pm) | User Contributed Perl Documentation | MediaWiki::DumpFile::Compat::Revisions(3pm) |
Parse::MediaWikiDump::Revisions - Object capable of processing dump files with multiple revisions per article
This object provides access to the metadata associated with a MediaWiki instance and an iterative interface for extracting the individual article revisions from a dump file. To guarantee that there is only a single revision per article, use the Parse::MediaWikiDump::Pages object instead.
use MediaWiki::DumpFile::Compat;
$pmwd = Parse::MediaWikiDump->new;
$revisions = $pmwd->revisions('pages-articles.xml');
$revisions = $pmwd->revisions(\*FILEHANDLE);
#print the title and id of each article inside the dump file
while(defined($page = $revisions->next)) {
  print "title '", $page->title, "' id ", $page->id, "\n";
}
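Because a revisions dump can return the same article more than once, a common first step is to tally revisions per title. The following is a minimal sketch, not part of the module: count_revisions is a hypothetical helper that relies only on the next and title methods shown above, and the FakeDump and FakeRevision packages are stand-ins invented here so the snippet runs without a dump file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): tally revisions per title.
# $dump is any object whose next() returns objects with a title() method,
# or undef at end of input, such as the $revisions object above.
sub count_revisions {
    my ($dump) = @_;
    my %count;
    while (defined(my $rev = $dump->next)) {
        $count{ $rev->title }++;
    }
    return \%count;
}

# Stand-in classes so the sketch runs without a dump file (assumption:
# they mimic only the two methods used by count_revisions).
package FakeRevision {
    sub new   { my ($class, $title) = @_; return bless { title => $title }, $class }
    sub title { return $_[0]{title} }
}
package FakeDump {
    sub new  { my ($class, @titles) = @_; return bless { queue => [@titles] }, $class }
    sub next {
        my ($self) = @_;
        my $title = shift @{ $self->{queue} };
        return defined $title ? FakeRevision->new($title) : undef;
    }
}

my $counts = count_revisions(FakeDump->new('Perl', 'Perl', 'Raku'));
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

With a real dump, the object returned by $pmwd->revisions(...) would presumably be passed to count_revisions in place of the FakeDump instance.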
#!/usr/bin/perl
use strict;
use warnings;
use MediaWiki::DumpFile::Compat;
my $file = shift(@ARGV) or die "must specify a MediaWiki dump of the current pages";
my $title = shift(@ARGV) or die "must specify an article title";
my $pmwd = Parse::MediaWikiDump->new;
my $dump = $pmwd->revisions($file);
my $found = 0;
binmode(STDOUT, ':utf8');
binmode(STDERR, ':utf8');
#this is the only currently known value but there could be more in the future
if ($dump->case ne 'first-letter') {
  die "unable to handle any case setting besides 'first-letter'";
}
$title = case_fixer($title);
while(my $revision = $dump->next) {
  if ($revision->title eq $title) {
    print STDERR "Located text for $title revision ", $revision->revision_id, "\n";
    my $text = $revision->text;
    print $$text;
    $found = 1;
  }
}
print STDERR "Unable to find article text for $title\n" unless $found;
#exit status is 0 if at least one revision was found and 1 otherwise
exit($found ? 0 : 1);
#removes any case sensitivity from the very first letter of the title
#but not from the optional namespace name
sub case_fixer {
  my $title = shift;
  #check for namespace
  if ($title =~ /^(.+?):(.+)/) {
    $title = $1 . ':' . ucfirst($2);
  } else {
    $title = ucfirst($title);
  }
  return $title;
}
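To make the behaviour of case_fixer concrete, the sketch below repeats the routine from the example and runs it on a few titles. The sample titles are illustrative assumptions and are not taken from any dump.

```perl
use strict;
use warnings;

# Same routine as in the example above: upper-case the first letter of the
# article name, leaving any text before the first colon untouched.
sub case_fixer {
    my $title = shift;
    if ($title =~ /^(.+?):(.+)/) {
        $title = $1 . ':' . ucfirst($2);
    } else {
        $title = ucfirst($title);
    }
    return $title;
}

# Illustrative titles (assumptions, not from a real dump).
for my $t ('perl', 'Talk:perl', 'foo:bar') {
    printf "%-12s -> %s\n", $t, case_fixer($t);
}
```

Note that the routine treats any text before the first colon as a namespace, so a title that merely contains a colon (the 'foo:bar' case) has the letter after the colon capitalised rather than its first letter.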
This class was updated to read version 0.4 dump files from a MediaWiki instance, but it does not currently expose any of the new information available in those files.
| 2022-06-15 | perl v5.34.0 |