MediaWiki::DumpFile::Compat::Revisions(3pm) | User Contributed Perl Documentation | MediaWiki::DumpFile::Compat::Revisions(3pm) |
Parse::MediaWikiDump::Revisions - Object capable of processing dump files with multiple revisions per article
This object provides access to the metadata associated with a MediaWiki instance and an iterative interface for extracting the individual article revisions from its dump file. If you need a guarantee that there is only a single revision per article, use the Parse::MediaWikiDump::Pages object instead.
  use MediaWiki::DumpFile::Compat;

  $pmwd = Parse::MediaWikiDump->new;

  $revisions = $pmwd->revisions('pages-articles.xml');
  $revisions = $pmwd->revisions(\*FILEHANDLE);

  #print the title and id of each article inside the dump file
  while(defined($page = $revisions->next)) {
    print "title '", $page->title, "' id ", $page->id, "\n";
  }
Extract the text of each revision of an article with a given title:

  #!/usr/bin/perl

  use strict;
  use warnings;
  use MediaWiki::DumpFile::Compat;

  my $file = shift(@ARGV) or die "must specify a MediaWiki dump of the current pages";
  my $title = shift(@ARGV) or die "must specify an article title";
  my $pmwd = Parse::MediaWikiDump->new;
  my $dump = $pmwd->revisions($file);
  my $found = 0;

  binmode(STDOUT, ':utf8');
  binmode(STDERR, ':utf8');

  #this is the only currently known value but there could be more in the future
  if ($dump->case ne 'first-letter') {
    die "unable to handle any case setting besides 'first-letter'";
  }

  $title = case_fixer($title);

  while(my $revision = $dump->next) {
    if ($revision->title eq $title) {
      print STDERR "Located text for $title revision ", $revision->revision_id, "\n";
      my $text = $revision->text;
      print $$text;

      $found = 1;
    }
  }

  print STDERR "Unable to find article text for $title\n" unless $found;

  #exit with a failure status only when no matching revision was found
  exit($found ? 0 : 1);

  #removes any case sensitivity from the very first letter of the title
  #but not from the optional namespace name
  sub case_fixer {
    my $title = shift;

    #check for namespace
    if ($title =~ /^(.+?):(.+)/) {
      $title = $1 . ':' . ucfirst($2);
    } else {
      $title = ucfirst($title);
    }

    return $title;
  }
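The title normalization done by case_fixer in the example can be tried on its own, without a dump file. The following is a minimal sketch that reproduces the same logic; the sample titles are made up for illustration and do not come from any real dump.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same logic as the case_fixer helper in the example: uppercase the first
# letter of the title, or of the part after the first colon when the title
# appears to carry a namespace prefix.
sub case_fixer {
    my ($title) = @_;

    if ($title =~ /^(.+?):(.+)/) {
        return $1 . ':' . ucfirst($2);
    }

    return ucfirst($title);
}

# Illustrative titles only; they are not taken from any real dump file.
for my $title ('perl', 'talk:perl', 'Already Fixed') {
    print "'$title' -> '", case_fixer($title), "'\n";
}
```

Note that the namespace check is purely textual: any title containing a colon is treated as having a namespace prefix, so a title such as 'star wars: a new hope' is returned unchanged because the text after the colon begins with a space. Callers that need stricter handling would have to compare the prefix against the namespaces the dump actually defines.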
This class was updated to support version 0.4 dump files from a MediaWiki instance but it does not currently support any of the new information available in those files.
2022-06-15 | perl v5.34.0 |