Parse::MediaWikiDump::Revisions(3pm) | User Contributed Perl Documentation | Parse::MediaWikiDump::Revisions(3pm) |
Parse::MediaWikiDump::Revisions - Object capable of processing dump files with multiple revisions per article
This object is used to access the metadata associated with a MediaWiki instance and provide an iterative interface for extracting the indidivual article revisions out of the same. To gurantee that there is only a single revision per article use the Parse::MediaWikiDump::Revisions object.
$pmwd = Parse::MediaWikiDump->new; $revisions = $pmwd->revisions('pages-articles.xml'); $revisions = $pmwd->revisions(\*FILEHANDLE); #print the title and id of each article inside the dump file while(defined($page = $revisions->next)) { print "title '", $page->title, "' id ", $page->id, "\n"; }
This software is being RETIRED - MediaWiki::DumpFile is the official successor to Parse::MediaWikiDump and includes a compatibility library called MediaWiki::DumpFile::Compat that is 100% API compatible and is a near perfect standin for this module. It is faster in all instances where it counts and is actively maintained. Any undocumented deviation of MediaWiki::DumpFile::Compat from Parse::MediaWikiDump is considered a bug and will be fixed.
#!/usr/bin/perl use strict; use warnings; use Parse::MediaWikiDump; my $file = shift(@ARGV) or die "must specify a MediaWiki dump of the current pages"; my $title = shift(@ARGV) or die "must specify an article title"; my $pmwd = Parse::MediaWikiDump->new; my $dump = $pmwd->revisions($file); my $found = 0; binmode(STDOUT, ':utf8'); binmode(STDERR, ':utf8'); #this is the only currently known value but there could be more in the future if ($dump->case ne 'first-letter') { die "unable to handle any case setting besides 'first-letter'"; } $title = case_fixer($title); while(my $revision = $dump->next) { if ($revision->title eq $title) { print STDERR "Located text for $title revision ", $revision->revision_id, "\n"; my $text = $revision->text; print $$text; $found = 1; } } print STDERR "Unable to find article text for $title\n" unless $found; exit 1; #removes any case sensativity from the very first letter of the title #but not from the optional namespace name sub case_fixer { my $title = shift; #check for namespace if ($title =~ /^(.+?):(.+)/) { $title = $1 . ':' . ucfirst($2); } else { $title = ucfirst($title); } return $title; }
This class was updated to support version 0.4 dump files from a MediaWiki instance but it does not currently support any of the new information available in those files.
2022-10-14 | perl v5.34.0 |