From: Jeremie Nikaes <jeremie.nikaes@ensimag.imag.fr>
To: git@vger.kernel.org
Cc: "Jeremie Nikaes" <jeremie.nikaes@ensimag.imag.fr>,
"Arnaud Lacurie" <arnaud.lacurie@ensimag.imag.fr>,
"Claire Fousse" <claire.fousse@ensimag.imag.fr>,
"David Amouyal" <david.amouyal@ensimag.imag.fr>,
"Matthieu Moy" <matthieu.moy@grenoble-inp.fr>,
"Sylvain Boulmé" <sylvain.boulme@imag.fr>
Subject: [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled
Date: Thu, 9 Jun 2011 15:15:59 +0200
Message-ID: <1307625360-10973-1-git-send-email-jeremie.nikaes@ensimag.imag.fr>

Implement a gateway between git and mediawiki, allowing git users to
push and pull objects from a mediawiki just as they would with a
classic git repository, thanks to remote-helpers.

Currently supported commands are:

git clone mediawiki::http://onewiki.com
git pull
You need the following packages installed (available in common
distributions' repositories):

libmediawiki-api-perl
libdatetime-format-iso8601-perl
Remote helpers are used in order to be as transparent as possible
to the git user.
Mediawiki revisions are downloaded through the Mediawiki API and then
fast-imported into git.
Mediawiki revisions and git commits are linked by git notes bound to
the commits.
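The note attached to each imported commit records the originating wiki
revision as a single line, "mediawiki_revision: <number>". Recovering
the number is a simple split on the space; a shell sketch (the revision
number here is invented):

```shell
# Each imported commit carries a note (under refs/notes/mediawiki)
# whose body has the form "mediawiki_revision: <number>".
# The helper recovers the number by splitting on the space:
note="mediawiki_revision: 42"
rev=$(printf '%s' "$note" | cut -d' ' -f2)
echo "$rev"   # 42
```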
The import part is done on a refs/mediawiki/<remote> branch before
being fetched into refs/remotes/origin/master (huge thanks to
Jonathan Nieder for his help).
For now, the whole wiki is cloned, but it will be possible to clone
only some pages: the clone is driven by a list of pages, which
currently contains all pages.
Code clarified & improved with the help of Jeff King and Junio C Hamano.
We were not able to reproduce the empty-timestamp bug noticed by Jeff
King, so it needs further testing. A placeholder is still implemented
just in case: its value is the last valid timestamp received, plus
one.
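The placeholder logic amounts to the following (values are invented):
when a revision arrives without a timestamp, the last valid timestamp
plus one is used, keeping commit dates monotonic:

```shell
# Simulate the fallback: an empty $ts stands for an undefined timestamp.
last_timestamp=100
ts=""
if [ -z "$ts" ]; then
    # No timestamp on this revision: reuse the last valid one, plus one.
    last_timestamp=$((last_timestamp + 1))
    ts=$last_timestamp
else
    last_timestamp=$ts
fi
echo "$ts"   # 101
```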
With "use encoding 'utf-8'", non-ISO characters are now fully
supported in both file contents and filenames. A small helper,
run_git, is also added to execute arbitrary git commands and to utf-8
encode their results.
However, utf-8 encoding of filenames could raise problems if
different file systems handle utf-8 filenames differently. A
uri_escape of mediawiki filenames would be conceivable; this is still
to be discussed.
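For reference, the helper currently replaces each forward slash in a
page title with %2F (the $slash_replacement variable in the patch). In
shell terms the transformation amounts to the following (the page title
is made up):

```shell
# Replace every '/' in a wiki page title with %2F so the title can be
# used as a filename, mirroring the helper's $slash_replacement.
title="Talk/Some_Page"
escaped=$(printf '%s' "$title" | sed 's,/,%2F,g')
echo "$escaped"   # Talk%2FSome_Page
```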
Partial cloning is supported using the following syntax:

"git clone mediawiki::http://wikiurl##A_Page##Another_Page"

As always, this URL is kept in .git/config, so these specific pages
remain tracked across subsequent pulls.
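The helper parses that URL by splitting on "##": the first field is
the wiki URL, the remaining fields are page titles. A rough shell
equivalent of that parsing (page names taken from the example above):

```shell
# Split "url##Page1##Page2" as the helper does with split(/##/, $ARGV[1]).
spec="http://wikiurl##A_Page##Another_Page"
url=${spec%%##*}      # everything before the first "##"
pages=${spec#*##}     # remaining "##"-separated page titles
echo "$url"     # http://wikiurl
echo "$pages"   # A_Page##Another_Page
```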
Signed-off-by: Jérémie Nikaes <jeremie.nikaes@ensimag.imag.fr>
Signed-off-by: Arnaud Lacurie <arnaud.lacurie@ensimag.imag.fr>
Signed-off-by: Claire Fousse <claire.fousse@ensimag.imag.fr>
Signed-off-by: David Amouyal <david.amouyal@ensimag.imag.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Sylvain Boulmé <sylvain.boulme@imag.fr>
---
Changes since v2:
- Added the timestamp placeholder
- Utf-8 encoding now properly supported
- Partial cloning functionality added
contrib/mw-to-git/git-remote-mediawiki | 322 ++++++++++++++++++++++++++++
contrib/mw-to-git/git-remote-mediawiki.txt | 7 +
2 files changed, 329 insertions(+), 0 deletions(-)
create mode 100755 contrib/mw-to-git/git-remote-mediawiki
create mode 100644 contrib/mw-to-git/git-remote-mediawiki.txt
diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
new file mode 100755
index 0000000..176ff09
--- /dev/null
+++ b/contrib/mw-to-git/git-remote-mediawiki
@@ -0,0 +1,322 @@
+#! /usr/bin/perl
+
+use strict;
+use MediaWiki::API;
+use DateTime::Format::ISO8601;
+use encoding 'utf8';
+use URI::Escape;
+use warnings;
+
+# Mediawiki filenames can contain forward slashes. This variable decides which pattern replaces them.
+my $slash_replacement = "%2F";
+
+my $remotename = $ARGV[0];
+# Current syntax to fetch only a set of pages: mediawiki::http://mediawikiurl##A_Page##Another_Page
+my @pages_titles = split(/##/,$ARGV[1]);
+my $url = shift (@pages_titles);
+
+
+# commands parser
+my $entry;
+my @cmd;
+while (1) {
+ $| = 1; #flush STDOUT
+ $entry = <STDIN>;
+ chomp($entry);
+ @cmd = split(/ /,$entry);
+ if (defined($cmd[0])) {
+ if ($cmd[0] eq "capabilities") {
+ last unless (!defined($cmd[1]));
+ mw_capabilities();
+ } elsif ($cmd[0] eq "list") {
+ last unless (!defined($cmd[2]));
+ mw_list($cmd[1]);
+ } elsif ($cmd[0] eq "import") {
+ last unless ($cmd[1] ne "" && !defined($cmd[2]));
+ mw_import($cmd[1]);
+ } elsif ($cmd[0] eq "option") {
+ last unless ($cmd[1] ne "" && $cmd[2] ne "" && !defined($cmd[3]));
+ mw_option($cmd[1],$cmd[2]);
+ } elsif ($cmd[0] eq "push") {
+ # Check the pattern <src>:<dst>
+ my @pushargs = split(/:/,$cmd[1]);
+ last unless ($pushargs[1] ne "" && !defined($pushargs[2]));
+ mw_push($pushargs[0],$pushargs[1]);
+ } else {
+			print STDERR "Unknown command. Aborting...\n";
+ last;
+ }
+ } else {
+ # End of input
+ last;
+ }
+
+}
+
+########################## Functions ##############################
+sub get_pages{
+ my $mediawiki = MediaWiki::API->new;
+ $mediawiki->{config}->{api_url} = "$url/api.php";
+
+ my $pages;
+ if (!@pages_titles){
+ $pages = $mediawiki->list({
+ action => 'query',
+ list => 'allpages',
+ aplimit => 500,
+ });
+ if (!defined($pages)) {
+ print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
+ print STDERR "fatal: make sure '$url/api.php' is a valid page\n";
+ exit 1;
+ }
+ return @$pages;
+ } else {
+ #the list of titles should follow the pattern 'page1|page2|...'
+ my $titles = "";
+ foreach my $title (@pages_titles){
+ $titles.="$title|";
+ }
+		# Suppress the last '|' added by the foreach above
+ chop($titles);
+
+ $pages = $mediawiki->api({
+ action => 'query',
+ titles => $titles,
+ });
+ if (!defined($pages)) {
+			print STDERR "fatal: none of the pages exist\n";
+ exit 1;
+ }
+ return values (%{$pages->{query}->{pages}});
+ }
+}
+
+sub run_git {
+ open(my $git, "-|:encoding(UTF-8)", "git " . $_[0]);
+ my $res = do { local $/; <$git> };
+ close($git);
+
+ return $res;
+}
+
+
+sub get_last_local_revision {
+ # Get note regarding last mediawiki revision
+ my $note = run_git("notes --ref=mediawiki show refs/mediawiki/$remotename/master 2>/dev/null");
+ my @note_info = split(/ /, $note);
+
+ my $lastrevision_number;
+ if (!(defined($note_info[0]) && $note_info[0] eq "mediawiki_revision:")) {
+ print STDERR "No previous mediawiki revision found";
+ $lastrevision_number = 0;
+ } else {
+		# Notes are formatted: mediawiki_revision: <number>
+ $lastrevision_number = $note_info[1];
+ chomp($lastrevision_number);
+ print STDERR "Last local mediawiki revision found is $lastrevision_number ";
+ }
+ return $lastrevision_number;
+}
+
+sub get_last_remote_revision {
+ my $mediawiki = MediaWiki::API->new;
+ $mediawiki->{config}->{api_url} = "$url/api.php";
+
+ my @pages = get_pages();
+
+ my $max_rev_num = 0;
+
+ foreach my $page (@pages) {
+ my $id = $page->{pageid};
+
+ my $query = {
+ action => 'query',
+ prop => 'revisions',
+ rvprop => 'ids',
+ pageids => $id,
+ };
+
+ my $result = $mediawiki->api($query);
+
+ my $lastrev = pop(@{$result->{query}->{pages}->{$id}->{revisions}});
+
+ $max_rev_num = ($lastrev->{revid} > $max_rev_num ? $lastrev->{revid} : $max_rev_num);
+ }
+
+ print STDERR "Last remote revision found is $max_rev_num\n";
+ return $max_rev_num;
+}
+
+sub literal_data {
+ my ($content) = @_;
+ print STDOUT "data ", bytes::length($content), "\n", $content;
+}
+
+sub mw_capabilities {
+ # Revisions are imported to the private namespace
+ # refs/mediawiki/$remotename/ by the helper and fetched into
+ # refs/remotes/$remotename later by fetch.
+ print STDOUT "refspec refs/heads/*:refs/mediawiki/$remotename/*\n";
+ print STDOUT "import\n";
+ print STDOUT "list\n";
+ print STDOUT "option\n";
+ print STDOUT "push\n";
+ print STDOUT "\n";
+}
+
+sub mw_list {
+	# MediaWiki does not have branches; we consider one branch,
+	# arbitrarily called master
+ print STDOUT "? refs/heads/master\n";
+ print STDOUT '@'."refs/heads/master HEAD\n";
+ print STDOUT "\n";
+
+}
+
+sub mw_option {
+ print STDOUT "unsupported\n";
+}
+
+sub mw_import {
+ my @wiki_name = split(/:\/\//,$url);
+ my $wiki_name = $wiki_name[1];
+
+ my $mediawiki = MediaWiki::API->new;
+ $mediawiki->{config}->{api_url} = "$url/api.php";
+
+ my @pages = get_pages();
+
+ my @revisions;
+ print STDERR "Searching revisions...\n";
+ my $fetch_from = get_last_local_revision() + 1;
+ if ($fetch_from == 1) {
+ print STDERR ", fetching from beginning\n";
+ } else {
+ print STDERR ", fetching from here\n";
+ }
+ my $n = 1;
+ foreach my $page (@pages) {
+ my $id = $page->{pageid};
+
+ print STDERR "$n/", scalar(@pages), ": ". $page->{title}."\n";
+ $n++;
+
+ my $query = {
+ action => 'query',
+ prop => 'revisions',
+ rvprop => 'ids',
+ rvdir => 'newer',
+ rvstartid => $fetch_from,
+ rvlimit => 500,
+ pageids => $id,
+ };
+
+ my $revnum = 0;
+ # Get 500 revisions at a time due to the mediawiki api limit
+ while (1) {
+ my $result = $mediawiki->api($query);
+
+ # Parse each of those 500 revisions
+ foreach my $revision (@{$result->{query}->{pages}->{$id}->{revisions}}) {
+ my $page_rev_ids;
+ $page_rev_ids->{pageid} = $page->{pageid};
+ $page_rev_ids->{revid} = $revision->{revid};
+ push (@revisions, $page_rev_ids);
+ $revnum++;
+ }
+ last unless $result->{'query-continue'};
+ $query->{rvstartid} = $result->{'query-continue'}->{revisions}->{rvstartid};
+ }
+ print STDERR " Found ", $revnum, " revision(s).\n";
+ }
+
+ # Creation of the fast-import stream
+ print STDERR "Fetching & writing export data...\n";
+
+ $n = 0;
+	my $last_timestamp = 0; # Placeholder in case $rev->{timestamp} is undefined
+
+ foreach my $pagerevids (sort {$a->{revid} <=> $b->{revid}} @revisions) {
+ #fetch the content of the pages
+ my $query = {
+ action => 'query',
+ prop => 'revisions',
+ rvprop => 'content|timestamp|comment|user|ids',
+ revids => $pagerevids->{revid},
+ };
+
+ my $result = $mediawiki->api($query);
+
+ my $rev = pop(@{$result->{query}->{pages}->{$pagerevids->{pageid}}->{revisions}});
+
+ $n++;
+ my $user = $rev->{user} || 'Anonymous';
+
+ if (!defined($rev->{timestamp})) {
+ $last_timestamp++;
+ } else {
+ $last_timestamp = $rev->{timestamp};
+ }
+ my $dt = DateTime::Format::ISO8601->parse_datetime($last_timestamp);
+
+ my $comment = defined $rev->{comment} ? $rev->{comment} : '*Empty MediaWiki Message*';
+
+ my $title = $result->{query}->{pages}->{$pagerevids->{pageid}}->{title};
+ $title =~ y/ /_/;
+		#$title = uri_escape($title); # It would be nice to use uri_escape to be
+		# cross-compatible on file systems that handle accented characters differently
+ $title =~ s/\//$slash_replacement/g;
+
+ my $content = $rev->{'*'};
+		# This \n is important: this is due to the way mediawiki handles ends of files.
+ $content .= "\n";
+
+ print STDERR "$n/", scalar(@revisions), ": Revision #$pagerevids->{revid} of $title\n";
+
+ print STDOUT "commit refs/mediawiki/$remotename/master\n";
+ print STDOUT "mark :$n\n";
+ print STDOUT "committer $user <$user\@$wiki_name> ", $dt->epoch, " +0000\n";
+ literal_data($comment);
+ # If it's not a clone, needs to know where to start from
+ if ($fetch_from != 1 && $n == 1) {
+ print STDOUT "from refs/mediawiki/$remotename/master^0\n";
+ }
+ print STDOUT "M 644 inline $title.mw\n";
+ literal_data($content);
+ print STDOUT "\n\n";
+
+
+
+
+ # mediawiki revision number in the git note
+ if ($fetch_from == 1 && $n == 1) {
+ print STDOUT "reset refs/notes/mediawiki\n";
+ }
+ print STDOUT "commit refs/notes/mediawiki\n";
+ print STDOUT "committer $user <$user\@$wiki_name> ", $dt->epoch, " +0000\n";
+ literal_data("note added by git-mediawiki");
+ if ($fetch_from != 1 && $n == 1) {
+ print STDOUT "from refs/notes/mediawiki^0\n";
+ }
+ print STDOUT "N inline :$n\n";
+ literal_data("mediawiki_revision: " . $pagerevids->{revid});
+ print STDOUT "\n\n";
+ }
+
+ if ($fetch_from == 1) {
+ if ($n != 0) {
+ print STDOUT "reset $_[0]\n"; #$_[0] contains refs/heads/master
+ print STDOUT "from :$n\n";
+ } else {
+ print STDERR "You appear to have cloned an empty mediawiki\n";
+			#Something has to be done remote-helper side. If nothing is done, an error is
+			#thrown saying that HEAD is referring to unknown object 0000000000000000000
+ }
+ }
+
+}
+
+sub mw_push {
+ print STDERR "Push not yet implemented\n";
+}
diff --git a/contrib/mw-to-git/git-remote-mediawiki.txt b/contrib/mw-to-git/git-remote-mediawiki.txt
new file mode 100644
index 0000000..4d211f5
--- /dev/null
+++ b/contrib/mw-to-git/git-remote-mediawiki.txt
@@ -0,0 +1,7 @@
+Git-Mediawiki is a project which aims at building a gateway
+between git and mediawiki, allowing git users to push and pull
+objects to and from mediawiki just as one would do with a classic
+git repository, thanks to remote-helpers.
+
+For more information, visit the wiki at
+https://github.com/Bibzball/Git-Mediawiki/wiki
--
1.7.4.1