git.vger.kernel.org archive mirror
From: Arnaud Lacurie <arnaud.lacurie@ensimag.imag.fr>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org,
	"Jérémie Nikaes" <jeremie.nikaes@ensimag.imag.fr>,
	"Claire Fousse" <claire.fousse@ensimag.imag.fr>,
	"David Amouyal" <david.amouyal@ensimag.imag.fr>,
	"Matthieu Moy" <matthieu.moy@grenoble-inp.fr>,
	"Sylvain Boulmé" <sylvain.boulme@imag.fr>
Subject: Re: [RFC/PATCH] Added a remote helper to interact with mediawiki, pull & clone handled
Date: Thu, 2 Jun 2011 22:58:44 +0200	[thread overview]
Message-ID: <BANLkTimCONrGyX=v39RXL4pxGWbkLVROPA@mail.gmail.com> (raw)
In-Reply-To: <7vy61kcdu8.fsf@alter.siamese.dyndns.org>

2011/6/2 Junio C Hamano <gitster@pobox.com>:
> Arnaud Lacurie <arnaud.lacurie@ensimag.imag.fr> writes:
>
>>  contrib/mw-to-git/git-remote-mediawiki     |  252 ++++++++++++++++++++++++++++
>>  contrib/mw-to-git/git-remote-mediawiki.txt |    7 +
>>  2 files changed, 259 insertions(+), 0 deletions(-)
>
> It is pleasing to see that a half of a custom backend can be done in just
> 250 lines of code.  I understand that this is a work-in-progress with many
> unnecessary lines spitting debugging output to STDERR, whose removal will
> further shrink the code?

To give you an idea, the whole thing, other part included, is about 400
lines for now.
There is a lot of output to STDERR that will eventually be removed.
We still have to
decide what "option verbosity" should display before removing too many of them.


>> +# commands parser
>> +my $loop = 1;
>> +my $entry;
>> +my @cmd;
>> +while ($loop) {
>
> This is somewhat unusual-looking loop control.
>
> Wouldn't "while (1) { ...; last if (...); if (...) { last; } }" do?

OK, we will rework the loop control that way.
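For reference, the reworked loop would look something like this (an
untested sketch only; mw_capabilities() is stubbed here for
illustration, the real one is in the patch, and its capability list
below is hypothetical):

```perl
# Sketch of the parser loop using "while (1) { ...; last if ... }",
# as Junio suggested. The capabilities printed by the stub are
# placeholders, not the patch's actual list.
sub mw_capabilities { print "import\npush\noption\n\n"; }

$| = 1; # flush STDOUT after each reply to git
while (1) {
	my $entry = <STDIN>;
	last unless defined $entry;   # EOF: git closed the pipe
	chomp($entry);
	my @cmd = split(/ /, $entry);
	if ($cmd[0] eq "capabilities" && !defined $cmd[1]) {
		mw_capabilities();
	} else {
		last;  # unexpected command: leave the loop
	}
}
```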

>> +     $| = 1; #flush STDOUT
>> +     $entry = <STDIN>;
>> +     print STDERR $entry;
>> +     chomp($entry);
>> +     @cmd = undef;
>> +     @cmd = split(/ /,$entry);
>> +     switch ($cmd[0]) {
>> +             case "capabilities" {
>> +                     if ($cmd[1] eq "") {
>> +                             mw_capabilities();
>> +                     } else {
>> +                            $loop = 0;
>
> I presume that this is "We were expecting to read capabilities command but
> found something unexpected; let's abort". Don't you want to say something
> to the user here, perhaps on STDERR?
>

Actually, we based that "no error message displayed" policy on
git-remote-http, which aborts when it reads something unexpected. We could
add an error message, though.
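If we do add one, it would be a single diagnostic on STDERR before
leaving the loop, roughly like this (sketch; $entry here stands in for
the raw command line read from STDIN in the patch):

```perl
# Sketch: complain on STDERR, then abort, instead of exiting silently.
my $entry = "unknown-command";  # placeholder for the line read from git
print STDERR "git-remote-mediawiki: unexpected command: '$entry'\n";
exit(1);
```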

>> +                     }
>> +             }
>> ...
>> +             case "option" {
>> +                     mw_option($cmd[1],$cmd[2]);
>> +             }
>
> No error checking only for this one?

There should be one; that's an oversight.



>> +                     my @pushargs = split(/:/,$cmd[1]);
>> +                     if ($pushargs[1] ne "" && $pushargs[2] eq ""
>> +                     && (substr($pushargs[0],0,1) eq "+")) {
>> +                             mw_push(substr($pushargs[0],1),$pushargs[1]);
>> +                     } else {
>> +                            $loop = 0;
>> +                     }
>
> Is "push" always forcing?

The tests are done in the mw_push function, which is still being implemented.
The push will abort if there is something to pull first, just as git does.

>> +sub mw_import {
>> +     my @wiki_name = split(/:\/\//,$url);
>> +     my $wiki_name = $wiki_name[1];
>> +
>> +     my $mediawiki = MediaWiki::API->new;
>> +     $mediawiki->{config}->{api_url} = "$url/api.php";
>> +
>> +     my $pages = $mediawiki->list({
>> +             action => 'query',
>> +             list => 'allpages',
>> +             aplimit => 500,
>> +     });
>> +     if ($pages == undef) {
>> +             print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
>> +             print STDERR "fatal: make sure '$url/api.php' is a valid page\n";
>> +             exit;
>> +     }
>> +
>> +     my @revisions;
>> +     print STDERR "Searching revisions...\n";
>> +     my $fetch_from = get_last_local_revision() + 1;
>> +     my $n = 1;
>> +     foreach my $page (@$pages) {
>> +             my $id = $page->{pageid};
>> +
>> +             print STDERR "$n/", scalar(@$pages), ": $page->{title}\n";
>> +             $n++;
>> +
>> +             my $query = {
>> +                     action => 'query',
>> +                     prop => 'revisions',
>> +                     rvprop => 'ids',
>> +                     rvdir => 'newer',
>> +                     rvstartid => $fetch_from,
>> +                     rvlimit => 500,
>> +                     pageids => $page->{pageid},
>> +             };
>> +
>> +             my $revnum = 0;
>> +             # Get 500 revisions at a time due to the mediawiki api limit
>
> It's nice that you can dig deeper with rvlimit increments. I wonder if
> 'allpages' also let's you retrieve more than 500 pages in total by somehow
> iterating over the set of pages.

Yes, it is possible, and it works once the stray "\n" that Peff
pointed out in the previous message is removed.
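For the archives, the manual iteration would look roughly like this (an
untested sketch reusing the $mediawiki object from mw_import; the
'query-continue' block with its apfrom cursor is what the MediaWiki API
returns for list=allpages, and MediaWiki::API's own list() helper may
already do this continuation itself, which we have not verified):

```perl
# Sketch: fetch every page, 500 per request, by feeding the
# query-continue parameters back into the next request.
my @all_pages;
my %continue_args;   # empty on the first request
while (1) {
	my $result = $mediawiki->api({
		action => 'query',
		list => 'allpages',
		aplimit => 500,
		%continue_args,
	});
	push @all_pages, @{$result->{query}->{allpages}};
	# No continuation block means we have seen the last batch.
	last unless $result->{'query-continue'};
	%continue_args = %{$result->{'query-continue'}->{allpages}};
}
```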

>> +     # Creation of the fast-import stream
>> +     print STDERR "Fetching & writing export data...\n";
>> +     binmode STDOUT, ':binary';
>> +     $n = 0;
>> +
>> +     foreach my $pagerevids (sort {$a->{revid} <=> $b->{revid}} @revisions) {
>> +             #fetch the content of the pages
>> +             my $query = {
>> +                     action => 'query',
>> +                     prop => 'revisions',
>> +                     rvprop => 'content|timestamp|comment|user|ids',
>> +                     revids => $pagerevids->{revid},
>> +             };
>> +
>> +             my $result = $mediawiki->api($query);
>> +
>> +             my $rev = pop(@{$result->{query}->{pages}->{$pagerevids->{pageid}}->{revisions}});
>
> Is the list of per-page revisions guaranteed to be sorted (not a
> rhetorical question; just asking)?

Yes it is.


>> +             print "commit refs/mediawiki/$remotename/master\n";
>> +             print "mark :$n\n";
>> +             print "committer $user <$user\@$wiki_name> ", $dt->epoch, " +0000\n";
>> +             print "data ", bytes::length(encode_utf8($comment)), "\n", encode_utf8($comment);
>
> Calling encode_utf8() twice on the same data?  How big is this $comment
> typically?  Or does encode_utf8() somehow memoize?
>
>> +             # If it's not a clone, needs to know where to start from
>> +             if ($fetch_from != 1 && $n == 1) {
>> +                     print "from refs/mediawiki/$remotename/master^0\n";
>> +             }
>> +             print "M 644 inline $title.wiki\n";
>> +             print "data ", bytes::length(encode_utf8($content)), "\n", encode_utf8($content);
>
> Same for $content, which presumably is larger than $comment...
>
> Perhaps a small helper
>
>        sub literal_data {
>                my ($content) = @_;
>                print "data ", bytes::length($content), "\n", $content;
>        }
>
> would help here, above, and below where you create a "note" on this
> commit?
>
>> +             # mediawiki revision number in the git note
>> +             my $note_comment = encode_utf8("note added by git-mediawiki");
>> +             my $note_comment_length = bytes::length($note_comment);
>> +             my $note_content = encode_utf8("mediawiki_revision: " . $pagerevids->{revid} . "\n");
>> +             my $note_content_length = bytes::length($note_content);
>> +
>> +             if ($fetch_from == 1 && $n == 1) {
>> +                     print "reset refs/notes/commits\n";
>> +             }
>> +             print "commit refs/notes/commits\n";
>> +             print "committer $user <user\@example.com> ", $dt->epoch, " +0000\n";
>> +             print "data ", $note_comment_length, "\n", $note_comment;
>
> With that, this will become
>
>        literal_data(encode_utf8("note added by git-mediawiki"));
>
> and you don't need two extra variables.  Same for $note_content*.

Thank you very much for this helper; it will help factor the code and
reduce the number of variables.
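For the record, here is how we understand the helper and its call sites
(untested sketch; $comment, $content and $pagerevids are the variables
from the patch, shown in comments):

```perl
use Encode qw(encode_utf8);
use bytes ();  # for bytes::length, as in the patch

# Junio's helper: encode once, then reuse the encoded string for both
# the byte count and the payload, instead of calling encode_utf8 twice.
sub literal_data {
	my ($content) = @_;
	print STDOUT "data ", bytes::length($content), "\n", $content;
}

# The call sites then collapse to single calls, e.g.:
#   literal_data(encode_utf8($comment));
#   literal_data(encode_utf8($content));
#   literal_data(encode_utf8("note added by git-mediawiki"));
#   literal_data(encode_utf8("mediawiki_revision: " . $pagerevids->{revid} . "\n"));
literal_data(encode_utf8("mediawiki_revision: 42\n"));
```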


Thank you very much for your help and comments on that project.

Regards

-- 
Arnaud Lacurie
