From: Arnaud Lacurie <arnaud.lacurie@ensimag.imag.fr>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org,
	"Jérémie Nikaes" <jeremie.nikaes@ensimag.imag.fr>,
	"Claire Fousse" <claire.fousse@ensimag.imag.fr>,
	"David Amouyal" <david.amouyal@ensimag.imag.fr>,
	"Matthieu Moy" <matthieu.moy@grenoble-inp.fr>,
	"Sylvain Boulmé" <sylvain.boulme@imag.fr>
Subject: Re: [RFC/PATCH] Added a remote helper to interact with mediawiki, pull & clone handled
Date: Thu, 2 Jun 2011 22:28:33 +0200
Message-ID: <BANLkTi=eYg3uT1hQZO03i4MLyhRkPzXK6w@mail.gmail.com>
In-Reply-To: <20110602170327.GA2928@sigill.intra.peff.net>

2011/6/2 Jeff King <peff@peff.net>:
> On Thu, Jun 02, 2011 at 11:28:31AM +0200, Arnaud Lacurie wrote:
>
>> +sub mw_import {
>> [...]
>> +             # Get 500 revisions at a time due to the mediawiki api limit
>> +             while (1) {
>> +                     my $result = $mediawiki->api($query);
>> +
>> +                     # Parse each of those 500 revisions
>> +                     foreach my $revision (@{$result->{query}->{pages}->{$id}->{revisions}}) {
>> +                             my $page_rev_ids;
>> +                             $page_rev_ids->{pageid} = $page->{pageid};
>> +                             $page_rev_ids->{revid} = $revision->{revid};
>> +                             push (@revisions, $page_rev_ids);
>> +                             $revnum++;
>> +                     }
>> +                     last unless $result->{'query-continue'};
>> +                     $query->{rvstartid} = $result->{'query-continue'}->{revisions}->{rvstartid};
>> +                     print "\n";
>> +             }
>
> What is this newline at the end here for? With it, my import reliably
> fails with:
>
>  fatal: Unsupported command:
>  fast-import: dumping crash report to .git/fast_import_crash_6091
>
> Removing it seems to make things work.

Yes, we actually found it today. It slipped through because we had not
fetched a page with more than 500 revisions since that line was added.
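
For illustration, here is a minimal sketch of the batching loop without the
stray newline. It is written in Python rather than the helper's Perl so that
it is self-contained, and `fake_api` is a hypothetical stand-in for the real
MediaWiki call. The assumption is that the helper's stdout is the fast-import
stream, so anything diagnostic goes to stderr instead.

```python
import sys


def fetch_all_revisions(api, query, page_id):
    """Collect {pageid, revid} records, following 'query-continue' batches."""
    query = dict(query)  # do not mutate the caller's query
    revisions = []
    while True:
        result = api(query)
        for rev in result["query"]["pages"][page_id]["revisions"]:
            revisions.append({"pageid": page_id, "revid": rev["revid"]})
        cont = result.get("query-continue")
        if not cont:
            break
        query["rvstartid"] = cont["revisions"]["rvstartid"]
        # Progress goes to stderr: stdout is assumed to carry only
        # fast-import commands, and a bare newline there is not one.
        print(f"{len(revisions)} revisions listed so far", file=sys.stderr)
    return revisions


def fake_api(query):
    """Hypothetical stand-in for the wiki: revids 1..5, two per batch."""
    start = query.get("rvstartid", 1)
    batch = list(range(start, 6))[:2]
    result = {"query": {"pages": {"42": {
        "revisions": [{"revid": r} for r in batch]}}}}
    if batch and batch[-1] < 5:
        result["query-continue"] = {"revisions": {"rvstartid": batch[-1] + 1}}
    return result


if __name__ == "__main__":
    for rev in fetch_all_revisions(fake_api, {}, "42"):
        print(rev["revid"], file=sys.stderr)
```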

>> +             # mediawiki revision number in the git note
>> +             my $note_comment = encode_utf8("note added by git-mediawiki");
>> +             my $note_comment_length = bytes::length($note_comment);
>> +             my $note_content = encode_utf8("mediawiki_revision: " . $pagerevids->{revid} . "\n");
>> +             my $note_content_length = bytes::length($note_content);
>> +
>> +             if ($fetch_from == 1 && $n == 1) {
>> +                     print "reset refs/notes/commits\n";
>> +             }
>> +             print "commit refs/notes/commits\n";
>
> Should these go in refs/notes/commits? I don't think we have a "best
> practices" yet for the notes namespaces, as it is still a relatively new
> concept. But I always thought "refs/notes/commits" would be for the
> user's "regular" notes, and that programmatic things would get their own
> notes, like "refs/notes/mediawiki".
>
That's a good idea; we hadn't realized that notes could live anywhere
other than refs/notes/commits. A separate ref will be perfect for
distinguishing the user's notes from ours.
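
As a rough sketch of what that could look like, the Python below (again not
the helper's Perl) builds one fast-import note commit on a dedicated ref. The
note message and the "mediawiki_revision: N" content come from the patch; the
function name, the mark ":1" and the committer identity are made up for the
example.

```python
def note_commit(notes_ref, commit_mark, revid, committer, when):
    """Return the bytes of a fast-import commit attaching a note to commit_mark."""

    def data(text):
        # fast-import's "data" command takes an exact byte count.
        raw = text.encode("utf-8")
        return f"data {len(raw)}\n".encode("ascii") + raw + b"\n"

    out = f"commit {notes_ref}\n".encode("utf-8")
    out += f"committer {committer} {when}\n".encode("utf-8")
    out += data("note added by git-mediawiki")
    # "N inline <commit-ish>" attaches the following data block as a note.
    out += f"N inline {commit_mark}\n".encode("utf-8")
    out += data(f"mediawiki_revision: {revid}\n")
    return out


if __name__ == "__main__":
    import sys
    sys.stdout.buffer.write(note_commit(
        "refs/notes/mediawiki", ":1", 1234,
        "Wiki User <user@example.com>", "1307046513 +0200"))
```

Passing "refs/notes/commits" as the first argument would reproduce the
current behaviour, so the ref name is the only thing that has to change.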
>
>> +             } else {
>> +                     print STDERR "You appear to have cloned an empty mediawiki\n";
>> +                     #What do we have to do here ? If nothing is done, an error is thrown saying that
>> +                     #HEAD is refering to unknown object 0000000000000000000
>> +             }
>
> Hmm. We do allow cloning empty git repos. It might be nice for there to
> be some way for a remote helper to signal "everything OK, but the result
> is empty". But I think that is probably something that needs to be added
> to the remote-helper protocol, and so is outside the scope of your
> script (maybe it is as simple as interpreting the null sha1 as "empty";
> I dunno).
>

Yes, that's a problem we've been running into. We didn't really know
how to solve it.

> Overall, it's looking pretty good. I like that I can resume a
> half-finished import via "git fetch". Though I do have one complaint:
> running "git fetch" fetches the metainfo for every revision of every
> page, just as it does for an initial clone. Is there something in the
> mediawiki API to say "show me revisions since N" (where N would be the
> mediawiki revision of the tip of what we imported)?

I am not sure I understand your question, because we actually support
this already, thanks to git notes: when you run "git fetch" after a
clone, it checks only the most recent revisions.
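
To make the idea concrete, here is a small Python sketch (not the helper's
Perl, and not necessarily how the patch does it) of recovering the last
imported revision from the note contents and asking only for newer ones. The
"mediawiki_revision: N" format is the one written by the patch; the function
names are invented, and the use of "rvdir" and "rvstartid" to list revisions
oldest-first from a given id is an assumption about the MediaWiki API that
should be checked against the wiki's version.

```python
import re

NOTE_RE = re.compile(r"^mediawiki_revision: (\d+)$", re.MULTILINE)


def last_imported_revision(note_texts):
    """Highest revision id recorded in the notes, or 0 if nothing was imported."""
    revids = [int(m.group(1))
              for text in note_texts
              for m in NOTE_RE.finditer(text)]
    return max(revids, default=0)


def incremental_query(page_title, last_revid):
    """Build a revisions query that starts at the last imported revision."""
    query = {
        "action": "query",
        "prop": "revisions",
        "titles": page_title,
        "rvlimit": 500,
        "rvdir": "newer",  # assumption: enumerate oldest first
    }
    if last_revid > 0:
        # Start from a revision known to exist; it is filtered out below.
        query["rvstartid"] = last_revid
    return query


def only_new(revisions, last_revid):
    """Drop revisions that were already imported."""
    return [r for r in revisions if r["revid"] > last_revid]
```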

Thank you very much for your help!

Arnaud Lacurie
