git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Claire Fousse <claire.fousse@gmail.com>
Cc: git@vger.kernel.org,
	"matthieu.moy" <Matthieu.Moy@grenoble-inp.fr>,
	Sylvain Boulme <Sylvain.Boulme@imag.fr>
Subject: Re: Git-Mediawiki : Question about Jeff King's import script
Date: Thu, 26 May 2011 11:42:14 -0400
Message-ID: <20110526154214.GA4049@sigill.intra.peff.net>
In-Reply-To: <BANLkTi=nLZV_QCyKT8LOhzkJYoJD6J4wPA@mail.gmail.com>

On Thu, May 26, 2011 at 05:18:11PM +0200, Claire Fousse wrote:

> For the import part, we based our script on what you called a few
> months ago the "quick and dirty perl script", and we have a few
> questions about it.
> First of all, just in case, here is your original script:
> http://article.gmane.org/gmane.comp.version-control.git/167560
> 
> It seems like you first used a hashmap and later transformed it
> into a flat list / array. What is the reasoning behind this? Why not
> create an array right away?

The hashmap is actually backed by an on-disk key/value database.  The
purpose of this was to allow resuming an import that had failed in the
middle (since even for a moderate-sized wiki like the git wiki, the
import was quite slow).

So the hashmap is indexed by page id, and each value contains an array
of revisions for that page. If we see a page id that we've already done,
we can skip importing it.
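A minimal sketch of that resumable import, written in Python rather than the original Perl. Python's standard `shelve` module stands in for the on-disk key/value database, and `fetch_revisions` is a hypothetical callable standing in for whatever actually queries the wiki; neither is taken from the original script.

```python
import shelve


def import_pages(db_path, page_ids, fetch_revisions):
    """Fetch the revisions of each page, persisting progress so a failed run can resume.

    db_path: path of the on-disk shelve database (an assumption for this sketch).
    fetch_revisions: hypothetical callable returning a list of revisions for a page id.
    """
    with shelve.open(db_path) as db:
        for page_id in page_ids:
            key = str(page_id)  # shelve keys must be strings
            if key in db:
                continue  # already imported by an earlier run; skip it
            db[key] = fetch_revisions(page_id)
            db.sync()  # flush to disk so this page survives a later crash
```

If a run dies partway through, calling `import_pages` again with the same `db_path` only fetches the pages that are not yet in the database.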

If you wanted to do it all at once, yes, you could build a flat array of
revisions, with each revision mentioning the page that it came from, and
just keep appending to the array as you read more data from the wiki.
And then at the end, sort that array based on timestamp to get the
chronological ordering of changes.
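The flat-array approach can be sketched the same way, again in Python as an illustration rather than the original script. `chronological_revisions` is a hypothetical helper; it assumes each revision is a dict with a `"timestamp"` key whose values sort chronologically (fixed-format ISO 8601 strings do). Starting from a page-indexed mapping like the one described above, it tags each revision with its page and sorts the result:

```python
def chronological_revisions(revisions_by_page):
    """Flatten a {page_id: [revision, ...]} mapping into one list ordered by timestamp.

    Each revision is assumed (hypothetically) to be a dict with a "timestamp" key
    whose values sort chronologically, e.g. fixed-format ISO 8601 strings.
    """
    flat = []
    for page_id, revisions in revisions_by_page.items():
        for rev in revisions:
            # Record which page each revision came from, since the flat list loses the grouping.
            flat.append({**rev, "page_id": page_id})
    # sorted() is stable, so revisions with equal timestamps keep their insertion order.
    return sorted(flat, key=lambda rev: rev["timestamp"])
```

To skip the intermediate mapping entirely, append each tagged revision to `flat` as it arrives from the wiki and sort once at the end; the trade-off is that nothing is persisted, so a failed run has to start over.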

Hope that helps,
-Peff

Thread overview: 4+ messages
2011-05-26 15:18 Git-Mediawiki : Question about Jeff King's import script Claire Fousse
2011-05-26 15:42 ` Jeff King [this message]
2011-05-27  9:05   ` Claire Fousse
2011-05-27 12:45 ` Alexandre Dulaunoy