* Git import of the recent full enwiki dump
From: Richard Hartmann @ 2010-04-16 23:47 UTC
To: wikitech-l, git

-- This email has been sent to two lists --

Hi all,

I would be interested to import the whole enwiki dump [1] into git [2].
This data set is probably the largest set of changes on earth, so it's
highly interesting to see what git will make of it.

As of right now, I am trying to import it on my local machine, but my
first, rough projections tell me my machine will melt down at some
point ;)

Assuming my local import fails, I would appreciate it if this could be
added to wikitech's longer-term todo list. If anyone has access to a
system with several TiB of free disk space which they can spare for a
week or three, it would be awesome. If given shell access, I can take
care of this task, but I would be happy to assist anyone attempting it,
as well. If need be, I can get various people from various communities
to vouch for me, my character & that I Do Not Break Stuff.

Richard Hartmann

PS: If anyone attempts to do this, please poke me, either via email or
as RichiH on freenode, OFTC and IRCnet.

[1] http://download.wikimedia.org/enwiki/20100130/
[2] http://git-scm.com/

* Re: Git import of the recent full enwiki dump
From: Sverre Rabbelier @ 2010-04-17 0:19 UTC
To: Richard Hartmann
Cc: Git List, Avery Pennarun, Nicolas Pitre, Shawn O. Pearce, Sam Vilain

Heya,

[-wikitech-l; if they should be kept on the cc, please re-add. I assume
that the discussion of the git aspects is not relevant to that list]

On Sat, Apr 17, 2010 at 01:47, Richard Hartmann
<richih.mailinglist@gmail.com> wrote:
> This data set is probably the largest set of changes on earth, so
> it's highly interesting to see what git will make of it.

I think that git might actually be able to handle it. Git's been known
not to handle _large files_ very well, but a lot of history/a lot of
files is something different. Assuming you do the import incrementally
using something like git-fast-import (feeding it with a custom exporter
that uses the dump as its input), you shouldn't even need an
extraordinary machine to do it (although you'd need a lot of storage).

> As of right now, I am trying to import it on my local machine, but
> my first, rough projections tell me my machine will melt down at
> some point ;)

How are you importing? Did you script a process that does something
like 'move the next revision of a file into place && git add . && git
commit'? I don't know how well that would work, since I reckon the
worktree will be huge. Speaking of which, it might make sense to
separate the worktree by prefix, so articles starting with "aa" go
under the "aa" directory, etc.

Anyway, other gits might have more interesting things to say. Cc-ed is
Avery, who has been working on a tool to back up entire hard drives in
git. Also cc-ed are Nico and Shawn, who both have a lot of experience
with the object backend and the pack implementation. Also Sam, who has
worked on importing the entire Perl history into git; I'm not sure how
big that is, but they have a lot of changesets too, I think. There's a
bunch of people who have worked on importing stuff like KDE into git
who might have interesting things to add, but I don't know who those
are.

Hope that helps, and if you do convert it (and it turns out to be
usable, and you decide to keep it up to date somehow), put it up
somewhere! :)

-- 
Cheers,

Sverre Rabbelier

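For illustration, a minimal sketch of such a dump-to-fast-import
exporter, assuming revisions arrive as (title, author, email,
timestamp, text) values already parsed from the XML dump. The path
sharding scheme, helper names, and sample data are assumptions made for
the example; they are not taken from import.py or levitation.

    import sys

    def shard(title):
        # Illustrative sharding: "Aardvark" becomes "aa/Aardvark.mediawiki",
        # so no single tree object has to list all 19M page names.
        safe = title.replace(' ', '_').replace('/', '%2F')
        return '{}/{}.mediawiki'.format(safe[:2].lower(), safe)

    def emit_revision(out, title, author, email, timestamp, text):
        # Write one fast-import commit per wiki revision; the stream is
        # meant to be piped into `git fast-import`.
        blob = text.encode('utf-8')
        msg = 'Update {}'.format(title).encode('utf-8')
        out.write(b'commit refs/heads/master\n')
        out.write('committer {} <{}> {} +0000\n'
                  .format(author, email, timestamp).encode('utf-8'))
        out.write('data {}\n'.format(len(msg)).encode('ascii') + msg + b'\n')
        out.write('M 100644 inline {}\n'.format(shard(title)).encode('utf-8'))
        out.write('data {}\n'.format(len(blob)).encode('ascii') + blob + b'\n\n')

    if __name__ == '__main__':
        # Hypothetical single revision; a real run would stream every
        # revision parsed from pages-meta-history.xml, page by page.
        emit_revision(sys.stdout.buffer, 'Aardvark', 'Example User',
                      'user@example.org', 1271462820, 'The aardvark is ...')

Piping the output of such a script into `git fast-import` inside an
empty repository is enough to try the stream format on a small dump.
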
* Re: Git import of the recent full enwiki dump
From: Sebastian Bober @ 2010-04-17 0:48 UTC
To: Sverre Rabbelier
Cc: Richard Hartmann, Git List, Avery Pennarun, Nicolas Pitre, Shawn O. Pearce, Sam Vilain

On Sat, Apr 17, 2010 at 02:19:40AM +0200, Sverre Rabbelier wrote:
> Heya,
>
> [-wikitech-l; if they should be kept on the cc, please re-add. I
> assume that the discussion of the git aspects is not relevant to that
> list]
>
> On Sat, Apr 17, 2010 at 01:47, Richard Hartmann
> <richih.mailinglist@gmail.com> wrote:
> > This data set is probably the largest set of changes on earth, so
> > it's highly interesting to see what git will make of it.
>
> I think that git might actually be able to handle it. Git's been known
> not to handle _large files_ very well, but a lot of history/a lot of
> files is something different. Assuming you do the import incrementally
> using something like git-fast-import (feeding it with a custom
> exporter that uses the dump as its input), you shouldn't even need an
> extraordinary machine to do it (although you'd need a lot of storage).

The question is how the commits and the trees are laid out. If every
wiki revision is to be a git commit, then we'd need to handle 300M
commits. And we have 19M wiki pages (which would be files). The tree
objects would be very large and git-fast-import would crawl.

Some tests with the German Wikipedia have shown that importing the
blobs is doable on normal hardware. Getting the trees and commits into
git has not been possible up to now, as fast-import was just too slow
(and getting slower after 1M commits).

I had the idea of an importer that would just handle this special case
(one file change per commit), but didn't get around to trying that yet.

bye,
Sebastian

* Re: Git import of the recent full enwiki dump
From: Shawn O. Pearce @ 2010-04-17 0:53 UTC
To: Sebastian Bober
Cc: Sverre Rabbelier, Richard Hartmann, Git List, Avery Pennarun, Nicolas Pitre, Sam Vilain

Sebastian Bober <sbober@servercare.de> wrote:
> The question is how the commits and the trees are laid out. If every
> wiki revision is to be a git commit, then we'd need to handle 300M
> commits. And we have 19M wiki pages (which would be files). The tree
> objects would be very large and git-fast-import would crawl.
>
> Some tests with the German Wikipedia have shown that importing the
> blobs is doable on normal hardware. Getting the trees and commits into
> git has not been possible up to now, as fast-import was just too slow
> (and getting slower after 1M commits).

Well, to be fair to fast-import, its tree handling code is based on a
linear scan, because that's how every other part of Git handles trees.

If you just toss all 19M wiki pages into a single top-level tree, it is
going to take a very long time to locate the wiki page talking about
Zoos.

> I had the idea of an importer that would just handle this special case
> (one file change per commit), but didn't get around to trying that
> yet.

Really, fast-import should be able to handle this well, assuming you
aren't just tossing all 19M files into a single massive directory and
hoping for the best. *Any* program working on that sort of layout will
need to spit out the 19M-entry tree object on each and every commit,
just so it can compute the SHA-1 checksum to get the tree name for the
commit.

-- 
Shawn.

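Rough back-of-the-envelope arithmetic behind that point, as a sketch;
the per-entry size, the prefix scheme, and the uniform-spread
assumption are all assumptions for the example, not figures from the
thread.

    # A tree entry is "<mode> <name>\0<20-byte SHA-1>"; assume ~25 bytes
    # of page title, i.e. roughly 50 bytes per entry on average.
    pages = 19_000_000
    flat_tree_bytes = pages * 50            # ~950 MB of tree object,
    # rewritten and re-hashed for every single commit in a flat layout.

    # With two levels of two-letter prefixes ("aa/rd/Aardvark") and a
    # uniform spread, a one-file commit rewrites only the trees along
    # one path: the root, one "aa" tree, and one "aa/rd" tree.
    leaf_entries = pages / (26 * 26) ** 2   # about 42 entries per leaf

    print(flat_tree_bytes, leaf_entries)
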
* Re: Git import of the recent full enwiki dump
From: Sebastian Bober @ 2010-04-17 1:01 UTC
To: Shawn O. Pearce
Cc: Sverre Rabbelier, Richard Hartmann, Git List, Avery Pennarun, Nicolas Pitre, Sam Vilain

On Fri, Apr 16, 2010 at 05:53:42PM -0700, Shawn O. Pearce wrote:
> Sebastian Bober <sbober@servercare.de> wrote:
> > The question is how the commits and the trees are laid out. If every
> > wiki revision is to be a git commit, then we'd need to handle 300M
> > commits. And we have 19M wiki pages (which would be files). The tree
> > objects would be very large and git-fast-import would crawl.
> >
> > Some tests with the German Wikipedia have shown that importing the
> > blobs is doable on normal hardware. Getting the trees and commits
> > into git has not been possible up to now, as fast-import was just
> > too slow (and getting slower after 1M commits).
>
> Well, to be fair to fast-import, its tree handling code is based on a
> linear scan, because that's how every other part of Git handles trees.
>
> If you just toss all 19M wiki pages into a single top-level tree, it
> is going to take a very long time to locate the wiki page talking
> about Zoos.

I'm not dissing fast-import, it's fantastic. We tried trees 2-10 levels
deep (the best depth being 3), but after a few million commits it just
got unbearably slow, with the ETA constantly rising. That was because
of tree creation and the SHA-1 computation for those tree objects.

> > I had the idea of an importer that would just handle this special
> > case (one file change per commit), but didn't get around to trying
> > that yet.
>
> Really, fast-import should be able to handle this well, assuming you
> aren't just tossing all 19M files into a single massive directory and
> hoping for the best. *Any* program working on that sort of layout will
> need to spit out the 19M-entry tree object on each and every commit,
> just so it can compute the SHA-1 checksum to get the tree name for the
> commit.
>
> -- 
> Shawn.

* Re: [spf:guess] Re: Git import of the recent full enwiki dump
From: Sam Vilain @ 2010-04-17 1:44 UTC
To: Sebastian Bober
Cc: Shawn O. Pearce, Sverre Rabbelier, Richard Hartmann, Git List, Avery Pennarun, Nicolas Pitre

On Sat, 2010-04-17 at 03:01 +0200, Sebastian Bober wrote:
> I'm not dissing fast-import, it's fantastic. We tried trees 2-10
> levels deep (the best depth being 3), but after a few million commits
> it just got unbearably slow, with the ETA constantly rising.

How often are you checkpointing? Like any data import, IME you can't
leave transactions running indefinitely and expect good performance!

Would it be at all possible to consider using a submodule for each
page, with a super-project commit which is updated for every day of
updates or so?

This would create a natural partitioning of the data set in a way which
is likely to be more useful and efficient to work with. Hand-held
devices could be shipped with a "shallow" clone of the main repository,
with shallow clones of the sub-repositories too (in such a setup, the
device would not really use a checkout, of course, to save space).
Then, history for individual pages could be extended as required. The
device could "update" the master history, so it would know in summary
form which pages have changed. It would then go on to fetch updates for
individual pages that the user is watching, or potentially even get
them all.

There's an interesting next idea here: device-to-device update bundles.
And another one: distributed update. If, instead of writing to a
"master" version, the action of editing a wiki page becomes creating a
fork, then the editorial process promotes these forks to be the master
version in the superproject. Users who have pulled the full repository
for a page would be able to see other people's forks, to get "latest"
versions or for editing purposes. This adds not only a distributed
update action, but the ability to have a decent peer review/editorial
process without it being arduous.

Without good data set partitioning I don't see the above workflow being
possible. I was approaching the problem by first trying to back an SQL
RDBMS with git, e.g. MySQL or SQLite (Postgres would be nice, but
probably much harder), so I first set out by designing a table store.
But the representation of the data is not important, just the
distributed versioning of it.

Actually, this raises the question: what is it that you are trying to
achieve with this Wikipedia import?

Sam

* Re: [spf:guess] Re: Git import of the recent full enwiki dump
From: Sebastian Bober @ 2010-04-17 1:58 UTC
To: Sam Vilain
Cc: Shawn O. Pearce, Sverre Rabbelier, Richard Hartmann, Git List, Avery Pennarun, Nicolas Pitre

On Sat, Apr 17, 2010 at 01:44:56PM +1200, Sam Vilain wrote:
> On Sat, 2010-04-17 at 03:01 +0200, Sebastian Bober wrote:
> > I'm not dissing fast-import, it's fantastic. We tried trees 2-10
> > levels deep (the best depth being 3), but after a few million
> > commits it just got unbearably slow, with the ETA constantly rising.
>
> How often are you checkpointing? Like any data import, IME you can't
> leave transactions running indefinitely and expect good performance!

We have tried checkpointing (even stopping/starting fast-import) every
10,000 - 100,000 commits. That does mitigate some speed and memory
issues of fast-import, but in the end fast-import lost time at every
restart/checkpoint.

> Would it be at all possible to consider using a submodule for each
> page, with a super-project commit which is updated for every day of
> updates or so?
>
> This would create a natural partitioning of the data set in a way
> which is likely to be more useful and efficient to work with.
> Hand-held devices could be shipped with a "shallow" clone of the main
> repository, with shallow clones of the sub-repositories too (in such
> a setup, the device would not really use a checkout, of course, to
> save space). Then, history for individual pages could be extended as
> required. The device could "update" the master history, so it would
> know in summary form which pages have changed. It would then go on to
> fetch updates for individual pages that the user is watching, or
> potentially even get them all.
>
> There's an interesting next idea here: device-to-device update
> bundles. And another one: distributed update. If, instead of writing
> to a "master" version, the action of editing a wiki page becomes
> creating a fork, then the editorial process promotes these forks to
> be the master version in the superproject. Users who have pulled the
> full repository for a page would be able to see other people's forks,
> to get "latest" versions or for editing purposes. This adds not only
> a distributed update action, but the ability to have a decent peer
> review/editorial process without it being arduous.
>
> Without good data set partitioning I don't see the above workflow
> being possible. I was approaching the problem by first trying to back
> an SQL RDBMS with git, e.g. MySQL or SQLite (Postgres would be nice,
> but probably much harder), so I first set out by designing a table
> store. But the representation of the data is not important, just the
> distributed versioning of it.

Yep, we had many ideas about how to partition the data. None of that
has been tried so far, because we had hoped to get it done the
"straight" way. But that may not be possible.

> Actually, this raises the question: what is it that you are trying to
> achieve with this Wikipedia import?

Ultimately, having a distributed Wikipedia. Having the possibility to
fork or branch Wikipedia, to have an inclusionist and an exclusionist
Wikipedia all in one.

bye,
Sebastian

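A sketch of how such periodic checkpointing can be driven from the
exporter side, assuming the same stream-writing setup as in the earlier
sketch; the interval here is illustrative (the thread reports trying
10,000 - 100,000 commits), and the helper name is hypothetical.

    def maybe_checkpoint(out, commits_done, every=50000):
        # Ask fast-import to flush its current packfile, marks and
        # branch refs without restarting the process; `progress` just
        # echoes a line so the operator can watch the import advance.
        if commits_done and commits_done % every == 0:
            out.write(b'checkpoint\n')
            out.write('progress {} commits imported\n'
                      .format(commits_done).encode('ascii'))
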
* Re: [spf:guess] Re: [spf:guess] Re: Git import of the recent full enwiki dump
From: Sam Vilain @ 2010-04-17 3:34 UTC
To: Sebastian Bober
Cc: Shawn O. Pearce, Sverre Rabbelier, Richard Hartmann, Git List, Avery Pennarun, Nicolas Pitre

On Sat, 2010-04-17 at 03:58 +0200, Sebastian Bober wrote:
> > Without good data set partitioning I don't see the above workflow
> > being possible. I was approaching the problem by first trying to
> > back an SQL RDBMS with git, e.g. MySQL or SQLite (Postgres would be
> > nice, but probably much harder), so I first set out by designing a
> > table store. But the representation of the data is not important,
> > just the distributed versioning of it.
>
> Yep, we had many ideas about how to partition the data. None of that
> has been tried so far, because we had hoped to get it done the
> "straight" way. But that may not be possible.

I just don't think it's a practical aim, or even useful. Who really
wants the complete history of all Wikipedia pages? Only a very few:
libraries, national archives, and some collectors.

> We have tried checkpointing (even stopping/starting fast-import) every
> 10,000 - 100,000 commits. That does mitigate some speed and memory
> issues of fast-import, but in the end fast-import lost time at every
> restart/checkpoint.

One more thought: fast-import really does work better if you send it
all the versions of a blob in sequence, so that it can write out deltas
the first time around.

Another advantage of the per-page partitioning is that the repositories
can checkpoint/gc independently, allowing for more parallelization of
the job.

> > Actually, this raises the question: what is it that you are trying
> > to achieve with this Wikipedia import?
>
> Ultimately, having a distributed Wikipedia. Having the possibility to
> fork or branch Wikipedia, to have an inclusionist and an exclusionist
> Wikipedia all in one.

This sounds like far too much fun for me to miss out on; I am now
downloading enwiki-20100312-pages-meta-history.xml.7z :-) and will give
this a crack!

Sam

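A sketch of that ordering, assuming revisions can be grouped per page
before being streamed; the mark numbering and helper name are
illustrative. The returned marks would later be referenced from commit
commands as 'M 100644 :<mark> <path>' instead of inline data.

    def emit_page_blobs(out, revisions, first_mark):
        # Stream every revision of one page back to back as marked
        # blobs, so fast-import can deltify each one against the blob
        # written just before it; return the mark assigned to each.
        marks = []
        mark = first_mark
        for text in revisions:
            data = text.encode('utf-8')
            out.write('blob\nmark :{}\n'.format(mark).encode('ascii'))
            out.write('data {}\n'.format(len(data)).encode('ascii'))
            out.write(data + b'\n')
            marks.append(mark)
            mark += 1
        return marks, mark
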
* Re: [spf:guess] Re: [spf:guess] Re: Git import of the recent full enwiki dump
From: Sebastian Bober @ 2010-04-17 7:48 UTC
To: Sam Vilain
Cc: Shawn O. Pearce, Sverre Rabbelier, Richard Hartmann, Git List, Avery Pennarun, Nicolas Pitre

On Sat, Apr 17, 2010 at 03:34:52PM +1200, Sam Vilain wrote:
> On Sat, 2010-04-17 at 03:58 +0200, Sebastian Bober wrote:
> > > Without good data set partitioning I don't see the above workflow
> > > being possible. I was approaching the problem by first trying to
> > > back an SQL RDBMS with git, e.g. MySQL or SQLite (Postgres would
> > > be nice, but probably much harder), so I first set out by
> > > designing a table store. But the representation of the data is
> > > not important, just the distributed versioning of it.
> >
> > Yep, we had many ideas about how to partition the data. None of that
> > has been tried so far, because we had hoped to get it done the
> > "straight" way. But that may not be possible.
>
> I just don't think it's a practical aim, or even useful. Who really
> wants the complete history of all Wikipedia pages? Only a very few:
> libraries, national archives, and some collectors.

Heh, exactly. And I just want to see if it can be done.

> > We have tried checkpointing (even stopping/starting fast-import)
> > every 10,000 - 100,000 commits. That does mitigate some speed and
> > memory issues of fast-import, but in the end fast-import lost time
> > at every restart/checkpoint.
>
> One more thought: fast-import really does work better if you send it
> all the versions of a blob in sequence, so that it can write out
> deltas the first time around.

This is already done that way.

> Another advantage of the per-page partitioning is that the
> repositories can checkpoint/gc independently, allowing for more
> parallelization of the job.
>
> > > Actually, this raises the question: what is it that you are trying
> > > to achieve with this Wikipedia import?
> >
> > Ultimately, having a distributed Wikipedia. Having the possibility
> > to fork or branch Wikipedia, to have an inclusionist and an
> > exclusionist Wikipedia all in one.
>
> This sounds like far too much fun for me to miss out on; I am now
> downloading enwiki-20100312-pages-meta-history.xml.7z :-) and will
> give this a crack!

Please have a look at a smaller wiki for testing. The project at
git://github.com/sbober/levitation-perl.git provides, in its branches,
several ways to parse the XML and to generate the fast-import input.

bye,
Sebastian

* Re: Git import of the recent full enwiki dump
From: Richard Hartmann @ 2010-04-17 1:10 UTC
To: Sverre Rabbelier
Cc: Git List, Avery Pennarun, Nicolas Pitre, Shawn O. Pearce, Sam Vilain

On Sat, Apr 17, 2010 at 02:19, Sverre Rabbelier <srabbelier@gmail.com> wrote:

> Assuming you do the import incrementally using something like
> git-fast-import (feeding it with a custom exporter that uses the dump
> as its input), you shouldn't even need an extraordinary machine to do
> it (although you'd need a lot of storage).

I am using a Python script [1] to import the XML dump.

> Speaking of which, it might make sense to separate the worktree by
> prefix, so articles starting with "aa" go under the "aa" directory,
> etc.

Very good idea. What command would I need to send to git-fast-import to
do that?

> Hope that helps, and if you do convert it (and it turns out to be
> usable, and you decide to keep it up to date somehow), put it up
> somewhere! :)

It did.
I will make it available if it turns out to be useful. Keeping it up to
date might be harder unless they keep on releasing new (incremental)
snapshots.

Thanks,
Richard

[1] http://github.com/scy/levitation/blob/master/import.py

* Re: Git import of the recent full enwiki dump 2010-04-17 1:10 ` Richard Hartmann @ 2010-04-17 1:18 ` Shawn O. Pearce 2010-04-17 1:25 ` Sebastian Bober 1 sibling, 0 replies; 12+ messages in thread From: Shawn O. Pearce @ 2010-04-17 1:18 UTC (permalink / raw) To: Richard Hartmann Cc: Sverre Rabbelier, Git List, Avery Pennarun, Nicolas Pitre, Sam Vilain Richard Hartmann <richih.mailinglist@gmail.com> wrote: > On Sat, Apr 17, 2010 at 02:19, Sverre Rabbelier <srabbelier@gmail.com> wrote: > > Speaking of which, it might make sense to separate the > > worktree by prefix, so articles starting with "aa" go under the "aa" > > directory, etc? > > Very good idea. What command would I need to send to > git-fast-import to do that? When you send the 'M' command around line 479, just set the filename to 'aa/aardvark' or whatever it is. fast-import will automatically create directories by splitting on forward slashes. -- Shawn. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Git import of the recent full enwiki dump
From: Sebastian Bober @ 2010-04-17 1:25 UTC
To: Richard Hartmann
Cc: Sverre Rabbelier, Git List, Avery Pennarun, Nicolas Pitre, Shawn O. Pearce, Sam Vilain

On Sat, Apr 17, 2010 at 03:10:56AM +0200, Richard Hartmann wrote:
> On Sat, Apr 17, 2010 at 02:19, Sverre Rabbelier <srabbelier@gmail.com> wrote:
>
> > Assuming you do the import incrementally using something like
> > git-fast-import (feeding it with a custom exporter that uses the
> > dump as its input), you shouldn't even need an extraordinary machine
> > to do it (although you'd need a lot of storage).
>
> I am using a Python script [1] to import the XML dump.

There is also a version available at (plug):

git://github.com/sbober/levitation-perl.git

It is a bit faster and consumes less memory (and is written in Perl).
But that, too, will not be able to handle enwiki at the moment.

> > Speaking of which, it might make sense to separate the worktree by
> > prefix, so articles starting with "aa" go under the "aa" directory,
> > etc.
>
> Very good idea. What command would I need to send to git-fast-import
> to do that?

levitation does that already.

> > Hope that helps, and if you do convert it (and it turns out to be
> > usable, and you decide to keep it up to date somehow), put it up
> > somewhere! :)
>
> It did.
> I will make it available if it turns out to be useful. Keeping it up
> to date might be harder unless they keep on releasing new
> (incremental) snapshots.

If desired, I could produce git-fast-import input files for a larger
wiki (like the German or Japanese Wikipedia), so that other people can
have a look at the performance.

bye,
Sebastian
