From: Matt Mackall <mpm@selenic.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Sean <seanlkml@sympatico.ca>,
linux-kernel <linux-kernel@vger.kernel.org>,
git@vger.kernel.org
Subject: Re: Mercurial 0.4b vs git patchbomb benchmark
Date: Fri, 29 Apr 2005 12:12:08 -0700
Message-ID: <20050429191207.GX21897@waste.org>
In-Reply-To: <Pine.LNX.4.58.0504291006450.18901@ppc970.osdl.org>
On Fri, Apr 29, 2005 at 10:09:38AM -0700, Linus Torvalds wrote:
>
>
> On Fri, 29 Apr 2005, Matt Mackall wrote:
> >
> > That's because no one paid attention until I posted performance
> > numbers comparing it to git! Mercurial's goals are:
> >
> > - to scale to the kernel development process
> > - to do clone/pull style development
> > - to be efficient in CPU, memory, bandwidth, and disk space
> > for all the common SCM operations
> > - to have strong repo integrity
>
> Ok, sounds good. Have you looked at how it scales over time, ie what
> happens with files that have a lot of delta's?
I've done things like 10000 commits of a pair of revisions to printk.c
and it maintains consistently high speed and compression throughout that
range. I've also done things like commit all 500 revisions of
linux/Makefile from bkcvs. This took a couple seconds and resulted in
an 88k repo file (bkcvs takes 250k).
I haven't tried the whole kernel history corpus yet, but I've
committed all the 2.6 releases without any difficulties popping up and
I've had handling >1M total file revisions in my head since I sat down
to work on it. I'll maybe take a stab at a full history import next
week, if vacation doesn't interfere too much.
One downside of Mercurial is that long-lived repos can get fragmented on
disk. Things get defragmented to some extent as you go, because files
shared between local branch clones are copied on write (COW). A complete
defrag is also just a cp -a or equivalent, so I don't think this is a big
deal.
Here's an excerpt from http://selenic.com/mercurial/notes.txt on how
the back-end works.
---
Revlogs:
The fundamental storage type in Mercurial is a "revlog". A revlog is
the set of all revisions to a file. Each revision is either stored
compressed in its entirety or as a compressed binary delta against the
previous version. The decision of when to store a full version is made
based on how much data would be needed to reconstruct the file. This
lets us ensure that we never need to read huge amounts of data to
reconstruct a file, regardless of how many revisions of it we store.
In fact, we should always be able to do it with a single read,
provided we know when and where to read. This is where the index comes
in. Each revlog has an index containing a special hash (nodeid) of the
text, hashes for its parents, and where and how much of the revlog
data we need to read to reconstruct it. Thus, with one read of the
index and one read of the data, we can reconstruct any version in time
proportional to the file size.
Similarly, revlogs and their indices are append-only. This means that
adding a new version is also O(1) seeks.
Generally revlogs are used to represent revisions of files, but they
also are used to represent manifests and changesets.
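
To make the excerpt concrete, here is a toy in-memory sketch of the
idea, not Mercurial's actual code or on-disk format. The names
(ToyRevlog, make_delta, SNAPSHOT_FACTOR), the patch encoding, the
threshold of 2, and the exact way the nodeid is hashed are all
assumptions made for illustration; the notes above only say that the
choice between a full version and a delta depends on how much data a
reconstruction would need.

```python
import hashlib
import struct
import zlib
from difflib import SequenceMatcher

NULLID = b"\0" * 20      # placeholder parent for a revision with no parent
SNAPSHOT_FACTOR = 2      # assumed threshold, not taken from the notes


def make_delta(old: bytes, new: bytes) -> bytes:
    """Encode new as (start, end, length, replacement) patches against old."""
    ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    return b"".join(
        struct.pack(">III", i1, i2, j2 - j1) + new[j1:j2]
        for tag, i1, i2, j1, j2 in ops
        if tag != "equal"
    )


def apply_delta(old: bytes, delta: bytes) -> bytes:
    """Rebuild the new text by splicing each patch into old."""
    parts, pos, last = [], 0, 0
    while pos < len(delta):
        start, end, length = struct.unpack_from(">III", delta, pos)
        pos += 12
        parts += [old[last:start], delta[pos:pos + length]]
        pos += length
        last = end
    parts.append(old[last:])
    return b"".join(parts)


class ToyRevlog:
    def __init__(self):
        self.data = bytearray()  # stands in for the append-only data file
        self.index = []          # stands in for the append-only index

    def add(self, text: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> bytes:
        # Assumed nodeid scheme: hash of the sorted parent ids plus the text.
        node = hashlib.sha1(min(p1, p2) + max(p1, p2) + text).digest()
        rev, offset = len(self.index), len(self.data)
        chunk, base = zlib.compress(text), rev  # default: full version
        if rev > 0:
            prev_base = self.index[-1]["base"]
            delta = zlib.compress(make_delta(self.read(rev - 1), text))
            # Bytes one read would cover if this delta extends the chain.
            span = offset + len(delta) - self.index[prev_base]["offset"]
            if span <= SNAPSHOT_FACTOR * len(text):
                chunk, base = delta, prev_base
        self.data += chunk  # existing bytes are never rewritten
        self.index.append({"node": node, "p1": p1, "p2": p2,
                           "offset": offset, "length": len(chunk),
                           "base": base})
        return node

    def read(self, rev: int) -> bytes:
        base = self.index[rev]["base"]
        start = self.index[base]["offset"]
        end = self.index[rev]["offset"] + self.index[rev]["length"]
        blob = bytes(self.data[start:end])  # the single contiguous read
        text = b""
        for r in range(base, rev + 1):
            entry = self.index[r]
            lo = entry["offset"] - start
            piece = zlib.decompress(blob[lo:lo + entry["length"]])
            text = piece if r == base else apply_delta(text, piece)
        return text
```

Each index entry records which full version its delta chain starts from,
so read() needs one index lookup plus one slice of the data. Because
add() falls back to a full version once the chain would span more than
SNAPSHOT_FACTOR times the text size, that slice stays bounded no matter
how many revisions are stored.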
--
Mathematics is the supreme nostalgia of our time.