From: Jakub Narebski <jnareb@gmail.com>
To: Matt Mackall <mpm@selenic.com>
Cc: mercurial@selenic.com, git@vger.kernel.org,
Junio C Hamano <junkio@cox.net>
Subject: Re: newbie questions about git design and features (some wrt hg)
Date: Thu, 1 Feb 2007 00:58:42 +0100
Message-ID: <200702010058.43431.jnareb@gmail.com>
In-Reply-To: <20070131222507.GO10108@waste.org>

Matt Mackall wrote:
> On Wed, Jan 31, 2007 at 11:56:01AM +0100, Jakub Narebski wrote:
>> Theodore Tso wrote:
>>
>>> On Tue, Jan 30, 2007 at 11:55:48AM -0500, Shawn O. Pearce wrote:
>>>> I think hg modifies files as it goes, which could cause some issues
>>>> when a writer is aborted. I'm sure they have thought about the
>>>> problem and tried to make it safe, but there isn't anything safer
>>>> than just leaving the damn thing alone. :)
>>>
>>> To be fair hg modifies files using O_APPEND only. That isn't quite
>>> as safe as "only creating new files", but it is relatively safe.
>>
>> From (libc.info):
>>
>> -- Macro: int O_APPEND
[...]
>> I don't quite understand how that would help hg (Mercurial) to have
>> operations like commit, pull/fetch or push atomic, i.e. all or
>> nothing.
>
> That's because it's unrelated.
[...]
> Mercurial has write-side locks so there can only ever be one writer at
> a time. There are no locks needed on the read side, so there can be
> any number of readers, even while commits are happening.
>
>> What happens if operation is interrupted (e.g. lost connection to
>> network during fetch)?
>
> We keep a simple transaction journal. As Mercurial revlogs are
> append-only, rolling back a transaction just means truncating all
> files in a transaction to their original length.

Thanks a lot for the complete answer. So Mercurial uses write-side locks
to deal with concurrent operations, and a transaction journal to deal
with interrupted operations. I guess that incomplete transactions
are rolled back on the next hg command...
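
To make the journal idea concrete, here is a minimal sketch in Python of
"record the original length, append, truncate on rollback". It is only an
illustration under stated assumptions: the Transaction class, the journal
line format and the file names are all hypothetical, and this is not
Mercurial's actual code or on-disk format.

```python
import os
import tempfile


class Transaction:
    """Toy journal: remember each file's length before the first append."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.seen = set()

    def append(self, path, data):
        if path not in self.seen:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            # Make the journal entry durable before the data file grows.
            with open(self.journal_path, "a") as j:
                j.write(f"{path}\0{size}\n")
                j.flush()
                os.fsync(j.fileno())
            self.seen.add(path)
        with open(path, "ab") as f:  # append-only write
            f.write(data)

    def commit(self):
        # Removing the journal marks the transaction as complete.
        os.remove(self.journal_path)


def rollback(journal_path):
    """Truncate every journaled file back to its recorded length."""
    if not os.path.exists(journal_path):
        return  # nothing was interrupted
    with open(journal_path) as j:
        for line in j:
            path, size = line.rstrip("\n").split("\0")
            if os.path.exists(path):
                with open(path, "r+b") as f:
                    f.truncate(int(size))
    os.remove(journal_path)


def demo_rollback():
    d = tempfile.mkdtemp()
    log = os.path.join(d, "data.log")
    journal = os.path.join(d, "journal")
    with open(log, "wb") as f:
        f.write(b"rev0\n")
    tx = Transaction(journal)
    tx.append(log, b"rev1\n")
    tx.append(log, b"rev2-partial")
    # Pretend the writer died here: commit() is never called.
    rollback(journal)
    with open(log, "rb") as f:
        return f.read()
```

In this sketch a reader that only trusts data up to the last committed
length never needs a lock; the single writer (or the next command that
finds a leftover journal) is responsible for calling rollback().
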

I guess (please correct me if I'm wrong) that git uses a "put the reference
after putting the data" scheme, plus a write-side lock in the few places
where it is needed.
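
The "reference after data" ordering can be sketched in Python as follows.
This is a simplified illustration, not git's implementation: the flat
"objects" directory, the write_object() and update_ref() helpers and the
ref name are assumptions made for the example, and the object name here is
a plain SHA-1 of the data rather than git's real object format.

```python
import hashlib
import os
import tempfile


def write_object(repo, data):
    """Store data under its SHA-1 name; existing objects are never rewritten."""
    name = hashlib.sha1(data).hexdigest()
    path = os.path.join(repo, "objects", name)
    if not os.path.exists(path):
        fd, tmp = tempfile.mkstemp(dir=os.path.join(repo, "objects"))
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename into place
    return name


def update_ref(repo, ref, name):
    """Point ref at name: create ref.lock exclusively, then rename it."""
    path = os.path.join(repo, ref)
    lock = path + ".lock"
    # O_EXCL makes this fail if another writer already holds the lock.
    fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(name + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(lock, path)  # readers see the old or the new value
    except BaseException:
        if os.path.exists(lock):
            os.remove(lock)
        raise


def demo_ref_after_data():
    repo = tempfile.mkdtemp()
    os.makedirs(os.path.join(repo, "objects"))
    name = write_object(repo, b"some commit data")  # 1. data first
    update_ref(repo, "master", name)                # 2. reference last
    with open(os.path.join(repo, "master")) as f:
        return f.read().strip() == name
```

If the writer dies between the two steps, the object file is unreferenced
crud that a later cleanup can remove, but the ref still points at the old,
complete state.
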
>> In git both situations result in some prune-able and fsck-visible crud in
>> repository, but repository stays uncorrupted, and all operations are atomic
>> (all or nothing).
>
> If a Mercurial transaction is interrupted and not rolled back, the
> result is prune-able and fsck-visible crud. But this doesn't happen
> much in practice.
>
> The claim that's been made is that a) truncate is unsafe because Linux
> has historically had problems in this area and b) git is safer because
> it doesn't do this sort of thing.
>
> My response is a) those problems are overstated and Linux has never
> had difficulty with the sorts of straightforward single writer
> operations Mercurial uses and b) normal git usage involves regular
> rewrites of data with packing operations that makes its exposure to
> filesystem bugs equivalent or greater.

Rewrites in git perhaps are (or should be) regular, but they need not be
frequent. And with the new idea/feature of kept packs, a rewrite need not
cover the full data.

One command which _is_ (a bit) unsafe in git is git-prune. I'm not sure
if it could be made safe. But not running prune affects only repository
size a bit (where git is, I think, the best of all SCMs), and not performance.

On the other hand, the hg repository structure (namely the log-like,
append-only changelog / revlog used to store commits) makes it, I think,
hard to have multiple persistent branches.

Sidenote 1: it looks like git is optimized for speed of merge and checkout
(branch switching, or going to a given point in history for bisect), and
probably accidentally for multi-branch repos, while Mercurial is optimized
for speed of commit and patch.

Sidenote 2: the Mercurial repository structure might make it use "file-ids"
(perhaps implicitly), with all the disadvantages of those (e.g. different
renames on different branches).

> In either case, both provide strong integrity checks with recursive
> SHA1 hashing, zlib CRCs, and GPG signatures (as well as distributed
> "back-up"!) so this is largely a non-issue relative to traditional
> systems.

Integrity checks can tell you that a repository is corrupted, but it would
be better if it didn't get corrupted in the first place.

Besides: zlib CRC for Mercurial? I thought that hg didn't compress the
data, and only stored it as delta chains?
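
For reference, the "recursive SHA1 hashing" mentioned above can be
illustrated with a short Python sketch. git_blob_id() follows the blob
naming that git uses (SHA-1 over a "blob <size>\0" header plus the
content); toy_commit_id() and history_tip() are hypothetical helpers with
a simplified format, not git's or Mercurial's real commit encoding.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 over a 'blob <size>\\0' header followed by the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def toy_commit_id(content_id: str, parent_id: str, message: str) -> str:
    """Toy commit name: covers the content id and the parent's id."""
    body = f"content {content_id}\nparent {parent_id}\n\n{message}".encode()
    return hashlib.sha1(body).hexdigest()


def history_tip(contents):
    """Chain one toy commit per content; return the id of the last one."""
    parent = ""
    for i, content in enumerate(contents):
        parent = toy_commit_id(git_blob_id(content), parent, f"commit {i}")
    return parent
```

Because each id covers the parent's id, changing any old content changes
the id of the tip, so checking the tip detects damage anywhere in the
chain. It detects corruption; it does not prevent it.
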
--
Jakub Narebski
Poland