Re: newbie questions about git design and features (some wrt hg)

From: Matt Mackall <mpm@selenic.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: mercurial@selenic.com, git@vger.kernel.org,
	Junio C Hamano <junkio@cox.net>
Subject: Re: newbie questions about git design and features (some wrt hg)
Date: Wed, 31 Jan 2007 18:34:29 -0600
Message-ID: <20070201003429.GQ10108@waste.org>
In-Reply-To: <200702010058.43431.jnareb@gmail.com>

On Thu, Feb 01, 2007 at 12:58:42AM +0100, Jakub Narebski wrote:
> Matt Mackall wrote:
> > On Wed, Jan 31, 2007 at 11:56:01AM +0100, Jakub Narebski wrote:
> >> Theodore Tso wrote:
> >> 
> >>> On Tue, Jan 30, 2007 at 11:55:48AM -0500, Shawn O. Pearce wrote:
> >>>> I think hg modifies files as it goes, which could cause some issues
> >>>> when a writer is aborted.  I'm sure they have thought about the
> >>>> problem and tried to make it safe, but there isn't anything safer
> >>>> than just leaving the damn thing alone.  :)
> >>> 
> >>> To be fair hg modifies files using O_APPEND only.  That isn't quite
> >>> as safe as "only creating new files", but it is relatively safe.
> >> 
> >> From (libc.info):
> >> 
> >>  -- Macro: int O_APPEND
> [...] 
> >> I don't quite understand how that would help hg (Mercurial) to have
> >> operations like commit, pull/fetch or push atomic, i.e. all or
> >> nothing. 
> > 
> > That's because it's unrelated.
> [...]
> > Mercurial has write-side locks so there can only ever be one writer at
> > a time. There are no locks needed on the read side, so there can be
> > any number of readers, even while commits are happening.
> > 
> >> What happens if operation is interrupted (e.g. lost connection to
> >> network during fetch)?
> > 
> > We keep a simple transaction journal. As Mercurial revlogs are
> > append-only, rolling back a transaction just means truncating all
> > files in a transaction to their original length.
> 
> Thanks a lot for complete answer. So Mercurial uses write-side locks
> for dealing with concurrent operations, and transaction journal for
> dealing with interrupted operations. I guess that incomplete transactions
> are rolled back on next hg command...

They are either rolled back automatically on abort or, if that fails
for some reason such as a power failure, the user is prompted to run
"hg recover" to complete the rollback. We also save the last
transaction journal, which allows one level of undo for pulls/commits.
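
The journal-and-truncate idea can be sketched in a few lines of Python.
This is a toy model under stated assumptions, not Mercurial's actual
code: the Transaction class, the recover() helper and the tab-separated
journal layout are all hypothetical names chosen for illustration.

```python
import os


class Transaction:
    """Toy append-only transaction (hypothetical, for illustration only)."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.seen = set()
        self.journal = open(journal_path, "a")

    def append(self, path, data):
        # Record the pre-transaction length once per file, before writing.
        if path not in self.seen:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            self.journal.write(f"{size}\t{path}\n")
            self.journal.flush()
            os.fsync(self.journal.fileno())
            self.seen.add(path)
        # Data files are only ever appended to, never rewritten.
        with open(path, "ab") as f:
            f.write(data)

    def close(self):
        # Successful commit: the journal is no longer needed.
        self.journal.close()
        os.remove(self.journal_path)


def recover(journal_path):
    """Roll back an interrupted transaction by truncating each file."""
    if not os.path.exists(journal_path):
        return False
    with open(journal_path) as j:
        for line in j:
            size, path = line.rstrip("\n").split("\t", 1)
            if os.path.exists(path):
                os.truncate(path, int(size))
    os.remove(journal_path)
    return True
```

If the process dies between append() and close(), the journal is still
on disk, and calling recover() on it restores every touched file to its
original length.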

> I guess (please correct me if I'm wrong) that git uses "put reference
> after putting data" scheme, and write-side lock in few places when it
> is needed.

Mercurial also uses a "put reference after putting data" scheme, which
is what allows us to have no read vs write locking.
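
A minimal sketch of "put reference after putting data", assuming a
made-up two-file layout (a data file plus an index of fixed-size
offset/length entries). The function names and the on-disk format are
hypothetical and are not the real revlog format.

```python
import os
import struct

ENTRY = struct.Struct(">QQ")  # (offset, length) of one record in the data file


def append_record(data_path, index_path, payload):
    # 1. Put the data first and make sure it is on disk.
    with open(data_path, "ab") as d:
        offset = d.seek(0, os.SEEK_END)
        d.write(payload)
        d.flush()
        os.fsync(d.fileno())
    # 2. Only then publish the reference that makes the data visible.
    with open(index_path, "ab") as i:
        i.write(ENTRY.pack(offset, len(payload)))
        i.flush()
        os.fsync(i.fileno())


def read_records(data_path, index_path):
    # Readers follow only complete index entries, so they take no lock.
    if not os.path.exists(index_path):
        return []
    with open(index_path, "rb") as i:
        raw = i.read()
    usable = len(raw) - len(raw) % ENTRY.size  # ignore a torn trailing entry
    records = []
    with open(data_path, "rb") as d:
        for pos in range(0, usable, ENTRY.size):
            offset, length = ENTRY.unpack_from(raw, pos)
            d.seek(offset)
            records.append(d.read(length))
    return records
```

A writer interrupted between steps 1 and 2 leaves unreferenced bytes at
the end of the data file; a concurrent reader never sees them because no
index entry points there.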
  
> >> In git both situations result in some prune-able and fsck-visible crud in
> >> repository, but repository stays uncorrupted, and all operations are atomic
> >> (all or nothing).
> > 
> > If a Mercurial transaction is interrupted and not rolled back, the
> > result is prune-able and fsck-visible crud. But this doesn't happen
> > much in practice.
> > 
> > The claim that's been made is that a) truncate is unsafe because Linux
> > has historically had problems in this area and b) git is safer because
> > it doesn't do this sort of thing. 
> > 
> > My response is a) those problems are overstated and Linux has never
> > had difficulty with the sorts of straightforward single writer
> > operations Mercurial uses and b) normal git usage involves regular
> > rewrites of data with packing operations that makes its exposure to
> > filesystem bugs equivalent or greater.
> 
> Rewrites in git perhaps are (or should be) regular, but need not be often.
> And with new idea/feature of kept packs rewrite need not be of full data.

If the set of files in a given commit (say tip) gets spread out across
an arbitrary number of packs ordered by last modification time,
performance degrades to O(n) lookups and random seeking.
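
The O(n) concern can be illustrated with a toy model in which each pack
is a plain Python dict and a lookup probes the packs one after another.
This is only a sketch of the argument; the lookup() helper is
hypothetical and does not reflect how git actually indexes packs.

```python
def lookup(packs, object_id):
    """Return (data, probes); `packs` is a list of dicts, newest first."""
    probes = 0
    for pack in packs:
        probes += 1  # one index probe (and potentially one seek) per pack
        if object_id in pack:
            return pack[object_id], probes
    return None, probes


def probes_for_checkout(packs, object_ids):
    """Total number of pack probes needed to find every object in a commit."""
    return sum(lookup(packs, oid)[1] for oid in object_ids)
```

In this model, a file last modified long ago sits in the oldest pack, so
finding it costs one probe per pack in the repository.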

> One command which _is_ (a bit) unsafe in git is git-prune. I'm not sure
> if it could be made safe. But not doing prune affects only a bit
> repository size (where git is best I think of all SCMs) and not performance.
> 
> On the other hand hg repository structure (namely log like append changelog
> / revlog to store commits) makes it I think hard to have multiple persistent
> branches.

Not sure why you think that. There are some difficulties here, but
they're mostly owing to the fact that we've always emphasized the one
branch per repo approach as being the most user-friendly.

> Sidenote 1: it looks like git is optimized for speed of merge and checkout
> (branch switching, or going to given point in history for bisect), and
> probably accidentally for multi-branch repos, while Mercurial is optimized
> for speed of commit and patch.

I think all of these things are comparable.

> Sidenote 2: Mercurial repository structure might make it use "file-ids"
> (perhaps implicitly), with all the disadvantages (different renames
> on different branches) of those.

Nope.

> > In either case, both provide strong integrity checks with recursive
> > SHA1 hashing, zlib CRCs, and GPG signatures (as well as distributed
> > "back-up"!) so this is largely a non-issue relative to traditional
> > systems.
> 
> Integrity checks can tell you that repository is corrupted, but it would
> be better if it didn't get corrupted in first place.

Obviously. Hence our append-only design. Data that's written to a repo
is never rewritten, which minimizes exposure to software bugs and I/O
errors.
 
> Besides: zlib CRC for Mercurial? I thought that hg didn't compress the
> data, only delta chain store it?

We use zlib compression of deltas and have since April 6, 2005.
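
As a rough illustration of the integrity check that comes with zlib
compression: a zlib stream (RFC 1950) ends with an Adler-32 checksum of
the uncompressed data, and Python's zlib.decompress() raises zlib.error
when that check fails. The store() and load() helpers below are
hypothetical names, not Mercurial functions.

```python
import zlib


def store(delta: bytes) -> bytes:
    # Compress one delta before appending it to the log.
    return zlib.compress(delta)


def load(blob: bytes) -> bytes:
    # Raises zlib.error if the stream or its trailing checksum is damaged.
    return zlib.decompress(blob)
```

Flipping a bit in the stored blob's trailing checksum is enough to make
load() fail rather than return silently corrupted data.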

-- 
Mathematics is the supreme nostalgia of our time.
