From: Matt Mackall <mpm@selenic.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: mercurial@selenic.com, git@vger.kernel.org
Subject: Re: newbie questions about git design and features (some wrt hg)
Date: Wed, 31 Jan 2007 16:25:07 -0600
Message-ID: <20070131222507.GO10108@waste.org>
In-Reply-To: <eppshi$1l4$1@sea.gmane.org>

On Wed, Jan 31, 2007 at 11:56:01AM +0100, Jakub Narebski wrote:
> Theodore Tso wrote:
> 
> > On Tue, Jan 30, 2007 at 11:55:48AM -0500, Shawn O. Pearce wrote:
> >> I think hg modifies files as it goes, which could cause some issues
> >> when a writer is aborted.  I'm sure they have thought about the
> >> problem and tried to make it safe, but there isn't anything safer
> >> than just leaving the damn thing alone.  :)
> > 
> > To be fair hg modifies files using O_APPEND only.  That isn't quite as
> > safe as "only creating new files", but it is relatively safe.
> 
> From (libc.info):
> 
>  -- Macro: int O_APPEND
>      The bit that enables append mode for the file.  If set, then all
>      `write' operations write the data at the end of the file, extending
>      it, regardless of the current file position.  This is the only
>      reliable way to append to a file.  In append mode, you are
>      guaranteed that the data you write will always go to the current
>      end of the file, regardless of other processes writing to the
>      file.  Conversely, if you simply set the file position to the end
>      of file and write, then another process can extend the file after
>      you set the file position but before you write, resulting in your
>      data appearing someplace before the real end of file.
> 
> I don't quite understand how that would help hg (Mercurial) to have
> operations like commit, pull/fetch or push atomic, i.e. all or nothing.

That's because it's unrelated.
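
For reference, the append-mode behaviour quoted from libc.info is easy to see in isolation. The following is only a minimal sketch, assuming a POSIX system; the function name is made up for illustration. It shows that a write on an O_APPEND descriptor lands at the end of the file wherever the file position is, which says nothing about whether a multi-file operation is all-or-nothing.

```python
import os
import tempfile


def append_ignores_offset():
    """Write through an O_APPEND descriptor after seeking to offset 0."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"AAAA")

    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.lseek(fd, 0, os.SEEK_SET)  # move the file position to the start
        os.write(fd, b"BB")           # append mode still writes at the end
    finally:
        os.close(fd)

    with open(path, "rb") as f:
        data = f.read()
    os.unlink(path)
    return data
```

On a POSIX system the existing bytes are left untouched and the new bytes follow them.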

> In hg you have to update individual files (blobs buckets) storing delta
> and perhaps full version, update manifest file (flat tree) and update
> changelog (commit): what happens if for example there are two concurrent
> operations trying to update repository, e.g. two push operations in parallel
> (from two different developers), or fetch from cron and commit?

Mercurial has write-side locks so there can only ever be one writer at
a time. There are no locks needed on the read side, so there can be
any number of readers, even while commits are happening.
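
As a rough illustration of the single-writer, lock-free-reader arrangement, here is a toy sketch. It is not Mercurial's actual locking code: the lock file name "wlock", the "log" file, and all function names are assumptions chosen for the example. The only technique it relies on is that creating a file with O_CREAT | O_EXCL succeeds for exactly one process.

```python
import os
import tempfile
from contextlib import contextmanager


class LockHeld(Exception):
    """Raised when another writer already holds the lock."""


@contextmanager
def write_lock(repo_dir):
    # Hypothetical lock path. O_CREAT | O_EXCL makes creation atomic,
    # so only one would-be writer can succeed at a time.
    lock_path = os.path.join(repo_dir, "wlock")
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeld(lock_path)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def read_log(repo_dir):
    # Readers never look at the lock file.
    try:
        with open(os.path.join(repo_dir, "log"), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return b""


def demo():
    repo = tempfile.mkdtemp()
    second_writer_blocked = False
    with write_lock(repo):
        with open(os.path.join(repo, "log"), "ab") as f:
            f.write(b"rev0\n")
        try:
            with write_lock(repo):  # a second writer must be refused
                pass
        except LockHeld:
            second_writer_blocked = True
        # A reader proceeds even though the write lock is still held.
        seen_by_reader = read_log(repo)
    return second_writer_blocked, seen_by_reader
```

In demo() the second writer is turned away while the reader gets its data without touching the lock.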

> What happens if operation is interrupted (e.g. lost connection to
> network during fetch)?

We keep a simple transaction journal. As Mercurial revlogs are
append-only, rolling back a transaction just means truncating all
files in a transaction to their original length.
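
The idea can be sketched in a few lines. This is a toy model under stated assumptions, not Mercurial's implementation: the journal format (one "length<TAB>path" line per file), the class name, and the file names are all invented for the example. The point is only that when every write is an append, recording the original length is enough to undo it.

```python
import os
import tempfile


class Transaction:
    """Toy journal: remembers each file's length before the first append."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.entries = {}  # path -> original length
        open(journal_path, "w").close()

    def append(self, path, data):
        if path not in self.entries:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            self.entries[path] = size
            # Record the original length durably before touching the file.
            with open(self.journal_path, "a") as j:
                j.write(f"{size}\t{path}\n")
                j.flush()
                os.fsync(j.fileno())
        with open(path, "ab") as f:
            f.write(data)

    def commit(self):
        # Success: the journal is no longer needed.
        os.unlink(self.journal_path)


def rollback(journal_path):
    """Truncate every journaled file back to its recorded length."""
    with open(journal_path) as j:
        for line in j:
            size, path = line.rstrip("\n").split("\t", 1)
            if not os.path.exists(path):
                continue  # interrupted before the file was created
            with open(path, "r+b") as f:
                f.truncate(int(size))
    os.unlink(journal_path)


def demo():
    repo = tempfile.mkdtemp()
    data_file = os.path.join(repo, "file.d")
    journal = os.path.join(repo, "journal")
    with open(data_file, "wb") as f:
        f.write(b"old")
    tr = Transaction(journal)
    tr.append(data_file, b"+partial")
    # Simulate an interruption: commit() is never called.
    rollback(journal)
    with open(data_file, "rb") as f:
        return f.read()
```

After the simulated interruption, rollback() leaves the data file exactly as it was before the transaction started.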

> In git both situations result in some prune-able and fsck-visible crud in
> repository, but repository stays uncorrupted, and all operations are atomic
> (all or nothing).

If a Mercurial transaction is interrupted and not rolled back, the
result is prune-able and fsck-visible crud. But this doesn't happen
much in practice.

The claim that's been made is that a) truncate is unsafe because Linux
has historically had problems in this area and b) git is safer because
it doesn't do this sort of thing. 

My response is a) those problems are overstated, and Linux has never
had difficulty with the sort of straightforward single-writer
operations Mercurial uses, and b) normal git usage involves regular
rewrites of data with packing operations, which makes its exposure to
filesystem bugs equivalent or greater.

In either case, both provide strong integrity checks with recursive
SHA1 hashing, zlib CRCs, and GPG signatures (as well as distributed
"back-up"!) so this is largely a non-issue relative to traditional
systems.
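
To make "recursive SHA1 hashing" concrete, here is a generic sketch of the idea. It does not reproduce the on-disk format of either tool; the function names and the all-zero NULL_ID are assumptions for the example. Each identifier covers the content and the parent's identifier, so damage anywhere in the history changes every later identifier.

```python
import hashlib

NULL_ID = b"\0" * 20  # stand-in parent for the first revision


def node_id(parent_id, text):
    # The hash covers the parent's ID as well as the content.
    return hashlib.sha1(parent_id + text).digest()


def build_chain(texts):
    """Return the list of IDs for a linear history of contents."""
    ids, parent = [], NULL_ID
    for text in texts:
        parent = node_id(parent, text)
        ids.append(parent)
    return ids


def verify(texts, ids):
    """Recompute the chain and compare it with the stored IDs."""
    return build_chain(texts) == ids
```

Checking only the newest identifier is therefore enough to detect a change in any ancestor.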

-- 
Mathematics is the supreme nostalgia of our time.
