git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johan Herland <johan@herland.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	Jeff King <peff@peff.net>,
	git@vger.kernel.org
Subject: Re: [PATCH 00/17] Remove assumptions about refname lifetimes
Date: Mon, 20 May 2013 19:03:52 +0200	[thread overview]
Message-ID: <CALKQrgd3=5Rb5wbDGTV7PQfcingimYf08yEHcbfSCkjGDUU_Ng@mail.gmail.com> (raw)
In-Reply-To: <7vfvxhwqt1.fsf@alter.siamese.dyndns.org>

On Mon, May 20, 2013 at 6:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Johan Herland <johan@herland.net> writes:
>> For server-class installations we need ref storage that can be read
>> (and updated?) atomically, and the current system of loose + packed
>> files won't work since reading (and updating) more than a single file
>> is not an atomic operation. Trivially, one could resolve this by
>> dropping loose refs, and always using a single packed-refs file, but
>> that would make it prohibitively expensive to update refs (the entire
>> packed-refs file must be rewritten for every update).
>>
>> Now, observe that we don't have these race conditions in the object
>> database, because it is an add-only immutable data store.
>>
>> What if we stored the refs as a tree object in the object database,
>> referenced by a single (loose) ref?
>
> What is the cost of updating a single branch with that scheme?
>
> Doesn't it end up recording roughly the same amount of information
> as updating a single packed-refs file that is flat, but with the
> need to open a few tree objects (top-level, refs/, and refs/heads/),
> writing out a blob that stores the object name at the tip, computing
> the updated trees (refs/heads/, refs/ and top-level), and then
> finally doing the compare-and-swap of that single loose ref?

Yes, except that when you update packed-refs, you have to write out
the _entire_ file, whereas with this scheme you only have to write out
the part of the refs hierarchy you actually touched (e.g. rewriting
refs/heads/foo would not have to write out anything inside
refs/tags/*). If you have small number of branches, and a huge number
of tags, this scheme might end up being cheaper than writing the
entire packed-refs. But in general, it's probably much more expensive
to go via the odb.

> You may guarantee atomicity but it is the same granularity of
> atomicity as a single packed-refs file.

Yes, as I argued elsewhere in this thread: It seems that _any_
filesystem-based solution must resort to having all updates depend on
a single file in order to guarantee atomicity.

> When you are updating a
> branch while somebody else is updating a tag, of course you do not
> have to look at refs/tags/ in your operation and you can write out
> your final packed-refs equivalent tree to the object database
> without racing with the other process, but the top-level you come up
> with and the top-level the other process comes up with (which has
> an up-to-date refs/tags/ part, but has a stale refs/heads/ part from
> your point of view) have to race to update that single loose ref,
> and one of you have to back out.

True.

> That "backing out" can be made more intelligently than just dying
> with "compare and swap failed--please retry" message, e.g. you at
> that point notice what you started with, what the other party did
> while you were working on (i.e. updating refs/tags/), and three-way
> merge the refs tree, and in cases where "all refs recorded as loose
> refs" scheme wouldn't have resulted in problematic conflict, such a
> three-way merge would resolve trivially (you updated refs/heads/ and
> the update by the other process to refs/tags/ would not conflict
> with what you did).  But the same three-way merge scheme can be
> employed with the current flat single packed-refs scheme, can't it?

Yes. (albeit without reusing the machinery we already have for doing
three-way merges)

> Even worse, what is the cost of looking up the value of a single
> branch?  You would need to open a few tree objects and the leaf blob
> that records the object name the ref points at, wouldn't you?

Yes. Probably a showstopper, although single branch/ref lookup might
not be so common on server side, as it is on the user side.

> Right now, such a look-up is either opening a single small file and
> reading the first 41 bytes off of it, and falling back (when the ref
> in question is packed) to read a single packed-refs file and finding
> the ref you want from it.
>
> So...

Yep...


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

  parent reply	other threads:[~2013-05-20 17:04 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-19 20:26 [PATCH 00/17] Remove assumptions about refname lifetimes Michael Haggerty
2013-05-19 20:26 ` [PATCH 01/17] describe: make own copy of refname Michael Haggerty
2013-05-19 20:26 ` [PATCH 02/17] fetch: make own copies of refnames Michael Haggerty
2013-05-19 20:26 ` [PATCH 03/17] add_rev_cmdline(): make a copy of the name argument Michael Haggerty
2013-05-19 20:26 ` [PATCH 04/17] builtin_diff_tree(): make it obvious that function wants two entries Michael Haggerty
2013-05-21 17:27   ` Junio C Hamano
2013-05-23  7:19     ` Michael Haggerty
2013-05-19 20:27 ` [PATCH 05/17] cmd_diff(): use an object_array for holding trees Michael Haggerty
2013-05-21 17:30   ` Junio C Hamano
2013-05-23  7:21     ` Michael Haggerty
2013-05-19 20:27 ` [PATCH 06/17] cmd_diff(): rename local variable "list" -> "entry" Michael Haggerty
2013-05-19 20:27 ` [PATCH 07/17] cmd_diff(): make it obvious which cases are exclusive of each other Michael Haggerty
2013-05-19 20:27 ` [PATCH 08/17] revision: split some overly-long lines Michael Haggerty
2013-05-21 17:34   ` Junio C Hamano
2013-05-23  6:27     ` Michael Haggerty
2013-05-23 17:08       ` Junio C Hamano
2013-05-19 20:27 ` [PATCH 09/17] gc_boundary(): move the check "alloc <= nr" to caller Michael Haggerty
2013-05-21 17:49   ` Junio C Hamano
2013-05-23  7:09     ` Michael Haggerty
2013-05-23 18:02       ` Junio C Hamano
2013-05-19 20:27 ` [PATCH 10/17] get_revision_internal(): make check less mysterious Michael Haggerty
2013-05-21 17:38   ` Junio C Hamano
2013-05-23  6:39     ` Michael Haggerty
2013-05-19 20:27 ` [PATCH 11/17] object_array: add function object_array_filter() Michael Haggerty
2013-05-19 20:27 ` [PATCH 12/17] object_array_remove_duplicates(): rewrite to reduce copying Michael Haggerty
2013-05-19 20:27 ` [PATCH 13/17] fsck: don't put a void*-shaped peg in a char*-shaped hole Michael Haggerty
2013-05-19 20:27 ` [PATCH 14/17] find_first_merges(): initialize merges variable using initializer Michael Haggerty
2013-05-19 20:27 ` [PATCH 15/17] find_first_merges(): remove unnecessary code Michael Haggerty
2013-05-19 20:27 ` [RFC 16/17] object_array_entry: copy name before storing in name field Michael Haggerty
2013-05-20 10:33   ` Johan Herland
2013-05-20 14:42     ` Michael Haggerty
2013-05-20 16:44       ` Jeff King
2013-05-20 21:34         ` Michael Haggerty
2013-05-19 20:27 ` [RFC 17/17] refs: document the lifetime of the refname passed to each_ref_fn Michael Haggerty
2013-05-20 10:28 ` [PATCH 00/17] Remove assumptions about refname lifetimes Johan Herland
2013-05-20 12:15   ` Michael Haggerty
2013-05-20 16:37   ` Junio C Hamano
2013-05-20 16:59     ` Jeff King
2013-05-20 17:08       ` Johan Herland
2013-05-20 18:03       ` Junio C Hamano
2013-05-20 17:03     ` Johan Herland [this message]
2013-05-21 18:39       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALKQrgd3=5Rb5wbDGTV7PQfcingimYf08yEHcbfSCkjGDUU_Ng@mail.gmail.com' \
    --to=johan@herland.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).