git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>, Stefan Beller <sbeller@google.com>
Cc: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: [PATCH 5/5] ref_transaction_commit(): only keep one lockfile open at a time
Date: Sat, 25 Apr 2015 08:08:28 +0200	[thread overview]
Message-ID: <553B2F5C.3010007@alum.mit.edu> (raw)
In-Reply-To: <1429875349-29736-6-git-send-email-mhagger@alum.mit.edu>

On 04/24/2015 01:35 PM, Michael Haggerty wrote:
> The old code locked all of the references in the first loop, leaving
> all of the lockfiles open until later loops. Since we might be
> updating a lot of references, this could cause file descriptor
> exhaustion.
> 
> But there is no reason to keep the lockfiles open after the first
> loop:
> 
> * For references being deleted or verified, we don't have to write
>   anything into the lockfile, so we can close it immediately.
> 
> * For references being updated, we can write the new SHA-1 into the
>   lockfile immediately and close the lockfile without renaming it into
>   place. In the second loop, the already-written contents can be
>   renamed into place using commit_ref_update().
> 
> To simplify the bookkeeping across loops, add a new REF_NEEDS_COMMIT
> bit to update->flags, which keeps track of whether the corresponding
> lockfile needs to be committed, as opposed to just unlocked.
> 
> This change fixes two tests in t1400.

I realize that I should have highlighted another, perhaps more serious
bug that is fixed by this change. The old code was roughly

    for update in updates:
        acquire locks and check old_sha
    for update in updates:
        if changing value:
            write_ref_to_lockfile()
            commit_ref_update()
    for update in updates:
        if deleting value:
            unlink()
    for update in updates:
        if reference still locked:
            unlock_ref()

The atomicity of the update depends on all preconditions being checked
in the first loop, before any changes have start being committed. The
problem is that write_ref_to_lockfile() (which used to be part of
write_ref_sha1()), which happens in the second loop, checks some more
preconditions:

* It verifies that new_sha1 is a valid object
* If the reference being updated is a branch, it verifies that new_sha1
  points at a commit object (as opposed to a tag, tree, or blob).

If either of these preconditions fails, the "transaction" might be
aborted after some reference updates have already been permanently
committed. In other words, the all-or-nothing promise of "git update-ref
--stdin" and "git push --atomic" would be violated.

I'm sorry that I didn't fully realize this problem earlier.

After this patch, the code looks like

    for update in updates:
        acquire locks and check old_sha
        if changing value:
            write_ref_to_lockfile()
    for update in updates:
        if changing value:
            commit_ref_update()
    for update in updates:
        if deleting value:
            unlink()
    for update in updates:
        if reference still locked:
            unlock_ref()

This has the important effect that the pre-checks in
write_ref_to_lockfile() are carried out before any changes have been
committed, so if any of those checks fail, the transaction can be rolled
back correctly.

Given that "git push --atomic" is one of the main new features of 2.4.0,
it would be unfortunate for the release to contain this bug, plus the
bug that atomic pushes can fail due to file descriptor exhaustion.

Is this problem important enough to justify delaying the release to get
the fix in?

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

  reply	other threads:[~2015-04-25  6:08 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-24 11:35 [PATCH 0/5] Avoid file descriptor exhaustion in ref_transaction_commit() Michael Haggerty
2015-04-24 11:35 ` [PATCH 1/5] write_ref_to_lockfile(): new function, extracted from write_ref_sha1() Michael Haggerty
2015-04-24 11:35 ` [PATCH 2/5] commit_ref_update(): " Michael Haggerty
2015-04-24 11:35 ` [PATCH 3/5] write_ref_sha1(): inline function at callers Michael Haggerty
2015-04-24 11:35 ` [PATCH 4/5] ref_transaction_commit(): remove the local flags variables Michael Haggerty
2015-04-24 17:30   ` Jeff King
2015-04-24 21:15     ` Michael Haggerty
2015-04-24 21:19       ` Jeff King
2015-04-24 21:51   ` Eric Sunshine
2015-04-24 22:22     ` Michael Haggerty
2015-04-24 11:35 ` [PATCH 5/5] ref_transaction_commit(): only keep one lockfile open at a time Michael Haggerty
2015-04-25  6:08   ` Michael Haggerty [this message]
2015-04-25  6:57     ` Junio C Hamano
2015-04-25 19:21       ` Junio C Hamano
2015-04-28  4:36         ` Jeff King
2015-04-24 17:26 ` [PATCH 0/5] Avoid file descriptor exhaustion in ref_transaction_commit() Jeff King
2015-04-24 19:13   ` Stefan Beller
2015-04-25 19:27   ` Junio C Hamano
2015-04-25  4:28 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=553B2F5C.3010007@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).