From: Michael Haggerty <mhagger@alum.mit.edu>
To: David Turner <dturner@twopensource.com>, git@vger.kernel.org
Subject: Re: [PATCH v4 13/21] refs: resolve symbolic refs first
Date: Thu, 18 Feb 2016 12:59:58 +0100
Message-ID: <56C5B23E.6090905@alum.mit.edu>
In-Reply-To: <1455755367.7528.57.camel@twopensource.com>

On 02/18/2016 01:29 AM, David Turner wrote:
> On Fri, 2016-02-12 at 15:09 +0100, Michael Haggerty wrote:
>> On 02/05/2016 08:44 PM, David Turner wrote:
>>> Before committing ref updates, split symbolic ref updates into two
>>> parts: an update to the underlying ref, and a log-only update to the
>>> symbolic ref.  This ensures that both references are locked correctly
>>> while their reflogs are updated.
>>>
>>> It is still possible to confuse git by concurrent updates, since the
>>> splitting of symbolic refs does not happen under lock. So a symbolic ref
>>> could be replaced by a plain ref in the middle of this operation, which
>>> would lead to reflog discontinuities and missed old-ref checks.
>>
>> This patch is doing too much at once for my little brain to follow.
>>
>> My first hangup is the change to setting RESOLVE_REF_NO_RECURSE
>> unconditionally in lock_ref_sha1_basic(). I count five callers of that
>> function and see no justification for why the change is OK in the
>> context of each caller. Here are some thoughts:
>>
>> * The call from files_create_symref() sets REF_NODEREF, so it is
>> unaffected by this change.
> 
> Yes.
> 
>> * The call from files_transaction_commit() is preceded by a call to
>> dereference_symrefs(), which I assume effectively replaces the need for
>> RESOLVE_REF_NO_RECURSE.
> 
> Yes.
> 
>> * There are two calls from files_rename_ref(). Why is it OK to do
>> without RESOLVE_REF_NO_RECURSE there?
>>
>>   * For the oldrefname call, I suppose the justification is the
>>     "(flag & REF_ISSYMREF)" check earlier in the function. (But does this
>>     introduce a significant TOCTOU race?)
> 
> The refs code as a whole seems likely to have TOCTOU issues. In
> general, anywhere we check/set flag & REF_ISSYMREF without holding a
> lock, we have a potential problem.  I haven't generally tried to handle
> these cases, since they're not presently handled.  

I agree that we don't do so well here, though I think that most races
would result in reading/writing a ref that was pointed to by the symref
a moment ago, which is usually indistinguishable to the user from their
update having gone through the moment before the symref was updated. So
I don't think your change makes this bit of code significantly worse.

> The central problem with this area of the code is that commit interacts
> so intimately with the locking machinery.  I understand some of why
> it's done that way.  In particular, your change to ref locking to not
> hold lots of open files was a big win for us at Twitter.  But this
> means that it's hard to deal with cross-backend ref updates: you want
> to hold multiple locks, and backends don't have the machinery for it.
> 
> We could add backend hooks to specifically lock and unlock refs. Then
> the backend commit code would just be handed a bundle of locked refs
> and would commit them.  This might be hairy, but it could fix the
> TOCTOU problems.  So, first lock the outer refs, then split out updates
> for any which are symbolic refs, and lock those. Finally, commit all
> updates (split by backend).

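To check my understanding of the ordering you describe, here is a toy
sketch in C. The types and the lock_and_split() helper are hypothetical
and have nothing to do with the real refs code: "locking" is just a flag,
a dangling symref is treated as an error rather than created, and locks
taken before an error are left for the caller to release. The only point
it makes is that each symref is locked *before* it is dereferenced, so it
cannot be swapped for a plain ref between the check and the lock:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only; not git's struct ref_update or lock machinery. */
struct toy_ref {
	const char *name;
	const char *symref_target; /* NULL for a plain ref */
	int locked;                /* stands in for holding "<name>.lock" */
};

enum toy_update_kind { TOY_UPDATE_VALUE, TOY_UPDATE_LOG_ONLY };

struct toy_update {
	const char *refname;
	enum toy_update_kind kind;
};

static struct toy_ref *toy_find(struct toy_ref *refs, int nr, const char *name)
{
	for (int i = 0; i < nr; i++)
		if (!strcmp(refs[i].name, name))
			return &refs[i];
	return NULL;
}

/*
 * Lock the named ref; while the locked ref is a symref, queue a log-only
 * update for it and move on to (and lock) its target. The first plain
 * ref reached gets the real value update. Returns the number of updates
 * queued, or -1 on error (locks taken so far are not released here).
 */
static int lock_and_split(struct toy_ref *refs, int nr, const char *name,
			  struct toy_update *out, int max_out)
{
	int n = 0;
	struct toy_ref *r = toy_find(refs, nr, name);

	assert(refs && out);
	while (r) {
		if (r->locked || n >= max_out)
			return -1; /* lock already held, symref cycle, or no room */
		r->locked = 1;     /* lock before reading symref_target */
		out[n].refname = r->name;
		if (!r->symref_target) {
			out[n].kind = TOY_UPDATE_VALUE;
			return n + 1;
		}
		out[n++].kind = TOY_UPDATE_LOG_ONLY;
		r = toy_find(refs, nr, r->symref_target);
	}
	return -1; /* missing ref or dangling symref (not created here) */
}
```

With HEAD pointing at refs/heads/main, calling lock_and_split() on HEAD
queues a log-only update for HEAD followed by a value update for
refs/heads/main, and leaves both locked. Grouping the resulting updates by
backend for the final commit is left out of the sketch.
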
As chance would have it, for an internal GitHub project I've implemented
hooks that can be called *during* a ref transaction. The hooks can, for
example, take arbitrary actions between the time that the reflocks are
all acquired and the time that the updates start to be committed. I
didn't submit this code upstream because I didn't think that it would
benefit other users, but maybe it would be useful for implementing
split-backend reference transaction commits. E.g., the primary reference
transaction could run the secondary backend's commit while holding the
locks for the primary backend references.
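
Roughly, the shape I have in mind is something like the following. This
is a hypothetical sketch, not the GitHub code and not anything in refs.c;
the names toy_txn, before_commit and commit_secondary are made up for
illustration. The hook runs after every primary lock is held and before
any update is written, and a nonzero return aborts the primary
transaction:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook signature: nonzero return aborts the transaction. */
typedef int (*toy_hook_fn)(void *cb_data);

struct toy_txn {
	int nr_updates;            /* refs this transaction will update */
	int nr_locked;             /* primary-backend locks currently held */
	int committed;
	toy_hook_fn before_commit; /* optional; may be NULL */
	void *hook_data;
};

static int toy_txn_commit(struct toy_txn *t)
{
	assert(t);
	t->nr_locked = t->nr_updates;  /* phase 1: take every lock */
	if (t->before_commit && t->before_commit(t->hook_data)) {
		t->nr_locked = 0;      /* hook failed: release, write nothing */
		return -1;
	}
	t->committed = 1;              /* phase 2: write the updates */
	t->nr_locked = 0;              /* phase 3: release the locks */
	return 0;
}

/* Example hook: commit a secondary backend while primary locks are held. */
struct toy_secondary {
	const struct toy_txn *primary;
	int locks_seen; /* primary locks held when the secondary committed */
	int fail;       /* simulate a failing secondary commit */
};

static int commit_secondary(void *cb_data)
{
	struct toy_secondary *s = cb_data;

	s->locks_seen = s->primary->nr_locked;
	return s->fail ? -1 : 0;
}
```

Running the secondary commit inside the hook means a failure there can
still abort the primary transaction before anything is written. The
opposite failure (the primary write failing after the secondary has
already committed) is not handled by this sketch.
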

Let me think about it.

I don't think this is urgent though. The current code is not
significantly racy in mainstream usage scenarios, right?

> One downside of this is that right now, the backend API is relatively
> close to the front-end, and this would leak what should be an
> implementation detail.  But maybe this is necessary to knit multiple
> backends together.  
> 
> But I'm not sure that this is necessary right now, because I'm not sure
> that I'm actually making TOCTOU issues much worse. 

Agreed.

> [...]
> That's a legit complaint.  The problem, as you note, is that doing some
> of these steps completely independently doesn't work.  But I'll try
> splitting out what I can.

Thanks!

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

Thread overview: 53+ messages
2016-02-05 19:44 [PATCH v4 00/20] refs backend David Turner
2016-02-05 19:44 ` [PATCH v4 01/21] refs: add a backend method structure with transaction functions David Turner
2016-02-05 19:44 ` [PATCH v4 02/21] refs: add methods for misc ref operations David Turner
2016-02-11  7:45   ` Michael Haggerty
2016-02-12  1:09     ` David Turner
2016-02-05 19:44 ` [PATCH v4 03/21] refs: add methods for the ref iterators David Turner
2016-02-11  8:42   ` Michael Haggerty
2016-02-12  1:08     ` David Turner
2016-02-05 19:44 ` [PATCH v4 04/21] refs: add do_for_each_per_worktree_ref David Turner
2016-02-05 19:44 ` [PATCH v4 05/21] refs: add methods for reflog David Turner
2016-02-05 19:44 ` [PATCH v4 06/21] refs: add method for initial ref transaction commit David Turner
2016-02-05 19:44 ` [PATCH v4 07/21] refs: add method for delete_refs David Turner
2016-02-05 19:44 ` [PATCH v4 08/21] refs: add methods to init refs db David Turner
2016-02-11  8:54   ` Michael Haggerty
2016-02-11 21:15     ` David Turner
2016-02-05 19:44 ` [PATCH v4 09/21] refs: add method to rename refs David Turner
2016-02-11  9:00   ` Michael Haggerty
2016-02-11 21:12     ` David Turner
2016-02-05 19:44 ` [PATCH v4 10/21] refs: make lock generic David Turner
2016-02-05 19:44 ` [PATCH v4 11/21] refs: move duplicate check to common code David Turner
2016-02-05 19:44 ` [PATCH v4 12/21] refs: allow log-only updates David Turner
2016-02-11 10:03   ` Michael Haggerty
2016-02-11 21:23     ` David Turner
2016-02-05 19:44 ` [PATCH v4 13/21] refs: resolve symbolic refs first David Turner
2016-02-12 14:09   ` Michael Haggerty
2016-02-18  0:29     ` David Turner
2016-02-18 11:59       ` Michael Haggerty [this message]
2016-02-05 19:44 ` [PATCH v4 14/21] refs: always handle non-normal refs in files backend David Turner
2016-02-12 15:07   ` Michael Haggerty
2016-02-18  2:44     ` David Turner
2016-02-18 12:07       ` Michael Haggerty
2016-02-18 18:32         ` David Turner
2016-02-05 19:44 ` [PATCH v4 15/21] init: allow alternate ref strorage to be set for new repos David Turner
2016-02-12 15:26   ` Michael Haggerty
2016-02-17 20:47     ` David Turner
2016-02-18 14:12       ` Michael Haggerty
2016-02-05 19:44 ` [PATCH v4 16/21] refs: check submodules ref storage config David Turner
2016-02-05 19:44 ` [PATCH v4 17/21] clone: allow ref storage backend to be set for clone David Turner
2016-02-05 19:44 ` [PATCH v4 18/21] svn: learn ref-storage argument David Turner
2016-02-05 19:44 ` [PATCH v4 19/21] refs: add register_ref_storage_backends() David Turner
2016-02-12 15:42   ` Michael Haggerty
2016-02-17 20:32     ` David Turner
2016-02-05 19:44 ` [PATCH v4 20/21] refs: add LMDB refs storage backend David Turner
2016-02-11  8:48   ` Michael Haggerty
2016-02-11 21:21     ` David Turner
2016-02-12 17:01   ` Michael Haggerty
2016-02-13  1:23     ` David Turner
2016-02-14 12:04   ` Duy Nguyen
2016-02-15  9:57     ` Duy Nguyen
2016-02-16 22:01       ` David Turner
2016-02-17 20:32     ` David Turner
2016-02-05 19:44 ` [PATCH v4 21/21] refs: tests for lmdb backend David Turner
2016-02-08 23:37 ` [PATCH v4 00/20] refs backend Junio C Hamano