All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Turner <dturner@twopensource.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "René Scharfe" <l.s.r@web.de>,
	git@vger.kernel.org, "Keith McGuigan" <kmcguigan@twitter.com>
Subject: Re: [PATCH v3] merge: fix cache_entry use-after-free
Date: Mon, 19 Oct 2015 18:27:29 -0400	[thread overview]
Message-ID: <1445293649.3418.16.camel@twopensource.com> (raw)
In-Reply-To: <xmqq1tcuhkb2.fsf@gitster.mtv.corp.google.com>

On Fri, 2015-10-16 at 09:04 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > We also do dozens or hundreds of merges per day and only saw this quite
> > rarely.  Interestingly, we could only trigger it with an alternate
> > hashing function for git's hashmap implementation, and only once a
> > certain bug in that implementation was *fixed*.  But that was just a
> > trigger; it was certainly not the cause.  The bug would, in general,
> > have caused more hash collisions due to worse mixing, but I believe it
> > is possible that some particular collision would have been present in
> > the non-bugged version of the code but not in the bugged version.
> >
> > It may have also had something to do with a case-insensitive filesystem;
> > we never saw anyone reproduce it on anything but OS X, and even there it
> > was quite difficult to reproduce.
> >
> > In short, I don't think we know why the bug was not seen widely.
> 
> It has been a long time since I looked at name-hash.c and I am fuzzy
> on what the four functions (cache|index)_(file|dir)_exists are meant
> for, but I have this feeling that the original premise of the patch,
> "we may free a ce because we no longer use it in the index, but it
> may still need to keep a reference to it in name-hash" may be wrong
> in the first place.  The proposed "fix" conceptually feels wrong.
>
> The whole point of the name-hash is so that we can detect collisions
> in two names, one of which wants to have a file in one place while
> the other wants to have a directory, at the same path in the index.
> The pathnames and cache entries registered in the name-hash have to
> be the ones that are relevant to the index in quesition.  If a new
> ce will be added to the index, the name-hash will have to know about
> its path (and that is what CE_HASHED bit is about).  On the other
> hand, if you are going to remove an existing ce from the index, its
> sub-paths should no longer prevent other cache entries to be there.
> 
> E.g. if you have "a/b", it must prevent "A" from entering the index
> and the name-hash helps you to do so; when you remove "a/b", then
> name-hash must now allow "A" to enter the index.  So "a/b" must be
> removed from the name-hash by calling remove_name_hash() and normal
> codepaths indeed do so.
>
> I do not doubt the existence of "use-after-free bug" you observed,
> but I am not convinced that refcounting is "fixing" the problem; it
> smells like papering over a different bug that is the root cause of
> the use-after-free.
> 
> For example, if we forget to "unhash" a ce from name-hash when we
> remove a ce from the index (or after we "hashed" it, expecting to
> add it to the index, but in the end decided not to add to the index,
> perhaps), we would see a now-freed ce still in the name-hash.
> Checking a path against the name-hash in such a state would have to
> use the ce->name from that stale ce, which is a use-after-free bug.
> 
> In such a situation, isn't the real cause of the bug the fact that
> the stale ce that is no longer relevant to the true index contents
> still in name-hash?  The refcounting does not fix the fact that a
> ce->name of a stale ce that is no longer relevant being used for D/F
> collision checking.
> 
> I am not saying that I found such a codepath that forgets to unhash,
> but from the overall design and purpose of the name-hash subsystem,
> anything that deliberately _allows_ a stale ce that does not exist
> in the index in there smells like a workaround going in a wrong
> direction.

I've spent some time looking into this, but I don't quite have a repro.
I do have some comments which might be interesting.

The problem is not the name_hash, but with the dir_hash.  The dir_hash
is only even populated on case-insensitive filesystems (which is why you
and Linus don't see this bug).  The dir_hash is not designed to catch
d/f conflicts, but rather case conflicts (one side of a merge has
foo/bar; the other has FOO/bar).  For each directory, it maintains a
pointer to a cache_entry.  Then add_to_index uses that cache_entry to 
rewrite incoming entries' parent directories to the expected case.
Weirdly, that part of add_to_index is, so far as I can tell, never
called during a merge.  So where's are we using the freed value?
Probably in dir_entry_cmp, while adding another entry to the hashmap;
that's where our crash seems to have happened.  And our failure depended
on the details of the hash function as well; if a certain collision did
not happen, we would not see it.

There is, I think, another way to handle this: we could *copy* the path
into the struct dir_entry instead of pointing to a cache_entry.  But
this seems like a bunch of extra work; the refcounting is simpler.  

  reply	other threads:[~2015-10-19 22:27 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-14 22:07 [PATCH v3] merge: fix cache_entry use-after-free David Turner
2015-10-15  3:35 ` René Scharfe
2015-10-15 19:02   ` David Turner
2015-10-15 20:38     ` René Scharfe
2015-10-15 20:51     ` Junio C Hamano
2015-10-16  7:05       ` David Turner
2015-10-16 16:04         ` Junio C Hamano
2015-10-19 22:27           ` David Turner [this message]
2015-10-20  2:13             ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1445293649.3418.16.camel@twopensource.com \
    --to=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=kmcguigan@twitter.com \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.