All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Turner <dturner@twopensource.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git mailing list <git@vger.kernel.org>, Duy Nguyen <pclouds@gmail.com>
Subject: Re: [PATCH 0/1] merge: fix cache_entry use-after-free
Date: Fri, 09 Oct 2015 18:16:03 -0400	[thread overview]
Message-ID: <1444428963.8836.36.camel@twopensource.com> (raw)
In-Reply-To: <xmqqmvvtqgei.fsf@gitster.mtv.corp.google.com>

+Duy Nguyen, who knows the split index better.

On Thu, 2015-10-08 at 13:00 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > From: Keith McGuigan <kmcguigan@twitter.com>
> >
> > During merges, we would previously free entries that we no longer need
> > in the destination index.  But those entries might also be stored in
> > the dir_entry cache, and when a later call to add_to_index found them,
> > they would be used after being freed.
> >
> > To prevent this, add a ref count for struct cache_entry.  Whenever
> > a cache entry is added to a data structure, the ref count is incremented;
> > when it is removed from the data structure, it is decremented.  When
> > it hits zero, the cache_entry is freed.
> >
> > Signed-off-by: David Turner <dturner@twopensource.com>
> > Signed-off-by: Keith McGuigan <kmcguigan@twitter.com>
> > ---
> 
> Thanks.
> 
> > @@ -213,6 +214,32 @@ struct cache_entry {
> >  struct pathspec;
> >  
> >  /*
> > + * Increment the cache_entry reference count.  Should be called
> > + * whenever a pointer to a cache_entry is retained in a data structure,
> > + * thus marking it as alive.
> > + */
> > +static inline void add_ce_ref(struct cache_entry *ce)
> > +{
> > +	assert(ce != NULL && ce->ref_count >= 0);
> > +	++ce->ref_count;
> > +}
> 
> We tend to prefer post-increment when the distinction between pre-
> or post- does not make any difference, which is the case here.

Will fix.

> > diff --git a/name-hash.c b/name-hash.c
> > index 702cd05..f12c919 100644
> > --- a/name-hash.c
> > +++ b/name-hash.c
> > @@ -92,7 +93,9 @@ static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce)
> >  	struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce));
> >  	while (dir && !(--dir->nr)) {
> >  		struct dir_entry *parent = dir->parent;
> > -		hashmap_remove(&istate->dir_hash, dir, NULL);
> > +		struct dir_entry *removed = hashmap_remove(&istate->dir_hash, dir, NULL);
> > +		assert(removed == dir);
> > +		drop_ce_ref(dir->ce);
> 
> This is curious.  In remove_name_hash() you do not have the
> corresponding assert.  Why is it necessary here (or is it
> unnecessary over there)?

It is unnecessary in any case: it's an assert rather than a check for an
expected (or even unexpected) case.  That just happens to be where Keith
first managed to track down the use-after free, so that's where he
happened to put the assert while trying to debug it.  I'm in general in
favor of more asserts rather than fewer, so I would prefer to keep it
in.  But I can remove it if you like.

> > @@ -147,6 +151,7 @@ void remove_name_hash(struct index_state *istate, struct cache_entry *ce)
> >  		return;
> >  	ce->ce_flags &= ~CE_HASHED;
> >  	hashmap_remove(&istate->name_hash, ce, ce);
> > +	drop_ce_ref(ce);
> >  
> >  	if (ignore_case)
> >  		remove_dir_entry(istate, ce);
> > diff --git a/read-cache.c b/read-cache.c
> > index 87204a5..442de34 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -53,6 +53,7 @@ static const char *alternate_index_output;
> >  static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
> >  {
> >  	istate->cache[nr] = ce;
> > +	add_ce_ref(ce);
> >  	add_name_hash(istate, ce);
> >  }
> 
> What happens to the existing entry that used to be istate->cache[nr],
> which may or may not be 'ce' that is replacing it?
> 
> It turns out that all three calling sites are safe, but it is not
> immediately obvious why.  Perhaps some comment in front of the
> function is in order, to warn those who may have to add a new caller
> or restructure the existing calling chain, that istate->cache[nr] is
> expected not to hold a live reference when the function is called,
> or something?

Happy to add it if you want, but to me it was clear without because if
there were a value in istate->cache[nr], that old value would be leaked.
Let me know.

> > @@ -314,8 +314,10 @@ void replace_index_entry_in_base(struct index_state *istate,
> >  	    istate->split_index->base &&
> >  	    old->index <= istate->split_index->base->cache_nr) {
> >  		new->index = old->index;
> > -		if (old != istate->split_index->base->cache[new->index - 1])
> > -			free(istate->split_index->base->cache[new->index - 1]);
> > +		if (old != istate->split_index->base->cache[new->index - 1]) {
> > +			struct cache_entry *ce = istate->split_index->base->cache[new->index - 1];
> > +			drop_ce_ref(ce);
> > +		}
> >  		istate->split_index->base->cache[new->index - 1] = new;
> 
> Does 'new' already have the right refcount at this point?

I spoke to Keith, and he thinks we do need an add_ce_ref there. I'll fix
that on the reroll.  Duy, do you agree?

  reply	other threads:[~2015-10-09 22:16 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-08 18:47 [PATCH 0/1] merge: fix cache_entry use-after-free David Turner
2015-10-08 18:47 ` David Turner
2015-10-08 20:00   ` Junio C Hamano
2015-10-09 22:16     ` David Turner [this message]
2015-10-09 22:51       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1444428963.8836.36.camel@twopensource.com \
    --to=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.