All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov@parallels.com>
To: Joonsoo Kim <js1304@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Christoph Lameter <cl@gentwo.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH -mm 7/8] slub: make dead caches discard free slabs immediately
Date: Tue, 3 Jun 2014 12:16:57 +0400	[thread overview]
Message-ID: <20140603081655.GA6013@esperanza> (raw)
In-Reply-To: <CAAmzW4P=kUAJwozBPPos+uUewzSDnE43P6NcGYKNpBjjfv1EWA@mail.gmail.com>

On Mon, Jun 02, 2014 at 11:03:51PM +0900, Joonsoo Kim wrote:
> 2014-06-02 20:47 GMT+09:00 Vladimir Davydov <vdavydov@parallels.com>:
> > Hi Joonsoo,
> >
> > On Mon, Jun 02, 2014 at 01:24:36PM +0900, Joonsoo Kim wrote:
> >> On Sat, May 31, 2014 at 03:04:58PM +0400, Vladimir Davydov wrote:
> >> > On Fri, May 30, 2014 at 09:57:10AM -0500, Christoph Lameter wrote:
> >> > > On Fri, 30 May 2014, Vladimir Davydov wrote:
> >> > >
> >> > > > (3) is a bit more difficult, because slabs are added to per-cpu partial
> >> > > > lists lock-less. Fortunately, we only have to handle the __slab_free
> >> > > > case, because, as there shouldn't be any allocation requests dispatched
> >> > > > to a dead memcg cache, get_partial_node() should never be called. In
> >> > > > __slab_free we use cmpxchg to modify kmem_cache_cpu->partial (see
> >> > > > put_cpu_partial) so that setting ->partial to a special value, which
> >> > > > will make put_cpu_partial bail out, will do the trick.
> > [...]
> >> I think that we can do (3) easily.
> >> If we check memcg_cache_dead() in the end of put_cpu_partial() rather
> >> than in the begin of put_cpu_partial(), we can avoid the race you
> >> mentioned. If someone do put_cpu_partial() before dead flag is set,
> >> it can be zapped by who set dead flag. And if someone do
> >> put_cpu_partial() after dead flag is set, it can be zapped by who
> >> do put_cpu_partial().
> >
> > After put_cpu_partial() adds a frozen slab to a per cpu partial list,
> > the slab becomes visible to other threads, which means it can be
> > unfrozen and freed. The latter can trigger cache destruction. Hence we
> > shouldn't touch the cache, in particular call memcg_cache_dead() on it,
> > after calling put_cpu_partial(), otherwise we can get use-after-free.
> >
> > However, what you propose makes sense if we disable irqs before adding a
> > slab to a partial list and enable them only after checking if the cache
> > is dead and unfreezing all partials if so, i.e.
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index d96faa2464c3..14b9e9a8677c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2030,8 +2030,15 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> >         struct page *oldpage;
> >         int pages;
> >         int pobjects;
> > +       unsigned long flags;
> > +       int irq_saved = 0;
> >
> >         do {
> > +               if (irq_saved) {
> > +                       local_irq_restore(flags);
> > +                       irq_saved = 0;
> > +               }
> > +
> >                 pages = 0;
> >                 pobjects = 0;
> >                 oldpage = this_cpu_read(s->cpu_slab->partial);
> > @@ -2062,8 +2069,16 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> >                 page->pobjects = pobjects;
> >                 page->next = oldpage;
> >
> > +               local_irq_save(flags);
> > +               irq_saved = 1;
> > +
> >         } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
> >                                                                 != oldpage);
> > +
> > +       if (memcg_cache_dead(s))
> > +               unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
> > +
> > +       local_irq_restore(flags);
> >  #endif
> >  }
> >
> >
> > That would be safe against possible cache destruction, because to remove
> > a slab from a per cpu partial list we have to run on the cpu it was
> > frozen on. Disabling irqs makes it impossible.
> 
> Hmm... this is also a bit ugly.
> How about following change?
> 
> Thanks.
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 2b1ce69..6adab87 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2058,6 +2058,21 @@ static void put_cpu_partial(struct kmem_cache
> *s, struct page *page, int drain)
> 
>         } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
>                                                                 != oldpage);
> +
> +       if (memcg_cache_dead(s)) {
> +               bool done = false;
> +               unsigned long flags;

Suppose we are preempted here. In the meanwhile all objects are freed to
the cache, all frozen pages are unfrozen and also freed. The cache
destruction is then scheduled (patch 2 of this set). Then when this
thread continues execution it will operate on the cache that was
destroyed - use-after-free.

I admit, this is very unlikely, but can we ignore this possibility?

Thanks.

> +
> +               local_irq_save(flags);
> +               if (this_cpu_read(s->cpu_slab->partial) == page) {
> +                       done = true;
> +                       unfreeze_partials(s, this_cpu_ptr);
> +               }
> +               local_irq_restore(flags);
> +
> +               if (!done)
> +                       flush_all(s);
> +       }
>  #endif
>  }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Vladimir Davydov <vdavydov@parallels.com>
To: Joonsoo Kim <js1304@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Christoph Lameter <cl@gentwo.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH -mm 7/8] slub: make dead caches discard free slabs immediately
Date: Tue, 3 Jun 2014 12:16:57 +0400	[thread overview]
Message-ID: <20140603081655.GA6013@esperanza> (raw)
In-Reply-To: <CAAmzW4P=kUAJwozBPPos+uUewzSDnE43P6NcGYKNpBjjfv1EWA@mail.gmail.com>

On Mon, Jun 02, 2014 at 11:03:51PM +0900, Joonsoo Kim wrote:
> 2014-06-02 20:47 GMT+09:00 Vladimir Davydov <vdavydov@parallels.com>:
> > Hi Joonsoo,
> >
> > On Mon, Jun 02, 2014 at 01:24:36PM +0900, Joonsoo Kim wrote:
> >> On Sat, May 31, 2014 at 03:04:58PM +0400, Vladimir Davydov wrote:
> >> > On Fri, May 30, 2014 at 09:57:10AM -0500, Christoph Lameter wrote:
> >> > > On Fri, 30 May 2014, Vladimir Davydov wrote:
> >> > >
> >> > > > (3) is a bit more difficult, because slabs are added to per-cpu partial
> >> > > > lists lock-less. Fortunately, we only have to handle the __slab_free
> >> > > > case, because, as there shouldn't be any allocation requests dispatched
> >> > > > to a dead memcg cache, get_partial_node() should never be called. In
> >> > > > __slab_free we use cmpxchg to modify kmem_cache_cpu->partial (see
> >> > > > put_cpu_partial) so that setting ->partial to a special value, which
> >> > > > will make put_cpu_partial bail out, will do the trick.
> > [...]
> >> I think that we can do (3) easily.
> >> If we check memcg_cache_dead() in the end of put_cpu_partial() rather
> >> than in the begin of put_cpu_partial(), we can avoid the race you
> >> mentioned. If someone do put_cpu_partial() before dead flag is set,
> >> it can be zapped by who set dead flag. And if someone do
> >> put_cpu_partial() after dead flag is set, it can be zapped by who
> >> do put_cpu_partial().
> >
> > After put_cpu_partial() adds a frozen slab to a per cpu partial list,
> > the slab becomes visible to other threads, which means it can be
> > unfrozen and freed. The latter can trigger cache destruction. Hence we
> > shouldn't touch the cache, in particular call memcg_cache_dead() on it,
> > after calling put_cpu_partial(), otherwise we can get use-after-free.
> >
> > However, what you propose makes sense if we disable irqs before adding a
> > slab to a partial list and enable them only after checking if the cache
> > is dead and unfreezing all partials if so, i.e.
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index d96faa2464c3..14b9e9a8677c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2030,8 +2030,15 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> >         struct page *oldpage;
> >         int pages;
> >         int pobjects;
> > +       unsigned long flags;
> > +       int irq_saved = 0;
> >
> >         do {
> > +               if (irq_saved) {
> > +                       local_irq_restore(flags);
> > +                       irq_saved = 0;
> > +               }
> > +
> >                 pages = 0;
> >                 pobjects = 0;
> >                 oldpage = this_cpu_read(s->cpu_slab->partial);
> > @@ -2062,8 +2069,16 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> >                 page->pobjects = pobjects;
> >                 page->next = oldpage;
> >
> > +               local_irq_save(flags);
> > +               irq_saved = 1;
> > +
> >         } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
> >                                                                 != oldpage);
> > +
> > +       if (memcg_cache_dead(s))
> > +               unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
> > +
> > +       local_irq_restore(flags);
> >  #endif
> >  }
> >
> >
> > That would be safe against possible cache destruction, because to remove
> > a slab from a per cpu partial list we have to run on the cpu it was
> > frozen on. Disabling irqs makes it impossible.
> 
> Hmm... this is also a bit ugly.
> How about following change?
> 
> Thanks.
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 2b1ce69..6adab87 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2058,6 +2058,21 @@ static void put_cpu_partial(struct kmem_cache
> *s, struct page *page, int drain)
> 
>         } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
>                                                                 != oldpage);
> +
> +       if (memcg_cache_dead(s)) {
> +               bool done = false;
> +               unsigned long flags;

Suppose we are preempted here. In the meanwhile all objects are freed to
the cache, all frozen pages are unfrozen and also freed. The cache
destruction is then scheduled (patch 2 of this set). Then when this
thread continues execution it will operate on the cache that was
destroyed - use-after-free.

I admit, this is very unlikely, but can we ignore this possibility?

Thanks.

> +
> +               local_irq_save(flags);
> +               if (this_cpu_read(s->cpu_slab->partial) == page) {
> +                       done = true;
> +                       unfreeze_partials(s, this_cpu_ptr);
> +               }
> +               local_irq_restore(flags);
> +
> +               if (!done)
> +                       flush_all(s);
> +       }
>  #endif
>  }

  parent reply	other threads:[~2014-06-03  8:17 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-30 13:51 [PATCH -mm 0/8] memcg/slab: reintroduce dead cache self-destruction Vladimir Davydov
2014-05-30 13:51 ` Vladimir Davydov
2014-05-30 13:51 ` [PATCH -mm 1/8] memcg: cleanup memcg_cache_params refcnt usage Vladimir Davydov
2014-05-30 13:51   ` Vladimir Davydov
2014-05-30 14:31   ` Christoph Lameter
2014-05-30 14:31     ` Christoph Lameter
2014-05-30 13:51 ` [PATCH -mm 2/8] memcg: destroy kmem caches when last slab is freed Vladimir Davydov
2014-05-30 13:51   ` Vladimir Davydov
2014-05-30 14:32   ` Christoph Lameter
2014-05-30 14:32     ` Christoph Lameter
2014-05-30 13:51 ` [PATCH -mm 3/8] memcg: mark caches that belong to offline memcgs as dead Vladimir Davydov
2014-05-30 13:51   ` Vladimir Davydov
2014-05-30 14:33   ` Christoph Lameter
2014-05-30 14:33     ` Christoph Lameter
2014-05-30 13:51 ` [PATCH -mm 4/8] slub: never fail kmem_cache_shrink Vladimir Davydov
2014-05-30 13:51   ` Vladimir Davydov
2014-05-30 14:46   ` Christoph Lameter
2014-05-30 14:46     ` Christoph Lameter
2014-05-31 10:18     ` Vladimir Davydov
2014-05-31 10:18       ` Vladimir Davydov
2014-06-02 15:13       ` Christoph Lameter
2014-06-02 15:13         ` Christoph Lameter
2014-05-30 13:51 ` [PATCH -mm 5/8] slab: remove kmem_cache_shrink retval Vladimir Davydov
2014-05-30 13:51   ` Vladimir Davydov
2014-05-30 14:49   ` Christoph Lameter
2014-05-30 14:49     ` Christoph Lameter
2014-05-31 10:27     ` Vladimir Davydov
2014-05-31 10:27       ` Vladimir Davydov
2014-06-02 15:16       ` Christoph Lameter
2014-06-02 15:16         ` Christoph Lameter
2014-06-03  9:06         ` Vladimir Davydov
2014-06-03  9:06           ` Vladimir Davydov
2014-06-03 14:48           ` Christoph Lameter
2014-06-03 14:48             ` Christoph Lameter
2014-06-03 19:00             ` Vladimir Davydov
2014-06-03 19:00               ` Vladimir Davydov
2014-05-30 13:51 ` [PATCH -mm 6/8] slub: do not use cmpxchg for adding cpu partials when irqs disabled Vladimir Davydov
2014-05-30 13:51   ` Vladimir Davydov
2014-05-30 13:51 ` [PATCH -mm 7/8] slub: make dead caches discard free slabs immediately Vladimir Davydov
2014-05-30 13:51   ` Vladimir Davydov
2014-05-30 14:57   ` Christoph Lameter
2014-05-30 14:57     ` Christoph Lameter
2014-05-31 11:04     ` Vladimir Davydov
2014-05-31 11:04       ` Vladimir Davydov
2014-06-02  4:24       ` Joonsoo Kim
2014-06-02  4:24         ` Joonsoo Kim
2014-06-02 11:47         ` Vladimir Davydov
2014-06-02 11:47           ` Vladimir Davydov
2014-06-02 14:03           ` Joonsoo Kim
2014-06-02 14:03             ` Joonsoo Kim
2014-06-02 15:17             ` Christoph Lameter
2014-06-02 15:17               ` Christoph Lameter
2014-06-03  8:16             ` Vladimir Davydov [this message]
2014-06-03  8:16               ` Vladimir Davydov
2014-06-04  8:53               ` Joonsoo Kim
2014-06-04  8:53                 ` Joonsoo Kim
2014-06-04  9:47                 ` Vladimir Davydov
2014-06-04  9:47                   ` Vladimir Davydov
2014-05-30 13:51 ` [PATCH -mm 8/8] slab: reap dead memcg caches aggressively Vladimir Davydov
2014-05-30 13:51   ` Vladimir Davydov
2014-05-30 15:01   ` Christoph Lameter
2014-05-30 15:01     ` Christoph Lameter
2014-05-31 11:19     ` Vladimir Davydov
2014-05-31 11:19       ` Vladimir Davydov
2014-06-02 15:24       ` Christoph Lameter
2014-06-02 15:24         ` Christoph Lameter
2014-06-03 20:18         ` Vladimir Davydov
2014-06-03 20:18           ` Vladimir Davydov
2014-06-02  4:41   ` Joonsoo Kim
2014-06-02  4:41     ` Joonsoo Kim
2014-06-02 12:10     ` Vladimir Davydov
2014-06-02 12:10       ` Vladimir Davydov
2014-06-02 14:01       ` Joonsoo Kim
2014-06-02 14:01         ` Joonsoo Kim
2014-06-03  8:21         ` Vladimir Davydov
2014-06-03  8:21           ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140603081655.GA6013@esperanza \
    --to=vdavydov@parallels.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@gentwo.org \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=js1304@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.