From: Vladimir Davydov <vdavydov@parallels.com>
To: Christoph Lameter <cl@linux.com>
Cc: hannes@cmpxchg.org, mhocko@suse.cz, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH RFC 1/3] slub: keep full slabs on list for per memcg caches
Date: Fri, 16 May 2014 17:06:30 +0400
Message-ID: <20140516130629.GE32113@esperanza>
In-Reply-To: <alpine.DEB.2.10.1405151011210.24665@gentwo.org>
On Thu, May 15, 2014 at 10:15:10AM -0500, Christoph Lameter wrote:
> On Thu, 15 May 2014, Vladimir Davydov wrote:
>
> > > That will significantly impact the fastpaths for alloc and free.
> > >
> > > Also a pretty significant change to the logic of the fastpaths since they
> > > were not designed to handle the full lists. In debug mode all operations
> > > were only performed by the slow paths and only the slow paths so far
> > > supported tracking full slabs.
> >
> > That's the minimal price we have to pay for slab re-parenting, because
> > w/o it we won't be able to look up for all slabs of a particular per
> > memcg cache. The question is, can it be tolerated or I'd better try some
> > other way?
>
> AFAICT these modifications all together will have a significant impact on
> performance.
>
> You could avoid the refcounting on free relying on the atomic nature of
> cmpxchg operations. If you zap the per cpu slab then the fast path will be
> forced to fall back to the slowpaths where you could do what you need to
> do.
Hmm, looking at __slab_free once again, I tend to agree that we could
rely on cmpxchg to do re-parenting: we could freeze all slabs of the
cache being re-parented forcing every on-going kfree to do only a
cmpxchg w/o touching any lists and taking any locks, and then unfreeze
all the frozen slabs to the target cache. The ugly "slow mode" I
introduced in this patch set would not be necessary then.
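
As a rough illustration of a free that does only a cmpxchg on a frozen
slab, here is a toy userspace model in C11. It is not kernel code: the
toy_slab layout, the packed state word and all toy_* names are
assumptions made for this sketch, and the real __slab_free updates
page->freelist and page->counters rather than a single packed word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NOBJ	4
#define NIL	0xffffu			/* empty freelist marker */
#define FROZEN	(1ull << 32)

/* Freelist head, in-use count and frozen bit share one word, so one
 * compare-and-swap updates all three together (hypothetical layout). */
struct toy_slab {
	_Atomic uint64_t state;
	uint16_t next[NOBJ];		/* per-object free pointer */
};

#define HEAD(s)		((uint16_t)((s) & 0xffff))
#define INUSE(s)	((uint16_t)(((s) >> 16) & 0xffff))
#define MK(h, n, f)	((uint64_t)((h) & 0xffff) | \
			 ((uint64_t)((n) & 0xffff) << 16) | ((f) ? FROZEN : 0))

/* Start with every object allocated, i.e. a full slab. */
static void toy_slab_init(struct toy_slab *slab, bool frozen)
{
	atomic_init(&slab->state, MK(NIL, NOBJ, frozen));
}

/* Push obj onto the freelist with a CAS loop. Returns true if the
 * caller would still have to take a lock and touch the node lists. */
static bool toy_free(struct toy_slab *slab, uint16_t obj)
{
	uint64_t old = atomic_load(&slab->state), upd;

	assert(obj < NOBJ);
	do {
		slab->next[obj] = HEAD(old);
		upd = MK(obj, INUSE(old) - 1, old & FROZEN);
	} while (!atomic_compare_exchange_weak(&slab->state, &old, upd));

	/* A frozen slab is on no list: the free ends with the CAS. */
	if (old & FROZEN)
		return false;
	/* Otherwise a full slab gaining a free object, or a slab
	 * becoming empty, needs list manipulation. */
	return HEAD(old) == NIL || INUSE(upd) == 0;
}
```

In this model, toy_free() on a slab initialized with frozen set never
asks for list work, while the first free to an unfrozen full slab does.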
But w/o ref-counting how can we make sure that all kfrees to the cache
we are going to re-parent have been completed so that it can be safely
destroyed? An example:

CPU0:                                   CPU1:
-----                                   -----
kfree(obj):
  page = virt_to_head_page(obj)
  s = page->slab_cache
  slab_free(s, page, obj):
    <<< gets preempted here
                                        reparent_slab_cache:
                                          for each slab page
                                            [...]
                                            page->slab_cache = target_cache;
                                          kmem_cache_destroy(old_cache)
    <<< continues execution
    c = s->cpu_slab /* s points to the previous owner cache,
                       so we use-after-free here */

If kfree were not preemptible, we could make reparent_slab_cache wait
for all cpus to schedule() before destroying the cache to avoid this,
but since it is, we need ref-counting...
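
To make the ref-counting argument concrete, here is a minimal userspace
sketch in C11. It is not kernel code: toy_cache and every toy_* function
are hypothetical names, and a plain atomic counter stands in for
whatever reference count the real cache would use. The sketch also
assumes the reference can be taken while the cache is still known to be
alive; how to guarantee that in the kernel is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kmem_cache. */
struct toy_cache {
	atomic_int refcnt;	/* 1 base reference + one per in-flight free */
	bool destroyed;		/* set where real code would free the cache */
};

static void toy_cache_init(struct toy_cache *s)
{
	atomic_init(&s->refcnt, 1);
	s->destroyed = false;
}

/* Taken by a free before it dereferences the cache. */
static void toy_cache_get(struct toy_cache *s)
{
	int old = atomic_fetch_add(&s->refcnt, 1);

	assert(old > 0);	/* must not resurrect a dead cache */
}

/* Whoever drops the last reference destroys the cache. */
static void toy_cache_put(struct toy_cache *s)
{
	if (atomic_fetch_sub(&s->refcnt, 1) == 1)
		s->destroyed = true;
}

/* Reparenting drops only the base reference instead of destroying the
 * cache directly, so destruction waits for in-flight frees. */
static void toy_reparent(struct toy_cache *s)
{
	toy_cache_put(s);
}
```

With a free holding a reference across the preemption point, calling
toy_reparent() leaves the cache intact, and it is destroyed only when
that free calls toy_cache_put().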
Thanks.
> There is no tracking of full slabs without adding much more logic to the
> fastpath. You could force any operation that affects the full list into
> the slow path. But that also would have an impact.