Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Roman Gushchin <guro@fb.com>
To: Waiman Long <longman@redhat.com>
Cc: Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jonathan Corbet <corbet@lwn.net>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Shakeel Butt <shake>
Subject: Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
Date: Thu, 27 Jun 2019 21:24:25 +0000	[thread overview]
Message-ID: <20190627212419.GA25233@tower.DHCP.thefacebook.com> (raw)
In-Reply-To: <063752b2-4f1a-d198-36e7-3e642d4fcf19@redhat.com>

On Thu, Jun 27, 2019 at 04:57:50PM -0400, Waiman Long wrote:
> On 6/26/19 4:19 PM, Roman Gushchin wrote:
> >>  
> >> +#ifdef CONFIG_MEMCG_KMEM
> >> +static void kmem_cache_shrink_memcg(struct mem_cgroup *memcg,
> >> +				    void __maybe_unused *arg)
> >> +{
> >> +	struct kmem_cache *s;
> >> +
> >> +	if (memcg == root_mem_cgroup)
> >> +		return;
> >> +	mutex_lock(&slab_mutex);
> >> +	list_for_each_entry(s, &memcg->kmem_caches,
> >> +			    memcg_params.kmem_caches_node) {
> >> +		kmem_cache_shrink(s);
> >> +	}
> >> +	mutex_unlock(&slab_mutex);
> >> +	cond_resched();
> >> +}
> > A couple of questions:
> > 1) how about skipping already offlined kmem_caches? They are already shrunk,
> >    so you probably won't get much out of them. Or isn't it true?
> 
> I have been thinking about that. This patch is based on the linux tree
> and so don't have an easy to find out if the kmem caches have been
> shrinked. Rebasing this on top of linux-next, I can use the
> SLAB_DEACTIVATED flag as a marker for skipping the shrink.
> 
> With all the latest patches, I am still seeing 121 out of a total of 726
> memcg kmem caches (1/6) that are deactivated caches after system bootup
> one of the test systems. My system is still using cgroup v1 and so the
> number may be different in a v2 setup. The next step is probably to
> figure out why those deactivated caches are still there.

It's not a secret: these kmem_caches are holding objects, which are in use.
It's a drawback of the current slab accounting implementation: every
object holds a whole page and the corresponding kmem_cache. It's optimized
for a large number of objects, which are created and destroyed within
the life of the cgroup (e.g. task_structs), and it works worse for long-living
objects like vfs cache.

Long-term I think we need a different implementation for long-living objects,
so that objects belonging to different memory cgroups can share the same page
and kmem_caches.

It's a fairly big change though.

> 
> > 2) what's your long-term vision here? do you think that we need to shrink
> >    kmem_caches periodically, depending on memory pressure? how a user
> >    will use this new sysctl?
> Shrinking the kmem caches under extreme memory pressure can be one way
> to free up extra pages, but the effect will probably be temporary.
> > What's the problem you're trying to solve in general?
> 
> At least for the slub allocator, shrinking the caches allow the number
> of active objects reported in slabinfo to be more accurate. In addition,
> this allow to know the real slab memory consumption. I have been working
> on a BZ about continuous memory leaks with a container based workloads.
> The ability to shrink caches allow us to get a more accurate memory
> consumption picture. Another alternative is to turn on slub_debug which
> will then disables all the per-cpu slabs.

I see... I agree with Michal here, that extending drop_caches sysctl isn't
the best idea. Isn't it possible to achieve the same effect using slub sysfs?

Thanks!

WARNING: multiple messages have this Message-ID (diff)

From: Roman Gushchin <guro@fb.com>
To: Waiman Long <longman@redhat.com>
Cc: Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	"David Rientjes" <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jonathan Corbet <corbet@lwn.net>,
	Luis Chamberlain <mcgrof@kernel.org>,
	"Kees Cook" <keescook@chromium.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>
Subject: Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs
Date: Thu, 27 Jun 2019 21:24:25 +0000	[thread overview]
Message-ID: <20190627212419.GA25233@tower.DHCP.thefacebook.com> (raw)
In-Reply-To: <063752b2-4f1a-d198-36e7-3e642d4fcf19@redhat.com>

On Thu, Jun 27, 2019 at 04:57:50PM -0400, Waiman Long wrote:
> On 6/26/19 4:19 PM, Roman Gushchin wrote:
> >>  
> >> +#ifdef CONFIG_MEMCG_KMEM
> >> +static void kmem_cache_shrink_memcg(struct mem_cgroup *memcg,
> >> +				    void __maybe_unused *arg)
> >> +{
> >> +	struct kmem_cache *s;
> >> +
> >> +	if (memcg == root_mem_cgroup)
> >> +		return;
> >> +	mutex_lock(&slab_mutex);
> >> +	list_for_each_entry(s, &memcg->kmem_caches,
> >> +			    memcg_params.kmem_caches_node) {
> >> +		kmem_cache_shrink(s);
> >> +	}
> >> +	mutex_unlock(&slab_mutex);
> >> +	cond_resched();
> >> +}
> > A couple of questions:
> > 1) how about skipping already offlined kmem_caches? They are already shrunk,
> >    so you probably won't get much out of them. Or isn't it true?
> 
> I have been thinking about that. This patch is based on the linux tree
> and so don't have an easy to find out if the kmem caches have been
> shrinked. Rebasing this on top of linux-next, I can use the
> SLAB_DEACTIVATED flag as a marker for skipping the shrink.
> 
> With all the latest patches, I am still seeing 121 out of a total of 726
> memcg kmem caches (1/6) that are deactivated caches after system bootup
> one of the test systems. My system is still using cgroup v1 and so the
> number may be different in a v2 setup. The next step is probably to
> figure out why those deactivated caches are still there.

It's not a secret: these kmem_caches are holding objects, which are in use.
It's a drawback of the current slab accounting implementation: every
object holds a whole page and the corresponding kmem_cache. It's optimized
for a large number of objects, which are created and destroyed within
the life of the cgroup (e.g. task_structs), and it works worse for long-living
objects like vfs cache.

Long-term I think we need a different implementation for long-living objects,
so that objects belonging to different memory cgroups can share the same page
and kmem_caches.

It's a fairly big change though.

> 
> > 2) what's your long-term vision here? do you think that we need to shrink
> >    kmem_caches periodically, depending on memory pressure? how a user
> >    will use this new sysctl?
> Shrinking the kmem caches under extreme memory pressure can be one way
> to free up extra pages, but the effect will probably be temporary.
> > What's the problem you're trying to solve in general?
> 
> At least for the slub allocator, shrinking the caches allow the number
> of active objects reported in slabinfo to be more accurate. In addition,
> this allow to know the real slab memory consumption. I have been working
> on a BZ about continuous memory leaks with a container based workloads.
> The ability to shrink caches allow us to get a more accurate memory
> consumption picture. Another alternative is to turn on slub_debug which
> will then disables all the per-cpu slabs.

I see... I agree with Michal here, that extending drop_caches sysctl isn't
the best idea. Isn't it possible to achieve the same effect using slub sysfs?

Thanks!

next prev parent reply	other threads:[~2019-06-27 21:24 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-24 17:42 [PATCH 0/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs Waiman Long
2019-06-24 17:42 ` [PATCH 1/2] mm, memcontrol: Add memcg_iterate_all() Waiman Long
2019-06-27 15:07   ` Michal Hocko
2019-06-27 21:03     ` Waiman Long
2019-06-28  7:10       ` Michal Hocko
2019-06-24 17:42 ` [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs Waiman Long
2019-06-26 20:19   ` Roman Gushchin
2019-06-26 20:19     ` Roman Gushchin
2019-06-27 20:57     ` Waiman Long
2019-06-27 20:57       ` Waiman Long
2019-06-27 21:24       ` Roman Gushchin [this message]
2019-06-27 21:24         ` Roman Gushchin
2019-06-27 21:31         ` Waiman Long
2019-06-27 21:31           ` Waiman Long
2019-06-28 15:32         ` Christopher Lameter
2019-06-28 15:32           ` Christopher Lameter
2019-06-28 16:33           ` Roman Gushchin
2019-06-28 16:33             ` Roman Gushchin
2019-06-28 17:16           ` Yang Shi
2019-06-28 17:16             ` Yang Shi
2019-06-28 17:30             ` Roman Gushchin
2019-06-28 17:30               ` Roman Gushchin
2019-06-27 21:25       ` Luis Chamberlain
2019-06-27 21:25         ` Luis Chamberlain
2019-06-27 15:15   ` Michal Hocko
2019-06-27 21:16     ` Waiman Long
2019-06-28  7:31       ` Michal Hocko
2019-07-02 18:41         ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190627212419.GA25233@tower.DHCP.thefacebook.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=corbet@lwn.net \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=keescook@chromium.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mcgrof@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=vdavydov.dev@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.