From: Waiman Long <longman@redhat.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Michal Hocko <mhocko@kernel.org>, Roman Gushchin <guro@fb.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file
Date: Thu, 20 Jun 2019 10:23:41 -0400
Message-ID: <cfc6c800-1cb4-e2f2-e6d9-f0571c11a47b@redhat.com>
In-Reply-To: <CALvZod7pdOx0a1v4oX5-7ZfCykM8iwRwPkW-+gbO1B4+j1SXqw@mail.gmail.com>

On 6/19/19 7:48 PM, Shakeel Butt wrote:
> Hi Waiman,
>
> On Wed, Jun 19, 2019 at 10:16 AM Waiman Long <longman@redhat.com> wrote:
>> There are concerns about memory leaks from extensive use of memory
>> cgroups as each memory cgroup creates its own set of kmem caches. There
>> is a possibility that the memcg kmem caches may remain even after the
>> memory cgroups have been offlined. Therefore, it will be useful to show
>> the status of each of memcg kmem caches.
>>
>> This patch introduces a new <debugfs>/memcg_slabinfo file which is
>> somewhat similar to /proc/slabinfo in format, but lists only information
>> about kmem caches that have child memcg kmem caches. Information
>> available in /proc/slabinfo is not repeated in memcg_slabinfo.
>>
>> A portion of a sample output of the file was:
>>
>> # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
>> rpc_inode_cache root 13 51 1 1
>> rpc_inode_cache 48 0 0 0 0
>> fat_inode_cache root 1 45 1 1
>> fat_inode_cache 41 2 45 1 1
>> xfs_inode root 770 816 24 24
>> xfs_inode 92 22 34 1 1
>> xfs_inode 88:dead 1 34 1 1
>> xfs_inode 89:dead 23 34 1 1
>> xfs_inode 85 4 34 1 1
>> xfs_inode 84 9 34 1 1
>>
>> The css id of the memcg is also listed. If a memcg is not online,
>> the tag ":dead" will be attached as shown above.
>>
>> Suggested-by: Shakeel Butt <shakeelb@google.com>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> mm/slab_common.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 57 insertions(+)
>>
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 58251ba63e4a..2bca1558a722 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -17,6 +17,7 @@
>> #include <linux/uaccess.h>
>> #include <linux/seq_file.h>
>> #include <linux/proc_fs.h>
>> +#include <linux/debugfs.h>
>> #include <asm/cacheflush.h>
>> #include <asm/tlbflush.h>
>> #include <asm/page.h>
>> @@ -1498,6 +1499,62 @@ static int __init slab_proc_init(void)
>> return 0;
>> }
>> module_init(slab_proc_init);
>> +
>> +#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM)
>> +/*
>> + * Display information about kmem caches that have child memcg caches.
>> + */
>> +static int memcg_slabinfo_show(struct seq_file *m, void *unused)
>> +{
>> +	struct kmem_cache *s, *c;
>> +	struct slabinfo sinfo;
>> +
>> +	mutex_lock(&slab_mutex);
> On large machines there can be thousands of memcgs and potentially
> each memcg can have hundreds of kmem caches. So, the slab_mutex can be
> held for a very long time.

But that is also what /proc/slabinfo does, by calling mutex_lock() in
slab_start() and mutex_unlock() in slab_stop(). So the same problem will
happen when /proc/slabinfo is being read.

When you are in a situation where reading /proc/slabinfo takes a long time
because of the large number of memcgs, the system is in some kind of
trouble anyway. I am not saying that we should not improve the scalability
of this patch. It is just that some nasty race conditions may pop up if
we release the lock and re-acquire it later. That will greatly
complicate the code needed to handle all those edge cases.

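To illustrate the kind of edge case I mean, here is a rough userspace
sketch, not kernel code. It assumes a pthread mutex standing in for
slab_mutex and a hand-rolled list with a generation counter; struct
cache_list, count_locked(), count_batched() and remove_id_2() are
hypothetical names made up for this example. Once the lock is dropped
in the middle of a walk, the cursor can go stale, so the walker has to
detect changes and restart (or otherwise revalidate its position):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct node {
	int id;
	struct node *next;
};

struct cache_list {
	pthread_mutex_t lock;
	struct node *head;
	unsigned long gen;	/* bumped on every list modification */
};

/* Unlink the node with the given id. Caller must hold l->lock. */
static void list_remove_locked(struct cache_list *l, int id)
{
	struct node **pp;

	for (pp = &l->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			*pp = (*pp)->next;
			l->gen++;
			return;
		}
	}
}

/* Simulated concurrent writer: removes the node with id 2, if still present. */
static void remove_id_2(struct cache_list *l)
{
	pthread_mutex_lock(&l->lock);
	list_remove_locked(l, 2);
	pthread_mutex_unlock(&l->lock);
}

/* Simple approach: a single lock hold covers the whole walk. */
static int count_locked(struct cache_list *l)
{
	struct node *p;
	int n = 0;

	pthread_mutex_lock(&l->lock);
	for (p = l->head; p; p = p->next)
		n++;
	pthread_mutex_unlock(&l->lock);
	return n;
}

/*
 * Walk that drops the lock after every `batch` nodes. The cursor may be
 * stale once the lock is retaken, so the generation is rechecked and the
 * walk restarts from the head if the list changed in the meantime.
 * `unlocked_hook` (may be NULL) runs in the unlocked window; it exists
 * only so that a concurrent writer can be simulated deterministically.
 * Returns the node count and stores the number of restarts in *restarts.
 */
static int count_batched(struct cache_list *l, int batch,
			 void (*unlocked_hook)(struct cache_list *),
			 int *restarts)
{
	struct node *p;
	unsigned long gen;
	int n, in_batch;

	assert(batch > 0 && restarts);
	*restarts = 0;
	pthread_mutex_lock(&l->lock);
restart:
	n = 0;
	in_batch = 0;
	gen = l->gen;
	for (p = l->head; p; p = p->next) {
		n++;
		if (++in_batch < batch)
			continue;
		in_batch = 0;
		pthread_mutex_unlock(&l->lock);
		if (unlocked_hook)
			unlocked_hook(l);
		pthread_mutex_lock(&l->lock);
		if (l->gen != gen) {
			/* p may point at an unlinked node: start over. */
			(*restarts)++;
			goto restart;
		}
	}
	pthread_mutex_unlock(&l->lock);
	return n;
}
```

Even this toy version needs a generation check and a restart path. A
walker that prints as it goes, instead of just counting, would also
have to deal with output already emitted before the restart.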
> Our internal implementation traverses the memcg tree and then
> traverses 'memcg->kmem_caches' within the slab_mutex (and
> cond_resched() after unlock).

For cgroup v1, setting the CONFIG_SLUB_DEBUG option allows you to
iterate and display slabinfo just for that particular memcg. I am
thinking of extending the debug controller to do a similar thing for
cgroup v2.

>> +	seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
>> +	seq_puts(m, " <active_slabs> <num_slabs>\n");
>> +	list_for_each_entry(s, &slab_root_caches, root_caches_node) {
>> +		/*
>> +		 * Skip kmem caches that don't have any memcg children.
>> +		 */
>> +		if (list_empty(&s->memcg_params.children))
>> +			continue;
>> +
>> +		memset(&sinfo, 0, sizeof(sinfo));
>> +		get_slabinfo(s, &sinfo);
>> +		seq_printf(m, "%-17s root %6lu %6lu %6lu %6lu\n",
>> +			   cache_name(s), sinfo.active_objs, sinfo.num_objs,
>> +			   sinfo.active_slabs, sinfo.num_slabs);
>> +
>> +		for_each_memcg_cache(c, s) {
>> +			struct cgroup_subsys_state *css;
>> +			char *dead = "";
>> +
>> +			css = &c->memcg_params.memcg->css;
>> +			if (!(css->flags & CSS_ONLINE))
>> +				dead = ":dead";
> Please note that Roman's kmem cache reparenting patch series have made
> kmem caches of zombie memcgs a bit tricky. On memcg offlining the
> memcg kmem caches are reparented and the css->id can get recycled. So,
> we want to know that a kmem cache is reparented and which memcg it
> belonged to initially. To determine if a kmem cache is reparented, we
> can store a flag on the kmem cache and for the previous memcg we can
> use fhandle. However to not make this more complicated, for now, we
> can just have the info that the kmem cache was reparented i.e. belongs
> to an offlined memcg.

I need to play with Roman's kmem cache reparenting patch a bit more to
see how to properly recognize a reparented kmem cache. What I have
noticed is that the dead kmem caches that I saw at boot up were gone
after applying his patch. So that is a good thing.

For now, I think the current patch is good enough for its purpose. I may
send a follow-up patch if I see something that can be improved.

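As a sketch of how the file could be consumed for that purpose, the
userspace helper below parses rows in the format shown in the patch
description and counts the ones tagged ":dead". The struct and function
names are hypothetical, and the buffer sizes are assumptions picked for
this example rather than limits taken from the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed row of memcg_slabinfo; field names mirror the header line. */
struct memcg_slab_row {
	char name[64];
	char css_id[32];	/* "root" or a css id, with ":dead" stripped */
	bool dead;
	unsigned long active_objs, num_objs, active_slabs, num_slabs;
};

/*
 * Parse one line of the form
 *   <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
 * Returns true on success, false for the '#' header or a malformed line.
 */
static bool parse_memcg_slab_row(const char *line, struct memcg_slab_row *row)
{
	char *tag;

	assert(line && row);
	if (line[0] == '#')
		return false;
	if (sscanf(line, "%63s %31s %lu %lu %lu %lu", row->name, row->css_id,
		   &row->active_objs, &row->num_objs,
		   &row->active_slabs, &row->num_slabs) != 6)
		return false;

	/* Split the optional ":dead" tag off the css id. */
	tag = strstr(row->css_id, ":dead");
	row->dead = tag != NULL;
	if (tag)
		*tag = '\0';
	return true;
}

/* Count the rows of a stream that belong to offlined ("dead") memcgs. */
static unsigned long count_dead_caches(FILE *fp)
{
	char line[256];
	struct memcg_slab_row row;
	unsigned long dead = 0;

	while (fgets(line, sizeof(line), fp))
		if (parse_memcg_slab_row(line, &row) && row.dead)
			dead++;
	return dead;
}
```

A caller would fopen() the memcg_slabinfo file under the debugfs mount
point (commonly /sys/kernel/debug, though that depends on the system)
and pass the stream to count_dead_caches().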
Cheers,
Longman