From: <gregkh@linuxfoundation.org>
To: akpm@linux-foundation.org,cl@linux.com,gregkh@linuxfoundation.org,iamjoonsoo.kim@lge.com,linux-mm@kvack.org,penberg@kernel.org,rientjes@google.com,songmuchun@bytedance.com,sunnanyong@huawei.com
Cc: <stable-commits@vger.kernel.org>
Subject: Patch "mm: slab: fix kmem_cache_create failed when sysfs node not destroyed" has been added to the 4.19-stable tree
Date: Thu, 22 Jul 2021 16:51:42 +0200 [thread overview]
Message-ID: <1626965502161250@kroah.com> (raw)
In-Reply-To: <20210720082048.2797315-1-sunnanyong@huawei.com>
This is a note to let you know that I've just added the patch titled
mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
to the 4.19-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
mm-slab-fix-kmem_cache_create-failed-when-sysfs-node-not-destroyed.patch
and it can be found in the queue-4.19 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From sunnanyong@huawei.com Thu Jul 22 16:42:41 2021
From: Nanyong Sun <sunnanyong@huawei.com>
Date: Tue, 20 Jul 2021 16:20:48 +0800
Subject: mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
To: <songmuchun@bytedance.com>, <cl@linux.com>, <penberg@kernel.org>, <rientjes@google.com>, <iamjoonsoo.kim@lge.com>, <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>, <stable@vger.kernel.org>
Message-ID: <20210720082048.2797315-1-sunnanyong@huawei.com>
From: Nanyong Sun <sunnanyong@huawei.com>
The commit d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root
kmem_cache destroy") introduced a problem: If one thread destroy a
kmem_cache A and another thread concurrently create a kmem_cache B,
which is mergeable with A and has same size with A, the B may fail to
create due to the duplicate sysfs node.
The scenario in detail:
1) Thread 1 uses kmem_cache_destroy() to destroy kmem_cache A which is
mergeable, it decreases A's refcount and if refcount is 0, then call
memcg_set_kmem_cache_dying() which set A->memcg_params.dying = true,
then unlock the slab_mutex and call flush_memcg_workqueue(), it may cost
a while.
Note: now the sysfs node(like '/kernel/slab/:0000248') of A is still
present, it will be deleted in shutdown_cache() which will be called
after flush_memcg_workqueue() is done and lock the slab_mutex again.
2) Now if thread 2 is coming, it use kmem_cache_create() to create B, which
is mergeable with A(their size is same), it gain the lock of slab_mutex,
then call __kmem_cache_alias() trying to find a mergeable node, because
of the below added code in commit d38a2b7a9c93 ("mm: memcg/slab: fix
memory leak at non-root kmem_cache destroy"), B is not mergeable with
A whose memcg_params.dying is true.
int slab_unmergeable(struct kmem_cache *s)
if (s->refcount < 0)
return 1;
/*
* Skip the dying kmem_cache.
*/
if (s->memcg_params.dying)
return 1;
return 0;
}
So B has to create its own sysfs node by calling:
create_cache->
__kmem_cache_create->
sysfs_slab_add->
kobject_init_and_add
Because B is mergeable itself, its filename of sysfs node is based on its size,
like '/kernel/slab/:0000248', which is duplicate with A, and the sysfs
node of A is still present now, so kobject_init_and_add() will return
fail and result in kmem_cache_create() fail.
Concurrently modprobe and rmmod the two modules below can reproduce the issue
quickly: nf_conntrack_expect, se_sess_cache. See call trace in the end.
LTS versions of v4.19.y and v5.4.y have this problem, whereas linux versions after
v5.9 do not have this problem because the patchset: ("The new cgroup slab memory
controller") almost refactored memcg slab.
A potential solution(this patch belongs): Just let the dying kmem_cache be mergeable,
the slab_mutex lock can prevent the race between alias kmem_cache creating thread
and root kmem_cache destroying thread. In the destroying thread, after
flush_memcg_workqueue() is done, judge the refcount again, if someone
reference it again during un-lock time, we don't need to destroy the kmem_cache
completely, we can reuse it.
Another potential solution: revert the commit d38a2b7a9c93 ("mm: memcg/slab:
fix memory leak at non-root kmem_cache destroy"), compare to the fail of
kmem_cache_create, the memory leak in special scenario seems less harmful.
Call trace:
sysfs: cannot create duplicate filename '/kernel/slab/:0000248'
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace+0x0/0x198
show_stack+0x24/0x30
dump_stack+0xb0/0x100
sysfs_warn_dup+0x6c/0x88
sysfs_create_dir_ns+0x104/0x120
kobject_add_internal+0xd0/0x378
kobject_init_and_add+0x90/0xd8
sysfs_slab_add+0x16c/0x2d0
__kmem_cache_create+0x16c/0x1d8
create_cache+0xbc/0x1f8
kmem_cache_create_usercopy+0x1a0/0x230
kmem_cache_create+0x50/0x68
init_se_kmem_caches+0x38/0x258 [target_core_mod]
target_core_init_configfs+0x8c/0x390 [target_core_mod]
do_one_initcall+0x54/0x230
do_init_module+0x64/0x1ec
load_module+0x150c/0x16f0
__se_sys_finit_module+0xf0/0x108
__arm64_sys_finit_module+0x24/0x30
el0_svc_common+0x80/0x1c0
el0_svc_handler+0x78/0xe0
el0_svc+0x10/0x260
kobject_add_internal failed for :0000248 with -EEXIST, don't try to register things with the same name in the same directory.
kmem_cache_create(se_sess_cache) failed with error -17
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace+0x0/0x198
show_stack+0x24/0x30
dump_stack+0xb0/0x100
kmem_cache_create_usercopy+0xa8/0x230
kmem_cache_create+0x50/0x68
init_se_kmem_caches+0x38/0x258 [target_core_mod]
target_core_init_configfs+0x8c/0x390 [target_core_mod]
do_one_initcall+0x54/0x230
do_init_module+0x64/0x1ec
load_module+0x150c/0x16f0
__se_sys_finit_module+0xf0/0x108
__arm64_sys_finit_module+0x24/0x30
el0_svc_common+0x80/0x1c0
el0_svc_handler+0x78/0xe0
el0_svc+0x10/0x260
Fixes: d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root kmem_cache destroy")
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/slab_common.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -311,14 +311,6 @@ int slab_unmergeable(struct kmem_cache *
if (s->refcount < 0)
return 1;
-#ifdef CONFIG_MEMCG_KMEM
- /*
- * Skip the dying kmem_cache.
- */
- if (s->memcg_params.dying)
- return 1;
-#endif
-
return 0;
}
@@ -918,6 +910,16 @@ void kmem_cache_destroy(struct kmem_cach
get_online_mems();
mutex_lock(&slab_mutex);
+
+ /*
+ * Another thread referenced it again
+ */
+ if (READ_ONCE(s->refcount)) {
+ spin_lock_irq(&memcg_kmem_wq_lock);
+ s->memcg_params.dying = false;
+ spin_unlock_irq(&memcg_kmem_wq_lock);
+ goto out_unlock;
+ }
#endif
err = shutdown_memcg_caches(s);
Patches currently in stable-queue which might be from sunnanyong@huawei.com are
queue-4.19/mm-slab-fix-kmem_cache_create-failed-when-sysfs-node-not-destroyed.patch
next prev parent reply other threads:[~2021-07-22 14:51 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-20 8:20 [PATCH v4.19.y,v5.4.y] mm: slab: fix kmem_cache_create failed when sysfs node not destroyed Nanyong Sun
2021-07-22 14:44 ` Greg KH
2021-07-22 14:51 ` gregkh [this message]
2021-07-22 14:52 ` Patch "mm: slab: fix kmem_cache_create failed when sysfs node not destroyed" has been added to the 5.4-stable tree gregkh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1626965502161250@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=linux-mm@kvack.org \
--cc=penberg@kernel.org \
--cc=rientjes@google.com \
--cc=songmuchun@bytedance.com \
--cc=stable-commits@vger.kernel.org \
--cc=sunnanyong@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.