From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Nanyong Sun <sunnanyong@huawei.com>
Subject: [PATCH 5.4 50/71] mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
Date: Thu, 22 Jul 2021 18:31:25 +0200
Message-ID: <20210722155619.545713554@linuxfoundation.org>
In-Reply-To: <20210722155617.865866034@linuxfoundation.org>
From: Nanyong Sun <sunnanyong@huawei.com>
The commit d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root
kmem_cache destroy") introduced a problem: if one thread destroys a
kmem_cache A while another thread concurrently creates a kmem_cache B
that is mergeable with A and has the same size as A, the creation of B
may fail because of a duplicate sysfs node.
The scenario in detail:
1) Thread 1 uses kmem_cache_destroy() to destroy kmem_cache A, which is
mergeable. It decrements A's refcount and, if the refcount reaches 0, calls
memcg_set_kmem_cache_dying(), which sets A->memcg_params.dying = true.
It then unlocks slab_mutex and calls flush_memcg_workqueue(), which may
take a while.
Note: at this point the sysfs node of A (like '/kernel/slab/:0000248') is
still present. It is deleted in shutdown_cache(), which is called only
after flush_memcg_workqueue() is done and slab_mutex has been locked again.
2) Now thread 2 comes in and uses kmem_cache_create() to create B, which
is mergeable with A (their sizes are the same). It takes slab_mutex and
calls __kmem_cache_alias() to look for a mergeable cache. Because
of the code below, added in commit d38a2b7a9c93 ("mm: memcg/slab: fix
memory leak at non-root kmem_cache destroy"), B is not merged with
A, whose memcg_params.dying is true.
int slab_unmergeable(struct kmem_cache *s)
{
	if (s->refcount < 0)
		return 1;
	/*
	 * Skip the dying kmem_cache.
	 */
	if (s->memcg_params.dying)
		return 1;
	return 0;
}
So B has to create its own sysfs node by calling:
create_cache->
__kmem_cache_create->
sysfs_slab_add->
kobject_init_and_add
Because B is itself mergeable, the filename of its sysfs node is based on its
size, like '/kernel/slab/:0000248', which duplicates that of A. The sysfs
node of A is still present at this point, so kobject_init_and_add() fails
and kmem_cache_create() fails as a result.
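
The name collision described above can be replayed in user space. The sketch below is only an illustration: every toy_* name is hypothetical, the directory is a flat list rather than sysfs, and the name is built from the size alone (the in-kernel naming of mergeable caches also takes cache flags into account, which is ignored here).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_NODES 16
#define TOY_NAME_LEN  16

/* Toy stand-in for the /sys/kernel/slab directory: a flat list of names. */
struct toy_dir {
	char names[TOY_MAX_NODES][TOY_NAME_LEN];
	int count;
};

/* Simplified assumption: a mergeable cache is named only by its size. */
static void toy_unique_id(char *buf, size_t len, unsigned int size)
{
	snprintf(buf, len, ":%07u", size);
}

/* Returns 0 on success, -EEXIST if the name is already present. */
static int toy_dir_add(struct toy_dir *dir, const char *name)
{
	assert(dir && name);
	for (int i = 0; i < dir->count; i++)
		if (strcmp(dir->names[i], name) == 0)
			return -EEXIST;
	if (dir->count == TOY_MAX_NODES)
		return -ENOSPC;
	snprintf(dir->names[dir->count], TOY_NAME_LEN, "%s", name);
	dir->count++;
	return 0;
}

/* Replays the failing order: A's node still exists when B is added. */
static int toy_replay_collision(void)
{
	struct toy_dir dir = { .count = 0 };
	char a[TOY_NAME_LEN], b[TOY_NAME_LEN];

	toy_unique_id(a, sizeof(a), 248);	/* cache A, size 248 */
	toy_unique_id(b, sizeof(b), 248);	/* cache B, same size */

	if (toy_dir_add(&dir, a) != 0)
		return -EINVAL;
	/* A is dying but not yet removed, so B's identical name collides. */
	return toy_dir_add(&dir, b);
}
```

Calling toy_replay_collision() returns -EEXIST, the same error code that appears as -17 in the kernel log in the call trace.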
Concurrently running modprobe and rmmod on the two modules below reproduces
the issue quickly: nf_conntrack_expect, se_sess_cache. See the call trace at
the end.
The LTS versions v4.19.y and v5.4.y have this problem, whereas Linux versions
after v5.9 do not, because the patchset "The new cgroup slab memory
controller" refactored most of the memcg slab code.
A potential solution (the one this patch implements): let the dying kmem_cache
remain mergeable; slab_mutex prevents the race between the thread creating an
alias kmem_cache and the thread destroying the root kmem_cache. In the
destroying thread, after flush_memcg_workqueue() is done, check the refcount
again: if someone referenced the cache again while the lock was dropped, we
don't need to destroy the kmem_cache completely and can reuse it.
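
The refcount recheck can be modelled without threads by replaying the interleaving in a fixed order. This is a hedged user-space sketch, not the kernel code: the toy_* names are hypothetical, locking is reduced to comments, and shutdown_cache() is represented by a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy cache: only the fields the race depends on. */
struct toy_cache {
	int refcount;
	bool dying;
	bool shut_down;	/* stands in for shutdown_cache() having run */
};

/*
 * Destroy, step 1 (would run under slab_mutex): drop one reference.
 * Returns true if the caller must go on to the unlocked flush and step 2.
 */
static bool toy_destroy_begin(struct toy_cache *s)
{
	assert(s->refcount > 0);
	if (--s->refcount)
		return false;
	s->dying = true;
	return true;
}

/*
 * Create path (would run under slab_mutex): with the fix, a dying cache
 * is still mergeable, so an alias simply takes a reference.
 */
static void toy_alias(struct toy_cache *s)
{
	s->refcount++;
}

/* Destroy, step 2 (after the flush, lock taken again): recheck refcount. */
static void toy_destroy_finish(struct toy_cache *s)
{
	if (s->refcount) {
		/* Another thread referenced it again: keep the cache. */
		s->dying = false;
		return;
	}
	s->shut_down = true;
}

/* Replays the race, optionally creating an alias in the unlocked window. */
static struct toy_cache toy_replay_race(bool alias_in_window)
{
	struct toy_cache a = { .refcount = 1 };

	if (toy_destroy_begin(&a)) {
		if (alias_in_window)
			toy_alias(&a);
		toy_destroy_finish(&a);
	}
	return a;
}
```

With alias_in_window set, the cache ends with refcount 1, dying cleared and no shutdown; without it, the cache is shut down as before.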
Another potential solution: revert commit d38a2b7a9c93 ("mm: memcg/slab:
fix memory leak at non-root kmem_cache destroy"). Compared to the failure of
kmem_cache_create(), the memory leak in a special scenario seems less harmful.
Call trace:
sysfs: cannot create duplicate filename '/kernel/slab/:0000248'
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace+0x0/0x198
show_stack+0x24/0x30
dump_stack+0xb0/0x100
sysfs_warn_dup+0x6c/0x88
sysfs_create_dir_ns+0x104/0x120
kobject_add_internal+0xd0/0x378
kobject_init_and_add+0x90/0xd8
sysfs_slab_add+0x16c/0x2d0
__kmem_cache_create+0x16c/0x1d8
create_cache+0xbc/0x1f8
kmem_cache_create_usercopy+0x1a0/0x230
kmem_cache_create+0x50/0x68
init_se_kmem_caches+0x38/0x258 [target_core_mod]
target_core_init_configfs+0x8c/0x390 [target_core_mod]
do_one_initcall+0x54/0x230
do_init_module+0x64/0x1ec
load_module+0x150c/0x16f0
__se_sys_finit_module+0xf0/0x108
__arm64_sys_finit_module+0x24/0x30
el0_svc_common+0x80/0x1c0
el0_svc_handler+0x78/0xe0
el0_svc+0x10/0x260
kobject_add_internal failed for :0000248 with -EEXIST, don't try to register things with the same name in the same directory.
kmem_cache_create(se_sess_cache) failed with error -17
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace+0x0/0x198
show_stack+0x24/0x30
dump_stack+0xb0/0x100
kmem_cache_create_usercopy+0xa8/0x230
kmem_cache_create+0x50/0x68
init_se_kmem_caches+0x38/0x258 [target_core_mod]
target_core_init_configfs+0x8c/0x390 [target_core_mod]
do_one_initcall+0x54/0x230
do_init_module+0x64/0x1ec
load_module+0x150c/0x16f0
__se_sys_finit_module+0xf0/0x108
__arm64_sys_finit_module+0x24/0x30
el0_svc_common+0x80/0x1c0
el0_svc_handler+0x78/0xe0
el0_svc+0x10/0x260
Fixes: d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root kmem_cache destroy")
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/slab_common.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -325,14 +325,6 @@ int slab_unmergeable(struct kmem_cache *
 	if (s->refcount < 0)
 		return 1;
 
-#ifdef CONFIG_MEMCG_KMEM
-	/*
-	 * Skip the dying kmem_cache.
-	 */
-	if (s->memcg_params.dying)
-		return 1;
-#endif
-
 	return 0;
 }
 
@@ -973,6 +965,16 @@ void kmem_cache_destroy(struct kmem_cach
 	get_online_mems();
 
 	mutex_lock(&slab_mutex);
+
+	/*
+	 * Another thread referenced it again
+	 */
+	if (READ_ONCE(s->refcount)) {
+		spin_lock_irq(&memcg_kmem_wq_lock);
+		s->memcg_params.dying = false;
+		spin_unlock_irq(&memcg_kmem_wq_lock);
+		goto out_unlock;
+	}
 #endif
 
 	err = shutdown_memcg_caches(s);