From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Reinette Chatre <reinette.chatre@intel.com>,
Xiaochen Shen <xiaochen.shen@intel.com>,
Borislav Petkov <bp@suse.de>, Tony Luck <tony.luck@intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 04/70] x86/resctrl: Fix a deadlock due to inaccurate reference
Date: Mon, 3 Feb 2020 16:19:16 +0000 [thread overview]
Message-ID: <20200203161912.983146506@linuxfoundation.org> (raw)
In-Reply-To: <20200203161912.158976871@linuxfoundation.org>
From: Xiaochen Shen <xiaochen.shen@intel.com>
commit 334b0f4e9b1b4a1d475f803419d202f6c5e4d18e upstream.
There is a race condition which results in a deadlock when rmdir and
mkdir execute concurrently:
$ ls /sys/fs/resctrl/c1/mon_groups/m1/
cpus cpus_list mon_data tasks
Thread 1: rmdir /sys/fs/resctrl/c1
Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1
3 locks held by mkdir/48649:
#0: (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
#1: (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c13b>] filename_create+0x7b/0x170
#2: (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70
4 locks held by rmdir/48652:
#0: (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
#1: (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c3cf>] do_rmdir+0x13f/0x1e0
#2: (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffb4c86d5d>] vfs_rmdir+0x4d/0x120
#3: (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70
Thread 1 is deleting control group "c1". Holding rdtgroup_mutex,
kernfs_remove() removes all kernfs nodes under directory "c1"
recursively, then waits for sub kernfs node "mon_groups" to drop active
reference.
Thread 2 is trying to create a subdirectory "m1" in the "mon_groups"
directory. The wrapper kernfs_iop_mkdir() takes an active reference to
the "mon_groups" directory but the code drops the active reference to
the parent directory "c1" instead.
As a result, Thread 1 is blocked on waiting for active reference to drop
and never release rdtgroup_mutex, while Thread 2 is also blocked on
trying to get rdtgroup_mutex.
Thread 1 (rdtgroup_rmdir) Thread 2 (rdtgroup_mkdir)
(rmdir /sys/fs/resctrl/c1) (mkdir /sys/fs/resctrl/c1/mon_groups/m1)
------------------------- -------------------------
kernfs_iop_mkdir
/*
* kn: "m1", parent_kn: "mon_groups",
* prgrp_kn: parent_kn->parent: "c1",
*
* "mon_groups", parent_kn->active++: 1
*/
kernfs_get_active(parent_kn)
kernfs_iop_rmdir
/* "c1", kn->active++ */
kernfs_get_active(kn)
rdtgroup_kn_lock_live
atomic_inc(&rdtgrp->waitcount)
/* "c1", kn->active-- */
kernfs_break_active_protection(kn)
mutex_lock
rdtgroup_rmdir_ctrl
free_all_child_rdtgrp
sentry->flags = RDT_DELETED
rdtgroup_ctrl_remove
rdtgrp->flags = RDT_DELETED
kernfs_get(kn)
kernfs_remove(rdtgrp->kn)
__kernfs_remove
/* "mon_groups", sub_kn */
atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active)
kernfs_drain(sub_kn)
/*
* sub_kn->active == KN_DEACTIVATED_BIAS + 1,
* waiting on sub_kn->active to drop, but it
* never drops in Thread 2 which is blocked
* on getting rdtgroup_mutex.
*/
Thread 1 hangs here ---->
wait_event(sub_kn->active == KN_DEACTIVATED_BIAS)
...
rdtgroup_mkdir
rdtgroup_mkdir_mon(parent_kn, prgrp_kn)
mkdir_rdt_prepare(parent_kn, prgrp_kn)
rdtgroup_kn_lock_live(prgrp_kn)
atomic_inc(&rdtgrp->waitcount)
/*
* "c1", prgrp_kn->active--
*
* The active reference on "c1" is
* dropped, but not matching the
* actual active reference taken
* on "mon_groups", thus causing
* Thread 1 to wait forever while
* holding rdtgroup_mutex.
*/
kernfs_break_active_protection(
prgrp_kn)
/*
* Trying to get rdtgroup_mutex
* which is held by Thread 1.
*/
Thread 2 hangs here ----> mutex_lock
...
The problem is that the creation of a subdirectory in the "mon_groups"
directory incorrectly releases the active protection of its parent
directory instead of itself before it starts waiting for rdtgroup_mutex.
This is triggered by the rdtgroup_mkdir() flow calling
rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the
parent control group ("c1") as argument. It should be called with kernfs
node "mon_groups" instead. What is currently missing is that the
kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp.
Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then
it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
instead. And then it operates on the same rdtgroup structure but handles
the active reference of kernfs node "mon_groups" to prevent deadlock.
The same changes are also made to the "mon_data" directories.
This results in some unused function parameters that will be cleaned up
in follow-up patch as the focus here is on the fix only in support of
backporting efforts.
Backporting notes:
Since upstream commit fa7d949337cc ("x86/resctrl: Rename and move rdt
files to a separate directory"), the file
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c has been renamed and moved to
arch/x86/kernel/cpu/resctrl/rdtgroup.c.
Apply the change against file arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
for older stable trees.
Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 77770caeea242..11c5accfa4db5 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -2005,7 +2005,7 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
if (rdt_mon_capable) {
ret = mongroup_create_dir(rdtgroup_default.kn,
- NULL, "mon_groups",
+ &rdtgroup_default, "mon_groups",
&kn_mongrp);
if (ret) {
dentry = ERR_PTR(ret);
@@ -2415,7 +2415,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
/*
* Create the mon_data directory first.
*/
- ret = mongroup_create_dir(parent_kn, NULL, "mon_data", &kn);
+ ret = mongroup_create_dir(parent_kn, prgrp, "mon_data", &kn);
if (ret)
return ret;
@@ -2568,7 +2568,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
uint files = 0;
int ret;
- prdtgrp = rdtgroup_kn_lock_live(prgrp_kn);
+ prdtgrp = rdtgroup_kn_lock_live(parent_kn);
rdt_last_cmd_clear();
if (!prdtgrp) {
ret = -ENODEV;
@@ -2643,7 +2643,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
kernfs_activate(kn);
/*
- * The caller unlocks the prgrp_kn upon success.
+ * The caller unlocks the parent_kn upon success.
*/
return 0;
@@ -2654,7 +2654,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
out_free_rgrp:
kfree(rdtgrp);
out_unlock:
- rdtgroup_kn_unlock(prgrp_kn);
+ rdtgroup_kn_unlock(parent_kn);
return ret;
}
@@ -2692,7 +2692,7 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
*/
list_add_tail(&rdtgrp->mon.crdtgrp_list, &prgrp->mon.crdtgrp_list);
- rdtgroup_kn_unlock(prgrp_kn);
+ rdtgroup_kn_unlock(parent_kn);
return ret;
}
@@ -2735,7 +2735,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
* Create an empty mon_groups directory to hold the subset
* of tasks and cpus to monitor.
*/
- ret = mongroup_create_dir(kn, NULL, "mon_groups", NULL);
+ ret = mongroup_create_dir(kn, rdtgrp, "mon_groups", NULL);
if (ret) {
rdt_last_cmd_puts("kernfs subdir error\n");
goto out_del_list;
@@ -2751,7 +2751,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
out_common_fail:
mkdir_rdt_prepare_clean(rdtgrp);
out_unlock:
- rdtgroup_kn_unlock(prgrp_kn);
+ rdtgroup_kn_unlock(parent_kn);
return ret;
}
--
2.20.1
next prev parent reply other threads:[~2020-02-03 16:31 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-03 16:19 [PATCH 4.19 00/70] 4.19.102-stable review Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 01/70] vfs: fix do_last() regression Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 02/70] x86/resctrl: Fix use-after-free when deleting resource groups Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 03/70] x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup Greg Kroah-Hartman
2020-02-03 16:19 ` Greg Kroah-Hartman [this message]
2020-02-03 16:19 ` [PATCH 4.19 05/70] crypto: pcrypt - Fix user-after-free on module unload Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 06/70] rsi: add hci detach for hibernation and poweroff Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 07/70] rsi: fix use-after-free on failed probe and unbind Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 08/70] perf c2c: Fix return type for histogram sorting comparision functions Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 09/70] PM / devfreq: Add new name attribute for sysfs Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 10/70] tools lib: Fix builds when glibc contains strlcpy() Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 11/70] arm64: kbuild: remove compressed images on make ARCH=arm64 (dist)clean Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 12/70] ext4: validate the debug_want_extra_isize mount option at parse time Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 13/70] mm/mempolicy.c: fix out of bounds write in mpol_parse_str() Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 14/70] reiserfs: Fix memory leak of journal device string Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 15/70] media: digitv: dont continue if remote control state cant be read Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 16/70] media: af9005: uninitialized variable printked Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 17/70] media: vp7045: do not read uninitialized values if usb transfer fails Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 18/70] media: gspca: zero usb_buf Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 19/70] media: dvb-usb/dvb-usb-urb.c: initialize actlen to 0 Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 20/70] tomoyo: Use atomic_t for statistics counter Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 21/70] ttyprintk: fix a potential deadlock in interrupt context issue Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 22/70] Bluetooth: Fix race condition in hci_release_sock() Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 23/70] cgroup: Prevent double killing of css when enabling threaded cgroup Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 24/70] media: si470x-i2c: Move free() past last use of radio Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 25/70] ARM: dts: sun8i: a83t: Correct USB3503 GPIOs polarity Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 26/70] ARM: dts: am57xx-beagle-x15/am57xx-idk: Remove "gpios" for endpoint dt nodes Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 27/70] ARM: dts: beagle-x15-common: Model 5V0 regulator Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 28/70] soc: ti: wkup_m3_ipc: Fix race condition with rproc_boot Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 29/70] tools lib traceevent: Fix memory leakage in filter_event Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 30/70] rseq: Unregister rseq for clone CLONE_VM Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 31/70] clk: sunxi-ng: h6-r: Fix AR100/R_APB2 parent order Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 32/70] mac80211: mesh: restrict airtime metric to peered established plinks Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 33/70] clk: mmp2: Fix the order of timer mux parents Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 34/70] ASoC: rt5640: Fix NULL dereference on module unload Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 35/70] ixgbevf: Remove limit of 10 entries for unicast filter list Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 36/70] ixgbe: Fix calculation of queue with VFs and flow director on interface flap Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 37/70] igb: Fix SGMII SFP module discovery for 100FX/LX Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 38/70] platform/x86: GPD pocket fan: Allow somewhat lower/higher temperature limits Greg Kroah-Hartman
2020-02-04 19:12 ` Pavel Machek
2020-02-03 16:19 ` [PATCH 4.19 39/70] ASoC: sti: fix possible sleep-in-atomic Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 40/70] qmi_wwan: Add support for Quectel RM500Q Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 41/70] parisc: Use proper printk format for resource_size_t Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 42/70] wireless: fix enabling channel 12 for custom regulatory domain Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 43/70] cfg80211: Fix radar event during another phy CAC Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 44/70] mac80211: Fix TKIP replay protection immediately after key setup Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 45/70] wireless: wext: avoid gcc -O3 warning Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 46/70] netfilter: nft_tunnel: ERSPAN_VERSION must not be null Greg Kroah-Hartman
2020-02-03 16:19 ` [PATCH 4.19 47/70] net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 48/70] bnxt_en: Fix ipv6 RFS filter matching logic Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 49/70] riscv: delete temporary files Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 50/70] iwlwifi: mvm: fix NVM check for 3168 devices Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 51/70] iwlwifi: Dont ignore the cap field upon mcc update Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 52/70] Input: aiptek - use descriptors of current altsetting Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 53/70] ARM: dts: am335x-boneblack-common: fix memory size Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 54/70] vti[6]: fix packet tx through bpf_redirect() Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 55/70] xfrm interface: " Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 56/70] xfrm: interface: do not confirm neighbor when do pmtu update Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 57/70] scsi: fnic: do not queue commands during fwreset Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 58/70] ARM: 8955/1: virt: Relax arch timer version check during early boot Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 59/70] tee: optee: Fix compilation issue with nommu Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 60/70] airo: Fix possible info leak in AIROOLDIOCTL/SIOCDEVPRIVATE Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 61/70] airo: Add missing CAP_NET_ADMIN check " Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 62/70] r8152: get default setting of WOL before initializing Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 63/70] ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1 Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 64/70] qlcnic: Fix CPU soft lockup while collecting firmware dump Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 65/70] powerpc/fsl/dts: add fsl,erratum-a011043 Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 66/70] net/fsl: treat fsl,erratum-a011043 Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 67/70] net: fsl/fman: rename IF_MODE_XGMII to IF_MODE_10G Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 68/70] seq_tab_next() should increase position index Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 69/70] l2t_seq_next " Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 4.19 70/70] net: Fix skb->csum update in inet_proto_csum_replace16() Greg Kroah-Hartman
2020-02-03 21:39 ` [PATCH 4.19 00/70] 4.19.102-stable review Jon Hunter
2020-02-04 12:37 ` Pavel Machek
2020-02-05 14:42 ` Greg Kroah-Hartman
2020-02-04 13:37 ` Naresh Kamboju
2020-02-04 17:19 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200203161912.983146506@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=bp@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=reinette.chatre@intel.com \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=xiaochen.shen@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).