From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Reinette Chatre <reinette.chatre@intel.com>,
Xiaochen Shen <xiaochen.shen@intel.com>,
Borislav Petkov <bp@suse.de>, Tony Luck <tony.luck@intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.5 03/23] x86/resctrl: Fix a deadlock due to inaccurate reference
Date: Mon, 3 Feb 2020 16:20:23 +0000 [thread overview]
Message-ID: <20200203161903.583723943@linuxfoundation.org> (raw)
In-Reply-To: <20200203161902.288335885@linuxfoundation.org>
From: Xiaochen Shen <xiaochen.shen@intel.com>
[ Upstream commit 334b0f4e9b1b4a1d475f803419d202f6c5e4d18e ]
There is a race condition which results in a deadlock when rmdir and
mkdir execute concurrently:
$ ls /sys/fs/resctrl/c1/mon_groups/m1/
cpus cpus_list mon_data tasks
Thread 1: rmdir /sys/fs/resctrl/c1
Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1
3 locks held by mkdir/48649:
#0: (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
#1: (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c13b>] filename_create+0x7b/0x170
#2: (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70
4 locks held by rmdir/48652:
#0: (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
#1: (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c3cf>] do_rmdir+0x13f/0x1e0
#2: (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffb4c86d5d>] vfs_rmdir+0x4d/0x120
#3: (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70
Thread 1 is deleting control group "c1". Holding rdtgroup_mutex,
kernfs_remove() removes all kernfs nodes under directory "c1"
recursively, then waits for sub kernfs node "mon_groups" to drop active
reference.
Thread 2 is trying to create a subdirectory "m1" in the "mon_groups"
directory. The wrapper kernfs_iop_mkdir() takes an active reference to
the "mon_groups" directory but the code drops the active reference to
the parent directory "c1" instead.
As a result, Thread 1 is blocked on waiting for active reference to drop
and never release rdtgroup_mutex, while Thread 2 is also blocked on
trying to get rdtgroup_mutex.
Thread 1 (rdtgroup_rmdir) Thread 2 (rdtgroup_mkdir)
(rmdir /sys/fs/resctrl/c1) (mkdir /sys/fs/resctrl/c1/mon_groups/m1)
------------------------- -------------------------
kernfs_iop_mkdir
/*
* kn: "m1", parent_kn: "mon_groups",
* prgrp_kn: parent_kn->parent: "c1",
*
* "mon_groups", parent_kn->active++: 1
*/
kernfs_get_active(parent_kn)
kernfs_iop_rmdir
/* "c1", kn->active++ */
kernfs_get_active(kn)
rdtgroup_kn_lock_live
atomic_inc(&rdtgrp->waitcount)
/* "c1", kn->active-- */
kernfs_break_active_protection(kn)
mutex_lock
rdtgroup_rmdir_ctrl
free_all_child_rdtgrp
sentry->flags = RDT_DELETED
rdtgroup_ctrl_remove
rdtgrp->flags = RDT_DELETED
kernfs_get(kn)
kernfs_remove(rdtgrp->kn)
__kernfs_remove
/* "mon_groups", sub_kn */
atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active)
kernfs_drain(sub_kn)
/*
* sub_kn->active == KN_DEACTIVATED_BIAS + 1,
* waiting on sub_kn->active to drop, but it
* never drops in Thread 2 which is blocked
* on getting rdtgroup_mutex.
*/
Thread 1 hangs here ---->
wait_event(sub_kn->active == KN_DEACTIVATED_BIAS)
...
rdtgroup_mkdir
rdtgroup_mkdir_mon(parent_kn, prgrp_kn)
mkdir_rdt_prepare(parent_kn, prgrp_kn)
rdtgroup_kn_lock_live(prgrp_kn)
atomic_inc(&rdtgrp->waitcount)
/*
* "c1", prgrp_kn->active--
*
* The active reference on "c1" is
* dropped, but not matching the
* actual active reference taken
* on "mon_groups", thus causing
* Thread 1 to wait forever while
* holding rdtgroup_mutex.
*/
kernfs_break_active_protection(
prgrp_kn)
/*
* Trying to get rdtgroup_mutex
* which is held by Thread 1.
*/
Thread 2 hangs here ----> mutex_lock
...
The problem is that the creation of a subdirectory in the "mon_groups"
directory incorrectly releases the active protection of its parent
directory instead of itself before it starts waiting for rdtgroup_mutex.
This is triggered by the rdtgroup_mkdir() flow calling
rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the
parent control group ("c1") as argument. It should be called with kernfs
node "mon_groups" instead. What is currently missing is that the
kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp.
Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then
it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
instead. And then it operates on the same rdtgroup structure but handles
the active reference of kernfs node "mon_groups" to prevent deadlock.
The same changes are also made to the "mon_data" directories.
This results in some unused function parameters that will be cleaned up
in follow-up patch as the focus here is on the fix only in support of
backporting efforts.
Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index dac7209a07084..e4da26325e3ea 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1970,7 +1970,7 @@ static int rdt_get_tree(struct fs_context *fc)
if (rdt_mon_capable) {
ret = mongroup_create_dir(rdtgroup_default.kn,
- NULL, "mon_groups",
+ &rdtgroup_default, "mon_groups",
&kn_mongrp);
if (ret < 0)
goto out_info;
@@ -2446,7 +2446,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
/*
* Create the mon_data directory first.
*/
- ret = mongroup_create_dir(parent_kn, NULL, "mon_data", &kn);
+ ret = mongroup_create_dir(parent_kn, prgrp, "mon_data", &kn);
if (ret)
return ret;
@@ -2645,7 +2645,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
uint files = 0;
int ret;
- prdtgrp = rdtgroup_kn_lock_live(prgrp_kn);
+ prdtgrp = rdtgroup_kn_lock_live(parent_kn);
if (!prdtgrp) {
ret = -ENODEV;
goto out_unlock;
@@ -2718,7 +2718,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
kernfs_activate(kn);
/*
- * The caller unlocks the prgrp_kn upon success.
+ * The caller unlocks the parent_kn upon success.
*/
return 0;
@@ -2729,7 +2729,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
out_free_rgrp:
kfree(rdtgrp);
out_unlock:
- rdtgroup_kn_unlock(prgrp_kn);
+ rdtgroup_kn_unlock(parent_kn);
return ret;
}
@@ -2767,7 +2767,7 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
*/
list_add_tail(&rdtgrp->mon.crdtgrp_list, &prgrp->mon.crdtgrp_list);
- rdtgroup_kn_unlock(prgrp_kn);
+ rdtgroup_kn_unlock(parent_kn);
return ret;
}
@@ -2810,7 +2810,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
* Create an empty mon_groups directory to hold the subset
* of tasks and cpus to monitor.
*/
- ret = mongroup_create_dir(kn, NULL, "mon_groups", NULL);
+ ret = mongroup_create_dir(kn, rdtgrp, "mon_groups", NULL);
if (ret) {
rdt_last_cmd_puts("kernfs subdir error\n");
goto out_del_list;
@@ -2826,7 +2826,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
out_common_fail:
mkdir_rdt_prepare_clean(rdtgrp);
out_unlock:
- rdtgroup_kn_unlock(prgrp_kn);
+ rdtgroup_kn_unlock(parent_kn);
return ret;
}
--
2.20.1
next prev parent reply other threads:[~2020-02-03 16:38 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-03 16:20 [PATCH 5.5 00/23] 5.5.2-stable review Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 01/23] vfs: fix do_last() regression Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 02/23] cifs: fix soft mounts hanging in the reconnect code Greg Kroah-Hartman
2020-02-03 16:20 ` Greg Kroah-Hartman [this message]
2020-02-03 16:20 ` [PATCH 5.5 04/23] x86/resctrl: Fix use-after-free when deleting resource groups Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 05/23] x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 06/23] KVM: PPC: Book3S PR: Fix -Werror=return-type build failure Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 07/23] gfs2: Another gfs2_find_jhead fix Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 08/23] lib/test_bitmap: correct test data offsets for 32-bit Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 09/23] perf c2c: Fix return type for histogram sorting comparision functions Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 10/23] PM / devfreq: Add new name attribute for sysfs Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 11/23] tools lib: Fix builds when glibc contains strlcpy() Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 12/23] arm64: kbuild: remove compressed images on make ARCH=arm64 (dist)clean Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 13/23] mm/mempolicy.c: fix out of bounds write in mpol_parse_str() Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 14/23] reiserfs: Fix memory leak of journal device string Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 15/23] media: digitv: dont continue if remote control state cant be read Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 16/23] media: af9005: uninitialized variable printked Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 17/23] media: vp7045: do not read uninitialized values if usb transfer fails Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 18/23] media: gspca: zero usb_buf Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 19/23] media: dvb-usb/dvb-usb-urb.c: initialize actlen to 0 Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 20/23] tomoyo: Use atomic_t for statistics counter Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 21/23] ttyprintk: fix a potential deadlock in interrupt context issue Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 22/23] Bluetooth: Fix race condition in hci_release_sock() Greg Kroah-Hartman
2020-02-03 16:20 ` [PATCH 5.5 23/23] cgroup: Prevent double killing of css when enabling threaded cgroup Greg Kroah-Hartman
[not found] ` <20200203161902.288335885-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
2020-02-03 21:40 ` [PATCH 5.5 00/23] 5.5.2-stable review Jon Hunter
2020-02-03 21:40 ` Jon Hunter
[not found] ` <10cd3c5f-0a2a-f73c-f071-17d1cc33531b-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2020-02-03 22:51 ` Greg Kroah-Hartman
2020-02-03 22:51 ` Greg Kroah-Hartman
2020-02-04 15:45 ` Naresh Kamboju
2020-02-05 13:07 ` Greg Kroah-Hartman
2020-02-08 16:13 ` Daniel Díaz
2020-02-04 17:20 ` Guenter Roeck
2020-02-04 22:56 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200203161903.583723943@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=bp@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=reinette.chatre@intel.com \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=xiaochen.shen@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.