public inbox for stable@vger.kernel.org
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	juri.lelli@arm.com, xlpang@redhat.com, rostedt@goodmis.org,
	mathieu.desnoyers@efficios.com, jdesfossez@efficios.com,
	dvhart@infradead.org, bristot@redhat.com,
	Ben Hutchings <ben@decadent.org.uk>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH 4.9 40/53] futex: Drop hb->lock before enqueueing on the rtmutex
Date: Mon, 29 Mar 2021 09:58:15 +0200
Message-ID: <20210329075608.827955071@linuxfoundation.org>
In-Reply-To: <20210329075607.561619583@linuxfoundation.org>

From: Peter Zijlstra <peterz@infradead.org>

commit 56222b212e8edb1cf51f5dd73ff645809b082b40 upstream.

When PREEMPT_RT_FULL does the spinlock -> rt_mutex substitution, the PI
chain code will (falsely) report a deadlock and BUG.

The problem is that it holds hb->lock (now an rt_mutex) while doing
task_blocks_on_rt_mutex on the futex's pi_state::rtmutex. This, when
interleaved just right with futex_unlock_pi(), leads it to believe it sees an
AB-BA deadlock.

  Task1 (holds rt_mutex,	Task2 (does FUTEX_LOCK_PI)
         does FUTEX_UNLOCK_PI)

				lock hb->lock
				lock rt_mutex (as per start_proxy)
  lock hb->lock

Which is a trivial AB-BA.

It is not an actual deadlock, because Task2 won't be holding hb->lock by the
time it actually blocks on the rt_mutex, but the chainwalk code doesn't
know that, and it would be a nightmare to handle this gracefully.
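To make the false positive concrete, here is a toy wait-for chain walk, loosely modelled on the lock -> owner -> blocked_on walk that rt_mutex_adjust_prio_chain() performs. It is a simplified sketch, not kernel code: struct toy_task, struct toy_lock and toy_chain_walk_deadlock() are hypothetical names, and real rt_mutex state (wait_lock, waiter trees, priority boosting) is omitted. With Task2 owning hb->lock while enqueued on the rt_mutex that Task1 owns, Task1's attempt to take hb->lock walks straight back to Task1 and is reported as a deadlock.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_lock;

/* Hypothetical, heavily simplified task: only what the walk needs. */
struct toy_task {
	struct toy_lock *blocked_on;   /* lock this task is enqueued on, if any */
};

/* Hypothetical, heavily simplified lock: just an owner pointer. */
struct toy_lock {
	struct toy_task *owner;        /* NULL if unowned */
};

/*
 * Would @task blocking on @lock close a cycle?  Follow
 * lock -> owner -> lock that owner is blocked on -> ... and report a
 * deadlock if the walk leads back to @task.  @max_depth bounds the walk.
 */
static bool toy_chain_walk_deadlock(const struct toy_task *task,
				    const struct toy_lock *lock, int max_depth)
{
	for (int depth = 0; lock && depth < max_depth; depth++) {
		const struct toy_task *owner = lock->owner;

		if (!owner)
			return false;          /* lock is free: chain ends */
		if (owner == task)
			return true;           /* walked back to ourselves */
		lock = owner->blocked_on;      /* continue along the chain */
	}
	return false;
}
```

Once hb->lock is released before Task2 enqueues (or Task2 still holds hb->lock but is not yet blocked on anything), the same walk ends without finding a cycle; that is the only kind of state the patch allows.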

To avoid this problem, do the same as in futex_unlock_pi() and drop
hb->lock after acquiring wait_lock. This still fully serializes against
futex_unlock_pi(), since adding to the wait_list does the very same lock
dance, and removing it holds both locks.
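The hand-off can be sketched in userspace with two pthread mutexes. This is an illustrative model under stated assumptions, not the kernel implementation: "outer" stands in for hb->lock, "inner" for pi_mutex.wait_lock, and toy_enqueue_handoff() / toy_remove() are hypothetical names. The lock side takes inner before dropping outer, so there is no window in which neither lock is held; the list is only modified under inner; and removal holds both locks, as described above.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Toy model of the lock hand-off (all names hypothetical):
 *   outer ~ hb->lock, inner ~ pi_mutex.wait_lock
 * Invariant: wait_list is only modified with inner held, and inner is
 * always taken while outer is held, so the lock order is outer -> inner.
 */
struct toy_waiter {
	struct toy_waiter *next;
};

struct toy_pi {
	pthread_mutex_t outer;
	pthread_mutex_t inner;
	struct toy_waiter *wait_list;
	int nr_waiters;
};

/* Lock side after the patch: take inner, drop outer, then enqueue. */
static void toy_enqueue_handoff(struct toy_pi *p, struct toy_waiter *w)
{
	pthread_mutex_lock(&p->outer);
	/* ... futex-side checks done under outer would go here ... */
	pthread_mutex_lock(&p->inner);
	pthread_mutex_unlock(&p->outer);   /* never a moment holding neither */

	w->next = p->wait_list;            /* modified under inner only */
	p->wait_list = w;
	p->nr_waiters++;

	pthread_mutex_unlock(&p->inner);
}

/* Removal (cleanup) side: holds both locks while modifying the list. */
static struct toy_waiter *toy_remove(struct toy_pi *p)
{
	struct toy_waiter *w;

	pthread_mutex_lock(&p->outer);
	pthread_mutex_lock(&p->inner);

	w = p->wait_list;
	if (w) {
		p->wait_list = w->next;
		p->nr_waiters--;
	}

	pthread_mutex_unlock(&p->inner);
	pthread_mutex_unlock(&p->outer);
	return w;
}
```

Because every path acquires outer before inner, the model has a single lock order and no AB-BA pair; releasing outer early only shortens how long it is held.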

Aside from solving the RT problem, this makes the lock and unlock mechanism
symmetric and reduces the hb->lock hold time.

Reported-and-tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: juri.lelli@arm.com
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104152.161341537@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/futex.c                  |   30 +++++++++++++++++-------
 kernel/locking/rtmutex.c        |   49 ++++++++++++++++++++++------------------
 kernel/locking/rtmutex_common.h |    3 ++
 3 files changed, 52 insertions(+), 30 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2948,20 +2948,33 @@ retry_private:
 		goto no_block;
 	}
 
+	rt_mutex_init_waiter(&rt_waiter);
+
 	/*
-	 * We must add ourselves to the rt_mutex waitlist while holding hb->lock
-	 * such that the hb and rt_mutex wait lists match.
+	 * On PREEMPT_RT_FULL, when hb->lock becomes an rt_mutex, we must not
+	 * hold it while doing rt_mutex_start_proxy_lock(), because then it will
+	 * include hb->lock in the blocking chain, even though we'll not in
+	 * fact hold it while blocking. This will lead it to report -EDEADLK
+	 * and BUG when futex_unlock_pi() interleaves with this.
+	 *
+	 * Therefore acquire wait_lock while holding hb->lock, but drop the
+	 * latter before calling rt_mutex_start_proxy_lock(). This still fully
+	 * serializes against futex_unlock_pi() as that does the exact same
+	 * lock handoff sequence.
 	 */
-	rt_mutex_init_waiter(&rt_waiter);
-	ret = rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
+	raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock);
+	spin_unlock(q.lock_ptr);
+	ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
+	raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
+
 	if (ret) {
 		if (ret == 1)
 			ret = 0;
 
+		spin_lock(q.lock_ptr);
 		goto no_block;
 	}
 
-	spin_unlock(q.lock_ptr);
 
 	if (unlikely(to))
 		hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
@@ -2974,6 +2987,9 @@ retry_private:
 	 * first acquire the hb->lock before removing the lock from the
 	 * rt_mutex waitqueue, such that we can keep the hb and rt_mutex
 	 * wait lists consistent.
+	 *
+	 * In particular, it is important that futex_unlock_pi() cannot
+	 * observe this inconsistency.
 	 */
 	if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
 		ret = 0;
@@ -3071,10 +3087,6 @@ retry:
 
 		get_pi_state(pi_state);
 		/*
-		 * Since modifying the wait_list is done while holding both
-		 * hb->lock and wait_lock, holding either is sufficient to
-		 * observe it.
-		 *
 		 * By taking wait_lock while still holding hb->lock, we ensure
 		 * there is no point where we hold neither; and therefore
 		 * wake_futex_pi() must observe a state consistent with what we
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1695,31 +1695,14 @@ void rt_mutex_proxy_unlock(struct rt_mut
 	rt_mutex_set_owner(lock, NULL);
 }
 
-/**
- * rt_mutex_start_proxy_lock() - Start lock acquisition for another task
- * @lock:		the rt_mutex to take
- * @waiter:		the pre-initialized rt_mutex_waiter
- * @task:		the task to prepare
- *
- * Returns:
- *  0 - task blocked on lock
- *  1 - acquired the lock for task, caller should wake it up
- * <0 - error
- *
- * Special API call for FUTEX_REQUEUE_PI support.
- */
-int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 			      struct rt_mutex_waiter *waiter,
 			      struct task_struct *task)
 {
 	int ret;
 
-	raw_spin_lock_irq(&lock->wait_lock);
-
-	if (try_to_take_rt_mutex(lock, task, NULL)) {
-		raw_spin_unlock_irq(&lock->wait_lock);
+	if (try_to_take_rt_mutex(lock, task, NULL))
 		return 1;
-	}
 
 	/* We enforce deadlock detection for futexes */
 	ret = task_blocks_on_rt_mutex(lock, waiter, task,
@@ -1738,12 +1721,36 @@ int rt_mutex_start_proxy_lock(struct rt_
 	if (unlikely(ret))
 		remove_waiter(lock, waiter);
 
-	raw_spin_unlock_irq(&lock->wait_lock);
-
 	debug_rt_mutex_print_deadlock(waiter);
 
 	return ret;
 }
+
+/**
+ * rt_mutex_start_proxy_lock() - Start lock acquisition for another task
+ * @lock:		the rt_mutex to take
+ * @waiter:		the pre-initialized rt_mutex_waiter
+ * @task:		the task to prepare
+ *
+ * Returns:
+ *  0 - task blocked on lock
+ *  1 - acquired the lock for task, caller should wake it up
+ * <0 - error
+ *
+ * Special API call for FUTEX_REQUEUE_PI support.
+ */
+int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+			      struct rt_mutex_waiter *waiter,
+			      struct task_struct *task)
+{
+	int ret;
+
+	raw_spin_lock_irq(&lock->wait_lock);
+	ret = __rt_mutex_start_proxy_lock(lock, waiter, task);
+	raw_spin_unlock_irq(&lock->wait_lock);
+
+	return ret;
+}
 
 /**
  * rt_mutex_next_owner - return the next owner of the lock
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -104,6 +104,9 @@ extern void rt_mutex_init_proxy_locked(s
 				       struct task_struct *proxy_owner);
 extern void rt_mutex_proxy_unlock(struct rt_mutex *lock);
 extern void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter);
+extern int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+				     struct rt_mutex_waiter *waiter,
+				     struct task_struct *task);
 extern int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 				     struct rt_mutex_waiter *waiter,
 				     struct task_struct *task);
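The rtmutex.c change follows the common kernel convention of a double-underscore variant that expects the caller to already hold the lock, plus a plain wrapper that takes and releases it. Below is a minimal sketch of that split, assuming a hypothetical struct toy_mutex whose wait_lock is a pthread mutex. The 1 / 0 / negative return convention mirrors the rt_mutex_start_proxy_lock() kernel-doc above, and the would_deadlock flag is a hypothetical stand-in for the result of deadlock detection.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an rt_mutex: a wait_lock plus owner state. */
struct toy_mutex {
	pthread_mutex_t wait_lock;
	bool owned;
	int nr_waiters;
};

/*
 * __toy_start_proxy() - caller must hold m->wait_lock.
 * Returns 1 if the lock was acquired for the task, 0 if the task was
 * enqueued as a waiter, or a negative errno on failure.
 */
static int __toy_start_proxy(struct toy_mutex *m, bool would_deadlock)
{
	if (!m->owned) {
		m->owned = true;
		return 1;
	}
	if (would_deadlock)
		return -EDEADLK;
	m->nr_waiters++;
	return 0;
}

/* toy_start_proxy() - wrapper for callers that do not hold wait_lock. */
static int toy_start_proxy(struct toy_mutex *m, bool would_deadlock)
{
	int ret;

	pthread_mutex_lock(&m->wait_lock);
	ret = __toy_start_proxy(m, would_deadlock);
	pthread_mutex_unlock(&m->wait_lock);
	return ret;
}
```

futex_lock_pi() needs the double-underscore form because it must take wait_lock while still holding hb->lock and drop hb->lock before the proxy enqueue; the existing FUTEX_REQUEUE_PI user keeps calling the wrapper unchanged.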



Thread overview: 59+ messages
2021-03-29  7:57 [PATCH 4.9 00/53] 4.9.264-rc1 review Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 01/53] net: fec: ptp: avoid register access when ipg clock is disabled Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 02/53] powerpc/4xx: Fix build errors from mfdcr() Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 03/53] atm: eni: dont release is never initialized Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 04/53] atm: lanai: dont run lanai_dev_close if not open Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 05/53] ixgbe: Fix memleak in ixgbe_configure_clsu32 Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 06/53] net: tehuti: fix error return code in bdx_probe() Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 07/53] sun/niu: fix wrong RXMAC_BC_FRM_CNT_COUNT count Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 08/53] nfs: fix PNFS_FLEXFILE_LAYOUT Kconfig default Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 09/53] NFS: Correct size calculation for create reply length Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 10/53] net: wan: fix error return code of uhdlc_init() Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 11/53] atm: uPD98402: fix incorrect allocation Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 12/53] atm: idt77252: fix null-ptr-dereference Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 13/53] u64_stats,lockdep: Fix u64_stats_init() vs lockdep Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 14/53] nfs: we dont support removing system.nfs4_acl Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 15/53] ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 16/53] ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 17/53] x86/tlb: Flush global mappings when KAISER is disabled Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 18/53] squashfs: fix inode lookup sanity checks Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 19/53] squashfs: fix xattr id and id " Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 20/53] arm64: dts: ls1043a: mark crypto engine dma coherent Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 21/53] bus: omap_l3_noc: mark l3 irqs as IRQF_NO_THREAD Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 22/53] macvlan: macvlan_count_rx() needs to be aware of preemption Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 23/53] net: dsa: bcm_sf2: Qualify phydev->dev_flags based on port Greg Kroah-Hartman
2021-03-29  7:57 ` [PATCH 4.9 24/53] e1000e: add rtnl_lock() to e1000_reset_task Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 25/53] e1000e: Fix error handling in e1000_set_d0_lplu_state_82571 Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 26/53] net/qlcnic: Fix a use after free in qlcnic_83xx_get_minidump_template Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 27/53] can: c_can_pci: c_can_pci_remove(): fix use-after-free Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 28/53] can: c_can: move runtime PM enable/disable to c_can_platform Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 29/53] can: m_can: m_can_do_rx_poll(): fix extraneous msg loss warning Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 30/53] mac80211: fix rate mask reset Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 31/53] net: cdc-phonet: fix data-interface release on probe failure Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 32/53] RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 33/53] ACPI: scan: Rearrange memory allocation in acpi_device_add() Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 34/53] ACPI: scan: Use unique number for instance_no Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 35/53] perf auxtrace: Fix auxtrace queue conflict Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 36/53] idr: add ida_is_empty Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 37/53] futex: Use smp_store_release() in mark_wake_futex() Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 38/53] futex,rt_mutex: Introduce rt_mutex_init_waiter() Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 39/53] futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock() Greg Kroah-Hartman
2021-03-29  7:58 ` Greg Kroah-Hartman [this message]
2021-03-29  7:58 ` [PATCH 4.9 41/53] futex: Avoid freeing an active timer Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 42/53] futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock() Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 43/53] futex: Handle early deadlock return correctly Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 44/53] futex: Fix (possible) missed wakeup Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 45/53] locking/futex: Allow low-level atomic operations to return -EAGAIN Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 46/53] arm64: futex: Bound number of LDXR/STXR loops in FUTEX_WAKE_OP Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 47/53] futex: Prevent robust futex exit race Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 48/53] futex: Fix incorrect should_fail_futex() handling Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 49/53] futex: Handle transient "ownerless" rtmutex state correctly Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 50/53] can: dev: Move device back to init netns on owning netns delete Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 51/53] net: sched: validate stab values Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 52/53] net: qrtr: fix a kernel-infoleak in qrtr_recvmsg() Greg Kroah-Hartman
2021-03-29  7:58 ` [PATCH 4.9 53/53] mac80211: fix double free in ibss_leave Greg Kroah-Hartman
2021-03-29 18:45 ` [PATCH 4.9 00/53] 4.9.264-rc1 review Florian Fainelli
2021-03-29 21:32 ` Guenter Roeck
2021-03-30  1:27 ` Shuah Khan
2021-03-30  7:05 ` Naresh Kamboju
2021-03-30  9:35 ` Jon Hunter
