From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Sasha Levin <sashal@kernel.org>,
rcu@vger.kernel.org
Subject: [PATCH AUTOSEL 5.10 09/39] rcu/tree: Handle VM stoppage in stall detection
Date: Sun, 5 Sep 2021 21:21:23 -0400
Message-ID: <20210906012153.929962-9-sashal@kernel.org>
In-Reply-To: <20210906012153.929962-1-sashal@kernel.org>
From: Sergey Senozhatsky <senozhatsky@chromium.org>
[ Upstream commit ccfc9dd6914feaa9a81f10f9cce56eb0f7712264 ]
The soft watchdog timer function checks whether the virtual machine
was suspended, in which case what looks like a lockup is in fact a
false positive.
This is what kvm_check_and_clear_guest_paused() does: it tests the
guest's PVCLOCK_GUEST_STOPPED flag (which is set by the host) and,
if the flag is set, touches all watchdogs and returns true so that
the caller bails out.
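A simplified userspace model of this test-and-clear step may help. It assumes a single flag byte standing in for the per-CPU pvclock page that the host shares with the guest; the struct and helper names (model_pvclock, model_touch_watchdogs(), model_check_and_clear_guest_paused()) are hypothetical, and only the bit value follows the pvclock ABI definition of PVCLOCK_GUEST_STOPPED as (1 << 1). It is a sketch of the idea, not the kvmclock implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit value used by the pvclock ABI for "guest stopped by host". */
#define PVCLOCK_GUEST_STOPPED (1 << 1)

/* Hypothetical stand-in for the per-CPU pvclock page shared with the host. */
struct model_pvclock {
	uint8_t flags;
};

/* How many times the model has "touched" the watchdogs. */
static int model_watchdog_touches;

/* Hypothetical stand-in for the real watchdog-touching helper. */
static void model_touch_watchdogs(void)
{
	model_watchdog_touches++;
}

/*
 * Simplified model: if the host set PVCLOCK_GUEST_STOPPED, clear it,
 * touch the watchdogs and report true. The flag is consumed, so only
 * the first caller after a pause sees true.
 */
static bool model_check_and_clear_guest_paused(struct model_pvclock *pv)
{
	if (!pv)
		return false;
	if (pv->flags & PVCLOCK_GUEST_STOPPED) {
		pv->flags &= (uint8_t)~PVCLOCK_GUEST_STOPPED;
		model_touch_watchdogs();
		return true;
	}
	return false;
}
```

Because the flag is consumed on the first successful check, every later caller on that vCPU sees it cleared. That one-shot behaviour is why the ordering between two IRQ-context watchdogs matters.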
The watchdog timer function runs from IRQ context, so the
PVCLOCK_GUEST_STOPPED check works fine there.
There is, however, one more watchdog that runs from IRQ context and
therefore races with the watchdog timer function, and it is not
aware of PVCLOCK_GUEST_STOPPED: the RCU stall detector.
  apic_timer_interrupt()
   smp_apic_timer_interrupt()
    hrtimer_interrupt()
     __hrtimer_run_queues()
      tick_sched_timer()
       tick_sched_handle()
        update_process_times()
         rcu_sched_clock_irq()
This triggers RCU stall warnings on our devices during VM resume.
If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
before watchdog_timer_fn()->kvm_check_and_clear_guest_paused(),
then nothing on this VCPU touches the watchdogs, and RCU reads a
stale grace-period stall timestamp together with the new jiffies
value, which makes it think that RCU has stalled.
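The ordering problem can be sketched with a small model, assuming one vCPU, a host-set flag and two checkers that run on the same tick. All names are hypothetical. As an assumption about the real code, touching the watchdogs is reduced to pushing RCU's stall deadline (jiffies_stall) ULONG_MAX / 2 into the future, loosely following rcu_cpu_stall_reset(), and the deadline test uses the same wrap-safe formula as ULONG_CMP_GE().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state for the model. */
struct model_vcpu {
	bool guest_stopped;          /* stands in for PVCLOCK_GUEST_STOPPED */
	unsigned long jiffies;       /* current time, jumps forward on resume */
	unsigned long jiffies_stall; /* deadline after which RCU warns */
	int rcu_stall_reports;       /* stall warnings printed by the model */
};

/* Wrap-safe "a >= b", the same formula as ULONG_CMP_GE(). */
static bool model_ulong_cmp_ge(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 >= a - b;
}

/* Watchdog touch, reduced to pushing RCU's deadline far ahead. */
static void model_rcu_cpu_stall_reset(struct model_vcpu *v)
{
	v->jiffies_stall = v->jiffies + ULONG_MAX / 2;
}

/* Flag-aware soft-lockup watchdog timer function. */
static void model_watchdog_timer_fn(struct model_vcpu *v)
{
	if (v->guest_stopped) {
		v->guest_stopped = false;
		model_rcu_cpu_stall_reset(v);
	}
}

/* Unpatched RCU check: never looks at the flag, trusts the deadline. */
static void model_rcu_check_unpatched(struct model_vcpu *v)
{
	if (model_ulong_cmp_ge(v->jiffies, v->jiffies_stall))
		v->rcu_stall_reports++;
}

/* Run both IRQ-context checkers for one tick in the given order. */
static void model_tick(struct model_vcpu *v, bool rcu_first)
{
	if (rcu_first) {
		model_rcu_check_unpatched(v);
		model_watchdog_timer_fn(v);
	} else {
		model_watchdog_timer_fn(v);
		model_rcu_check_unpatched(v);
	}
}
```

In this model, with a stale deadline of 3100 and jiffies at 50000 after resume, running the unpatched RCU check first yields one false report, while running the watchdog first yields none.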
Make the RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED so that
it does not report RCU stalls when the VM is resumed.
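Extending the same kind of model to the patched behaviour, under the same assumptions and with hypothetical names, suggests why the order stops mattering: whichever checker runs first consumes the flag and resets the stall deadline, so the RCU check no longer reports a false stall, while a genuine stall with no host pause is still reported.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state for the model. */
struct model_vcpu {
	bool guest_stopped;          /* stands in for PVCLOCK_GUEST_STOPPED */
	unsigned long jiffies;       /* current time, jumps forward on resume */
	unsigned long jiffies_stall; /* deadline after which RCU warns */
	int rcu_stall_reports;       /* stall warnings printed by the model */
};

/* Wrap-safe "a >= b", the same formula as ULONG_CMP_GE(). */
static bool model_ulong_cmp_ge(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 >= a - b;
}

/*
 * Consume the host's flag once; on success, "touch the watchdogs",
 * which here only means pushing RCU's stall deadline far ahead.
 */
static bool model_check_and_clear_guest_paused(struct model_vcpu *v)
{
	if (!v->guest_stopped)
		return false;
	v->guest_stopped = false;
	v->jiffies_stall = v->jiffies + ULONG_MAX / 2;
	return true;
}

/* Soft-lockup watchdog timer function: already flag-aware. */
static void model_watchdog_timer_fn(struct model_vcpu *v)
{
	(void)model_check_and_clear_guest_paused(v);
}

/* Patched RCU check: consult the flag before reporting a stall. */
static void model_rcu_check_patched(struct model_vcpu *v)
{
	if (!model_ulong_cmp_ge(v->jiffies, v->jiffies_stall))
		return; /* deadline not reached yet */
	if (model_check_and_clear_guest_paused(v))
		return; /* host stopped the VM: not a real stall */
	v->rcu_stall_reports++;
}

/* Run both IRQ-context checkers for one tick in the given order. */
static void model_tick(struct model_vcpu *v, bool rcu_first)
{
	if (rcu_first) {
		model_rcu_check_patched(v);
		model_watchdog_timer_fn(v);
	} else {
		model_watchdog_timer_fn(v);
		model_rcu_check_patched(v);
	}
}
```

In the actual patch the check sits after the cmpxchg() on rcu_state.jiffies_stall, so only the CPU that has won the right to print a warning consults PVCLOCK_GUEST_STOPPED; this model leaves that step out.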
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/rcu/tree_stall.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index ca21d28a0f98..0435e5e716a8 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -7,6 +7,8 @@
  * Author: Paul E. McKenney <paulmck@linux.ibm.com>
  */
 
+#include <linux/kvm_para.h>
+
 //////////////////////////////////////////////////////////////////////////////
 //
 // Controlling CPU stall warnings, including delay calculation.
@@ -633,6 +635,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
 	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like an RCU stall. Check to see if the host
+		 * stopped the vm.
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return;
+
 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall(gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
@@ -642,6 +652,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
 		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like an RCU stall. Check to see if the host
+		 * stopped the vm.
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return;
+
 		/* They had a few time units to dump stack, so complain. */
 		print_other_cpu_stall(gs2, gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
--
2.30.2
Thread overview: 39+ messages
2021-09-06 1:21 [PATCH AUTOSEL 5.10 01/39] locking/mutex: Fix HANDOFF condition Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 02/39] regmap: fix the offset of register error log Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 03/39] regulator: tps65910: Silence deferred probe error Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 04/39] crypto: mxs-dcp - Check for DMA mapping errors Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 05/39] sched/deadline: Fix reset_on_fork reporting of DL tasks Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 06/39] power: supply: axp288_fuel_gauge: Report register-address on readb / writeb errors Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 07/39] crypto: omap-sham - clear dma flags only after omap_sham_update_dma_stop() Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 08/39] sched/deadline: Fix missing clock update in migrate_task_rq_dl() Sasha Levin
2021-09-06 1:21 ` Sasha Levin [this message]
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 10/39] EDAC/mce_amd: Do not load edac_mce_amd module on guests Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 11/39] posix-cpu-timers: Force next expiration recalc after itimer reset Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 12/39] hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns() Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 13/39] hrtimer: Ensure timerfd notification for HIGHRES=n Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 14/39] udf: Check LVID earlier Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 15/39] udf: Fix iocharset=utf8 mount option Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 16/39] isofs: joliet: " Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 17/39] bcache: add proper error unwinding in bcache_device_init Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 18/39] blk-throtl: optimize IOPS throttle for large IO scenarios Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 19/39] nvme-tcp: don't update queue count when failing to set io queues Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 20/39] nvme-rdma: " Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 21/39] nvmet: pass back cntlid on successful completion Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 22/39] power: supply: smb347-charger: Add missing pin control activation Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 23/39] power: supply: max17042_battery: fix typo in MAx17042_TOFF Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 24/39] s390/cio: add dev_busid sysfs entry for each subchannel Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 25/39] s390/zcrypt: fix wrong offset index for APKA master key valid state Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 26/39] libata: fix ata_host_start() Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 27/39] crypto: omap - Fix inconsistent locking of device lists Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 28/39] crypto: qat - do not ignore errors from enable_vf2pf_comms() Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 29/39] crypto: qat - handle both source of interrupt in VF ISR Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 30/39] crypto: qat - fix reuse of completion variable Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 31/39] crypto: qat - fix naming for init/shutdown VF to PF notifications Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 32/39] crypto: qat - do not export adf_iov_putmsg() Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 33/39] fcntl: fix potential deadlock for &fasync_struct.fa_lock Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 34/39] udf_get_extendedattr() had no boundary checks Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 35/39] s390/kasan: fix large PMD pages address alignment check Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 36/39] s390/pci: fix misleading rc in clp_set_pci_fn() Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 37/39] s390/debug: keep debug data on resize Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 38/39] s390/debug: fix debug area life cycle Sasha Levin
2021-09-06 1:21 ` [PATCH AUTOSEL 5.10 39/39] s390/ap: fix state machine hang after failure to enable irq Sasha Levin