From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Tejun Heo <tj@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 12/67] wq: handle VM suspension in stall detection
Date: Mon, 14 Jun 2021 12:26:55 +0200
Message-ID: <20210614102644.196065247@linuxfoundation.org>
In-Reply-To: <20210614102643.797691914@linuxfoundation.org>
From: Sergey Senozhatsky <senozhatsky@chromium.org>
[ Upstream commit 940d71c6462e8151c78f28e4919aa8882ff2054e ]
If VCPU is suspended (VM suspend) in wq_watchdog_timer_fn() then
once this VCPU resumes it will see the new jiffies value, while it
may take a while before IRQ detects PVCLOCK_GUEST_STOPPED on this
VCPU and updates all the watchdogs via pvclock_touch_watchdogs().
There is a small chance of misreported WQ stalls in the meantime,
because new jiffies is time_after() old 'ts + thresh'.

    wq_watchdog_timer_fn()
    {
        for_each_pool(pool, pi) {
            if (time_after(jiffies, ts + thresh)) {
                pr_emerg("BUG: workqueue lockup - pool");
            }
        }
    }

Save jiffies at the beginning of this function and use that value
for stall detection. If VM gets suspended then we continue using
"old" jiffies value and old WQ touch timestamps. If IRQ at some
point restarts the stall detection cycle (pvclock_touch_watchdogs())
then old jiffies will always be before new 'ts + thresh'.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/workqueue.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1cc49340b68a..f278e2f584fd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -49,6 +49,7 @@
 #include <linux/uaccess.h>
 #include <linux/sched/isolation.h>
 #include <linux/nmi.h>
+#include <linux/kvm_para.h>
 
 #include "workqueue_internal.h"
 
@@ -5555,6 +5556,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 {
 	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
 	bool lockup_detected = false;
+	unsigned long now = jiffies;
 	struct worker_pool *pool;
 	int pi;
 
@@ -5569,6 +5571,12 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		if (list_empty(&pool->worklist))
 			continue;
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a stall.
+		 */
+		kvm_check_and_clear_guest_paused();
+
 		/* get the latest of pool and touched timestamps */
 		pool_ts = READ_ONCE(pool->watchdog_ts);
 		touched = READ_ONCE(wq_watchdog_touched);
@@ -5587,12 +5595,12 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		}
 
 		/* did we stall? */
-		if (time_after(jiffies, ts + thresh)) {
+		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
-				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+				jiffies_to_msecs(now - pool_ts) / 1000);
 		}
 	}
 
--
2.30.2