From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D482C43461 for ; Fri, 30 Apr 2021 05:54:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DE8B161459 for ; Fri, 30 Apr 2021 05:54:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DE8B161459 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7FAC56B009C; Fri, 30 Apr 2021 01:54:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7A9BE6B009D; Fri, 30 Apr 2021 01:54:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 64ABA6B009E; Fri, 30 Apr 2021 01:54:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0110.hostedemail.com [216.40.44.110]) by kanga.kvack.org (Postfix) with ESMTP id 4071A6B009C for ; Fri, 30 Apr 2021 01:54:38 -0400 (EDT) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 09E5945D8 for ; Fri, 30 Apr 2021 05:54:38 +0000 (UTC) X-FDA: 78087969036.04.5EC82B1 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf23.hostedemail.com (Postfix) with ESMTP id 724D5A0003B3 for ; Fri, 30 Apr 2021 05:54:32 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 9652861460; Fri, 30 Apr 2021 05:54:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1619762076; bh=Mw8gkNHqPVqA+cGNyW2hXpxDiyDL8O9mW18nNvh8dbI=; h=Date:From:To:Subject:In-Reply-To:From; b=NgzCDH2ZGuZZLgO0qyATlJZMUZMk0uvUx0KnSGB3+ZAUASLOE8UT+BYQR4646qBuQ q0csJmgAmD77hz8vb9G20r97302LvDmXJzokuUN7a1c6OIv9seu8YcqMAvX83m3Y+W 5rgfTWl3mtGOP2xIRRnHVNzU+nrAm10+NOdD+Sso= Date: Thu, 29 Apr 2021 22:54:36 -0700 From: Andrew Morton To: akpm@linux-foundation.org, linux-mm@kvack.org, loberman@redhat.com, mhocko@suse.com, mingo@kernel.org, mm-commits@vger.kernel.org, peterz@infradead.org, pmladek@suse.com, tglx@linutronix.de, torvalds@linux-foundation.org, vincent.whitchurch@axis.com Subject: [patch 027/178] watchdog: cleanup handling of false positives Message-ID: <20210430055436.MWryAEh4m%akpm@linux-foundation.org> In-Reply-To: <20210429225251.02b6386d21b69255b4f6c163@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 724D5A0003B3 Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=NgzCDH2Z; dmarc=none; spf=pass (imf23.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Stat-Signature: cn4wpor5nyxdo1qmzd4wcg4h3r8r4i1r Received-SPF: none (linux-foundation.org>: No applicable sender policy available) receiver=imf23; identity=mailfrom; envelope-from=""; helo=mail.kernel.org; client-ip=198.145.29.99 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1619762072-849602 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Petr Mladek Subject: watchdog: cleanup handling of false positives The commit d6ad3e286d2c075 ("softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume") introduced touch_softlockup_watchdog_sync(). It solved a problem when the watchdog was touched in an atomic context, the timer callback was proceed right after releasing interrupts, and the local clock has not been updated yet. In this case, sched_clock_tick() was called in watchdog_timer_fn() before updating the timer. So far so good. Later the commit 5d1c0f4a80a6df73 ("watchdog: add check for suspended vm in softlockup detector") added two kvm_check_and_clear_guest_paused() calls. They touch the watchdog when the guest has been sleeping. The code makes my head spin around. Scenario 1: + guest did sleep: + PVCLOCK_GUEST_STOPPED is set + 1st watchdog_timer_fn() invocation: + the watchdog is not touched yet + is_softlockup() returns too big delay + kvm_check_and_clear_guest_paused(): + clear PVCLOCK_GUEST_STOPPED + call touch_softlockup_watchdog_sync() + set SOFTLOCKUP_DELAY_REPORT + set softlockup_touch_sync + return from the timer callback + 2nd watchdog_timer_fn() invocation: + call sched_clock_tick() even though it is not needed. The timer callback was invoked again only because the clock has already been updated in the meantime. + call kvm_check_and_clear_guest_paused() that does nothing because PVCLOCK_GUEST_STOPPED has been cleared already. + call update_report_ts() and return. This is fine. Except that sched_clock_tick() might allow to set it already during the 1st invocation. Scenario 2: + guest did sleep + 1st watchdog_timer_fn() invocation + same as in 1st scenario + guest did sleep again: + set PVCLOCK_GUEST_STOPPED again + 2nd watchdog_timer_fn() invocation + SOFTLOCKUP_DELAY_REPORT is set from 1st invocation + call sched_clock_tick() + call kvm_check_and_clear_guest_paused() + clear PVCLOCK_GUEST_STOPPED + call touch_softlockup_watchdog_sync() + set SOFTLOCKUP_DELAY_REPORT + set softlockup_touch_sync + call update_report_ts() (set real timestamp immediately) + return from the timer callback + 3rd watchdog_timer_fn() invocation + timestamp is set from 2nd invocation + softlockup_touch_sync is set but not checked because the real timestamp is already set Make the code more straightforward: 1. Always call kvm_check_and_clear_guest_paused() at the very beginning to handle PVCLOCK_GUEST_STOPPED. It touches the watchdog when the quest did sleep. 2. Handle the situation when the watchdog has been touched (SOFTLOCKUP_DELAY_REPORT is set). Call sched_clock_tick() when touch_*sync() variant was used. It makes sure that the timestamp will be up to date even when it has been touched in atomic context or quest did sleep. As a result, kvm_check_and_clear_guest_paused() is called on a single location. And the right timestamp is always set when returning from the timer callback. Link: https://lkml.kernel.org/r/20210311122130.6788-7-pmladek@suse.com Signed-off-by: Petr Mladek Cc: Ingo Molnar Cc: Laurence Oberman Cc: Michal Hocko Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vincent Whitchurch Signed-off-by: Andrew Morton --- kernel/watchdog.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) --- a/kernel/watchdog.c~watchdog-cleanup-handling-of-false-positives +++ a/kernel/watchdog.c @@ -376,7 +376,14 @@ static enum hrtimer_restart watchdog_tim /* .. and repeat */ hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period)); - /* Reset the interval when touched externally by a known slow code. */ + /* + * If a virtual machine is stopped by the host it can look to + * the watchdog like a soft lockup. Check to see if the host + * stopped the vm before we process the timestamps. + */ + kvm_check_and_clear_guest_paused(); + + /* Reset the interval when touched by known problematic code. */ if (period_ts == SOFTLOCKUP_DELAY_REPORT) { if (unlikely(__this_cpu_read(softlockup_touch_sync))) { /* @@ -387,10 +394,7 @@ static enum hrtimer_restart watchdog_tim sched_clock_tick(); } - /* Clear the guest paused flag on watchdog reset */ - kvm_check_and_clear_guest_paused(); update_report_ts(); - return HRTIMER_RESTART; } @@ -403,14 +407,6 @@ static enum hrtimer_restart watchdog_tim duration = is_softlockup(touch_ts, period_ts); if (unlikely(duration)) { /* - * If a virtual machine is stopped by the host it can look to - * the watchdog like a soft lockup, check to see if the host - * stopped the vm before we issue the warning - */ - if (kvm_check_and_clear_guest_paused()) - return HRTIMER_RESTART; - - /* * Prevent multiple soft-lockup reports if one cpu is already * engaged in dumping all cpu back traces. */ _