From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sergey Senozhatsky, "Signed-off-by : Paul E .
 McKenney", Sasha Levin, rcu@vger.kernel.org
Subject: [PATCH AUTOSEL 5.10 09/39] rcu/tree: Handle VM stoppage in stall detection
Date: Sun, 5 Sep 2021 21:21:23 -0400
Message-Id: <20210906012153.929962-9-sashal@kernel.org>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210906012153.929962-1-sashal@kernel.org>
References: <20210906012153.929962-1-sashal@kernel.org>
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

From: Sergey Senozhatsky

[ Upstream commit ccfc9dd6914feaa9a81f10f9cce56eb0f7712264 ]

The soft watchdog timer function checks whether a virtual machine
was suspended, in which case what looks like a lockup is in fact
a false positive.

This is what kvm_check_and_clear_guest_paused() does: it tests the
guest's PVCLOCK_GUEST_STOPPED flag (which is set by the host) and,
if it is set, we need to touch all watchdogs and bail out.

The watchdog timer function runs from IRQ, so the
PVCLOCK_GUEST_STOPPED check works fine.

There is, however, one more watchdog that runs from IRQ, so the
watchdog timer fn races with it, and that watchdog is not aware
of PVCLOCK_GUEST_STOPPED: the RCU stall detector.

apic_timer_interrupt()
 smp_apic_timer_interrupt()
  hrtimer_interrupt()
   __hrtimer_run_queues()
    tick_sched_timer()
     tick_sched_handle()
      update_process_times()
       rcu_sched_clock_irq()

This triggers RCU stalls on our devices during VM resume.

If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
then there is nothing on this VCPU that touches watchdogs, and
RCU reads a stale gp stall timestamp and a new jiffies value,
which makes it think that RCU has stalled.

Make the RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
don't report RCU stalls when we resume the VM.

Signed-off-by: Sergey Senozhatsky
Signed-off-by: Paul E. McKenney
Signed-off-by: Sasha Levin
---
 kernel/rcu/tree_stall.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index ca21d28a0f98..0435e5e716a8 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -7,6 +7,8 @@
  * Author: Paul E. McKenney
  */
 
+#include <linux/kvm_para.h>
+
 //////////////////////////////////////////////////////////////////////////////
 //
 // Controlling CPU stall warnings, including delay calculation.
@@ -633,6 +635,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
 	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like an RCU stall. Check to see if the host
+		 * stopped the vm.
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return;
+
 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall(gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
@@ -642,6 +652,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
 		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like an RCU stall. Check to see if the host
+		 * stopped the vm.
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return;
+
 		/* They had a few time units to dump stack, so complain. */
 		print_other_cpu_stall(gs2, gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
-- 
2.30.2
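
For anyone who wants to see the race from the commit message in isolation,
here is a rough user-space sketch. It is not kernel code and not part of the
patch. Every name in it (struct vcpu_model, MODEL_STALL_PERIOD, the model_*
helpers) is hypothetical and only stands in for jiffies,
rcu_state.jiffies_stall, PVCLOCK_GUEST_STOPPED and
kvm_check_and_clear_guest_paused(). It assumes a single VCPU and ignores
jiffies wraparound and the cmpxchg() ordering that the real check_cpu_stall()
relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical re-arm interval for the stall deadline (not a kernel value). */
#define MODEL_STALL_PERIOD 100UL

/* Hypothetical per-VCPU state; these fields only imitate the kernel's. */
struct vcpu_model {
	unsigned long jiffies;       /* current guest time, in ticks */
	unsigned long jiffies_stall; /* deadline after which a stall is reported */
	bool guest_stopped;          /* stands in for PVCLOCK_GUEST_STOPPED */
};

/* The host stops the VM: guest time jumps forward and the flag is set. */
static void model_host_pause(struct vcpu_model *v, unsigned long paused_for)
{
	assert(paused_for > 0);
	v->jiffies += paused_for;
	v->guest_stopped = true;
}

/* Imitates kvm_check_and_clear_guest_paused(): test the flag, then clear it. */
static bool model_check_and_clear_guest_paused(struct vcpu_model *v)
{
	bool was_paused = v->guest_stopped;

	v->guest_stopped = false;
	return was_paused;
}

/*
 * Returns true if a stall would be reported on this tick.
 * pause_aware == false models the detector before the patch,
 * pause_aware == true models it with the added check.
 */
static bool model_check_cpu_stall(struct vcpu_model *v, bool pause_aware)
{
	if (v->jiffies < v->jiffies_stall)
		return false; /* deadline not reached, nothing to report */

	/* Push the deadline out, as the cmpxchg() in the real code does. */
	v->jiffies_stall = v->jiffies + MODEL_STALL_PERIOD;

	if (pause_aware && model_check_and_clear_guest_paused(v))
		return false; /* the host stopped the VM: not a real stall */

	return true; /* stale deadline plus new jiffies looks like a stall */
}
```

With a deadline of 200 and a pause that moves jiffies from 100 to 1100, the
first tick after resume reports a stall when pause_aware is false and reports
nothing when it is true, which is the behaviour change the patch is after.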