From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 15 Jan 2025 15:22:19 -0800
In-Reply-To: <167517494734.4906.17956886323824650289.tip-bot2@tip-bot2>
References: <20230126151323.702003578@infradead.org> <167517494734.4906.17956886323824650289.tip-bot2@tip-bot2>
Subject: Re: [tip: sched/core] sched/clock/x86: Mark sched_clock() noinstr
From: Sean Christopherson
To: linux-kernel@vger.kernel.org
Cc: linux-tip-commits@vger.kernel.org, "Peter Zijlstra (Intel)", Ingo Molnar,
	x86@kernel.org, kvm@vger.kernel.org, Paolo Bonzini
Content-Type: text/plain; charset="us-ascii"

+KVM and Paolo

On Tue, Jan 31, 2023, tip-bot2 for Peter Zijlstra wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID:     8739c6811572b087decd561f96382087402cc343
> Gitweb:        https://git.kernel.org/tip/8739c6811572b087decd561f96382087402cc343
> Author:        Peter Zijlstra
> AuthorDate:    Thu, 26 Jan 2023 16:08:36 +01:00
> Committer:     Ingo Molnar
> CommitterDate: Tue, 31 Jan 2023 15:01:47 +01:00
>
> sched/clock/x86: Mark sched_clock() noinstr
>
> In order to use sched_clock() from noinstr code, mark it and all its
> implementations noinstr.
>
> The whole pvclock thing (used by KVM/Xen) is a bit of a pain, since it
> calls out to watchdogs; create a pvclock_clocksource_read_nowd()
> variant that doesn't do that and can be noinstr.

...
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 16333ba..0f35d44 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -71,12 +71,12 @@ static int kvm_set_wallclock(const struct timespec64 *now)
>  	return -ENODEV;
>  }
>  
> -static u64 kvm_clock_read(void)
> +static noinstr u64 kvm_clock_read(void)
>  {
>  	u64 ret;
>  
>  	preempt_disable_notrace();
> -	ret = pvclock_clocksource_read(this_cpu_pvti());
> +	ret = pvclock_clocksource_read_nowd(this_cpu_pvti());
>  	preempt_enable_notrace();
>  	return ret;
>  }
> @@ -86,7 +86,7 @@ static u64 kvm_clock_get_cycles(struct clocksource *cs)
>  	return kvm_clock_read();
>  }
>  
> -static u64 kvm_sched_clock_read(void)
> +static noinstr u64 kvm_sched_clock_read(void)
>  {
>  	return kvm_clock_read() - kvm_sched_clock_offset;
>  }
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 5a2a517..56acf53 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -64,7 +64,8 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
>  	return flags & valid_flags;
>  }
>  
> -u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
> +static __always_inline
> +u64 __pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, bool dowd)
>  {
>  	unsigned version;
>  	u64 ret;
> @@ -77,7 +78,7 @@ u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>  		flags = src->flags;
>  	} while (pvclock_read_retry(src, version));
>  
> -	if (unlikely((flags & PVCLOCK_GUEST_STOPPED) != 0)) {
> +	if (dowd && unlikely((flags & PVCLOCK_GUEST_STOPPED) != 0)) {

This effectively broke sched_clock handling of PVCLOCK_GUEST_STOPPED, which was
added by commit d63285e94af3 ("pvclock: detect watchdog reset at pvclock read").

By skipping the watchdog goo in the kvm_clock_read() path, the watchdogs will
only be petted when kvmclock's pvclock_read_wallclock() is invoked, which AFAICT
is limited to guest boot and suspend+resume (in the guest).  I.e.
PVCLOCK_GUEST_STOPPED is ignored until the *guest* suspends, which completely
defeats the purpose of petting the watchdogs when reading the clock.

I'm guessing no one has noticed/complained because watchdog_timer_fn(),
wq_watchdog_timer_fn(), and check_cpu_stall() all manually handle a STOPPED
vCPU by invoking kvm_check_and_clear_guest_paused().  At a glance, only the
hung task detector isn't in on the game, but its doggie still gets petted so
long as one of the other watchdogs runs first.

One option would be to sprinkle noinstr everywhere, because despite
pvclock_touch_watchdogs() calling a lot of functions, all of those functions
are little more than one-liners, i.e. can be noinstr without significant
downsides.

Alternatively, we could explicitly revert commit d63285e94af3, and then bribe
someone into finding a better/common place to handle PVCLOCK_GUEST_STOPPED.
I am leaning heavily toward this alternative, because the way things are
headed with kvmclock, the kernel will stop using it for sched_clock[*], at
which point VMs that run on modern setups won't get the sched_clock behavior
anyway.

[*] https://lore.kernel.org/all/Z4gqlbumOFPF_rxd@google.com