public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: linux-kernel@vger.kernel.org
Cc: linux-tip-commits@vger.kernel.org,
	 "Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	x86@kernel.org,  kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [tip: sched/core] sched/clock/x86: Mark sched_clock() noinstr
Date: Wed, 15 Jan 2025 15:22:19 -0800	[thread overview]
Message-ID: <Z4hDK27OV7wK572A@google.com> (raw)
In-Reply-To: <167517494734.4906.17956886323824650289.tip-bot2@tip-bot2>

+KVM and Paolo

On Tue, Jan 31, 2023, tip-bot2 for Peter Zijlstra wrote:
> The following commit has been merged into the sched/core branch of tip:
> 
> Commit-ID:     8739c6811572b087decd561f96382087402cc343
> Gitweb:        https://git.kernel.org/tip/8739c6811572b087decd561f96382087402cc343
> Author:        Peter Zijlstra <peterz@infradead.org>
> AuthorDate:    Thu, 26 Jan 2023 16:08:36 +01:00
> Committer:     Ingo Molnar <mingo@kernel.org>
> CommitterDate: Tue, 31 Jan 2023 15:01:47 +01:00
> 
> sched/clock/x86: Mark sched_clock() noinstr
> 
> In order to use sched_clock() from noinstr code, mark it and all it's
> implenentations noinstr.
> 
> The whole pvclock thing (used by KVM/Xen) is a bit of a pain,
> since it calls out to watchdogs, create a
> pvclock_clocksource_read_nowd() variant doesn't do that and can be
> noinstr.

...

> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 16333ba..0f35d44 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -71,12 +71,12 @@ static int kvm_set_wallclock(const struct timespec64 *now)
>  	return -ENODEV;
>  }
>  
> -static u64 kvm_clock_read(void)
> +static noinstr u64 kvm_clock_read(void)
>  {
>  	u64 ret;
>  
>  	preempt_disable_notrace();
> -	ret = pvclock_clocksource_read(this_cpu_pvti());
> +	ret = pvclock_clocksource_read_nowd(this_cpu_pvti());
>  	preempt_enable_notrace();
>  	return ret;
>  }
> @@ -86,7 +86,7 @@ static u64 kvm_clock_get_cycles(struct clocksource *cs)
>  	return kvm_clock_read();
>  }
>  
> -static u64 kvm_sched_clock_read(void)
> +static noinstr u64 kvm_sched_clock_read(void)
>  {
>  	return kvm_clock_read() - kvm_sched_clock_offset;
>  }
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 5a2a517..56acf53 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -64,7 +64,8 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
>  	return flags & valid_flags;
>  }
>  
> -u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
> +static __always_inline
> +u64 __pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, bool dowd)
>  {
>  	unsigned version;
>  	u64 ret;
> @@ -77,7 +78,7 @@ u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>  		flags = src->flags;
>  	} while (pvclock_read_retry(src, version));
>  
> -	if (unlikely((flags & PVCLOCK_GUEST_STOPPED) != 0)) {
> +	if (dowd && unlikely((flags & PVCLOCK_GUEST_STOPPED) != 0)) {

This effectively broke sched_clock handling of PVCLOCK_GUEST_STOPPED, which was
added by commit d63285e94af3 ("pvclock: detect watchdog reset at pvclock read").

By skipping the watchdog goo in the kvm_clock_read() path, the watchdogs will only
be petted when kvmclock's pvclock_read_wallclock() is invoked, which AFAICT is
limited to guest boot and suspend+resume (in the guest).  I.e. PVCLOCK_GUEST_STOPPED
is ignored until the *guest* suspends, which completely defeats the purpose of
petting the watchdogs when reading the clock.

I'm guessing no one has noticed/complained because watchdog_timer_fn,
wq_watchdog_timer_fn(), and check_cpu_stall() all manually handling a STOPPED
vCPU by invoking kvm_check_and_clear_guest_paused().  At a glance, only the
hung task detector isn't in on the game, but its doggie still gets petted so
long as one of the other watchdogs runs first.

One option would be to sprinkle noinstr everywhere, because despite
pvclock_touch_watchdogs() calling a lot of functions, all of those functions
are little more than one-liners, i.e. can be noinstr without significant downsides.

Alternatively, we could explicitly revert commit d63285e94af3, and then bribe
someone into finding a better/common place to handle PVCLOCK_GUEST_STOPPED.
I am leaning heavily toward this alternative, because the way things are headed
with kvmclock is that the kernel will stop using it for sched_clock[*], at which
point VMs that run on modern setups won't get the sched_clock behavior anyways.

[*] https://lore.kernel.org/all/Z4gqlbumOFPF_rxd@google.com

  reply	other threads:[~2025-01-15 23:22 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-26 15:08 [PATCH v2 0/9] A few more cpuidle vs rcu fixes Peter Zijlstra
2023-01-26 15:08 ` [PATCH v2 1/9] drivers: firmware: psci: Dont instrument suspend code Peter Zijlstra
2023-01-31 14:22   ` [tip: sched/core] cpuidle: " tip-bot2 for Mark Rutland
2023-01-26 15:08 ` [PATCH v2 2/9] bug: Disable rcu_is_watching() during WARN/BUG Peter Zijlstra
2023-01-31 14:22   ` [tip: sched/core] cpuidle: lib/bug: " tip-bot2 for Peter Zijlstra
2023-01-26 15:08 ` [PATCH v2 3/9] tracing: Warn about !rcu_is_watching() Peter Zijlstra
2023-01-31 14:22   ` [tip: sched/core] cpuidle: " tip-bot2 for Peter Zijlstra
2023-01-26 15:08 ` [PATCH v2 4/9] tracing, preempt: Squash _rcuidle tracing Peter Zijlstra
2023-01-31  8:50   ` [PATCH v2.1 " Peter Zijlstra
2023-01-26 15:08 ` [PATCH v2 5/9] x86: Always inline arch_atomic64 Peter Zijlstra
2023-01-31 14:22   ` [tip: sched/core] x86/atomics: Always inline arch_atomic64*() tip-bot2 for Peter Zijlstra
2023-01-26 15:08 ` [PATCH v2 6/9] x86/pvclock: improve atomic update of last_value in pvclock_clocksource_read Peter Zijlstra
2023-01-26 15:08 ` [PATCH v2 7/9] x86: Mark sched_clock() noinstr Peter Zijlstra
2023-01-31 14:22   ` [tip: sched/core] sched/clock/x86: " tip-bot2 for Peter Zijlstra
2025-01-15 23:22     ` Sean Christopherson [this message]
2023-01-26 15:08 ` [PATCH v2 8/9] sched/clock: Make local_clock() noinstr Peter Zijlstra
2023-01-31 14:22   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-01-26 15:08 ` [PATCH v2 9/9] cpuidle: Fix poll_idle() noinstr annotation Peter Zijlstra
2023-01-29 23:23 ` [PATCH v2 0/9] A few more cpuidle vs rcu fixes Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z4hDK27OV7wK572A@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox