public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Suleiman Souhlal <suleiman@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	Chao Gao <chao.gao@intel.com>,
	 David Woodhouse <dwmw2@infradead.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 ssouhlal@freebsd.org
Subject: Re: [PATCH v3 1/3] kvm: Introduce kvm_total_suspend_ns().
Date: Wed, 5 Feb 2025 17:29:50 -0800	[thread overview]
Message-ID: <Z6QQjmJE4CiWtUpI@google.com> (raw)
In-Reply-To: <CABCjUKCDoHtLyX2CvrN+_D4N5ZiL2sLzyg+vY=LMkWZefrP_cA@mail.gmail.com>

On Wed, Feb 05, 2025, Suleiman Souhlal wrote:
> On Tue, Feb 4, 2025 at 4:58 PM Suleiman Souhlal <suleiman@google.com> wrote:
> > > Given that s2idle and standby don't reset host TSC, I think the right way to
> > > handle this conundrum is to address the flaw that's noted in the "backwards TSC"
> > > logic, and adjust guest TSC to be fully up-to-date in the S3 (or lower) case.
> > >
> > >          * ......................................  Unfortunately, we can't
> > >          * bring the TSCs fully up to date with real time, as we aren't yet far
> > >          * enough into CPU bringup that we know how much real time has actually
> > >          * elapsed; our helper function, ktime_get_boottime_ns() will be using boot
> > >          * variables that haven't been updated yet.
> > >
> > > I have no idea why commit 0dd6a6edb012 ("KVM: Dont mark TSC unstable due to S4
> > > suspend") hooked kvm_arch_enable_virtualization_cpu() instead of implementing a
> > > PM notifier, but I don't see anything that suggests it was deliberate, i.e. that
> > > KVm *needs* to effectively snapshot guest TSC when onlining CPUs.
> > >
> > > If that wart is fixed, then both kvmclock and TSC will account host suspend time,
> > > and KVM can safely account the suspend time into steal time regardless of which
> > > clock(s) the guest is using.
> >
> > I tried your suggestion of moving this to a PM notifier and I found
> > that it's possible for VCPUs to run after resume but before the PM
> > notifier has been called, because the resume notifiers get called
> > after tasks are unfrozen. Unfortunately that means that if we were to
> > do that, guest TSCs could go backwards.

Ugh.  That explains why KVM hooks the CPU online path.

> > However, I think it should be possible to keep the existing backwards
> > guest TSC prevention code but also use a notifier that further adjusts
> > the guest TSCs to advance time on suspends where the TSC did go
> > backwards. This would make both s2idle and deep suspends behave the
> > same way.
> 
> An alternative might be to block VCPUs from newly entering the guest
> between the pre and post suspend notifiers.
> Otherwise, some of the steal time accounting would have to be done in
> kvm_arch_enable_virtualization_cpu(), to make sure it gets applied on
> the first VCPU run, in case that happens before the resume notifier
> would have fired. But the comment there says we can't call
> ktime_get_boottime_ns() there, so maybe that's not possible.

I don't think the PM notifier approach is viable.  It's simply too late.  Without
a hook in CPU online, KVM can't even tell which VMs/vCPUs were running before the
suspend, i.e. which VMs need to be updated.

One idea would be to simply fast forward guest TSC to current time when the vCPU
is loaded after suspend+resume.  E.g. this hack appears to work.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dcd0c12c308e..27b25f8ca413 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4824,7 +4824,16 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
        /* Apply any externally detected TSC adjustments (due to suspend) */
        if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
-               adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
+               u64 kernel_ns, tsc_now;
+
+               if (kvm_get_time_and_clockread(&kernel_ns, &tsc_now)) {
+                       u64 l1_tsc = nsec_to_cycles(vcpu, vcpu->kvm->arch.kvmclock_offset + kernel_ns);
+                       u64 offset = kvm_compute_l1_tsc_offset(vcpu, l1_tsc);
+
+                       kvm_vcpu_write_tsc_offset(vcpu, offset);
+               } else {
+                       adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
+               }
                vcpu->arch.tsc_offset_adjustment = 0;
                kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
        }

  reply	other threads:[~2025-02-06  1:29 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-07  4:21 [PATCH v3 0/3] KVM: x86: Include host suspended time in steal time Suleiman Souhlal
2025-01-07  4:21 ` [PATCH v3 1/3] kvm: Introduce kvm_total_suspend_ns() Suleiman Souhlal
2025-01-07 15:27   ` Sean Christopherson
2025-01-07 16:43     ` Suleiman Souhlal
2025-01-08  7:15   ` kernel test robot
2025-01-08  8:36   ` kernel test robot
2025-01-15 21:49   ` Sean Christopherson
2025-01-17  6:35     ` Suleiman Souhlal
2025-01-17 16:52       ` Sean Christopherson
2025-01-21  5:37         ` Suleiman Souhlal
2025-01-21 20:22           ` Sean Christopherson
2025-02-04  7:58             ` Suleiman Souhlal
2025-02-05  5:55               ` Suleiman Souhlal
2025-02-06  1:29                 ` Sean Christopherson [this message]
2025-02-13  3:56                   ` Suleiman Souhlal
2025-01-07  4:22 ` [PATCH v3 2/3] KVM: x86: Include host suspended time in steal time Suleiman Souhlal
2025-01-07 15:37   ` Sean Christopherson
2025-01-08  4:05     ` Suleiman Souhlal
2025-01-08 15:17       ` Sean Christopherson
2025-01-07  4:22 ` [PATCH v3 3/3] KVM: x86: Document host suspend being included " Suleiman Souhlal
2025-01-07 15:37   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6QQjmJE4CiWtUpI@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=chao.gao@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=hpa@zytor.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=ssouhlal@freebsd.org \
    --cc=suleiman@google.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox