public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: stsp <stsp2@yandex.ru>
Cc: kvm@vger.kernel.org
Subject: Re: guest/host mem out of sync on core2duo?
Date: Thu, 17 Jun 2021 14:42:08 +0000	[thread overview]
Message-ID: <YMtfQHGJL7XP/0Rq@google.com> (raw)
In-Reply-To: <73f1f90e-f952-45a4-184e-1aafb3e4a8fd@yandex.ru>

Dropped my old @intel email to stop getting bounces.

On Mon, Jun 14, 2021, stsp wrote:
> 14.06.2021 20:06, Sean Christopherson пишет:
> > On Sun, Jun 13, 2021, stsp wrote:
> > > Hi kvm developers.
> > > 
> > > I am having the strange problem that can only be reproduced on a core2duo CPU
> > > but not AMD FX or Intel Core I7.
> > > 
> > > My code has 2 ways of setting the guest registers: one is the guest's ring0
> > > stub that just pops all regs from stack and does iret to ring3.  That works
> > > fine.  But sometimes I use KVM_SET_SREGS and resume the VM directly to ring3.
> > > That randomly results in either a good run or invalid guest state return, or
> > > a page fault in guest.
> > Hmm, a core2duo failure is more than likely due to lack of unrestricted guest.
> > You verify this by loading kvm_intel on the Core i7 with unrestricted_guest=0.
> 
> Wow, excellent shot!  Indeed, the problem then starts reproducing also there!
> So at least I now have a problematic setup myself, rather than needing to ask
> for ssh from everyone involved. :)
> 
> What does this mean to us, though?  That its completely unrelated to any
> memory synchronization?

Yes, more than likely this has nothing to do with memory synchronization.

> > > I tried to analyze when either of the above happens exactly, and I have a
> > > very strong suspection that the problem is in a way I update LDT. LDT is
> > > shared between guest and host with KVM_SET_USER_MEMORY_REGION, and I modify
> > > it on host.  So it seems like if I just allocated the new LDT entry, there is
> > > a risk of invalid guest state, as if the guest's LDT still doesn't have it.
> > > If I modified some LDT entry, there can be a page fault in guest, as if the
> > > entry is still old.
> > IIUC, you are updating the LDT itself, e.g. an FS/GS descriptor in the LDT, as
> > opposed to updating the LDT descriptor in the GDT?
> 
> I am updating the LDT itself, not modifying its descriptor in gdt. And with
> the same KVM_SET_SREGS call I also update the segregs to the new values, if
> needed.

Hmm, unconditionally calling KVM_SET_SREGS if you modify anything in the LDT
would be worth trying.  Or did I misunderstand the "if needed" part?

> > Either way, do you also update all relevant segments via KVM_SET_SREGS after
> > modifying memory?
> 
> Yes, if this is needed.  Sometimes its not needed, and when not - it seems
> page fault is more likely. If I also update segregs - then invalid guest
> state.  But these are just the statistical guesses so far.

Ah.  Hrm.  It would still be worth doing KVM_SET_SREGS unconditionally, e.g. it
would narrow the search if the page faults go away and the failures are always
invalid guest state.

> >     Best guess is that KVM doesn't detect that the VM has state
> > that needs to be emulated, or that KVM's internal register state and what's in
> > memory are not consistent.
> 
> Hope you know what parts are emulated w/o unrestricted guest, in which case
> we can advance. :)

It's not parts per se.  KVM needs to emulate "everything", one instruction at a
time, until guest state is no longer invalid with respec to the !unrestricted
rules.

> > Anyways, I highly doubt this is a memory synchronization issue, a corner case
> > related to lack of unrestricted guest is much more likely.
> 
> Just to be sure I tried the CD bit in CR0 to rule out the caching issues, and
> that changes nothing.  So...
>
> What to do next?

In addition to the above experiment, can you get a state dump for the invalid
guest state failure?  I.e. load kvm_intel with dump_invalid_vmcs=1.  And on that
failure, also provide the input to KVM_SET_SREGS.  The LDT in memory might also
be interesting, but it's hopefully unnecessary, especially if unconditionally
doing kVM_SET_SREGS makes the page faults go away.

Best case scenario is that KVM_SET_SREGS stuffs invalid guest state that KVM
doesn't correct detect.  That would be easy to debug and fix, and would give us
a regression test as well.

  reply	other threads:[~2021-06-17 14:42 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-12 22:49 guest/host mem out of sync on core2duo? stsp
2021-06-13 12:36 ` stsp
2021-06-14 17:06 ` Sean Christopherson
2021-06-14 17:32   ` stsp
2021-06-17 14:42     ` Sean Christopherson [this message]
2021-06-18 15:59       ` stsp
2021-06-18 21:07         ` Jim Mattson
2021-06-18 21:55           ` stsp
2021-06-18 22:06             ` Jim Mattson
2021-06-18 22:26               ` stsp
2021-06-18 22:32               ` Sean Christopherson
2021-06-19  0:11                 ` stsp
2021-06-19  0:54                   ` Sean Christopherson
2021-06-19  9:18                     ` stsp
2021-06-21  2:34           ` exception vs SIGALRM race (was: Re: guest/host mem out of sync on core2duo?) stsp
2021-06-21 22:33             ` Jim Mattson
2021-06-21 23:32               ` stsp
2021-06-22  0:27               ` stsp
2021-06-28 21:47                 ` Jim Mattson
2021-06-28 21:50                   ` stsp
2021-06-28 22:00                   ` stsp
2021-06-28 22:27                     ` Jim Mattson
2021-07-06 16:28                       ` Paolo Bonzini
2021-07-06 22:22                         ` stsp
2021-07-06 23:41                           ` Paolo Bonzini
2021-06-23 23:38               ` exception vs SIGALRM race (with test-case now!) stsp
2021-06-24  0:11                 ` stsp
2021-06-24  0:25                   ` stsp
2021-06-24 18:05                     ` exception vs SIGALRM race on core2 CPUs (with qemu-based test-case this time!) stsp
2021-06-24 18:07                     ` stsp
2021-06-25 23:35                       ` exception vs SIGALRM race on core2 CPUs (with fix!) stsp
2021-06-26  0:15                         ` Jim Mattson
2021-06-26  0:35                           ` stsp
2021-06-26 21:50                           ` stsp
2021-06-27 12:13                           ` stsp
2021-06-26 14:03               ` exception vs SIGALRM race (another patch) stsp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YMtfQHGJL7XP/0Rq@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=stsp2@yandex.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox