From: Alex Williamson <alex.williamson@redhat.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	kvm@vger.kernel.org, "Xiao Guangrong" <guangrong.xiao@gmail.com>
Subject: Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot
Date: Thu, 15 Aug 2019 09:23:24 -0600
Message-ID: <20190815092324.46bb3ac1@x1.home>
In-Reply-To: <20190813201914.GI13991@linux.intel.com>

On Tue, 13 Aug 2019 13:19:14 -0700
Sean Christopherson <sean.j.christopherson@intel.com> wrote:

> On Tue, Aug 13, 2019 at 01:33:16PM -0600, Alex Williamson wrote:
> > On Tue, 13 Aug 2019 11:57:37 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:  
> 
> > Could it be something with the gfn test:
> > 
> >                         if (sp->gfn != gfn)
> >                                 continue;
> > 
> > If I remove it, I can't trigger the misbehavior.  If I log it, I only
> > get hits on VM boot/reboot and some of the gfns look suspiciously like
> > they could be the assigned GPU BARs and maybe MSI mappings:
> > 
> >                (sp->gfn) != (gfn)  
> 
> Hits at boot/reboot makes sense, memslots get zapped when userspace
> removes a memory region/slot, e.g. remaps BARs and whatnot.
> 
> ...
>  
> > Is this gfn optimization correct?  Overzealous?  Doesn't account
> > correctly for something about MMIO mappings?  Thanks,  
> 
> Yes?  Shadow pages are stored in a hash table, for_each_valid_sp() walks
> all entries for a given gfn.  The sp->gfn check is there to skip entries
> that hashed to the same list but for a completely different gfn.
> 
> Skipping the gfn check would be sort of a lightweight zap all in the
> sense that it would zap shadow pages that happened to collide with the
> target memslot/gfn but are otherwise unrelated.
> 
> What happens if you give just the GPU BAR at 0x80000000 a pass, i.e.:
> 
> 	if (sp->gfn != gfn && sp->gfn != 0x80000)
> 		continue;
> 
> If that doesn't work, it might be worth trying other gfns to see if you
> can pinpoint which sp is being zapped as collateral damage.
> 
> It's possible there is a pre-existing bug somewhere else that was being
> hidden because KVM was effectively zapping all SPTEs during (re)boot,
> and the hash collision is also hiding the bug by zapping the stale entry.
> 
> Of course it's also possible this code is wrong, :-)
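
For anyone following along, here is a minimal userspace sketch of the
bucket walk described above. It is not the kernel code: struct sketch_sp,
sketch_hash() (which just keeps the low bits of the gfn), and the table
size are all assumptions made for illustration, and the real hash function
differs. It only shows why dropping the sp->gfn check zaps unrelated
entries that landed in the same bucket.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define SKETCH_HASH_BITS 12			/* assumed table size: 1 << 12 buckets */
#define SKETCH_HASH_SIZE (1u << SKETCH_HASH_BITS)

/* Hypothetical stand-in for a shadow page, not the kernel's struct. */
struct sketch_sp {
	gfn_t gfn;
	bool zapped;
	struct sketch_sp *next;		/* next entry in the same bucket */
};

/* Assumed hash: keep the low bits of the gfn. */
static unsigned int sketch_hash(gfn_t gfn)
{
	return (unsigned int)(gfn & (SKETCH_HASH_SIZE - 1));
}

/* Add @sp to the head of its bucket. */
static void sketch_insert(struct sketch_sp **table, struct sketch_sp *sp)
{
	unsigned int bucket = sketch_hash(sp->gfn);

	sp->next = table[bucket];
	table[bucket] = sp;
}

/*
 * Zap the entries in the bucket that @gfn hashes to.  With @check_gfn set,
 * entries that merely collided into the bucket are skipped; without it,
 * they are zapped as collateral damage.  Returns the number newly zapped.
 */
static int sketch_zap_gfn(struct sketch_sp **table, gfn_t gfn, bool check_gfn)
{
	struct sketch_sp *sp;
	int zapped = 0;

	for (sp = table[sketch_hash(gfn)]; sp; sp = sp->next) {
		if (check_gfn && sp->gfn != gfn)
			continue;
		if (!sp->zapped) {
			sp->zapped = true;
			zapped++;
		}
	}
	return zapped;
}
```

With this assumed hash, gfns 0x80000 and 0x40000 share a bucket, so a zap
of 0x80000 without the check also takes out the entry for 0x40000.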

Ok, fun day of trying to figure out which ranges are relevant.  I've
narrowed it down to all of these:

0xffe00
0xfee00
0xfec00
0xc1000
0x80a000
0x800000
0x100000

i.e. I can effectively only say that sp->gfn values of 0x0, 0x40000, and
0x80000 can take the continue branch without seeing bad behavior in the
VM.

The assigned GPU has BARs at GPAs:

0xc0000000-0xc0ffffff
0x800000000-0x808000000
0x808000000-0x809ffffff

And the assigned companion audio function is at GPA:

0xc1080000-0xc1083fff

Only one of those seems to align very well with a gfn base involved
here.  The virtio ethernet has an mmio range at GPA 0x80a000000,
otherwise I don't find any other I/O devices coincident with the gfns
above.
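
To make the comparison between the gfn list and the GPA ranges concrete,
here is a small sketch of the conversion, assuming 4KB base pages (a shift
of 12).  The sketch_ names are made up for illustration and are not kernel
helpers.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4KB base pages */

typedef uint64_t gpa_t;
typedef uint64_t gfn_t;

/* A gfn is the guest physical address with the page offset dropped. */
static gfn_t sketch_gpa_to_gfn(gpa_t gpa)
{
	return gpa >> SKETCH_PAGE_SHIFT;
}

/* Nonzero if @gfn lies within the inclusive GPA range [start, last]. */
static int sketch_gfn_in_range(gfn_t gfn, gpa_t start, gpa_t last)
{
	return gfn >= sketch_gpa_to_gfn(start) &&
	       gfn <= sketch_gpa_to_gfn(last);
}
```

Under that assumption, gfn 0x800000 is the base of the GPU BAR at
0x800000000 and gfn 0x80a000 is the base of the virtio mmio range, while
gfn 0xc1000 falls in the gap between the end of the first GPU BAR
(0xc0ffffff) and the start of the audio function (0xc1080000).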

I'm running the VM with 2MB hugepages, but I believe the issue still
occurs with standard pages.  When run with standard pages I see more
hits to gfn values 0, 0x40000, 0x80000, but the same number of hits to
the set above that cannot take the continue branch.  I don't know if
that means anything.

Any further ideas what to look for?  Thanks,

Alex

PS - I see the posted workaround patch, I'll test that in the interim.
