Re: [PATCH] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: David Woodhouse <dwmw@amazon.co.uk>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	 Paul Durrant <pdurrant@amazon.co.uk>
Subject: Re: [PATCH] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues
Date: Wed, 17 Jan 2024 08:09:44 -0800	[thread overview]
Message-ID: <Zaf7yCYt8XFuMhAd@google.com> (raw)
In-Reply-To: <9a82db197449bdb97ee889d2f3cdd7998abd9692.camel@amazon.co.uk>

On Fri, Jan 12, 2024, David Woodhouse wrote:
> This function can race with kvm_gpc_deactivate(). Since that function
> does not take the ->refresh_lock, it can wipe and unmap the pfn and
> khva while hva_to_pfn_retry() has dropped its write lock on gpc->lock.
> 
> Then if hva_to_pfn_retry() determines that the PFN hasn't changed and
> that it can re-use the old pfn and khva, they get assigned back to
> gpc->pfn and gpc->khva even though the khva was already unmapped by
> kvm_gpc_deactivate(). This leaves the cache in an apparently valid
> state but with ->khva pointing to an address which has been unmapped.
> Which in turn leads to oopses in e.g. __kvm_xen_has_interrupt() and
> set_shinfo_evtchn_pending().
> 
> It may be possible to fix this just by making kvm_gpc_deactivate()
> take the ->refresh_lock, but that still leaves ->refresh_lock being
> basically redundant with the write lock on ->lock, which frankly
> makes my skin itch, with the way that pfn_to_hva_retry() operates on
> fields in the gpc without holding ->lock.
> 
> Instead, fix it by cleaning up the semantis of hva_to_pfn_retry(). It
> no longer operates on the gpc object at all; it's called with a uhva
> and returns the corresponding pfn (pinned), and a mapped khva for it.
> 
> The calling function __kvm_gpc_refresh() now drops ->lock before calling
> hva_to_pfn_retry(), then retakes the lock before checking for changes,
> and discards the new mapping if it lost a race. And will correctly
> note the old pfn/khva to be unmapped at the right time, instead of
> preserving them in a local variable while dropping the lock.
> 
> The optimisation in hva_to_pfn_retry() where it attempts to use the
> old mapping if the pfn doesn't change is dropped, since it makes the
> pinning more complex. It's a pointless optimisation anyway, since the
> odds of the pfn ending up the same when the uhva has changed (i.e.
> the odds of the two userspace addresses both pointing to the same
> underlying physical page) are negligible,
> 
> I remain slightly confused because although this is clearly a race in
> the gfn_to_pfn_cache code, I don't quite know how the Xen support code
> actually managed to trigger it. We've seen oopses from dereferencing a
> valid-looking ->khva in both __kvm_xen_has_interrupt() (the vcpu_info)
> and in set_shinfo_evtchn_pending() (the shared_info). But surely the
> race shouldn't happen for the vcpu_info gpc because all calls to both
> refresh and deactivate hold the vcpu mutex, and it shouldn't happen

FWIW, neither kvm_xen_destroy_vcpu() nor kvm_xen_destroy_vm() holds the appropriate
mutex.  

> for the shared_info gpc because all calls to both will hold the
> kvm->arch.xen.xen_lock mutex.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> 
> This is based on (and in) my tree at
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xenfv
> which has all the other outstanding KVM/Xen fixes.
> 
>  virt/kvm/pfncache.c | 181 +++++++++++++++++++++-----------------------
>  1 file changed, 85 insertions(+), 96 deletions(-)

NAK, at least as a bug fix.  We've already shuffled deck chairs on the Titanic
several times, I have zero confidence that doing so one more time is going to
truly solve the underlying mess.

The contract with the gfn_to_pfn_cache, or rather the lack thereof, is all kinds
of screwed up.  E.g. I added the mutex in commit 93984f19e7bc ("KVM: Fully serialize
gfn=>pfn cache refresh via mutex") to guard against concurrent unmap(), but the
unmap() API has since been removed.  We need to define an actual contract instead
of continuing to throw noodles as the wall in the hope that something sticks.

As you note above, some other mutex _should_ be held.  I think we should lean
into that.  E.g.

  1. Pass in the guarding mutex to kvm_gpc_init() and assert that said mutex is
     held for __refresh(), activate(), and deactivate().
  2. Fix the cases where that doesn't hold true.
  3. Drop refresh_mutex

next prev parent reply	other threads:[~2024-01-17 16:09 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-12 20:38 [PATCH] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues Woodhouse, David
2024-01-17 16:09 ` Sean Christopherson [this message]
2024-01-17 16:32   ` Woodhouse, David
2024-01-17 16:51     ` Sean Christopherson
2024-01-17 16:59       ` David Woodhouse
2024-01-17 18:14         ` Sean Christopherson
2024-01-17 18:33           ` David Woodhouse
2024-01-23 15:06   ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zaf7yCYt8XFuMhAd@google.com \
    --to=seanjc@google.com \
    --cc=dwmw@amazon.co.uk \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=pdurrant@amazon.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.