public inbox for kvm@vger.kernel.org
* [PATCH] KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
@ 2023-01-10 15:25 David Woodhouse
  2023-01-11  9:37 ` [PATCH 2/3] KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest() David Woodhouse
  2023-01-11  9:37 ` [PATCH 3/3] KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule David Woodhouse
  0 siblings, 2 replies; 5+ messages in thread
From: David Woodhouse @ 2023-01-10 15:25 UTC (permalink / raw)
  To: Paolo Bonzini, paul, Sean Christopherson; +Cc: kvm, Peter Zijlstra

From: David Woodhouse <dwmw@amazon.co.uk>

In commit 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate
area") we declared it safe to obtain two gfn_to_pfn_cache locks at the same
time:
	/*
	 * The guest's runstate_info is split across two pages and we
	 * need to hold and validate both GPCs simultaneously. We can
	 * declare a lock ordering GPC1 > GPC2 because nothing else
	 * takes them more than one at a time.
	 */

However, we forgot to tell lockdep. Do so by setting a subclass on the
first lock before taking the second.

Fixes: 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate area")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kvm/xen.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 402d9a34552c..07e61cc9881e 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -305,8 +305,10 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 		 * The guest's runstate_info is split across two pages and we
 		 * need to hold and validate both GPCs simultaneously. We can
 		 * declare a lock ordering GPC1 > GPC2 because nothing else
-		 * takes them more than one at a time.
+		 * takes them more than one at a time. Set a subclass on the
+		 * gpc1 lock to make lockdep shut up about it.
 		 */
+		lock_set_subclass(&gpc1->lock.dep_map, 1, _THIS_IP_);
 		read_lock(&gpc2->lock);
 
 		if (!kvm_gpc_check(gpc2, user_len2)) {
-- 
2.35.3




* [PATCH 2/3] KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
  2023-01-10 15:25 [PATCH] KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking David Woodhouse
@ 2023-01-11  9:37 ` David Woodhouse
  2023-01-11 16:54   ` Peter Zijlstra
  2023-01-11  9:37 ` [PATCH 3/3] KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule David Woodhouse
  1 sibling, 1 reply; 5+ messages in thread
From: David Woodhouse @ 2023-01-11  9:37 UTC (permalink / raw)
  To: Paolo Bonzini, paul, Sean Christopherson
  Cc: kvm, Peter Zijlstra, Michal Luczaj

From: David Woodhouse <dwmw@amazon.co.uk>

The kvm_xen_update_runstate_guest() function can be called when the vCPU
is being scheduled out, from a preempt notifier. It *opportunistically*
updates the runstate area in the guest memory, if the gfn_to_pfn_cache
which caches the appropriate address is still valid.

If there is *contention* when it attempts to obtain gpc->lock, then
locking inside the priority inheritance checks may cause a deadlock.
Lockdep reports:

[13890.148997] Chain exists of:
                 &gpc->lock --> &p->pi_lock --> &rq->__lock

[13890.149002]  Possible unsafe locking scenario:

[13890.149003]        CPU0                    CPU1
[13890.149004]        ----                    ----
[13890.149005]   lock(&rq->__lock);
[13890.149007]                                lock(&p->pi_lock);
[13890.149009]                                lock(&rq->__lock);
[13890.149011]   lock(&gpc->lock);
[13890.149013]
                *** DEADLOCK ***

In the general case, if there's contention for a read lock on gpc->lock,
that's going to be because something else is either invalidating or
revalidating the cache. Either way, we've raced with seeing it in an
invalid state, in which case we would have aborted the opportunistic
update anyway.

So in the 'atomic' case when called from the preempt notifier, just
switch to using read_trylock() and avoid the PI handling altogether.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---

The first patch in this series was posted as '[PATCH] KVM: x86/xen: Fix
lockdep warning on "recursive" gpc locking'; the series has since grown
to three patches.


 arch/x86/kvm/xen.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 07e61cc9881e..c444948ab1ac 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -272,7 +272,12 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
         * Attempt to obtain the GPC lock on *both* (if there are two)
         * gfn_to_pfn caches that cover the region.
         */
-       read_lock_irqsave(&gpc1->lock, flags);
+       local_irq_save(flags);
+       if (!read_trylock(&gpc1->lock)) {
+               if (atomic)
+                       return;
+               read_lock(&gpc1->lock);
+       }
        while (!kvm_gpc_check(gpc1, user_len1)) {
                read_unlock_irqrestore(&gpc1->lock, flags);
 
@@ -283,7 +288,7 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
                if (kvm_gpc_refresh(gpc1, user_len1))
                        return;
 
-               read_lock_irqsave(&gpc1->lock, flags);
+               goto retry;
        }
 
        if (likely(!user_len2)) {
@@ -309,7 +314,13 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
                 * gpc1 lock to make lockdep shut up about it.
                 */
                lock_set_subclass(&gpc1->lock.dep_map, 1, _THIS_IP_);
-               read_lock(&gpc2->lock);
+               if (!read_trylock(&gpc2->lock)) {
+                       if (atomic) {
+                               read_unlock_irqrestore(&gpc1->lock, flags);
+                               return;
+                       }
+                       read_lock(&gpc2->lock);
+               }
 
                if (!kvm_gpc_check(gpc2, user_len2)) {
                        read_unlock(&gpc2->lock);
-- 
2.34.1




* [PATCH 3/3] KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
  2023-01-10 15:25 [PATCH] KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking David Woodhouse
  2023-01-11  9:37 ` [PATCH 2/3] KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest() David Woodhouse
@ 2023-01-11  9:37 ` David Woodhouse
  1 sibling, 0 replies; 5+ messages in thread
From: David Woodhouse @ 2023-01-11  9:37 UTC (permalink / raw)
  To: Paolo Bonzini, paul, Sean Christopherson
  Cc: kvm, Peter Zijlstra, Michal Luczaj

From: David Woodhouse <dwmw@amazon.co.uk>

Documentation/virt/kvm/locking.rst tells us that kvm->lock is taken outside
vcpu->mutex. But that doesn't actually happen very often; it's only in
some esoteric cases like migration with AMD SEV. This means that lockdep
usually doesn't notice, and doesn't do its job of keeping us honest.

Ensure that lockdep *always* knows about the ordering of these two locks,
by briefly taking vcpu->mutex in kvm_vm_ioctl_create_vcpu() while kvm->lock
is held.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 virt/kvm/kvm_main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 07bf29450521..5814037148bd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3924,6 +3924,13 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
        }
 
        mutex_lock(&kvm->lock);
+
+#ifdef CONFIG_LOCKDEP
+       /* Ensure that lockdep knows vcpu->mutex is taken *inside* kvm->lock */
+       mutex_lock(&vcpu->mutex);
+       mutex_unlock(&vcpu->mutex);
+#endif
+
        if (kvm_get_vcpu_by_id(kvm, id)) {
                r = -EEXIST;
                goto unlock_vcpu_destroy;
-- 
2.34.1




* Re: [PATCH 2/3] KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
  2023-01-11  9:37 ` [PATCH 2/3] KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest() David Woodhouse
@ 2023-01-11 16:54   ` Peter Zijlstra
  2023-01-11 17:43     ` David Woodhouse
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2023-01-11 16:54 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, paul, Sean Christopherson, kvm, Michal Luczaj

On Wed, Jan 11, 2023 at 09:37:50AM +0000, David Woodhouse wrote:
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 07e61cc9881e..c444948ab1ac 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -272,7 +272,12 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
>          * Attempt to obtain the GPC lock on *both* (if there are two)
>          * gfn_to_pfn caches that cover the region.
>          */
> -       read_lock_irqsave(&gpc1->lock, flags);
> +       local_irq_save(flags);
> +       if (!read_trylock(&gpc1->lock)) {
> +               if (atomic)
> +                       return;
> +               read_lock(&gpc1->lock);
> +       }
>         while (!kvm_gpc_check(gpc1, user_len1)) {
>                 read_unlock_irqrestore(&gpc1->lock, flags);
>  

There might be a problem with this pattern that would be alleviated if it
were written like:

	local_irq_save(flags);
	if (atomic) {
		if (!read_trylock(&gpc1->lock)) {
			local_irq_restore(flags);
			return;
		}
	} else {
		read_lock(&gpc1->lock);
	}

(also note you forgot the irq_restore on the exit path)

Specifically the problem is that trylock will not trigger the regular
lockdep machinery since it doesn't wait and hence cannot cause a
deadlock. With your form the trylock is the common case and lockdep will
only trigger (observe any potential cycles) if/when this hits
contention.

By using an unconditional read_lock() for the !atomic case this is
avoided.


* Re: [PATCH 2/3] KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
  2023-01-11 16:54   ` Peter Zijlstra
@ 2023-01-11 17:43     ` David Woodhouse
  0 siblings, 0 replies; 5+ messages in thread
From: David Woodhouse @ 2023-01-11 17:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, paul, Sean Christopherson, kvm, Michal Luczaj

On Wed, 2023-01-11 at 17:54 +0100, Peter Zijlstra wrote:
> On Wed, Jan 11, 2023 at 09:37:50AM +0000, David Woodhouse wrote:
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index 07e61cc9881e..c444948ab1ac 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -272,7 +272,12 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
> >          * Attempt to obtain the GPC lock on *both* (if there are two)
> >          * gfn_to_pfn caches that cover the region.
> >          */
> > -       read_lock_irqsave(&gpc1->lock, flags);
> > +       local_irq_save(flags);
> > +       if (!read_trylock(&gpc1->lock)) {
> > +               if (atomic)
> > +                       return;
> > +               read_lock(&gpc1->lock);
> > +       }
> >         while (!kvm_gpc_check(gpc1, user_len1)) {
> >                 read_unlock_irqrestore(&gpc1->lock, flags);
> >  
> 
> There might be a problem with this pattern that would be alleviated when
> written like:
> 
>         local_irq_save(flags);
>         if (atomic) {
>                 if (!read_trylock(&gpc1->lock)) {
>                         local_irq_restore(flags);
>                         return;
>                 }
>         } else {
>                 read_lock(&gpc1->lock);
>         }
> 
> (also note you forgot the irq_restore on the exit path)
> 
> Specifically the problem is that trylock will not trigger the regular
> lockdep machinery since it doesn't wait and hence cannot cause a
> deadlock. With your form the trylock is the common case and lockdep will
> only trigger (observe any potential cycles) if/when this hits
> contention.
> 
> By using an unconditional read_lock() for the !atomic case this is
> avoided.

Makes sense. I've done that in the version at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kvm-lockdep
which will be v2 of the series.

I also didn't bother changing the 'read_lock_irqrestore' in the while
loop to 'goto retry'. No point in that since we don't retry in that
'atomic' case anyway. And it raised questions about why it was even
still a 'while' loop... 

Thanks.

