linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene
@ 2013-11-07  6:22 Liu Ping Fan
  2013-11-07  6:22 ` [PATCH v2 2/2] powerpc/kvm: remove redundant assignment Liu Ping Fan
  2013-11-07  9:54 ` [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Alexander Graf
  0 siblings, 2 replies; 6+ messages in thread
From: Liu Ping Fan @ 2013-11-07  6:22 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
realmode, so it can trigger the deadlock.

Suppose the following scene:

Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of vcpus.

If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched out,
and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be caught in
realmode for a long time.

What makes things even worse if the following happens,
  On cpuM, bitlockX is hold, on cpuN, Y is hold.
  vcpu_B_2 try to lock Y on cpuM in realmode
  vcpu_A_2 try to lock X on cpuN in realmode

Oops! deadlock happens

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 043eec8..dbc1478 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -474,10 +474,13 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	}
 
 	/* Find the HPTE in the hash table */
+	preempt_disable();
 	index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
 					 HPTE_V_VALID | HPTE_V_ABSENT);
-	if (index < 0)
+	if (index < 0) {
+		preempt_enable();
 		return -ENOENT;
+	}
 	hptep = (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
 	v = hptep[0] & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
@@ -485,6 +488,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	/* Unlock the HPTE */
 	asm volatile("lwsync" : : : "memory");
 	hptep[0] = v;
+	preempt_enable();
 
 	gpte->eaddr = eaddr;
 	gpte->vpage = ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & 0xfff);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] powerpc/kvm: remove redundant assignment
  2013-11-07  6:22 [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Liu Ping Fan
@ 2013-11-07  6:22 ` Liu Ping Fan
  2013-11-07 10:06   ` Alexander Graf
  2013-11-07  9:54 ` [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Alexander Graf
  1 sibling, 1 reply; 6+ messages in thread
From: Liu Ping Fan @ 2013-11-07  6:22 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

ret is assigned twice with the same value, so remove the later one.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index dbc1478..9b97b42 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -733,7 +733,6 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	lock_rmap(rmap);
 
 	/* Check if we might have been invalidated; let the guest retry if so */
-	ret = RESUME_GUEST;
 	if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) {
 		unlock_rmap(rmap);
 		goto out_unlock;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene
  2013-11-07  6:22 [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Liu Ping Fan
  2013-11-07  6:22 ` [PATCH v2 2/2] powerpc/kvm: remove redundant assignment Liu Ping Fan
@ 2013-11-07  9:54 ` Alexander Graf
  2013-11-08  2:07   ` Liu ping fan
  1 sibling, 1 reply; 6+ messages in thread
From: Alexander Graf @ 2013-11-07  9:54 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc


On 07.11.2013, at 07:22, Liu Ping Fan <kernelfans@gmail.com> wrote:

> Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
> realmode, so it can trigger the deadlock.
>=20
> Suppose the following scene:
>=20
> Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of =
vcpus.
>=20
> If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched =
out,
> and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be =
caught in
> realmode for a long time.
>=20
> What makes things even worse if the following happens,
>  On cpuM, bitlockX is hold, on cpuN, Y is hold.
>  vcpu_B_2 try to lock Y on cpuM in realmode
>  vcpu_A_2 try to lock X on cpuN in realmode
>=20
> Oops! deadlock happens
>=20
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Very nice catch :).

I think it makes a lot of sense to document the fact that =
kvmppc_hv_find_lock_hpte() should be called with preemption disabled in =
a comment above the function, so that next time when someone potentially =
calls it, he knows that he needs to put preempt_disable() around it.

Thanks a lot for finding this pretty subtle issue. May I ask how you got =
there? Did you actually see systems deadlock because of this?


Alex

> ---
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>=20
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c =
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 043eec8..dbc1478 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -474,10 +474,13 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct =
kvm_vcpu *vcpu, gva_t eaddr,
> 	}
>=20
> 	/* Find the HPTE in the hash table */
> +	preempt_disable();
> 	index =3D kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
> 					 HPTE_V_VALID | HPTE_V_ABSENT);
> -	if (index < 0)
> +	if (index < 0) {
> +		preempt_enable();
> 		return -ENOENT;
> +	}
> 	hptep =3D (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
> 	v =3D hptep[0] & ~HPTE_V_HVLOCK;
> 	gr =3D kvm->arch.revmap[index].guest_rpte;
> @@ -485,6 +488,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct =
kvm_vcpu *vcpu, gva_t eaddr,
> 	/* Unlock the HPTE */
> 	asm volatile("lwsync" : : : "memory");
> 	hptep[0] =3D v;
> +	preempt_enable();
>=20
> 	gpte->eaddr =3D eaddr;
> 	gpte->vpage =3D ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & =
0xfff);
> --=20
> 1.8.1.4
>=20

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] powerpc/kvm: remove redundant assignment
  2013-11-07  6:22 ` [PATCH v2 2/2] powerpc/kvm: remove redundant assignment Liu Ping Fan
@ 2013-11-07 10:06   ` Alexander Graf
  2013-11-08  2:11     ` Liu ping fan
  0 siblings, 1 reply; 6+ messages in thread
From: Alexander Graf @ 2013-11-07 10:06 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc


On 07.11.2013, at 07:22, Liu Ping Fan <kernelfans@gmail.com> wrote:

> ret is assigned twice with the same value, so remove the later one.
>=20
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> Acked-by: Paul Mackerras <paulus@samba.org>

I suppose my last request for a patch description was slightly too =
abbreviated :). Sorry about that.

Imagine you are a Linux-stable maintainer. You have about 5000 patches =
in front of you and you want to figure out whether a patch should get =
backported into a stable tree or not.

It's very easy to read through the patch description.
It's reasonably easy to do a git show on the patch.
It's very hard to look at the actual surrounding code that was changed.

If I open a text editor on the file, I immediately see what you're =
saying:

>         ret =3D RESUME_GUEST;
>         preempt_disable();
>         while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
>                 cpu_relax();
>         if ((hptep[0] & ~HPTE_V_HVLOCK) !=3D hpte[0] || hptep[1] !=3D =
hpte[1] ||
>             rev->guest_rpte !=3D hpte[2])
>                 /* HPTE has been changed under us; let the guest retry =
*/
>                 goto out_unlock;
>         hpte[0] =3D (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
>=20
>         rmap =3D &memslot->arch.rmap[gfn - memslot->base_gfn];
>         lock_rmap(rmap);
>=20
>         /* Check if we might have been invalidated; let the guest =
retry if so */
>         ret =3D RESUME_GUEST;

However, that scope is not given in the actual patch itself. If you look =
at the diff below, you have no idea whether the patch is fixing a bug or =
just removes duplication and doesn't actually have any effect. In fact, =
the compiled assembly should be the same with this patch and without. =
But you can't tell from the diff below.

So what I would like to see in the patch description is something that =
makes it easy to understand what's going on without the need to check =
out the source file. Something like

> We redundantly set ret to RESUME_GUEST twice without changing it in =
between. Only do it once:
>=20
>         ret =3D RESUME_GUEST;
>         preempt_disable();
>         while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
>                 cpu_relax();
>         if ((hptep[0] & ~HPTE_V_HVLOCK) !=3D hpte[0] || hptep[1] !=3D =
hpte[1] ||
>             rev->guest_rpte !=3D hpte[2])
>                 /* HPTE has been changed under us; let the guest retry =
*/
>                 goto out_unlock;
>         hpte[0] =3D (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
>=20
>         rmap =3D &memslot->arch.rmap[gfn - memslot->base_gfn];
>         lock_rmap(rmap);
>=20
>         /* Check if we might have been invalidated; let the guest =
retry if so */
>         ret =3D RESUME_GUEST;


If I look at that patch description it immediately tells me "Ah, no need =
to worry, it's not a critical bug I need to backport". If you have a =
better idea how to express that I'm more than happy to take that too. =
Otherwise just let me know whether you like the description above and =
I'll modify it to the one that includes the code snippet when applying =
the patch.


Thanks a lot,

Alex

> ---
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 1 -
> 1 file changed, 1 deletion(-)
>=20
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c =
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index dbc1478..9b97b42 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -733,7 +733,6 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run =
*run, struct kvm_vcpu *vcpu,
> 	lock_rmap(rmap);
>=20
> 	/* Check if we might have been invalidated; let the guest retry =
if so */
> -	ret =3D RESUME_GUEST;
> 	if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) {
> 		unlock_rmap(rmap);
> 		goto out_unlock;
> --=20
> 1.8.1.4
>=20

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene
  2013-11-07  9:54 ` [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Alexander Graf
@ 2013-11-08  2:07   ` Liu ping fan
  0 siblings, 0 replies; 6+ messages in thread
From: Liu ping fan @ 2013-11-08  2:07 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc

On Thu, Nov 7, 2013 at 5:54 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 07.11.2013, at 07:22, Liu Ping Fan <kernelfans@gmail.com> wrote:
>
>> Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
>> realmode, so it can trigger the deadlock.
>>
>> Suppose the following scene:
>>
>> Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of v=
cpus.
>>
>> If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched o=
ut,
>> and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be caugh=
t in
>> realmode for a long time.
>>
>> What makes things even worse if the following happens,
>>  On cpuM, bitlockX is hold, on cpuN, Y is hold.
>>  vcpu_B_2 try to lock Y on cpuM in realmode
>>  vcpu_A_2 try to lock X on cpuN in realmode
>>
>> Oops! deadlock happens
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> Very nice catch :).
>
> I think it makes a lot of sense to document the fact that kvmppc_hv_find_=
lock_hpte() should be called with preemption disabled in a comment above th=
e function, so that next time when someone potentially calls it, he knows t=
hat he needs to put preempt_disable() around it.
>
Ok, I will document them in v3

> Thanks a lot for finding this pretty subtle issue. May I ask how you got =
there? Did you actually see systems deadlock because of this?
>
Intuition :). then I begin try to model a scene which causes the
deadlock. And fortunately, I find a case to verify my suspension.

Regards,
Pingfan
>
> Alex
>
>> ---
>> arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book=
3s_64_mmu_hv.c
>> index 043eec8..dbc1478 100644
>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> @@ -474,10 +474,13 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kv=
m_vcpu *vcpu, gva_t eaddr,
>>       }
>>
>>       /* Find the HPTE in the hash table */
>> +     preempt_disable();
>>       index =3D kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
>>                                        HPTE_V_VALID | HPTE_V_ABSENT);
>> -     if (index < 0)
>> +     if (index < 0) {
>> +             preempt_enable();
>>               return -ENOENT;
>> +     }
>>       hptep =3D (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
>>       v =3D hptep[0] & ~HPTE_V_HVLOCK;
>>       gr =3D kvm->arch.revmap[index].guest_rpte;
>> @@ -485,6 +488,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_=
vcpu *vcpu, gva_t eaddr,
>>       /* Unlock the HPTE */
>>       asm volatile("lwsync" : : : "memory");
>>       hptep[0] =3D v;
>> +     preempt_enable();
>>
>>       gpte->eaddr =3D eaddr;
>>       gpte->vpage =3D ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & 0xfff)=
;
>> --
>> 1.8.1.4
>>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] powerpc/kvm: remove redundant assignment
  2013-11-07 10:06   ` Alexander Graf
@ 2013-11-08  2:11     ` Liu ping fan
  0 siblings, 0 replies; 6+ messages in thread
From: Liu ping fan @ 2013-11-08  2:11 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc

On Thu, Nov 7, 2013 at 6:06 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 07.11.2013, at 07:22, Liu Ping Fan <kernelfans@gmail.com> wrote:
>
>> ret is assigned twice with the same value, so remove the later one.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> Acked-by: Paul Mackerras <paulus@samba.org>
>
> I suppose my last request for a patch description was slightly too abbrev=
iated :). Sorry about that.
>
> Imagine you are a Linux-stable maintainer. You have about 5000 patches in=
 front of you and you want to figure out whether a patch should get backpor=
ted into a stable tree or not.
>
> It's very easy to read through the patch description.
> It's reasonably easy to do a git show on the patch.
> It's very hard to look at the actual surrounding code that was changed.
>
> If I open a text editor on the file, I immediately see what you're saying=
:
>
>>         ret =3D RESUME_GUEST;
>>         preempt_disable();
>>         while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
>>                 cpu_relax();
>>         if ((hptep[0] & ~HPTE_V_HVLOCK) !=3D hpte[0] || hptep[1] !=3D hp=
te[1] ||
>>             rev->guest_rpte !=3D hpte[2])
>>                 /* HPTE has been changed under us; let the guest retry *=
/
>>                 goto out_unlock;
>>         hpte[0] =3D (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
>>
>>         rmap =3D &memslot->arch.rmap[gfn - memslot->base_gfn];
>>         lock_rmap(rmap);
>>
>>         /* Check if we might have been invalidated; let the guest retry =
if so */
>>         ret =3D RESUME_GUEST;
>
> However, that scope is not given in the actual patch itself. If you look =
at the diff below, you have no idea whether the patch is fixing a bug or ju=
st removes duplication and doesn't actually have any effect. In fact, the c=
ompiled assembly should be the same with this patch and without. But you ca=
n't tell from the diff below.
>
> So what I would like to see in the patch description is something that ma=
kes it easy to understand what's going on without the need to check out the=
 source file. Something like
>
>> We redundantly set ret to RESUME_GUEST twice without changing it in betw=
een. Only do it once:
>>
>>         ret =3D RESUME_GUEST;
>>         preempt_disable();
>>         while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
>>                 cpu_relax();
>>         if ((hptep[0] & ~HPTE_V_HVLOCK) !=3D hpte[0] || hptep[1] !=3D hp=
te[1] ||
>>             rev->guest_rpte !=3D hpte[2])
>>                 /* HPTE has been changed under us; let the guest retry *=
/
>>                 goto out_unlock;
>>         hpte[0] =3D (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
>>
>>         rmap =3D &memslot->arch.rmap[gfn - memslot->base_gfn];
>>         lock_rmap(rmap);
>>
>>         /* Check if we might have been invalidated; let the guest retry =
if so */
>>         ret =3D RESUME_GUEST;
>
>
> If I look at that patch description it immediately tells me "Ah, no need =
to worry, it's not a critical bug I need to backport". If you have a better=
 idea how to express that I'm more than happy to take that too. Otherwise j=
ust let me know whether you like the description above and I'll modify it t=
o the one that includes the code snippet when applying the patch.
>
Oh, yes. Thank you very much..
And I had a better understanding of the heavy work of maintainers :)
Will keep this in mind.

Best regards,
Pingfan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-11-08  2:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-07  6:22 [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Liu Ping Fan
2013-11-07  6:22 ` [PATCH v2 2/2] powerpc/kvm: remove redundant assignment Liu Ping Fan
2013-11-07 10:06   ` Alexander Graf
2013-11-08  2:11     ` Liu ping fan
2013-11-07  9:54 ` [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Alexander Graf
2013-11-08  2:07   ` Liu ping fan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).