* [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene
@ 2013-11-07 6:22 Liu Ping Fan
2013-11-07 6:22 ` [PATCH v2 2/2] powerpc/kvm: remove redundant assignment Liu Ping Fan
2013-11-07 9:54 ` [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Alexander Graf
0 siblings, 2 replies; 6+ messages in thread
From: Liu Ping Fan @ 2013-11-07 6:22 UTC (permalink / raw)
To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf
Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
realmode, so it can trigger the deadlock.
Suppose the following scene:
Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of vcpus.
If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched out,
and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be caught in
realmode for a long time.
What makes things even worse if the following happens,
On cpuM, bitlockX is hold, on cpuN, Y is hold.
vcpu_B_2 try to lock Y on cpuM in realmode
vcpu_A_2 try to lock X on cpuN in realmode
Oops! deadlock happens
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 043eec8..dbc1478 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -474,10 +474,13 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
}
/* Find the HPTE in the hash table */
+ preempt_disable();
index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
HPTE_V_VALID | HPTE_V_ABSENT);
- if (index < 0)
+ if (index < 0) {
+ preempt_enable();
return -ENOENT;
+ }
hptep = (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
v = hptep[0] & ~HPTE_V_HVLOCK;
gr = kvm->arch.revmap[index].guest_rpte;
@@ -485,6 +488,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
/* Unlock the HPTE */
asm volatile("lwsync" : : : "memory");
hptep[0] = v;
+ preempt_enable();
gpte->eaddr = eaddr;
gpte->vpage = ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & 0xfff);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 2/2] powerpc/kvm: remove redundant assignment
2013-11-07 6:22 [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Liu Ping Fan
@ 2013-11-07 6:22 ` Liu Ping Fan
2013-11-07 10:06 ` Alexander Graf
2013-11-07 9:54 ` [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Alexander Graf
1 sibling, 1 reply; 6+ messages in thread
From: Liu Ping Fan @ 2013-11-07 6:22 UTC (permalink / raw)
To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf
ret is assigned twice with the same value, so remove the later one.
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index dbc1478..9b97b42 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -733,7 +733,6 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
lock_rmap(rmap);
/* Check if we might have been invalidated; let the guest retry if so */
- ret = RESUME_GUEST;
if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) {
unlock_rmap(rmap);
goto out_unlock;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene
2013-11-07 6:22 [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Liu Ping Fan
2013-11-07 6:22 ` [PATCH v2 2/2] powerpc/kvm: remove redundant assignment Liu Ping Fan
@ 2013-11-07 9:54 ` Alexander Graf
2013-11-08 2:07 ` Liu ping fan
1 sibling, 1 reply; 6+ messages in thread
From: Alexander Graf @ 2013-11-07 9:54 UTC (permalink / raw)
To: Liu Ping Fan; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc
On 07.11.2013, at 07:22, Liu Ping Fan <kernelfans@gmail.com> wrote:
> Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
> realmode, so it can trigger the deadlock.
>=20
> Suppose the following scene:
>=20
> Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of =
vcpus.
>=20
> If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched =
out,
> and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be =
caught in
> realmode for a long time.
>=20
> What makes things even worse if the following happens,
> On cpuM, bitlockX is hold, on cpuN, Y is hold.
> vcpu_B_2 try to lock Y on cpuM in realmode
> vcpu_A_2 try to lock X on cpuN in realmode
>=20
> Oops! deadlock happens
>=20
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Very nice catch :).
I think it makes a lot of sense to document the fact that =
kvmppc_hv_find_lock_hpte() should be called with preemption disabled in =
a comment above the function, so that next time when someone potentially =
calls it, he knows that he needs to put preempt_disable() around it.
Thanks a lot for finding this pretty subtle issue. May I ask how you got =
there? Did you actually see systems deadlock because of this?
Alex
> ---
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>=20
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c =
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 043eec8..dbc1478 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -474,10 +474,13 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct =
kvm_vcpu *vcpu, gva_t eaddr,
> }
>=20
> /* Find the HPTE in the hash table */
> + preempt_disable();
> index =3D kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
> HPTE_V_VALID | HPTE_V_ABSENT);
> - if (index < 0)
> + if (index < 0) {
> + preempt_enable();
> return -ENOENT;
> + }
> hptep =3D (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
> v =3D hptep[0] & ~HPTE_V_HVLOCK;
> gr =3D kvm->arch.revmap[index].guest_rpte;
> @@ -485,6 +488,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct =
kvm_vcpu *vcpu, gva_t eaddr,
> /* Unlock the HPTE */
> asm volatile("lwsync" : : : "memory");
> hptep[0] =3D v;
> + preempt_enable();
>=20
> gpte->eaddr =3D eaddr;
> gpte->vpage =3D ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & =
0xfff);
> --=20
> 1.8.1.4
>=20
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] powerpc/kvm: remove redundant assignment
2013-11-07 6:22 ` [PATCH v2 2/2] powerpc/kvm: remove redundant assignment Liu Ping Fan
@ 2013-11-07 10:06 ` Alexander Graf
2013-11-08 2:11 ` Liu ping fan
0 siblings, 1 reply; 6+ messages in thread
From: Alexander Graf @ 2013-11-07 10:06 UTC (permalink / raw)
To: Liu Ping Fan; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc
On 07.11.2013, at 07:22, Liu Ping Fan <kernelfans@gmail.com> wrote:
> ret is assigned twice with the same value, so remove the later one.
>=20
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> Acked-by: Paul Mackerras <paulus@samba.org>
I suppose my last request for a patch description was slightly too =
abbreviated :). Sorry about that.
Imagine you are a Linux-stable maintainer. You have about 5000 patches =
in front of you and you want to figure out whether a patch should get =
backported into a stable tree or not.
It's very easy to read through the patch description.
It's reasonably easy to do a git show on the patch.
It's very hard to look at the actual surrounding code that was changed.
If I open a text editor on the file, I immediately see what you're =
saying:
> ret =3D RESUME_GUEST;
> preempt_disable();
> while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
> cpu_relax();
> if ((hptep[0] & ~HPTE_V_HVLOCK) !=3D hpte[0] || hptep[1] !=3D =
hpte[1] ||
> rev->guest_rpte !=3D hpte[2])
> /* HPTE has been changed under us; let the guest retry =
*/
> goto out_unlock;
> hpte[0] =3D (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
>=20
> rmap =3D &memslot->arch.rmap[gfn - memslot->base_gfn];
> lock_rmap(rmap);
>=20
> /* Check if we might have been invalidated; let the guest =
retry if so */
> ret =3D RESUME_GUEST;
However, that scope is not given in the actual patch itself. If you look =
at the diff below, you have no idea whether the patch is fixing a bug or =
just removes duplication and doesn't actually have any effect. In fact, =
the compiled assembly should be the same with this patch and without. =
But you can't tell from the diff below.
So what I would like to see in the patch description is something that =
makes it easy to understand what's going on without the need to check =
out the source file. Something like
> We redundantly set ret to RESUME_GUEST twice without changing it in =
between. Only do it once:
>=20
> ret =3D RESUME_GUEST;
> preempt_disable();
> while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
> cpu_relax();
> if ((hptep[0] & ~HPTE_V_HVLOCK) !=3D hpte[0] || hptep[1] !=3D =
hpte[1] ||
> rev->guest_rpte !=3D hpte[2])
> /* HPTE has been changed under us; let the guest retry =
*/
> goto out_unlock;
> hpte[0] =3D (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
>=20
> rmap =3D &memslot->arch.rmap[gfn - memslot->base_gfn];
> lock_rmap(rmap);
>=20
> /* Check if we might have been invalidated; let the guest =
retry if so */
> ret =3D RESUME_GUEST;
If I look at that patch description it immediately tells me "Ah, no need =
to worry, it's not a critical bug I need to backport". If you have a =
better idea how to express that I'm more than happy to take that too. =
Otherwise just let me know whether you like the description above and =
I'll modify it to the one that includes the code snippet when applying =
the patch.
Thanks a lot,
Alex
> ---
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 1 -
> 1 file changed, 1 deletion(-)
>=20
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c =
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index dbc1478..9b97b42 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -733,7 +733,6 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run =
*run, struct kvm_vcpu *vcpu,
> lock_rmap(rmap);
>=20
> /* Check if we might have been invalidated; let the guest retry =
if so */
> - ret =3D RESUME_GUEST;
> if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) {
> unlock_rmap(rmap);
> goto out_unlock;
> --=20
> 1.8.1.4
>=20
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene
2013-11-07 9:54 ` [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Alexander Graf
@ 2013-11-08 2:07 ` Liu ping fan
0 siblings, 0 replies; 6+ messages in thread
From: Liu ping fan @ 2013-11-08 2:07 UTC (permalink / raw)
To: Alexander Graf; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc
On Thu, Nov 7, 2013 at 5:54 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 07.11.2013, at 07:22, Liu Ping Fan <kernelfans@gmail.com> wrote:
>
>> Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
>> realmode, so it can trigger the deadlock.
>>
>> Suppose the following scene:
>>
>> Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of v=
cpus.
>>
>> If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched o=
ut,
>> and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be caugh=
t in
>> realmode for a long time.
>>
>> What makes things even worse if the following happens,
>> On cpuM, bitlockX is hold, on cpuN, Y is hold.
>> vcpu_B_2 try to lock Y on cpuM in realmode
>> vcpu_A_2 try to lock X on cpuN in realmode
>>
>> Oops! deadlock happens
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> Very nice catch :).
>
> I think it makes a lot of sense to document the fact that kvmppc_hv_find_=
lock_hpte() should be called with preemption disabled in a comment above th=
e function, so that next time when someone potentially calls it, he knows t=
hat he needs to put preempt_disable() around it.
>
Ok, I will document them in v3
> Thanks a lot for finding this pretty subtle issue. May I ask how you got =
there? Did you actually see systems deadlock because of this?
>
Intuition :). then I begin try to model a scene which causes the
deadlock. And fortunately, I find a case to verify my suspension.
Regards,
Pingfan
>
> Alex
>
>> ---
>> arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book=
3s_64_mmu_hv.c
>> index 043eec8..dbc1478 100644
>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> @@ -474,10 +474,13 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kv=
m_vcpu *vcpu, gva_t eaddr,
>> }
>>
>> /* Find the HPTE in the hash table */
>> + preempt_disable();
>> index =3D kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
>> HPTE_V_VALID | HPTE_V_ABSENT);
>> - if (index < 0)
>> + if (index < 0) {
>> + preempt_enable();
>> return -ENOENT;
>> + }
>> hptep =3D (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
>> v =3D hptep[0] & ~HPTE_V_HVLOCK;
>> gr =3D kvm->arch.revmap[index].guest_rpte;
>> @@ -485,6 +488,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_=
vcpu *vcpu, gva_t eaddr,
>> /* Unlock the HPTE */
>> asm volatile("lwsync" : : : "memory");
>> hptep[0] =3D v;
>> + preempt_enable();
>>
>> gpte->eaddr =3D eaddr;
>> gpte->vpage =3D ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & 0xfff)=
;
>> --
>> 1.8.1.4
>>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] powerpc/kvm: remove redundant assignment
2013-11-07 10:06 ` Alexander Graf
@ 2013-11-08 2:11 ` Liu ping fan
0 siblings, 0 replies; 6+ messages in thread
From: Liu ping fan @ 2013-11-08 2:11 UTC (permalink / raw)
To: Alexander Graf; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc
On Thu, Nov 7, 2013 at 6:06 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 07.11.2013, at 07:22, Liu Ping Fan <kernelfans@gmail.com> wrote:
>
>> ret is assigned twice with the same value, so remove the later one.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> Acked-by: Paul Mackerras <paulus@samba.org>
>
> I suppose my last request for a patch description was slightly too abbrev=
iated :). Sorry about that.
>
> Imagine you are a Linux-stable maintainer. You have about 5000 patches in=
front of you and you want to figure out whether a patch should get backpor=
ted into a stable tree or not.
>
> It's very easy to read through the patch description.
> It's reasonably easy to do a git show on the patch.
> It's very hard to look at the actual surrounding code that was changed.
>
> If I open a text editor on the file, I immediately see what you're saying=
:
>
>> ret =3D RESUME_GUEST;
>> preempt_disable();
>> while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
>> cpu_relax();
>> if ((hptep[0] & ~HPTE_V_HVLOCK) !=3D hpte[0] || hptep[1] !=3D hp=
te[1] ||
>> rev->guest_rpte !=3D hpte[2])
>> /* HPTE has been changed under us; let the guest retry *=
/
>> goto out_unlock;
>> hpte[0] =3D (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
>>
>> rmap =3D &memslot->arch.rmap[gfn - memslot->base_gfn];
>> lock_rmap(rmap);
>>
>> /* Check if we might have been invalidated; let the guest retry =
if so */
>> ret =3D RESUME_GUEST;
>
> However, that scope is not given in the actual patch itself. If you look =
at the diff below, you have no idea whether the patch is fixing a bug or ju=
st removes duplication and doesn't actually have any effect. In fact, the c=
ompiled assembly should be the same with this patch and without. But you ca=
n't tell from the diff below.
>
> So what I would like to see in the patch description is something that ma=
kes it easy to understand what's going on without the need to check out the=
source file. Something like
>
>> We redundantly set ret to RESUME_GUEST twice without changing it in betw=
een. Only do it once:
>>
>> ret =3D RESUME_GUEST;
>> preempt_disable();
>> while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
>> cpu_relax();
>> if ((hptep[0] & ~HPTE_V_HVLOCK) !=3D hpte[0] || hptep[1] !=3D hp=
te[1] ||
>> rev->guest_rpte !=3D hpte[2])
>> /* HPTE has been changed under us; let the guest retry *=
/
>> goto out_unlock;
>> hpte[0] =3D (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
>>
>> rmap =3D &memslot->arch.rmap[gfn - memslot->base_gfn];
>> lock_rmap(rmap);
>>
>> /* Check if we might have been invalidated; let the guest retry =
if so */
>> ret =3D RESUME_GUEST;
>
>
> If I look at that patch description it immediately tells me "Ah, no need =
to worry, it's not a critical bug I need to backport". If you have a better=
idea how to express that I'm more than happy to take that too. Otherwise j=
ust let me know whether you like the description above and I'll modify it t=
o the one that includes the code snippet when applying the patch.
>
Oh, yes. Thank you very much..
And I had a better understanding of the heavy work of maintainers :)
Will keep this in mind.
Best regards,
Pingfan
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-11-08 2:12 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-07 6:22 [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Liu Ping Fan
2013-11-07 6:22 ` [PATCH v2 2/2] powerpc/kvm: remove redundant assignment Liu Ping Fan
2013-11-07 10:06 ` Alexander Graf
2013-11-08 2:11 ` Liu ping fan
2013-11-07 9:54 ` [PATCH v2 1/2] powerpc/kvm: fix rare but potential deadlock scene Alexander Graf
2013-11-08 2:07 ` Liu ping fan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).