linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect
@ 2014-04-11 11:45 Liu Ping Fan
  2014-04-11 14:03 ` Alexander Graf
  0 siblings, 1 reply; 6+ messages in thread
From: Liu Ping Fan @ 2014-04-11 11:45 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc
  Cc: Paul Mackerras, Alexander Graf, kvm, Aneesh Kumar K.V

When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start
and mmu_notifier_invalidate_range_end, which will mark existing guest hpte
entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
guest hpte entries.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
v3:
  rebased onto Alex's tree, branch kvm-ppc-next
  substitue commit log with the more clear description in Aneesh's reply (thanks, Aneesh)
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1d6c56a..1117525 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -234,7 +234,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		pte_size = psize;
 		pte = lookup_linux_pte_and_update(pgdir, hva, writing,
 						  &pte_size);
-		if (pte_present(pte)) {
+		if (pte_present(pte)&&!pte_numa(pte)) {
 			if (writing && !pte_write(pte))
 				/* make the actual HPTE be read-only */
 				ptel = hpte_make_readonly(ptel);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect
  2014-04-11 11:45 [PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect Liu Ping Fan
@ 2014-04-11 14:03 ` Alexander Graf
  2014-04-13  2:27   ` Liu ping fan
  0 siblings, 1 reply; 6+ messages in thread
From: Alexander Graf @ 2014-04-11 14:03 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: kvm, kvm-ppc, Paul Mackerras, Aneesh Kumar K.V, linuxppc-dev


On 11.04.2014, at 13:45, Liu Ping Fan <pingfank@linux.vnet.ibm.com> =
wrote:

> When we mark pte with _PAGE_NUMA we already call =
mmu_notifier_invalidate_range_start
> and mmu_notifier_invalidate_range_end, which will mark existing guest =
hpte
> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting =
new
> guest hpte entries.

What happens when we don't? Why do we need the check? Why isn't it done =
implicitly? What happens when we treat a NUMA marked page as =
non-present? Why does it work out for us?

Assume you have no idea what PAGE_NUMA is, but try to figure out what =
this patch does and whether you need to cherry-pick it into your =
downstream kernel. The description as is still is not very helpful for =
that. It doesn't even explain what really changes with this patch =
applied.

>=20
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> v3:
>  rebased onto Alex's tree, branch kvm-ppc-next
>  substitue commit log with the more clear description in Aneesh's =
reply (thanks, Aneesh)
> ---
> arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>=20
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c =
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 1d6c56a..1117525 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -234,7 +234,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned =
long flags,
> 		pte_size =3D psize;
> 		pte =3D lookup_linux_pte_and_update(pgdir, hva, writing,
> 						  &pte_size);
> -		if (pte_present(pte)) {
> +		if (pte_present(pte)&&!pte_numa(pte)) {

Spaces missing


Alex

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect
  2014-04-11 14:03 ` Alexander Graf
@ 2014-04-13  2:27   ` Liu ping fan
  2014-04-14  6:43     ` Alexander Graf
  0 siblings, 1 reply; 6+ messages in thread
From: Liu ping fan @ 2014-04-13  2:27 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Liu Ping Fan, kvm@vger.kernel.org mailing list, kvm-ppc,
	Paul Mackerras, Aneesh Kumar K.V, linuxppc-dev

On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 11.04.2014, at 13:45, Liu Ping Fan <pingfank@linux.vnet.ibm.com> wrote=
:
>
>> When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate=
_range_start
>> and mmu_notifier_invalidate_range_end, which will mark existing guest hp=
te
>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>> guest hpte entries.
>
> What happens when we don't? Why do we need the check? Why isn't it done i=
mplicitly? What happens when we treat a NUMA marked page as non-present? Wh=
y does it work out for us?
>
> Assume you have no idea what PAGE_NUMA is, but try to figure out what thi=
s patch does and whether you need to cherry-pick it into your downstream ke=
rnel. The description as is still is not very helpful for that. It doesn't =
even explain what really changes with this patch applied.
>
Yeah.  what about appending the following description?  Can it make
the context clear?
"Guest should not setup a hpte for the page whose pte is marked with
_PAGE_NUMA, so on the host, the numa-fault mechanism can take effect
to check whether the page is placed correctly or not."

>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>> v3:
>>  rebased onto Alex's tree, branch kvm-ppc-next
>>  substitue commit log with the more clear description in Aneesh's reply =
(thanks, Aneesh)
>> ---
>> arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book=
3s_hv_rm_mmu.c
>> index 1d6c56a..1117525 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> @@ -234,7 +234,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned lon=
g flags,
>>               pte_size =3D psize;
>>               pte =3D lookup_linux_pte_and_update(pgdir, hva, writing,
>>                                                 &pte_size);
>> -             if (pte_present(pte)) {
>> +             if (pte_present(pte)&&!pte_numa(pte)) {
>
> Spaces missing
>
Will fix,

Thx,
Fan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect
  2014-04-13  2:27   ` Liu ping fan
@ 2014-04-14  6:43     ` Alexander Graf
  2014-04-14  8:08       ` liu ping fan
  0 siblings, 1 reply; 6+ messages in thread
From: Alexander Graf @ 2014-04-14  6:43 UTC (permalink / raw)
  To: Liu ping fan
  Cc: Liu Ping Fan, kvm@vger.kernel.org mailing list, kvm-ppc,
	Paul Mackerras, Aneesh Kumar K.V, linuxppc-dev


On 13.04.14 04:27, Liu ping fan wrote:
> On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf <agraf@suse.de> wrote:
>> On 11.04.2014, at 13:45, Liu Ping Fan <pingfank@linux.vnet.ibm.com> wrote:
>>
>>> When we mark pte with _PAGE_NUMA we already call mmu_notifier_invalidate_range_start
>>> and mmu_notifier_invalidate_range_end, which will mark existing guest hpte
>>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>>> guest hpte entries.
>> What happens when we don't? Why do we need the check? Why isn't it done implicitly? What happens when we treat a NUMA marked page as non-present? Why does it work out for us?
>>
>> Assume you have no idea what PAGE_NUMA is, but try to figure out what this patch does and whether you need to cherry-pick it into your downstream kernel. The description as is still is not very helpful for that. It doesn't even explain what really changes with this patch applied.
>>
> Yeah.  what about appending the following description?  Can it make
> the context clear?
> "Guest should not setup a hpte for the page whose pte is marked with
> _PAGE_NUMA, so on the host, the numa-fault mechanism can take effect
> to check whether the page is placed correctly or not."

Try to come up with a text that answers the following questions in order:

   - What does _PAGE_NUMA mean?
   - How does page migration with _PAGE_NUMA work?
   -> Why should we not map pages when _PAGE_NUMA is set?
   - Which part of what needs to be done did the previous _PAGE_NUMA 
patch address?
   - What's the situation without this patch?
   - Which scenario does this patch fix?

Once you have a text that answers those, you should have a good patch 
description :).

Alex

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect
  2014-04-14  6:43     ` Alexander Graf
@ 2014-04-14  8:08       ` liu ping fan
  2014-04-14  9:01         ` Alexander Graf
  0 siblings, 1 reply; 6+ messages in thread
From: liu ping fan @ 2014-04-14  8:08 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Liu Ping Fan, kvm@vger.kernel.org mailing list, kvm-ppc,
	Paul Mackerras, Liu ping fan, linuxppc-dev, Aneesh Kumar K.V

On Mon, Apr 14, 2014 at 2:43 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 13.04.14 04:27, Liu ping fan wrote:
>>
>> On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf <agraf@suse.de> wrote:
>>>
>>> On 11.04.2014, at 13:45, Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>> wrote:
>>>
>>>> When we mark pte with _PAGE_NUMA we already call
>>>> mmu_notifier_invalidate_range_start
>>>> and mmu_notifier_invalidate_range_end, which will mark existing guest
>>>> hpte
>>>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>>>> guest hpte entries.
>>>
>>> What happens when we don't? Why do we need the check? Why isn't it done
>>> implicitly? What happens when we treat a NUMA marked page as non-present?
>>> Why does it work out for us?
>>>
>>> Assume you have no idea what PAGE_NUMA is, but try to figure out what
>>> this patch does and whether you need to cherry-pick it into your downstream
>>> kernel. The description as is still is not very helpful for that. It doesn't
>>> even explain what really changes with this patch applied.
>>>
>> Yeah.  what about appending the following description?  Can it make
>> the context clear?
>> "Guest should not setup a hpte for the page whose pte is marked with
>> _PAGE_NUMA, so on the host, the numa-fault mechanism can take effect
>> to check whether the page is placed correctly or not."
>
>
> Try to come up with a text that answers the following questions in order:
>
I divide them into 3 groups, and answer them by 3 sections. Seems that
it has the total story :)
Please take a look.

>   - What does _PAGE_NUMA mean?
Group 1 -> section 2

>   - How does page migration with _PAGE_NUMA work?
>   -> Why should we not map pages when _PAGE_NUMA is set?
Group 2 -> section 1
(Note: for the 1st question in this group, I am not sure about the
details, except that we can fix numa balancing by moving task or
moving page.  So I comment as " migration should be involved to cut
down the distance between the cpu and pages")

>   - Which part of what needs to be done did the previous _PAGE_NUMA patch
> address?
>   - What's the situation without this patch?
>   - Which scenario does this patch fix?
>
Group 3 -> section 3


Numa fault is a method which help to achieve auto numa balancing.
When such a page fault takes place, the page fault handler will check
whether the page is placed correctly. If not, migration should be
involved to cut down the distance between the cpu and pages.

A pte with _PAGE_NUMA help to implement numa fault. It means not to
allow the MMU to access the page directly. So a page fault is triggered
and numa fault handler gets the opportunity to run checker.

As for the access of MMU, we need special handling for the powernv's guest.
When we mark a pte with _PAGE_NUMA, we already call mmu_notifier to
invalidate it in guest's htab, but when we tried to re-insert them,
we firstly try to fix it in real-mode. Only after this fails, we fallback
to virt mode, and most of important, we run numa fault handler in virt
mode.  This patch guards the way of real-mode to ensure that if a pte is
marked with _PAGE_NUMA, it will NOT be fixed in real mode, instead, it will
be fixed in virt mode and have the opportunity to be checked with placement.


Thx,
Fan


> Once you have a text that answers those, you should have a good patch
> description :).
>
> Alex
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect
  2014-04-14  8:08       ` liu ping fan
@ 2014-04-14  9:01         ` Alexander Graf
  0 siblings, 0 replies; 6+ messages in thread
From: Alexander Graf @ 2014-04-14  9:01 UTC (permalink / raw)
  To: liu ping fan
  Cc: Liu Ping Fan, kvm@vger.kernel.org mailing list, kvm-ppc,
	Paul Mackerras, Liu ping fan, linuxppc-dev, Aneesh Kumar K.V


On 14.04.14 10:08, liu ping fan wrote:
> On Mon, Apr 14, 2014 at 2:43 PM, Alexander Graf <agraf@suse.de> wrote:
>> On 13.04.14 04:27, Liu ping fan wrote:
>>> On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf <agraf@suse.de> wrote:
>>>> On 11.04.2014, at 13:45, Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>> wrote:
>>>>
>>>>> When we mark pte with _PAGE_NUMA we already call
>>>>> mmu_notifier_invalidate_range_start
>>>>> and mmu_notifier_invalidate_range_end, which will mark existing guest
>>>>> hpte
>>>>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>>>>> guest hpte entries.
>>>> What happens when we don't? Why do we need the check? Why isn't it done
>>>> implicitly? What happens when we treat a NUMA marked page as non-present?
>>>> Why does it work out for us?
>>>>
>>>> Assume you have no idea what PAGE_NUMA is, but try to figure out what
>>>> this patch does and whether you need to cherry-pick it into your downstream
>>>> kernel. The description as is still is not very helpful for that. It doesn't
>>>> even explain what really changes with this patch applied.
>>>>
>>> Yeah.  what about appending the following description?  Can it make
>>> the context clear?
>>> "Guest should not setup a hpte for the page whose pte is marked with
>>> _PAGE_NUMA, so on the host, the numa-fault mechanism can take effect
>>> to check whether the page is placed correctly or not."
>>
>> Try to come up with a text that answers the following questions in order:
>>
> I divide them into 3 groups, and answer them by 3 sections. Seems that
> it has the total story :)
> Please take a look.
>
>>    - What does _PAGE_NUMA mean?
> Group 1 -> section 2
>
>>    - How does page migration with _PAGE_NUMA work?
>>    -> Why should we not map pages when _PAGE_NUMA is set?
> Group 2 -> section 1
> (Note: for the 1st question in this group, I am not sure about the
> details, except that we can fix numa balancing by moving task or
> moving page.  So I comment as " migration should be involved to cut
> down the distance between the cpu and pages")
>
>>    - Which part of what needs to be done did the previous _PAGE_NUMA patch
>> address?
>>    - What's the situation without this patch?
>>    - Which scenario does this patch fix?
>>
> Group 3 -> section 3
>
>
> Numa fault is a method which help to achieve auto numa balancing.
> When such a page fault takes place, the page fault handler will check
> whether the page is placed correctly. If not, migration should be
> involved to cut down the distance between the cpu and pages.
>
> A pte with _PAGE_NUMA help to implement numa fault. It means not to
> allow the MMU to access the page directly. So a page fault is triggered
> and numa fault handler gets the opportunity to run checker.
>
> As for the access of MMU, we need special handling for the powernv's guest.
> When we mark a pte with _PAGE_NUMA, we already call mmu_notifier to
> invalidate it in guest's htab, but when we tried to re-insert them,
> we firstly try to fix it in real-mode. Only after this fails, we fallback
> to virt mode, and most of important, we run numa fault handler in virt
> mode.  This patch guards the way of real-mode to ensure that if a pte is
> marked with _PAGE_NUMA, it will NOT be fixed in real mode, instead, it will
> be fixed in virt mode and have the opportunity to be checked with placement.

s/fixed/mapped/g

Otherwise works as patch description for me :).


Alex

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-04-14  9:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-11 11:45 [PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect Liu Ping Fan
2014-04-11 14:03 ` Alexander Graf
2014-04-13  2:27   ` Liu ping fan
2014-04-14  6:43     ` Alexander Graf
2014-04-14  8:08       ` liu ping fan
2014-04-14  9:01         ` Alexander Graf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).