Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest

From: Alexander Graf <agraf@suse.de>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	paulus@samba.org, linuxppc-dev@lists.ozlabs.org,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
Date: Tue, 06 May 2014 09:39:47 +0000
Message-ID: <5368ADE3.1050503@suse.de>
In-Reply-To: <1399368400.18906.9.camel@pasglop>

On 05/06/2014 11:26 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 11:12 +0200, Alexander Graf wrote:
>
>> So if I understand this patch correctly, it simply introduces logic to
>> handle page sizes other than 4k, 64k and 16M by analyzing the actual page
>> size field in the HPTE. Would you mind explaining why exactly that enables
>> us to use THP?
>>
>> What exactly is the flow if the pages are not backed by huge pages? What
>> is the flow when they start to get backed by huge pages?
> The hypervisor doesn't care about segments, but it needs to properly
> decode the page size requested by the guest, if only to issue the
> right form of the tlbie instruction.
>
> The encoding in the HPTE for a 16M page inside a 64K segment is
> different from the encoding for a 16M page in a 16M segment. This is done
> so that the encoding carries both pieces of information, which allows a
> broadcast tlbie to find the right set in the TLB for invalidations,
> among other things.
>
> So from a KVM perspective, we don't know whether the guest is doing THP
> or something else (Linux calls it THP, but all we care about here is that
> this is MPSS; a guest other than Linux might exploit it differently).

Ugh. So we're just talking about a guest using MPSS here? Not about the 
host doing THP? I must've missed that part.

>
> What we do know is that if we advertise MPSS, we need to decode the page
> sizes encoded in the HPTE so that we know what we are dealing with in
> H_ENTER and can do the appropriate TLB invalidations in H_REMOVE and on
> evictions.

Yes. That makes a lot of sense. So this patch really is all about 
enabling MPSS support for 16MB pages. No more, no less.

>
>>> +			if (a_size != -1)
>>> +				return 1ul << mmu_psize_defs[a_size].shift;
>>> +		}
>>> +
>>> +	}
>>> +	return 0;
>>>    }
>>>    
>>>    static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>> index 8227dba5af0f..a38d3289320a 100644
>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>> @@ -1949,6 +1949,13 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
>>>    	 * support pte_enc here
>>>    	 */
>>>    	(*sps)->enc[0].pte_enc = def->penc[linux_psize];
>>> +	/*
>>> +	 * Add 16MB MPSS support
>>> +	 */
>>> +	if (linux_psize != MMU_PAGE_16M) {
>>> +		(*sps)->enc[1].page_shift = 24;
>>> +		(*sps)->enc[1].pte_enc = def->penc[MMU_PAGE_16M];
>>> +	}
>> So this basically indicates that every segment (except for the 16MB one)
>> can also handle 16MB MPSS page sizes? I suppose you want to remove the
>> comment in kvm_vm_ioctl_get_smmu_info_hv() that says we don't do MPSS here.
> I haven't reviewed the code there; make sure it will indeed produce a
> different encoding for every combination of segment/actual page size.
>
>> Can we also ensure that every system we run on can do MPSS?
> P7 and P8 are identical in that regard. However, the 970 doesn't do MPSS,
> so let's make sure we get that right.

Yes. When (or if) people can easily get their hands on P7/P8 bare-metal
systems, I'll be more than happy to remove 970 support as well, but for
now it's probably good to keep it in.


Alex
