Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

From: Alexander Graf <agraf@suse.de>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org, paulus@samba.org,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
Date: Tue, 06 May 2014 11:39:47 +0200	[thread overview]
Message-ID: <5368ADE3.1050503@suse.de> (raw)
In-Reply-To: <1399368400.18906.9.camel@pasglop>

On 05/06/2014 11:26 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 11:12 +0200, Alexander Graf wrote:
>
>> So if I understand this patch correctly, it simply introduces logic to
>> handle page sizes other than 4k, 64k, 16M by analyzing the actual page
>> size field in the HPTE. Mind to explain why exactly that enables us to
>> use THP?
>>
>> What exactly is the flow if the pages are not backed by huge pages? What
>> is the flow when they start to get backed by huge pages?
> The hypervisor doesn't care about segments ... but it needs to properly
> decode the page size requested by the guest, if anything, to issue the
> right form of tlbie instruction.
>
> The encoding in the HPTE for a 16M page inside a 64K segment is
> different than the encoding for a 16M in a 16M segment, this is done so
> that the encoding carries both information, which allows broadcast
> tlbie to properly find the right set in the TLB for invalidations among
> others.
>
> So from a KVM perspective, we don't know whether the guest is doing THP
> or something else (Linux calls it THP but all we care here is that this
> is MPSS, another guest than Linux might exploit that differently).

Ugh. So we're just talking about a guest using MPSS here? Not about the 
host doing THP? I must've missed that part.

>
> What we do know is that if we advertise MPSS, we need to decode the page
> sizes encoded in the HPTE so that we know what we are dealing with in
> H_ENTER and can do the appropriate TLB invalidations in H_REMOVE &
> evictions.

Yes. That makes a lot of sense. So this patch really is all about 
enabling MPSS support for 16MB pages. No more, no less.

>
>>> +			if (a_size != -1)
>>> +				return 1ul << mmu_psize_defs[a_size].shift;
>>> +		}
>>> +
>>> +	}
>>> +	return 0;
>>>    }
>>>    
>>>    static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>> index 8227dba5af0f..a38d3289320a 100644
>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>> @@ -1949,6 +1949,13 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
>>>    	 * support pte_enc here
>>>    	 */
>>>    	(*sps)->enc[0].pte_enc = def->penc[linux_psize];
>>> +	/*
>>> +	 * Add 16MB MPSS support
>>> +	 */
>>> +	if (linux_psize != MMU_PAGE_16M) {
>>> +		(*sps)->enc[1].page_shift = 24;
>>> +		(*sps)->enc[1].pte_enc = def->penc[MMU_PAGE_16M];
>>> +	}
>> So this basically indicates that every segment (except for the 16MB one)
>> can also handle 16MB MPSS page sizes? I suppose you want to remove the
>> comment in kvm_vm_ioctl_get_smmu_info_hv() that says we don't do MPSS here.
> I haven't reviewed the code there, make sure it will indeed do a
> different encoding for every combination of segment/actual page size.
>
>> Can we also ensure that every system we run on can do MPSS?
> P7 and P8 are identical in that regard. However 970 doesn't do MPSS so
> let's make sure we get that right.

yes. When / if people can easily get their hands on p7/p8 bare metal 
systems I'll be more than happy to remove 970 support as well, but for 
now it's probably good to keep in.


Alex

next prev parent reply	other threads:[~2014-05-06  9:39 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-04 17:30 [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest Aneesh Kumar K.V
2014-05-04 17:36 ` Aneesh Kumar K.V
2014-05-05 11:38 ` Alexander Graf
2014-05-05 14:47   ` Aneesh Kumar K.V
2014-05-06  4:20     ` Paul Mackerras
2014-05-06 14:25       ` Aneesh Kumar K.V
2014-05-06  9:12 ` Alexander Graf
2014-05-06  9:26   ` Benjamin Herrenschmidt
2014-05-06  9:39     ` Alexander Graf [this message]
2014-05-06 15:06       ` Aneesh Kumar K.V
2014-05-06 15:23         ` Alexander Graf
2014-05-06 16:08           ` Aneesh Kumar K.V
2014-05-06 16:18             ` Alexander Graf
2014-05-06 20:35             ` Benjamin Herrenschmidt
2014-05-06 14:23   ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5368ADE3.1050503@suse.de \
    --to=agraf@suse.de \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).