From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: David Gibson <dwg@au1.ibm.com>
Cc: paulus@samba.org, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org
Subject: Re: [PATCH -V7 09/10] powerpc: Optimize hugepage invalidate
Date: Sat, 04 May 2013 00:35:07 +0530 [thread overview]
Message-ID: <87fvy351gc.fsf@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130503052846.GU13041@truffula.fritz.box>
David Gibson <dwg@au1.ibm.com> writes:
> On Mon, Apr 29, 2013 at 01:21:50AM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>
>> Hugepage invalidate involves invalidating multiple hpte entries.
>> Optimize the operation using H_BULK_REMOVE on lpar platforms.
>> On native, reduce the number of tlb flush.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Since this is purely an optimization, have you tried reproducing the
> bugs you're chasing with this patch not included?
That was due to not handling thp split while walking page table. I have
that fixed. Will post the next version soon.
>
>> ---
>> arch/powerpc/include/asm/machdep.h | 3 +
>> arch/powerpc/mm/hash_native_64.c | 78 +++++++++++++++++++++
>> arch/powerpc/mm/pgtable_64.c | 13 +++-
>> arch/powerpc/platforms/pseries/lpar.c | 126 ++++++++++++++++++++++++++++++++--
>> 4 files changed, 210 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>> index 3f3f691..5d1e7d2 100644
>> --- a/arch/powerpc/include/asm/machdep.h
>> +++ b/arch/powerpc/include/asm/machdep.h
>> @@ -56,6 +56,9 @@ struct machdep_calls {
.....
>>
>> +/*
>> + * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
>> + * to make sure that we avoid bouncing the hypervisor tlbie lock.
>> + */
>> +#define PPC64_HUGE_HPTE_BATCH 12
>> +
>> +static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
>> + unsigned long *vpn, int count,
>> + int psize, int ssize)
>> +{
>> + unsigned long param[9];
>
> [9]? I only see 8 elements being used.
cut paste error from pSeries_lpar_flush_hash_range
>
>> + int i = 0, pix = 0, rc;
>> + unsigned long flags = 0;
>> + int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
>> +
>> + if (lock_tlbie)
>> + spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>
> Why are these hash operations being called with the tlbie lock held?
if the firmware doesn't support lockless TLBIE, we need to do locking
at the guest side. pSeries_lpar_flush_hash_range does that.
>
>> +
>> + for (i = 0; i < count; i++) {
>> +
>> + if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
>> + pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize,
>> + ssize, 0);
>
> Couldn't you set the ppc_md hook based on the firmware request to
> avoid this test in the inner loop? I don't see any tlbie operations
> at all.
didn't get that.
>
>> + } else {
>> + param[pix] = HBR_REQUEST | HBR_AVPN | slot[i];
>> + param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize);
>> + pix += 2;
>> + if (pix == 8) {
>> + rc = plpar_hcall9(H_BULK_REMOVE, param,
>> + param[0], param[1], param[2],
>> + param[3], param[4], param[5],
>> + param[6], param[7]);
>> + BUG_ON(rc != H_SUCCESS);
>> + pix = 0;
>> + }
>> + }
>> + }
>> + if (pix) {
>> + param[pix] = HBR_END;
>> + rc = plpar_hcall9(H_BULK_REMOVE, param, param[0], param[1],
>> + param[2], param[3], param[4], param[5],
>> + param[6], param[7]);
>> + BUG_ON(rc != H_SUCCESS);
>> + }
>> +
>> + if (lock_tlbie)
>> + spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
>> +}
>> +
>> +static void pSeries_lpar_hugepage_invalidate(struct mm_struct *mm,
>> + unsigned char *hpte_slot_array,
>> + unsigned long addr, int psize)
>> +{
>> + int ssize = 0, i, index = 0;
>> + unsigned long s_addr = addr;
>> + unsigned int max_hpte_count, valid;
>> + unsigned long vpn_array[PPC64_HUGE_HPTE_BATCH];
>> + unsigned long slot_array[PPC64_HUGE_HPTE_BATCH];
>> + unsigned long shift, hidx, vpn = 0, vsid, hash, slot;
>> +
>> + shift = mmu_psize_defs[psize].shift;
>> + max_hpte_count = HUGE_PAGE_SIZE >> shift;
>> +
>> + for (i = 0; i < max_hpte_count; i++) {
>> + /*
>> + * 8 bits per each hpte entries
>> + * 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
>> + */
>> + valid = hpte_slot_array[i] & 0x1;
>> + if (!valid)
>> + continue;
>> + hidx = hpte_slot_array[i] >> 1;
>> +
>> + /* get the vpn */
>> + addr = s_addr + (i * (1ul << shift));
>> + if (!is_kernel_addr(addr)) {
>> + ssize = user_segment_size(addr);
>> + vsid = get_vsid(mm->context.id, addr, ssize);
>> + WARN_ON(vsid == 0);
>> + } else {
>> + vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
>> + ssize = mmu_kernel_ssize;
>> + }
>> +
>> + vpn = hpt_vpn(addr, vsid, ssize);
>> + hash = hpt_hash(vpn, shift, ssize);
>> + if (hidx & _PTEIDX_SECONDARY)
>> + hash = ~hash;
>> +
>> + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
>> + slot += hidx & _PTEIDX_GROUP_IX;
>> +
>> + slot_array[index] = slot;
>> + vpn_array[index] = vpn;
>> + if (index == PPC64_HUGE_HPTE_BATCH - 1) {
>> + /*
>> + * Now do a bluk invalidate
>> + */
>> + __pSeries_lpar_hugepage_invalidate(slot_array,
>> + vpn_array,
>> + PPC64_HUGE_HPTE_BATCH,
>> + psize, ssize);
>
> I don't really understand why you have one loop in this function, then
> another in the __ function.
?? if we didn't accumulate batch size number of entries, we won't call
the above. Hence we will have to do the bulk remove outside the if
loop.
>
>> + index = 0;
>> + } else
>> + index++;
>> + }
>> + if (index)
>> + __pSeries_lpar_hugepage_invalidate(slot_array, vpn_array,
>> + index, psize, ssize);
>> +}
>> +
>> static void pSeries_lpar_hpte_removebolted(unsigned long ea,
>> int psize, int ssize)
>> {
>> @@ -360,13 +478,6 @@ static void pSeries_lpar_hpte_removebolted(unsigned long ea,
>> pSeries_lpar_hpte_invalidate(slot, vpn, psize, ssize, 0);
>> }
>>
>> -/* Flag bits for H_BULK_REMOVE */
>> -#define HBR_REQUEST 0x4000000000000000UL
>> -#define HBR_RESPONSE 0x8000000000000000UL
>> -#define HBR_END 0xc000000000000000UL
>> -#define HBR_AVPN 0x0200000000000000UL
>> -#define HBR_ANDCOND 0x0100000000000000UL
>> -
>> /*
>> * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
>> * lock.
>> @@ -452,6 +563,7 @@ void __init hpte_init_lpar(void)
>> ppc_md.hpte_removebolted = pSeries_lpar_hpte_removebolted;
>> ppc_md.flush_hash_range = pSeries_lpar_flush_hash_range;
>> ppc_md.hpte_clear_all = pSeries_lpar_hptab_clear;
>> + ppc_md.hugepage_invalidate = pSeries_lpar_hugepage_invalidate;
>> }
>>
>> #ifdef CONFIG_PPC_SMLPAR
>
> --
-aneesh
WARNING: multiple messages have this Message-ID (diff)
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: David Gibson <dwg@au1.ibm.com>
Cc: benh@kernel.crashing.org, paulus@samba.org, linux-mm@kvack.org,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH -V7 09/10] powerpc: Optimize hugepage invalidate
Date: Sat, 04 May 2013 00:35:07 +0530 [thread overview]
Message-ID: <87fvy351gc.fsf@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130503052846.GU13041@truffula.fritz.box>
David Gibson <dwg@au1.ibm.com> writes:
> On Mon, Apr 29, 2013 at 01:21:50AM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>
>> Hugepage invalidate involves invalidating multiple hpte entries.
>> Optimize the operation using H_BULK_REMOVE on lpar platforms.
>> On native, reduce the number of tlb flush.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Since this is purely an optimization, have you tried reproducing the
> bugs you're chasing with this patch not included?
That was due to not handling thp split while walking page table. I have
that fixed. Will post the next version soon.
>
>> ---
>> arch/powerpc/include/asm/machdep.h | 3 +
>> arch/powerpc/mm/hash_native_64.c | 78 +++++++++++++++++++++
>> arch/powerpc/mm/pgtable_64.c | 13 +++-
>> arch/powerpc/platforms/pseries/lpar.c | 126 ++++++++++++++++++++++++++++++++--
>> 4 files changed, 210 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>> index 3f3f691..5d1e7d2 100644
>> --- a/arch/powerpc/include/asm/machdep.h
>> +++ b/arch/powerpc/include/asm/machdep.h
>> @@ -56,6 +56,9 @@ struct machdep_calls {
.....
>>
>> +/*
>> + * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
>> + * to make sure that we avoid bouncing the hypervisor tlbie lock.
>> + */
>> +#define PPC64_HUGE_HPTE_BATCH 12
>> +
>> +static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
>> + unsigned long *vpn, int count,
>> + int psize, int ssize)
>> +{
>> + unsigned long param[9];
>
> [9]? I only see 8 elements being used.
cut paste error from pSeries_lpar_flush_hash_range
>
>> + int i = 0, pix = 0, rc;
>> + unsigned long flags = 0;
>> + int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
>> +
>> + if (lock_tlbie)
>> + spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>
> Why are these hash operations being called with the tlbie lock held?
if the firmware doesn't support lockless TLBIE, we need to do locking
at the guest side. pSeries_lpar_flush_hash_range does that.
>
>> +
>> + for (i = 0; i < count; i++) {
>> +
>> + if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
>> + pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize,
>> + ssize, 0);
>
> Couldn't you set the ppc_md hook based on the firmware request to
> avoid this test in the inner loop? I don't see any tlbie operations
> at all.
didn't get that.
>
>> + } else {
>> + param[pix] = HBR_REQUEST | HBR_AVPN | slot[i];
>> + param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize);
>> + pix += 2;
>> + if (pix == 8) {
>> + rc = plpar_hcall9(H_BULK_REMOVE, param,
>> + param[0], param[1], param[2],
>> + param[3], param[4], param[5],
>> + param[6], param[7]);
>> + BUG_ON(rc != H_SUCCESS);
>> + pix = 0;
>> + }
>> + }
>> + }
>> + if (pix) {
>> + param[pix] = HBR_END;
>> + rc = plpar_hcall9(H_BULK_REMOVE, param, param[0], param[1],
>> + param[2], param[3], param[4], param[5],
>> + param[6], param[7]);
>> + BUG_ON(rc != H_SUCCESS);
>> + }
>> +
>> + if (lock_tlbie)
>> + spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
>> +}
>> +
>> +static void pSeries_lpar_hugepage_invalidate(struct mm_struct *mm,
>> + unsigned char *hpte_slot_array,
>> + unsigned long addr, int psize)
>> +{
>> + int ssize = 0, i, index = 0;
>> + unsigned long s_addr = addr;
>> + unsigned int max_hpte_count, valid;
>> + unsigned long vpn_array[PPC64_HUGE_HPTE_BATCH];
>> + unsigned long slot_array[PPC64_HUGE_HPTE_BATCH];
>> + unsigned long shift, hidx, vpn = 0, vsid, hash, slot;
>> +
>> + shift = mmu_psize_defs[psize].shift;
>> + max_hpte_count = HUGE_PAGE_SIZE >> shift;
>> +
>> + for (i = 0; i < max_hpte_count; i++) {
>> + /*
>> + * 8 bits per each hpte entries
>> + * 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
>> + */
>> + valid = hpte_slot_array[i] & 0x1;
>> + if (!valid)
>> + continue;
>> + hidx = hpte_slot_array[i] >> 1;
>> +
>> + /* get the vpn */
>> + addr = s_addr + (i * (1ul << shift));
>> + if (!is_kernel_addr(addr)) {
>> + ssize = user_segment_size(addr);
>> + vsid = get_vsid(mm->context.id, addr, ssize);
>> + WARN_ON(vsid == 0);
>> + } else {
>> + vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
>> + ssize = mmu_kernel_ssize;
>> + }
>> +
>> + vpn = hpt_vpn(addr, vsid, ssize);
>> + hash = hpt_hash(vpn, shift, ssize);
>> + if (hidx & _PTEIDX_SECONDARY)
>> + hash = ~hash;
>> +
>> + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
>> + slot += hidx & _PTEIDX_GROUP_IX;
>> +
>> + slot_array[index] = slot;
>> + vpn_array[index] = vpn;
>> + if (index == PPC64_HUGE_HPTE_BATCH - 1) {
>> + /*
>> + * Now do a bluk invalidate
>> + */
>> + __pSeries_lpar_hugepage_invalidate(slot_array,
>> + vpn_array,
>> + PPC64_HUGE_HPTE_BATCH,
>> + psize, ssize);
>
> I don't really understand why you have one loop in this function, then
> another in the __ function.
?? if we didn't accumulate batch size number of entries, we won't call
the above. Hence we will have to do the bulk remove outside the if
loop.
>
>> + index = 0;
>> + } else
>> + index++;
>> + }
>> + if (index)
>> + __pSeries_lpar_hugepage_invalidate(slot_array, vpn_array,
>> + index, psize, ssize);
>> +}
>> +
>> static void pSeries_lpar_hpte_removebolted(unsigned long ea,
>> int psize, int ssize)
>> {
>> @@ -360,13 +478,6 @@ static void pSeries_lpar_hpte_removebolted(unsigned long ea,
>> pSeries_lpar_hpte_invalidate(slot, vpn, psize, ssize, 0);
>> }
>>
>> -/* Flag bits for H_BULK_REMOVE */
>> -#define HBR_REQUEST 0x4000000000000000UL
>> -#define HBR_RESPONSE 0x8000000000000000UL
>> -#define HBR_END 0xc000000000000000UL
>> -#define HBR_AVPN 0x0200000000000000UL
>> -#define HBR_ANDCOND 0x0100000000000000UL
>> -
>> /*
>> * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
>> * lock.
>> @@ -452,6 +563,7 @@ void __init hpte_init_lpar(void)
>> ppc_md.hpte_removebolted = pSeries_lpar_hpte_removebolted;
>> ppc_md.flush_hash_range = pSeries_lpar_flush_hash_range;
>> ppc_md.hpte_clear_all = pSeries_lpar_hptab_clear;
>> + ppc_md.hugepage_invalidate = pSeries_lpar_hugepage_invalidate;
>> }
>>
>> #ifdef CONFIG_PPC_SMLPAR
>
> --
-aneesh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-05-03 19:05 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-28 19:51 [PATCH -V7 00/10] THP support for PPC64 (Patchset 2) Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-04-28 19:51 ` [PATCH -V7 01/10] powerpc/THP: Double the PMD table size for THP Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-05-03 3:21 ` David Gibson
2013-05-03 3:21 ` David Gibson
2013-04-28 19:51 ` [PATCH -V7 02/10] powerpc/THP: Implement transparent hugepages for ppc64 Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-05-03 4:52 ` David Gibson
2013-05-03 4:52 ` David Gibson
2013-05-03 8:19 ` Benjamin Herrenschmidt
2013-05-03 8:19 ` Benjamin Herrenschmidt
2013-05-03 11:54 ` David Gibson
2013-05-03 11:54 ` David Gibson
2013-05-03 13:00 ` Benjamin Herrenschmidt
2013-05-03 13:00 ` Benjamin Herrenschmidt
2013-05-03 18:54 ` Aneesh Kumar K.V
2013-05-03 18:54 ` Aneesh Kumar K.V
2013-05-04 19:14 ` Aneesh Kumar K.V
2013-05-04 19:14 ` Aneesh Kumar K.V
2013-05-04 21:39 ` Benjamin Herrenschmidt
2013-05-04 21:39 ` Benjamin Herrenschmidt
2013-05-06 1:28 ` David Gibson
2013-05-06 1:28 ` David Gibson
2013-04-28 19:51 ` [PATCH -V7 03/10] powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common code Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-04-28 19:51 ` [PATCH -V7 04/10] powerpc: Update find_linux_pte_or_hugepte to handle transparent hugepages Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-05-03 4:53 ` David Gibson
2013-05-03 4:53 ` David Gibson
2013-05-03 18:58 ` Aneesh Kumar K.V
2013-05-03 18:58 ` Aneesh Kumar K.V
2013-05-04 6:28 ` David Gibson
2013-05-04 6:28 ` David Gibson
2013-04-28 19:51 ` [PATCH -V7 05/10] powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-05-03 4:56 ` David Gibson
2013-05-03 4:56 ` David Gibson
2013-04-28 19:51 ` [PATCH -V7 06/10] powerpc: Update gup_pmd_range to handle transparent hugepages Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-05-03 4:57 ` David Gibson
2013-05-03 4:57 ` David Gibson
2013-04-28 19:51 ` [PATCH -V7 07/10] powerpc/THP: Add code to handle HPTE faults for large pages Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-05-03 5:13 ` David Gibson
2013-05-03 5:13 ` David Gibson
2013-04-28 19:51 ` [PATCH -V7 08/10] powerpc/THP: Enable THP on PPC64 Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-05-03 5:15 ` David Gibson
2013-05-03 5:15 ` David Gibson
2013-05-03 18:49 ` Aneesh Kumar K.V
2013-05-03 18:49 ` Aneesh Kumar K.V
2013-05-05 8:59 ` David Gibson
2013-05-05 8:59 ` David Gibson
2013-04-28 19:51 ` [PATCH -V7 09/10] powerpc: Optimize hugepage invalidate Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-05-03 5:28 ` David Gibson
2013-05-03 5:28 ` David Gibson
2013-05-03 19:05 ` Aneesh Kumar K.V [this message]
2013-05-03 19:05 ` Aneesh Kumar K.V
2013-05-03 21:54 ` Benjamin Herrenschmidt
2013-05-03 21:54 ` Benjamin Herrenschmidt
2013-04-28 19:51 ` [PATCH -V7 10/10] powerpc: disable assert_pte_locked Aneesh Kumar K.V
2013-04-28 19:51 ` Aneesh Kumar K.V
2013-05-03 5:30 ` David Gibson
2013-05-03 5:30 ` David Gibson
2013-05-03 19:07 ` Aneesh Kumar K.V
2013-05-03 19:07 ` Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fvy351gc.fsf@linux.vnet.ibm.com \
--to=aneesh.kumar@linux.vnet.ibm.com \
--cc=dwg@au1.ibm.com \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=paulus@samba.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.