From mboxrd@z Thu Jan 1 00:00:00 1970
From: Simon Jeons
To: "Aneesh Kumar K.V"
Cc: paulus@samba.org, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org
Subject: Re: [PATCH -V5 00/25] THP support for PPC64
Date: Fri, 19 Apr 2013 09:55:50 +0800
Message-ID: <5170A426.4060807@gmail.com>
In-Reply-To: <1365055083-31956-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1365055083-31956-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
List-Id: Linux on PowerPC Developers Mail List

Hi Aneesh,

On 04/04/2013 01:57 PM, Aneesh Kumar K.V wrote:
> Hi,
>
> This patchset adds transparent hugepage support for PPC64.
>
> TODO:
> * hash preload support in update_mmu_cache_pmd (we don't do that for
>   hugetlb)
>
> Some numbers:
>
> The latency measurement code from Anton can be found at
> http://ozlabs.org/~anton/junkcode/latency2001.c
>
> THP disabled, 64K page size
> ---------------------------
> [root@llmp24l02 ~]# ./latency2001 8G
>  8589934592    731.73 cycles    205.77 ns
> [root@llmp24l02 ~]# ./latency2001 8G
>  8589934592    743.39 cycles    209.05 ns
>
> THP disabled, large pages via hugetlbfs
> ---------------------------------------
> [root@llmp24l02 ~]# ./latency2001 -l 8G
>  8589934592    416.09 cycles    117.01 ns
> [root@llmp24l02 ~]# ./latency2001 -l 8G
>  8589934592    415.74 cycles    116.91 ns
>
> THP enabled, 64K page size
> --------------------------
> [root@llmp24l02 ~]# ./latency2001 8G
>  8589934592    405.07 cycles    113.91 ns
> [root@llmp24l02 ~]# ./latency2001 8G
>  8589934592    411.82 cycles    115.81 ns
>
> We are close to hugetlbfs latency, and we achieve this with zero
> configuration and no page reservation. Most of the allocations above
> are fault allocated.
>
> Another test, which does 50000000 random accesses over a 1GB area, goes
> from 2.65 seconds to 1.07 seconds with this patchset.
>
> split_huge_page impact:
> -----------------------
> To look at the performance impact of large page invalidation, I tried
> the experiment below. The test accesses a large contiguous region of
> memory as follows:
>
>     for (i = 0; i < size; i += PAGE_SIZE)
>         data[i] = i;
>
> We access the data in sequential order so that we see the worst-case
> THP performance: sequential access keeps the page table cached, so the
> TLB miss overhead is as small as possible. We also don't touch the
> entire page, because that could cause cache eviction.
>
> After touching the full range as above, we call mprotect on each of
> those pages. An mprotect results in a hugepage split, which lets us
> measure the impact of splitting:
>
>     for (i = 0; i < size; i += PAGE_SIZE)
>         mprotect(&data[i], PAGE_SIZE, PROT_READ);
>
> Split hugepage impact:
> ----------------------
> THP enabled:  2.851561705 seconds for test completion
> THP disabled: 3.599146098 seconds for test completion
>
> We are 20.7% better than the non-THP case even with all the large pages
> split.
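The two loops above are the heart of the split-huge-page-mpro test whose
results are quoted below. For anyone wanting to reproduce the experiment,
here is a minimal standalone sketch; the actual test source is not part of
this message, so the mmap-based allocation, the size-suffix parsing, and
the clock_gettime timing are assumptions, not the original code:

  /* Sketch of a split-huge-page-mpro-style test (assumed details, see
   * note above): touch one byte per base page, then mprotect each base
   * page to force the backing hugepages to be split. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <time.h>
  #include <unistd.h>

  static size_t parse_size(const char *s)
  {
      char *end;
      size_t v = strtoull(s, &end, 0);

      switch (*end) {            /* accept 20G, 512M, 64K suffixes */
      case 'G': case 'g': v <<= 30; break;
      case 'M': case 'm': v <<= 20; break;
      case 'K': case 'k': v <<= 10; break;
      }
      return v;
  }

  static long long now_ns(void)
  {
      struct timespec ts;

      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec * 1000000000LL + ts.tv_nsec;
  }

  int main(int argc, char **argv)
  {
      size_t size = argc > 1 ? parse_size(argv[1]) : (1UL << 30);
      long page_size = sysconf(_SC_PAGESIZE);  /* 64K on this box */
      long long t0, t1;
      char *data;
      size_t i;

      /* Plain anonymous mapping: with THP enabled the kernel backs it
       * with hugepages at fault time, no reservation needed. */
      data = mmap(NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (data == MAP_FAILED) {
          perror("mmap");
          return 1;
      }

      /* Touch one byte per page in sequential order: the worst case
       * for THP, since the page tables stay cached and TLB misses are
       * already as cheap as they get. */
      t0 = now_ns();
      for (i = 0; i < size; i += page_size)
          data[i] = i;
      t1 = now_ns();
      printf("time taken to touch all the data in ns: %lld\n", t1 - t0);

      /* mprotect one base page at a time; each call covers only part
       * of a hugepage and so forces a hugepage split. */
      for (i = 0; i < size; i += page_size)
          mprotect(&data[i], page_size, PROT_READ);

      return 0;
  }

The numbers quoted below are consistent with this shape of test: with a
16M hugepage size, a 20G range is about 1280 hugepages, which matches the
thp_fault_alloc and thp_split counts of 1279 and the low page-fault count
(1,581) in the THP-enabled run.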
>
> Detailed output:
>
> THP enabled:
> ------------
> [root@llmp24l02 ~]# cat /proc/vmstat | grep thp
> thp_fault_alloc 0
> thp_fault_fallback 0
> thp_collapse_alloc 0
> thp_collapse_alloc_failed 0
> thp_split 0
> thp_zero_page_alloc 0
> thp_zero_page_alloc_failed 0
> [root@llmp24l02 ~]# /root/thp/tools/perf/perf stat -e page-faults,dTLB-load-misses ./split-huge-page-mpro 20G
> time taken to touch all the data in ns: 2763096913
>
>  Performance counter stats for './split-huge-page-mpro 20G':
>
>           1,581 page-faults
>           3,159 dTLB-load-misses
>
>     2.851561705 seconds time elapsed
>
> [root@llmp24l02 ~]# cat /proc/vmstat | grep thp
> thp_fault_alloc 1279
> thp_fault_fallback 0
> thp_collapse_alloc 0
> thp_collapse_alloc_failed 0
> thp_split 1279
> thp_zero_page_alloc 0
> thp_zero_page_alloc_failed 0
>
>     77.05%  split-huge-page  [kernel.kallsyms]     [k] .clear_user_page
>      7.10%  split-huge-page  [kernel.kallsyms]     [k] .perf_event_mmap_ctx
>      1.51%  split-huge-page  split-huge-page-mpro  [.] 0x0000000000000a70
>      0.96%  split-huge-page  [unknown]             [H] 0x000000000157e3bc
>      0.81%  split-huge-page  [kernel.kallsyms]     [k] .up_write
>      0.76%  split-huge-page  [kernel.kallsyms]     [k] .perf_event_mmap
>      0.76%  split-huge-page  [kernel.kallsyms]     [k] .down_write
>      0.74%  split-huge-page  [kernel.kallsyms]     [k] .lru_add_page_tail
>      0.61%  split-huge-page  [kernel.kallsyms]     [k] .split_huge_page
>      0.59%  split-huge-page  [kernel.kallsyms]     [k] .change_protection
>      0.51%  split-huge-page  [kernel.kallsyms]     [k] .release_pages
>
>      0.96%  split-huge-page  [unknown]             [H] 0x000000000157e3bc
>             |
>             |--79.44%-- reloc_start
>             |           |
>             |           |--86.54%-- .__pSeries_lpar_hugepage_invalidate
>             |           |           .pSeries_lpar_hugepage_invalidate
>             |           |           .hpte_need_hugepage_flush
>             |           |           .split_huge_page
>             |           |           .__split_huge_page_pmd
>             |           |           .vma_adjust
>             |           |           .vma_merge
>             |           |           .mprotect_fixup
>             |           |           .SyS_mprotect
>
> THP disabled:
> -------------
> [root@llmp24l02 ~]# echo never > /sys/kernel/mm/transparent_hugepage/enabled
> [root@llmp24l02 ~]# /root/thp/tools/perf/perf stat -e page-faults,dTLB-load-misses ./split-huge-page-mpro 20G
> time taken to touch all the data in ns: 3513767220
>
>  Performance counter stats for './split-huge-page-mpro 20G':
>
>        3,27,726 page-faults
>        3,29,654 dTLB-load-misses
>
>     3.599146098 seconds time elapsed
>

Thanks for your great work. One question about the page table on ppc64:
why does x86 use a tree-based page table while ppc64 uses a hash-based
one?

> Changes from V4:
> * Fix bad page error in page_table_alloc
>     BUG: Bad page state in process stream  pfn:f1a59
>     page:f0000000034dc378 count:1 mapcount:0 mapping:(null) index:0x0
>     [c000000f322c77d0] [c00000000015e198] .bad_page+0xe8/0x140
>     [c000000f322c7860] [c00000000015e3c4] .free_pages_prepare+0x1d4/0x1e0
>     [c000000f322c7910] [c000000000160450] .free_hot_cold_page+0x50/0x230
>     [c000000f322c79c0] [c00000000003ad18] .page_table_alloc+0x168/0x1c0
>
> Changes from V3:
> * PowerNV boot fixes
>
> Changes from V2:
> * Change patch "powerpc: Reduce PTE table memory wastage" to use a much
>   simpler approach for PTE page sharing.
> * Changes to handle huge pages in KVM code.
> * Address other review comments.
>
> Changes from V1:
> * Address review comments
> * More patch splitting
> * Add batched hpte invalidation for hugepages.
>
> Changes from RFC V2:
> * Address review comments
> * More code cleanup and patch splitting
>
> Changes from RFC V1:
> * HugeTLB fs now works
> * Compile issues fixed
> * Rebased to v3.8
> * Patch series reordered so that ppc64 cleanups and MM THP changes come
>   early in the series. This should help in picking those patches up
>   early.
>
> Thanks,
> -aneesh
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org