From mboxrd@z Thu Jan 1 00:00:00 1970
From: Simon Jeons
To: "Aneesh Kumar K.V"
Cc: paulus@samba.org, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org
Subject: Re: [PATCH -V5 00/25] THP support for PPC64
Date: Fri, 19 Apr 2013 09:55:50 +0800
Message-ID: <5170A426.4060807@gmail.com>
In-Reply-To: <1365055083-31956-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1365055083-31956-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
List-Id: Linux on PowerPC Developers Mail List

Hi Aneesh,

On 04/04/2013 01:57 PM, Aneesh Kumar K.V wrote:
> Hi,
>
> This patchset adds transparent hugepage support for PPC64.
>
> TODO:
> * hash preload support in update_mmu_cache_pmd (we don't do that for
>   hugetlb)
>
> Some numbers:
>
> The latency measurement code from Anton can be found at
> http://ozlabs.org/~anton/junkcode/latency2001.c
>
> THP disabled, 64K page size
> ---------------------------
> [root@llmp24l02 ~]# ./latency2001 8G
>  8589934592    731.73 cycles    205.77 ns
> [root@llmp24l02 ~]# ./latency2001 8G
>  8589934592    743.39 cycles    209.05 ns
>
> THP disabled, large pages via hugetlbfs
> ---------------------------------------
> [root@llmp24l02 ~]# ./latency2001 -l 8G
>  8589934592    416.09 cycles    117.01 ns
> [root@llmp24l02 ~]# ./latency2001 -l 8G
>  8589934592    415.74 cycles    116.91 ns
>
> THP enabled, 64K page size
> --------------------------
> [root@llmp24l02 ~]# ./latency2001 8G
>  8589934592    405.07 cycles    113.91 ns
> [root@llmp24l02 ~]# ./latency2001 8G
>  8589934592    411.82 cycles    115.81 ns
>
> We are close to hugetlbfs latency, and we achieve this with zero
> configuration and no page reservation. Most of the allocations above
> are fault allocated.
>
> Another test, which does 50000000 random accesses over a 1GB area, goes
> from 2.65 seconds to 1.07 seconds with this patchset.
>
> split_huge_page impact:
> -----------------------
> To look at the performance impact of large page invalidation, I tried
> the experiment below. The test accesses a large contiguous region of
> memory as follows:
>
>     for (i = 0; i < size; i += PAGE_SIZE)
>         data[i] = i;
>
> We access the data in sequential order so that we see the worst-case
> THP performance: sequential access keeps the page table cached, so the
> TLB miss overhead is as small as possible. We also don't touch the
> entire page, because that could cause cache eviction.
>
> After touching the full range as above, we call mprotect on each of
> those pages. An mprotect results in a hugepage split, which lets us
> measure the impact of splitting:
>
>     for (i = 0; i < size; i += PAGE_SIZE)
>         mprotect(&data[i], PAGE_SIZE, PROT_READ);
>
> Split hugepage impact:
> ----------------------
> THP enabled:  2.851561705 seconds for test completion
> THP disabled: 3.599146098 seconds for test completion
>
> We are 20.7% better than the non-THP case even with all the large pages
> split.
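The two loops above are the heart of the split-huge-page-mpro test whose
results are quoted below. For anyone wanting to reproduce the experiment,
here is a minimal standalone sketch; the actual test source is not part of
this message, so the mmap-based allocation, the size-suffix parsing, and
the clock_gettime timing are assumptions, not the original code:

  /* Sketch of a split-huge-page-mpro-style test (assumed details, see
   * note above): touch one byte per base page, then mprotect each base
   * page to force the backing hugepages to be split. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <time.h>
  #include <unistd.h>

  static size_t parse_size(const char *s)
  {
      char *end;
      size_t v = strtoull(s, &end, 0);

      switch (*end) {            /* accept 20G, 512M, 64K suffixes */
      case 'G': case 'g': v <<= 30; break;
      case 'M': case 'm': v <<= 20; break;
      case 'K': case 'k': v <<= 10; break;
      }
      return v;
  }

  static long long now_ns(void)
  {
      struct timespec ts;

      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec * 1000000000LL + ts.tv_nsec;
  }

  int main(int argc, char **argv)
  {
      size_t size = argc > 1 ? parse_size(argv[1]) : (1UL << 30);
      long page_size = sysconf(_SC_PAGESIZE);  /* 64K on this box */
      long long t0, t1;
      char *data;
      size_t i;

      /* Plain anonymous mapping: with THP enabled the kernel backs it
       * with hugepages at fault time, no reservation needed. */
      data = mmap(NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (data == MAP_FAILED) {
          perror("mmap");
          return 1;
      }

      /* Touch one byte per page in sequential order: the worst case
       * for THP, since the page tables stay cached and TLB misses are
       * already as cheap as they get. */
      t0 = now_ns();
      for (i = 0; i < size; i += page_size)
          data[i] = i;
      t1 = now_ns();
      printf("time taken to touch all the data in ns: %lld\n", t1 - t0);

      /* mprotect one base page at a time; each call covers only part
       * of a hugepage and so forces a hugepage split. */
      for (i = 0; i < size; i += page_size)
          mprotect(&data[i], page_size, PROT_READ);

      return 0;
  }

The numbers quoted below are consistent with this shape of test: with a
16M hugepage size, a 20G range is about 1280 hugepages, which matches the
thp_fault_alloc and thp_split counts of 1279 and the low page-fault count
(1,581) in the THP-enabled run.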
>
> Detailed output:
>
> THP enabled:
> ------------
> [root@llmp24l02 ~]# cat /proc/vmstat | grep thp
> thp_fault_alloc 0
> thp_fault_fallback 0
> thp_collapse_alloc 0
> thp_collapse_alloc_failed 0
> thp_split 0
> thp_zero_page_alloc 0
> thp_zero_page_alloc_failed 0
> [root@llmp24l02 ~]# /root/thp/tools/perf/perf stat -e page-faults,dTLB-load-misses ./split-huge-page-mpro 20G
> time taken to touch all the data in ns: 2763096913
>
>  Performance counter stats for './split-huge-page-mpro 20G':
>
>           1,581 page-faults
>           3,159 dTLB-load-misses
>
>     2.851561705 seconds time elapsed
>
> [root@llmp24l02 ~]# cat /proc/vmstat | grep thp
> thp_fault_alloc 1279
> thp_fault_fallback 0
> thp_collapse_alloc 0
> thp_collapse_alloc_failed 0
> thp_split 1279
> thp_zero_page_alloc 0
> thp_zero_page_alloc_failed 0
>
>     77.05%  split-huge-page  [kernel.kallsyms]     [k] .clear_user_page
>      7.10%  split-huge-page  [kernel.kallsyms]     [k] .perf_event_mmap_ctx
>      1.51%  split-huge-page  split-huge-page-mpro  [.] 0x0000000000000a70
>      0.96%  split-huge-page  [unknown]             [H] 0x000000000157e3bc
>      0.81%  split-huge-page  [kernel.kallsyms]     [k] .up_write
>      0.76%  split-huge-page  [kernel.kallsyms]     [k] .perf_event_mmap
>      0.76%  split-huge-page  [kernel.kallsyms]     [k] .down_write
>      0.74%  split-huge-page  [kernel.kallsyms]     [k] .lru_add_page_tail
>      0.61%  split-huge-page  [kernel.kallsyms]     [k] .split_huge_page
>      0.59%  split-huge-page  [kernel.kallsyms]     [k] .change_protection
>      0.51%  split-huge-page  [kernel.kallsyms]     [k] .release_pages
>
>      0.96%  split-huge-page  [unknown]             [H] 0x000000000157e3bc
>             |
>             |--79.44%-- reloc_start
>             |           |
>             |           |--86.54%-- .__pSeries_lpar_hugepage_invalidate
>             |           |           .pSeries_lpar_hugepage_invalidate
>             |           |           .hpte_need_hugepage_flush
>             |           |           .split_huge_page
>             |           |           .__split_huge_page_pmd
>             |           |           .vma_adjust
>             |           |           .vma_merge
>             |           |           .mprotect_fixup
>             |           |           .SyS_mprotect
>
> THP disabled:
> -------------
> [root@llmp24l02 ~]# echo never > /sys/kernel/mm/transparent_hugepage/enabled
> [root@llmp24l02 ~]# /root/thp/tools/perf/perf stat -e page-faults,dTLB-load-misses ./split-huge-page-mpro 20G
> time taken to touch all the data in ns: 3513767220
>
>  Performance counter stats for './split-huge-page-mpro 20G':
>
>        3,27,726 page-faults
>        3,29,654 dTLB-load-misses
>
>     3.599146098 seconds time elapsed
>

Thanks for your great work. One question about the page table on ppc64:
why does x86 use a tree-based page table while ppc64 uses a hash-based
one?

> Changes from V4:
> * Fix bad page error in page_table_alloc
>     BUG: Bad page state in process stream  pfn:f1a59
>     page:f0000000034dc378 count:1 mapcount:0 mapping:(null) index:0x0
>     [c000000f322c77d0] [c00000000015e198] .bad_page+0xe8/0x140
>     [c000000f322c7860] [c00000000015e3c4] .free_pages_prepare+0x1d4/0x1e0
>     [c000000f322c7910] [c000000000160450] .free_hot_cold_page+0x50/0x230
>     [c000000f322c79c0] [c00000000003ad18] .page_table_alloc+0x168/0x1c0
>
> Changes from V3:
> * PowerNV boot fixes
>
> Changes from V2:
> * Change patch "powerpc: Reduce PTE table memory wastage" to use a much
>   simpler approach for PTE page sharing.
> * Changes to handle huge pages in KVM code.
> * Address other review comments.
>
> Changes from V1:
> * Address review comments
> * More patch splitting
> * Add batched hpte invalidation for hugepages.
>
> Changes from RFC V2:
> * Address review comments
> * More code cleanup and patch splitting
>
> Changes from RFC V1:
> * HugeTLB fs now works
> * Compile issues fixed
> * Rebased to v3.8
> * Patch series reordered so that ppc64 cleanups and MM THP changes come
>   early in the series. This should help in picking those patches up
>   early.
>
> Thanks,
> -aneesh
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org