From: Ryan Roberts <ryan.roberts@arm.com>
To: Kefeng Wang <wangkefeng.wang@huawei.com>,
Matthew Wilcox <willy@infradead.org>,
John Hubbard <jhubbard@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Yin Fengwei <fengwei.yin@intel.com>,
David Hildenbrand <david@redhat.com>, Yu Zhao <yuzhao@google.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Yang Shi <shy828301@gmail.com>,
"Huang, Ying" <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>,
Luis Chamberlain <mcgrof@kernel.org>,
Itaru Kitayama <itaru.kitayama@gmail.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
David Rientjes <rientjes@google.com>,
Vlastimil Babka <vbabka@suse.cz>, Hugh Dickins <hughd@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory
Date: Mon, 13 Nov 2023 12:12:07 +0000
Message-ID: <f034dd2c-4ce1-47e5-a3a6-c3c1fcab5c4b@arm.com>
In-Reply-To: <479b3e2b-456d-46c1-9677-38f6c95a0be8@huawei.com>
On 13/11/2023 11:52, Kefeng Wang wrote:
>
>
> On 2023/11/13 18:19, Ryan Roberts wrote:
>> On 13/11/2023 05:18, Matthew Wilcox wrote:
>>> On Sun, Nov 12, 2023 at 10:57:47PM -0500, John Hubbard wrote:
>>>> I've done some initial performance testing of this patchset on an arm64
>>>> SBSA server. When these patches are combined with the arm64 arch contpte
>>>> patches in Ryan's git tree (he has conveniently combined everything
>>>> here: [1]), we are seeing a remarkable, consistent speedup of 10.5x on
>>>> some memory-intensive workloads. Many test runs, conducted independently
>>>> by different engineers and on different machines, have convinced me and
>>>> my colleagues that this is an accurate result.
>>>>
>>>> In order to achieve that result, we used the git tree in [1] with the
>>>> following settings:
>>>>
>>>> echo always >/sys/kernel/mm/transparent_hugepage/enabled
>>>> echo recommend >/sys/kernel/mm/transparent_hugepage/anon_orders
>>>>
>>>> This was on an aarch64 machine configured to use a 64KB base page size.
>>>> That configuration means that the PMD size is 512MB, which is of course
>>>> too large for practical use as a pure PMD-THP. However, with these
>>>> small-size (less than PMD-sized) THPs, we get the improvements in TLB
>>>> coverage, while still getting pages that are small enough to be
>>>> effectively usable.
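Just to sanity-check that 512MB figure: with a 64K base page and 8-byte
descriptors, each page-table level holds 64K/8 = 8192 entries, so a single PMD
entry maps 8192 x 64K = 512M. It's pure arithmetic:

  echo $(( (65536 / 8) * 65536 ))    # 536870912 bytes = 512M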
>>>
>>> That is quite remarkable!
>>
>> Yes, agreed - thanks for sharing these results! A very nice Monday morning boost!
>>
>>>
>>> My hope is to abolish the 64kB page size configuration. ie instead of
>>> using the mixture of page sizes that you currently are -- 64k and
>>> 1M (right? Order-0, and order-4)
>>
>> Not quite; the contpte-size for a 64K page size is 2M/order-5. (and yes, it is
>> 64K/order-4 for a 4K page size, and 2M/order-7 for a 16K page size. I agree that
>> intuitively you would expect the order to remain constant, but it doesn't).
>>
>> The "recommend" setting above will actually enable order-3 as well even though
>> there is no HW benefit to this. So the full set of available memory sizes here
>> is:
>>
>> 64K/order-0, 512K/order-3, 2M/order-5, 512M/order-13
>>
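For anyone sanity-checking those orders: the order is just
log2(folio size / base page size). A quick shell loop for the 64K base page
case above reproduces the list:

  for bytes in 65536 524288 2097152 536870912; do
      n=$(( bytes / 65536 )); order=0          # 64K base pages per folio
      while [ $n -gt 1 ]; do n=$(( n / 2 )); order=$(( order + 1 )); done
      echo "$bytes bytes -> order-$order"
  done

which prints order-0, order-3, order-5 and order-13 respectively.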
>>> , that 4k, 64k and 2MB (order-0,
>>> order-4 and order-9) will provide better performance.
>>>
>>> Have you run any experiements with a 4kB page size?
>>
>> Agree that would be interesting with 64K small-sized THP enabled. And I'd love
>> to get to a world where we universally deal in variable-sized chunks of memory,
>> aligned on 4K boundaries.
>>
>> In my experience, though, there are still some performance benefits to a 64K base
>> page vs 4K+contpte; the page tables are more cache-efficient in the former case
>> - 64K of memory is described by 8 bytes in the former vs 8x16=128 bytes in the
>> latter. In practice the HW will still only read 8 bytes in the latter, but that
>> takes up a full cache line, vs the former, where a single cache line holds 8
>> PTEs, each covering 64K.
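To put rough numbers on that (pure arithmetic, nothing measured here): mapping
64K takes 64K/4K = 16 PTEs x 8 bytes = 128 bytes of page table with a 4K base
page, vs a single 8-byte PTE with a 64K base page:

  for ps in 4096 65536; do
      ptes=$(( 65536 / ps ))                   # PTEs needed to map 64K
      echo "base page ${ps}B: $ptes PTEs x 8B = $(( ptes * 8 )) bytes"
  done

And a 64-byte cache line of PTEs spans 8 x 64K = 512K with a 64K base page,
but only 8 x 4K = 32K with a 4K base page.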
>
> We tested some benchmarks, e.g. unixbench, lmbench and sysbench, with v5 on
> an arm64 board (using ext4, which doesn't support large folios for now, for a
> better evaluation of anon large folios), and will test again and send the
> results once v7 is out.
Thanks for the testing and for posting the insights!
>
> 1) base page 4k, without anon large folio
> 2) base page 64k, without anon large folio
> 3) base page 4k, with anon large folio + cont-pte (order = 4,0)
>
> Most of the test results from v5 show that 3) has a good improvement
> vs 1), but is still lower than 2)
Do you have any understanding of what the shortfall is for these particular
workloads? Certainly the cache spatial locality benefit of the 64K page tables
could be a factor. But for the workloads I've been looking at, a bigger factor
is often that executable file-backed memory (elf segments) is not in 64K
folios and therefore not contpte-mapped. If the iTLB is under pressure,
contpte-mapping the executable can help a lot. I have a change (hack) to force
all executable mappings to be read ahead into 64K folios and this gives an
improvement. But obviously that only works when the file system supports large
folios (so not ext4 right now). It would certainly be interesting to see just
how close to native 64K we can get when employing these extra ideas.
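As an aside, if you want a rough feel for whether a given workload is
iTLB-bound, something like the below usually works (only a sketch: the exact
event names depend on the CPU/PMU and may need raw events on some arm64 parts;
<workload> is whatever you're measuring):

  perf stat -e iTLB-load-misses,iTLB-loads,dTLB-load-misses,dTLB-loads -- <workload>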
>, also for some latency-sensitive
> benchmarks, 2) and 3) may have poorer performance than 1).
>
> Note: pcp_allowed_order() only allows order <= PAGE_ALLOC_COSTLY_ORDER (= 3);
> for 3), we may enlarge it for better page allocation scalability on arm64.
> This wasn't tested on v5; we will try enlarging it for v7.
Yes interesting! I'm hoping to post v7 this week - just waiting for mm-unstable
to be rebased on v6.7-rc1. I'd be interested to see your results.
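For reference, the per-cpu pageset state that order <= PAGE_ALLOC_COSTLY_ORDER
allocations are served from can be eyeballed with something like (field layout
differs a little between kernel versions, so just a rough peek):

  grep -A 4 pagesets /proc/zoneinfo | head -n 20

IIRC orders above that threshold (bar PMD-order when THP is enabled) currently
bypass the per-cpu lists and go straight to the zone free lists, which I assume
is where the scalability concern comes from.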
>
>>
>> Thanks,
>> Ryan
>>
>>
>>