From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Jonathan Corbet <corbet@lwn.net>,
Mike Kravetz <mike.kravetz@oracle.com>,
Hugh Dickins <hughd@google.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Yin Fengwei <fengwei.yin@intel.com>,
Yang Shi <shy828301@gmail.com>, Zi Yan <ziy@nvidia.com>
Subject: Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios
Date: Thu, 10 Aug 2023 13:14:03 +0200 [thread overview]
Message-ID: <ae63d396-b4a4-4579-bfd2-e99a0350bbf0@redhat.com> (raw)
In-Reply-To: <60b5b2a2-1d1d-661c-d61e-855178fff44d@redhat.com>
On 09.08.23 21:17, David Hildenbrand wrote:
> On 09.08.23 21:07, Ryan Roberts wrote:
>> On 09/08/2023 09:32, David Hildenbrand wrote:
>>> Let's track the total mapcount for all large folios in the first subpage.
>>>
>>> The total mapcount is what we actually want to know in folio_mapcount()
>>> and it is also sufficient for implementing folio_mapped(). This also
>>> gets rid of any "raceiness" concerns as expressed in
>>> folio_total_mapcount().
>>>
>>> With sub-PMD THP becoming more important and things looking promising
>>> that we will soon get support for such anon THP, we want to avoid looping
>>> over all pages of a folio just to calculate the total mapcount. Further,
>>> we may soon want to use the total mapcount in other context more
>>> frequently, so prepare for reading it efficiently and atomically.
>>>
>>> Make room for the total mapcount in page[1] of the folio by moving
>>> _nr_pages_mapped to page[2] of the folio: it is not applicable to hugetlb
>>> -- and with the total mapcount in place probably also not desirable even
>>> if PMD-mappable hugetlb pages could get PTE-mapped at some point -- so we
>>> can overlay the hugetlb fields.
>>>
>>> Note that we currently don't expect any order-1 compound pages / THP in
>>> rmap code, and that such support is not planned. If ever desired, we could
>>> move the compound mapcount to another page, because it only applies to
>>> PMD-mappable folios that are definitely larger. Let's avoid consuming
>>> more space elsewhere for now -- we might need more space for a different
>>> purpose in some subpages soon.
>>>
>>> Maintain the total mapcount also for hugetlb pages. Use the total mapcount
>>> to implement folio_mapcount(), total_mapcount(), folio_mapped() and
>>> page_mapped().
>>>
>>> We can now get rid of folio_total_mapcount() and
>>> folio_large_is_mapped(), by just inlining reading of the total mapcount.
>>>
>>> _nr_pages_mapped is now only used in rmap code, so not accidentially
>>> externally where it might be used on arbitrary order-1 pages. The remaining
>>> usage is:
>>>
>>> (1) Detect how to adjust stats: NR_ANON_MAPPED and NR_FILE_MAPPED
>>> -> If we would account the total folio as mapped when mapping a
>>> page (based on the total mapcount), we could remove that usage.
>>>
>>> (2) Detect when to add a folio to the deferred split queue
>>> -> If we would apply a different heuristic, or scan using the rmap on
>>> the memory reclaim path for partially mapped anon folios to
>>> split them, we could remove that usage as well.
>>>
>>> So maybe, we can simplify things in the future and remove
>>> _nr_pages_mapped. For now, leave these things as they are, they need more
>>> thought. Hugh really did a nice job implementing that precise tracking
>>> after all.
>>>
>>> Note: Not adding order-1 sanity checks to the file_rmap functions for
>>> now. For anon pages, they are certainly not required right now.
>>>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Jonathan Corbet <corbet@lwn.net>
>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>> Cc: Hugh Dickins <hughd@google.com>
>>> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>>> Cc: Yang Shi <shy828301@gmail.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>
>> Other than the nits and query on zeroing _total_mapcount below, LGTM. If zeroing
>> is correct:
>>
>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>
> Thanks for the review!
>
> [...]
>
>>>
>>> static inline int total_mapcount(struct page *page)
>>
>> nit: couldn't total_mapcount() just be implemented as a wrapper around
>> folio_mapcount()? What's the benefit of PageCompound() check instead of just
>> getting the folio and checking if it's large? i.e:
>
> Good point, let me take a look tomorrow if the compiler can optimize in
> both cases equally well.
I checked by adjusting total_mapcount():
Before:
if (PageTransHuge(page) && total_mapcount(page) > 1)
ffffffff81411931: 4c 89 e7 mov %r12,%rdi
ffffffff81411934: e8 f7 b1 ff ff call ffffffff8140cb30 <PageTransHuge>
ffffffff81411939: 85 c0 test %eax,%eax
ffffffff8141193b: 74 29 je ffffffff81411966 <migrate_misplaced_page+0x166>
ffffffff8141193d: 49 8b 04 24 mov (%r12),%rax
return test_bit(PG_head, &page->flags) ||
ffffffff81411941: a9 00 00 01 00 test $0x10000,%eax
ffffffff81411946: 0f 85 1f 01 00 00 jne ffffffff81411a6b <migrate_misplaced_page+0x26b>
READ_ONCE(page->compound_head) & 1;
ffffffff8141194c: 49 8b 44 24 08 mov 0x8(%r12),%rax
return test_bit(PG_head, &page->flags) ||
ffffffff81411951: a8 01 test $0x1,%al
ffffffff81411953: 0f 85 12 01 00 00 jne ffffffff81411a6b <migrate_misplaced_page+0x26b>
ffffffff81411959: 41 8b 44 24 30 mov 0x30(%r12),%eax
return atomic_read(&page->_mapcount) + 1;
ffffffff8141195e: 83 c0 01 add $0x1,%eax
ffffffff81411961: 83 f8 01 cmp $0x1,%eax
ffffffff81411964: 7f 77 jg ffffffff814119dd <migrate_misplaced_page+0x1dd>
So a total of 10 instructions after handling the mov/call/test/je for PageTransHuge().
After:
if (PageTransHuge(page) && total_mapcount(page) > 1)
ffffffff81411931: 4c 89 e7 mov %r12,%rdi
ffffffff81411934: e8 f7 b1 ff ff call ffffffff8140cb30 <PageTransHuge>
ffffffff81411939: 85 c0 test %eax,%eax
ffffffff8141193b: 74 2f je ffffffff8141196c <migrate_misplaced_page+0x16c>
unsigned long head = READ_ONCE(page->compound_head);
ffffffff8141193d: 49 8b 44 24 08 mov 0x8(%r12),%rax
if (unlikely(head & 1))
ffffffff81411942: a8 01 test $0x1,%al
ffffffff81411944: 0f 85 fc 05 00 00 jne ffffffff81411f46 <migrate_misplaced_page+0x746>
ffffffff8141194a: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
return page;
ffffffff8141194f: 4c 89 e0 mov %r12,%rax
ffffffff81411952: 48 8b 10 mov (%rax),%rdx
if (likely(!folio_test_large(folio)))
ffffffff81411955: f7 c2 00 00 01 00 test $0x10000,%edx
ffffffff8141195b: 0f 85 da 05 00 00 jne ffffffff81411f3b <migrate_misplaced_page+0x73b>
ffffffff81411961: 8b 40 30 mov 0x30(%rax),%eax
return atomic_read(&folio->_mapcount) + 1;
ffffffff81411964: 83 c0 01 add $0x1,%eax
ffffffff81411967: 83 f8 01 cmp $0x1,%eax
ffffffff8141196a: 7f 77 jg ffffffff814119e3 <migrate_misplaced_page+0x1e3>
So a total of 11 (+ 1 NOP) instructions after handling the mov/call/test/je for PageTransHuge().
Essentially one more MOV instruction.
I guess nobody really cares :)
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2023-08-10 11:16 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-09 8:32 [PATCH mm-unstable v1] mm: add a total mapcount for large folios David Hildenbrand
2023-08-09 15:45 ` Zi Yan
2023-08-09 19:07 ` Ryan Roberts
2023-08-09 19:17 ` David Hildenbrand
2023-08-10 10:40 ` Ryan Roberts
2023-08-10 11:14 ` David Hildenbrand [this message]
2023-08-10 11:27 ` David Hildenbrand
2023-08-10 11:32 ` David Hildenbrand
2023-08-10 11:35 ` Ryan Roberts
2023-08-09 19:21 ` Matthew Wilcox
2023-08-09 19:26 ` David Hildenbrand
2023-08-10 3:14 ` Yin Fengwei
2023-08-09 21:23 ` Peter Xu
2023-08-10 3:25 ` Matthew Wilcox
2023-08-10 8:37 ` David Hildenbrand
2023-08-10 21:48 ` Peter Xu
2023-08-10 21:54 ` Matthew Wilcox
2023-08-10 21:59 ` David Hildenbrand
2023-08-11 15:03 ` Peter Xu
2023-08-11 15:14 ` Zi Yan
2023-08-11 15:17 ` David Hildenbrand
2023-08-10 8:59 ` David Hildenbrand
2023-08-10 10:48 ` Ryan Roberts
2023-08-10 17:15 ` Peter Xu
2023-08-10 17:47 ` David Hildenbrand
2023-08-10 19:02 ` Ryan Roberts
2023-08-10 20:57 ` Peter Xu
2023-08-10 21:48 ` Matthew Wilcox
2023-08-10 22:27 ` David Hildenbrand
2023-08-11 15:18 ` Peter Xu
2023-08-11 15:32 ` David Hildenbrand
2023-08-11 15:58 ` Peter Xu
2023-08-11 16:08 ` David Hildenbrand
2023-08-11 16:11 ` Zi Yan
2023-08-11 22:18 ` Peter Xu
2023-08-10 22:16 ` David Hildenbrand
2023-08-10 3:24 ` Yin Fengwei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ae63d396-b4a4-4579-bfd2-e99a0350bbf0@redhat.com \
--to=david@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=fengwei.yin@intel.com \
--cc=hughd@google.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=ryan.roberts@arm.com \
--cc=shy828301@gmail.com \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.