From: David Hildenbrand <david@redhat.com>
To: Usama Arif <usamaarif642@gmail.com>,
akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
roman.gushchin@linux.dev, yuzhao@google.com, npache@redhat.com,
baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org,
willy@infradead.org, cerasuolodomenico@gmail.com,
ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v5 4/6] mm: Introduce a pageflag for partially mapped folios
Date: Thu, 12 Dec 2024 11:49:27 +0100
Message-ID: <b328da08-595f-4044-ad32-5ed9660a4d62@redhat.com>
In-Reply-To: <9d602b3f-878f-4f92-aade-f7fd7c1a626a@gmail.com>
On 12.12.24 11:30, Usama Arif wrote:
>
>
> On 11/12/2024 18:03, David Hildenbrand wrote:
>> On 30.08.24 12:03, Usama Arif wrote:
>>> Currently folio->_deferred_list is used to keep track of
>>> partially_mapped folios that are going to be split under memory
>>> pressure. In the next patch, all THPs that are faulted in and collapsed
>>> by khugepaged are also going to be tracked using _deferred_list.
>>>
>>> This patch introduces a pageflag to be able to distinguish between
>>> partially mapped folios and others in the deferred_list at split time in
>>> deferred_split_scan. It's needed as __folio_remove_rmap decrements
>>> _mapcount, _large_mapcount and _entire_mapcount, hence it won't be
>>> possible to distinguish between partially mapped folios and others in
>>> deferred_split_scan.
>>>
>>> Even though it introduces an extra flag to track whether the folio is
>>> partially mapped, there is no functional change intended with this
>>> patch. The flag is not useful in this patch itself; it will
>>> become useful in the next patch, when _deferred_list also holds
>>> non-partially-mapped folios.
>>>
>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>>> ---
>>> include/linux/huge_mm.h | 4 ++--
>>> include/linux/page-flags.h | 13 +++++++++++-
>>> mm/huge_memory.c | 41 ++++++++++++++++++++++++++++----------
>>> mm/memcontrol.c | 3 ++-
>>> mm/migrate.c | 3 ++-
>>> mm/page_alloc.c | 5 +++--
>>> mm/rmap.c | 5 +++--
>>> mm/vmscan.c | 3 ++-
>>> 8 files changed, 56 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 4da102b74a8c..0b0539f4ee1a 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -333,7 +333,7 @@ static inline int split_huge_page(struct page *page)
>>> {
>>> return split_huge_page_to_list_to_order(page, NULL, 0);
>>> }
>>> -void deferred_split_folio(struct folio *folio);
>>> +void deferred_split_folio(struct folio *folio, bool partially_mapped);
>>> void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>>> unsigned long address, bool freeze, struct folio *folio);
>>> @@ -502,7 +502,7 @@ static inline int split_huge_page(struct page *page)
>>> {
>>> return 0;
>>> }
>>> -static inline void deferred_split_folio(struct folio *folio) {}
>>> +static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
>>> #define split_huge_pmd(__vma, __pmd, __address) \
>>> do { } while (0)
>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>>> index 2175ebceb41c..1b3a76710487 100644
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -186,6 +186,7 @@ enum pageflags {
>>> /* At least one page in this folio has the hwpoison flag set */
>>> PG_has_hwpoisoned = PG_active,
>>> PG_large_rmappable = PG_workingset, /* anon or file-backed */
>>> + PG_partially_mapped = PG_reclaim, /* was identified to be partially mapped */
>>> };
>>> #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1)
>>> @@ -859,8 +860,18 @@ static inline void ClearPageCompound(struct page *page)
>>> ClearPageHead(page);
>>> }
>>> FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE)
>>> +FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
>>> +/*
>>> + * PG_partially_mapped is protected by deferred_split split_queue_lock,
>>> + * so it's safe to use non-atomic set/clear.
>>
>> Just stumbled over that. In my understanding, this assumption is wrong.
>>
>> I don't think anything prevents other PF_ANY (PG_anon_exclusive, PG_hwpoison) / PF_SECOND (PG_has_hwpoisoned) flags from getting modified concurrently, I'm afraid.
>>
> Hi David,
>
> Just to clear my understanding, what you are suggesting could happen in __folio_set/clear_partially_mapped is:
> 1) __folio_set/clear_partially_mapped reads the 2nd page flags word (x), where one of the other 2nd page flags is, let's say, not set.
> 2) One of the other 2nd page flags is set atomically.
> 3) __folio_set/clear_partially_mapped writes back x with only partially_mapped changed. The change made in step 2 to the other 2nd page flag is therefore lost.
>
> Is that correct?
That matches my understanding.
> But that would mean we shouldn't have any page flags (first or second
> page) as non atomic?
Yes. We may only use non-atomic variants as long as we can guarantee
that nobody can concurrently operate on the flags, for example on the
early folio allocation path or on the folio freeing path.
Observe how the other SECOND users are atomic: PG_anon_exclusive is
atomic (except on two page freeing paths) and PG_hwpoison is atomic.
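To make the lost-update scenario above concrete, here is a minimal
user-space sketch in plain C with pthreads -- not kernel code; the
thread, lock and bit names below (deferred_split_thread,
memory_failure_thread, split_queue_lock, BIT_*) are illustrative
stand-ins only. One thread performs the non-atomic read-modify-write of
a shared flags word while holding "its" lock, mirroring
__folio_set_partially_mapped() under split_queue_lock; the other sets a
different bit of the same word atomically without taking that lock,
mirroring folio_set_has_hwpoisoned() in memory_failure(). The lock does
not help, and the atomically set bit can be wiped out:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define BIT_PARTIALLY_MAPPED (1UL << 0) /* stand-in for PG_partially_mapped */
#define BIT_HAS_HWPOISONED   (1UL << 1) /* stand-in for PG_has_hwpoisoned */

static _Atomic unsigned long flags;     /* the shared "second page" flags word */
static pthread_mutex_t split_queue_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Non-atomic set/clear under the lock: plain load, modify, store.
 * (Relaxed atomics are used only to keep the C data-race rules happy;
 * the point is that load + store is not a single atomic RMW.)
 */
static void *deferred_split_thread(void *arg)
{
        (void)arg;
        for (int i = 0; i < 10000000; i++) {
                pthread_mutex_lock(&split_queue_lock);
                unsigned long x = atomic_load_explicit(&flags, memory_order_relaxed);
                /* An atomic update of another bit landing here is lost below. */
                atomic_store_explicit(&flags, x | BIT_PARTIALLY_MAPPED,
                                      memory_order_relaxed);
                atomic_store_explicit(&flags, x & ~BIT_PARTIALLY_MAPPED,
                                      memory_order_relaxed);
                pthread_mutex_unlock(&split_queue_lock);
        }
        return NULL;
}

/* Atomic set of a different bit, without taking split_queue_lock. */
static void *memory_failure_thread(void *arg)
{
        (void)arg;
        atomic_fetch_or_explicit(&flags, BIT_HAS_HWPOISONED, memory_order_relaxed);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, deferred_split_thread, NULL);
        pthread_create(&b, NULL, memory_failure_thread, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        printf("has_hwpoisoned bit still set? %s\n",
               (atomic_load(&flags) & BIT_HAS_HWPOISONED) ? "yes" : "NO (lost update)");
        return 0;
}

Compile with "gcc -O2 -pthread race.c"; depending on scheduling it may
take a few runs before the lost update is observed.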
> although it would depend on whether they are being
> changed at the same point in time. If you encountered a particular instance of PG_anon_exclusive or PG_has_hwpoisoned being changed at the same point as
> __folio_set/clear_partially_mapped, could you point to it?
Regarding PG_hwpoison, observe how memory_failure() performs the
TestSetPageHWPoison() + folio_set_has_hwpoisoned() before unmapping the
pages, without any locking. This can race with pretty much any operation
that triggers unmapping.
With PG_anon_exclusive it's a bit more complicated, but it's probably
sufficient if MADV_DONTNEED (setting deferred) races with concurrent
swapout/migration (clearing PG_anon_exclusive), whereby both operations
are not performed under the same PT lock. This can happen after partial
mremap(), or after fork() when only parts of the large folio were shared
with the child.
--
Cheers,
David / dhildenb