From: Shivank Garg <shivankg@amd.com>
To: David Hildenbrand <david@redhat.com>,
Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
shaggy@kernel.org, wangkefeng.wang@huawei.com,
jane.chu@oracle.com, ziy@nvidia.com, donettom@linux.ibm.com,
apopple@nvidia.com, jfs-discussion@lists.sourceforge.net,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
syzbot+8bb6fd945af4e0ad9299@syzkaller.appspotmail.com
Subject: Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
Date: Tue, 29 Apr 2025 16:27:04 +0530 [thread overview]
Message-ID: <57536c5a-23dd-4f14-af35-9c5523000e80@amd.com> (raw)
In-Reply-To: <604a1db2-bd64-455e-9cf7-968cacef0122@redhat.com>
On 4/25/2025 1:17 PM, David Hildenbrand wrote:
> On 24.04.25 13:57, Shivank Garg wrote:
>> Hi All,
>>
>> Thank you for reviewing my patch and providing feedback.
>>
>> On 4/24/2025 8:49 AM, Matthew Wilcox wrote:
>>> On Wed, Apr 23, 2025 at 09:25:05AM +0200, David Hildenbrand wrote:
>>>> On 23.04.25 09:22, David Hildenbrand wrote:
>>>>> On 23.04.25 02:36, Matthew Wilcox wrote:
>>>>>> On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
>>>>>>>> +/**
>>>>>>>> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>>>>>>>
>>>>>>> "folio_migration_expected_refs"
>>
>> Thank you for catching this, I'll fix it.
>>
>> I wasn't previously aware of using make W=1 to build kernel-docs and
>> check for warnings - this is very useful information for me.
>>
>> I'll add to changelog to better explain why this is needed for JFS.
>>
>>>>>>
>>>>>> What I do wonder is whether we want to have such a specialised
>>>>>> function existing. We have can_split_folio() in huge_memory.c
>>>>>> which is somewhat more comprehensive and doesn't require the folio to be
>>>>>> unmapped first.
>>>>>
>>>>> I was debating with myself whether we should do the usual "refs from
>>>>> ->private, refs from page table mappings" .. dance, and look up the
>>>>> mapping from the folio instead of passing it in.
>>>>>
>>>>> I concluded that for this (migration) purpose the function is good
>>>>> enough as it is: if abused in wrong context (e.g., still ->private,
>>>>> still page table mappings), it would not fake that there are no
>>>>> unexpected references.
>>>>
>>>> Sorry, I forgot that we still care about the reference from ->private here.
>>>> We expect the folio to be unmapped.
>>>
>>> Right, so just adding in folio_mapcount() will be a no-op for migration,
>>> but enable its reuse by can_split_folio(). Maybe. Anyway, the way I
>>> explain page refocunts to people (and I need to put this in a document
>>> somewhere):
>>>
>>> There are three types of contribution to the refcount:
>>>
>>> - Expected. These are deducible from the folio itself, and they're all
>>> findable. You need to figure out what the expected number of
>>> references are to a folio if you're going to try to freeze it.
>>> These can be references from the mapcount, the page cache, the swap
>>> cache, the private data, your call chain.
>>> - Temporary. Someone else has found the folio somehow; perhaps through
>>> the page cache, or by calling GUP or something. They mean you can't
>>> freeze the folio because you don't know who has the reference or how
>>> long they might hold it for.
>>> - Spurious. This is like a temporary reference, but worse because if
>>> you read the code, there should be no way for there to be any temporary
>>> references to the folio. Someone's found a stale pointer to this
>>> folio and has bumped the reference count while they check that the
>>> folio they have is the one they expected to find. They're going
>>> to find out that the pointer they followed is stale and put their
>>> refcount soon, but in the meantime you still can't freeze the folio.
>>>
>>> So I don't love the idea of having a function with the word "expected"
>>> in the name that returns a value which doesn't take into account all
>>> the potential contributors to the expected value. And sure we can keep
>>> adding qualifiers to the function name to indicate how it is to be used,
>>> but at some point I think we should say "It's OK for this to be a little
>>> less efficient so we can understand what it means".
>>
>> Thank you, Willy, for the detailed explanation about page reference counting.
>> This has helped me understand the concept much better.
>>
>> Based on your explanation and the discussion, I'm summarizing the 2 approaches:
>>
>> 1. Rename folio_migration_expected_refs to folio_migration_expected_base_refs, to
>> to clarify it does not account for other potential contributors.
>> or folio_unmapped_base_refs?
>> 2. Accounting all possible contributors to expected refs:
>> folio_expected_refs(mapping, folio)
>> {
>> int refs = 1;
>>
>> if (mapping) {
>> if (folio_test_anon(folio))
>> refs += folio_test_swapcache(folio) ?
>> folio_nr_pages(folio) : 0;
>> else
>> refs += folio_nr_pages(folio);
>>
>> if (folio_test_private(folio))
>> refs++;
>> }
>> refs += folio_mapcount(folio); // takes mapped folio into account and evaluate as no-op for unmapped folios during migration
>> return refs;
>> }
>>
>> Please let me know if this approach is acceptable or if you have
>> other suggestions for improvement.
>
> A couple of points:
>
> 1) Can we name it folio_expected_ref_count()
>
> 2) Can we avoid passing in the mapping? Might not be expensive to look it
> up again. Below I avoid calling folio_mapping().
>
> 3) Can we delegate adding the additional reference to the caller? Will make it
> easier to use elsewhere (e.g., not additional reference because we are holding
> the page table lock).
>
> 4) Can we add kerneldoc, and in particular document the semantics?
>
> Not sure if we should inline this function or put it into mm/utils.c
>
Hi David,
Thank you for the detailed suggestions. They all make sense to me.
I did not understand a few changes in your patch below:
>
> I'm thinking of something like (completely untested):
>
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a205020e2a58b..a0ad4ed9a75ff 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2112,6 +2112,61 @@ static inline bool folio_maybe_mapped_shared(struct folio *folio)
> return folio_test_large_maybe_mapped_shared(folio);
> }
>
> +/**
> + * folio_expected_ref_count - calculate the expected folio refcount
> + * @folio: the folio
> + *
> + * Calculate the expected folio refcount, taking references from the pagecache,
> + * swapcache, PG_private and page table mappings into account. Useful in
> + * combination with folio_ref_count() to detect unexpected references (e.g.,
> + * GUP or other temporary references).
> + *
> + * Does currently not consider references from the LRU cache. If the folio
> + * was isolated from the LRU (which is the case during migration or split),
> + * the folio was already isolated from the LRU and the LRU cache does not apply.
> + *
> + * Calling this function on an unmapped folio -- !folio_mapped() -- that is
> + * locked will return a stable result.
> + *
> + * Calling this function on a mapped folio will not result in a stable result,
> + * because nothing stops additional page table mappings from coming (e.g.,
> + * fork()) or going (e.g., munmap()).
> + *
> + * Calling this function without the folio lock will also not result in a
> + * stable result: for example, the folio might get dropped from the swapcache
> + * concurrently.
> + *
> + * However, even when called without the folio lock or on a mapped folio,
> + * this function can be used to detect unexpected references early (for example.
> + * if it makes sense to even lock the folio and unmap it).
> + *
> + * The caller must add any reference (e.g., from folio_try_get()) it might be
> + * holding itself to the result.
> + *
> + * Returns the expected folio refcount.
> + */
> +static inline int folio_expected_ref_count(const struct folio *folio)
> +{
> + const int order = folio_order(folio);
> + int ref_count = 0;
Why are we not taking base ref_count as 1 like it's done in original folio_expected_refs
implementation?
> +
> + if (WARN_ON_ONCE(folio_test_slab(folio)))
> + return 0;
> +
> + if (folio_test_anon(folio)) {
> + /* One reference per page from the swapcache. */
> + ref_count += folio_test_swapcache(folio) << order;
why not use folio_nr_pages() here instead 1 << order?
something like folio_test_swapcache(folio) * folio_nr_pages(folio).
> + } else if (!((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS)) {
> + /* One reference per page from the pagecache. */
> + ref_count += !!folio->mapping << order;
> + /* One reference from PG_private. */
> + ref_count += folio_test_private(folio);
> + }
> +
> + /* One reference per page table mapping. */
> + return ref_count + folio_mapcount(folio);;
> +}
> +
> #ifndef HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
> static inline int arch_make_folio_accessible(struct folio *folio)
> {
I tested your patch with stress-ng and my move-pages test code. I did not see
any bugs/errors.
Thanks,
Shivank
next prev parent reply other threads:[~2025-04-29 10:57 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-22 11:40 [PATCH V4 0/2] JFS: Implement migrate_folio for jfs_metapage_aops Shivank Garg
2025-04-22 11:40 ` [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function Shivank Garg
2025-04-22 15:18 ` David Hildenbrand
2025-04-22 15:22 ` Zi Yan
2025-04-22 23:41 ` Andrew Morton
2025-04-23 0:36 ` Matthew Wilcox
2025-04-23 7:22 ` David Hildenbrand
2025-04-23 7:25 ` David Hildenbrand
2025-04-24 3:19 ` Matthew Wilcox
2025-04-24 11:57 ` Shivank Garg
2025-04-25 7:47 ` David Hildenbrand
2025-04-29 10:57 ` Shivank Garg [this message]
2025-04-29 11:31 ` David Hildenbrand
2025-04-22 11:40 ` [PATCH V4 2/2] jfs: implement migrate_folio for jfs_metapage_aops Shivank Garg
2025-04-22 15:23 ` David Hildenbrand
2025-04-22 13:59 ` [syzbot] [mm?] WARNING in move_to_new_folio syzbot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57536c5a-23dd-4f14-af35-9c5523000e80@amd.com \
--to=shivankg@amd.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=david@redhat.com \
--cc=donettom@linux.ibm.com \
--cc=jane.chu@oracle.com \
--cc=jfs-discussion@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=shaggy@kernel.org \
--cc=syzbot+8bb6fd945af4e0ad9299@syzkaller.appspotmail.com \
--cc=wangkefeng.wang@huawei.com \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox