public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Shivank Garg <shivankg@amd.com>
To: David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	shaggy@kernel.org, wangkefeng.wang@huawei.com,
	jane.chu@oracle.com, ziy@nvidia.com, donettom@linux.ibm.com,
	apopple@nvidia.com, jfs-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	syzbot+8bb6fd945af4e0ad9299@syzkaller.appspotmail.com
Subject: Re: [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function
Date: Tue, 29 Apr 2025 16:27:04 +0530	[thread overview]
Message-ID: <57536c5a-23dd-4f14-af35-9c5523000e80@amd.com> (raw)
In-Reply-To: <604a1db2-bd64-455e-9cf7-968cacef0122@redhat.com>



On 4/25/2025 1:17 PM, David Hildenbrand wrote:
> On 24.04.25 13:57, Shivank Garg wrote:
>> Hi All,
>>
>> Thank you for reviewing my patch and providing feedback.
>>
>> On 4/24/2025 8:49 AM, Matthew Wilcox wrote:
>>> On Wed, Apr 23, 2025 at 09:25:05AM +0200, David Hildenbrand wrote:
>>>> On 23.04.25 09:22, David Hildenbrand wrote:
>>>>> On 23.04.25 02:36, Matthew Wilcox wrote:
>>>>>> On Tue, Apr 22, 2025 at 04:41:11PM -0700, Andrew Morton wrote:
>>>>>>>> +/**
>>>>>>>> + * folio_migrate_expected_refs - Count expected references for an unmapped folio.
>>>>>>>
>>>>>>> "folio_migration_expected_refs"
>>
>> Thank you for catching this, I'll fix it.
>>
>> I wasn't previously aware of using make W=1 to build kernel-docs and
>> check for warnings - this is very useful information for me.
>>
>> I'll add to changelog to better explain why this is needed for JFS.
>>
>>>>>>
>>>>>> What I do wonder is whether we want to have such a specialised
>>>>>> function existing.  We have can_split_folio() in huge_memory.c
>>>>>> which is somewhat more comprehensive and doesn't require the folio to be
>>>>>> unmapped first.
>>>>>
>>>>> I was debating with myself whether we should do the usual "refs from
>>>>> ->private, refs from page table mappings" .. dance, and look up the
>>>>> mapping from the folio instead of passing it in.
>>>>>
>>>>> I concluded that for this (migration) purpose the function is good
>>>>> enough as it is: if abused in wrong context (e.g., still ->private,
>>>>> still page table mappings), it would not fake that there are no
>>>>> unexpected references.
>>>>
>>>> Sorry, I forgot that we still care about the reference from ->private here.
>>>> We expect the folio to be unmapped.
>>>
>>> Right, so just adding in folio_mapcount() will be a no-op for migration,
>>> but enable its reuse by can_split_folio().  Maybe.  Anyway, the way I
>>> explain page refocunts to people (and I need to put this in a document
>>> somewhere):
>>>
>>> There are three types of contribution to the refcount:
>>>
>>>   - Expected.  These are deducible from the folio itself, and they're all
>>>     findable.  You need to figure out what the expected number of
>>>     references are to a folio if you're going to try to freeze it.
>>>     These can be references from the mapcount, the page cache, the swap
>>>     cache, the private data, your call chain.
>>>   - Temporary.  Someone else has found the folio somehow; perhaps through
>>>     the page cache, or by calling GUP or something.  They mean you can't
>>>     freeze the folio because you don't know who has the reference or how
>>>     long they might hold it for.
>>>   - Spurious.  This is like a temporary reference, but worse because if
>>>     you read the code, there should be no way for there to be any temporary
>>>     references to the folio.  Someone's found a stale pointer to this
>>>     folio and has bumped the reference count while they check that the
>>>     folio they have is the one they expected to find.  They're going
>>>     to find out that the pointer they followed is stale and put their
>>>     refcount soon, but in the meantime you still can't freeze the folio.
>>>
>>> So I don't love the idea of having a function with the word "expected"
>>> in the name that returns a value which doesn't take into account all
>>> the potential contributors to the expected value.  And sure we can keep
>>> adding qualifiers to the function name to indicate how it is to be used,
>>> but at some point I think we should say "It's OK for this to be a little
>>> less efficient so we can understand what it means".
>>
>> Thank you, Willy, for the detailed explanation about page reference counting.
>> This has helped me understand the concept much better.
>>
>> Based on your explanation and the discussion, I'm summarizing the 2 approaches:
>>
>> 1. Rename folio_migration_expected_refs to folio_migration_expected_base_refs, to
>> to clarify it does not account for other potential contributors.
>> or folio_unmapped_base_refs?
>> 2. Accounting all possible contributors to expected refs:
>> folio_expected_refs(mapping, folio)
>> {   
>>     int refs = 1;
>>
>>     if (mapping) {
>>         if (folio_test_anon(folio))
>>             refs += folio_test_swapcache(folio) ?
>>                 folio_nr_pages(folio) : 0;
>>         else
>>             refs += folio_nr_pages(folio);
>>
>>         if (folio_test_private(folio))
>>             refs++;
>>     }
>>     refs += folio_mapcount(folio); // takes mapped folio into account and evaluate as no-op for unmapped folios during migration
>>     return refs;
>> }
>>
>> Please let me know if this approach is acceptable or if you have
>> other suggestions for improvement.
> 
> A couple of points:
> 
> 1) Can we name it folio_expected_ref_count()
> 
> 2) Can we avoid passing in the mapping? Might not be expensive to look it
>    up again. Below I avoid calling folio_mapping().
> 
> 3) Can we delegate adding the additional reference to the caller? Will make it
>    easier to use elsewhere (e.g., not additional reference because we are holding
>    the page table lock).
> 
> 4) Can we add kerneldoc, and in particular document the semantics?
> 
> Not sure if we should inline this function or put it into mm/utils.c
> 

Hi David,

Thank you for the detailed suggestions. They all make sense to me.

I did not understand a few changes in your patch below:
> 
> I'm thinking of something like (completely untested):
> 
>  
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a205020e2a58b..a0ad4ed9a75ff 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2112,6 +2112,61 @@ static inline bool folio_maybe_mapped_shared(struct folio *folio)
>      return folio_test_large_maybe_mapped_shared(folio);
>  }
>  
> +/**
> + * folio_expected_ref_count - calculate the expected folio refcount
> + * @folio: the folio
> + *
> + * Calculate the expected folio refcount, taking references from the pagecache,
> + * swapcache, PG_private and page table mappings into account. Useful in
> + * combination with folio_ref_count() to detect unexpected references (e.g.,
> + * GUP or other temporary references).
> + *
> + * Does currently not consider references from the LRU cache. If the folio
> + * was isolated from the LRU (which is the case during migration or split),
> + * the folio was already isolated from the LRU and the LRU cache does not apply.
> + *
> + * Calling this function on an unmapped folio -- !folio_mapped() -- that is
> + * locked will return a stable result.
> + *
> + * Calling this function on a mapped folio will not result in a stable result,
> + * because nothing stops additional page table mappings from coming (e.g.,
> + * fork()) or going (e.g., munmap()).
> + *
> + * Calling this function without the folio lock will also not result in a
> + * stable result: for example, the folio might get dropped from the swapcache
> + * concurrently.
> + *
> + * However, even when called without the folio lock or on a mapped folio,
> + * this function can be used to detect unexpected references early (for example.
> + * if it makes sense to even lock the folio and unmap it).
> + *
> + * The caller must add any reference (e.g., from folio_try_get()) it might be
> + * holding itself to the result.
> + *
> + * Returns the expected folio refcount.
> + */
> +static inline int folio_expected_ref_count(const struct folio *folio)
> +{
> +    const int order = folio_order(folio);
> +    int ref_count = 0;

Why are we not taking base ref_count as 1 like it's done in original folio_expected_refs
implementation?

> +
> +    if (WARN_ON_ONCE(folio_test_slab(folio)))
> +        return 0;
> +
> +    if (folio_test_anon(folio)) {
> +        /* One reference per page from the swapcache. */
> +        ref_count += folio_test_swapcache(folio) << order;

why not use folio_nr_pages() here instead 1 << order?
something like folio_test_swapcache(folio) * folio_nr_pages(folio).

> +    } else if (!((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS)) {
> +        /* One reference per page from the pagecache. */
> +        ref_count += !!folio->mapping << order;
> +        /* One reference from PG_private. */
> +        ref_count += folio_test_private(folio);
> +    }
> +
> +    /* One reference per page table mapping. */
> +    return ref_count + folio_mapcount(folio);;

> +}
> +
>  #ifndef HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
>  static inline int arch_make_folio_accessible(struct folio *folio)
>  {

I tested your patch with stress-ng and my move-pages test code. I did not see
any bugs/errors.

Thanks,
Shivank





  reply	other threads:[~2025-04-29 10:57 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-22 11:40 [PATCH V4 0/2] JFS: Implement migrate_folio for jfs_metapage_aops Shivank Garg
2025-04-22 11:40 ` [PATCH V4 1/2] mm: add folio_migration_expected_refs() as inline function Shivank Garg
2025-04-22 15:18   ` David Hildenbrand
2025-04-22 15:22   ` Zi Yan
2025-04-22 23:41   ` Andrew Morton
2025-04-23  0:36     ` Matthew Wilcox
2025-04-23  7:22       ` David Hildenbrand
2025-04-23  7:25         ` David Hildenbrand
2025-04-24  3:19           ` Matthew Wilcox
2025-04-24 11:57             ` Shivank Garg
2025-04-25  7:47               ` David Hildenbrand
2025-04-29 10:57                 ` Shivank Garg [this message]
2025-04-29 11:31                   ` David Hildenbrand
2025-04-22 11:40 ` [PATCH V4 2/2] jfs: implement migrate_folio for jfs_metapage_aops Shivank Garg
2025-04-22 15:23   ` David Hildenbrand
2025-04-22 13:59 ` [syzbot] [mm?] WARNING in move_to_new_folio syzbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57536c5a-23dd-4f14-af35-9c5523000e80@amd.com \
    --to=shivankg@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=david@redhat.com \
    --cc=donettom@linux.ibm.com \
    --cc=jane.chu@oracle.com \
    --cc=jfs-discussion@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=shaggy@kernel.org \
    --cc=syzbot+8bb6fd945af4e0ad9299@syzkaller.appspotmail.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox