From: David Hildenbrand <david@redhat.com>
To: Hugh Dickins <hughd@google.com>, Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Will Deacon <will@kernel.org>, Shivank Garg <shivankg@amd.com>,
Christoph Hellwig <hch@infradead.org>,
Keir Fraser <keirf@google.com>, Jason Gunthorpe <jgg@ziepe.ca>,
John Hubbard <jhubbard@nvidia.com>,
Frederick Mayle <fmayle@google.com>, Peter Xu <peterx@redhat.com>,
"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Vlastimil Babka <vbabka@suse.cz>,
Alexander Krabler <Alexander.Krabler@kuka.com>,
Ge Yang <yangge1116@126.com>, Li Zhe <lizhe.67@bytedance.com>,
Chris Li <chrisl@kernel.org>, Yu Zhao <yuzhao@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Konstantin Khlebnikov <koct9i@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 1/7] mm: fix folio_expected_ref_count() when PG_private_2
Date: Mon, 1 Sep 2025 09:52:16 +0200
Message-ID: <92def216-ca9c-402d-8643-226592ca1a85@redhat.com>
In-Reply-To: <52da6c6a-e568-38bd-775b-eff74f87215b@google.com>
On 01.09.25 03:17, Hugh Dickins wrote:
> On Mon, 1 Sep 2025, Matthew Wilcox wrote:
>> On Sun, Aug 31, 2025 at 02:01:16AM -0700, Hugh Dickins wrote:
>>> 6.16's folio_expected_ref_count() is forgetting the PG_private_2 flag,
>>> which (like PG_private, but not in addition to PG_private) counts for
>>> 1 more reference: it needs to be using folio_has_private() in place of
>>> folio_test_private().
>>
>> No, it doesn't. I know it used to, but no filesystem was actually doing
>> that. So I changed mm to match how filesystems actually worked.
>> I'm not sure if there's still documentation lying around that gets
>> this wrong or if you're remembering how things used to be documented,
>> but it's never how any filesystem has ever worked.
>>
>> We're achingly close to getting rid of PG_private_2. I think it's just
>> ceph and nfs that still use it.
>
> I knew you were trying to get rid of it (hurrah! thank you), so when I
> tried porting my lru_add_drainage to 6.12 I was careful to check whether
> folio_expected_ref_count() would need to add it to the accounting there:
> apparently yes; but then I was surprised to find that it's still present
> in 6.17-rc, I'd assumed it gone long ago.
>
> I didn't try to read the filesystems (which could easily have been
> inconsistent about it) to understand: what convinced me amidst all
> the confusion was this comment and code in mm/filemap.c:
>
> /**
> * folio_end_private_2 - Clear PG_private_2 and wake any waiters.
> * @folio: The folio.
> *
> * Clear the PG_private_2 bit on a folio and wake up any sleepers waiting for
> * it. The folio reference held for PG_private_2 being set is released.
> *
> * This is, for example, used when a netfs folio is being written to a local
> * disk cache, thereby allowing writes to the cache for the same folio to be
> * serialised.
> */
> void folio_end_private_2(struct folio *folio)
> {
> VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio);
> clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
> folio_wake_bit(folio, PG_private_2);
> folio_put(folio);
> }
> EXPORT_SYMBOL(folio_end_private_2);
>
> That seems to be clear that PG_private_2 is matched by a folio reference,
> but perhaps you can explain it away - worth changing the comment if so.
>
> I was also anxious to work out whether PG_private with PG_private_2
> would mean +1 or +2: I don't think I found any decisive statement,
> but traditional use of page_has_private() implied +1; and I expect
> there's no filesystem which actually could have both on the same folio.
I think it's "+1", like we used to have.
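
As an illustration of the "+1" reading, here is a minimal userspace sketch
(not kernel code). The toy_* names and the bit positions are made up for
this example; the two helpers only model the difference between testing
PG_private alone and testing either flag, as folio_has_private() does.

```c
/* Hypothetical bit positions for this sketch; the kernel's values differ. */
#define TOY_PG_PRIVATE   (1UL << 0)
#define TOY_PG_PRIVATE_2 (1UL << 1)

/* Models a folio_test_private()-style check: only PG_private counts. */
int toy_refs_test_private(unsigned long flags)
{
	return (flags & TOY_PG_PRIVATE) ? 1 : 0;
}

/*
 * Models a folio_has_private()-style check: either flag counts, but the
 * pair contributes a single reference ("+1"), never two.
 */
int toy_refs_has_private(unsigned long flags)
{
	return (flags & (TOY_PG_PRIVATE | TOY_PG_PRIVATE_2)) ? 1 : 0;
}
```

With only the PG_private_2 bit set, the first helper reports no extra
reference while the second reports one; with both bits set, the second
still reports one.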

I was seriously confused (iow, concerned about false positives) when
discovering:

    PG_fscache = PG_private_2,

But in the end, PG_fscache is only used in comments, and e.g.,
__fscache_clear_page_bits() calls folio_end_private_2(). So both are
really just aliases.

[Either PG_fscache should be dropped and referred to as PG_private_2, or
PG_private_2 should be dropped and PG_fscache used instead. The two names
are even used inconsistently in that fscache. file.

Or both should be dropped, of course, once we can actually get rid of it
...]

So PG_private_2 should not be used for any other purpose.

folio_start_private_2() / folio_end_private_2() indeed pair the flag
with a reference. There are no other callers that would set/clear the
flag without involving a reference.
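
The pairing can be sketched in plain userspace C. This is a simplified
model under stated assumptions, not the kernel implementation: struct
toy_folio, the toy_* functions and the bit position are hypothetical, and
the plain integer operations stand in for the kernel's atomic refcount and
bit operations (waiter wakeup is omitted).

```c
#include <assert.h>

/* Hypothetical bit position for this sketch. */
#define TOY_PG_PRIVATE_2 (1UL << 1)

struct toy_folio {
	int refcount;		/* stands in for the folio reference count */
	unsigned long flags;	/* stands in for the folio flags word */
};

/* Setting the flag takes a reference. */
void toy_start_private_2(struct toy_folio *folio)
{
	assert(!(folio->flags & TOY_PG_PRIVATE_2));
	folio->refcount++;
	folio->flags |= TOY_PG_PRIVATE_2;
}

/* Clearing the flag drops the reference taken when it was set. */
void toy_end_private_2(struct toy_folio *folio)
{
	assert(folio->flags & TOY_PG_PRIVATE_2);
	folio->flags &= ~TOY_PG_PRIVATE_2;
	folio->refcount--;
}
```

In this model the reference count is elevated by exactly one for as long
as the flag is set, which is the invariant the accounting question above
depends on.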

The usage of private_2 is declared deprecated all over the place, so the
question is whether we really still care.

The ceph usage is guarded by CONFIG_CEPH_FSCACHE and the NFS one by
NFS_FSCACHE; nothing really seems to prevent either from being configured
in easily.

Now, one problem would be if the migration / splitting / ... code where we
use folio_expected_ref_count() cannot deal with that additional
reference properly, in which case this patch would indeed cause harm.

If all folio_expected_ref_count() callers can deal with updating that
reference, all good.
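
To make the concern concrete, here is a hedged userspace sketch of a
caller-side check. Everything in it is an assumption for illustration:
struct toy_folio, its base_refs field, the toy_* functions and the
count_private_2 switch do not exist in the kernel; they only model a caller
that compares the actual reference count against an expected one.

```c
#include <stdbool.h>

/* Hypothetical bit position for this sketch. */
#define TOY_PG_PRIVATE_2 (1UL << 1)

struct toy_folio {
	int refcount;		/* references actually held */
	int base_refs;		/* references the caller already accounts for */
	unsigned long flags;
};

/* Expected count, optionally including the PG_private_2 reference. */
int toy_expected_ref_count(const struct toy_folio *folio, bool count_private_2)
{
	int expected = folio->base_refs;

	if (count_private_2 && (folio->flags & TOY_PG_PRIVATE_2))
		expected++;
	return expected;
}

/* Caller-style check: proceed only if no unexpected reference is held. */
bool toy_may_proceed(const struct toy_folio *folio, bool count_private_2)
{
	return folio->refcount == toy_expected_ref_count(folio, count_private_2);
}
```

When the flag's reference is not counted, the model caller backs off as if
the folio were pinned. When it is counted, the caller proceeds, and then
has to handle that additional reference itself.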

nfs_migrate_folio(), for example, has folio_test_private_2() handling in
there (just wait until it is gone). ceph handles it during
ceph_writepages_start(), but uses ordinary filemap_migrate_folio().

Long story short: this patch is problematic if any
folio_expected_ref_count() user is not aware of how to handle that
additional reference.

--
Cheers
David / dhildenb