* [PATCH] mm: fix data race in __filemap_remove_folio / folio_mapping
[not found] <20260322190319.85301-1-abhishek_sts8.ref@yahoo.com>
@ 2026-03-22 19:03 ` Abhishek Kumar
2026-03-23 3:57 ` Matthew Wilcox
0 siblings, 1 reply; 3+ messages in thread
From: Abhishek Kumar @ 2026-03-22 19:03 UTC (permalink / raw)
To: Matthew Wilcox, Andrew Morton
Cc: linux-mm, linux-kernel, linux-fsdevel,
syzbot+606f94dfeaaa45124c90, Abhishek Kumar
KCSAN reports a data race between page_cache_delete() and
folio_mapping():
page_cache_delete() performs a plain store to folio->mapping:
folio->mapping = NULL;
folio_mapping() performs a plain load from folio->mapping:
mapping = folio->mapping;
page_cache_delete() is called from the truncation path under the i_pages
xarray lock, while folio_mapping() is called from the reclaim path
(evict_folios -> folio_evictable -> folio_mapping) under only
rcu_read_lock() without the xarray lock.
The race is benign since the reclaim path tolerates stale values --
reading a stale non-NULL mapping simply results in a suboptimal eviction
decision. However, the plain accesses risk store/load tearing and allow
the compiler to perform harmful optimizations (merging, elision, or
fission of the accesses).
Fix this by using WRITE_ONCE() in page_cache_delete() and READ_ONCE()
in folio_mapping() to prevent compiler misbehavior and silence the KCSAN
report.
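The effect of the two annotations can be modeled in userspace. The sketch below uses hypothetical `MODEL_WRITE_ONCE()`/`MODEL_READ_ONCE()` macros built from a volatile cast, which is the core of what the real kernel macros in <linux/compiler.h> do (the kernel versions are more elaborate); `struct model_folio`, `model_delete()` and `model_mapping()` are illustrative stand-ins, not kernel code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified models of the kernel's WRITE_ONCE()/READ_ONCE(): the
 * volatile cast forces the compiler to emit exactly one full-width
 * access, ruling out tearing, merging, and reloading.
 */
#define MODEL_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define MODEL_READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

struct model_folio {
	void *mapping;
};

/* Mirrors the patched store in page_cache_delete(). */
static void model_delete(struct model_folio *folio)
{
	MODEL_WRITE_ONCE(folio->mapping, NULL);
}

/* Mirrors the patched load in folio_mapping(). */
static void *model_mapping(struct model_folio *folio)
{
	return MODEL_READ_ONCE(folio->mapping);
}
```

The generated code is the same single load/store a plain access usually compiles to; the annotations only forbid the compiler from doing anything else.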
Fixes: 2f52578f9c64 ("mm/util: Add folio_mapping() and folio_file_mapping()")
Reported-by: syzbot+606f94dfeaaa45124c90@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=606f94dfeaaa45124c90
Signed-off-by: Abhishek Kumar <abhishek_sts8@yahoo.com>
---
mm/filemap.c | 2 +-
mm/util.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 406cef06b684..bba669bd114f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -142,7 +142,7 @@ static void page_cache_delete(struct address_space *mapping,
xas_store(&xas, shadow);
xas_init_marks(&xas);
- folio->mapping = NULL;
+ WRITE_ONCE(folio->mapping, NULL);
/* Leave folio->index set: truncation lookup relies upon it */
mapping->nrpages -= nr;
}
diff --git a/mm/util.c b/mm/util.c
index b05ab6f97e11..5ecb19ddf026 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -700,7 +700,7 @@ struct address_space *folio_mapping(const struct folio *folio)
if (unlikely(folio_test_swapcache(folio)))
return swap_address_space(folio->swap);
- mapping = folio->mapping;
+ mapping = READ_ONCE(folio->mapping);
if ((unsigned long)mapping & FOLIO_MAPPING_FLAGS)
return NULL;
--
2.43.0
* Re: [PATCH] mm: fix data race in __filemap_remove_folio / folio_mapping
2026-03-22 19:03 ` [PATCH] mm: fix data race in __filemap_remove_folio / folio_mapping Abhishek Kumar
@ 2026-03-23 3:57 ` Matthew Wilcox
2026-03-23 10:47 ` Pedro Falcato
0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2026-03-23 3:57 UTC (permalink / raw)
To: Abhishek Kumar
Cc: Andrew Morton, linux-mm, linux-kernel, linux-fsdevel,
syzbot+606f94dfeaaa45124c90, Axel Rasmussen, Yuanchu Xie, Wei Xu
On Mon, Mar 23, 2026 at 12:33:19AM +0530, Abhishek Kumar wrote:
> KCSAN reports a data race between page_cache_delete() and
> folio_mapping():
>
> page_cache_delete() performs a plain store to folio->mapping:
> folio->mapping = NULL;
>
> folio_mapping() performs a plain load from folio->mapping:
> mapping = folio->mapping;
>
> page_cache_delete() is called from the truncation path under the i_pages
> xarray lock,
That's not relevant. The important lock for maintaining folio->mapping
is the folio lock (see the VM_BUG_ON_FOLIO line in page_cache_delete()).
At a minimum, this changelog needs to be fixed because there's already
too much confusion around the locking rules.
> while folio_mapping() is called from the reclaim path
> (evict_folios -> folio_evictable -> folio_mapping) under only
> rcu_read_lock() without the xarray lock.
Umm. First up, this is MGLRU-only code, right? Adding the so-called
maintainers.
Second ... I'm really unsure how we want to handle this generally.
This could be quite the game of whack-a-mole; we have many, many places
in the kernel which dereference folio->mapping without holding a lock.
Perhaps they are all fine; but only 12 of the 455 references to
folio->mapping currently have READ_ONCE attached. That's a lot of code
to audit.
> The race is benign since the reclaim path tolerates stale values --
> reading a stale non-NULL mapping simply results in a suboptimal eviction
> decision. However, the plain accesses risk store/load tearing and allow
> the compiler to perform harmful optimizations (merging, elision, or
> fission of the accesses).
I think the bigger problem is reloading. As I understand it, this code:
struct address_space *m = folio->mapping;
if (m && m->flags)
could end up loading 'm' twice, once before the setting to NULL and once
after. That's more plausible than deciding to load byte-by-byte, or
whatever else these "merging, elision, or fission" words mean.
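The reload hazard can be sketched in userspace. Below, `SNAP_READ_ONCE()` is a hypothetical volatile-cast model of the kernel macro, and `snap_flags()` is an illustrative stand-in for the `if (m && m->flags)` pattern above, not real kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of READ_ONCE(). */
#define SNAP_READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct snap_mapping {
	unsigned long flags;
};

struct snap_folio {
	struct snap_mapping *mapping;
};

/*
 * With a plain load the compiler may rematerialize 'm' from memory at
 * each use, so the NULL test and the dereference can observe different
 * values if a concurrent store lands in between. READ_ONCE() pins a
 * single load; every later use works on the same snapshot even if
 * folio->mapping is concurrently set to NULL.
 */
static unsigned long snap_flags(struct snap_folio *folio)
{
	struct snap_mapping *m = SNAP_READ_ONCE(folio->mapping);

	return m ? m->flags : 0;
}
```

Single-threaded, the function behaves identically either way; the annotation only matters for what the compiler is allowed to do under concurrent modification.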
> Fix this by using WRITE_ONCE() in page_cache_delete() and READ_ONCE()
> in folio_mapping() to prevent compiler misbehavior and silence the KCSAN
> report.
Just to be clear, I don't object to the patch itself, I'm just scared
of the consequences. And the locking comment above needs to be fixed.
But please wait a few days for discussion to play out.
* Re: [PATCH] mm: fix data race in __filemap_remove_folio / folio_mapping
2026-03-23 3:57 ` Matthew Wilcox
@ 2026-03-23 10:47 ` Pedro Falcato
0 siblings, 0 replies; 3+ messages in thread
From: Pedro Falcato @ 2026-03-23 10:47 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Abhishek Kumar, Andrew Morton, linux-mm, linux-kernel,
linux-fsdevel, syzbot+606f94dfeaaa45124c90, Axel Rasmussen,
Yuanchu Xie, Wei Xu
On Mon, Mar 23, 2026 at 03:57:51AM +0000, Matthew Wilcox wrote:
> On Mon, Mar 23, 2026 at 12:33:19AM +0530, Abhishek Kumar wrote:
> > KCSAN reports a data race between page_cache_delete() and
> > folio_mapping():
> >
> > page_cache_delete() performs a plain store to folio->mapping:
> > folio->mapping = NULL;
> >
> > folio_mapping() performs a plain load from folio->mapping:
> > mapping = folio->mapping;
> >
> > page_cache_delete() is called from the truncation path under the i_pages
> > xarray lock,
>
> That's not relevant. The important lock for maintaining folio->mapping
> is the folio lock (see the VM_BUG_ON_FOLIO line in page_cache_delete()).
Not only that, but holding invalidate_lock or i_rwsem can implicitly make
the folios stable by excluding truncation.
> At a minimum, this changelog needs to be fixed because there's already
> too much confusion around the locking rules.
>
> > while folio_mapping() is called from the reclaim path
> > (evict_folios -> folio_evictable -> folio_mapping) under only
> > rcu_read_lock() without the xarray lock.
>
> Umm. First up, this is MGLRU-only code, right? Adding the so-called
> maintainers.
>
> Second ... I'm really unsure how we want to handle this generally.
> This could be quite the game of whack-a-mole; we have many, many places
> in the kernel which dereference folio->mapping without holding a lock.
>
> Perhaps they are all fine; but 12 of the 455 references to
> folio->mapping currently have READ_ONCE attached. That's a lot of code
> to audit.
Yes, and a lot of these just aren't trivial to prove 100% correct. E.g.,
fs/ext2/dir.c:
static void ext2_commit_chunk(struct folio *folio, loff_t pos, unsigned len)
{
struct address_space *mapping = folio->mapping;
Racey?
int ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
struct folio *folio, struct inode *inode, bool update_times)
{
loff_t pos = folio_pos(folio) + offset_in_folio(folio, de);
unsigned len = ext2_rec_len_from_disk(de->rec_len);
int err;
folio_lock(folio);
Maybe not, but we don't revalidate folio->mapping after the lock. Racey?
Then you look at the ext2_set_link() callers and it is rename, which holds
i_rwsem. This plus the reload in filemap_get_entry() should make it sufficient
(and excludes against reclaim whacking the folio).
But there are other callers of ext2_commit_chunk(), etc, and ext2 is by far
one of the simpler filesystems out there :)
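The lock-then-revalidate discipline being discussed can be sketched outside the kernel. In this hypothetical model, `rv_folio` and `rv_lock_and_check()` are illustrative stand-ins (the `locked` flag models the folio lock bit, not a real lock); the point is only the recheck of `mapping` after "locking", since truncation may have raced with the unlocked lookup:

```c
#include <assert.h>
#include <stddef.h>

struct rv_folio {
	int locked;	/* stand-in for the folio lock bit */
	void *mapping;
};

/*
 * A folio looked up without the lock may have been truncated before we
 * acquire it, so mapping must be revalidated once the lock is held;
 * under the lock it is stable.
 */
static void *rv_lock_and_check(struct rv_folio *folio, void *expected)
{
	folio->locked = 1;			/* folio_lock() stand-in */
	if (folio->mapping != expected) {	/* revalidate under the lock */
		folio->locked = 0;
		return NULL;
	}
	return folio->mapping;
}
```

Callers that skip the recheck are relying on some outer exclusion (i_rwsem, invalidate_lock) to keep truncation away, which is exactly the kind of implicit dependency that makes these sites hard to audit.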
>
> > The race is benign since the reclaim path tolerates stale values --
> > reading a stale non-NULL mapping simply results in a suboptimal eviction
> > decision. However, the plain accesses risk store/load tearing and allow
> > the compiler to perform harmful optimizations (merging, elision, or
> > fission of the accesses).
>
> I think the bigger problem is reloading. As I understand it, this code:
>
> struct address_space *m = folio->mapping;
>
> if (m && m->flags)
>
> could end up loading 'm' twice, once before the setting to NULL and once
> after. That's more plausible than deciding to load byte-by-byte, or
> whatever else these "merging, elision, or fission" words mean.
>
> > Fix this by using WRITE_ONCE() in page_cache_delete() and READ_ONCE()
> > in folio_mapping() to prevent compiler misbehavior and silence the KCSAN
> > report.
>
> Just to be clear, I don't object to the patch itself, I'm just scared
> of the consequences. And the locking comment above needs to be fixed.
> But please wait a few days for discussion to play out.
IMHO this looks fine, particularly as folio_mapping() is practically
mm-internal (but there are some other users in fs/, for some reason).
And it shouldn't be problematic, as AIUI KCSAN will still report
WRITE_ONCE+plain-read or READ_ONCE+plain-write races.
--
Pedro