public inbox for linux-mm@kvack.org
From: Pedro Falcato <pfalcato@suse.de>
To: Matthew Wilcox <willy@infradead.org>
Cc: Abhishek Kumar <abhishek_sts8@yahoo.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 linux-fsdevel@vger.kernel.org,
	syzbot+606f94dfeaaa45124c90@syzkaller.appspotmail.com,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>
Subject: Re: [PATCH] mm: fix data race in __filemap_remove_folio / folio_mapping
Date: Mon, 23 Mar 2026 10:47:46 +0000
Message-ID: <zomsk45mkrop54w4iijusollwafixqul6zrgzczghenrpyodgg@bgo77hcidmko>
In-Reply-To: <acC6P6ULWgxKXFiK@casper.infradead.org>

On Mon, Mar 23, 2026 at 03:57:51AM +0000, Matthew Wilcox wrote:
> On Mon, Mar 23, 2026 at 12:33:19AM +0530, Abhishek Kumar wrote:
> > KCSAN reports a data race between page_cache_delete() and
> > folio_mapping():
> > 
> >   page_cache_delete() performs a plain store to folio->mapping:
> >     folio->mapping = NULL;
> > 
> >   folio_mapping() performs a plain load from folio->mapping:
> >     mapping = folio->mapping;
> > 
> > page_cache_delete() is called from the truncation path under the i_pages
> > xarray lock,
> 
> That's not relevant.  The important lock for maintaining folio->mapping
> is the folio lock (see the VM_BUG_ON_FOLIO line in page_cache_delete()).

Not only that, but holding invalidate_lock or i_rwsem can implicitly keep
the folios stable by excluding truncation.

> At a minimum, this changelog needs to be fixed because there's already
> too much confusion around the locking rules.
> 
> > while folio_mapping() is called from the reclaim path
> > (evict_folios -> folio_evictable -> folio_mapping) under only
> > rcu_read_lock() without the xarray lock.
> 
> Umm.  First up, this is MGLRU-only code, right?  Adding the so-called
> maintainers.
> 
> Second ... I'm really unsure how we want to handle this generally.
> This could be quite the game of whack-a-mole; we have many, many places
> in the kernel which dereference folio->mapping without holding a lock.
> 
> Perhaps they are all fine; but 12 of the 455 references to
> folio->mapping currently have READ_ONCE attached.  That's a lot of code
> to audit.

Yes, and a lot of these just aren't trivial to prove 100% correct, e.g.:

fs/ext2/dir.c:

static void ext2_commit_chunk(struct folio *folio, loff_t pos, unsigned len)
{
        struct address_space *mapping = folio->mapping;

Racy?

int ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
                struct folio *folio, struct inode *inode, bool update_times)
{
        loff_t pos = folio_pos(folio) + offset_in_folio(folio, de);
        unsigned len = ext2_rec_len_from_disk(de->rec_len);
        int err;

        folio_lock(folio);

Maybe not, but we don't revalidate folio->mapping after taking the lock. Racy?

Then you look at the ext2_set_link() callers and find it is rename, which
holds i_rwsem. That, plus the reload in filemap_get_entry(), should be
sufficient (and it excludes reclaim whacking the folio).

But there are other callers of ext2_commit_chunk(), etc., and ext2 is by far
one of the simpler filesystems out there :)
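
To make the "revalidate folio->mapping after the lock" idea concrete, here
is a minimal userspace sketch. It is not what ext2 does and not kernel code:
the structs are cut-down stand-ins, the folio lock is modelled with a C11
atomic_flag spinlock, and every model_* name is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumptions). */
struct address_space { int id; };

struct folio {
	atomic_flag lock;              /* models the folio lock */
	struct address_space *mapping; /* set to NULL on truncation */
};

/* Hypothetical helper: spin until the modelled folio lock is taken. */
static void model_folio_lock(struct folio *folio)
{
	while (atomic_flag_test_and_set_explicit(&folio->lock,
						 memory_order_acquire))
		;
}

/* Hypothetical helper: release the modelled folio lock. */
static void model_folio_unlock(struct folio *folio)
{
	atomic_flag_clear_explicit(&folio->lock, memory_order_release);
}

/*
 * Take the lock, then re-check that the folio still belongs to the
 * mapping the caller looked it up in. Returns true with the lock held,
 * or false with the lock already dropped if the folio was truncated
 * (or moved) in the meantime.
 */
static bool model_lock_and_revalidate(struct folio *folio,
				      struct address_space *expected)
{
	model_folio_lock(folio);
	if (folio->mapping != expected) {
		model_folio_unlock(folio);
		return false;
	}
	return true;
}
```

A caller that gets false back would drop its reference and retry the
lookup. Whether a given call site needs this at all depends on what else
it holds (i_rwsem, invalidate_lock), which is exactly the part that is hard
to audit.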

> 
> > The race is benign since the reclaim path tolerates stale values --
> > reading a stale non-NULL mapping simply results in a suboptimal eviction
> > decision.  However, the plain accesses risk store/load tearing and allow
> > the compiler to perform harmful optimizations (merging, elision, or
> > fission of the accesses).
> 
> I think the bigger problem is reloading.  As I understand it, this code:
> 
> 	struct address_space *m = folio->mapping;
> 
> 	if (m && m->flags)
> 
> could end up loading 'm' twice, once before the setting to NULL and once
> after.  That's more plausible than deciding to load byte-by-byte, or
> whatever else these "merging, elision, or fission" words mean.
> 
> > Fix this by using WRITE_ONCE() in page_cache_delete() and READ_ONCE()
> > in folio_mapping() to prevent compiler misbehavior and silence the KCSAN
> > report.
> 
> Just to be clear, I don't object to the patch itself, I'm just scared
> of the consequences.  And the locking comment above needs to be fixed.
> But please wait a few days for discussion to play out.

IMHO this looks fine, particularly as folio_mapping() is practically
mm-internal (though there are some other users in fs/, for some reason).
It also shouldn't be problematic because, AIUI, KCSAN will still report
WRITE_ONCE() + plain read or READ_ONCE() + plain write races.
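
For reference, the reloading concern quoted above and the marked accesses
the patch proposes can be sketched in userspace as follows. The READ_ONCE()
and WRITE_ONCE() definitions here are simplified volatile-cast stand-ins
(relying on the GCC/Clang __typeof__ extension), not the kernel's actual
macros, and the model_* functions are hypothetical.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel macros (assumptions). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Cut-down stand-ins for the kernel structures (assumptions). */
struct address_space { unsigned long flags; };
struct folio { struct address_space *mapping; };

/* Writer side, modelled on the store in page_cache_delete(). */
static void model_clear_mapping(struct folio *folio)
{
	WRITE_ONCE(folio->mapping, NULL);
}

/*
 * Reader side: a single marked load into a local. The NULL check and
 * the dereference both use that one snapshot, so the compiler cannot
 * re-read folio->mapping between them.
 */
static unsigned long model_mapping_flags(struct folio *folio)
{
	struct address_space *m = READ_ONCE(folio->mapping);

	if (m && m->flags)
		return m->flags;
	return 0;
}
```

This only addresses the compiler re-reading or tearing the pointer. It says
nothing about whether the address_space is still alive when m->flags is
read, which is a separate lifetime question.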

-- 
Pedro



Thread overview: 3+ messages
     [not found] <20260322190319.85301-1-abhishek_sts8.ref@yahoo.com>
2026-03-22 19:03 ` [PATCH] mm: fix data race in __filemap_remove_folio / folio_mapping Abhishek Kumar
2026-03-23  3:57   ` Matthew Wilcox
2026-03-23 10:47     ` Pedro Falcato [this message]
