From: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
To: Qu Wenruo <wqu@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>,
Qu Wenruo <quwenruo.btrfs@gmx.com>,
Linux Memory Management List <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
linux-btrfs <linux-btrfs@vger.kernel.org>,
vivek.kasireddy@intel.com,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: Large folios and filemap_get_folios_contig()
Date: Thu, 3 Apr 2025 17:50:31 -0700 [thread overview]
Message-ID: <Z-8s1-kiIDkzgRbc@fedora> (raw)
In-Reply-To: <59539c02-d353-4811-bcbe-080b408f445e@suse.com>
[-- Attachment #1: Type: text/plain, Size: 3231 bytes --]
On Fri, Apr 04, 2025 at 07:46:59AM +1030, Qu Wenruo wrote:
>
>
> 在 2025/4/3 23:05, Matthew Wilcox 写道:
> > On Thu, Apr 03, 2025 at 08:06:53PM +1030, Qu Wenruo wrote:
> > > Recently I hit a bug when developing the large folios support for btrfs.
> > >
> > > That we call filemap_get_folios_contig(), then lock each returned folio.
> > > (We also have a case where we unlock each returned folio)
> > >
> > > However since a large folio can be returned several times in the batch,
> > > this obviously makes a deadlock, as btrfs is trying to lock the same
> > > folio more than once.
> >
> > Sorry, what? A large folio should only be returned once. xas_next()
> > moves to the next folio. How is it possible that
> > filemap_get_folios_contig() returns the same folio more than once?
>
> But that's exactly what I got from filemap_get_folios_contig():
>
> lock_delalloc_folios: r/i=5/260 locked_folio=720896(65536) start=782336
> end=819199(36864)
> lock_delalloc_folios: r/i=5/260 found_folios=1
> lock_delalloc_folios: r/i=5/260 i=0 folio=720896(65536)
> lock_delalloc_folios: r/i=5/260 found_folios=8
> lock_delalloc_folios: r/i=5/260 i=0 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=1 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=2 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=3 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=4 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=5 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=6 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=7 folio=786432(262144)
>
> r/i is the root and inode number from btrfs, and you can completely ignore
> it.
>
> @locked_folio is the folio we're already holding a lock, the value inside
> the brackets is the folio size.
>
> @start and @end is the range we're searching for, the value inside the
> brackets is the search range length.
>
> The first iteration returns the current locked folio, and since the range
> inside that folio is only 4K, thus it's only returned once.
>
> The next 8 slots are all inside the same large folio at 786432, resulting
> duplicated entries.
>
> >
> > > Then I looked into the caller of filemap_get_folios_contig() inside
> > > mm/gup, and it indeed does the correct skip.
> >
> > ... that code looks wrong to me.
>
> It looks like it's xas_find() is doing the correct skip by calling
> xas_next_offset() -> xas_move_index() to skip the next one.
>
> But the filemap_get_folios_contig() only calls xas_next() by increasing the
> index, not really skip to the next folio.
>
> Although I can be totally wrong as I'm not familiar with the xarray
> internals at all.
Thanks for bringing this to my attention, it looks like this is due to a
mistake during my folio conversion for this function. The xas_next()
call tries to go to the next index, but if that index is part of a
multi-index entry things get awkward if we don't manually account for the
index shift of a large folio (which I missed).
Can you please try out this attached patch and see if it gets rid of
the duplicate problem?
> However I totally agree the duplicated behavior (and the extra handling of
> duplicated entries) looks very wrong.
>
> Thanks,
> Qu
[-- Attachment #2: 0001-Fix-filemap_get_folios_contig-returning-batches-of-i.patch --]
[-- Type: text/plain, Size: 1533 bytes --]
From 91e8353cee38b1624fc13587f6db5058d764d983 Mon Sep 17 00:00:00 2001
From: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Date: Thu, 3 Apr 2025 16:54:17 -0700
Subject: [PATCH] Fix filemap_get_folios_contig returning batches of identical
folios
filemap_get_folios_contig() is supposed to return distinct folios
found within [start, end]. Large folios in the Xarray become multi-index
entries. xas_next() can iterate through the sub-indexes before finding
a sibling entry and breaking out of the loop.
This can result in a returned folio_batch containing an indeterminate
number of duplicate folios, which forces the callers to skeptically
handle the returned batch. This is inefficient and incurs a large
maintenance overhead.
We can fix this by calling xas_advance() after we have successfully
adding a folio to the batch to ensure our Xarray is positioned such that
it will correctly find the next folio - similar to
filemap_get_read_batch().
Fixes: 35b471467f88 ("filemap: add filemap_get_folios_contig()")
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: <stable@vger.kernel.org>
---
mm/filemap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index cc69f174f76b..bc7b28dfba3c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2233,6 +2233,7 @@ unsigned filemap_get_folios_contig(struct address_space *mapping,
*start = folio->index + nr;
goto out;
}
+ xas_advance(&xas, folio_next_index(folio) - 1);
continue;
put_folio:
folio_put(folio);
--
2.48.1
next prev parent reply other threads:[~2025-04-04 0:50 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-03 9:36 Large folios and filemap_get_folios_contig() Qu Wenruo
2025-04-03 12:35 ` Matthew Wilcox
2025-04-03 21:16 ` Qu Wenruo
2025-04-04 0:50 ` Vishal Moola (Oracle) [this message]
2025-04-04 4:15 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z-8s1-kiIDkzgRbc@fedora \
--to=vishal.moola@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=quwenruo.btrfs@gmx.com \
--cc=vivek.kasireddy@intel.com \
--cc=willy@infradead.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).