All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
To: Qu Wenruo <wqu@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	vivek.kasireddy@intel.com,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: Large folios and filemap_get_folios_contig()
Date: Thu, 3 Apr 2025 17:50:31 -0700	[thread overview]
Message-ID: <Z-8s1-kiIDkzgRbc@fedora> (raw)
In-Reply-To: <59539c02-d353-4811-bcbe-080b408f445e@suse.com>

[-- Attachment #1: Type: text/plain, Size: 3231 bytes --]

On Fri, Apr 04, 2025 at 07:46:59AM +1030, Qu Wenruo wrote:
> 
> 
> 在 2025/4/3 23:05, Matthew Wilcox 写道:
> > On Thu, Apr 03, 2025 at 08:06:53PM +1030, Qu Wenruo wrote:
> > > Recently I hit a bug when developing the large folios support for btrfs.
> > > 
> > > That we call filemap_get_folios_contig(), then lock each returned folio.
> > > (We also have a case where we unlock each returned folio)
> > > 
> > > However since a large folio can be returned several times in the batch,
> > > this obviously makes a deadlock, as btrfs is trying to lock the same
> > > folio more than once.
> > 
> > Sorry, what?  A large folio should only be returned once.  xas_next()
> > moves to the next folio.  How is it possible that
> > filemap_get_folios_contig() returns the same folio more than once?
> 
> But that's exactly what I got from filemap_get_folios_contig():
> 
> lock_delalloc_folios: r/i=5/260 locked_folio=720896(65536) start=782336
> end=819199(36864)
> lock_delalloc_folios: r/i=5/260 found_folios=1
> lock_delalloc_folios: r/i=5/260 i=0 folio=720896(65536)
> lock_delalloc_folios: r/i=5/260 found_folios=8
> lock_delalloc_folios: r/i=5/260 i=0 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=1 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=2 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=3 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=4 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=5 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=6 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=7 folio=786432(262144)
> 
> r/i is the root and inode number from btrfs, and you can completely ignore
> it.
> 
> @locked_folio is the folio we're already holding a lock, the value inside
> the brackets is the folio size.
> 
> @start and @end is the range we're searching for, the value inside the
> brackets is the search range length.
> 
> The first iteration returns the current locked folio, and since the range
> inside that folio is only 4K, thus it's only returned once.
> 
> The next 8 slots are all inside the same large folio at 786432, resulting
> duplicated entries.
> 
> > 
> > > Then I looked into the caller of filemap_get_folios_contig() inside
> > > mm/gup, and it indeed does the correct skip.
> > 
> > ... that code looks wrong to me.
> 
> It looks like it's xas_find() is doing the correct skip by calling
> xas_next_offset() -> xas_move_index() to skip the next one.
> 
> But the filemap_get_folios_contig() only calls xas_next() by increasing the
> index, not really skip to the next folio.
> 
> Although I can be totally wrong as I'm not familiar with the xarray
> internals at all.

Thanks for bringing this to my attention, it looks like this is due to a
mistake during my folio conversion for this function. The xas_next()
call tries to go to the next index, but if that index is part of a
multi-index entry things get awkward if we don't manually account for the
index shift of a large folio (which I missed).

Can you please try out this attached patch and see if it gets rid of
the duplicate problem?

> However I totally agree the duplicated behavior (and the extra handling of
> duplicated entries) looks very wrong.
> 
> Thanks,
> Qu

[-- Attachment #2: 0001-Fix-filemap_get_folios_contig-returning-batches-of-i.patch --]
[-- Type: text/plain, Size: 1533 bytes --]

From 91e8353cee38b1624fc13587f6db5058d764d983 Mon Sep 17 00:00:00 2001
From: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Date: Thu, 3 Apr 2025 16:54:17 -0700
Subject: [PATCH] Fix filemap_get_folios_contig returning batches of identical
 folios

filemap_get_folios_contig() is supposed to return distinct folios
found within [start, end]. Large folios in the Xarray become multi-index
entries. xas_next() can iterate through the sub-indexes before finding
a sibling entry and breaking out of the loop.

This can result in a returned folio_batch containing an indeterminate
number of duplicate folios, which forces the callers to skeptically
handle the returned batch. This is inefficient and incurs a large
maintenance overhead.

We can fix this by calling xas_advance() after we have successfully
adding a folio to the batch to ensure our Xarray is positioned such that
it will correctly find the next folio - similar to
filemap_get_read_batch().

Fixes: 35b471467f88 ("filemap: add filemap_get_folios_contig()")
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: <stable@vger.kernel.org>
---
 mm/filemap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index cc69f174f76b..bc7b28dfba3c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2233,6 +2233,7 @@ unsigned filemap_get_folios_contig(struct address_space *mapping,
 			*start = folio->index + nr;
 			goto out;
 		}
+		xas_advance(&xas, folio_next_index(folio) - 1);
 		continue;
 put_folio:
 		folio_put(folio);
-- 
2.48.1


  reply	other threads:[~2025-04-04  0:50 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-03  9:36 Large folios and filemap_get_folios_contig() Qu Wenruo
2025-04-03 12:35 ` Matthew Wilcox
2025-04-03 21:16   ` Qu Wenruo
2025-04-04  0:50     ` Vishal Moola (Oracle) [this message]
2025-04-04  4:15       ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z-8s1-kiIDkzgRbc@fedora \
    --to=vishal.moola@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=vivek.kasireddy@intel.com \
    --cc=willy@infradead.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.