linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Byungchul Park <byungchul@sk.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-kernel@vger.kernel.org, clm@fb.com, josef@toxicpanda.com,
	dsterba@suse.com, linux-btrfs@vger.kernel.org,
	kernel_team@skhynix.com, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, yeoreum.yun@arm.com,
	yunseong.kim@ericsson.com, gwan-gyeong.mun@intel.com,
	harry.yoo@oracle.com, ysk@kzalloc.com
Subject: Re: [RFC] DEPT report on around btrfs, unlink, and truncate
Date: Tue, 24 Jun 2025 10:59:24 +0900	[thread overview]
Message-ID: <20250624015924.GE5820@system.software.com> (raw)
In-Reply-To: <20250624014426.GC5820@system.software.com>

On Tue, Jun 24, 2025 at 10:44:26AM +0900, Byungchul Park wrote:
> On Mon, Jun 23, 2025 at 07:28:38PM +0930, Qu Wenruo wrote:
> > 在 2025/6/23 19:22, Byungchul Park 写道:
> > > On Mon, Jun 23, 2025 at 06:22:44PM +0930, Qu Wenruo wrote:
> > > > 在 2025/6/23 17:49, Byungchul Park 写道:
> > > > > On Mon, Jun 23, 2025 at 03:20:43PM +0930, Qu Wenruo wrote:
> > > > > > 在 2025/6/23 12:51, Byungchul Park 写道:
> > > > > > > Hi folks,
> > > > > > > 
> > > > > > > Thanks to Yunseong, we got two DEPT reports in btrfs.  It doesn't mean
> > > > > > > it's obvious deadlocks, but after digging into the reports, I'm
> > > > > > > wondering if it could happen by any chance.
> > > > > > > 
> > > > > > > 1) The first scenario that I'm concerning is:
> > > > > > > 
> > > > > > >      context A            context B
> > > > > > > 
> > > > > > >                           do_truncate()
> > > > > > >                             ...
> > > > > > >                               btrfs_do_readpage() // with folio lock held
> > > > > > 
> > > > > >                                  This one is for data.
> > > > > 
> > > > > Do you mean this folio is for data?  Thanks for the confirmation.
> > > > 
> > > > Yes, only data folios will go through btrfs_do_readpage().
> > > > 
> > > > For metadata, we never go through btrfs_do_readpage(), but
> > > > read_extent_buffer_pages_nowait().
> > > > 
> > > > 
> > > > > 
> > > > > > >      do_unlinkat()
> > > > > > >        ...
> > > > > > >          push_leaf_right()
> > > > > > >         btrfs_tree_lock_nested()
> > > > > > >           down_write_nested(&eb->lock) // hold
> > > > > 
> > > > > This is struct extent_buffer's rw_sem.  Right?
> > > > > 
> > > > > > >                                 btrfs_get_extent()
> > > > > > >                                   btrfs_lookup_file_extent()
> > > > > > >                                     btrfs_search_slot()
> > > > > > >                                       down_read_nested(&eb->lock) // stuck
> > > > > > 
> > > > > >                                          This one is for metadata.
> > > > >                                                ^
> > > > >                                        I don't get this actually.
> > > > > 
> > > > > This is struct extent_buffer's rw_sem, too.  Cannot this rw_sem be the
> > > > > same as the rw_sem above in context A?
> > > > 
> > > > My bad, I thought you're talking about that down_read_nested()
> > > > conflicting with folio lock.
> > > > 
> > > > But if you're talking about extent_buffer::lock, then the one in context
> > > > B will wait for the one in context A, and that's expected.
> > > 
> > > Sounds good.
> > > 
> > > > > > Data and metadata page cache will never cross into each other.
> > > > > > 
> > > > > > Thanks,
> > > > > > Qu
> > > > > > 
> > > > > > >           __push_leaf_right()
> > > > > > >             ...
> > > > > > >               folio_lock() // stuck
> > > > > 
> > > > > Did you mean this folio is always for metadata?
> > > > 
> > > > Can you explain more on where this folio_lock() comes from?
> > > 
> > > I should also rely on the following stacktrace in the dept report.  I
> > > asked Yunseong who reported this issue, for the decoded stacktrace, so
> > > that I can interpret that better.  I will get back once I figure out
> > > where the wait on PG_locked comes from.
> > > 
> > >     [  304.344198][ T7488] [W] dept_page_wait_on_bit(pg_locked_map:0):
> > >     [  304.344211][ T7488] [<ffff8000823b1d20>] __push_leaf_right+0x8f0/0xc70
> > 
> > I believe it's from btrfs_clear_buffer_dirty():
> > 
> > As we have a for() loop iterating all the folios of a an extent buffer
> > (aka, metadata structure), then clear the dirty flags.
> > 
> > The same applies to btrfs_mark_buffer_dirty() -> set_extent_buffer_dirty().
> 
> Thanks to Yunseong, I figured out this is the case.
> 
> > In that case, the folio is 100% belonging to btree inode thus metadata.
> 
> Good to know.
> 
> Lastly, is it still good with directly manipulating block devs or
							^
						I meant block devs files.

	Byungchul

> stacked file system using loopback devices, from the confliction of
> folios and extent_buffers?
> 
> If you confirm it, this issue can be closed :-) Thanks in advance.
> 
> 	Byungchul
> 
> > Thus the folio lock can not conflict with a data folio, thus there
> > should be no deadlock.
> > 
> > Thanks,
> > Qu
> > 
> > 
> > >     [  304.344232][ T7488] stacktrace:
> > >     [  304.344241][ T7488]       __push_leaf_right+0x8f0/0xc70
> > >     [  304.344260][ T7488]       push_leaf_right+0x408/0x628
> > >     [  304.344278][ T7488]       btrfs_del_items+0x974/0xaec
> > >     [  304.344297][ T7488]       btrfs_truncate_inode_items+0x1c5c/0x2b00
> > >     [  304.344314][ T7488]       btrfs_evict_inode+0xa4c/0xd38
> > >     [  304.344335][ T7488]       evict+0x340/0x7b0
> > >     [  304.344352][ T7488]       iput+0x4ec/0x840
> > >     [  304.344369][ T7488]       do_unlinkat+0x444/0x59c
> > >     [  304.344388][ T7488]       __arm64_sys_unlinkat+0x11c/0x260
> > >     [  304.344407][ T7488]       invoke_syscall+0x88/0x2e0
> > >     [  304.344425][ T7488]       el0_svc_common.constprop.0+0xe8/0x2e0
> > >     [  304.344445][ T7488]       do_el0_svc+0x44/0x60
> > >     [  304.344463][ T7488]       el0_svc+0x50/0x188
> > >     [  304.344482][ T7488]       el0t_64_sync_handler+0x10c/0x140
> > >     [  304.344503][ T7488]       el0t_64_sync+0x198/0x19c
> > > 
> > > Thanks.
> > > 
> > >       Byungchul
> > > 
> > > > I didn't see any location where __push_leaf_right() is locking a folio
> > > > nor the original do_unlinkat().
> > > > 
> > > > So here I can only guess the folio is from __push_leaf_right() context,
> > > > that means it can only be a metadata folio.
> > > > 
> > > > > 
> > > > > If no, it could lead a deadlock in my opinion.  If yes, dept should
> > > > > assign different classes to folios between data data and metadata.
> > > > 
> > > > So far I believe the folio belongs to metadata.
> > > > 
> > > > And since btrfs has very different handling of metadata folios, and it's
> > > > a little confusing that, we also have a btree_inode to handle the
> > > > metadata page cache, but do not have read_folio() callbacks, it can be a
> > > > little confusing to some automatic tools.
> > > > 
> > > > Thanks,
> > > > Qu
> > 
> > 

  reply	other threads:[~2025-06-24  1:59 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-23  3:21 [RFC] DEPT report on around btrfs, unlink, and truncate Byungchul Park
2025-06-23  5:50 ` Qu Wenruo
2025-06-23  8:19   ` Byungchul Park
2025-06-23  8:52     ` Qu Wenruo
2025-06-23  9:52       ` Byungchul Park
2025-06-23  9:58         ` Qu Wenruo
2025-06-24  1:44           ` Byungchul Park
2025-06-24  1:59             ` Byungchul Park [this message]
2025-06-24  3:07             ` Qu Wenruo
2025-06-23 12:22         ` Yunseong Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250624015924.GE5820@system.software.com \
    --to=byungchul@sk.com \
    --cc=akpm@linux-foundation.org \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=gwan-gyeong.mun@intel.com \
    --cc=harry.yoo@oracle.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel_team@skhynix.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=torvalds@linux-foundation.org \
    --cc=yeoreum.yun@arm.com \
    --cc=ysk@kzalloc.com \
    --cc=yunseong.kim@ericsson.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).