linux-fsdevel.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: fstests@vger.kernel.org, kdevops@lists.linux.dev,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, david@redhat.com,
	linmiaohe@huawei.com, muchun.song@linux.dev, osalvador@suse.de
Subject: Re: [PATCH] fstests: add fsstress + compaction test
Date: Thu, 18 Apr 2024 14:30:56 +0100
Message-ID: <ZiEgkExEZ3By2wD0@casper.infradead.org>
In-Reply-To: <ZiDA1Lokzwxd3d-v@bombadil.infradead.org>

On Wed, Apr 17, 2024 at 11:42:28PM -0700, Luis Chamberlain wrote:
> > > Today I find that v6.9-rc4 is also hitting an unrecoverable hung task
> > > between compaction and fsstress while running generic/476 on the
> > > following kdevops test sections [2]:
> > > 
> > >   * xfs_nocrc
> > >   * xfs_nocrc_2k
> > >   * xfs_nocrc_4k
> > > 
> > > Analyzing the trace I see the guest uses loopback block devices for the
> > > fstests TEST_DEV, the loopback file uses sparsefiles on a btrfs
> > > partition. The contention based on traces [3] [4] seems to be that we
> > > somehow have fsstress + compaction race on folio_wait_bit_common().
> > 
> > What do you mean by "race"?  Here's what I see:
> 
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: folio_wait_bit_common (mm/filemap.c:1275 (discriminator 4)) 
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: migrate_pages_batch (./include/linux/pagemap.h:1048 mm/migrate.c:1486 mm/migrate.c:1700) 

That's folio_lock().

> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: INFO: task kworker/u38:8:807213 blocked for more than 120 seconds.
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: folio_wait_bit_common (mm/filemap.c:1275 (discriminator 4)) 
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: extent_write_cache_pages (fs/btrfs/extent_io.c:2130) btrfs

folio_lock().

> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: INFO: task kworker/u34:9:1268436 blocked for more than 120 seconds.
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: folio_wait_bit_common (mm/filemap.c:1275 (discriminator 4)) 
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: btrfs_folio_start_writer_lock (./include/linux/pagemap.h:1048 fs/btrfs/subpage.c:394) btrfs

folio_lock().

> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: INFO: task xfsaild/loop5:1377891 blocked for more than 120 seconds.
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: io_schedule (kernel/sched/core.c:9019 (discriminator 1) kernel/sched/core.c:9045 (discriminator 1)) 
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: rq_qos_wait (block/blk-rq-qos.c:284 (discriminator 4)) 

I'm not familiar with this code.

> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: wbt_wait (block/blk-wbt.c:660) 
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: __rq_qos_throttle (block/blk-rq-qos.c:66) 
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: blk_mq_submit_bio (block/blk-mq.c:2880 block/blk-mq.c:2984) 
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: submit_bio_noacct_nocheck (./include/linux/bio.h:639 block/blk-core.c:701 block/blk-core.c:729) 
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: _xfs_buf_ioapply (fs/xfs/xfs_buf.c:1584 fs/xfs/xfs_buf.c:1671) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: __xfs_buf_submit (./arch/x86/include/asm/atomic.h:67 ./include/linux/atomic/atomic-arch-fallback.h:2278 ./include/linux/atomic/atomic-instrumented.h:1384 fs/xfs/xfs_buf.c:1762) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_buf_delwri_submit_buffers (fs/xfs/xfs_buf.c:2280 (discriminator 2)) xfs

... but it's submitting a write.

> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfsaild (fs/xfs/xfs_trans_ail.c:560 (discriminator 1) fs/xfs/xfs_trans_ail.c:671 (discriminator 1)) xfs

> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: INFO: task fsstress:1377894 blocked for more than 120 seconds.
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_buf_iowait (fs/xfs/xfs_buf.c:1691) xfs

Waiting for an I/O completion

> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: __xfs_buf_submit (fs/xfs/xfs_buf.c:1770) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_buf_read_map (fs/xfs/xfs_buf.c:870) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:289 (discriminator 1)) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_btree_read_buf_block (./fs/xfs/xfs_trans.h:210 fs/xfs/libxfs/xfs_btree.c:1432) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_btree_lookup_get_block (fs/xfs/libxfs/xfs_btree.c:1934) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_btree_lookup (fs/xfs/libxfs/xfs_btree.c:2045) xfs

but a read, not a write.

> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: INFO: task fsstress:1377895 blocked for more than 120 seconds.
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_buf_iowait (fs/xfs/xfs_buf.c:1691) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: __xfs_buf_submit (fs/xfs/xfs_buf.c:1770) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_buf_read_map (fs/xfs/xfs_buf.c:870) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:289 (discriminator 1)) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_da_read_buf (fs/xfs/libxfs/xfs_da_btree.c:2676) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_attr3_leaf_read (fs/xfs/libxfs/xfs_attr_leaf.c:458) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_attr_leaf_hasname (fs/xfs/libxfs/xfs_attr.c:1206) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_attr_leaf_get (fs/xfs/libxfs/xfs_attr.c:1275) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_attr_get (fs/xfs/libxfs/xfs_attr.c:276) xfs
> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_xattr_get (fs/xfs/xfs_xattr.c:143) xfs

A different read.

I'm not seeing a _race_ here.  I'm seeing tasks _stuck_, but on what?
A missing wakeup?
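
One way to see at a glance where each task is stuck is to group the
stack frames under each "blocked for more than N seconds" report. The
following is a minimal Python sketch, assuming only the log line format
quoted above. The name group_hung_tasks, the regular expressions and the
SAMPLE list are illustrative and do not come from any kernel tool.

```python
import re

# "INFO: task <comm>:<pid> blocked for more than <N> seconds."
# comm may itself contain colons (e.g. kworker/u38:8), so match greedily
# and let the trailing ":<pid> blocked" anchor the split.
TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# A decoded stack frame: "kernel: <function> (<file>:<line> ...)"
FRAME_RE = re.compile(r"kernel: (?P<func>[A-Za-z_][A-Za-z0-9_.]*) \(")


def group_hung_tasks(lines):
    """Map "comm:pid" to the list of frame function names that follow it.

    Frames seen before any "INFO: task" line are ignored, because they
    cannot be attributed to a task.
    """
    tasks = {}
    current = None
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            current = f"{m['comm']}:{m['pid']}"
            tasks[current] = []
            continue
        m = FRAME_RE.search(line)
        if m and current is not None:
            tasks[current].append(m["func"])
    return tasks


# A few lines copied from the traces quoted in this message.
SAMPLE = [
    "> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: INFO: task kworker/u38:8:807213 blocked for more than 120 seconds.",
    "> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: folio_wait_bit_common (mm/filemap.c:1275 (discriminator 4)) ",
    "> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: extent_write_cache_pages (fs/btrfs/extent_io.c:2130) btrfs",
    "> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: INFO: task fsstress:1377894 blocked for more than 120 seconds.",
    "> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: xfs_buf_iowait (fs/xfs/xfs_buf.c:1691) xfs",
    "> Apr 16 23:06:11 base-xfs-nocrc-2k kernel: __xfs_buf_submit (fs/xfs/xfs_buf.c:1770) xfs",
]

if __name__ == "__main__":
    for task, frames in group_hung_tasks(SAMPLE).items():
        # The first recorded frame is where the task is sleeping.
        print(task, "->", frames[0] if frames else "(no frames)")
```

Feeding it the full journal output rather than the trimmed excerpts would
show which tasks sleep in folio_wait_bit_common() and which are waiting on
I/O. It does not identify who holds the folio lock or why the I/O never
completes.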

