From: "Darrick J. Wong" <djwong@kernel.org>
To: John Garry <john.g.garry@oracle.com>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
"ojaswin@linux.ibm.com" <ojaswin@linux.ibm.com>
Subject: Re: [bug report] fstests generic/774 hang
Date: Fri, 7 Nov 2025 09:50:04 -0800
Message-ID: <20251107175004.GL196370@frogsfrogsfrogs>
In-Reply-To: <5e3f6b82-1e8c-4cd1-90a6-e1612f76370b@oracle.com>

On Fri, Nov 07, 2025 at 12:48:38PM +0000, John Garry wrote:
> On 07/11/2025 05:53, Shinichiro Kawasaki wrote:
> > On Nov 06, 2025 / 20:28, Darrick J. Wong wrote:
> > > On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote:
> > > > On Nov 06, 2025 / 08:53, John Garry wrote:
> > ...
> > > > > The test - even for the conditions set - should not produce a hang. I
> > > > > can check on whether we can improve the software-based atomic writes in xfs
> > > > > to avoid this.
> > > >
> > > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging
> > > > test node and share.
> > >
> > > Yes, anything you can share would be helpful.
> >
> > Okay, I attached dmesg log file (dmesg.gz), which contains the INFO messages and
> > the sysrq-t output. It was taken with v6.18-rc4 kernel with the fix patches by
> > Darrick. I also attached the kernel config (_config.gz) which I used to build
> > the test target kernel.
> >
> > > FWIW the test runs in 51
> > > seconds here, but I only have 4 CPUs in the VM and fast storage so its
> > > filesize is "only" 800MB.
> >
> > FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the
> > test case a few times to recreate it with the 8GiB TCMU devices. When it does
> > not hang, the test case takes about an hour to complete.
>
> Hi Shinichiro,
>
> You can still stop the test with Ctrl-C, right?
>
> @Darrick, I worry that there is too much ip lock contention in
> xfs_atomic_write_cow_iomap_begin(), especially since we may drop and
> re-acquire the lock (in xfs_trans_alloc_inode()). Maybe we should force
> serialization in xfs_file_dio_write_atomic(). After all, this was not
> intended to provide good performance. Or look at other ways to optimise this
> (if we do want good performance).

I don't see how that helps. All that does is shift the lock contention
from xfs_inode::i_lock to inode::i_rwsem. At the end of the day, this
test is starting up 2*nr_cpus threads to issue large atomic directio
writes that take a long time to complete. Stall warnings when there are
a large number of threads all trying to directio write to a file whose
blocks require a metadata update upon IO completion are a long known
problem.

I altered my test VM to have 24 cores and enough RAM to avoid OOMing the
machine. Setting up the mixed mappings file took 27 seconds, and the
aio writes themselves took 3:15. Validating the contents took 4
seconds.

Maaaybe we should back off on the file size. I don't see why it needs
to create a 5GB file for testing. The verify runs at 2100MB/s whereas
the atomic writes plod along at 25MB/s. That's why this test takes a
loooong time to run.
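
As a rough sanity check of those figures, the arithmetic can be sketched
as below. This is only a back-of-the-envelope estimate, not anything the
test itself runs: it assumes "5GB" means 5000 MB and that the quoted
rates are sustained averages, and the helper names are made up for
illustration.

```python
# Back-of-the-envelope check of the run times quoted in the message.
# Assumptions (not from the test itself): "5GB" is taken as 5000 MB and
# the quoted rates are treated as sustained averages.

FILE_SIZE_MB = 5000       # "5GB file" (assumed decimal megabytes)
ATOMIC_WRITE_MBPS = 25    # quoted atomic write rate
VERIFY_MBPS = 2100        # quoted verify rate


def writer_threads(nr_cpus: int) -> int:
    """The test starts 2*nr_cpus writer threads, per the message."""
    return 2 * nr_cpus


def seconds_to_transfer(size_mb: float, rate_mbps: float) -> float:
    """Time to move size_mb megabytes at rate_mbps megabytes per second."""
    return size_mb / rate_mbps


def fmt_min_sec(seconds: float) -> str:
    """Format a duration in seconds as M:SS."""
    total = round(seconds)
    return f"{total // 60}:{total % 60:02d}"


write_s = seconds_to_transfer(FILE_SIZE_MB, ATOMIC_WRITE_MBPS)
verify_s = seconds_to_transfer(FILE_SIZE_MB, VERIFY_MBPS)

print("threads on 24 CPUs:", writer_threads(24))
print("atomic writes:", fmt_min_sec(write_s))
print("verify:", round(verify_s, 1), "s")
print("write/verify rate ratio:", VERIFY_MBPS // ATOMIC_WRITE_MBPS)
```

Under those assumptions the write phase comes out at about 3:20, close
to the 3:15 measured on the 24-core VM, and the verify phase lands in
the low single-digit seconds.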

(I don't see the lfsr complaints, but I'm running fio 3.41 from git.)

--D

> Thanks,
> John
>