From: "Darrick J. Wong" <djwong@kernel.org>
To: John Garry <john.g.garry@oracle.com>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
"ojaswin@linux.ibm.com" <ojaswin@linux.ibm.com>
Subject: Re: [bug report] fstests generic/774 hang
Date: Fri, 7 Nov 2025 15:18:30 -0800
Message-ID: <20251107231830.GM196370@frogsfrogsfrogs>
In-Reply-To: <20251107175004.GL196370@frogsfrogsfrogs>

On Fri, Nov 07, 2025 at 09:50:04AM -0800, Darrick J. Wong wrote:
> On Fri, Nov 07, 2025 at 12:48:38PM +0000, John Garry wrote:
> > On 07/11/2025 05:53, Shinichiro Kawasaki wrote:
> > > On Nov 06, 2025 / 20:28, Darrick J. Wong wrote:
> > > > On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote:
> > > > > On Nov 06, 2025 / 08:53, John Garry wrote:
> > > ...
> > > > > > > The test should not hang, even under the conditions set. I
> > > > > > can check on whether we can improve the software-based atomic writes in xfs
> > > > > > to avoid this.
> > > > >
> > > > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging
> > > > > test node and share.
> > > >
> > > > Yes, anything you can share would be helpful.
> > >
> > > Okay, I attached the dmesg log file (dmesg.gz), which contains the INFO messages and
> > > the sysrq-t output. It was taken with the v6.18-rc4 kernel with the fix patches by
> > > Darrick. I also attached the kernel config (_config.gz) which I used to build
> > > the test target kernel.
> > >
> > > > FWIW the test runs in 51
> > > > seconds here, but I only have 4 CPUs in the VM and fast storage so its
> > > > filesize is "only" 800MB.
> > >
> > > FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the
> > > test case a few times to recreate it with the 8GiB TCMU devices. When it does
> > > not hang, the test case takes about an hour to complete.
> >
> > Hi Shinichiro,
> >
> > You can still stop the test with Ctrl-C, right?
> >
> > @Darrick, I worry that there is too much ip lock contention in
> > xfs_atomic_write_cow_iomap_begin(), especially since we may drop and
> > re-acquire the lock (in xfs_trans_alloc_inode()). Maybe we should force
> > serialization in xfs_file_dio_write_atomic(). After all, this was not
> > intended to provide good performance. Or look at other ways to optimise this
> > (if we do want good performance).
>
> I don't see how that helps. All that does is shift the lock contention
> from xfs_inode::i_lock to inode::i_rwsem. At the end of the day, this
> test is starting up 2*nr_cpus threads to issue large atomic directio
> writes that take a long time to complete. Stall warnings when there are
> a large number of threads all trying to directio write to a file whose
> blocks require a metadata update upon IO completion are a long-known
> problem.
>
> I altered my test VM to have 24 cores and enough RAM to avoid OOMing the
> machine. Setting up the mixed mappings file took 27 seconds, and the
> aio writes themselves took 3:15. Validating the contents took 4
> seconds.
>
> Maaaybe we should back off on the file size. I don't see why it needs
> to create a 5GB file for testing. The verify runs at 2100MB/s whereas
> the atomic writes plod along at 25MB/s. That's why this test takes a
> loooong time to run.
>
> (I don't see the lfsr complaints, but I'm running fio 3.41 from git)

Spoke too soon, now I'm seeing it all over the test fleet.

--D

> --D
>
> > Thanks,
> > John
> >
>
Thread overview: 23+ messages
2025-10-30 8:45 [bug report] fstests generic/774 hang Shinichiro Kawasaki
2025-11-05 0:33 ` Darrick J. Wong
2025-11-05 2:19 ` Shinichiro Kawasaki
2025-11-05 8:52 ` John Garry
2025-11-05 10:39 ` John Garry
2025-11-05 11:29 ` John Garry
2025-11-05 12:37 ` Shinichiro Kawasaki
2025-11-06 8:19 ` Shinichiro Kawasaki
2025-11-06 8:53 ` John Garry
2025-11-07 2:27 ` Shinichiro Kawasaki
2025-11-07 4:28 ` Darrick J. Wong
2025-11-07 5:53 ` Shinichiro Kawasaki
2025-11-07 12:48 ` John Garry
2025-11-07 17:50 ` Darrick J. Wong
2025-11-07 23:18 ` Darrick J. Wong [this message]
2025-11-10 2:41 ` Shinichiro Kawasaki
2025-11-09 12:02 ` Ojaswin Mujoo
2025-11-10 12:46 ` Shinichiro Kawasaki
2025-11-10 21:12 ` Darrick J. Wong
2025-11-11 11:43 ` Shinichiro Kawasaki
2025-11-09 11:58 ` Ojaswin Mujoo
2025-11-10 8:58 ` John Garry
2025-11-10 12:39 ` Shinichiro Kawasaki