Linux XFS filesystem development
From: "Darrick J. Wong" <djwong@kernel.org>
To: John Garry <john.g.garry@oracle.com>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"ojaswin@linux.ibm.com" <ojaswin@linux.ibm.com>
Subject: Re: [bug report] fstests generic/774 hang
Date: Fri, 7 Nov 2025 15:18:30 -0800
Message-ID: <20251107231830.GM196370@frogsfrogsfrogs>
In-Reply-To: <20251107175004.GL196370@frogsfrogsfrogs>

On Fri, Nov 07, 2025 at 09:50:04AM -0800, Darrick J. Wong wrote:
> On Fri, Nov 07, 2025 at 12:48:38PM +0000, John Garry wrote:
> > On 07/11/2025 05:53, Shinichiro Kawasaki wrote:
> > > On Nov 06, 2025 / 20:28, Darrick J. Wong wrote:
> > > > On Fri, Nov 07, 2025 at 02:27:50AM +0000, Shinichiro Kawasaki wrote:
> > > > > On Nov 06, 2025 / 08:53, John Garry wrote:
> > > ...
> > > > > > > Even under the conditions set, the test should not produce a hang. I
> > > > > > > can check on whether we can improve the software-based atomic writes in xfs
> > > > > > > to avoid this.
> > > > > 
> > > > > Thanks. Will sysrq-t output help? If it helps, I can take it from the hanging
> > > > > test node and share.
> > > > 
> > > > Yes, anything you can share would be helpful.
> > > 
> > > Okay, I attached the dmesg log file (dmesg.gz), which contains the INFO messages
> > > and the sysrq-t output. It was taken with the v6.18-rc4 kernel with the fix
> > > patches by Darrick. I also attached the kernel config (_config.gz) which I used
> > > to build the test target kernel.
> > > 
> > > > FWIW the test runs in 51
> > > > seconds here, but I only have 4 CPUs in the VM and fast storage so its
> > > > filesize is "only" 800MB.
> > > 
> > > FYI, my test node has 24 CPUs. The hang is sporadic and I needed to repeat the
> > > test case a few times to recreate it with the 8GiB TCMU devices. When it does
> > > not hang, the test case takes about an hour to complete.
> > 
> > Hi Shinichiro,
> > 
> > You can still stop the test with Ctrl-C, right?
> > 
> > @Darrick, I worry that there is too much ip lock contention in
> > xfs_atomic_write_cow_iomap_begin(), especially since we may drop and
> > re-acquire the lock (in xfs_trans_alloc_inode()). Maybe we should force
> > serialization in xfs_file_dio_write_atomic(). After all, this was not
> > intended to provide good performance. Or look at other ways to optimise this
> > (if we do want good performance).
> 
> I don't see how that helps.  All that does is shift the lock contention
> from xfs_inode::i_lock to inode::i_rwsem.  At the end of the day, this
> test is starting up 2*nr_cpus threads to issue large atomic directio
> writes that take a long time to complete.  Stall warnings when there are
> a large number of threads all trying to directio write to a file whose
> blocks require a metadata update upon IO completion are a long known
> problem.
> 
> I altered my test VM to have 24 cores and enough RAM to avoid OOMing the
> machine.  Setting up the mixed mappings file took 27 seconds, and the
> aio writes themselves took 3:15.  Validating the contents took 4
> seconds.
> 
> Maaaybe we should back off on the file size.  I don't see why it needs
> to create a 5GB file for testing.  The verify runs at 2100MB/s whereas
> the atomic writes plod along at 25MB/s.  That's why this test takes a
> loooong time to run.
> 
> (I don't see the lfsr complaints, but I'm running fio 3.41 from git)

Spoke too soon, now I'm seeing it all over the test fleet.

--D

> --D
> 
> > Thanks,
> > John
> > 
> 
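
A rough sanity check of the figures quoted in the message above, offered as a sketch and not as anything posted by the thread's authors. It assumes that "5GB" means 5000 MB and that the quoted MB/s rates are sustained averages over the whole file; the constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the numbers quoted in the message above.
# Assumptions (not stated in the thread): "5GB" is 5000 MB, and the quoted
# MB/s figures are sustained averages over the whole file.

FILE_SIZE_MB = 5 * 1000
ATOMIC_WRITE_MB_S = 25
VERIFY_MB_S = 2100
NR_CPUS = 24


def transfer_seconds(size_mb: float, rate_mb_s: float) -> float:
    """Time to move size_mb at a constant rate_mb_s."""
    return size_mb / rate_mb_s


def mmss(seconds: float) -> str:
    """Format a duration as minutes:seconds, truncating fractions."""
    whole = int(seconds)
    return f"{whole // 60}:{whole % 60:02d}"


write_s = transfer_seconds(FILE_SIZE_MB, ATOMIC_WRITE_MB_S)
verify_s = transfer_seconds(FILE_SIZE_MB, VERIFY_MB_S)
writer_threads = 2 * NR_CPUS  # the test starts 2*nr_cpus threads

print(f"atomic writes: {mmss(write_s)}")  # 3:20, close to the reported 3:15
print(f"verify: {verify_s:.1f} s")
print(f"writer threads: {writer_threads}")
```

Under these assumptions the write phase is about 84 times slower than verification (2100 / 25), which matches the observation that the atomic writes dominate the test's runtime.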

Thread overview: 23+ messages
2025-10-30  8:45 [bug report] fstests generic/774 hang Shinichiro Kawasaki
2025-11-05  0:33 ` Darrick J. Wong
2025-11-05  2:19   ` Shinichiro Kawasaki
2025-11-05  8:52     ` John Garry
2025-11-05 10:39       ` John Garry
2025-11-05 11:29         ` John Garry
2025-11-05 12:37         ` Shinichiro Kawasaki
2025-11-06  8:19           ` Shinichiro Kawasaki
2025-11-06  8:53             ` John Garry
2025-11-07  2:27               ` Shinichiro Kawasaki
2025-11-07  4:28                 ` Darrick J. Wong
2025-11-07  5:53                   ` Shinichiro Kawasaki
2025-11-07 12:48                     ` John Garry
2025-11-07 17:50                       ` Darrick J. Wong
2025-11-07 23:18                         ` Darrick J. Wong [this message]
2025-11-10  2:41                       ` Shinichiro Kawasaki
2025-11-09 12:02             ` Ojaswin Mujoo
2025-11-10 12:46               ` Shinichiro Kawasaki
2025-11-10 21:12                 ` Darrick J. Wong
2025-11-11 11:43                   ` Shinichiro Kawasaki
2025-11-09 11:58         ` Ojaswin Mujoo
2025-11-10  8:58           ` John Garry
2025-11-10 12:39           ` Shinichiro Kawasaki