From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfs: don't flush the entire filesystem when a buffered write runs out of space
Date: Fri, 27 Mar 2020 13:27:14 +1100
Message-ID: <20200327022714.GQ10776@dread.disaster.area>
In-Reply-To: <20200327014558.GG29339@magnolia>

On Thu, Mar 26, 2020 at 06:45:58PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> A customer reported rcu stalls and softlockup warnings on a computer
> with many CPU cores and many many more IO threads trying to write to a
> filesystem that is totally out of space.  Subsequent analysis pointed to
> the many many IO threads calling xfs_flush_inodes -> sync_inodes_sb,
> which causes a lot of wb_writeback_work to be queued.  The writeback
> worker spends so much time trying to wake the many many threads waiting
> for writeback completion that it trips the softlockup detector, and (in
> this case) the system automatically reboots.

That doesn't sound right. Each writeback work that is queued via
sync_inodes_sb should only have a single process waiting on its
completion. And how many threads do you actually need to wake up
for it to trigger a 10s soft-lockup timeout?

More detail, please?

> In addition, they complain that the lengthy xfs_flush_inodes scan traps
> all of those threads in uninterruptible sleep, which hampers their
> ability to kill the program or do anything else to escape the situation.
> 
> Fix this by replacing the full filesystem flush (which is offloaded to a
> workqueue which we then have to wait for) with directly flushing the
> file that we're trying to write.

Which does nothing to flush -other- outstanding delalloc
reservations, nor does it allow the eofblocks/cowblock scan to
reclaim unused post-EOF speculative preallocations.

That's the purpose of the xfs_flush_inodes() call - without it we can get
very premature ENOSPC, especially on small filesystems when writing
largish files in the background. So I'm not sure that dropping the
sync is a viable solution. It is actually needed.

Perhaps we need to go back to the ancient code that only allowed XFS
to run a single xfs_flush_inodes() at a time - everything else
waited on the single flush to complete, then all returned at the
same time...
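
A minimal userspace sketch of that "single flusher, everyone else
waits" idea, for illustration only. It is not the old XFS code and uses
no kernel API: struct flush_gate, flush_gate_run(), count_flush() and
demo_enospc_writers() are hypothetical names, and pthreads stands in
for whatever locking the kernel would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: one gate per filesystem. */
struct flush_gate {
	pthread_mutex_t lock;
	pthread_cond_t done;
	bool in_progress;          /* a flush is currently running */
	unsigned long generation;  /* bumped each time a flush completes */
	unsigned long flushes_run; /* how often the expensive flush ran */
};

#define FLUSH_GATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0, 0 }

/*
 * Run flush() unless another thread is already running it; in that
 * case sleep until that flush completes. Returns true if the caller
 * did the flush itself, false if it only waited.
 */
bool flush_gate_run(struct flush_gate *g, void (*flush)(void *), void *arg)
{
	pthread_mutex_lock(&g->lock);
	if (g->in_progress) {
		unsigned long gen = g->generation;

		/* Waiting on the generation guards against spurious wakeups. */
		while (g->generation == gen)
			pthread_cond_wait(&g->done, &g->lock);
		pthread_mutex_unlock(&g->lock);
		return false;
	}
	g->in_progress = true;
	pthread_mutex_unlock(&g->lock);

	flush(arg);	/* the expensive part, done by exactly one thread */

	pthread_mutex_lock(&g->lock);
	g->in_progress = false;
	g->generation++;
	g->flushes_run++;
	pthread_cond_broadcast(&g->done);	/* all waiters return together */
	pthread_mutex_unlock(&g->lock);
	return true;
}

/* Stand-in for the real flush: just counts invocations. */
void count_flush(void *arg)
{
	(*(int *)arg)++;
}

struct demo_ctx {
	struct flush_gate *gate;
	int *counter;
};

static void *demo_writer(void *p)
{
	struct demo_ctx *c = p;

	flush_gate_run(c->gate, count_flush, c->counter);
	return NULL;
}

#define DEMO_MAX_THREADS 64

/* Simulate nthreads writers hitting ENOSPC; returns flushes performed. */
int demo_enospc_writers(int nthreads)
{
	struct flush_gate gate = FLUSH_GATE_INIT;
	int flushed = 0;
	struct demo_ctx ctx = { &gate, &flushed };
	pthread_t tids[DEMO_MAX_THREADS];
	int i;

	assert(nthreads > 0 && nthreads <= DEMO_MAX_THREADS);
	for (i = 0; i < nthreads; i++)
		if (pthread_create(&tids[i], NULL, demo_writer, &ctx) != 0)
			abort();
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return flushed;
}
```

In this model the number of flushes is bounded by how often a flush
completes, not by the number of threads that ran out of space. A thread
that arrives after a flush has finished starts a new one, so it never
returns on the strength of a flush that began before its own request.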

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 6+ messages
2020-03-27  1:45 [PATCH] xfs: don't flush the entire filesystem when a buffered write runs out of space Darrick J. Wong
2020-03-27  2:27 ` Dave Chinner [this message]
2020-03-27  2:51   ` Darrick J. Wong
2020-03-27  4:50     ` Dave Chinner
2020-03-27  9:08 ` Christoph Hellwig
2020-03-27  9:09   ` Christoph Hellwig
