From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfs: ratelimit inode flush on buffered write ENOSPC
Date: Mon, 30 Mar 2020 11:33:58 +1100
Message-ID: <20200330003357.GX10776@dread.disaster.area>
In-Reply-To: <20200330001602.GB80283@magnolia>

On Sun, Mar 29, 2020 at 05:16:02PM -0700, Darrick J. Wong wrote:
> On Mon, Mar 30, 2020 at 09:08:02AM +1100, Dave Chinner wrote:
> > On Sun, Mar 29, 2020 at 10:22:09AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > A customer reported rcu stalls and softlockup warnings on a computer
> > > with many CPU cores and many many more IO threads trying to write to a
> > > filesystem that is totally out of space. Subsequent analysis pointed to
> > > the many many IO threads calling xfs_flush_inodes -> sync_inodes_sb,
> > > which causes a lot of wb_writeback_work to be queued. The writeback
> > > worker spends so much time trying to wake the many many threads waiting
> > > for writeback completion that it trips the softlockup detector, and (in
> > > this case) the system automatically reboots.
> > >
> > > In addition, they complain that the lengthy xfs_flush_inodes scan traps
> > > all of those threads in uninterruptible sleep, which hampers their
> > > ability to kill the program or do anything else to escape the situation.
> > >
> > > If there are thousands of threads trying to write to files on a full
> > > filesystem, each of those threads will start separate copies of the
> > > inode flush scan. This is kind of pointless since we only need one
> > > scan, so rate limit the inode flush.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > fs/xfs/xfs_mount.h | 1 +
> > > fs/xfs/xfs_super.c | 14 ++++++++++++++
> > > 2 files changed, 15 insertions(+)
> > >
> > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > > index 88ab09ed29e7..50c43422fa17 100644
> > > --- a/fs/xfs/xfs_mount.h
> > > +++ b/fs/xfs/xfs_mount.h
> > > @@ -167,6 +167,7 @@ typedef struct xfs_mount {
> > > struct xfs_kobj m_error_meta_kobj;
> > > struct xfs_error_cfg m_error_cfg[XFS_ERR_CLASS_MAX][XFS_ERR_ERRNO_MAX];
> > > struct xstats m_stats; /* per-fs stats */
> > > + struct ratelimit_state m_flush_inodes_ratelimit;
> > >
> > > struct workqueue_struct *m_buf_workqueue;
> > > struct workqueue_struct *m_unwritten_workqueue;
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index 68fea439d974..abf06bf9c3f3 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -528,6 +528,9 @@ xfs_flush_inodes(
> > > {
> > > struct super_block *sb = mp->m_super;
> > >
> > > + if (!__ratelimit(&mp->m_flush_inodes_ratelimit))
> > > + return;
> > > +
> > > if (down_read_trylock(&sb->s_umount)) {
> > > sync_inodes_sb(sb);
> > > up_read(&sb->s_umount);
> > > @@ -1366,6 +1369,17 @@ xfs_fc_fill_super(
> > > if (error)
> > > goto out_free_names;
> > >
> > > + /*
> > > + * Cap the number of invocations of xfs_flush_inodes to 16 for every
> > > + * quarter of a second. The magic numbers here were determined by
> > > + * observation neither to cause stalls in writeback when there are a
> > > + * lot of IO threads and the fs is near ENOSPC, nor cause any fstest
> > > + * regressions. YMMV.
> > > + */
> > > + ratelimit_state_init(&mp->m_flush_inodes_ratelimit, HZ / 4, 16);
> > > + ratelimit_set_flags(&mp->m_flush_inodes_ratelimit,
> > > + RATELIMIT_MSG_ON_RELEASE);
> >
> > Urk.
> >
> > RATELIMIT_MSG_ON_RELEASE prevents "callbacks suppressed"
> > messages when rate limiting was active and resets via __ratelimit().
> > However, in ratelimit_state_exit(), that flag -enables- printing
> > "callbacks suppressed" messages when rate limiting was active and is
> > reset.
> >
> > Same flag, exact opposite behaviour...
> >
> > The comment says its behaviour is supposed to match that of
> > ratelimit_state_exit() (i.e. print message on ratelimit exit), so I
> > really can't tell if this is correct/intended usage or just API
> > abuse....
>
> This flag (AFAICT) basically means "summarize skipped calls later",
> where later is when _exit is called. It's very annoying that this
> printk thing is mixed in with what otherwise is a simple ratelimiting
> mechanism, since there isn't much to be gained by spamming dmesg every
> time a buffered write hits ENOSPC, and absolutely nothing to be gained
> by logging that at umount time (with comm being the umount process!)
>
> Since there's no design documentation for how the ratelimiting system
> works, the best I can do is RTFS and do whatever magic gets the outcome
> I want (which is to set the flag and skip calling _exit). Only one of
> the ratelimit state users calls ratelimit_state_exit, so it's apparently
> not required.

Yeah, that's pretty much where I got to with it - I wasn't at all
sure if there was something obvious I was missing or it was just
another piece of poorly written code...
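
FWIW, the behaviour being relied on here can be modelled in userspace
roughly as below. This is only a sketch from reading the code, not
lib/ratelimit.c itself: every model_* name and MODEL_MSG_ON_RELEASE is
made up for illustration, time is an explicit tick counter rather than
jiffies, and locking is left out entirely. It assumes the limiter lets
`burst` calls through per `interval`, counts the rest as missed, and
that the flag defers the "suppressed" report to the exit helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel flag; the value is arbitrary. */
#define MODEL_MSG_ON_RELEASE 0x1u

struct model_ratelimit {
	unsigned long interval;	/* window length, in ticks */
	unsigned int burst;	/* calls let through per window */
	unsigned int passed;	/* calls let through in this window */
	unsigned int missed;	/* calls refused and not yet reported */
	unsigned long begin;	/* tick at which this window started */
	bool started;		/* has the first window been opened? */
	unsigned int flags;
};

static void model_init(struct model_ratelimit *rs, unsigned long interval,
		       unsigned int burst, unsigned int flags)
{
	rs->interval = interval;
	rs->burst = burst;
	rs->passed = 0;
	rs->missed = 0;
	rs->begin = 0;
	rs->started = false;
	rs->flags = flags;
}

/* Returns true if the caller may proceed, false if it should back off. */
static bool model_ratelimit(struct model_ratelimit *rs, unsigned long now)
{
	if (!rs->interval)
		return true;	/* an interval of zero disables limiting */

	if (!rs->started) {
		rs->begin = now;
		rs->started = true;
	}

	/* Window expired: open a new one, reporting misses unless deferred. */
	if (now - rs->begin > rs->interval) {
		if (rs->missed && !(rs->flags & MODEL_MSG_ON_RELEASE)) {
			printf("%u callbacks suppressed\n", rs->missed);
			rs->missed = 0;
		}
		rs->begin = now;
		rs->passed = 0;
	}

	if (rs->passed < rs->burst) {
		rs->passed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Reports deferred misses, but only if the flag is set. Returns the count. */
static unsigned int model_exit(struct model_ratelimit *rs)
{
	unsigned int reported = rs->missed;

	if (!(rs->flags & MODEL_MSG_ON_RELEASE) || !reported)
		return 0;
	printf("%u calls suppressed due to ratelimiting\n", reported);
	rs->missed = 0;
	return reported;
}

/* 1000 callers in one tick, HZ taken as 1000 so HZ / 4 == 250 ticks. */
int model_demo(void)
{
	struct model_ratelimit rs;
	int i, flushed = 0;

	model_init(&rs, 250, 16, MODEL_MSG_ON_RELEASE);
	for (i = 0; i < 1000; i++)
		flushed += model_ratelimit(&rs, 0);
	assert(rs.missed == 1000u - (unsigned int)flushed);
	return flushed;
}
```

In this model, with the patch's numbers, a thousand callers arriving in
the same tick see sixteen go on to flush and the rest return
immediately. With the flag set and the exit helper never called,
nothing is printed at all, which is the outcome the patch is after.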

> This all is poor engineering practice, but you /did/ suggest
> ratelimiting (on IRC) and I don't want to go reimplementing ratelimit.c
> either, and it /does/ fix the xfs_flush_inodes flooding problems.

*nod*

Reviewed-by: Dave Chinner <dchinner@redhat.com>

--
Dave Chinner
david@fromorbit.com