From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/8] xfs: prevent CIL push holdoff in log recovery
Date: Fri, 6 Sep 2019 08:10:54 +1000
Message-ID: <20190905221054.GG1119@dread.disaster.area>
In-Reply-To: <20190905152644.GD2229799@magnolia>

On Thu, Sep 05, 2019 at 08:26:44AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 05, 2019 at 06:47:12PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > generic/530 on a machine with enough ram and a non-preemptible
> > kernel can run the AGI processing phase of log recovery entirely out
> > of cache. This means it never blocks on locks, never waits for IO
> > and runs entirely through the unlinked lists until it either
> > completes or blocks and hangs because it has run out of log space.
> > 
> > It runs out of log space because the background CIL push is
> > scheduled but never runs. queue_work() queues the CIL work on the
> > current CPU that is busy, and the workqueue code will not run it on
> > any other CPU. Hence if the unlinked list processing never yields
> > the CPU voluntarily, the push work is delayed indefinitely. This
> > results in the CIL aggregating changes until all the log space is
> > consumed.
> > 
> > When the log recovery processing eventually blocks, the CIL flushes
> > but because the last iclog isn't submitted for IO because it isn't
> > full, the CIL flush never completes and nothing ever moves the log
> > head forwards, or indeed inserts anything into the tail of the log,
> > and hence nothing is able to get the log moving again and recovery
> > hangs.
> > 
> > There are several problems here, but the two obvious ones from
> > the trace are that:
> > 	a) log recovery does not yield the CPU for over 4 seconds,
> > 	b) binding CIL pushes to a single CPU is a really bad idea.
> > 
> > This patch addresses just these two aspects of the problem, and is
> > suitable for backporting to work around any issues in older kernels.
> > The more fundamental problem of preventing the CIL from consuming
> > more than 50% of the log without committing will take more invasive
> > and complex work, so will be done as followup work.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_log_recover.c | 1 +
> >  fs/xfs/xfs_super.c       | 3 ++-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index f05c6c99c4f3..c9665455431e 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -5080,6 +5080,7 @@ xlog_recover_process_iunlinks(
> >  			while (agino != NULLAGINO) {
> >  				agino = xlog_recover_process_one_iunlink(mp,
> >  							agno, agino, bucket);
> > +				cond_resched();
> 
> Funny, I encountered a similar problem in the deferred inactivation
> series where iunlinked inodes marked for inactivation pile up until we
> OOM or stall in the log.  I solved it by kicking the inactivation
> workqueue and going to sleep every ~1000 inodes.

If the workqueue had already been kicked, then yielding with
cond_resched() would probably be enough to avoid that.

I think I'm going to have a bit of a look at our use of workqueues -
I didn't realise that the default behaviour of the workqueues was
"cannot run work unless CPU is yielded" - it kinda makes the "do
async work by workqueue" model somewhat problematic if the work
queued by a single CPU can only be run on that same CPU instead of
concurrently across all idle CPUs...

> >  			}
> >  		}
> >  		xfs_buf_rele(agibp);
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index f9450235533c..55a268997bde 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -818,7 +818,8 @@ xfs_init_mount_workqueues(
> >  		goto out_destroy_buf;
> >  
> >  	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
> > -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
> > +			WQ_MEM_RECLAIM|WQ_FREEZABLE|WQ_UNBOUND,
> 
> More stupid nits: spaces between the "|".

It's the same as the rest of the code in that function.

Fixed anyway.

-Dave.

-- 
Dave Chinner
david@fromorbit.com
