All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: xfs@oss.sgi.com
Subject: Re: long hangs when deleting large directories (3.0-rc3)
Date: Wed, 22 Jun 2011 17:30:47 +1000	[thread overview]
Message-ID: <20110622073047.GT32466@dastard> (raw)
In-Reply-To: <20110622070647.GA1744@x4.trippels.de>

On Wed, Jun 22, 2011 at 09:06:47AM +0200, Markus Trippelsdorf wrote:
> On 2011.06.22 at 10:04 +1000, Dave Chinner wrote:
> > On Tue, Jun 21, 2011 at 08:57:01PM +0200, Markus Trippelsdorf wrote:
> > > On 2011.06.21 at 20:24 +0200, Markus Trippelsdorf wrote:
> > > > On 2011.06.21 at 10:02 +0200, Markus Trippelsdorf wrote:
> > > > > On 2011.06.21 at 14:25 +1000, Dave Chinner wrote:
> > > > > > That is, you really need to get a profile of the rm -rf process - or
> > > > > > whatever is consuming the CPU time - (e.g. via perf or ftrace)
> > > > > > across the hang to so we can narrow down the potential cause of the
> > > > > > latency. Speaking of which, latencytop might be helpful in
> > > > > > identifying where input is getting held up....
> > > > > 
> > > > > I've recorded a profile with "perf record -g /home/markus/rm_sync"
> > > > > ~ % cat rm_sync
> > > > > rm -fr /mnt/tmp/tmp/linux && sync
> > > > 
> > > > FWIW here are two links to svg time-charts produced by:
> > > > 
> > > > perf timechart record /home/markus/rm_sync
> > > > 
> > > > http://trippelsdorf.de/timechart1.svg
> > > > http://trippelsdorf.de/timechart2.svg
> > > > 
> > > 
> > > And this is what the mysterious kworker is doing during the sync.
> > > It's the one consuming most of the CPU time.
> > > 
> > >     39.96%      kworker/3:0  [kernel.kallsyms]                            0xffffffff811da9da k [k] xfs_trans_ail_update_bulk
> > >                 |
> > >                 --- xfs_trans_ail_update_bulk
> > >                     xfs_trans_committed_bulk
> > >                     xlog_cil_committed
> > >                     xlog_state_do_callback
> > >                     xlog_state_done_syncing
> > >                     xlog_iodone
> > >                     xfs_buf_iodone_work
> > >                     process_one_work
> > >                     worker_thread
> > >                     kthread
> > >                     kernel_thread_helper
> > 
> > So that is inserting items into the AIL at transaction completion.
> > That can consume lots of CPU time if the CIL checkpoints are being
> > flushed quickly enough. Given you are doing a rm -rf at this point
> > in time, I'd expect to see this trace present in the profile, though
> > maybe not at that extent.
> > 
> > I have seen this before but have never been able to it reproduce
> > reliably.  If checkpoints are completed "out of order" due to the
> > way the commit records are written into the iclogs. This can cause
> > extra CPU because the AIL insertion then has to skip over all the
> > items that were inserted out of order before splicing each group of
> > items into the AIL. I only rarely see this (maybe once every couple
> > of weeks of performance testing), so I'm not sure it's the problem
> > you are seeing.
> > 
> > Adding this debug to xfs_ail_splice() list walk will tell us if this is
> > happening and how many items it had to walk when you see a hang:
> > 
> >  	xfs_lsn_t       lsn)
> >  {
> >  	xfs_log_item_t  *next_lip;
> > +	int walked = 0;
> >  
> >  	/* If the list is empty, just insert the item.  */
> >  	if (list_empty(&ailp->xa_ail)) {
> >  		list_splice(list, &ailp->xa_ail);
> >  		return;
> >  	}
> >  
> >  	list_for_each_entry_reverse(next_lip, &ailp->xa_ail, li_ail) {
> >  		if (XFS_LSN_CMP(next_lip->li_lsn, lsn) <= 0)
> >  			break;
> > +		if (!walked++) {
> > +			xfs_warn(ailp->xa_mount,
> > +			 "ail: ooo splice, tail 0x%llx, item 0x%llx\n",
> > +				next_lip->li_lsn, lsn);
> > +		}
> >  	}
> > +	if (walked > 10) {
> > +		xfs_warn(ailp->xa_mount,
> > +			 "ail: ooo splice, walked %d items\n", walked);
> > +	}
> >  
> >  	ASSERT(&next_lip->li_ail == &ailp->xa_ail ||
> >  	       XFS_LSN_CMP(next_lip->li_lsn, lsn) <= 0);
> > 
> > That will at least tell us if this is the cause of your problem. If
> > it is, I think I know how to avoid most of the list walk overhead
> > fairly easily and that should avoid the need to change workqueue
> > configurations at all.
> 
> The kernel log is attached.

Ok, so that is the cause of the problem∵ THe 3 seconds of output
where it is nothing but:

Jun 22 08:53:09 x4 kernel: XFS (sdb1): ail: ooo splice, tail 0x12000156e7, item 0x12000156e6
Jun 22 08:53:09 x4 kernel: XFS (sdb1): ail: ooo splice, walked 15503 items      
.....
Jun 22 08:53:12 x4 kernel: XFS (sdb1): ail: ooo splice, tail 0x12000156e7, item 0x12000156e6
Jun 22 08:53:12 x4 kernel: XFS (sdb1): ail: ooo splice, walked 16945 items

Interesting is the LSN of the tail - it's only one sector further on
than the items being inserted. That's what I'd expect from a commit
record write race between two checkpoints. I'll have a deeper look
into whether this can be avoided later tonight and also whether I
can easily implement a "last insert cursor" easily so subsequent
inserts at the same LSN avoid the walk....

> > > The following patch fixes the problem for me.
.....
> > There are two different things you can try with the wq
> > initialisation that might help prevent the problem. Firstly, try
> > this:
> > 
> > -					WQ_MEM_RECLAIM | WQ_HIGHPRI, 1);
> > +					WQ_MEM_RECLAIM | WQ_HIGHPRI, 8);
> > 
> > To change the number of concurrent per-cpu work items that can be
> > processed on the work queue. I don't think that will fix the
> > inversion, but it may allow more concurrency which hides the
> > inversion.
> > 
> > The other thing you can try is:
> > 
> > -					WQ_MEM_RECLAIM | WQ_HIGHPRI, 1);
> > +					WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_CPU_INTENSIVE, 1);
> > 
> > Which marks the workqueue as one that will consume a lot of CPU and
> > hence it is scheduled differently and hence should avoid other
> > pending work from being starved. We use this WQ_CPU_INTENSIVE flag
> > on other XFS workqueues that are known to consume lots of CPU, so I
> > suspect this is the right thing to do here.
> 
> Yes, that was the next combination I've tested after WQ_UNBOUND and it
> solves the issue, too.

Just to be clear, which combination did you test? The
WQ_CPU_INTENSIVE one?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2011-06-22  7:30 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-18 14:19 long hangs when deleting large directories (3.0-rc3) Markus Trippelsdorf
2011-06-18 14:24 ` Christoph Hellwig
2011-06-18 14:37   ` Markus Trippelsdorf
2011-06-19  8:16     ` Markus Trippelsdorf
2011-06-19 22:24 ` Dave Chinner
2011-06-20  0:54   ` Markus Trippelsdorf
2011-06-20  1:34     ` Dave Chinner
2011-06-20  2:02       ` Markus Trippelsdorf
2011-06-20  2:36         ` Dave Chinner
2011-06-20  6:03           ` Markus Trippelsdorf
2011-06-20 11:13             ` Markus Trippelsdorf
2011-06-20 11:45               ` Michael Monnerie
2011-06-20 12:31                 ` Markus Trippelsdorf
2011-06-20 21:16                   ` Markus Trippelsdorf
2011-06-21  4:25               ` Dave Chinner
2011-06-21  8:02                 ` Markus Trippelsdorf
2011-06-21 18:24                   ` Markus Trippelsdorf
2011-06-21 18:57                     ` Markus Trippelsdorf
2011-06-21 21:22                       ` Markus Trippelsdorf
2011-06-22  0:04                       ` Dave Chinner
2011-06-22  7:06                         ` Markus Trippelsdorf
2011-06-22  7:30                           ` Dave Chinner [this message]
2011-06-22  7:40                             ` Markus Trippelsdorf
2011-06-29  4:31                             ` Dave Chinner
2011-06-29  6:19                               ` Markus Trippelsdorf
2011-06-29  7:24                                 ` Dave Chinner
2011-06-29  7:41                                   ` Markus Trippelsdorf
2011-06-29 12:10                                     ` Dave Chinner
2011-06-29 12:48                                       ` Markus Trippelsdorf
2011-06-29 15:08                                         ` Michael Weissenbacher
2011-06-29 23:53                                         ` Dave Chinner
2011-06-21 10:18                 ` Markus Trippelsdorf
2011-06-21 10:40                   ` Markus Trippelsdorf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110622073047.GT32466@dastard \
    --to=david@fromorbit.com \
    --cc=markus@trippelsdorf.de \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.