public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Markus Trippelsdorf <markus@trippelsdorf.de>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: long hangs when deleting large directories (3.0-rc3)
Date: Wed, 29 Jun 2011 08:19:54 +0200	[thread overview]
Message-ID: <20110629061954.GA1711@x4.trippels.de> (raw)
In-Reply-To: <20110629043143.GA1026@dastard>

On 2011.06.29 at 14:31 +1000, Dave Chinner wrote:
> On Wed, Jun 22, 2011 at 05:30:47PM +1000, Dave Chinner wrote:
> > On Wed, Jun 22, 2011 at 09:06:47AM +0200, Markus Trippelsdorf wrote:
> > > On 2011.06.22 at 10:04 +1000, Dave Chinner wrote:
> > > > On Tue, Jun 21, 2011 at 08:57:01PM +0200, Markus Trippelsdorf wrote:
> > > > 
> > > > That will at least tell us if this is the cause of your problem. If
> > > > it is, I think I know how to avoid most of the list walk overhead
> > > > fairly easily and that should avoid the need to change workqueue
> > > > configurations at all.
> > > 
> > > The kernel log is attached.
> > 
> > Ok, so that is the cause of the problem∵ THe 3 seconds of output
> > where it is nothing but:
> > 
> > Jun 22 08:53:09 x4 kernel: XFS (sdb1): ail: ooo splice, tail 0x12000156e7, item 0x12000156e6
> > Jun 22 08:53:09 x4 kernel: XFS (sdb1): ail: ooo splice, walked 15503 items      
> > .....
> > Jun 22 08:53:12 x4 kernel: XFS (sdb1): ail: ooo splice, tail 0x12000156e7, item 0x12000156e6
> > Jun 22 08:53:12 x4 kernel: XFS (sdb1): ail: ooo splice, walked 16945 items
> > 
> > Interesting is the LSN of the tail - it's only one sector further on
> > than the items being inserted. That's what I'd expect from a commit
> > record write race between two checkpoints. I'll have a deeper look
> > into whether this can be avoided later tonight and also whether I
> > can easily implement a "last insert cursor" easily so subsequent
> > inserts at the same LSN avoid the walk....
> 
> Ok, so here's a patch that does just this. I should probably also do
> a little bit of cleanup on the cursor code as well, but this avoids
> the repeated walks of the AIL to find the insert position.
> 
> Can you try it without the WQ changes you made, Marcus, and see if
> the interactivity problems go away?

Sorry to be the bringer of bad news, but this made things much worse:

-------cpu0-usage--------------cpu1-usage--------------cpu2-usage--------------cpu3-usage------ --dsk/sdc-- ---system-- ---load-avg--- --dsk/sdc--
usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read  writ| int   csw | 1m   5m  15m |reads writs
  1   1  98   0   0   0:  0   1  99   0   0   0:  0   1  99   0   0   0:  0   1  99   0   0   0|   0     0 | 603   380 |0.66 0.55 0.28|   0     0 
  1   0  99   0   0   0:  1   0  99   0   0   0:  1  19  80   0   0   0:  0   0 100   0   0   0|   0     0 | 719   383 |0.66 0.55 0.28|   0     0 
  3   1  96   0   0   0:  3   1  96   0   0   0:  1  52  47   0   0   0:  0   0 100   0   0   0|   0  6464k|1847   919 |0.66 0.55 0.28|   0   202 
  2  13  85   0   0   0:  2   2  96   0   0   0:  1  56  43   0   0   0:  1  31  69   0   0   0|4096B  256k|1910  1280 |0.68 0.56 0.28|   1     8 
> 0   1  99   0   0   0:  0   0 100   0   0   0:  0   1  99   0   0   0:  0 100   0   0   0   0|   0     0 |1256   170 |0.68 0.56 0.28|   0     0 
> 0   1  99   0   0   0:  1   1  98   0   0   0:  1   0  99   0   0   0:  0  99   0   0   0   1|   0     0 |1395   229 |0.68 0.56 0.28|   0     0 
> 0   0 100   0   0   0:  0   0 100   0   0   0:  0   3  97   0   0   0:  0 100   0   0   0   0|   0   512B|1304   167 |0.68 0.56 0.28|   0     1 
> 1   1  98   0   0   0:  1   1  98   0   0   0:  0   0 100   0   0   0:  0  99   0   0   0   1|   0     0 |1211   146 |0.68 0.56 0.28|   0     0 
> 0   0 100   0   0   0:  0   0 100   0   0   0:  0   1  99   0   0   0:  0  97   0   0   0   3|   0     0 |1270   149 |0.87 0.60 0.30|   0     0 
  5   2  65  29   0   0:  2   3  95   0   0   0:  1   0  99   0   0   0:  2  24  72   0   0   1|   0  8866k|2654  2398 |0.87 0.60 0.30|   0   496 
  6   2  25  67   0   0:  3   1  59  37   0   0:  0   0 100   0   0   0:  4   4  92   0   0   0|   0  4554k|2224  2494 |0.87 0.60 0.30|   0   399 
  1   1  98   0   0   0:  0   0  83  17   0   0:  1   3  96   0   0   0:  0   1  99   0   0   0|   0  2270k|1079  1030 |0.87 0.60 0.30|   0   200 
  1   1  98   0   0   0:  1   1  98   0   0   0:  0   1  99   0   0   0:  1   0  99   0   0   0|   0  9216B| 713   567 |0.87 0.60 0.30|   0     2 
  0   0 100   0   0   0:  1   1  98   0   0   0:  0   0 100   0   0   0:  0   1  99   0   0   0|   0     0 | 492   386 |0.80 0.59 0.30|   0     0 

As you can see in the table above (resolution 1sec) the hang is now
5-6 seconds long, instead of the 1-3 seconds seen before.

-- 
Markus

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2011-06-29  6:19 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-18 14:19 long hangs when deleting large directories (3.0-rc3) Markus Trippelsdorf
2011-06-18 14:24 ` Christoph Hellwig
2011-06-18 14:37   ` Markus Trippelsdorf
2011-06-19  8:16     ` Markus Trippelsdorf
2011-06-19 22:24 ` Dave Chinner
2011-06-20  0:54   ` Markus Trippelsdorf
2011-06-20  1:34     ` Dave Chinner
2011-06-20  2:02       ` Markus Trippelsdorf
2011-06-20  2:36         ` Dave Chinner
2011-06-20  6:03           ` Markus Trippelsdorf
2011-06-20 11:13             ` Markus Trippelsdorf
2011-06-20 11:45               ` Michael Monnerie
2011-06-20 12:31                 ` Markus Trippelsdorf
2011-06-20 21:16                   ` Markus Trippelsdorf
2011-06-21  4:25               ` Dave Chinner
2011-06-21  8:02                 ` Markus Trippelsdorf
2011-06-21 18:24                   ` Markus Trippelsdorf
2011-06-21 18:57                     ` Markus Trippelsdorf
2011-06-21 21:22                       ` Markus Trippelsdorf
2011-06-22  0:04                       ` Dave Chinner
2011-06-22  7:06                         ` Markus Trippelsdorf
2011-06-22  7:30                           ` Dave Chinner
2011-06-22  7:40                             ` Markus Trippelsdorf
2011-06-29  4:31                             ` Dave Chinner
2011-06-29  6:19                               ` Markus Trippelsdorf [this message]
2011-06-29  7:24                                 ` Dave Chinner
2011-06-29  7:41                                   ` Markus Trippelsdorf
2011-06-29 12:10                                     ` Dave Chinner
2011-06-29 12:48                                       ` Markus Trippelsdorf
2011-06-29 15:08                                         ` Michael Weissenbacher
2011-06-29 23:53                                         ` Dave Chinner
2011-06-21 10:18                 ` Markus Trippelsdorf
2011-06-21 10:40                   ` Markus Trippelsdorf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110629061954.GA1711@x4.trippels.de \
    --to=markus@trippelsdorf.de \
    --cc=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox