public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* xfs hang or slowness while removing files
@ 2015-12-02 11:07 Avi Kivity
  2015-12-02 21:09 ` Dave Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Avi Kivity @ 2015-12-02 11:07 UTC (permalink / raw)
  To: xfs

Removing a directory with ~900 32MB files, we saw this:

[ 5645.684464] INFO: task xfsaild/md0:12247 blocked for more than 120 
seconds.
[ 5645.686488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 5645.687713] xfsaild/md0     D ffff88103f9d3680     0 12247    2 
0x00000080
[ 5645.687729]  ffff8810136f7d40 0000000000000046 ffff882026d82220 
ffff8810136f7fd8
[ 5645.687732]  ffff8810136f7fd8 ffff8810136f7fd8 ffff882026d82220 
ffff882026d82220
[ 5645.687734]  ffff88103f9d44c0 0000000000000001 0000000000000000 
ffff8820285aa928
[ 5645.687737] Call Trace:
[ 5645.687747]  [<ffffffff816098d9>] schedule+0x29/0x70
[ 5645.687768]  [<ffffffffa06cd880>] _xfs_log_force+0x230/0x290 [xfs]
[ 5645.687773]  [<ffffffff810a9510>] ? wake_up_state+0x20/0x20
[ 5645.687796]  [<ffffffffa06cd906>] xfs_log_force+0x26/0x80 [xfs]
[ 5645.687808]  [<ffffffffa06d2380>] ? 
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[ 5645.687818]  [<ffffffffa06d24d1>] xfsaild+0x151/0x5e0 [xfs]
[ 5645.687828]  [<ffffffffa06d2380>] ? 
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[ 5645.687831]  [<ffffffff8109727f>] kthread+0xcf/0xe0
[ 5645.687834]  [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
[ 5645.687837]  [<ffffffff81614358>] ret_from_fork+0x58/0x90
[ 5645.687852]  [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140

'rm' did not complete, but was killable.  Nothing else was running on 
the system at the time.

The filesystem was mounted with the discard option set, but since that 
is discouraged, we'll retry without it.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: xfs hang or slowness while removing files
  2015-12-02 11:07 xfs hang or slowness while removing files Avi Kivity
@ 2015-12-02 21:09 ` Dave Chinner
  2015-12-03 12:02   ` Avi Kivity
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2015-12-02 21:09 UTC (permalink / raw)
  To: Avi Kivity; +Cc: xfs

On Wed, Dec 02, 2015 at 01:07:56PM +0200, Avi Kivity wrote:
> Removing a directory with ~900 32MB files, we saw this:
> 
> [ 5645.684464] INFO: task xfsaild/md0:12247 blocked for more than
> 120 seconds.
> [ 5645.686488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 5645.687713] xfsaild/md0     D ffff88103f9d3680     0 12247    2
> 0x00000080
> [ 5645.687729]  ffff8810136f7d40 0000000000000046 ffff882026d82220
> ffff8810136f7fd8
> [ 5645.687732]  ffff8810136f7fd8 ffff8810136f7fd8 ffff882026d82220
> ffff882026d82220
> [ 5645.687734]  ffff88103f9d44c0 0000000000000001 0000000000000000
> ffff8820285aa928
> [ 5645.687737] Call Trace:
> [ 5645.687747]  [<ffffffff816098d9>] schedule+0x29/0x70
> [ 5645.687768]  [<ffffffffa06cd880>] _xfs_log_force+0x230/0x290 [xfs]
> [ 5645.687773]  [<ffffffff810a9510>] ? wake_up_state+0x20/0x20
> [ 5645.687796]  [<ffffffffa06cd906>] xfs_log_force+0x26/0x80 [xfs]
> [ 5645.687808]  [<ffffffffa06d2380>] ?
> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> [ 5645.687818]  [<ffffffffa06d24d1>] xfsaild+0x151/0x5e0 [xfs]
> [ 5645.687828]  [<ffffffffa06d2380>] ?
> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> [ 5645.687831]  [<ffffffff8109727f>] kthread+0xcf/0xe0
> [ 5645.687834]  [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
> [ 5645.687837]  [<ffffffff81614358>] ret_from_fork+0x58/0x90
> [ 5645.687852]  [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
> 
> 'rm' did not complete, but was killable.  Nothing else was running
> on the system at the time.

Which means the filesystem was not hung, nor was rm blocked in XFS.
That implies the directory/inode reads that rm does were running
really slowly. Something else going on here.

> The filesystem was mounted with the discard option set, but since
> that is discouraged, we'll retry without it.

Ah, yes, that could cause exactly these symptoms.

I'd guess you are using storage that has unqueued TRIM operations
(i.e. SATA 3.0 hardware somewhere in your storage path, as queued
TRIM only came along with SATA 3.1 and AFAIA there's not a lot of
3.1 hardware out there yet) which means while discards are
being issued all other IO tanks and goes really slow.

We have seen individual TRIM requests on some SSDs take tens of
milliseconds to complete, regardless of their size. Hence if you
have one of these devices and you're running thousands of TRIM
commands across ~30GB of data being freed, then you'd see things
like rm being really slow on the read side and log forces waiting an
awful long time for journal IO completion processing to take
place...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: xfs hang or slowness while removing files
  2015-12-02 21:09 ` Dave Chinner
@ 2015-12-03 12:02   ` Avi Kivity
  0 siblings, 0 replies; 3+ messages in thread
From: Avi Kivity @ 2015-12-03 12:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs



On 12/02/2015 11:09 PM, Dave Chinner wrote:
> On Wed, Dec 02, 2015 at 01:07:56PM +0200, Avi Kivity wrote:
>> Removing a directory with ~900 32MB files, we saw this:
>>
>> [ 5645.684464] INFO: task xfsaild/md0:12247 blocked for more than
>> 120 seconds.
>> [ 5645.686488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [ 5645.687713] xfsaild/md0     D ffff88103f9d3680     0 12247    2
>> 0x00000080
>> [ 5645.687729]  ffff8810136f7d40 0000000000000046 ffff882026d82220
>> ffff8810136f7fd8
>> [ 5645.687732]  ffff8810136f7fd8 ffff8810136f7fd8 ffff882026d82220
>> ffff882026d82220
>> [ 5645.687734]  ffff88103f9d44c0 0000000000000001 0000000000000000
>> ffff8820285aa928
>> [ 5645.687737] Call Trace:
>> [ 5645.687747]  [<ffffffff816098d9>] schedule+0x29/0x70
>> [ 5645.687768]  [<ffffffffa06cd880>] _xfs_log_force+0x230/0x290 [xfs]
>> [ 5645.687773]  [<ffffffff810a9510>] ? wake_up_state+0x20/0x20
>> [ 5645.687796]  [<ffffffffa06cd906>] xfs_log_force+0x26/0x80 [xfs]
>> [ 5645.687808]  [<ffffffffa06d2380>] ?
>> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
>> [ 5645.687818]  [<ffffffffa06d24d1>] xfsaild+0x151/0x5e0 [xfs]
>> [ 5645.687828]  [<ffffffffa06d2380>] ?
>> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
>> [ 5645.687831]  [<ffffffff8109727f>] kthread+0xcf/0xe0
>> [ 5645.687834]  [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
>> [ 5645.687837]  [<ffffffff81614358>] ret_from_fork+0x58/0x90
>> [ 5645.687852]  [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
>>
>> 'rm' did not complete, but was killable.  Nothing else was running
>> on the system at the time.
> Which means the filesystem was not hung, nor was rm blocked in XFS.
> That implies the directory/inode reads that rm does were running
> really slowly. Something else going on here.
>
>> The filesystem was mounted with the discard option set, but since
>> that is discouraged, we'll retry without it.
> Ah, yes, that could cause exactly these symptoms.
>
> I'd guess you are using storage that has unqueued TRIM operations
> (i.e. SATA 3.0 hardware somewhere in your storage path, as queued
> TRIM only came along with SATA 3.1 and AFAIA there's not a lot of
> 3.1 hardware out there yet) which means while discards are
> being issued all other IO tanks and goes really slow.

It's bare-metal cloud hardware so I don't immediately know (I don't 
control the machine).  I could find out -

> We have seen individual TRIM requests on some SSDs take tens of
> milliseconds to complete, regardless of their size. Hence if you
> have one of these devices and you're running thousands of TRIM
> commands across ~30GB of data being freed, then you'd see things
> like rm being really slow on the read side and log forces waiting an
> awful long time for journal IO completion processing to take
> place...
>

- but it's probably better to just drop discard and see if it happens again.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-12-03 12:03 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-12-02 11:07 xfs hang or slowness while removing files Avi Kivity
2015-12-02 21:09 ` Dave Chinner
2015-12-03 12:02   ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox