public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* Error during bulk removal of files
@ 2013-05-10 16:17 Subranshu Patel
  2013-05-10 16:40 ` Mark Tinguely
  0 siblings, 1 reply; 6+ messages in thread
From: Subranshu Patel @ 2013-05-10 16:17 UTC (permalink / raw)
  To: xfs

I am using mdtest benchmarking tool to remove 7 million files of 1KB.
After some time, the mdtest process gets blocked.

Dmesg shows the following

--------------------->8------------

May 10 18:49:00 machine1 kernel: INFO: task xfssyncd/dm-3:8156 blocked
for more than 120 seconds.

May 10 18:49:00 machine1 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.

May 10 18:49:00 machine1 kernel: xfssyncd/dm-3 D 0000000000000002 0
8156 2 0x00000080

May 10 18:49:00 machine1 kernel: ffff88185932bc70 0000000000000046
ffff88185e58a400 ffff8800282566e8

May 10 18:49:00 machine1 kernel: 00000000000176f6 ffff88185fb44080
ffff880c6100aaa0 ffffffff8160b400

May 10 18:49:00 machine1 kernel: ffff88185fb44638 ffff88185932bfd8
000000000000fb88 ffff88185fb44638

May 10 18:49:00 machine1 kernel: Call Trace:

May 10 18:49:00 machine1 kernel: [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffff81060250>] ?
default_wake_function+0x0/0x20

May 10 18:49:00 machine1 kernel: [<ffffffffa056e08b>]
xlog_grant_log_space+0x3ab/0x520 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffffa0582d1a>] ?
kmem_zone_zalloc+0x3a/0x50 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffff8127f3ac>] ? random32+0x1c/0x20

May 10 18:49:00 machine1 kernel: [<ffffffffa057d201>] ?
xfs_trans_ail_push+0x21/0x80 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffffa056e2e6>]
xfs_log_reserve+0xe6/0x140 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffffa057bad0>]
xfs_trans_reserve+0xa0/0x210 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffffa055f0a3>]
xfs_fs_log_dummy+0x43/0x90 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffffa056c9c4>] ?
xfs_log_need_covered+0x94/0xd0 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffffa0591831>]
xfs_sync_worker+0x81/0x90 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffffa059171e>] xfssyncd+0x17e/0x210 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffffa05915a0>] ? xfssyncd+0x0/0x210 [xfs]

May 10 18:49:00 machine1 kernel: [<ffffffff81091d66>] kthread+0x96/0xa0

May 10 18:49:00 machine1 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20

May 10 18:49:00 machine1 kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0

May 10 18:49:00 machine1 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20

--------------------->8------------

The XFS filesystem size is 14TB (LVM), kernel version 2.6.32

Machine has a memory of 96GB

Please let me know if this is a known issue and have been already
fixed in higher kernel version.

--
Subranshu

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Error during bulk removal of files
  2013-05-10 16:17 Error during bulk removal of files Subranshu Patel
@ 2013-05-10 16:40 ` Mark Tinguely
  2013-05-11  1:15   ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Tinguely @ 2013-05-10 16:40 UTC (permalink / raw)
  To: Subranshu Patel; +Cc: xfs

On 05/10/13 11:17, Subranshu Patel wrote:

<reformated> Call Trace:

  [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
  [<ffffffff81060250>] ? default_wake_function+0x0/0x20
  [<ffffffffa056e08b>] xlog_grant_log_space+0x3ab/0x520 [xfs]
  [<ffffffffa0582d1a>] ? kmem_zone_zalloc+0x3a/0x50 [xfs]
  [<ffffffff8127f3ac>] ? random32+0x1c/0x20
  [<ffffffffa057d201>] ? xfs_trans_ail_push+0x21/0x80 [xfs]
  [<ffffffffa056e2e6>] xfs_log_reserve+0xe6/0x140 [xfs]
  [<ffffffffa057bad0>] xfs_trans_reserve+0xa0/0x210 [xfs]
  [<ffffffffa055f0a3>] xfs_fs_log_dummy+0x43/0x90 [xfs]
  [<ffffffffa056c9c4>] ? xfs_log_need_covered+0x94/0xd0 [xfs]
  [<ffffffffa0591831>] xfs_sync_worker+0x81/0x90 [xfs]
  [<ffffffffa059171e>] xfssyncd+0x17e/0x210 [xfs]
  [<ffffffffa05915a0>] ? xfssyncd+0x0/0x210 [xfs]
  [<ffffffff81091d66>] kthread+0x96/0xa0
  [<ffffffff8100c14a>] child_rip+0xa/0x20
  [<ffffffff81091cd0>] ? kthread+0x0/0xa0
  [<ffffffff8100c140>] ? child_rip+0x0/0x20

Ran out of log grant space trying to write out the dummy record in the 
sync worker.

Something is not getting moved off the AIL. This could be a symptom; for 
example a lock not being released could lead to depleted log space. What 
else is happening on the filesystem?

>
> --------------------->8------------
>
> The XFS filesystem size is 14TB (LVM), kernel version 2.6.32
>
> Machine has a memory of 96GB
>
> Please let me know if this is a known issue and have been already
> fixed in higher kernel version.
>
> --
> Subranshu


--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Error during bulk removal of files
  2013-05-10 16:40 ` Mark Tinguely
@ 2013-05-11  1:15   ` Dave Chinner
  2013-05-13 16:15     ` Subranshu Patel
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2013-05-11  1:15 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: Subranshu Patel, xfs

On Fri, May 10, 2013 at 11:40:16AM -0500, Mark Tinguely wrote:
> On 05/10/13 11:17, Subranshu Patel wrote:
> 
> <reformated> Call Trace:
> 
>  [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
>  [<ffffffff81060250>] ? default_wake_function+0x0/0x20
>  [<ffffffffa056e08b>] xlog_grant_log_space+0x3ab/0x520 [xfs]
>  [<ffffffffa0582d1a>] ? kmem_zone_zalloc+0x3a/0x50 [xfs]
>  [<ffffffff8127f3ac>] ? random32+0x1c/0x20
>  [<ffffffffa057d201>] ? xfs_trans_ail_push+0x21/0x80 [xfs]
>  [<ffffffffa056e2e6>] xfs_log_reserve+0xe6/0x140 [xfs]
>  [<ffffffffa057bad0>] xfs_trans_reserve+0xa0/0x210 [xfs]
>  [<ffffffffa055f0a3>] xfs_fs_log_dummy+0x43/0x90 [xfs]
>  [<ffffffffa056c9c4>] ? xfs_log_need_covered+0x94/0xd0 [xfs]
>  [<ffffffffa0591831>] xfs_sync_worker+0x81/0x90 [xfs]
>  [<ffffffffa059171e>] xfssyncd+0x17e/0x210 [xfs]
>  [<ffffffffa05915a0>] ? xfssyncd+0x0/0x210 [xfs]
>  [<ffffffff81091d66>] kthread+0x96/0xa0
>  [<ffffffff8100c14a>] child_rip+0xa/0x20
>  [<ffffffff81091cd0>] ? kthread+0x0/0xa0
>  [<ffffffff8100c140>] ? child_rip+0x0/0x20
> 
> Ran out of log grant space trying to write out the dummy record in
> the sync worker.
> 
> Something is not getting moved off the AIL. This could be a symptom;
> for example a lock not being released could lead to depleted log
> space. What else is happening on the filesystem?
> 
> >
> >--------------------->8------------
> >
> >The XFS filesystem size is 14TB (LVM), kernel version 2.6.32

What 2.6.32 kernel? Output of uname -a?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Error during bulk removal of files
  2013-05-11  1:15   ` Dave Chinner
@ 2013-05-13 16:15     ` Subranshu Patel
  2013-05-13 17:55       ` Mark Tinguely
  2013-05-14  0:36       ` Dave Chinner
  0 siblings, 2 replies; 6+ messages in thread
From: Subranshu Patel @ 2013-05-13 16:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Mark Tinguely, xfs

> What 2.6.32 kernel? Output of uname -a?

 uname -a
Linux machine1 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT
2012 x86_64 x86_64 x86_64 GNU/Linux

 # cat /etc/issue
Red Hat Enterprise Linux Server release 6.3 (Santiago)

I am using mdtest program to delete 7 million files (of 1KB size).
Total 128 concurrent mdtest process are there. Each process removes
approx 54000 files ( 7 million / 128).
mdtest runs for a while, after that, it gets blocked and does not do
anything. However the machine is still responsive.

dmesg:

May 13 16:27:10 machine1 kernel: INFO: task xfssyncd/dm-3:6838 blocked
for more than 120 seconds.
May 13 16:27:10 machine1 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 16:27:10 machine1 kernel: xfssyncd/dm-3 D 0000000000000010
0  6838      2 0x00000080
May 13 16:27:10 machine1 kernel: ffff88185d571c70 0000000000000046
ffff88002830fec0 0000000000016680
May 13 16:27:10 machine1 kernel: 0000000000016680 ffff88185975e040
ffff880c61170ae0 ffffffff8160b400
May 13 16:27:10 machine1 kernel: ffff88185975e5f8 ffff88185d571fd8
000000000000fb88 ffff88185975e5f8
May 13 16:27:10 machine1 kernel: Call Trace:
May 13 16:27:10 machine1 kernel: [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81060250>] ?
default_wake_function+0x0/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa056e08b>]
xlog_grant_log_space+0x3ab/0x520 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa0582d1a>] ?
kmem_zone_zalloc+0x3a/0x50 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff8127f3ac>] ? random32+0x1c/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa057d201>] ?
xfs_trans_ail_push+0x21/0x80 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa056e2e6>]
xfs_log_reserve+0xe6/0x140 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa057bad0>]
xfs_trans_reserve+0xa0/0x210 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa055f0a3>]
xfs_fs_log_dummy+0x43/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa056c9c4>] ?
xfs_log_need_covered+0x94/0xd0 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa0591831>]
xfs_sync_worker+0x81/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa059171e>] xfssyncd+0x17e/0x210 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa05915a0>] ? xfssyncd+0x0/0x210 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81091d66>] kthread+0x96/0xa0
May 13 16:27:10 machine1 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
May 13 16:27:10 machine1 kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0
May 13 16:27:10 machine1 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
May 13 16:27:10 machine1 kernel: INFO: task mdtest:7397 blocked for
more than 120 seconds.
May 13 16:27:10 machine1 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 16:27:10 machine1 kernel: mdtest        D 0000000000000019
0  7397   7395 0x00000080
May 13 16:27:10 machine1 kernel: ffff880c3ce13bf8 0000000000000082
0000000000000000 ffffffff81060262
May 13 16:27:10 machine1 kernel: ffff880c3ce13ba8 ffffffff8104e309
ffff880c3ce13ba8 0000000300000000
May 13 16:27:10 machine1 kernel: ffff880c5d027af8 ffff880c3ce13fd8
000000000000fb88 ffff880c5d027af8
May 13 16:27:10 machine1 kernel: Call Trace:
May 13 16:27:10 machine1 kernel: [<ffffffff81060262>] ?
default_wake_function+0x12/0x20
May 13 16:27:10 machine1 kernel: [<ffffffff8104e309>] ?
__wake_up_common+0x59/0x90
May 13 16:27:10 machine1 kernel: [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81060250>] ?
default_wake_function+0x0/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa056e08b>]
xlog_grant_log_space+0x3ab/0x520 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa0582d1a>] ?
kmem_zone_zalloc+0x3a/0x50 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff8127f3ac>] ? random32+0x1c/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa057d201>] ?
xfs_trans_ail_push+0x21/0x80 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa056e2e6>]
xfs_log_reserve+0xe6/0x140 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa057bad0>]
xfs_trans_reserve+0xa0/0x210 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa058a720>] ?
xfs_tosspages+0x20/0x30 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa05810b4>]
xfs_inactive+0x2e4/0x460 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff814fed1e>] ? mutex_lock+0x1e/0x50
May 13 16:27:10 machine1 kernel: [<ffffffffa058e790>]
xfs_fs_clear_inode+0xa0/0xd0 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81195adc>] clear_inode+0xac/0x140
May 13 16:27:10 machine1 kernel: [<ffffffff81196296>]
generic_delete_inode+0x196/0x1d0
May 13 16:27:10 machine1 kernel: [<ffffffff81196335>]
generic_drop_inode+0x65/0x80
May 13 16:27:10 machine1 kernel: [<ffffffff81195182>] iput+0x62/0x70
May 13 16:27:10 machine1 kernel: [<ffffffff8118b0d2>] do_unlinkat+0x112/0x1c0
May 13 16:27:10 machine1 kernel: [<ffffffff810d69e2>] ?
audit_syscall_entry+0x272/0x2a0
May 13 16:27:10 machine1 kernel: [<ffffffff8100c5b5>] ?
math_state_restore+0x45/0x60
May 13 16:27:10 machine1 kernel: [<ffffffff8118b196>] sys_unlink+0x16/0x20
May 13 16:27:10 machine1 kernel: [<ffffffff8100b0f2>]
system_call_fastpath+0x16/0x1b
May 13 16:27:10 machine1 kernel: INFO: task mdtest:7398 blocked for
more than 120 seconds.
May 13 16:27:10 machine1 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 16:27:10 machine1 kernel: mdtest        D 0000000000000008
0  7398   7395 0x00000080
May 13 16:27:10 machine1 kernel: ffff880b79bcbbf8 0000000000000082
ffff880b79bcbb88 ffffffffa0587633
May 13 16:27:10 machine1 kernel: 0000000000000000 0000000000014001
0000000000000010 ffff880b79bcbc90
May 13 16:27:10 machine1 kernel: ffff880c5e6fbaf8 ffff880b79bcbfd8
000000000000fb88 ffff880c5e6fbaf8
May 13 16:27:10 machine1 kernel: Call Trace:
May 13 16:27:10 machine1 kernel: [<ffffffffa0587633>] ?
xfs_buf_get+0x103/0x1a0 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81060250>] ?
default_wake_function+0x0/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa056e08b>]
xlog_grant_log_space+0x3ab/0x520 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa0582d1a>] ?
kmem_zone_zalloc+0x3a/0x50 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff8127f3ac>] ? random32+0x1c/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa057d201>] ?
xfs_trans_ail_push+0x21/0x80 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa056e2e6>]
xfs_log_reserve+0xe6/0x140 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa057bad0>]
xfs_trans_reserve+0xa0/0x210 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa057bdbf>] ?
xfs_trans_alloc+0x9f/0xb0 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa057fb93>]
xfs_remove+0x113/0x3a0 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81186fd3>] ?
generic_permission+0x23/0xb0
May 13 16:27:10 machine1 kernel: [<ffffffffa058d6c8>]
xfs_vn_unlink+0x48/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81188c0f>] vfs_unlink+0x9f/0xe0
May 13 16:27:10 machine1 kernel: [<ffffffff8118795a>] ? lookup_hash+0x3a/0x50
May 13 16:27:10 machine1 kernel: [<ffffffff8118b143>] do_unlinkat+0x183/0x1c0
May 13 16:27:10 machine1 kernel: [<ffffffff810d69e2>] ?
audit_syscall_entry+0x272/0x2a0
May 13 16:27:10 machine1 kernel: [<ffffffff8100c5b5>] ?
math_state_restore+0x45/0x60
May 13 16:27:10 machine1 kernel: [<ffffffff8118b196>] sys_unlink+0x16/0x20
May 13 16:27:10 machine1 kernel: [<ffffffff8100b0f2>]
system_call_fastpath+0x16/0x1b

--
Subranshu

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Error during bulk removal of files
  2013-05-13 16:15     ` Subranshu Patel
@ 2013-05-13 17:55       ` Mark Tinguely
  2013-05-14  0:36       ` Dave Chinner
  1 sibling, 0 replies; 6+ messages in thread
From: Mark Tinguely @ 2013-05-13 17:55 UTC (permalink / raw)
  To: Subranshu Patel; +Cc: xfs

On 05/13/13 11:15, Subranshu Patel wrote:
>> What 2.6.32 kernel? Output of uname -a?
>
>   uname -a
> Linux machine1 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>
>   # cat /etc/issue
> Red Hat Enterprise Linux Server release 6.3 (Santiago)
>
> I am using mdtest program to delete 7 million files (of 1KB size).
> Total 128 concurrent mdtest process are there. Each process removes
> approx 54000 files ( 7 million / 128).
> mdtest runs for a while, after that, it gets blocked and does not do
> anything. However the machine is still responsive.
>

Is the delay log enabled? If so have you tried the command with it disabled?

Are you willing to take a crash of the hang? Or want to look at the 
entries on the AIL?

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Error during bulk removal of files
  2013-05-13 16:15     ` Subranshu Patel
  2013-05-13 17:55       ` Mark Tinguely
@ 2013-05-14  0:36       ` Dave Chinner
  1 sibling, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2013-05-14  0:36 UTC (permalink / raw)
  To: Subranshu Patel; +Cc: Mark Tinguely, xfs

On Mon, May 13, 2013 at 09:45:30PM +0530, Subranshu Patel wrote:
> > What 2.6.32 kernel? Output of uname -a?
> 
>  uname -a
> Linux machine1 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT
> 2012 x86_64 x86_64 x86_64 GNU/Linux
> 
>  # cat /etc/issue
> Red Hat Enterprise Linux Server release 6.3 (Santiago)

Please report this to your Redhat support representative. Indeed, I
suspect that running the latest update kernel will fix your problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-05-14  0:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-10 16:17 Error during bulk removal of files Subranshu Patel
2013-05-10 16:40 ` Mark Tinguely
2013-05-11  1:15   ` Dave Chinner
2013-05-13 16:15     ` Subranshu Patel
2013-05-13 17:55       ` Mark Tinguely
2013-05-14  0:36       ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox