* Error during bulk removal of files
@ 2013-05-10 16:17 Subranshu Patel
2013-05-10 16:40 ` Mark Tinguely
0 siblings, 1 reply; 6+ messages in thread
From: Subranshu Patel @ 2013-05-10 16:17 UTC (permalink / raw)
To: xfs
I am using mdtest benchmarking tool to remove 7 million files of 1KB.
After some time, the mdtest process gets blocked.
Dmesg shows the following
--------------------->8------------
May 10 18:49:00 machine1 kernel: INFO: task xfssyncd/dm-3:8156 blocked
for more than 120 seconds.
May 10 18:49:00 machine1 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 10 18:49:00 machine1 kernel: xfssyncd/dm-3 D 0000000000000002 0
8156 2 0x00000080
May 10 18:49:00 machine1 kernel: ffff88185932bc70 0000000000000046
ffff88185e58a400 ffff8800282566e8
May 10 18:49:00 machine1 kernel: 00000000000176f6 ffff88185fb44080
ffff880c6100aaa0 ffffffff8160b400
May 10 18:49:00 machine1 kernel: ffff88185fb44638 ffff88185932bfd8
000000000000fb88 ffff88185fb44638
May 10 18:49:00 machine1 kernel: Call Trace:
May 10 18:49:00 machine1 kernel: [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffff81060250>] ?
default_wake_function+0x0/0x20
May 10 18:49:00 machine1 kernel: [<ffffffffa056e08b>]
xlog_grant_log_space+0x3ab/0x520 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffffa0582d1a>] ?
kmem_zone_zalloc+0x3a/0x50 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffff8127f3ac>] ? random32+0x1c/0x20
May 10 18:49:00 machine1 kernel: [<ffffffffa057d201>] ?
xfs_trans_ail_push+0x21/0x80 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffffa056e2e6>]
xfs_log_reserve+0xe6/0x140 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffffa057bad0>]
xfs_trans_reserve+0xa0/0x210 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffffa055f0a3>]
xfs_fs_log_dummy+0x43/0x90 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffffa056c9c4>] ?
xfs_log_need_covered+0x94/0xd0 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffffa0591831>]
xfs_sync_worker+0x81/0x90 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffffa059171e>] xfssyncd+0x17e/0x210 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffffa05915a0>] ? xfssyncd+0x0/0x210 [xfs]
May 10 18:49:00 machine1 kernel: [<ffffffff81091d66>] kthread+0x96/0xa0
May 10 18:49:00 machine1 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
May 10 18:49:00 machine1 kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0
May 10 18:49:00 machine1 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
--------------------->8------------
The XFS filesystem size is 14TB (LVM), kernel version 2.6.32
Machine has a memory of 96GB
Please let me know if this is a known issue and have been already
fixed in higher kernel version.
--
Subranshu
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Error during bulk removal of files
2013-05-10 16:17 Error during bulk removal of files Subranshu Patel
@ 2013-05-10 16:40 ` Mark Tinguely
2013-05-11 1:15 ` Dave Chinner
0 siblings, 1 reply; 6+ messages in thread
From: Mark Tinguely @ 2013-05-10 16:40 UTC (permalink / raw)
To: Subranshu Patel; +Cc: xfs
On 05/10/13 11:17, Subranshu Patel wrote:
<reformated> Call Trace:
[<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
[<ffffffff81060250>] ? default_wake_function+0x0/0x20
[<ffffffffa056e08b>] xlog_grant_log_space+0x3ab/0x520 [xfs]
[<ffffffffa0582d1a>] ? kmem_zone_zalloc+0x3a/0x50 [xfs]
[<ffffffff8127f3ac>] ? random32+0x1c/0x20
[<ffffffffa057d201>] ? xfs_trans_ail_push+0x21/0x80 [xfs]
[<ffffffffa056e2e6>] xfs_log_reserve+0xe6/0x140 [xfs]
[<ffffffffa057bad0>] xfs_trans_reserve+0xa0/0x210 [xfs]
[<ffffffffa055f0a3>] xfs_fs_log_dummy+0x43/0x90 [xfs]
[<ffffffffa056c9c4>] ? xfs_log_need_covered+0x94/0xd0 [xfs]
[<ffffffffa0591831>] xfs_sync_worker+0x81/0x90 [xfs]
[<ffffffffa059171e>] xfssyncd+0x17e/0x210 [xfs]
[<ffffffffa05915a0>] ? xfssyncd+0x0/0x210 [xfs]
[<ffffffff81091d66>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff81091cd0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
Ran out of log grant space trying to write out the dummy record in the
sync worker.
Something is not getting moved off the AIL. This could be a symptom; for
example a lock not being released could lead to depleted log space. What
else is happening on the filesystem?
>
> --------------------->8------------
>
> The XFS filesystem size is 14TB (LVM), kernel version 2.6.32
>
> Machine has a memory of 96GB
>
> Please let me know if this is a known issue and have been already
> fixed in higher kernel version.
>
> --
> Subranshu
--Mark.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Error during bulk removal of files
2013-05-10 16:40 ` Mark Tinguely
@ 2013-05-11 1:15 ` Dave Chinner
2013-05-13 16:15 ` Subranshu Patel
0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2013-05-11 1:15 UTC (permalink / raw)
To: Mark Tinguely; +Cc: Subranshu Patel, xfs
On Fri, May 10, 2013 at 11:40:16AM -0500, Mark Tinguely wrote:
> On 05/10/13 11:17, Subranshu Patel wrote:
>
> <reformated> Call Trace:
>
> [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
> [<ffffffff81060250>] ? default_wake_function+0x0/0x20
> [<ffffffffa056e08b>] xlog_grant_log_space+0x3ab/0x520 [xfs]
> [<ffffffffa0582d1a>] ? kmem_zone_zalloc+0x3a/0x50 [xfs]
> [<ffffffff8127f3ac>] ? random32+0x1c/0x20
> [<ffffffffa057d201>] ? xfs_trans_ail_push+0x21/0x80 [xfs]
> [<ffffffffa056e2e6>] xfs_log_reserve+0xe6/0x140 [xfs]
> [<ffffffffa057bad0>] xfs_trans_reserve+0xa0/0x210 [xfs]
> [<ffffffffa055f0a3>] xfs_fs_log_dummy+0x43/0x90 [xfs]
> [<ffffffffa056c9c4>] ? xfs_log_need_covered+0x94/0xd0 [xfs]
> [<ffffffffa0591831>] xfs_sync_worker+0x81/0x90 [xfs]
> [<ffffffffa059171e>] xfssyncd+0x17e/0x210 [xfs]
> [<ffffffffa05915a0>] ? xfssyncd+0x0/0x210 [xfs]
> [<ffffffff81091d66>] kthread+0x96/0xa0
> [<ffffffff8100c14a>] child_rip+0xa/0x20
> [<ffffffff81091cd0>] ? kthread+0x0/0xa0
> [<ffffffff8100c140>] ? child_rip+0x0/0x20
>
> Ran out of log grant space trying to write out the dummy record in
> the sync worker.
>
> Something is not getting moved off the AIL. This could be a symptom;
> for example a lock not being released could lead to depleted log
> space. What else is happening on the filesystem?
>
> >
> >--------------------->8------------
> >
> >The XFS filesystem size is 14TB (LVM), kernel version 2.6.32
What 2.6.32 kernel? Output of uname -a?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Error during bulk removal of files
2013-05-11 1:15 ` Dave Chinner
@ 2013-05-13 16:15 ` Subranshu Patel
2013-05-13 17:55 ` Mark Tinguely
2013-05-14 0:36 ` Dave Chinner
0 siblings, 2 replies; 6+ messages in thread
From: Subranshu Patel @ 2013-05-13 16:15 UTC (permalink / raw)
To: Dave Chinner; +Cc: Mark Tinguely, xfs
> What 2.6.32 kernel? Output of uname -a?
uname -a
Linux machine1 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT
2012 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/issue
Red Hat Enterprise Linux Server release 6.3 (Santiago)
I am using mdtest program to delete 7 million files (of 1KB size).
Total 128 concurrent mdtest process are there. Each process removes
approx 54000 files ( 7 million / 128).
mdtest runs for a while, after that, it gets blocked and does not do
anything. However the machine is still responsive.
dmesg:
May 13 16:27:10 machine1 kernel: INFO: task xfssyncd/dm-3:6838 blocked
for more than 120 seconds.
May 13 16:27:10 machine1 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 16:27:10 machine1 kernel: xfssyncd/dm-3 D 0000000000000010
0 6838 2 0x00000080
May 13 16:27:10 machine1 kernel: ffff88185d571c70 0000000000000046
ffff88002830fec0 0000000000016680
May 13 16:27:10 machine1 kernel: 0000000000016680 ffff88185975e040
ffff880c61170ae0 ffffffff8160b400
May 13 16:27:10 machine1 kernel: ffff88185975e5f8 ffff88185d571fd8
000000000000fb88 ffff88185975e5f8
May 13 16:27:10 machine1 kernel: Call Trace:
May 13 16:27:10 machine1 kernel: [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81060250>] ?
default_wake_function+0x0/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa056e08b>]
xlog_grant_log_space+0x3ab/0x520 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa0582d1a>] ?
kmem_zone_zalloc+0x3a/0x50 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff8127f3ac>] ? random32+0x1c/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa057d201>] ?
xfs_trans_ail_push+0x21/0x80 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa056e2e6>]
xfs_log_reserve+0xe6/0x140 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa057bad0>]
xfs_trans_reserve+0xa0/0x210 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa055f0a3>]
xfs_fs_log_dummy+0x43/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa056c9c4>] ?
xfs_log_need_covered+0x94/0xd0 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa0591831>]
xfs_sync_worker+0x81/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa059171e>] xfssyncd+0x17e/0x210 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa05915a0>] ? xfssyncd+0x0/0x210 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81091d66>] kthread+0x96/0xa0
May 13 16:27:10 machine1 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
May 13 16:27:10 machine1 kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0
May 13 16:27:10 machine1 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
May 13 16:27:10 machine1 kernel: INFO: task mdtest:7397 blocked for
more than 120 seconds.
May 13 16:27:10 machine1 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 16:27:10 machine1 kernel: mdtest D 0000000000000019
0 7397 7395 0x00000080
May 13 16:27:10 machine1 kernel: ffff880c3ce13bf8 0000000000000082
0000000000000000 ffffffff81060262
May 13 16:27:10 machine1 kernel: ffff880c3ce13ba8 ffffffff8104e309
ffff880c3ce13ba8 0000000300000000
May 13 16:27:10 machine1 kernel: ffff880c5d027af8 ffff880c3ce13fd8
000000000000fb88 ffff880c5d027af8
May 13 16:27:10 machine1 kernel: Call Trace:
May 13 16:27:10 machine1 kernel: [<ffffffff81060262>] ?
default_wake_function+0x12/0x20
May 13 16:27:10 machine1 kernel: [<ffffffff8104e309>] ?
__wake_up_common+0x59/0x90
May 13 16:27:10 machine1 kernel: [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81060250>] ?
default_wake_function+0x0/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa056e08b>]
xlog_grant_log_space+0x3ab/0x520 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa0582d1a>] ?
kmem_zone_zalloc+0x3a/0x50 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff8127f3ac>] ? random32+0x1c/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa057d201>] ?
xfs_trans_ail_push+0x21/0x80 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa056e2e6>]
xfs_log_reserve+0xe6/0x140 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa057bad0>]
xfs_trans_reserve+0xa0/0x210 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa058a720>] ?
xfs_tosspages+0x20/0x30 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa05810b4>]
xfs_inactive+0x2e4/0x460 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff814fed1e>] ? mutex_lock+0x1e/0x50
May 13 16:27:10 machine1 kernel: [<ffffffffa058e790>]
xfs_fs_clear_inode+0xa0/0xd0 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81195adc>] clear_inode+0xac/0x140
May 13 16:27:10 machine1 kernel: [<ffffffff81196296>]
generic_delete_inode+0x196/0x1d0
May 13 16:27:10 machine1 kernel: [<ffffffff81196335>]
generic_drop_inode+0x65/0x80
May 13 16:27:10 machine1 kernel: [<ffffffff81195182>] iput+0x62/0x70
May 13 16:27:10 machine1 kernel: [<ffffffff8118b0d2>] do_unlinkat+0x112/0x1c0
May 13 16:27:10 machine1 kernel: [<ffffffff810d69e2>] ?
audit_syscall_entry+0x272/0x2a0
May 13 16:27:10 machine1 kernel: [<ffffffff8100c5b5>] ?
math_state_restore+0x45/0x60
May 13 16:27:10 machine1 kernel: [<ffffffff8118b196>] sys_unlink+0x16/0x20
May 13 16:27:10 machine1 kernel: [<ffffffff8100b0f2>]
system_call_fastpath+0x16/0x1b
May 13 16:27:10 machine1 kernel: INFO: task mdtest:7398 blocked for
more than 120 seconds.
May 13 16:27:10 machine1 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 16:27:10 machine1 kernel: mdtest D 0000000000000008
0 7398 7395 0x00000080
May 13 16:27:10 machine1 kernel: ffff880b79bcbbf8 0000000000000082
ffff880b79bcbb88 ffffffffa0587633
May 13 16:27:10 machine1 kernel: 0000000000000000 0000000000014001
0000000000000010 ffff880b79bcbc90
May 13 16:27:10 machine1 kernel: ffff880c5e6fbaf8 ffff880b79bcbfd8
000000000000fb88 ffff880c5e6fbaf8
May 13 16:27:10 machine1 kernel: Call Trace:
May 13 16:27:10 machine1 kernel: [<ffffffffa0587633>] ?
xfs_buf_get+0x103/0x1a0 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa056c492>] xlog_wait+0x72/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81060250>] ?
default_wake_function+0x0/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa056e08b>]
xlog_grant_log_space+0x3ab/0x520 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa0582d1a>] ?
kmem_zone_zalloc+0x3a/0x50 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff8127f3ac>] ? random32+0x1c/0x20
May 13 16:27:10 machine1 kernel: [<ffffffffa057d201>] ?
xfs_trans_ail_push+0x21/0x80 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa056e2e6>]
xfs_log_reserve+0xe6/0x140 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa057bad0>]
xfs_trans_reserve+0xa0/0x210 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa057bdbf>] ?
xfs_trans_alloc+0x9f/0xb0 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffffa057fb93>]
xfs_remove+0x113/0x3a0 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81186fd3>] ?
generic_permission+0x23/0xb0
May 13 16:27:10 machine1 kernel: [<ffffffffa058d6c8>]
xfs_vn_unlink+0x48/0x90 [xfs]
May 13 16:27:10 machine1 kernel: [<ffffffff81188c0f>] vfs_unlink+0x9f/0xe0
May 13 16:27:10 machine1 kernel: [<ffffffff8118795a>] ? lookup_hash+0x3a/0x50
May 13 16:27:10 machine1 kernel: [<ffffffff8118b143>] do_unlinkat+0x183/0x1c0
May 13 16:27:10 machine1 kernel: [<ffffffff810d69e2>] ?
audit_syscall_entry+0x272/0x2a0
May 13 16:27:10 machine1 kernel: [<ffffffff8100c5b5>] ?
math_state_restore+0x45/0x60
May 13 16:27:10 machine1 kernel: [<ffffffff8118b196>] sys_unlink+0x16/0x20
May 13 16:27:10 machine1 kernel: [<ffffffff8100b0f2>]
system_call_fastpath+0x16/0x1b
--
Subranshu
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Error during bulk removal of files
2013-05-13 16:15 ` Subranshu Patel
@ 2013-05-13 17:55 ` Mark Tinguely
2013-05-14 0:36 ` Dave Chinner
1 sibling, 0 replies; 6+ messages in thread
From: Mark Tinguely @ 2013-05-13 17:55 UTC (permalink / raw)
To: Subranshu Patel; +Cc: xfs
On 05/13/13 11:15, Subranshu Patel wrote:
>> What 2.6.32 kernel? Output of uname -a?
>
> uname -a
> Linux machine1 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> # cat /etc/issue
> Red Hat Enterprise Linux Server release 6.3 (Santiago)
>
> I am using mdtest program to delete 7 million files (of 1KB size).
> Total 128 concurrent mdtest process are there. Each process removes
> approx 54000 files ( 7 million / 128).
> mdtest runs for a while, after that, it gets blocked and does not do
> anything. However the machine is still responsive.
>
Is the delay log enabled? If so have you tried the command with it disabled?
Are you willing to take a crash of the hang? Or want to look at the
entries on the AIL?
--Mark.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Error during bulk removal of files
2013-05-13 16:15 ` Subranshu Patel
2013-05-13 17:55 ` Mark Tinguely
@ 2013-05-14 0:36 ` Dave Chinner
1 sibling, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2013-05-14 0:36 UTC (permalink / raw)
To: Subranshu Patel; +Cc: Mark Tinguely, xfs
On Mon, May 13, 2013 at 09:45:30PM +0530, Subranshu Patel wrote:
> > What 2.6.32 kernel? Output of uname -a?
>
> uname -a
> Linux machine1 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> # cat /etc/issue
> Red Hat Enterprise Linux Server release 6.3 (Santiago)
Please report this to your Redhat support representative. Indeed, I
suspect that running the latest update kernel will fix your problem.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-05-14 0:36 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-10 16:17 Error during bulk removal of files Subranshu Patel
2013-05-10 16:40 ` Mark Tinguely
2013-05-11 1:15 ` Dave Chinner
2013-05-13 16:15 ` Subranshu Patel
2013-05-13 17:55 ` Mark Tinguely
2013-05-14 0:36 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox