* Xfs delaylog hanged up
@ 2010-11-22 19:27 Spelic
2010-11-22 23:29 ` Dave Chinner
2010-11-24 22:52 ` Spelic
0 siblings, 2 replies; 21+ messages in thread
From: Spelic @ 2010-11-22 19:27 UTC (permalink / raw)
To: xfs
Hi there
EXCELLENT speed improvement with delaylog! My congratulations.
Now there's a problem: it hung on me (not sure it was because of
delaylog, but anyway...)
I am on 2.6.37rc2
16 disk MD raid-5
mkfs line
mkfs.xfs -l size=128m -d agcount=16 <device>
(then lvm2, all aligned to stripes)
mount line:
/dev/mapper/perftestvg-xfslv /perftest/xfs xfs
rw,relatime,attr2,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,noquota
0 0
deadline scheduler, nr_requests 1024
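(In case anyone wants to reproduce the tuning: the scheduler and queue
depth can be set per member disk through sysfs. This is only a sketch;
the sd* glob is a placeholder, substitute the actual array members.)

```shell
# Sketch: apply the deadline scheduler and a 1024-entry request queue to
# each member disk via sysfs. The /sys/block/sd* glob is an assumption.
for q in /sys/block/sd*/queue; do
    if [ -w "$q/scheduler" ]; then echo deadline > "$q/scheduler"; fi
    if [ -w "$q/nr_requests" ]; then echo 1024 > "$q/nr_requests"; fi
done
```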
I was doing a parallel-kernel-unpack benchmark:
/perftest/xfs# mkdir dir{1,2,3,4,5,6,7,8,9,10}
/perftest/xfs# for i in {1..10} ; do time tar -jxf
linux-2.6.37-rc2.tar.bz2 -C dir$i & done ; echo waiting now ; time wait;
echo syncing now ; time sync
[1] 3400
[2] 3401
[3] 3402
[4] 3404
[5] 3407
[6] 3409
[7] 3410
[8] 3413
[9] 3415
[10] 3417
waiting now
[Unfortunately I hit Ctrl-C here by mistake after a while, and that
might have unwedged things]
[These are the final times for the tars: around 6.5 to 7 minutes each,
while it usually takes about 1min20sec]
real 6m34.883s
user 0m19.770s
sys 0m5.610s
real 6m45.220s
user 0m20.600s
sys 0m5.750s
real 6m46.638s
user 0m20.920s
sys 0m5.800s
real 6m46.731s
user 0m20.640s
sys 0m5.840s
real 6m46.743s
user 0m20.860s
sys 0m5.950s
real 6m46.857s
user 0m20.330s
sys 0m6.000s
real 6m47.885s
user 0m21.000s
sys 0m6.010s
real 6m48.718s
user 0m20.780s
sys 0m5.700s
real 6m51.915s
user 0m19.530s
sys 0m4.680s
real 7m0.249s
user 0m20.330s
sys 0m6.290s
Iostat showed no I/O activity at all.
The computer stayed responsive otherwise (the OS is on other filesystems).
Look at the stack traces from the 5 minutes they were hung:
# ps aux | grep tar
root 875 0.0 0.0 12688 556 ? S 16:47 0:00
upstart-udev-bridge --daemon
root 3403 1.9 0.0 18224 1168 pts/1 D 18:58 0:04
tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir1
root 3405 1.8 0.0 18224 1168 pts/1 D 18:58 0:04
tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir2
root 3406 0.9 0.0 18224 1164 pts/1 D 18:58 0:02
tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir3
root 3408 1.7 0.0 18224 1164 pts/1 D 18:58 0:04
tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir4
root 3411 1.8 0.0 18224 1164 pts/1 D 18:58 0:04
tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir5
root 3412 1.6 0.0 18224 1164 pts/1 D 18:58 0:04
tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir6
root 3414 1.7 0.0 18224 1164 pts/1 D 18:58 0:04
tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir7
root 3416 1.8 0.0 18224 1164 pts/1 D 18:58 0:04
tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir8
root 3418 1.8 0.0 18224 1168 pts/1 D 18:58 0:04
tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir9
root 3419 1.7 0.0 18224 1164 pts/1 D 18:58 0:04
tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir10
# cat /proc/3403/stack
[<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
[<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
[<ffffffff8114de68>] notify_change+0x168/0x2d0
[<ffffffff811346e7>] sys_fchmodat+0xd7/0x110
[<ffffffff81134733>] sys_chmod+0x13/0x20
[<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3405/stack
[<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
[<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
[<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
[<ffffffff81137405>] fput+0xd5/0x280
[<ffffffff81133bf8>] filp_close+0x58/0x90
[<ffffffff81133ce9>] sys_close+0xb9/0x110
[<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3406/stack
[<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[<ffffffffa02c05ed>] xfs_iomap_write_allocate+0x21d/0x360 [xfs]
[<ffffffffa02c09bf>] xfs_iomap+0x28f/0x340 [xfs]
[<ffffffffa02da287>] xfs_map_blocks+0x37/0x40 [xfs]
[<ffffffffa02dbae0>] xfs_vm_writepage+0x3f0/0x5d0 [xfs]
[<ffffffff810f00d2>] __writepage+0x12/0x40
[<ffffffff810f1198>] write_cache_pages+0x1f8/0x490
[<ffffffff810f144f>] generic_writepages+0x1f/0x30
[<ffffffffa02da8a8>] xfs_vm_writepages+0x58/0x70 [xfs]
[<ffffffff810f147c>] do_writepages+0x1c/0x40
[<ffffffff81158ddd>] writeback_single_inode+0x9d/0x260
[<ffffffff811591e4>] writeback_sb_inodes+0xd4/0x170
[<ffffffff81159324>] writeback_inodes_wb+0xa4/0x170
[<ffffffff810f0a60>] balance_dirty_pages_ratelimited_nr+0x390/0x4a0
[<ffffffff810e71b3>] generic_file_buffered_write+0x1b3/0x250
[<ffffffffa02e09b8>] xfs_file_aio_write+0x7b8/0xa60 [xfs]
[<ffffffff81135582>] do_sync_write+0xd2/0x110
[<ffffffff81135878>] vfs_write+0xc8/0x190
[<ffffffff811361ac>] sys_write+0x4c/0x80
[<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3408/stack
[<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[<ffffffffa02d666b>] xfs_create+0x17b/0x650 [xfs]
[<ffffffffa02e3868>] xfs_vn_mknod+0xa8/0x1c0 [xfs]
[<ffffffffa02e399b>] xfs_vn_create+0xb/0x10 [xfs]
[<ffffffff81140f97>] vfs_create+0xa7/0xd0
[<ffffffff811420e8>] do_last+0x658/0x720
[<ffffffff8114414b>] do_filp_open+0x1fb/0x650
[<ffffffff81133da3>] do_sys_open+0x63/0x120
[<ffffffff81133e8b>] sys_open+0x1b/0x20
[<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3411/stack
[<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
[<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
[<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
[<ffffffff81137405>] fput+0xd5/0x280
[<ffffffff81133bf8>] filp_close+0x58/0x90
[<ffffffff81133ce9>] sys_close+0xb9/0x110
[<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3412/stack
[<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[<ffffffffa02d666b>] xfs_create+0x17b/0x650 [xfs]
[<ffffffffa02e3868>] xfs_vn_mknod+0xa8/0x1c0 [xfs]
[<ffffffffa02e399b>] xfs_vn_create+0xb/0x10 [xfs]
[<ffffffff81140f97>] vfs_create+0xa7/0xd0
[<ffffffff811420e8>] do_last+0x658/0x720
[<ffffffff8114414b>] do_filp_open+0x1fb/0x650
[<ffffffff81133da3>] do_sys_open+0x63/0x120
[<ffffffff81133e8b>] sys_open+0x1b/0x20
[<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3414/stack
[<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
[<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
[<ffffffff8114de68>] notify_change+0x168/0x2d0
[<ffffffff8115dc57>] utimes_common+0xd7/0x1b0
[<ffffffff8115ddc0>] do_utimes+0x90/0xf0
[<ffffffff8115dece>] sys_utimensat+0x2e/0x80
[<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3416/stack
[<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
[<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
[<ffffffff8114de68>] notify_change+0x168/0x2d0
[<ffffffff811346e7>] sys_fchmodat+0xd7/0x110
[<ffffffff81134733>] sys_chmod+0x13/0x20
[<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3418/stack
[<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
[<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
[<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
[<ffffffff81137405>] fput+0xd5/0x280
[<ffffffff81133bf8>] filp_close+0x58/0x90
[<ffffffff81133ce9>] sys_close+0xb9/0x110
[<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3419/stack
[<ffffffffa02c6195>] xlog_grant_log_space+0x2b5/0x5e0 [xfs]
[<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
[<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
[<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
[<ffffffff81137405>] fput+0xd5/0x280
[<ffffffff81133bf8>] filp_close+0x58/0x90
[<ffffffff81133ce9>] sys_close+0xb9/0x110
[<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
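(Aside: the dumps above were taken one PID at a time. A loop like the
following collects the same information for every task stuck in D state
in one pass; it is a sketch that assumes the kernel exposes
/proc/<pid>/stack, i.e. CONFIG_STACKTRACE is enabled.)

```shell
# Print the kernel stack of every task currently in uninterruptible (D) sleep.
for p in /proc/[0-9]*; do
    # The state field follows the last ")" in /proc/<pid>/stat; parsing past
    # the ")" greedily is safe even if the command name contains spaces.
    state=$(sed 's/^.*) \(.\).*/\1/' "$p/stat" 2>/dev/null)
    if [ "$state" = "D" ]; then
        printf '=== %s %s ===\n' "${p#/proc/}" "$(cat "$p/comm" 2>/dev/null)"
        cat "$p/stack" 2>/dev/null
    fi
done
```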
dmesg:
[ 8042.770130] xfsaild/dm-6 D 00000001000b92b8 0 2979 2
0x00000000
[ 8042.770135] ffff88082c11ba90 0000000000000046 ffff88083140de08
0000000000000000
[ 8042.770140] 00000000000138c0 ffff88083dfa5a80 ffff88083dfa5e18
ffff88082c11bfd8
[ 8042.770144] ffff88083dfa5e20 00000000000138c0 ffff88082c11a010
00000000000138c0
[ 8042.770148] Call Trace:
[ 8042.770187] [<ffffffffa02c4a8f>]
xlog_state_get_iclog_space+0xdf/0x2c0 [xfs]
[ 8042.770205] [<ffffffffa02c422a>] ? xlog_sync+0x1fa/0x3d0 [xfs]
[ 8042.770213] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.770231] [<ffffffffa02c5708>] xlog_write+0x198/0x6c0 [xfs]
[ 8042.770249] [<ffffffffa02c6fd2>] xlog_cil_push+0x252/0x3b0 [xfs]
[ 8042.770267] [<ffffffffa02c7636>] xlog_cil_force_lsn+0x116/0x120 [xfs]
[ 8042.770286] [<ffffffffa02dd3dc>] ? xfs_buf_delwri_queue+0xcc/0x150 [xfs]
[ 8042.770291] [<ffffffff81033209>] ? default_spin_lock_flags+0x9/0x10
[ 8042.770296] [<ffffffff810808d2>] ? down_trylock+0x32/0x50
[ 8042.770314] [<ffffffffa02c4543>] _xfs_log_force+0x63/0x230 [xfs]
[ 8042.770332] [<ffffffffa02c4933>] xfs_log_force+0x13/0x40 [xfs]
[ 8042.770351] [<ffffffffa02d38e4>] xfsaild_push+0x3b4/0x400 [xfs]
[ 8042.770357] [<ffffffff81571c26>] ? schedule_timeout+0x196/0x2f0
[ 8042.770377] [<ffffffffa02e67a4>] xfsaild+0x74/0xb0 [xfs]
[ 8042.770395] [<ffffffffa02e6730>] ? xfsaild+0x0/0xb0 [xfs]
[ 8042.770400] [<ffffffff8107b076>] kthread+0x96/0xa0
[ 8042.770404] [<ffffffff8100bd64>] kernel_thread_helper+0x4/0x10
[ 8042.770408] [<ffffffff8107afe0>] ? kthread+0x0/0xa0
[ 8042.770411] [<ffffffff8100bd60>] ? kernel_thread_helper+0x0/0x10
[ 8042.770414] INFO: task xfssyncd/dm-6:2980 blocked for more than 120
seconds.
[ 8042.770449] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 8042.770501] xfssyncd/dm-6 D ffff88083dfa0000 0 2980 2
0x00000000
[ 8042.770505] ffff88015d98fcf0 0000000000000046 ffff88015d98fc80
ffffffffa02cdb39
[ 8042.770510] 00000000000138c0 ffff88083dfa0000 ffff88083dfa0398
ffff88015d98ffd8
[ 8042.770514] ffff88083dfa03a0 00000000000138c0 ffff88015d98e010
00000000000138c0
[ 8042.770518] Call Trace:
[ 8042.770537] [<ffffffffa02cdb39>] ? xfs_perag_get_tag+0x39/0xc0 [xfs]
[ 8042.770555] [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.770574] [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.770578] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.770595] [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.770614] [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.770631] [<ffffffffa02b5ac9>] xfs_fs_log_dummy+0x49/0xb0 [xfs]
[ 8042.770649] [<ffffffffa02c3c46>] ? xfs_log_need_covered+0x66/0xd0 [xfs]
[ 8042.770668] [<ffffffffa02e82e3>] xfs_sync_worker+0x83/0x90 [xfs]
[ 8042.770686] [<ffffffffa02e70b3>] xfssyncd+0x183/0x230 [xfs]
[ 8042.770705] [<ffffffffa02e6f30>] ? xfssyncd+0x0/0x230 [xfs]
[ 8042.770709] [<ffffffff8107b076>] kthread+0x96/0xa0
[ 8042.770712] [<ffffffff8100bd64>] kernel_thread_helper+0x4/0x10
[ 8042.770715] [<ffffffff8107afe0>] ? kthread+0x0/0xa0
[ 8042.770718] [<ffffffff8100bd60>] ? kernel_thread_helper+0x0/0x10
[ 8042.770721] INFO: task flush-252:6:3320 blocked for more than 120
seconds.
[ 8042.770756] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 8042.770808] flush-252:6 D 00000001000b92b0 0 3320 2
0x00000000
[ 8042.770812] ffff8808329776b0 0000000000000046 0000000000000286
ffff880800000000
[ 8042.770817] 00000000000138c0 ffff88083dbe96a0 ffff88083dbe9a38
ffff880832977fd8
[ 8042.770821] ffff88083dbe9a40 00000000000138c0 ffff880832976010
00000000000138c0
[ 8042.770825] Call Trace:
[ 8042.770843] [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.770861] [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.770865] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.770883] [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.770901] [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.770920] [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.770937] [<ffffffffa02c05ed>]
xfs_iomap_write_allocate+0x21d/0x360 [xfs]
[ 8042.770957] [<ffffffffa02d31f1>] ? xfs_trans_unlocked_item+0x31/0x50
[xfs]
[ 8042.770974] [<ffffffffa02c09bf>] xfs_iomap+0x28f/0x340 [xfs]
[ 8042.770992] [<ffffffffa02da287>] xfs_map_blocks+0x37/0x40 [xfs]
[ 8042.771011] [<ffffffffa02dbae0>] xfs_vm_writepage+0x3f0/0x5d0 [xfs]
[ 8042.771017] [<ffffffff810f00d2>] __writepage+0x12/0x40
[ 8042.771020] [<ffffffff810f1198>] write_cache_pages+0x1f8/0x490
[ 8042.771024] [<ffffffff810f00c0>] ? __writepage+0x0/0x40
[ 8042.771027] [<ffffffff8157130c>] ? schedule+0x44c/0xa80
[ 8042.771045] [<ffffffffa02d31f1>] ? xfs_trans_unlocked_item+0x31/0x50
[xfs]
[ 8042.771050] [<ffffffff810f144f>] generic_writepages+0x1f/0x30
[ 8042.771068] [<ffffffffa02da8a8>] xfs_vm_writepages+0x58/0x70 [xfs]
[ 8042.771071] [<ffffffff810f147c>] do_writepages+0x1c/0x40
[ 8042.771077] [<ffffffff81158ddd>] writeback_single_inode+0x9d/0x260
[ 8042.771080] [<ffffffff811591e4>] writeback_sb_inodes+0xd4/0x170
[ 8042.771083] [<ffffffff81158258>] ? queue_io+0x98/0x1b0
[ 8042.771087] [<ffffffff81159324>] writeback_inodes_wb+0xa4/0x170
[ 8042.771090] [<ffffffff8115967b>] wb_writeback+0x28b/0x400
[ 8042.771094] [<ffffffff8105477b>] ? dequeue_task_fair+0x8b/0x90
[ 8042.771099] [<ffffffff810095fb>] ? __switch_to+0xcb/0x350
[ 8042.771103] [<ffffffff812a7ab9>] ? __percpu_counter_sum+0x69/0x80
[ 8042.771106] [<ffffffff8115998b>] wb_do_writeback+0x19b/0x1e0
[ 8042.771110] [<ffffffff81159a7a>] bdi_writeback_thread+0xaa/0x270
[ 8042.771113] [<ffffffff811599d0>] ? bdi_writeback_thread+0x0/0x270
[ 8042.771117] [<ffffffff8107b076>] kthread+0x96/0xa0
[ 8042.771120] [<ffffffff8100bd64>] kernel_thread_helper+0x4/0x10
[ 8042.771123] [<ffffffff8107afe0>] ? kthread+0x0/0xa0
[ 8042.771126] [<ffffffff8100bd60>] ? kernel_thread_helper+0x0/0x10
[ 8042.771130] INFO: task tar:3403 blocked for more than 120 seconds.
[ 8042.771163] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 8042.771216] tar D ffff88083d04ad40 0 3403 3400
0x00000000
[ 8042.771220] ffff88015e7a9c68 0000000000000082 0000000000000001
ffff880832815f00
[ 8042.771224] 00000000000138c0 ffff88083d04ad40 ffff88083d04b0d8
ffff88015e7a9fd8
[ 8042.771228] ffff88083d04b0e0 00000000000138c0 ffff88015e7a8010
00000000000138c0
[ 8042.771232] Call Trace:
[ 8042.771250] [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.771268] [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.771272] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.771290] [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.771308] [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.771326] [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.771345] [<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
[ 8042.771363] [<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
[ 8042.771368] [<ffffffff8114de68>] notify_change+0x168/0x2d0
[ 8042.771373] [<ffffffff811346e7>] sys_fchmodat+0xd7/0x110
[ 8042.771377] [<ffffffff815737e9>] ? _raw_spin_lock+0x9/0x10
[ 8042.771381] [<ffffffff811345e8>] ? sys_chown+0x68/0x90
[ 8042.771384] [<ffffffff81134733>] sys_chmod+0x13/0x20
[ 8042.771387] [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.771390] INFO: task tar:3405 blocked for more than 120 seconds.
[ 8042.771424] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 8042.771476] tar D 00000001000b92b6 0 3405 3401
0x00000000
[ 8042.771480] ffff88018e975cc8 0000000000000086 ffff88018e975cd8
0000000000000000
[ 8042.771484] 00000000000138c0 ffff880832b98000 ffff880832b98398
ffff88018e975fd8
[ 8042.771489] ffff880832b983a0 00000000000138c0 ffff88018e974010
00000000000138c0
[ 8042.771493] Call Trace:
[ 8042.771510] [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.771528] [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.771532] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.771550] [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.771568] [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.771587] [<ffffffffa02e0dab>] ? xfs_tosspages+0x1b/0x20 [xfs]
[ 8042.771605] [<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
[ 8042.771624] [<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
[ 8042.771642] [<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
[ 8042.771646] [<ffffffff81137405>] fput+0xd5/0x280
[ 8042.771650] [<ffffffff81133bf8>] filp_close+0x58/0x90
[ 8042.771653] [<ffffffff81133ce9>] sys_close+0xb9/0x110
[ 8042.771656] [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.771659] INFO: task tar:3406 blocked for more than 120 seconds.
[ 8042.771692] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 8042.771745] tar D 00000001000b92b0 0 3406 3402
0x00000008
[ 8042.771749] ffff8801202c94c8 0000000000000086 0000000000000282
ffff880800000001
[ 8042.771754] 00000000000138c0 ffff880832af2d40 ffff880832af30d8
ffff8801202c9fd8
[ 8042.771758] ffff880832af30e0 00000000000138c0 ffff8801202c8010
00000000000138c0
[ 8042.771762] Call Trace:
[ 8042.771779] [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.771798] [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.771801] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.771819] [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.771838] [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.771856] [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.771874] [<ffffffffa02c05ed>]
xfs_iomap_write_allocate+0x21d/0x360 [xfs]
[ 8042.771892] [<ffffffffa02d31f1>] ? xfs_trans_unlocked_item+0x31/0x50
[xfs]
[ 8042.771910] [<ffffffffa02c09bf>] xfs_iomap+0x28f/0x340 [xfs]
[ 8042.771928] [<ffffffffa02da287>] xfs_map_blocks+0x37/0x40 [xfs]
[ 8042.771946] [<ffffffffa02ddc18>] ? _xfs_buf_lookup_pages+0x288/0x360
[xfs]
[ 8042.771965] [<ffffffffa02dbae0>] xfs_vm_writepage+0x3f0/0x5d0 [xfs]
[ 8042.771969] [<ffffffff810f00d2>] __writepage+0x12/0x40
[ 8042.771972] [<ffffffff810f1198>] write_cache_pages+0x1f8/0x490
[ 8042.771975] [<ffffffff810f00c0>] ? __writepage+0x0/0x40
[ 8042.771993] [<ffffffffa02bea94>] ? xfs_iflush+0x214/0x220 [xfs]
[ 8042.772011] [<ffffffffa02dd86e>] ? xfs_bdwrite+0x3e/0x80 [xfs]
[ 8042.772030] [<ffffffffa02d31f1>] ? xfs_trans_unlocked_item+0x31/0x50
[xfs]
[ 8042.772034] [<ffffffff810f144f>] generic_writepages+0x1f/0x30
[ 8042.772052] [<ffffffffa02da8a8>] xfs_vm_writepages+0x58/0x70 [xfs]
[ 8042.772056] [<ffffffff810f147c>] do_writepages+0x1c/0x40
[ 8042.772059] [<ffffffff81158ddd>] writeback_single_inode+0x9d/0x260
[ 8042.772063] [<ffffffff811591e4>] writeback_sb_inodes+0xd4/0x170
[ 8042.772066] [<ffffffff81159324>] writeback_inodes_wb+0xa4/0x170
[ 8042.772070] [<ffffffff810f0a60>]
balance_dirty_pages_ratelimited_nr+0x390/0x4a0
[ 8042.772074] [<ffffffff810e6db1>] ?
iov_iter_copy_from_user_atomic+0x91/0x160
[ 8042.772077] [<ffffffff810e71b3>] generic_file_buffered_write+0x1b3/0x250
[ 8042.772097] [<ffffffffa02e09b8>] xfs_file_aio_write+0x7b8/0xa60 [xfs]
[ 8042.772101] [<ffffffff81135582>] do_sync_write+0xd2/0x110
[ 8042.772106] [<ffffffff81243a4e>] ? security_file_permission+0x1e/0x90
[ 8042.772109] [<ffffffff81135878>] vfs_write+0xc8/0x190
[ 8042.772112] [<ffffffff811361ac>] sys_write+0x4c/0x80
[ 8042.772116] [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.772119] INFO: task tar:3408 blocked for more than 120 seconds.
[ 8042.772152] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 8042.772205] tar D 00000001000b92b0 0 3408 3404
0x00000000
[ 8042.772209] ffff88082c331a98 0000000000000082 0000000000000000
0000000000000000
[ 8042.772213] 00000000000138c0 ffff88083cca43e0 ffff88083cca4778
ffff88082c331fd8
[ 8042.772217] ffff88083cca4780 00000000000138c0 ffff88082c330010
00000000000138c0
[ 8042.772221] Call Trace:
[ 8042.772239] [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.772257] [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.772261] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.772279] [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.772297] [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.772315] [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.772334] [<ffffffffa02d666b>] xfs_create+0x17b/0x650 [xfs]
[ 8042.772338] [<ffffffff815737e9>] ? _raw_spin_lock+0x9/0x10
[ 8042.772356] [<ffffffffa02e3868>] xfs_vn_mknod+0xa8/0x1c0 [xfs]
[ 8042.772375] [<ffffffffa02e399b>] xfs_vn_create+0xb/0x10 [xfs]
[ 8042.772379] [<ffffffff81140f97>] vfs_create+0xa7/0xd0
[ 8042.772382] [<ffffffff811420e8>] do_last+0x658/0x720
[ 8042.772386] [<ffffffff8114414b>] do_filp_open+0x1fb/0x650
[ 8042.772389] [<ffffffff81243286>] ? security_d_instantiate+0x16/0x30
[ 8042.772393] [<ffffffff8114a371>] ? d_instantiate+0x51/0x60
[ 8042.772397] [<ffffffff8114ef92>] ? alloc_fd+0x102/0x150
[ 8042.772400] [<ffffffff81133da3>] do_sys_open+0x63/0x120
[ 8042.772403] [<ffffffff8100c15d>] ? math_state_restore+0x3d/0x60
[ 8042.772407] [<ffffffff81133e8b>] sys_open+0x1b/0x20
[ 8042.772410] [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.772413] INFO: task tar:3411 blocked for more than 120 seconds.
[ 8042.772446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 8042.772499] tar D ffff88083d40ad40 0 3411 3407
0x00000000
[ 8042.772503] ffff880832545cc8 0000000000000086 ffff880832545cd8
0000000000000004
[ 8042.772507] 00000000000138c0 ffff88083d40ad40 ffff88083d40b0d8
ffff880832545fd8
[ 8042.772512] ffff88083d40b0e0 00000000000138c0 ffff880832544010
00000000000138c0
[ 8042.772516] Call Trace:
[ 8042.772533] [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.772551] [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.772555] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.772573] [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.772591] [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.772610] [<ffffffffa02e0dab>] ? xfs_tosspages+0x1b/0x20 [xfs]
[ 8042.772628] [<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
[ 8042.772646] [<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
[ 8042.772665] [<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
[ 8042.772669] [<ffffffff81137405>] fput+0xd5/0x280
[ 8042.772672] [<ffffffff81133bf8>] filp_close+0x58/0x90
[ 8042.772675] [<ffffffff81133ce9>] sys_close+0xb9/0x110
[ 8042.772678] [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.772681] INFO: task tar:3412 blocked for more than 120 seconds.
[ 8042.772714] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 8042.772767] tar D 00000001000b92ae 0 3412 3409
0x00000000
[ 8042.772771] ffff8802b0811a98 0000000000000082 0000000000000000
0000000000000000
[ 8042.772775] 00000000000138c0 ffff8808400aad40 ffff8808400ab0d8
ffff8802b0811fd8
[ 8042.772780] ffff8808400ab0e0 00000000000138c0 ffff8802b0810010
00000000000138c0
[ 8042.772784] Call Trace:
[ 8042.772802] [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.772820] [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.772824] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.772841] [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.772860] [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.772878] [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.772896] [<ffffffffa02d666b>] xfs_create+0x17b/0x650 [xfs]
[ 8042.772913] [<ffffffffa02b8ef6>] ? xfs_iunlock+0x96/0x130 [xfs]
[ 8042.772932] [<ffffffffa02d6bcd>] ? xfs_lookup+0x8d/0x100 [xfs]
[ 8042.772950] [<ffffffffa02e3868>] xfs_vn_mknod+0xa8/0x1c0 [xfs]
[ 8042.772969] [<ffffffffa02e399b>] xfs_vn_create+0xb/0x10 [xfs]
[ 8042.772972] [<ffffffff81140f97>] vfs_create+0xa7/0xd0
[ 8042.772976] [<ffffffff811420e8>] do_last+0x658/0x720
[ 8042.772979] [<ffffffff8114414b>] do_filp_open+0x1fb/0x650
[ 8042.772983] [<ffffffff815737e9>] ? _raw_spin_lock+0x9/0x10
[ 8042.772986] [<ffffffff815737e9>] ? _raw_spin_lock+0x9/0x10
[ 8042.772990] [<ffffffff8114ef92>] ? alloc_fd+0x102/0x150
[ 8042.772993] [<ffffffff81133da3>] do_sys_open+0x63/0x120
[ 8042.772996] [<ffffffff8100c15d>] ? math_state_restore+0x3d/0x60
[ 8042.773000] [<ffffffff81133e8b>] sys_open+0x1b/0x20
[ 8042.773003] [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.773006] INFO: task tar:3414 blocked for more than 120 seconds.
[ 8042.773039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 8042.773092] tar D 00000001000b92aa 0 3414 3410
0x00000000
[ 8042.773096] ffff8802b0a19c08 0000000000000086 ffff88015d090800
0000000500000000
[ 8042.773100] 00000000000138c0 ffff880832af0000 ffff880832af0398
ffff8802b0a19fd8
[ 8042.773104] ffff880832af03a0 00000000000138c0 ffff8802b0a18010
00000000000138c0
[ 8042.773108] Call Trace:
[ 8042.773126] [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.773144] [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.773148] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.773166] [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.773184] [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.773202] [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.773221] [<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
[ 8042.773239] [<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
[ 8042.773243] [<ffffffff8114de68>] notify_change+0x168/0x2d0
[ 8042.773247] [<ffffffff8115dc57>] utimes_common+0xd7/0x1b0
[ 8042.773252] [<ffffffff81292a3d>] ? _atomic_dec_and_lock+0x4d/0x80
[ 8042.773255] [<ffffffff8115ddc0>] do_utimes+0x90/0xf0
[ 8042.773259] [<ffffffff8115dece>] sys_utimensat+0x2e/0x80
[ 8042.773262] [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
What do you think?
Is it the delaylog? Is it not stable?
Thank you
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Xfs delaylog hanged up
2010-11-22 19:27 Xfs delaylog hanged up Spelic
@ 2010-11-22 23:29 ` Dave Chinner
2010-11-23 11:17 ` Spelic
2010-11-24 22:52 ` Spelic
1 sibling, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2010-11-22 23:29 UTC (permalink / raw)
To: Spelic; +Cc: xfs
On Mon, Nov 22, 2010 at 08:27:14PM +0100, Spelic wrote:
> Hi there
> EXCELLENT speed improvement with delaylog! My congratulations.
> Now there's a problem: it hung on me (not sure it was because of
> delaylog, but anyway...)
> I am on 2.6.37rc2
>
> 16 disk MD raid-5
What is the storage hardware and the MD raid5 configuration?
> mkfs line
> mkfs.xfs -l size=128m -d agcount=16 <device>
> (then lvm2, all aligned to stripes)
> mount line:
> /dev/mapper/perftestvg-xfslv /perftest/xfs xfs rw,relatime,attr2,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,noquota
> 0 0
FYI, relatime, attr2, logbufs=8 and noquota are default values that
you don't need to specify.
> deadline scheduler, nr_requests 1024
>
> I was doing a parallel-kernel-unpack benchmark:
>
> /perftest/xfs# mkdir dir{1,2,3,4,5,6,7,8,9,10}
> /perftest/xfs# for i in {1..10} ; do time tar -jxf
> linux-2.6.37-rc2.tar.bz2 -C dir$i & done ; echo waiting now ; time
> wait; echo syncing now ; time sync
Thanks for the test case - I'll try to reproduce this. One more key
piece of information I need: the output of mkfs.xfs so I can see your
filesystem geometry. If you have the filesystem mounted, then the
output of 'xfs_info <mntpt>' will give the same information.
> Iostat showed no I/O activity at all.
> The computer stayed responsive otherwise (the OS is on other filesystems).
> Look at the stack traces from the 5 minutes they were hung:
....
> # cat /proc/3403/stack
> [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
> [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
> [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
> [<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
> [<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
> [<ffffffff8114de68>] notify_change+0x168/0x2d0
> [<ffffffff811346e7>] sys_fchmodat+0xd7/0x110
> [<ffffffff81134733>] sys_chmod+0x13/0x20
> [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
OK, so it is stuck with a full log....
> dmesg:
> [ 8042.770130] xfsaild/dm-6 D 00000001000b92b8 0 2979 2
> 0x00000000
> [ 8042.770135] ffff88082c11ba90 0000000000000046 ffff88083140de08
> 0000000000000000
> [ 8042.770140] 00000000000138c0 ffff88083dfa5a80 ffff88083dfa5e18
> ffff88082c11bfd8
> [ 8042.770144] ffff88083dfa5e20 00000000000138c0 ffff88082c11a010
> 00000000000138c0
> [ 8042.770148] Call Trace:
> [ 8042.770187] [<ffffffffa02c4a8f>]
> xlog_state_get_iclog_space+0xdf/0x2c0 [xfs]
> [ 8042.770205] [<ffffffffa02c422a>] ? xlog_sync+0x1fa/0x3d0 [xfs]
> [ 8042.770213] [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
> [ 8042.770231] [<ffffffffa02c5708>] xlog_write+0x198/0x6c0 [xfs]
> [ 8042.770249] [<ffffffffa02c6fd2>] xlog_cil_push+0x252/0x3b0 [xfs]
> [ 8042.770267] [<ffffffffa02c7636>] xlog_cil_force_lsn+0x116/0x120 [xfs]
> [ 8042.770286] [<ffffffffa02dd3dc>] ? xfs_buf_delwri_queue+0xcc/0x150 [xfs]
> [ 8042.770291] [<ffffffff81033209>] ? default_spin_lock_flags+0x9/0x10
> [ 8042.770296] [<ffffffff810808d2>] ? down_trylock+0x32/0x50
> [ 8042.770314] [<ffffffffa02c4543>] _xfs_log_force+0x63/0x230 [xfs]
> [ 8042.770332] [<ffffffffa02c4933>] xfs_log_force+0x13/0x40 [xfs]
> [ 8042.770351] [<ffffffffa02d38e4>] xfsaild_push+0x3b4/0x400 [xfs]
> [ 8042.770357] [<ffffffff81571c26>] ? schedule_timeout+0x196/0x2f0
> [ 8042.770377] [<ffffffffa02e67a4>] xfsaild+0x74/0xb0 [xfs]
> [ 8042.770395] [<ffffffffa02e6730>] ? xfsaild+0x0/0xb0 [xfs]
> [ 8042.770400] [<ffffffff8107b076>] kthread+0x96/0xa0
> [ 8042.770404] [<ffffffff8100bd64>] kernel_thread_helper+0x4/0x10
> [ 8042.770408] [<ffffffff8107afe0>] ? kthread+0x0/0xa0
> [ 8042.770411] [<ffffffff8100bd60>] ? kernel_thread_helper+0x0/0x10
That's the key thread that frees up space in the log by writing out
metadata buffers. But it looks to be stuck itself, trying to force
the log out: it's waiting for IO on a log buffer to complete.
> What do you think?
I'll need to reproduce this to determine the cause.
> Is it the delaylog?
No idea yet.
> Is it not stable?
It is stable enough that all my local machines are running with
delaylog. And as a dev team, we have enough confidence in it to
remove the EXPERIMENTAL status for the 2.6.37 release. That doesn't
mean it is perfectly bug free - it just means more people will start
to use it. And as this happens, we will find new problems in
different environments (just like you have), but this is a pretty
normal occurrence for new code.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Xfs delaylog hanged up
2010-11-22 23:29 ` Dave Chinner
@ 2010-11-23 11:17 ` Spelic
2010-11-23 13:28 ` Spelic
2010-11-23 20:46 ` Dave Chinner
0 siblings, 2 replies; 21+ messages in thread
From: Spelic @ 2010-11-23 11:17 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 11/23/2010 12:29 AM, Dave Chinner wrote:
>> 16 disk MD raid-5
>>
> What is the storage hardware and the MD raid5 configuration?
>
Tyan motherboard with 5400 chipset
dual Xeon E5420
16 disks on this one:
05:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
Subsystem: 3ware Inc 9650SE SATA-II RAID PCIe
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at ce000000 (64-bit, prefetchable) [size=32M]
Memory at d2600000 (64-bit, non-prefetchable) [size=4K]
I/O ports at 3000 [size=256]
[virtual] Expansion ROM at d26e0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: 3w-9xxx
Kernel modules: 3w-9xxx
mdadm --create /dev/md5 --raid-devices=16 --chunk=1024 --level=5 \
    --metadata=1.0 pci-0000:05:00.0-scsi-0:0:{15..0}:0 \
    --name=backup:perftest --assume-clean
>> mount line:
>> /dev/mapper/perftestvg-xfslv /perftest/xfs xfs rw,relatime,attr2,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,noquota
>> 0 0
>>
> FYI, relatime, attr2, logbufs=8 and noquota are default values that
> you don't need to specify.
>
OK, that was the /proc/mounts line. The actual commands were:
mkfs.xfs -l size=128m -d agcount=16 <device>
mount -o delaylog,nobarrier,inode64,logbufs=8,logbsize=256k <device> <mountpoint>
> Thanks for the test case - I'll try to reproduce this. One more key
> piece of information I need: the output of mkfs.xfs so I can see your
> filesystem geometry. if you have the filesystem mounted, then the
> output of 'xfs_info<mntpt>' will give the same information.
>
/# xfs_info /perftest/xfs/
meta-data=/dev/mapper/perftestvg-xfslv isize=256 agcount=16, agsize=50331648 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=805306368, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
> It is stable enough that all my local machines are running with
> delaylog. And as a dev team, we have enough confidence in it to
> remove the EXPERIMENTAL status for the 2.6.37 release. That doesn't
> mean it is perfectly bug free - it just means more people will start
> to use it. And as this happens, we will find new problems in
> different environments (just like you have), but this is a pretty
> normal occurrence for new code.
>
ok I understand...
thanks for your work!
* Re: Xfs delaylog hanged up
2010-11-23 11:17 ` Spelic
@ 2010-11-23 13:28 ` Spelic
2010-11-23 20:46 ` Dave Chinner
1 sibling, 0 replies; 21+ messages in thread
From: Spelic @ 2010-11-23 13:28 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 11/23/2010 12:17 PM, Spelic wrote:
> mdadm --create /dev/md5 --raid-devices=16 --chunk=1024 --level=5
> --metadata=1.0 pci-0000:05:00.0-scsi-0:0:{15..0}:0
> --name=backup:perftest --assume-clean
I forgot one setting: md/stripe_cache_size was set to 32768.
I also forgot to check stripe_cache_active during the hang... but I
don't think it was an MD bug.
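For anyone wanting to check the same knobs, a sketch of the relevant
sysfs paths (the md5 device name is taken from the mdadm line quoted
above; changing values needs root):

```shell
# md raid5/6 stripe cache, in pages per device; memory cost is roughly
# stripe_cache_size * 4 KiB * nr_disks. The kernel default is 256.
cat /sys/block/md5/md/stripe_cache_size
echo 32768 > /sys/block/md5/md/stripe_cache_size   # the value used here
# Number of stripes currently in use; worth sampling during a hang.
cat /sys/block/md5/md/stripe_cache_active
```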
Regards
* Re: Xfs delaylog hanged up
2010-11-23 11:17 ` Spelic
2010-11-23 13:28 ` Spelic
@ 2010-11-23 20:46 ` Dave Chinner
2010-11-23 22:14 ` Stan Hoeppner
` (3 more replies)
1 sibling, 4 replies; 21+ messages in thread
From: Dave Chinner @ 2010-11-23 20:46 UTC (permalink / raw)
To: Spelic; +Cc: xfs
On Tue, Nov 23, 2010 at 12:17:41PM +0100, Spelic wrote:
> On 11/23/2010 12:29 AM, Dave Chinner wrote:
> >>16 disk MD raid-5
> >What is the storage hardware and the MD raid5 configuration?
>
> Tyan motherboard with 5400 chipset
> dual Xeon E5420
>
> 16 disks on this one:
> 05:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
> Subsystem: 3ware Inc 9650SE SATA-II RAID PCIe
> Flags: bus master, fast devsel, latency 0, IRQ 16
> Memory at ce000000 (64-bit, prefetchable) [size=32M]
> Memory at d2600000 (64-bit, non-prefetchable) [size=4K]
> I/O ports at 3000 [size=256]
> [virtual] Expansion ROM at d26e0000 [disabled] [size=128K]
> Capabilities: <access denied>
> Kernel driver in use: 3w-9xxx
> Kernel modules: 3w-9xxx
Hmmmm. We get plenty of reports about problems with 3ware RAID
controllers, many of which turn out to be problems in the controller
itself. Can you make sure you are running the latest firmware on the
controller?
> /# xfs_info /perftest/xfs/
> meta-data=/dev/mapper/perftestvg-xfslv isize=256 agcount=16, agsize=50331648 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=805306368, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none                   extsz=4096   blocks=0, rtextents=0
Nothing unusual there.
I've been unable to reproduce the problem with your test case (it has
been running overnight) on a 12-disk, 16TB dm RAID0 array, but I'll
keep trying for a while. I note that the load is generating close to
10,000 iops on my test system, so it may very well be triggering
load-related problems in your raid controller...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Xfs delaylog hanged up
2010-11-23 20:46 ` Dave Chinner
@ 2010-11-23 22:14 ` Stan Hoeppner
2010-11-24 0:20 ` Dave Chinner
2010-11-23 22:48 ` Emmanuel Florac
` (2 subsequent siblings)
3 siblings, 1 reply; 21+ messages in thread
From: Stan Hoeppner @ 2010-11-23 22:14 UTC (permalink / raw)
To: xfs
Dave Chinner put forth on 11/23/2010 2:46 PM:
> I've been unable to reproduce the problem with your test case (been
> running over night) on a 12-disk, 16TB dm RAID0 array, but I'll keep
> trying to reproduce it for a while. I note that the load is
> generating close to 10,000 iops on my test system, so it may very
> well be triggering load related problems in your raid controller...
Somewhat off topic, but how are you generating 10,000 IOPS by carving a
16TB LUN/volume from 12 x 2TB SATA disk spindles? Such drives aren't
even capable of 200 seeks per second. Even if they were you'd top out
at less than 2,500 IOPS (random). 16TB/12=1.33 TB per disk. No such
capacity disk exists. So I assume you're using 12 x 2TB disks and
slicing/dicing out 16TB. What am I missing Dave?
--
Stan
* Re: Xfs delaylog hanged up
2010-11-23 20:46 ` Dave Chinner
2010-11-23 22:14 ` Stan Hoeppner
@ 2010-11-23 22:48 ` Emmanuel Florac
2010-11-24 0:36 ` Spelic
2010-11-24 0:58 ` Spelic
2010-11-25 23:34 ` Spelic
3 siblings, 1 reply; 21+ messages in thread
From: Emmanuel Florac @ 2010-11-23 22:48 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs, Spelic
On Wed, 24 Nov 2010 07:46:09 +1100, you wrote:
> Hmmmm. We get plenty of reports about problems with 3ware RAID
> controllers, many of which are RAID controller problems. Can you
> make sure you are running the latest firmware on the controller?
So Spelic, you're using the 3Ware as a plain SATA controller rather
than a RAID controller? That is quite unexpected, though going with md
instead does provide better IOPS.
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: Xfs delaylog hanged up
2010-11-23 22:14 ` Stan Hoeppner
@ 2010-11-24 0:20 ` Dave Chinner
2010-11-24 13:12 ` Spelic
0 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2010-11-24 0:20 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: xfs
On Tue, Nov 23, 2010 at 04:14:16PM -0600, Stan Hoeppner wrote:
> Dave Chinner put forth on 11/23/2010 2:46 PM:
>
> > I've been unable to reproduce the problem with your test case (been
> > running over night) on a 12-disk, 16TB dm RAID0 array, but I'll keep
> > trying to reproduce it for a while. I note that the load is
> > generating close to 10,000 iops on my test system, so it may very
> > well be triggering load related problems in your raid controller...
>
> Somewhat off topic, but how are you generating 10,000 IOPS by carving a
> 16TB LUN/volume from 12 x 2TB SATA disk spindles? Such drives aren't
> even capable of 200 seeks per second. Even if they were you'd top out
> at less than 2,500 IOPS (random). 16TB/12=1.33 TB per disk. No such
> capacity disk exists. So I assume you're using 12 x 2TB disks and
> slicing/dicing out 16TB. What am I missing Dave?
512MB of BBWC backing the disks. The BBWC does a much better job of
reordering out-of-order writes than the Linux elevators because
512MB is a much bigger window than a couple of thousand 4k IOs.
Hence metadata/small file intensive workloads go much faster than
you'd expect from just looking at the IO patterns and the capability
of the disks.
IOWs, for write workloads that are not purely random, the disk
subsystem behaves more like an SSD than a RAID0 array of spinning
rust...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Xfs delaylog hanged up
2010-11-23 22:48 ` Emmanuel Florac
@ 2010-11-24 0:36 ` Spelic
2010-11-24 1:40 ` Stan Hoeppner
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Spelic @ 2010-11-24 0:36 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: xfs
On 11/23/2010 11:48 PM, Emmanuel Florac wrote:
> So Spelic, you're using the 3Ware as a SATA controller, but not a RAID
> controller? This is quite unexpected, though going with md
> instead provides better IOPS.
>
Yep, because I know MD better: I can see the code, it's more flexible
and customizable, and it can be controlled with scripts more easily.
Parameters of the underlying drives can also be controlled if I use MD,
and it's easier to align the filesystem to stripes. My time is also
better spent learning MD, since it won't change the next time I buy a
different hw controller.
What would be the advantage of using hardware raid?
* Re: Xfs delaylog hanged up
2010-11-23 20:46 ` Dave Chinner
2010-11-23 22:14 ` Stan Hoeppner
2010-11-23 22:48 ` Emmanuel Florac
@ 2010-11-24 0:58 ` Spelic
2010-11-24 5:44 ` Dave Chinner
2010-11-25 23:34 ` Spelic
3 siblings, 1 reply; 21+ messages in thread
From: Spelic @ 2010-11-24 0:58 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 11/23/2010 09:46 PM, Dave Chinner wrote:
> Hmmmm. We get plenty of reports about problems with 3ware RAID
> controllers, many of which are RAID controller problems. Can you
> make sure you are running the latest firmware on the controller?
>
No, sorry, my firmware is: FE9X 4.06.00.004
But when these controllers hang there is usually something in dmesg,
and in my case there wasn't. After a while the controller resets itself
(it has something like a watchdog in it).
In the past, during testing, I did have reproducible hangups under high
load with these controllers (it looked like a lost interrupt), but they
were fixed by disabling NCQ.
In those cases the controller would reset, the drives' caches would
reset to "off", and there were entries in dmesg.
That issue was definitely fixed by disabling NCQ: I tested many times
with and without NCQ, with reproducible results, and after that we had
more than a year of reliable operation on that machine.
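Disabling NCQ from Linux is usually done per disk by dropping the queue
depth; a sketch, with sdX standing in for each member disk (needs root;
the controller firmware may offer an equivalent setting):

```shell
# A queue depth of 1 means only one outstanding command per drive,
# which effectively disables NCQ for that disk.
cat /sys/block/sdX/device/queue_depth    # >1 while NCQ is active
echo 1 > /sys/block/sdX/device/queue_depth
```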
> I've been unable to reproduce the problem with your test case (been
> running over night) on a 12-disk, 16TB dm RAID0 array, but I'll keep
> trying to reproduce it for a while.
It seems to me that a 12-disk dm raid0 is quite different from a
16-disk md raid5 array: raid0 has no stripe cache, and there are likely
to be fewer in-flight operations, so if the bug involves some pool of
resources being drained you might not hit it... But I understand that
you already had the raid0 array set up :-D I'll see if I can reproduce
this, but I can't promise anything: the machine has to go back to
production very soon.
If I hit it again, what should I look at?
* Re: Xfs delaylog hanged up
2010-11-24 0:36 ` Spelic
@ 2010-11-24 1:40 ` Stan Hoeppner
2010-11-24 6:18 ` Michael Monnerie
2010-11-24 7:44 ` Emmanuel Florac
2 siblings, 0 replies; 21+ messages in thread
From: Stan Hoeppner @ 2010-11-24 1:40 UTC (permalink / raw)
To: xfs
Spelic put forth on 11/23/2010 6:36 PM:
> On 11/23/2010 11:48 PM, Emmanuel Florac wrote:
>> So Spelic, you're using the 3Ware as a SATA controller, but not a RAID
>> controller? This is quite unexpected, though going with md
>> instead provides better IOPS.
>>
>
> Yep, because I know MD better, I can see the code, it's more flexible
> and customizable, can be controlled with scripts more easily... Also
> parameters to the underlying drives can be controlled if I use MD, and
> it's easier to align filesystem to stripes. Also my time is better spent
> learning it as MD won't change the next time I buy a different hw
> controller.
> What would be the advantage of using hardware raid?
If there is no advantage then why buy an $800+ real hardware RAID card
and then deploy it as a vanilla SAS/SATA HBA with software RAID? What a
waste of a decent RAID card...
You could have gone with two of these instead
http://www.newegg.com/Product/Product.aspx?Item=N82E16816117157
saving ~$500 and having similar if not slightly better performance with
software RAID.
--
Stan
* Re: Xfs delaylog hanged up
@ 2010-11-24 4:03 Richard Scobie
0 siblings, 0 replies; 21+ messages in thread
From: Richard Scobie @ 2010-11-24 4:03 UTC (permalink / raw)
To: xfs
Spelic wrote:
> Yep, because I know MD better, I can see the code, it's more flexible
> and customizable, can be controlled with scripts more easily... Also
> parameters to the underlying drives can be controlled if I use MD,
and > it's easier to align filesystem to stripes. Also my time is better
Slightly off topic, but your previously mentioned xfs_info shows sunit
and swidth values of zero, so your filesystem is not aligned to stripes.
> /# xfs_info /perftest/xfs/
> meta-data=/dev/mapper/perftestvg-xfslv isize=256 agcount=16, agsize=50331648 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=805306368, imaxpct=5
>          =                       sunit=0      swidth=0 blks
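A hedged sketch of what an aligned mkfs would look like for the array
described earlier in the thread (16-disk raid5 with a 1024 KiB chunk,
so 15 data disks; the device path is the one from the thread):

```shell
# su = md chunk size, sw = number of data disks (16 disks - 1 parity).
mkfs.xfs -l size=128m -d su=1024k,sw=15 /dev/mapper/perftestvg-xfslv
```

mkfs.xfs can often pick this geometry up from md automatically, but an
LVM layer in between may hide it, which would explain the sunit=0 and
swidth=0 seen here.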
Regards,
Richard
* Re: Xfs delaylog hanged up
2010-11-24 0:58 ` Spelic
@ 2010-11-24 5:44 ` Dave Chinner
0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2010-11-24 5:44 UTC (permalink / raw)
To: Spelic; +Cc: xfs
On Wed, Nov 24, 2010 at 01:58:25AM +0100, Spelic wrote:
> On 11/23/2010 09:46 PM, Dave Chinner wrote:
> >Hmmmm. We get plenty of reports about problems with 3ware RAID
> >controllers, many of which are RAID controller problems. Can you
> >make sure you are running the latest firmware on the controller?
>
> No, sorry, my firmware is: FE9X 4.06.00.004
>
> But when controllers hang, there is usually something in dmesg, and
> in my case there wasn't. Then after a while it resets (it has
> something like a watchdog in it).
> In the past during testing I did have reproducible hangups on high
> load with these controllers (seemed like a lost interrupt), but they
> were fixed by disabling NCQ.
> The controller would reset in those cases, drives caches would reset
> to "off", and there were entries in dmesg.
> But that issue was definitely fixed by disabling NCQ: I tested many
> times with and without NCQ with reproducible results; and after that
> we had reliable operation for more than 1 year on that machine.
>
> >I've been unable to reproduce the problem with your test case (been
> >running over night) on a 12-disk, 16TB dm RAID0 array, but I'll keep
> >trying to reproduce it for a while.
>
> It seems to me that 12 disk raid0 dm is quite different from 16 disk
> md raid5 array because you don't have the stripe cache and there are
> likely to be fewer in-flight operations, if it was a pool of
> something which was drained you might not hit it...
But if that is the cause, then it would indicate an MD problem
rather than an XFS problem. Testing on a similar but slightly
different configuration helps isolate where the problem may lie.
As it is, it is probably caused by the same bug as the problem Nick
reported. See my last post in the thread "XFS performance oddity"
for the patch that fixes the hang Nick reported.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Xfs delaylog hanged up
2010-11-24 0:36 ` Spelic
2010-11-24 1:40 ` Stan Hoeppner
@ 2010-11-24 6:18 ` Michael Monnerie
2010-11-24 7:44 ` Emmanuel Florac
2 siblings, 0 replies; 21+ messages in thread
From: Michael Monnerie @ 2010-11-24 6:18 UTC (permalink / raw)
To: xfs
On Mittwoch, 24. November 2010 Spelic wrote:
> What would be the advantage of using hardware raid?
Better detection of broken drives (hw raid controllers are more picky
than Linux), offloading the server's CPU, out-of-band notification of
problems, and broken disks don't lead to a non-bootable system - what
if your boot disk crashes?
--
with kind regards,
Michael Monnerie, Ing. BSc
it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531
// ****** Radio interview about spam ******
// http://www.it-podcast.at/archiv.html#podcast-100716
//
// House for sale: http://zmi.at/langegg/
* Re: Xfs delaylog hanged up
2010-11-24 0:36 ` Spelic
2010-11-24 1:40 ` Stan Hoeppner
2010-11-24 6:18 ` Michael Monnerie
@ 2010-11-24 7:44 ` Emmanuel Florac
2 siblings, 0 replies; 21+ messages in thread
From: Emmanuel Florac @ 2010-11-24 7:44 UTC (permalink / raw)
To: Spelic; +Cc: xfs
On Wed, 24 Nov 2010 01:36:37 +0100, you wrote:
> What would be the advantage of using hardware raid?
More predictable performance under heavy load (an NFS kernel server,
typically). BTW, the 3Ware is not a very good SATA controller; I've
found that it is actually one of the worst of its class when it comes
to IOPS: the Areca 16xx, LSI MegaRAID and Adaptec 5xx5 all perform
better as simple controllers in combination with MD (tested with 8
SSDs).
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: Xfs delaylog hanged up
2010-11-24 0:20 ` Dave Chinner
@ 2010-11-24 13:12 ` Spelic
2010-11-24 21:50 ` Dave Chinner
0 siblings, 1 reply; 21+ messages in thread
From: Spelic @ 2010-11-24 13:12 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 11/24/2010 01:20 AM, Dave Chinner wrote:
>
> 512MB of BBWC backing the disks. The BBWC does a much better job of
> reordering out-of-order writes than the Linux elevators because
> 512MB is a much bigger window than a couple of thousand 4k IOs.
>
Hmmm, very interesting...
So you are using an MD or DM raid-0 on top of a SATA controller with a
BBWC? That would presumably be a RAID controller used as a plain SATA
controller, because I have never seen SATA HBAs with a BBWC. I'd be
interested in the brand if you don't mind.
Also I wanted to know: are the requests to the drives really only 4K in
size on Linux? Then what purpose do the elevators' merges have? When
the elevator merges two 4K requests, doesn't it create an 8K request
for the drive?
Also look at this competitor's link:
http://thunk.org/tytso/blog/2010/11/01/i-have-the-money-shot-for-my-lca-presentation/
post #9
Those scalability patches submit I/O larger than 4K; I can confirm that
from iostat -x 1. (I can't understand what he means by "bypasses the
buffer cache layer", though; does it mean it's only for direct I/O? It
does not seem so to me.) When such large requests go into the elevator,
are they broken up into 4K requests again?
Thank you
* Re: Xfs delaylog hanged up
2010-11-24 13:12 ` Spelic
@ 2010-11-24 21:50 ` Dave Chinner
0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2010-11-24 21:50 UTC (permalink / raw)
To: Spelic; +Cc: xfs
On Wed, Nov 24, 2010 at 02:12:32PM +0100, Spelic wrote:
> On 11/24/2010 01:20 AM, Dave Chinner wrote:
> >
> >512MB of BBWC backing the disks. The BBWC does a much better job of
> >reordering out-of-order writes than the Linux elevators because
> >512MB is a much bigger window than a couple of thousand 4k IOs.
>
> Hmmm very interesting...
> so you are using a MD or DM raid-0 above a SATA controller with a BBWC?
> That would probably be a RAID controller used as SATA because I have
> never seen SATA controllers with a BBWC. I'd be interested in the
> brand if you don't mind.
Actually, it's a SAS RAID controller:
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
With each disk exported as a single-disk RAID0 LUN, because the RAID
controller does not do JBOD.
> Also I wanted to know... the requests to the drives are really only
> 4K in size for linux?
No, they can be much larger than that. However, for small-file
workloads the IO size is determined mostly by the file size.
> Then what purpose do the elevators' merges
> have? When the elevator merges two 4k requests doesn't it create an
> 8k request for the drive?
Yes. But when the two 4k blocks are not adjacent, they can't be
merged and hence are two IOs. And if the block that separated them
is then written 5ms after the other two completed, it's three IOs
that get combined in the BBWC into one...
> Also look at this competitor's link:
> http://thunk.org/tytso/blog/2010/11/01/i-have-the-money-shot-for-my-lca-presentation/
> post #9
> these scalability patches submit larger i/o than 4k. I can confirm
> that from within iostat -x 1 (I can't understand what he means with
> "bypasses the buffer cache layer" though, does it mean it's only for
> DIRECTIO? it does not seem to me).
It means he's calling submit_bio() rather than submit_bh(). Most of
that "new" code in ext4 was copied directly from XFS - XFS has been
using submit_bio() for large IO submission since roughly 2.6.15.
> When such large requests go into
> the elevator, are they broken up into 4K requests again?
No, the opposite used to happen - ext4 would break large contiguous
regions into 4k IOs (because submit_bh could only handle one block at
a time), and then the elevator would re-merge them into a large IO.
The issue here is CPU overhead of merging thousands of blocks
unnecessarily.
--
Dave Chinner
david@fromorbit.com
* Re: Xfs delaylog hanged up
2010-11-22 19:27 Xfs delaylog hanged up Spelic
2010-11-22 23:29 ` Dave Chinner
@ 2010-11-24 22:52 ` Spelic
2010-11-26 22:43 ` Spelic
1 sibling, 1 reply; 21+ messages in thread
From: Spelic @ 2010-11-24 22:52 UTC (permalink / raw)
To: xfs
On 11/22/2010 08:27 PM, Spelic wrote:
> ....
> I was doing a parallel-kernel-unpack benchmark:
> ....
I tried to reproduce it today but unfortunately I wasn't able to, not
even on the same hardware.
The first time, it happened so soon after starting the
parallel-kernel-unpack benchmark (on the second run!) that I was
confident it would be easy to reproduce, but today it never happened
again in many hours of trying. Sorry. Unfortunately the machine is
going back to production soon, so I'm not sure I can do further
testing.
* Re: Xfs delaylog hanged up
2010-11-23 20:46 ` Dave Chinner
` (2 preceding siblings ...)
2010-11-24 0:58 ` Spelic
@ 2010-11-25 23:34 ` Spelic
2010-11-26 4:20 ` Dave Chinner
3 siblings, 1 reply; 21+ messages in thread
From: Spelic @ 2010-11-25 23:34 UTC (permalink / raw)
To: xfs
On 11/23/2010 09:46 PM, Dave Chinner wrote:
> ...
> I note that the load is
> generating close to 10,000 iops on my test system, so it may very
> well be triggering load related problems in your raid controller...
>
Dave, thanks for all the explanations on the BBWC.
I wanted to ask how you measured 10,000 IOPS with that workload. Is it
with iostat -x?
If so, which column exactly do you look at, and over what period do you
average? I can also sometimes see values up to around 10000 in the
"w/s" column for my MD RAID array (currently a 16-disk raid-5 with XFS
delaylog) if I use
iostat -x 10 (which averages write IOPS over 10 seconds, I think)
but only for a few shots of iostat, not for the whole run of the
"benchmark". Do you mean you saw 10000 averaged over the whole
benchmark?
Also, I'm curious: do you remember how long it takes to complete one
run (10 parallel tar unpacks) on your 12-disk raid0 + BBWC?
A better test would probably exclude the bunzip2 step from the
benchmark, like the following, though it probably won't make more than
10 seconds of difference:
/perftest/xfs# bzcat linux-2.6.37-rc2.tar.bz2 > linux-2.6.37-rc2.tar
/perftest/xfs# mkdir dir{1,2,3,4,5,6,7,8,9,10}
/perftest/xfs# for i in {1..10} ; do time tar -xf linux-2.6.37-rc2.tar
-C dir$i & done ; echo waiting now ; time wait; echo syncing now ; time
sync
Thanks for all explanations
* Re: Xfs delaylog hanged up
2010-11-25 23:34 ` Spelic
@ 2010-11-26 4:20 ` Dave Chinner
0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2010-11-26 4:20 UTC (permalink / raw)
To: Spelic; +Cc: xfs
On Fri, Nov 26, 2010 at 12:34:13AM +0100, Spelic wrote:
> On 11/23/2010 09:46 PM, Dave Chinner wrote:
> >...
> >I note that the load is
> >generating close to 10,000 iops on my test system, so it may very
> >well be triggering load related problems in your raid controller...
>
> Dave thanks for all explanations on the BBWC,
>
> I wanted to ask how can you measure that it's 10,000 IOPS with that
> workload. Is it by iostat -x ?
http://marc.info/?l=linux-fsdevel&m=129013629728687&w=2
> but only for a few shots of iostat, not for the whole run of the
> "benchmark". Do you mean you have 10000 averaged over the whole
> benchmark?
It peaked at over 10,000 iops, lowest rate was ~4000iops and the
average would have been around 7000iops.
> Also I'm curious, do you remember how much time does it take to
> complete one run (10 parallel tar unpacks) on your 12-disk raid0 +
> BBWC?
33 seconds, with it being limited by the decompression rate (i.e. CPU
bound).
> Probably a better test would excluding the unbzip2 part from the
> benchmark, like the following but it probably won't make more than
> 10sec difference:
>
> /perftest/xfs# bzcat linux-2.6.37-rc2.tar.bz2 > linux-2.6.37-rc2.tar
> /perftest/xfs# mkdir dir{1,2,3,4,5,6,7,8,9,10}
> /perftest/xfs# for i in {1..10} ; do time tar -xf
> linux-2.6.37-rc2.tar -C dir$i & done ; echo waiting now ; time wait;
> echo syncing now ; time sync
I'm running QA tests on my test machine right now, so I don't have a
direct comparison with the above number for you.
However, my workstation has a pair of 120GB sandforce 1200 SSDs in
RAID0 running 2.6.37-rc1 w/ delaylog and the results are 40s for the
compressed tarball and 16s for the uncompressed tarball.
The uncompressed tarball run had lower IOPS and much higher
bandwidth as much more merging in the IO elevators was being done
compared to the compressed tarball...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Xfs delaylog hanged up
2010-11-24 22:52 ` Spelic
@ 2010-11-26 22:43 ` Spelic
0 siblings, 0 replies; 21+ messages in thread
From: Spelic @ 2010-11-26 22:43 UTC (permalink / raw)
To: xfs
On 11/24/2010 11:52 PM, Spelic wrote:
> On 11/22/2010 08:27 PM, Spelic wrote:
>> ....
>> I was doing a parallel-kernel-unpack benchmark:
>> ....
>
> I tried to reproduce it today but unfortunately I wasn't able to, not
> even on the same hardware.
>
> The first time it happened so soon after starting such
> parallel-kernel-unpack benchmark (at the second run!) that I was
> confident it was possible to reproduce it easily, but today in many
> hours it never happened again. Sorry. Unfortunately the machine is
> going back to production soon so I'm not sure I can do further testing.
I had another, smaller hangup, and unfortunately I wasn't paying
attention. Among a zillion runs of the "benchmark" taking about 1
minute (for the part before the sync), I just noticed one that took
2m30s for the same part. The system was absolutely idle apart from the
test, so I can't really see an explanation other than another, smaller
hangup...
The good news is that it definitely came out of the hangup by itself:
this time I didn't press Ctrl-C. So even if it was an XFS bug, it
doesn't seem fatal.
dmesg showed nothing this time, probably because the hangup was shorter
than 120 seconds and the hung-task watchdog didn't notice anything.