public inbox for linux-xfs@vger.kernel.org
* Xfs delaylog hanged up
@ 2010-11-22 19:27 Spelic
  2010-11-22 23:29 ` Dave Chinner
  2010-11-24 22:52 ` Spelic
  0 siblings, 2 replies; 21+ messages in thread
From: Spelic @ 2010-11-22 19:27 UTC (permalink / raw)
  To: xfs

Hi there
EXCELLENT speed improvement with delaylog! My congratulations.
Now there's a problem: it hung on me (not sure it was because of
delaylog, but anyway...)
I am on 2.6.37-rc2

16-disk MD RAID-5
mkfs line:
        mkfs.xfs -l size=128m -d agcount=16 <device>
(then lvm2, all aligned to stripes)
mount line:
        /dev/mapper/perftestvg-xfslv /perftest/xfs xfs rw,relatime,attr2,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,noquota 0 0

deadline scheduler, nr_requests 1024

I was doing a parallel-kernel-unpack benchmark:

     /perftest/xfs# mkdir dir{1,2,3,4,5,6,7,8,9,10}
     /perftest/xfs# for i in {1..10} ; do time tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir$i & done ; echo waiting now ; time wait ; echo syncing now ; time sync

     [1] 3400
     [2] 3401
     [3] 3402
     [4] 3404
     [5] 3407
     [6] 3409
     [7] 3410
     [8] 3413
     [9] 3415
     [10] 3417
     waiting now

     [Unfortunately I hit Ctrl-C here by mistake after a while, and it
might have unlocked the situation.]
     [These are the final times for the tars: over 6 minutes, where the
run usually takes about 1min20sec.]

     real    6m34.883s
     user    0m19.770s
     sys     0m5.610s

     real    6m45.220s
     user    0m20.600s
     sys     0m5.750s

     real    6m46.638s
     user    0m20.920s
     sys     0m5.800s

     real    6m46.731s
     user    0m20.640s
     sys     0m5.840s

     real    6m46.743s
     user    0m20.860s
     sys     0m5.950s

     real    6m46.857s
     user    0m20.330s
     sys     0m6.000s

     real    6m47.885s
     user    0m21.000s
     sys     0m6.010s

     real    6m48.718s
     user    0m20.780s
     sys     0m5.700s

     real    6m51.915s
     user    0m19.530s
     sys     0m4.680s

     real    7m0.249s
     user    0m20.330s
     sys     0m6.290s



     Iostat showed everything totally still.
     The computer was otherwise responsive (the OS is on other filesystems).
     Look at the stack traces from the 5 minutes they were hung:

# ps aux | grep tar
     root       875  0.0  0.0  12688   556 ?        S    16:47   0:00 upstart-udev-bridge --daemon
     root      3403  1.9  0.0  18224  1168 pts/1    D    18:58   0:04 tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir1
     root      3405  1.8  0.0  18224  1168 pts/1    D    18:58   0:04 tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir2
     root      3406  0.9  0.0  18224  1164 pts/1    D    18:58   0:02 tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir3
     root      3408  1.7  0.0  18224  1164 pts/1    D    18:58   0:04 tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir4
     root      3411  1.8  0.0  18224  1164 pts/1    D    18:58   0:04 tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir5
     root      3412  1.6  0.0  18224  1164 pts/1    D    18:58   0:04 tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir6
     root      3414  1.7  0.0  18224  1164 pts/1    D    18:58   0:04 tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir7
     root      3416  1.8  0.0  18224  1164 pts/1    D    18:58   0:04 tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir8
     root      3418  1.8  0.0  18224  1168 pts/1    D    18:58   0:04 tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir9
     root      3419  1.7  0.0  18224  1164 pts/1    D    18:58   0:04 tar -jxf linux-2.6.37-rc2.tar.bz2 -C dir10
# cat /proc/3403/stack
     [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
     [<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
     [<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
     [<ffffffff8114de68>] notify_change+0x168/0x2d0
     [<ffffffff811346e7>] sys_fchmodat+0xd7/0x110
     [<ffffffff81134733>] sys_chmod+0x13/0x20
     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
     [<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3405/stack
     [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
     [<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
     [<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
     [<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
     [<ffffffff81137405>] fput+0xd5/0x280
     [<ffffffff81133bf8>] filp_close+0x58/0x90
     [<ffffffff81133ce9>] sys_close+0xb9/0x110
     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
     [<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3406/stack
     [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
     [<ffffffffa02c05ed>] xfs_iomap_write_allocate+0x21d/0x360 [xfs]
     [<ffffffffa02c09bf>] xfs_iomap+0x28f/0x340 [xfs]
     [<ffffffffa02da287>] xfs_map_blocks+0x37/0x40 [xfs]
     [<ffffffffa02dbae0>] xfs_vm_writepage+0x3f0/0x5d0 [xfs]
     [<ffffffff810f00d2>] __writepage+0x12/0x40
     [<ffffffff810f1198>] write_cache_pages+0x1f8/0x490
     [<ffffffff810f144f>] generic_writepages+0x1f/0x30
     [<ffffffffa02da8a8>] xfs_vm_writepages+0x58/0x70 [xfs]
     [<ffffffff810f147c>] do_writepages+0x1c/0x40
     [<ffffffff81158ddd>] writeback_single_inode+0x9d/0x260
     [<ffffffff811591e4>] writeback_sb_inodes+0xd4/0x170
     [<ffffffff81159324>] writeback_inodes_wb+0xa4/0x170
     [<ffffffff810f0a60>] balance_dirty_pages_ratelimited_nr+0x390/0x4a0
     [<ffffffff810e71b3>] generic_file_buffered_write+0x1b3/0x250
     [<ffffffffa02e09b8>] xfs_file_aio_write+0x7b8/0xa60 [xfs]
     [<ffffffff81135582>] do_sync_write+0xd2/0x110
     [<ffffffff81135878>] vfs_write+0xc8/0x190
     [<ffffffff811361ac>] sys_write+0x4c/0x80
     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
     [<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3408/stack
     [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
     [<ffffffffa02d666b>] xfs_create+0x17b/0x650 [xfs]
     [<ffffffffa02e3868>] xfs_vn_mknod+0xa8/0x1c0 [xfs]
     [<ffffffffa02e399b>] xfs_vn_create+0xb/0x10 [xfs]
     [<ffffffff81140f97>] vfs_create+0xa7/0xd0
     [<ffffffff811420e8>] do_last+0x658/0x720
     [<ffffffff8114414b>] do_filp_open+0x1fb/0x650
     [<ffffffff81133da3>] do_sys_open+0x63/0x120
     [<ffffffff81133e8b>] sys_open+0x1b/0x20
     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
     [<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3411/stack
     [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
     [<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
     [<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
     [<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
     [<ffffffff81137405>] fput+0xd5/0x280
     [<ffffffff81133bf8>] filp_close+0x58/0x90
     [<ffffffff81133ce9>] sys_close+0xb9/0x110
     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
     [<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3412/stack
     [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
     [<ffffffffa02d666b>] xfs_create+0x17b/0x650 [xfs]
     [<ffffffffa02e3868>] xfs_vn_mknod+0xa8/0x1c0 [xfs]
     [<ffffffffa02e399b>] xfs_vn_create+0xb/0x10 [xfs]
     [<ffffffff81140f97>] vfs_create+0xa7/0xd0
     [<ffffffff811420e8>] do_last+0x658/0x720
     [<ffffffff8114414b>] do_filp_open+0x1fb/0x650
     [<ffffffff81133da3>] do_sys_open+0x63/0x120
     [<ffffffff81133e8b>] sys_open+0x1b/0x20
     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
     [<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3414/stack
     [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
     [<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
     [<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
     [<ffffffff8114de68>] notify_change+0x168/0x2d0
     [<ffffffff8115dc57>] utimes_common+0xd7/0x1b0
     [<ffffffff8115ddc0>] do_utimes+0x90/0xf0
     [<ffffffff8115dece>] sys_utimensat+0x2e/0x80
     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
     [<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3416/stack
     [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
     [<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
     [<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
     [<ffffffff8114de68>] notify_change+0x168/0x2d0
     [<ffffffff811346e7>] sys_fchmodat+0xd7/0x110
     [<ffffffff81134733>] sys_chmod+0x13/0x20
     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
     [<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3418/stack
     [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
     [<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
     [<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
     [<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
     [<ffffffff81137405>] fput+0xd5/0x280
     [<ffffffff81133bf8>] filp_close+0x58/0x90
     [<ffffffff81133ce9>] sys_close+0xb9/0x110
     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
     [<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/3419/stack
     [<ffffffffa02c6195>] xlog_grant_log_space+0x2b5/0x5e0 [xfs]
     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
     [<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
     [<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
     [<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
     [<ffffffff81137405>] fput+0xd5/0x280
     [<ffffffff81133bf8>] filp_close+0x58/0x90
     [<ffffffff81133ce9>] sys_close+0xb9/0x110
     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
     [<ffffffffffffffff>] 0xffffffffffffffff
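
The stacks above were collected by hand, one PID at a time; the same
information can be gathered in one pass with a small loop (a sketch,
assuming Linux /proc, and root to read other users' tasks):

```shell
# Dump the kernel stack of every task currently in uninterruptible (D)
# sleep -- the state all of the blocked tar processes above are in.
for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
    echo "=== pid $pid ==="
    cat "/proc/$pid/stack" 2>/dev/null
done
echo "scan complete"
```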



dmesg:
[ 8042.770130] xfsaild/dm-6  D 00000001000b92b8     0  2979      2 
0x00000000
[ 8042.770135]  ffff88082c11ba90 0000000000000046 ffff88083140de08 
0000000000000000
[ 8042.770140]  00000000000138c0 ffff88083dfa5a80 ffff88083dfa5e18 
ffff88082c11bfd8
[ 8042.770144]  ffff88083dfa5e20 00000000000138c0 ffff88082c11a010 
00000000000138c0
[ 8042.770148] Call Trace:
[ 8042.770187]  [<ffffffffa02c4a8f>] 
xlog_state_get_iclog_space+0xdf/0x2c0 [xfs]
[ 8042.770205]  [<ffffffffa02c422a>] ? xlog_sync+0x1fa/0x3d0 [xfs]
[ 8042.770213]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.770231]  [<ffffffffa02c5708>] xlog_write+0x198/0x6c0 [xfs]
[ 8042.770249]  [<ffffffffa02c6fd2>] xlog_cil_push+0x252/0x3b0 [xfs]
[ 8042.770267]  [<ffffffffa02c7636>] xlog_cil_force_lsn+0x116/0x120 [xfs]
[ 8042.770286]  [<ffffffffa02dd3dc>] ? xfs_buf_delwri_queue+0xcc/0x150 [xfs]
[ 8042.770291]  [<ffffffff81033209>] ? default_spin_lock_flags+0x9/0x10
[ 8042.770296]  [<ffffffff810808d2>] ? down_trylock+0x32/0x50
[ 8042.770314]  [<ffffffffa02c4543>] _xfs_log_force+0x63/0x230 [xfs]
[ 8042.770332]  [<ffffffffa02c4933>] xfs_log_force+0x13/0x40 [xfs]
[ 8042.770351]  [<ffffffffa02d38e4>] xfsaild_push+0x3b4/0x400 [xfs]
[ 8042.770357]  [<ffffffff81571c26>] ? schedule_timeout+0x196/0x2f0
[ 8042.770377]  [<ffffffffa02e67a4>] xfsaild+0x74/0xb0 [xfs]
[ 8042.770395]  [<ffffffffa02e6730>] ? xfsaild+0x0/0xb0 [xfs]
[ 8042.770400]  [<ffffffff8107b076>] kthread+0x96/0xa0
[ 8042.770404]  [<ffffffff8100bd64>] kernel_thread_helper+0x4/0x10
[ 8042.770408]  [<ffffffff8107afe0>] ? kthread+0x0/0xa0
[ 8042.770411]  [<ffffffff8100bd60>] ? kernel_thread_helper+0x0/0x10
[ 8042.770414] INFO: task xfssyncd/dm-6:2980 blocked for more than 120 
seconds.
[ 8042.770449] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 8042.770501] xfssyncd/dm-6 D ffff88083dfa0000     0  2980      2 
0x00000000
[ 8042.770505]  ffff88015d98fcf0 0000000000000046 ffff88015d98fc80 
ffffffffa02cdb39
[ 8042.770510]  00000000000138c0 ffff88083dfa0000 ffff88083dfa0398 
ffff88015d98ffd8
[ 8042.770514]  ffff88083dfa03a0 00000000000138c0 ffff88015d98e010 
00000000000138c0
[ 8042.770518] Call Trace:
[ 8042.770537]  [<ffffffffa02cdb39>] ? xfs_perag_get_tag+0x39/0xc0 [xfs]
[ 8042.770555]  [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.770574]  [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.770578]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.770595]  [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.770614]  [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.770631]  [<ffffffffa02b5ac9>] xfs_fs_log_dummy+0x49/0xb0 [xfs]
[ 8042.770649]  [<ffffffffa02c3c46>] ? xfs_log_need_covered+0x66/0xd0 [xfs]
[ 8042.770668]  [<ffffffffa02e82e3>] xfs_sync_worker+0x83/0x90 [xfs]
[ 8042.770686]  [<ffffffffa02e70b3>] xfssyncd+0x183/0x230 [xfs]
[ 8042.770705]  [<ffffffffa02e6f30>] ? xfssyncd+0x0/0x230 [xfs]
[ 8042.770709]  [<ffffffff8107b076>] kthread+0x96/0xa0
[ 8042.770712]  [<ffffffff8100bd64>] kernel_thread_helper+0x4/0x10
[ 8042.770715]  [<ffffffff8107afe0>] ? kthread+0x0/0xa0
[ 8042.770718]  [<ffffffff8100bd60>] ? kernel_thread_helper+0x0/0x10
[ 8042.770721] INFO: task flush-252:6:3320 blocked for more than 120 
seconds.
[ 8042.770756] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 8042.770808] flush-252:6   D 00000001000b92b0     0  3320      2 
0x00000000
[ 8042.770812]  ffff8808329776b0 0000000000000046 0000000000000286 
ffff880800000000
[ 8042.770817]  00000000000138c0 ffff88083dbe96a0 ffff88083dbe9a38 
ffff880832977fd8
[ 8042.770821]  ffff88083dbe9a40 00000000000138c0 ffff880832976010 
00000000000138c0
[ 8042.770825] Call Trace:
[ 8042.770843]  [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.770861]  [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.770865]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.770883]  [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.770901]  [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.770920]  [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.770937]  [<ffffffffa02c05ed>] 
xfs_iomap_write_allocate+0x21d/0x360 [xfs]
[ 8042.770957]  [<ffffffffa02d31f1>] ? xfs_trans_unlocked_item+0x31/0x50 
[xfs]
[ 8042.770974]  [<ffffffffa02c09bf>] xfs_iomap+0x28f/0x340 [xfs]
[ 8042.770992]  [<ffffffffa02da287>] xfs_map_blocks+0x37/0x40 [xfs]
[ 8042.771011]  [<ffffffffa02dbae0>] xfs_vm_writepage+0x3f0/0x5d0 [xfs]
[ 8042.771017]  [<ffffffff810f00d2>] __writepage+0x12/0x40
[ 8042.771020]  [<ffffffff810f1198>] write_cache_pages+0x1f8/0x490
[ 8042.771024]  [<ffffffff810f00c0>] ? __writepage+0x0/0x40
[ 8042.771027]  [<ffffffff8157130c>] ? schedule+0x44c/0xa80
[ 8042.771045]  [<ffffffffa02d31f1>] ? xfs_trans_unlocked_item+0x31/0x50 
[xfs]
[ 8042.771050]  [<ffffffff810f144f>] generic_writepages+0x1f/0x30
[ 8042.771068]  [<ffffffffa02da8a8>] xfs_vm_writepages+0x58/0x70 [xfs]
[ 8042.771071]  [<ffffffff810f147c>] do_writepages+0x1c/0x40
[ 8042.771077]  [<ffffffff81158ddd>] writeback_single_inode+0x9d/0x260
[ 8042.771080]  [<ffffffff811591e4>] writeback_sb_inodes+0xd4/0x170
[ 8042.771083]  [<ffffffff81158258>] ? queue_io+0x98/0x1b0
[ 8042.771087]  [<ffffffff81159324>] writeback_inodes_wb+0xa4/0x170
[ 8042.771090]  [<ffffffff8115967b>] wb_writeback+0x28b/0x400
[ 8042.771094]  [<ffffffff8105477b>] ? dequeue_task_fair+0x8b/0x90
[ 8042.771099]  [<ffffffff810095fb>] ? __switch_to+0xcb/0x350
[ 8042.771103]  [<ffffffff812a7ab9>] ? __percpu_counter_sum+0x69/0x80
[ 8042.771106]  [<ffffffff8115998b>] wb_do_writeback+0x19b/0x1e0
[ 8042.771110]  [<ffffffff81159a7a>] bdi_writeback_thread+0xaa/0x270
[ 8042.771113]  [<ffffffff811599d0>] ? bdi_writeback_thread+0x0/0x270
[ 8042.771117]  [<ffffffff8107b076>] kthread+0x96/0xa0
[ 8042.771120]  [<ffffffff8100bd64>] kernel_thread_helper+0x4/0x10
[ 8042.771123]  [<ffffffff8107afe0>] ? kthread+0x0/0xa0
[ 8042.771126]  [<ffffffff8100bd60>] ? kernel_thread_helper+0x0/0x10
[ 8042.771130] INFO: task tar:3403 blocked for more than 120 seconds.
[ 8042.771163] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 8042.771216] tar           D ffff88083d04ad40     0  3403   3400 
0x00000000
[ 8042.771220]  ffff88015e7a9c68 0000000000000082 0000000000000001 
ffff880832815f00
[ 8042.771224]  00000000000138c0 ffff88083d04ad40 ffff88083d04b0d8 
ffff88015e7a9fd8
[ 8042.771228]  ffff88083d04b0e0 00000000000138c0 ffff88015e7a8010 
00000000000138c0
[ 8042.771232] Call Trace:
[ 8042.771250]  [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.771268]  [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.771272]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.771290]  [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.771308]  [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.771326]  [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.771345]  [<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
[ 8042.771363]  [<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
[ 8042.771368]  [<ffffffff8114de68>] notify_change+0x168/0x2d0
[ 8042.771373]  [<ffffffff811346e7>] sys_fchmodat+0xd7/0x110
[ 8042.771377]  [<ffffffff815737e9>] ? _raw_spin_lock+0x9/0x10
[ 8042.771381]  [<ffffffff811345e8>] ? sys_chown+0x68/0x90
[ 8042.771384]  [<ffffffff81134733>] sys_chmod+0x13/0x20
[ 8042.771387]  [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.771390] INFO: task tar:3405 blocked for more than 120 seconds.
[ 8042.771424] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 8042.771476] tar           D 00000001000b92b6     0  3405   3401 
0x00000000
[ 8042.771480]  ffff88018e975cc8 0000000000000086 ffff88018e975cd8 
0000000000000000
[ 8042.771484]  00000000000138c0 ffff880832b98000 ffff880832b98398 
ffff88018e975fd8
[ 8042.771489]  ffff880832b983a0 00000000000138c0 ffff88018e974010 
00000000000138c0
[ 8042.771493] Call Trace:
[ 8042.771510]  [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.771528]  [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.771532]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.771550]  [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.771568]  [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.771587]  [<ffffffffa02e0dab>] ? xfs_tosspages+0x1b/0x20 [xfs]
[ 8042.771605]  [<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
[ 8042.771624]  [<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
[ 8042.771642]  [<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
[ 8042.771646]  [<ffffffff81137405>] fput+0xd5/0x280
[ 8042.771650]  [<ffffffff81133bf8>] filp_close+0x58/0x90
[ 8042.771653]  [<ffffffff81133ce9>] sys_close+0xb9/0x110
[ 8042.771656]  [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.771659] INFO: task tar:3406 blocked for more than 120 seconds.
[ 8042.771692] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 8042.771745] tar           D 00000001000b92b0     0  3406   3402 
0x00000008
[ 8042.771749]  ffff8801202c94c8 0000000000000086 0000000000000282 
ffff880800000001
[ 8042.771754]  00000000000138c0 ffff880832af2d40 ffff880832af30d8 
ffff8801202c9fd8
[ 8042.771758]  ffff880832af30e0 00000000000138c0 ffff8801202c8010 
00000000000138c0
[ 8042.771762] Call Trace:
[ 8042.771779]  [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.771798]  [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.771801]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.771819]  [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.771838]  [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.771856]  [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.771874]  [<ffffffffa02c05ed>] 
xfs_iomap_write_allocate+0x21d/0x360 [xfs]
[ 8042.771892]  [<ffffffffa02d31f1>] ? xfs_trans_unlocked_item+0x31/0x50 
[xfs]
[ 8042.771910]  [<ffffffffa02c09bf>] xfs_iomap+0x28f/0x340 [xfs]
[ 8042.771928]  [<ffffffffa02da287>] xfs_map_blocks+0x37/0x40 [xfs]
[ 8042.771946]  [<ffffffffa02ddc18>] ? _xfs_buf_lookup_pages+0x288/0x360 
[xfs]
[ 8042.771965]  [<ffffffffa02dbae0>] xfs_vm_writepage+0x3f0/0x5d0 [xfs]
[ 8042.771969]  [<ffffffff810f00d2>] __writepage+0x12/0x40
[ 8042.771972]  [<ffffffff810f1198>] write_cache_pages+0x1f8/0x490
[ 8042.771975]  [<ffffffff810f00c0>] ? __writepage+0x0/0x40
[ 8042.771993]  [<ffffffffa02bea94>] ? xfs_iflush+0x214/0x220 [xfs]
[ 8042.772011]  [<ffffffffa02dd86e>] ? xfs_bdwrite+0x3e/0x80 [xfs]
[ 8042.772030]  [<ffffffffa02d31f1>] ? xfs_trans_unlocked_item+0x31/0x50 
[xfs]
[ 8042.772034]  [<ffffffff810f144f>] generic_writepages+0x1f/0x30
[ 8042.772052]  [<ffffffffa02da8a8>] xfs_vm_writepages+0x58/0x70 [xfs]
[ 8042.772056]  [<ffffffff810f147c>] do_writepages+0x1c/0x40
[ 8042.772059]  [<ffffffff81158ddd>] writeback_single_inode+0x9d/0x260
[ 8042.772063]  [<ffffffff811591e4>] writeback_sb_inodes+0xd4/0x170
[ 8042.772066]  [<ffffffff81159324>] writeback_inodes_wb+0xa4/0x170
[ 8042.772070]  [<ffffffff810f0a60>] 
balance_dirty_pages_ratelimited_nr+0x390/0x4a0
[ 8042.772074]  [<ffffffff810e6db1>] ? 
iov_iter_copy_from_user_atomic+0x91/0x160
[ 8042.772077]  [<ffffffff810e71b3>] generic_file_buffered_write+0x1b3/0x250
[ 8042.772097]  [<ffffffffa02e09b8>] xfs_file_aio_write+0x7b8/0xa60 [xfs]
[ 8042.772101]  [<ffffffff81135582>] do_sync_write+0xd2/0x110
[ 8042.772106]  [<ffffffff81243a4e>] ? security_file_permission+0x1e/0x90
[ 8042.772109]  [<ffffffff81135878>] vfs_write+0xc8/0x190
[ 8042.772112]  [<ffffffff811361ac>] sys_write+0x4c/0x80
[ 8042.772116]  [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.772119] INFO: task tar:3408 blocked for more than 120 seconds.
[ 8042.772152] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 8042.772205] tar           D 00000001000b92b0     0  3408   3404 
0x00000000
[ 8042.772209]  ffff88082c331a98 0000000000000082 0000000000000000 
0000000000000000
[ 8042.772213]  00000000000138c0 ffff88083cca43e0 ffff88083cca4778 
ffff88082c331fd8
[ 8042.772217]  ffff88083cca4780 00000000000138c0 ffff88082c330010 
00000000000138c0
[ 8042.772221] Call Trace:
[ 8042.772239]  [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.772257]  [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.772261]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.772279]  [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.772297]  [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.772315]  [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.772334]  [<ffffffffa02d666b>] xfs_create+0x17b/0x650 [xfs]
[ 8042.772338]  [<ffffffff815737e9>] ? _raw_spin_lock+0x9/0x10
[ 8042.772356]  [<ffffffffa02e3868>] xfs_vn_mknod+0xa8/0x1c0 [xfs]
[ 8042.772375]  [<ffffffffa02e399b>] xfs_vn_create+0xb/0x10 [xfs]
[ 8042.772379]  [<ffffffff81140f97>] vfs_create+0xa7/0xd0
[ 8042.772382]  [<ffffffff811420e8>] do_last+0x658/0x720
[ 8042.772386]  [<ffffffff8114414b>] do_filp_open+0x1fb/0x650
[ 8042.772389]  [<ffffffff81243286>] ? security_d_instantiate+0x16/0x30
[ 8042.772393]  [<ffffffff8114a371>] ? d_instantiate+0x51/0x60
[ 8042.772397]  [<ffffffff8114ef92>] ? alloc_fd+0x102/0x150
[ 8042.772400]  [<ffffffff81133da3>] do_sys_open+0x63/0x120
[ 8042.772403]  [<ffffffff8100c15d>] ? math_state_restore+0x3d/0x60
[ 8042.772407]  [<ffffffff81133e8b>] sys_open+0x1b/0x20
[ 8042.772410]  [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.772413] INFO: task tar:3411 blocked for more than 120 seconds.
[ 8042.772446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 8042.772499] tar           D ffff88083d40ad40     0  3411   3407 
0x00000000
[ 8042.772503]  ffff880832545cc8 0000000000000086 ffff880832545cd8 
0000000000000004
[ 8042.772507]  00000000000138c0 ffff88083d40ad40 ffff88083d40b0d8 
ffff880832545fd8
[ 8042.772512]  ffff88083d40b0e0 00000000000138c0 ffff880832544010 
00000000000138c0
[ 8042.772516] Call Trace:
[ 8042.772533]  [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.772551]  [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.772555]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.772573]  [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.772591]  [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.772610]  [<ffffffffa02e0dab>] ? xfs_tosspages+0x1b/0x20 [xfs]
[ 8042.772628]  [<ffffffffa02d6db5>] xfs_free_eofblocks+0x175/0x270 [xfs]
[ 8042.772646]  [<ffffffffa02d782d>] xfs_release+0x10d/0x1c0 [xfs]
[ 8042.772665]  [<ffffffffa02dfa90>] xfs_file_release+0x10/0x20 [xfs]
[ 8042.772669]  [<ffffffff81137405>] fput+0xd5/0x280
[ 8042.772672]  [<ffffffff81133bf8>] filp_close+0x58/0x90
[ 8042.772675]  [<ffffffff81133ce9>] sys_close+0xb9/0x110
[ 8042.772678]  [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.772681] INFO: task tar:3412 blocked for more than 120 seconds.
[ 8042.772714] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 8042.772767] tar           D 00000001000b92ae     0  3412   3409 
0x00000000
[ 8042.772771]  ffff8802b0811a98 0000000000000082 0000000000000000 
0000000000000000
[ 8042.772775]  00000000000138c0 ffff8808400aad40 ffff8808400ab0d8 
ffff8802b0811fd8
[ 8042.772780]  ffff8808400ab0e0 00000000000138c0 ffff8802b0810010 
00000000000138c0
[ 8042.772784] Call Trace:
[ 8042.772802]  [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.772820]  [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.772824]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.772841]  [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.772860]  [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.772878]  [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.772896]  [<ffffffffa02d666b>] xfs_create+0x17b/0x650 [xfs]
[ 8042.772913]  [<ffffffffa02b8ef6>] ? xfs_iunlock+0x96/0x130 [xfs]
[ 8042.772932]  [<ffffffffa02d6bcd>] ? xfs_lookup+0x8d/0x100 [xfs]
[ 8042.772950]  [<ffffffffa02e3868>] xfs_vn_mknod+0xa8/0x1c0 [xfs]
[ 8042.772969]  [<ffffffffa02e399b>] xfs_vn_create+0xb/0x10 [xfs]
[ 8042.772972]  [<ffffffff81140f97>] vfs_create+0xa7/0xd0
[ 8042.772976]  [<ffffffff811420e8>] do_last+0x658/0x720
[ 8042.772979]  [<ffffffff8114414b>] do_filp_open+0x1fb/0x650
[ 8042.772983]  [<ffffffff815737e9>] ? _raw_spin_lock+0x9/0x10
[ 8042.772986]  [<ffffffff815737e9>] ? _raw_spin_lock+0x9/0x10
[ 8042.772990]  [<ffffffff8114ef92>] ? alloc_fd+0x102/0x150
[ 8042.772993]  [<ffffffff81133da3>] do_sys_open+0x63/0x120
[ 8042.772996]  [<ffffffff8100c15d>] ? math_state_restore+0x3d/0x60
[ 8042.773000]  [<ffffffff81133e8b>] sys_open+0x1b/0x20
[ 8042.773003]  [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
[ 8042.773006] INFO: task tar:3414 blocked for more than 120 seconds.
[ 8042.773039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 8042.773092] tar           D 00000001000b92aa     0  3414   3410 
0x00000000
[ 8042.773096]  ffff8802b0a19c08 0000000000000086 ffff88015d090800 
0000000500000000
[ 8042.773100]  00000000000138c0 ffff880832af0000 ffff880832af0398 
ffff8802b0a19fd8
[ 8042.773104]  ffff880832af03a0 00000000000138c0 ffff8802b0a18010 
00000000000138c0
[ 8042.773108] Call Trace:
[ 8042.773126]  [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
[ 8042.773144]  [<ffffffffa02d97e5>] ? kmem_zone_zalloc+0x35/0x50 [xfs]
[ 8042.773148]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
[ 8042.773166]  [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
[ 8042.773184]  [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
[ 8042.773202]  [<ffffffffa02d28b7>] ? xfs_trans_alloc+0x97/0xb0 [xfs]
[ 8042.773221]  [<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
[ 8042.773239]  [<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
[ 8042.773243]  [<ffffffff8114de68>] notify_change+0x168/0x2d0
[ 8042.773247]  [<ffffffff8115dc57>] utimes_common+0xd7/0x1b0
[ 8042.773252]  [<ffffffff81292a3d>] ? _atomic_dec_and_lock+0x4d/0x80
[ 8042.773255]  [<ffffffff8115ddc0>] do_utimes+0x90/0xf0
[ 8042.773259]  [<ffffffff8115dece>] sys_utimensat+0x2e/0x80
[ 8042.773262]  [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b


What do you think?
Is it the delaylog? Is it not stable?

Thank you


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Xfs delaylog hanged up
  2010-11-22 19:27 Spelic
@ 2010-11-22 23:29 ` Dave Chinner
  2010-11-23 11:17   ` Spelic
  2010-11-24 22:52 ` Spelic
  1 sibling, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2010-11-22 23:29 UTC (permalink / raw)
  To: Spelic; +Cc: xfs

On Mon, Nov 22, 2010 at 08:27:14PM +0100, Spelic wrote:
> Hi there
> EXCELLENT speed improvement with delaylog! My congratulations.
> Now there's a problem: it hung on me (not sure it was because of
> delaylog, but anyway...)
> I am on 2.6.37-rc2
> 
> 16-disk MD RAID-5

What is the storage hardware and the MD raid5 configuration?

> mkfs line
>        mkfs.xfs -l size=128m -d agcount=16 <device>
> (then lvm2, all aligned to stripes)
> mount line:
>        /dev/mapper/perftestvg-xfslv /perftest/xfs xfs rw,relatime,attr2,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,noquota
> 0 0

FYI, relatime, attr2, logbufs=8 and noquota are default values that
you don't need to specify.
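
With those defaults dropped, the reported mount options reduce to an
fstab-style entry like this (device and mount point copied from the
report; just a sketch of the equivalent minimal line):

```
/dev/mapper/perftestvg-xfslv  /perftest/xfs  xfs  delaylog,nobarrier,inode64,logbsize=256k  0 0
```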

> deadline scheduler, nr_requests 1024
> 
> I was doing a parallel-kernel-unpack benchmark:
> 
>     /perftest/xfs# mkdir dir{1,2,3,4,5,6,7,8,9,10}
>     /perftest/xfs# for i in {1..10} ; do time tar -jxf
> linux-2.6.37-rc2.tar.bz2 -C dir$i & done ; echo waiting now ; time
> wait; echo syncing now ; time sync

Thanks for the test case - I'll try to reproduce this. One more key
piece of information I need: the output of mkfs.xfs so I can see your
filesystem geometry. If you have the filesystem mounted, then the
output of 'xfs_info <mntpt>' will give the same information.

>     Iostat showed everything totally still.
>     The computer was otherwise responsive (the OS is on other filesystems).
>     Look at the stack traces from the 5 minutes they were hung:
....
> # cat /proc/3403/stack
>     [<ffffffffa02c607b>] xlog_grant_log_space+0x19b/0x5e0 [xfs]
>     [<ffffffffa02c65be>] xfs_log_reserve+0xfe/0x160 [xfs]
>     [<ffffffffa02d25cb>] xfs_trans_reserve+0x9b/0x210 [xfs]
>     [<ffffffffa02d80fa>] xfs_setattr+0x81a/0xa10 [xfs]
>     [<ffffffffa02e3df6>] xfs_vn_setattr+0x16/0x20 [xfs]
>     [<ffffffff8114de68>] notify_change+0x168/0x2d0
>     [<ffffffff811346e7>] sys_fchmodat+0xd7/0x110
>     [<ffffffff81134733>] sys_chmod+0x13/0x20
>     [<ffffffff8100af42>] system_call_fastpath+0x16/0x1b
>     [<ffffffffffffffff>] 0xffffffffffffffff

Ok, so it is stuck with a full log....

> dmesg:
> [ 8042.770130] xfsaild/dm-6  D 00000001000b92b8     0  2979      2
> 0x00000000
> [ 8042.770135]  ffff88082c11ba90 0000000000000046 ffff88083140de08
> 0000000000000000
> [ 8042.770140]  00000000000138c0 ffff88083dfa5a80 ffff88083dfa5e18
> ffff88082c11bfd8
> [ 8042.770144]  ffff88083dfa5e20 00000000000138c0 ffff88082c11a010
> 00000000000138c0
> [ 8042.770148] Call Trace:
> [ 8042.770187]  [<ffffffffa02c4a8f>]
> xlog_state_get_iclog_space+0xdf/0x2c0 [xfs]
> [ 8042.770205]  [<ffffffffa02c422a>] ? xlog_sync+0x1fa/0x3d0 [xfs]
> [ 8042.770213]  [<ffffffff810538d0>] ? default_wake_function+0x0/0x10
> [ 8042.770231]  [<ffffffffa02c5708>] xlog_write+0x198/0x6c0 [xfs]
> [ 8042.770249]  [<ffffffffa02c6fd2>] xlog_cil_push+0x252/0x3b0 [xfs]
> [ 8042.770267]  [<ffffffffa02c7636>] xlog_cil_force_lsn+0x116/0x120 [xfs]
> [ 8042.770286]  [<ffffffffa02dd3dc>] ? xfs_buf_delwri_queue+0xcc/0x150 [xfs]
> [ 8042.770291]  [<ffffffff81033209>] ? default_spin_lock_flags+0x9/0x10
> [ 8042.770296]  [<ffffffff810808d2>] ? down_trylock+0x32/0x50
> [ 8042.770314]  [<ffffffffa02c4543>] _xfs_log_force+0x63/0x230 [xfs]
> [ 8042.770332]  [<ffffffffa02c4933>] xfs_log_force+0x13/0x40 [xfs]
> [ 8042.770351]  [<ffffffffa02d38e4>] xfsaild_push+0x3b4/0x400 [xfs]
> [ 8042.770357]  [<ffffffff81571c26>] ? schedule_timeout+0x196/0x2f0
> [ 8042.770377]  [<ffffffffa02e67a4>] xfsaild+0x74/0xb0 [xfs]
> [ 8042.770395]  [<ffffffffa02e6730>] ? xfsaild+0x0/0xb0 [xfs]
> [ 8042.770400]  [<ffffffff8107b076>] kthread+0x96/0xa0
> [ 8042.770404]  [<ffffffff8100bd64>] kernel_thread_helper+0x4/0x10
> [ 8042.770408]  [<ffffffff8107afe0>] ? kthread+0x0/0xa0
> [ 8042.770411]  [<ffffffff8100bd60>] ? kernel_thread_helper+0x0/0x10

That's the key thread that frees up space in the log by writing out
metadata buffers, but it looks to be stuck trying to force the log
out: it's waiting for IO on a log buffer to complete.

> What do you think?

I'll need to reproduce this to determine the cause.

> Is it the delaylog?

No idea yet.

> is it not stable?

It is stable enough that all my local machines are running with
delaylog. And as a dev team, we have enough confidence in it to
remove the EXPERIMENTAL status for the 2.6.37 release. That doesn't
mean it is perfectly bug free - it just means more people will start
to use it.  And as this happens, we will find new problems in
different environments (just like you have), but this is a pretty
normal occurrence for new code.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Xfs delaylog hanged up
  2010-11-22 23:29 ` Dave Chinner
@ 2010-11-23 11:17   ` Spelic
  2010-11-23 13:28     ` Spelic
  2010-11-23 20:46     ` Dave Chinner
  0 siblings, 2 replies; 21+ messages in thread
From: Spelic @ 2010-11-23 11:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 11/23/2010 12:29 AM, Dave Chinner wrote:
>> 16 disk MD raid-5
>>      
> What is the storage hardware and the MD raid5 configuration?
>    

Tyan motherboard with 5400 chipset
dual Xeon E5420

16 disks on this one:
05:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
         Subsystem: 3ware Inc 9650SE SATA-II RAID PCIe
         Flags: bus master, fast devsel, latency 0, IRQ 16
         Memory at ce000000 (64-bit, prefetchable) [size=32M]
         Memory at d2600000 (64-bit, non-prefetchable) [size=4K]
         I/O ports at 3000 [size=256]
         [virtual] Expansion ROM at d26e0000 [disabled] [size=128K]
         Capabilities: <access denied>
         Kernel driver in use: 3w-9xxx
         Kernel modules: 3w-9xxx

mdadm --create /dev/md5 --raid-devices=16 --chunk=1024 --level=5 
--metadata=1.0 pci-0000:05:00.0-scsi-0:0:{15..0}:0 
--name=backup:perftest --assume-clean

>> mount line:
>>         /dev/mapper/perftestvg-xfslv /perftest/xfs xfs rw,relatime,attr2,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,noquota
>> 0 0
>>      
> FYI, relatime, attr2, logbufs=8 and noquota are default values that
> you don't need to specify.
>    

Ok, that was the /proc/mounts line...

mkfs.xfs -l size=128m -d agcount=16 <device>

mount -o delaylog,nobarrier,inode64,logbufs=8,logbsize=256k <device> 
<mountpoint>


> Thanks for the test case - I'll try to reproduce this. One more key
> piece of information I need: the output of mkfs.xfs so I can see your
> filesystem geometry. if you have the filesystem mounted, then the
> output of 'xfs_info<mntpt>' will give the same information.
>    

/# xfs_info /perftest/xfs/
meta-data=/dev/mapper/perftestvg-xfslv isize=256    agcount=16, 
agsize=50331648 blks
          =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=805306368, imaxpct=5
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

> It is stable enough that all my local machines are running with
> delaylog. And as a dev team, we have enough confidence in it to
> remove the EXPERIMENTAL status for the 2.6.37 release. That doesn't
> mean it is perfectly bug free - it just means more people will start
> to use it.  And as this happens, we will find new problems in
> different environments (just like you have), but this is a pretty
> normal occurrence for new code.
>    

ok I understand...

thanks for your work!



* Re: Xfs delaylog hanged up
  2010-11-23 11:17   ` Spelic
@ 2010-11-23 13:28     ` Spelic
  2010-11-23 20:46     ` Dave Chinner
  1 sibling, 0 replies; 21+ messages in thread
From: Spelic @ 2010-11-23 13:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 11/23/2010 12:17 PM, Spelic wrote:
> mdadm --create /dev/md5 --raid-devices=16 --chunk=1024 --level=5 
> --metadata=1.0 pci-0000:05:00.0-scsi-0:0:{15..0}:0 
> --name=backup:perftest --assume-clean

I forgot one setting: md/stripe_cache_size was set to 32768.
I also forgot to check stripe_cache_active during the hang, but I don't
think it was an MD bug.
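
For reference, that setting goes through sysfs; a sketch, assuming the
array is /dev/md5 as in the mdadm line quoted above:

```shell
# Raise the md raid5 stripe cache from the default (needs root; the md5
# path is assumed from the array created earlier in this thread).
echo 32768 > /sys/block/md5/md/stripe_cache_size
```
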
Regards


* Re: Xfs delaylog hanged up
  2010-11-23 11:17   ` Spelic
  2010-11-23 13:28     ` Spelic
@ 2010-11-23 20:46     ` Dave Chinner
  2010-11-23 22:14       ` Stan Hoeppner
                         ` (3 more replies)
  1 sibling, 4 replies; 21+ messages in thread
From: Dave Chinner @ 2010-11-23 20:46 UTC (permalink / raw)
  To: Spelic; +Cc: xfs

On Tue, Nov 23, 2010 at 12:17:41PM +0100, Spelic wrote:
> On 11/23/2010 12:29 AM, Dave Chinner wrote:
> >>16 disk MD raid-5
> >What is the storage hardware and the MD raid5 configuration?
> 
> Tyan motherboard with 5400 chipset
> dual Xeon E5420
> 
> 16 disks on this one:
> 05:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
>         Subsystem: 3ware Inc 9650SE SATA-II RAID PCIe
>         Flags: bus master, fast devsel, latency 0, IRQ 16
>         Memory at ce000000 (64-bit, prefetchable) [size=32M]
>         Memory at d2600000 (64-bit, non-prefetchable) [size=4K]
>         I/O ports at 3000 [size=256]
>         [virtual] Expansion ROM at d26e0000 [disabled] [size=128K]
>         Capabilities: <access denied>
>         Kernel driver in use: 3w-9xxx
>         Kernel modules: 3w-9xxx

Hmmmm. We get plenty of reports about problems with 3ware RAID
controllers, many of which are RAID controller problems. Can you
make sure you are running the latest firmware on the controller?

> /# xfs_info /perftest/xfs/
> meta-data=/dev/mapper/perftestvg-xfslv isize=256    agcount=16,
> agsize=50331648 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=805306368, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none                   extsz=4096   blocks=0, rtextents=0

Nothing unusual there.

I've been unable to reproduce the problem with your test case (been
running over night) on a 12-disk, 16TB dm RAID0 array, but I'll keep
trying to reproduce it for a while. I note that the load is
generating close to 10,000 iops on my test system, so it may very
well be triggering load related problems in your raid controller...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Xfs delaylog hanged up
  2010-11-23 20:46     ` Dave Chinner
@ 2010-11-23 22:14       ` Stan Hoeppner
  2010-11-24  0:20         ` Dave Chinner
  2010-11-23 22:48       ` Emmanuel Florac
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Stan Hoeppner @ 2010-11-23 22:14 UTC (permalink / raw)
  To: xfs

Dave Chinner put forth on 11/23/2010 2:46 PM:

> I've been unable to reproduce the problem with your test case (been
> running over night) on a 12-disk, 16TB dm RAID0 array, but I'll keep
> trying to reproduce it for a while. I note that the load is
> generating close to 10,000 iops on my test system, so it may very
> well be triggering load related problems in your raid controller...

Somewhat off topic, but how are you generating 10,000 IOPS by carving a
16TB LUN/volume from 12 x 2TB SATA disk spindles?  Such drives aren't
even capable of 200 seeks per second.  Even if they were you'd top out
at less than 2,500 IOPS (random).  16TB/12=1.33 TB per disk.  No such
capacity disk exists.  So I assume you're using 12 x 2TB disks and
slicing/dicing out 16TB.  What am I missing Dave?

-- 
Stan


* Re: Xfs delaylog hanged up
  2010-11-23 20:46     ` Dave Chinner
  2010-11-23 22:14       ` Stan Hoeppner
@ 2010-11-23 22:48       ` Emmanuel Florac
  2010-11-24  0:36         ` Spelic
  2010-11-24  0:58       ` Spelic
  2010-11-25 23:34       ` Spelic
  3 siblings, 1 reply; 21+ messages in thread
From: Emmanuel Florac @ 2010-11-23 22:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, Spelic

On Wed, 24 Nov 2010 07:46:09 +1100, you wrote:

> Hmmmm. We get plenty of reports about problems with 3ware RAID
> controllers, many of which are RAID controller problems. Can you
> make sure you are running the latest firmware on the controller?

So Spelic, you're using the 3Ware as a SATA controller, but not a RAID
controller? This is quite unexpected, though going with md
instead provides better IOPS. 

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: Xfs delaylog hanged up
  2010-11-23 22:14       ` Stan Hoeppner
@ 2010-11-24  0:20         ` Dave Chinner
  2010-11-24 13:12           ` Spelic
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2010-11-24  0:20 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Tue, Nov 23, 2010 at 04:14:16PM -0600, Stan Hoeppner wrote:
> Dave Chinner put forth on 11/23/2010 2:46 PM:
> 
> > I've been unable to reproduce the problem with your test case (been
> > running over night) on a 12-disk, 16TB dm RAID0 array, but I'll keep
> > trying to reproduce it for a while. I note that the load is
> > generating close to 10,000 iops on my test system, so it may very
> > well be triggering load related problems in your raid controller...
> 
> Somewhat off topic, but how are you generating 10,000 IOPS by carving a
> 16TB LUN/volume from 12 x 2TB SATA disk spindles?  Such drives aren't
> even capable of 200 seeks per second.  Even if they were you'd top out
> at less than 2,500 IOPS (random).  16TB/12=1.33 TB per disk.  No such
> capacity disk exists.  So I assume you're using 12 x 2TB disks and
> slicing/dicing out 16TB.  What am I missing Dave?

512MB of BBWC backing the disks. The BBWC does a much better job of
reordering out-of-order writes than the Linux elevators because
512MB is a much bigger window than a couple of thousand 4k IOs.
Hence metadata/small file intensive workloads go much faster than
you'd expect from just looking at the IO patterns and the capability
of the disks.

IOWs, for write workloads that are not purely random, the disk
subsystem behaves more like an SSD than a RAID0 array of spinning
rust...
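
To put a number on the window difference:

```shell
# 512MB of BBWC expressed as 4k IOs, vs. the nr_requests=1024 queue
# depth mentioned earlier in the thread.
echo $(( 512 * 1024 / 4 ))   # 131072 4k IOs fit in 512MB of BBWC
echo 1024                    # nr_requests used on the reporter's box
```
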

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Xfs delaylog hanged up
  2010-11-23 22:48       ` Emmanuel Florac
@ 2010-11-24  0:36         ` Spelic
  2010-11-24  1:40           ` Stan Hoeppner
                             ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Spelic @ 2010-11-24  0:36 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

On 11/23/2010 11:48 PM, Emmanuel Florac wrote:
> So Spelic, you're using the 3Ware as a SATA controller, but not a RAID
> controller? This is quite unexpected, though going with md
> instead provides better IOPS.
>    

Yep, because I know MD better: I can see the code, it's more flexible
and customizable, and it can be controlled with scripts more easily. Also,
parameters of the underlying drives can be controlled if I use MD, and
it's easier to align the filesystem to stripes. Also, my time is better
spent learning it, as MD won't change the next time I buy a different hw
controller.
What would be the advantage of using hardware raid?


* Re: Xfs delaylog hanged up
  2010-11-23 20:46     ` Dave Chinner
  2010-11-23 22:14       ` Stan Hoeppner
  2010-11-23 22:48       ` Emmanuel Florac
@ 2010-11-24  0:58       ` Spelic
  2010-11-24  5:44         ` Dave Chinner
  2010-11-25 23:34       ` Spelic
  3 siblings, 1 reply; 21+ messages in thread
From: Spelic @ 2010-11-24  0:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 11/23/2010 09:46 PM, Dave Chinner wrote:
> Hmmmm. We get plenty of reports about problems with 3ware RAID
> controllers, many of which are RAID controller problems. Can you
> make sure you are running the latest firmware on the controller?
>    

No, sorry, my firmware is: FE9X 4.06.00.004

But when controllers hang, there is usually something in dmesg, and in 
my case there wasn't. Then after a while it resets (it has something 
like a watchdog in it).
In the past during testing I did have reproducible hangups on high load 
with these controllers (seemed like a lost interrupt), but they were 
fixed by disabling NCQ.
The controller would reset in those cases, drives caches would reset to 
"off", and there were entries in dmesg.
But that issue was definitely fixed by disabling NCQ: I tested many 
times with and without NCQ with reproducible results; and after that we 
had reliable operation for more than 1 year on that machine.

> I've been unable to reproduce the problem with your test case (been
> running over night) on a 12-disk, 16TB dm RAID0 array, but I'll keep
> trying to reproduce it for a while.

It seems to me that a 12 disk dm raid0 is quite different from a 16
disk md raid5 array, because you don't have the stripe cache and there
are likely to be fewer in-flight operations; if it was a pool of
something which was drained, you might not hit it... But I understand
that you had the raid0 array already up :-D I'll see if I can reproduce
this, but I can't guarantee it: the machine should go back to
production very soon.
If I hit it again, what should I look at?


* Re: Xfs delaylog hanged up
  2010-11-24  0:36         ` Spelic
@ 2010-11-24  1:40           ` Stan Hoeppner
  2010-11-24  6:18           ` Michael Monnerie
  2010-11-24  7:44           ` Emmanuel Florac
  2 siblings, 0 replies; 21+ messages in thread
From: Stan Hoeppner @ 2010-11-24  1:40 UTC (permalink / raw)
  To: xfs

Spelic put forth on 11/23/2010 6:36 PM:
> On 11/23/2010 11:48 PM, Emmanuel Florac wrote:
>> So Spelic, you're using the 3Ware as a SATA controller, but not a RAID
>> controller? This is quite unexpected, though going with md
>> instead provides better IOPS.
>>    
> 
> Yep, because I know MD better, I can see the code, it's more flexible
> and customizable, can be controlled with scripts more easily... Also
> parameters to the underlying drives can be controlled if I use MD, and
> it's easier to align filesystem to stripes. Also my time is better spent
> learning it as MD won't change the next time I buy a different hw
> controller.
> What would be the advantage of using hardware raid?

If there is no advantage then why buy an $800+ real hardware RAID card
and then deploy it as a vanilla SAS/SATA HBA with software RAID?  What a
waste of a decent RAID card...

You could have gone with two of these instead
http://www.newegg.com/Product/Product.aspx?Item=N82E16816117157

saving ~$500 and having similar if not slightly better performance with
software RAID.

-- 
Stan


* Re: Xfs delaylog hanged up
@ 2010-11-24  4:03 Richard Scobie
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Scobie @ 2010-11-24  4:03 UTC (permalink / raw)
  To: xfs

Spelic wrote:

 > Yep, because I know MD better, I can see the code, it's more flexible 
 > and customizable, can be controlled with scripts more easily... Also
 > parameters to the underlying drives can be controlled if I use MD,
 > and it's easier to align filesystem to stripes. Also my time is better

Slightly off topic, but your previously mentioned xfs_info shows sunit 
and swidth values of zero, so your filesystem is not aligned to stripes.

/# xfs_info /perftest/xfs/
 > meta-data=/dev/mapper/perftestvg-xfslv isize=256    agcount=16,
 > agsize=50331648 blks
 >          =                       sectsz=512   attr=2
 > data     =                       bsize=4096   blocks=805306368, imaxpct=5
 >          =                       sunit=0      swidth=0 blks
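
A sketch of passing the geometry explicitly at mkfs time, assuming the
md layout quoted earlier in the thread (1024k chunk, 16-drive raid5,
i.e. 15 data spindles); the device name is hypothetical:

```shell
# Derive su/sw from the md geometry (chunk=1024k, raid5 over 16 disks
# means 15 data spindles) and print the corresponding mkfs.xfs line.
chunk_kb=1024
ndata=$((16 - 1))   # raid5 spends one disk's worth of space on parity
echo "mkfs.xfs -l size=128m -d su=${chunk_kb}k,sw=${ndata} /dev/md5"
```
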

Regards,

Richard


* Re: Xfs delaylog hanged up
  2010-11-24  0:58       ` Spelic
@ 2010-11-24  5:44         ` Dave Chinner
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2010-11-24  5:44 UTC (permalink / raw)
  To: Spelic; +Cc: xfs

On Wed, Nov 24, 2010 at 01:58:25AM +0100, Spelic wrote:
> On 11/23/2010 09:46 PM, Dave Chinner wrote:
> >Hmmmm. We get plenty of reports about problems with 3ware RAID
> >controllers, many of which are RAID controller problems. Can you
> >make sure you are running the latest firmware on the controller?
> 
> No, sorry, my firmware is: FE9X 4.06.00.004
> 
> But when controllers hang, there is usually something in dmesg, and
> in my case there wasn't. Then after a while it resets (it has
> something like a watchdog in it).
> In the past during testing I did have reproducible hangups on high
> load with these controllers (seemed like a lost interrupt), but they
> were fixed by disabling NCQ.
> The controller would reset in those cases, drives caches would reset
> to "off", and there were entries in dmesg.
> But that issue was definitely fixed by disabling NCQ: I tested many
> times with and without NCQ with reproducible results; and after that
> we had reliable operation for more than 1 year on that machine.
> 
> >I've been unable to reproduce the problem with your test case (been
> >running over night) on a 12-disk, 16TB dm RAID0 array, but I'll keep
> >trying to reproduce it for a while.
> 
> It seems to me that 12 disk raid0 dm is quite different from 16 disk
> md raid5 array because you don't have the stripe cache and there are
> likely to be fewer in-flight operations, if it was a pool of
> something which was drained you might not hit it...

But if that is the cause, then it would indicate an MD problem
rather than an XFS problem. Testing on a similar but slightly
different configuration helps isolate where the problem may lie.

As it is, it is probably caused by the same bug as the problem Nick
reported. See my last post in the thread "XFS performance oddity"
for the patch that fixes the hang Nick reported.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Xfs delaylog hanged up
  2010-11-24  0:36         ` Spelic
  2010-11-24  1:40           ` Stan Hoeppner
@ 2010-11-24  6:18           ` Michael Monnerie
  2010-11-24  7:44           ` Emmanuel Florac
  2 siblings, 0 replies; 21+ messages in thread
From: Michael Monnerie @ 2010-11-24  6:18 UTC (permalink / raw)
  To: xfs



On Mittwoch, 24. November 2010 Spelic wrote:
> What would be the advantage of using hardware raid?

Better broken drive detection (hw raid controllers are more picky than 
linux), server CPU offload, out-of-band notification of problems, and 
broken disks don't lead to a non-bootable system - what if your boot 
disk crashes?

-- 
With kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531

// ****** Radio interview on the topic of spam ******
// http://www.it-podcast.at/archiv.html#podcast-100716
// 
// House for sale: http://zmi.at/langegg/


* Re: Xfs delaylog hanged up
  2010-11-24  0:36         ` Spelic
  2010-11-24  1:40           ` Stan Hoeppner
  2010-11-24  6:18           ` Michael Monnerie
@ 2010-11-24  7:44           ` Emmanuel Florac
  2 siblings, 0 replies; 21+ messages in thread
From: Emmanuel Florac @ 2010-11-24  7:44 UTC (permalink / raw)
  To: Spelic; +Cc: xfs

On Wed, 24 Nov 2010 01:36:37 +0100, you wrote:

> What would be the advantage of using hardware raid?

More predictable performance under heavy load (nfs kernel server,
typically). BTW the 3Ware is not a very good sata controller; I've found
that it is actually one of the worst of its class when it comes to
IOPS, the Areca 16xx, LSI Megaraid and Adaptec 5xx5 all perform better
as simple controllers in combination with MD (tested with 8 SSDs).

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: Xfs delaylog hanged up
  2010-11-24  0:20         ` Dave Chinner
@ 2010-11-24 13:12           ` Spelic
  2010-11-24 21:50             ` Dave Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Spelic @ 2010-11-24 13:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 11/24/2010 01:20 AM, Dave Chinner wrote:
>
> 512MB of BBWC backing the disks. The BBWC does a much better job of
> reordering out-of-order writes than the Linux elevators because
> 512MB is a much bigger window than a couple of thousand 4k IOs.
>    

Hmmm, very interesting...
so you are using an MD or DM raid-0 above a SATA controller with a BBWC?
That would probably be a RAID controller used as SATA, because I have
never seen SATA controllers with a BBWC. I'd be interested in the brand
if you don't mind.


Also I wanted to know... are the requests to the drives really only 4K
in size on Linux? Then what purpose do the elevators' merges serve? When
the elevator merges two 4k requests, doesn't it create an 8k request for
the drive?


Also look at this competitor's link:
http://thunk.org/tytso/blog/2010/11/01/i-have-the-money-shot-for-my-lca-presentation/
post #9
these scalability patches submit larger i/o than 4k. I can confirm that
from within iostat -x 1 (I can't understand what he means by
"bypasses the buffer cache layer" though; does it mean it's only for
DIRECTIO? it does not seem to me). When such large requests go into the
elevator, are they broken up into 4K requests again?


Thank you


* Re: Xfs delaylog hanged up
  2010-11-24 13:12           ` Spelic
@ 2010-11-24 21:50             ` Dave Chinner
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2010-11-24 21:50 UTC (permalink / raw)
  To: Spelic; +Cc: xfs

On Wed, Nov 24, 2010 at 02:12:32PM +0100, Spelic wrote:
> On 11/24/2010 01:20 AM, Dave Chinner wrote:
> >
> >512MB of BBWC backing the disks. The BBWC does a much better job of
> >reordering out-of-order writes than the Linux elevators because
> >512MB is a much bigger window than a couple of thousand 4k IOs.
> 
> Hmmm very interesting...
> so you are using a MD or DM raid-0 above a SATA controller with a BBWC?
> That would probably be a RAID controller used as SATA because I have
> never seen SATA controllers with a BBWC. I'd be interested in the
> brand if you don't mind.

Actually, it's a SAS RAID controller:

03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

With each disk exported as a RAID0 lun because the raid controller
does not do JBOD.

> Also I wanted to know... the requests to the drives are really only
> 4K in size for linux?

No, they can be much larger than that.  However, for small file
workloads the IO size is determined mostly by the file size.

> Then what purpose do the elevators' merges
> have? When the elevator merges two 4k requests doesn't it create an
> 8k request for the drive?

Yes. But when the two 4k blocks are not adjacent, they can't be
merged and hence are two IOs. And if the block that separated them
is then written 5ms after the other two completed, it's three IOs
that get combined in the BBWC into one...
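
A toy illustration of that adjacency rule (just the idea, not the
elevator code):

```shell
# Sort the pending 4k block numbers and coalesce runs of adjacent
# blocks into single extents: blocks 100, 101 and 102 merge into one
# request, while block 200 stays on its own.
printf '102\n100\n200\n101\n' | sort -n | awk '
  NR == 1        { start = $1; prev = $1; next }
  $1 == prev + 1 { prev = $1; next }
                 { print start "-" prev; start = $1; prev = $1 }
  END            { print start "-" prev }'
```

which prints the two resulting extents, "100-102" and "200-200".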

> Also look at this competitor's link:
> http://thunk.org/tytso/blog/2010/11/01/i-have-the-money-shot-for-my-lca-presentation/
> post #9
> these scalability patches submit larger i/o than 4k. I can confirm
> that from within iostat -x 1  (I can't understand what he means with
> "bypasses the buffer cache layer" though, does it mean it's only for
> DIRECTIO? it does not seem to me).

It means he's calling submit_bio() rather than submit_bh(). Most of
that "new" code in ext4 was copied directly from XFS - XFS has been
using submit_bio() for large IO submission since roughly 2.6.15.

> When such large requests go into
> the elevator, are they broken up into 4K requests again?

No, the opposite used to happen - ext4 would break large contiguous
regions into 4k IOs (because submit_bh could only handle one block at
a time), and then the elevator would re-merge them into a large IO.
The issue here is CPU overhead of merging thousands of blocks
unnecessarily.

-- 
Dave Chinner
david@fromorbit.com


* Re: Xfs delaylog hanged up
  2010-11-22 19:27 Spelic
  2010-11-22 23:29 ` Dave Chinner
@ 2010-11-24 22:52 ` Spelic
  2010-11-26 22:43   ` Spelic
  1 sibling, 1 reply; 21+ messages in thread
From: Spelic @ 2010-11-24 22:52 UTC (permalink / raw)
  To: xfs

On 11/22/2010 08:27 PM, Spelic wrote:
> ....
> I was doing a parallel-kernel-unpack benchmark:
> ....

I tried to reproduce it today but unfortunately I wasn't able to, not 
even on the same hardware.

The first time, it happened so soon after starting the
parallel-kernel-unpack benchmark (at the second run!) that I was
confident it was possible to reproduce it easily, but today in many
hours it never happened again. Sorry. Unfortunately the machine is going
back to production soon, so I'm not sure I can do further testing.


* Re: Xfs delaylog hanged up
  2010-11-23 20:46     ` Dave Chinner
                         ` (2 preceding siblings ...)
  2010-11-24  0:58       ` Spelic
@ 2010-11-25 23:34       ` Spelic
  2010-11-26  4:20         ` Dave Chinner
  3 siblings, 1 reply; 21+ messages in thread
From: Spelic @ 2010-11-25 23:34 UTC (permalink / raw)
  To: xfs

On 11/23/2010 09:46 PM, Dave Chinner wrote:
> ...
> I note that the load is
> generating close to 10,000 iops on my test system, so it may very
> well be triggering load related problems in your raid controller...
>    

Dave thanks for all explanations on the BBWC,

I wanted to ask how you can measure that it's 10,000 IOPS with that
workload. Is it by iostat -x ?

If yes, which cell exactly do you look at, and what period do you use
for averaging values? I also can sometimes see values up to around 10000
in the cell "w/s" that corresponds to my MD RAID array (currently a 16
disk raid-5 with XFS delaylog), if I use

iostat -x 10      (this averages write IOPS over 10 seconds, I think)

but only for a few shots of iostat, not for the whole run of the
"benchmark". Do you mean you have 10000 averaged over the whole benchmark?
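
For what it's worth, the averaging I'm doing by eye could be scripted;
a sketch, with printf faking three w/s samples for the md5 device
(real use would pipe iostat -x 10 into the awk instead):

```shell
# Average the w/s column across samples for one device; the printf
# merely stands in for real iostat -x output here.
printf 'md5 9800\nmd5 4200\nmd5 7000\n' |
awk '/^md5/ { sum += $2; n++ } END { printf "%.0f\n", sum / n }'
```

which would print 7000 for these made-up samples.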

Also I'm curious, do you remember how much time does it take to complete 
one run (10 parallel tar unpacks) on your 12-disk raid0 + BBWC?

Probably a better test would exclude the bunzip2 part from the
benchmark, like the following, but it probably won't make more than a
10sec difference:

/perftest/xfs# bzcat linux-2.6.37-rc2.tar.bz2 > linux-2.6.37-rc2.tar
/perftest/xfs# mkdir dir{1,2,3,4,5,6,7,8,9,10}
/perftest/xfs# for i in {1..10} ; do time tar -xf linux-2.6.37-rc2.tar 
-C dir$i & done ; echo waiting now ; time wait; echo syncing now ; time 
sync

Thanks for all explanations


* Re: Xfs delaylog hanged up
  2010-11-25 23:34       ` Spelic
@ 2010-11-26  4:20         ` Dave Chinner
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2010-11-26  4:20 UTC (permalink / raw)
  To: Spelic; +Cc: xfs

On Fri, Nov 26, 2010 at 12:34:13AM +0100, Spelic wrote:
> On 11/23/2010 09:46 PM, Dave Chinner wrote:
> >...
> >I note that the load is
> >generating close to 10,000 iops on my test system, so it may very
> >well be triggering load related problems in your raid controller...
> 
> Dave thanks for all explanations on the BBWC,
> 
> I wanted to ask how you measure that it's 10,000 IOPS with that
> workload. Is it with iostat -x ?

http://marc.info/?l=linux-fsdevel&m=129013629728687&w=2

> but only for a few snapshots of iostat, not for the whole run of the
> "benchmark". Do you mean you see 10000 averaged over the whole
> benchmark?

It peaked at over 10,000 iops, the lowest rate was ~4000 iops and the
average would have been around 7000 iops.

> Also, I'm curious: do you remember how long it takes to
> complete one run (10 parallel tar unpacks) on your 12-disk raid0 +
> BBWC?

33 seconds, with it being limited by the decompression rate (i.e. CPU
bound).

> A better test would probably exclude the bunzip2 part from the
> benchmark, like the following, though it probably won't make more
> than 10sec of difference:
> 
> /perftest/xfs# bzcat linux-2.6.37-rc2.tar.bz2 > linux-2.6.37-rc2.tar
> /perftest/xfs# mkdir dir{1,2,3,4,5,6,7,8,9,10}
> /perftest/xfs# for i in {1..10} ; do time tar -xf
> linux-2.6.37-rc2.tar -C dir$i & done ; echo waiting now ; time wait;
> echo syncing now ; time sync

I'm currently running QA tests on my test machine, so I
don't have a direct comparison with the above numbers for you.

However, my workstation has a pair of 120GB sandforce 1200 SSDs in
RAID0 running 2.6.37-rc1 w/ delaylog and the results are 40s for the
compressed tarball and 16s for the uncompressed tarball.

The uncompressed tarball run had lower IOPS and much higher
bandwidth, as much more merging was being done in the IO elevator
compared to the compressed tarball run...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Xfs delaylog hanged up
  2010-11-24 22:52 ` Spelic
@ 2010-11-26 22:43   ` Spelic
  0 siblings, 0 replies; 21+ messages in thread
From: Spelic @ 2010-11-26 22:43 UTC (permalink / raw)
  To: xfs

On 11/24/2010 11:52 PM, Spelic wrote:
> On 11/22/2010 08:27 PM, Spelic wrote:
>> ....
>> I was doing a parallel-kernel-unpack benchmark:
>> ....
>
> I tried to reproduce it today but unfortunately I wasn't able to, not 
> even on the same hardware.
>
> The first time it happened so soon after starting the 
> parallel-kernel-unpack benchmark (at the second run!) that I was 
> confident it would be easy to reproduce, but today, after many hours 
> of trying, it never happened again. Sorry. Unfortunately the machine 
> is going back to production soon, so I'm not sure I can do further 
> testing.


I had another, smaller hangup, and unfortunately I wasn't paying 
attention. I just noticed that among a zillion runs of the "benchmark" 
taking 1 minute (for the part before the sync), one took 2m30s for the 
same part. The system was absolutely idle except for that test, so I 
can't really see an explanation other than it being another, smaller 
hangup...

The good news is that it definitely came out of the hangup by itself: 
this time I didn't press Ctrl-C. So even if it was an XFS bug, it 
doesn't seem fatal.

The dmesg showed nothing this time, probably because the hangup was 
shorter than 120 seconds, so the watchdog didn't notice anything.
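For what it's worth, the 120-second threshold is the default of the 
kernel's hung-task watchdog; assuming the kernel was built with 
CONFIG_DETECT_HUNG_TASK, it can be lowered to catch shorter stalls. A 
sketch (the 30-second value is just illustrative):

```shell
# Report tasks blocked in uninterruptible (D) state for more than 30s
# instead of the default 120s (requires CONFIG_DETECT_HUNG_TASK):
echo 30 > /proc/sys/kernel/hung_task_timeout_secs

# Equivalently, via sysctl:
sysctl -w kernel.hung_task_timeout_secs=30
```

With a lower threshold, a 2m30s stall like the one above should leave 
blocked-task backtraces in dmesg.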


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-11-26 22:40 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-24  4:03 Xfs delaylog hanged up Richard Scobie
  -- strict thread matches above, loose matches on Subject: below --
2010-11-22 19:27 Spelic
2010-11-22 23:29 ` Dave Chinner
2010-11-23 11:17   ` Spelic
2010-11-23 13:28     ` Spelic
2010-11-23 20:46     ` Dave Chinner
2010-11-23 22:14       ` Stan Hoeppner
2010-11-24  0:20         ` Dave Chinner
2010-11-24 13:12           ` Spelic
2010-11-24 21:50             ` Dave Chinner
2010-11-23 22:48       ` Emmanuel Florac
2010-11-24  0:36         ` Spelic
2010-11-24  1:40           ` Stan Hoeppner
2010-11-24  6:18           ` Michael Monnerie
2010-11-24  7:44           ` Emmanuel Florac
2010-11-24  0:58       ` Spelic
2010-11-24  5:44         ` Dave Chinner
2010-11-25 23:34       ` Spelic
2010-11-26  4:20         ` Dave Chinner
2010-11-24 22:52 ` Spelic
2010-11-26 22:43   ` Spelic

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox