* BUG: ext3 hang in transaction commit
From: David Chinner @ 2008-03-26 23:16 UTC
To: lkml
ia32 XFS QA machine, ext3 root on a raw partition. 2.6.25-rc3.
kjournald hung in journal_commit_transaction():
Stack traceback for pid 2046
0xf6e0b350 2046 2 0 0 D 0xf6e0b550 kjournald
esp eip Function (args)
0xf68d5e70 0xc04c20b2 schedule+0x51e
0xf68d5ec0 0xc04c2394 io_schedule+0x1d
0xf68d5ecc 0xc0179b64 sync_buffer+0x33 (invalid)
0xf68d5ed4 0xc04c2570 __wait_on_bit+0x36 (0xc2000b78, 0xf68d5f00, 0xc0179b31, 0x2)
0xf68d5ef0 0xc04c25ef out_of_line_wait_on_bit+0x58 (0xde89bc18, 0x2, 0xc0179b31, 0x2)
0xf68d5f2c 0xc0179adf __wait_on_buffer+0x19
0xf68d5f38 0xc01cecba journal_commit_transaction+0x40b (0xf6d94a00)
0xf68d5fa0 0xc01d180a kjournald+0xa4 (0xf6d94a00)
0xf68d5fd4 0xc01301f1 kthread+0x3b (invalid)
The buffer kjournald is stuck on:
[0]kdb> bh 0xde89bc18
buffer_head at 0xde89bc18
bno 9695555 size 4096 dev 0x300002
count 3 state 0x802d [Uptodate Lock Req Mapped Private]
b_data 0x00000000
b_page 0xc1a23e60 b_this_page 0xde89bc18 b_private 0xf76237f0
b_end_io 0xc017afe0 end_buffer_write_sync
[0]kdb> page 0xc1a23e60
struct page at 0xc1a23e60
addr space 0xf565917c index 279 (offset 0x117000)
count 2 flags PG_referenced PG_uptodate PG_dirty PG_lru PG_private
virtual 0x00000000
buffers 0xde89bc18
[0]kdb> inode f56590d4
struct inode at 0xf56590d4
i_ino = 4833353 i_count = 1 i_size 1145510
i_mode = 0100644 i_nlink = 1 i_rdev = 0x0
i_hash.nxt = 0x00000000 i_hash.pprev = 0xc20ec880
i_list.nxt = 0xf772749c i_list.prv = 0xf75aa0d4
i_dentry.nxt = 0xf7795a5c i_dentry.prv = 0xf7795a5c
i_sb = 0xf73ca400 i_op = 0xc04d9160 i_data = 0xf565917c nrpages = 280
i_fop= 0xc04d90e0 i_flock = 0x00000000 i_mapping = 0xf565917c
i_flags 0x0 i_state 0x4 [I_DIRTY_PAGES] fs specific info @ 0xf565921c
We're waiting on the last page/buffer in the file, and it doesn't appear
to be under writeback....
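For reference, the wait chain in that trace boils down to something
like this (paraphrased from memory, not the exact 2.6.25 fs/buffer.c
source):

    /* Sleep until BH_Lock is cleared by the buffer's IO completion
     * handler (b_end_io -> unlock_buffer()). */
    void __wait_on_buffer(struct buffer_head *bh)
    {
            wait_on_bit(&bh->b_state, BH_Lock, sync_buffer,
                        TASK_UNINTERRUPTIBLE);
    }

    static int sync_buffer(void *word)
    {
            struct buffer_head *bh =
                    container_of(word, struct buffer_head, b_state);

            smp_mb();
            if (bh->b_bdev)         /* kick the queue, then sleep */
                    blk_run_address_space(bh->b_bdev->bd_inode->i_mapping);
            io_schedule();
            return 0;
    }

So nothing will wake kjournald until someone unlocks that buffer.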
Everything train-smashed up behind it such as:
Stack traceback for pid 4876
0xf7390910 4876 1 0 0 D 0xf7390b10 nscd
esp eip Function (args)
0xf6009d5c 0xc04c20b2 schedule+0x51e
0xf6009dac 0xc04c2394 io_schedule+0x1d
0xf6009db8 0xc0179b64 sync_buffer+0x33 (invalid)
0xf6009dc0 0xc04c24b1 __wait_on_bit_lock+0x30 (0xc20024a4, 0xf6009de8, 0xc0179b31, 0x2)
0xf6009dd8 0xc04c2532 out_of_line_wait_on_bit_lock+0x58 (0xf7643780, 0x2, 0xc0179b31, 0x2)
0xf6009e14 0xc0179ca1 __lock_buffer+0x24 (0xf7643780)
0xf6009e24 0xc01cd686 do_get_write_access+0x4f (0xf5db748c, 0xf761fe70, 0x0)
0xf6009e68 0xc01cd99a journal_get_write_access+0x1b (0xf5db748c, invalid)
0xf6009e78 0xc01cada0 __ext3_journal_get_write_access+0x19 (0xc04d9230, 0xf5db748c, 0xf7643780)
0xf6009e90 0xc01bffaa ext3_reserve_inode_write+0x34 (0xf5db748c, 0xf76b5e10, 0xf6009eac)
0xf6009ea8 0xc01bfffe ext3_mark_inode_dirty+0x20 (0xf5db748c, 0xf76b5e10)
0xf6009ec8 0xc01c281a ext3_dirty_inode+0x53 (0xf76b5e10)
0xf6009edc 0xc0175e2b __mark_inode_dirty+0x26 (0xf76b5e10, 0x1)
0xf6009ef4 0xc016ea8e file_update_time+0x74 (invalid)
0xf6009f04 0xc014d681 do_wp_page+0x356 (0xf6e15880, 0xf6d6941c, 0xb5d8db71, 0xf6134634, 0xf68ecb5c, 0xc16c268c, 0x7ed7a025)
0xf6009f3c 0xc014ed6c handle_mm_fault+0x450 (0xf6e15880, 0xf6d6941c, 0xb5d8db71, 0x1)
0xf6009f88 0xc04c4d09 do_page_fault+0x233 (0xf6009fb8, 0x7, invalid)
0xf6009fb4 0xc04c3712 error_code+0x72 (invalid, invalid, invalid)
The only processes that aren't hung in touch_atime() are:
Stack traceback for pid 4995
0xf6b79150 4995 1 0 1 D 0xf6b79350 cupsd
esp eip Function (args)
0xf6d09ddc 0xc04c20b2 schedule+0x51e
0xf6d09e2c 0xc04c2394 io_schedule+0x1d
0xf6d09e38 0xc014285f sync_page+0x36 (invalid)
0xf6d09e40 0xc04c2570 __wait_on_bit+0x36 (0xc2006574, 0xf6d09e64, 0xc0142829, 0x2)
0xf6d09e5c 0xc0142a51 wait_on_page_bit+0x56 (0xc1f76a00, 0xc)
0xf6d09e88 0xc014998f truncate_inode_pages_range+0x23a (0xf76fb90c, 0x0, 0x0, 0xffffffff, 0xffffffff)
0xf6d09ef8 0xc0149a04 truncate_inode_pages+0xc
0xf6d09f08 0xc01c1ed6 ext3_delete_inode+0x16 (0xf76fb864)
0xf6d09f18 0xc016e3ef generic_delete_inode+0x8b (0xf76fb864)
0xf6d09f28 0xc016e467 generic_drop_inode+0x12 (0xf76fb864)
0xf6d09f38 0xc016db1b iput+0x63 (0xf76fb864)
0xf6d09f44 0xc0166984 do_unlinkat+0xa8 (invalid, invalid)
0xf6d09fac 0xc0166a02 sys_unlink+0x10 (0xbfbf3488)
The page it's stuck on:
[0]kdb> page 0xc1f76a00
struct page at 0xc1f76a00
addr space 0xf76fb90c index 0 (offset 0x0)
count 3 flags PG_locked PG_uptodate PG_lru PG_private PG_writeback
virtual 0x00000000
buffers 0xf1991e48
[0]kdb> inode f76fb864
struct inode at 0xf76fb864
i_ino = 5981454 i_count = 0 i_size 32
i_mode = 0100440 i_nlink = 0 i_rdev = 0x0
i_hash.nxt = 0x00000000 i_hash.pprev = 0xc20cd990
i_list.nxt = 0xf76fb864 i_list.prv = 0xf76fb864
i_dentry.nxt = 0xf76fb83c i_dentry.prv = 0xf76fb83c
i_sb = 0xf73ca400 i_op = 0xc04d9160 i_data = 0xf76fb90c nrpages = 1
i_fop= 0xc04d90e0 i_flock = 0x00000000 i_mapping = 0xf76fb90c
i_flags 0x0 i_state 0x20 [I_FREEING] fs specific info @ 0xf76fb9ac
[0]kdb> inode_pages f76fb864
page_struct index cnt zone nid flags
00000000 0 3 2 0 0x1829 Locked Uptodate LRU Private Writeback HighMem
bh 0xf1991e48 bno 11968107 [Uptodate Lock Req Mapped Async_write Private]
mapping= f76fb90c
Appears to be under writeback.
Stack traceback for pid 28191
0xf6e0adf0 28191 28016 0 1 D 0xf6e0aff0 umount
esp eip Function (args)
0xf3913cd0 0xc04c20b2 schedule+0x51e
0xf3913d20 0xc04c2394 io_schedule+0x1d
0xf3913d2c 0xc0179b64 sync_buffer+0x33 (invalid)
0xf3913d34 0xc04c24b1 __wait_on_bit_lock+0x30 (0xc20009bc, 0xf3913d5c, 0xc0179b31, 0x2)
0xf3913d4c 0xc04c2532 out_of_line_wait_on_bit_lock+0x58 (0xf74ec4a8, 0x2, 0xc0179b31, 0x2)
0xf3913d88 0xc0179ca1 __lock_buffer+0x24 (0xf74ec4a8)
0xf3913d98 0xc01cddb2 journal_invalidatepage+0x66 (0xf6d94a00, 0xc1fb65a0, 0x0)
0xf3913dc8 0xc01c1a12 ext3_invalidatepage+0x2f (0xc1fb65a0, invalid)
0xf3913dd4 0xc01493b5 do_invalidatepage+0x19 (invalid)
0xf3913de0 0xc014972e truncate_complete_page+0x1f (invalid, 0xc1fb65a0)
0xf3913dec 0xc0149818 truncate_inode_pages_range+0xc3 (0xf5659af0, 0x0, 0x0, 0xffffffff, 0xffffffff)
0xf3913e5c 0xc0149a04 truncate_inode_pages+0xc
0xf3913e6c 0xc01c1ed6 ext3_delete_inode+0x16 (0xf5659a48)
0xf3913e7c 0xc016e3ef generic_delete_inode+0x8b (0xf5659a48)
0xf3913e8c 0xc016e467 generic_drop_inode+0x12 (0xf5659a48)
0xf3913e9c 0xc016db1b iput+0x63 (0xf5659a48)
0xf3913ea8 0xc016bf74 dentry_iput+0x70 (0xf7795954)
0xf3913eb8 0xc016c013 d_kill+0x30 (0xf7795954)
0xf3913ec8 0xc016cde2 dput+0xc1 (invalid)
0xf3913ed4 0xc016679b sys_renameat+0x17a (0xffffff9c, 0x80523c0, 0xffffff9c, 0x805234a)
0xf3913f9c 0xc0166810 sys_rename+0x12 (0x80523c0, 0x805234a)
[0]kdb> page 0xc1fb65a0
struct page at 0xc1fb65a0
addr space 0xf5659af0 index 0 (offset 0x0)
count 3 flags PG_locked PG_referenced PG_uptodate PG_dirty PG_lru PG_active
PG_private
virtual 0x00000000
buffers 0xf74ec4a8
[0]kdb> md4c1 0xf5659af0
0xf5659af0 f5659a48 H.e.
[0]kdb> inode f5659a48
struct inode at 0xf5659a48
i_ino = 5984203 i_count = 0 i_size 261
i_mode = 0100644 i_nlink = 0 i_rdev = 0x0
i_hash.nxt = 0x00000000 i_hash.pprev = 0xc20e121c
i_list.nxt = 0xf5659a48 i_list.prv = 0xf5659a48
i_dentry.nxt = 0xf5659a20 i_dentry.prv = 0xf5659a20
i_sb = 0xf73ca400 i_op = 0xc04d9160 i_data = 0xf5659af0 nrpages = 1
i_fop= 0xc04d90e0 i_flock = 0x00000000 i_mapping = 0xf5659af0
i_flags 0x0 i_state 0x27 [I_DIRTY_SYNC I_DIRTY_DATASYNC I_DIRTY_PAGES I_FREEING] fs specific info @ 0xf5659b90
[0]kdb> inode_pages f5659a48
page_struct index cnt zone nid flags
00000000 0 3 2 0 0x87d Locked Referenced Uptodate Dirty LRU Active Private HighMem
bh 0xf74ec4a8 bno 11984903 [Uptodate Lock Req Mapped Private]
mapping= f5659af0
There are no I/O errors in the syslog, and the XFS tests were running on
a different disk, so there should have been no interaction with them.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: BUG: ext3 hang in transaction commit
From: Jan Kara @ 2008-04-03 10:07 UTC
To: David Chinner; +Cc: lkml
Hi,
> ia32 XFS QA machine, ext3 root on a raw partition. 2.6.25-rc3.
>
> kjournald hung in journal_commit_transaction():
>
> Stack traceback for pid 2046
> 0xf6e0b350 2046 2 0 0 D 0xf6e0b550 kjournald
> esp eip Function (args)
> 0xf68d5e70 0xc04c20b2 schedule+0x51e
> 0xf68d5ec0 0xc04c2394 io_schedule+0x1d
> 0xf68d5ecc 0xc0179b64 sync_buffer+0x33 (invalid)
> 0xf68d5ed4 0xc04c2570 __wait_on_bit+0x36 (0xc2000b78, 0xf68d5f00, 0xc0179b31, 0x2)
> 0xf68d5ef0 0xc04c25ef out_of_line_wait_on_bit+0x58 (0xde89bc18, 0x2, 0xc0179b31, 0x2)
> 0xf68d5f2c 0xc0179adf __wait_on_buffer+0x19
> 0xf68d5f38 0xc01cecba journal_commit_transaction+0x40b (0xf6d94a00)
> 0xf68d5fa0 0xc01d180a kjournald+0xa4 (0xf6d94a00)
> 0xf68d5fd4 0xc01301f1 kthread+0x3b (invalid)
I suppose this is the wait_on_buffer() on line 444 of fs/jbd/commit.c, isn't it?
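For reference, that part of journal_commit_transaction() is roughly
the following (condensed sketch, not verbatim source):

    /*
     * Wait for all previously submitted ordered-data IO to
     * complete -- this is where kjournald sits in the trace.
     */
    spin_lock(&journal->j_list_lock);
    while (commit_transaction->t_locked_list) {
            struct buffer_head *bh =
                    jh2bh(commit_transaction->t_locked_list->b_tprev);

            get_bh(bh);
            if (buffer_locked(bh)) {
                    spin_unlock(&journal->j_list_lock);
                    wait_on_buffer(bh);     /* the wait in the trace */
                    spin_lock(&journal->j_list_lock);
            }
            /* ...unfile the now-unlocked buffer and release it... */
    }
    spin_unlock(&journal->j_list_lock);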
> The buffer kjournald is stuck on:
>
> [0]kdb> bh 0xde89bc18
> buffer_head at 0xde89bc18
> bno 9695555 size 4096 dev 0x300002
> count 3 state 0x802d [Uptodate Lock Req Mapped Private]
> b_data 0x00000000
> b_page 0xc1a23e60 b_this_page 0xde89bc18 b_private 0xf76237f0
> b_end_io 0xc017afe0 end_buffer_write_sync
> [0]kdb> page 0xc1a23e60
> struct page at 0xc1a23e60
> addr space 0xf565917c index 279 (offset 0x117000)
> count 2 flags PG_referenced PG_uptodate PG_dirty PG_lru PG_private
> virtual 0x00000000
> buffers 0xde89bc18
> [0]kdb> inode f56590d4
> struct inode at 0xf56590d4
> i_ino = 4833353 i_count = 1 i_size 1145510
> i_mode = 0100644 i_nlink = 1 i_rdev = 0x0
> i_hash.nxt = 0x00000000 i_hash.pprev = 0xc20ec880
> i_list.nxt = 0xf772749c i_list.prv = 0xf75aa0d4
> i_dentry.nxt = 0xf7795a5c i_dentry.prv = 0xf7795a5c
> i_sb = 0xf73ca400 i_op = 0xc04d9160 i_data = 0xf565917c nrpages = 280
> i_fop= 0xc04d90e0 i_flock = 0x00000000 i_mapping = 0xf565917c
> i_flags 0x0 i_state 0x4 [I_DIRTY_PAGES] fs specific info @ 0xf565921c
>
> We're waiting on the last page/buffer in the file, and it doesn't appear
> to be under writeback....
We wait for the write of ordered data to finish, which seems never to
happen. The page isn't under writeback, but that just means we submitted
the buffer from the commit code (which doesn't change the page state).
Anyway, the cause is that either IO never finished due to some bug, so
the buffer never got unlocked, or we locked the buffer somewhere and
forgot to unlock it (but I've checked all the relevant places and I think
they are correct). The traces of all the processes seem harmless - I see
no trace where we are holding a buffer lock.
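To illustrate (a simplified sketch, not verbatim kernel code): the
commit code submits the bare buffer_head, so only the buffer gets
locked, whereas the regular writepage path also flags the page:

    /* ordered-data submission from the commit code */
    lock_buffer(bh);                        /* sets BH_Lock only */
    bh->b_end_io = end_buffer_write_sync;   /* unlocks bh when IO ends */
    submit_bh(WRITE, bh);                   /* PG_writeback untouched */

    /* __block_write_full_page() path, for comparison */
    set_page_writeback(page);               /* sets PG_writeback */
    submit_bh(WRITE, bh);

That is consistent with your kdb output: the buffer is locked with
end_buffer_write_sync as its completion handler, while the page shows
PG_dirty but not PG_writeback.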
If you happen to hit this again, please let me know and I'll look into
it further...
> Everything train-smashed up behind it such as:
>
> Stack traceback for pid 4876
> 0xf7390910 4876 1 0 0 D 0xf7390b10 nscd
> esp eip Function (args)
> 0xf6009d5c 0xc04c20b2 schedule+0x51e
> 0xf6009dac 0xc04c2394 io_schedule+0x1d
> 0xf6009db8 0xc0179b64 sync_buffer+0x33 (invalid)
> 0xf6009dc0 0xc04c24b1 __wait_on_bit_lock+0x30 (0xc20024a4, 0xf6009de8, 0xc0179b31, 0x2)
> 0xf6009dd8 0xc04c2532 out_of_line_wait_on_bit_lock+0x58 (0xf7643780, 0x2, 0xc0179b31, 0x2)
> 0xf6009e14 0xc0179ca1 __lock_buffer+0x24 (0xf7643780)
> 0xf6009e24 0xc01cd686 do_get_write_access+0x4f (0xf5db748c, 0xf761fe70, 0x0)
> 0xf6009e68 0xc01cd99a journal_get_write_access+0x1b (0xf5db748c, invalid)
> 0xf6009e78 0xc01cada0 __ext3_journal_get_write_access+0x19 (0xc04d9230, 0xf5db748c, 0xf7643780)
> 0xf6009e90 0xc01bffaa ext3_reserve_inode_write+0x34 (0xf5db748c, 0xf76b5e10, 0xf6009eac)
> 0xf6009ea8 0xc01bfffe ext3_mark_inode_dirty+0x20 (0xf5db748c, 0xf76b5e10)
> 0xf6009ec8 0xc01c281a ext3_dirty_inode+0x53 (0xf76b5e10)
> 0xf6009edc 0xc0175e2b __mark_inode_dirty+0x26 (0xf76b5e10, 0x1)
> 0xf6009ef4 0xc016ea8e file_update_time+0x74 (invalid)
> 0xf6009f04 0xc014d681 do_wp_page+0x356 (0xf6e15880, 0xf6d6941c, 0xb5d8db71, 0xf6134634, 0xf68ecb5c, 0xc16c268c, 0x7ed7a025)
> 0xf6009f3c 0xc014ed6c handle_mm_fault+0x450 (0xf6e15880, 0xf6d6941c, 0xb5d8db71, 0x1)
> 0xf6009f88 0xc04c4d09 do_page_fault+0x233 (0xf6009fb8, 0x7, invalid)
> 0xf6009fb4 0xc04c3712 error_code+0x72 (invalid, invalid, invalid)
>
>
> The only processes that aren't hung in touch_atime() are:
>
> Stack traceback for pid 4995
> 0xf6b79150 4995 1 0 1 D 0xf6b79350 cupsd
> esp eip Function (args)
> 0xf6d09ddc 0xc04c20b2 schedule+0x51e
> 0xf6d09e2c 0xc04c2394 io_schedule+0x1d
> 0xf6d09e38 0xc014285f sync_page+0x36 (invalid)
> 0xf6d09e40 0xc04c2570 __wait_on_bit+0x36 (0xc2006574, 0xf6d09e64, 0xc0142829, 0x2)
> 0xf6d09e5c 0xc0142a51 wait_on_page_bit+0x56 (0xc1f76a00, 0xc)
> 0xf6d09e88 0xc014998f truncate_inode_pages_range+0x23a (0xf76fb90c, 0x0, 0x0, 0xffffffff, 0xffffffff)
> 0xf6d09ef8 0xc0149a04 truncate_inode_pages+0xc
> 0xf6d09f08 0xc01c1ed6 ext3_delete_inode+0x16 (0xf76fb864)
> 0xf6d09f18 0xc016e3ef generic_delete_inode+0x8b (0xf76fb864)
> 0xf6d09f28 0xc016e467 generic_drop_inode+0x12 (0xf76fb864)
> 0xf6d09f38 0xc016db1b iput+0x63 (0xf76fb864)
> 0xf6d09f44 0xc0166984 do_unlinkat+0xa8 (invalid, invalid)
> 0xf6d09fac 0xc0166a02 sys_unlink+0x10 (0xbfbf3488)
>
> The page it's stuck on:
>
> [0]kdb> page 0xc1f76a00
> struct page at 0xc1f76a00
> addr space 0xf76fb90c index 0 (offset 0x0)
> count 3 flags PG_locked PG_uptodate PG_lru PG_private PG_writeback
> virtual 0x00000000
> buffers 0xf1991e48
> [0]kdb> inode f76fb864
> struct inode at 0xf76fb864
> i_ino = 5981454 i_count = 0 i_size 32
> i_mode = 0100440 i_nlink = 0 i_rdev = 0x0
> i_hash.nxt = 0x00000000 i_hash.pprev = 0xc20cd990
> i_list.nxt = 0xf76fb864 i_list.prv = 0xf76fb864
> i_dentry.nxt = 0xf76fb83c i_dentry.prv = 0xf76fb83c
> i_sb = 0xf73ca400 i_op = 0xc04d9160 i_data = 0xf76fb90c nrpages = 1
> i_fop= 0xc04d90e0 i_flock = 0x00000000 i_mapping = 0xf76fb90c
> i_flags 0x0 i_state 0x20 [I_FREEING] fs specific info @ 0xf76fb9ac
> [0]kdb> inode_pages f76fb864
> page_struct index cnt zone nid flags
> 00000000 0 3 2 0 0x1829 Locked Uptodate LRU Private Writeback HighMem
> bh 0xf1991e48 bno 11968107 [Uptodate Lock Req Mapped Async_write Private]
> mapping= f76fb90c
>
>
> Appears to be under writeback.
>
> Stack traceback for pid 28191
> 0xf6e0adf0 28191 28016 0 1 D 0xf6e0aff0 umount
> esp eip Function (args)
> 0xf3913cd0 0xc04c20b2 schedule+0x51e
> 0xf3913d20 0xc04c2394 io_schedule+0x1d
> 0xf3913d2c 0xc0179b64 sync_buffer+0x33 (invalid)
> 0xf3913d34 0xc04c24b1 __wait_on_bit_lock+0x30 (0xc20009bc, 0xf3913d5c, 0xc0179b31, 0x2)
> 0xf3913d4c 0xc04c2532 out_of_line_wait_on_bit_lock+0x58 (0xf74ec4a8, 0x2, 0xc0179b31, 0x2)
> 0xf3913d88 0xc0179ca1 __lock_buffer+0x24 (0xf74ec4a8)
> 0xf3913d98 0xc01cddb2 journal_invalidatepage+0x66 (0xf6d94a00, 0xc1fb65a0, 0x0)
> 0xf3913dc8 0xc01c1a12 ext3_invalidatepage+0x2f (0xc1fb65a0, invalid)
> 0xf3913dd4 0xc01493b5 do_invalidatepage+0x19 (invalid)
> 0xf3913de0 0xc014972e truncate_complete_page+0x1f (invalid, 0xc1fb65a0)
> 0xf3913dec 0xc0149818 truncate_inode_pages_range+0xc3 (0xf5659af0, 0x0, 0x0, 0xffffffff, 0xffffffff)
> 0xf3913e5c 0xc0149a04 truncate_inode_pages+0xc
> 0xf3913e6c 0xc01c1ed6 ext3_delete_inode+0x16 (0xf5659a48)
> 0xf3913e7c 0xc016e3ef generic_delete_inode+0x8b (0xf5659a48)
> 0xf3913e8c 0xc016e467 generic_drop_inode+0x12 (0xf5659a48)
> 0xf3913e9c 0xc016db1b iput+0x63 (0xf5659a48)
> 0xf3913ea8 0xc016bf74 dentry_iput+0x70 (0xf7795954)
> 0xf3913eb8 0xc016c013 d_kill+0x30 (0xf7795954)
> 0xf3913ec8 0xc016cde2 dput+0xc1 (invalid)
> 0xf3913ed4 0xc016679b sys_renameat+0x17a (0xffffff9c, 0x80523c0, 0xffffff9c, 0x805234a)
> 0xf3913f9c 0xc0166810 sys_rename+0x12 (0x80523c0, 0x805234a)
>
> [0]kdb> page 0xc1fb65a0
> struct page at 0xc1fb65a0
> addr space 0xf5659af0 index 0 (offset 0x0)
> count 3 flags PG_locked PG_referenced PG_uptodate PG_dirty PG_lru PG_active
> PG_private
> virtual 0x00000000
> buffers 0xf74ec4a8
> [0]kdb> md4c1 0xf5659af0
> 0xf5659af0 f5659a48 H.e.
> [0]kdb> inode f5659a48
> struct inode at 0xf5659a48
> i_ino = 5984203 i_count = 0 i_size 261
> i_mode = 0100644 i_nlink = 0 i_rdev = 0x0
> i_hash.nxt = 0x00000000 i_hash.pprev = 0xc20e121c
> i_list.nxt = 0xf5659a48 i_list.prv = 0xf5659a48
> i_dentry.nxt = 0xf5659a20 i_dentry.prv = 0xf5659a20
> i_sb = 0xf73ca400 i_op = 0xc04d9160 i_data = 0xf5659af0 nrpages = 1
> i_fop= 0xc04d90e0 i_flock = 0x00000000 i_mapping = 0xf5659af0
> i_flags 0x0 i_state 0x27 [I_DIRTY_SYNC I_DIRTY_DATASYNC I_DIRTY_PAGES I_FREEING] fs specific info @ 0xf5659b90
> [0]kdb> inode_pages f5659a48
> page_struct index cnt zone nid flags
> 00000000 0 3 2 0 0x87d Locked Referenced Uptodate Dirty LRU Active Private HighMem
> bh 0xf74ec4a8 bno 11984903 [Uptodate Lock Req Mapped Private]
> mapping= f5659af0
>
>
> There are no I/O errors in the syslog, and the XFS tests were running on
> a different disk, so there should have been no interaction with them.
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: BUG: ext3 hang in transaction commit
From: Peter Zijlstra @ 2008-04-03 15:08 UTC
To: Jan Kara; +Cc: David Chinner, lkml, marcelo
Marcelo has seen something similar: kjournald got stuck and the rest of
userspace piled on top of it.
The softlockup watchdog triggered because kjournald got stuck in D state
for over 2 minutes. However Marcelo told me that it did get going
eventually...
He had a rather easy to reproduce case IIRC, I'll let him tell you more
about it :-)
* Re: BUG: ext3 hang in transaction commit
From: Marcelo Tosatti @ 2008-04-03 16:15 UTC
To: Peter Zijlstra; +Cc: Jan Kara, David Chinner, lkml, marcelo
On Thu, Apr 03, 2008 at 05:08:55PM +0200, Peter Zijlstra wrote:
>
> Marcelo has seen something similar: kjournald got stuck and the rest of
> userspace piled on top of it.
>
> The softlockup watchdog triggered because kjournald got stuck in D state
> for over 2 minutes. However Marcelo told me that it did get going
> eventually...
>
> He had a rather easy to reproduce case IIRC, I'll let him tell you more
> about it :-)
Run iozone (-t 4) with a sufficiently large file size under a KVM guest
with a disk file residing on an ext3 filesystem in ordered data mode.
There's nothing special in this testcase; it boils down to the QEMU process
writing data to a file.
The issue is not seen under "writeback" mode. While fsync latency in ordered
data mode is known to be bad, _2 minutes_ feels more like a DoS than a
latency issue to me.
vmstat 2 output on the host shows that the first write pass (initial
writers and rewriters) blasts at full disk speed:
0 2 0 6552980 748972 662876 0 0 58 69838 322 9912 5 6 78 10 0
1 1 0 6493620 779452 693596 0 0 10 54884 417 7032 1 3 78 17 0
1 2 0 6322968 870556 772156 0 0 58 43042 379 12651 4 7 77 12 0
1 1 0 6248416 915004 806428 0 0 20 37600 367 7800 2 4 71 23 0
1 0 0 5981148 1070300 906120 0 0 88 64 243 19734 8 10 80 1 0
0 4 0 5875992 1165220 906700 0 0 80 62620 332 13192 5 7 76 13 0
0 4 0 5867244 1177440 906724 0 0 10 66728 385 2753 1 2 66 31 0
0 2 0 5792912 1252668 907900 0 0 86 26288 401 12130 4 6 72 18 0
2 0 0 5760280 1290860 908092 0 0 18 34704 487 7130 2 3 80 15 0
But in the "mixed workload" stage write throughtput slows down drastically:
0 2 0 5666016 468036 1823052 0 0 0 1248 433 8236 5 5 83 7 0
0 2 0 5666016 468036 1823052 0 0 0 3162 899 679 0 0 78 22 0
0 2 0 5666016 468036 1823052 0 0 0 3290 864 829 0 0 76 24 0
0 2 0 5666024 468036 1823052 0 0 0 3414 925 2170 1 1 79 19 0
1 1 0 5665528 468036 1823052 0 0 0 3148 854 7935 6 6 83 4 0
0 2 0 5665024 468036 1823052 0 0 0 2570 799 4065 3 4 80 13 0
0 2 0 5665024 468036 1823052 0 0 0 2462 779 464 0 0 88 12 0
0 2 0 5664908 468036 1823052 0 0 0 2458 794 1028 0 0 88 12 0
1 1 0 5664908 468036 1823052 0 0 0 2548 799 4153 2 2 87 9 0
Which can be seen in the iozone performance inside the guest:
Children see throughput for 4 mixed workload = 106073.50 KB/sec
Parent sees throughput for 4 mixed workload = 2560.13 KB/sec
Min throughput per process = 3700.33 KB/sec
Max throughput per process = 66393.62 KB/sec
Avg throughput per process = 26518.38 KB/sec
Min xfer = 44424.00 KB
And this is when fsync becomes nasty.
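For context (a simplified sketch -- the real call chain is in the vim
trace below): fsync() in ordered mode has to wait for the whole
transaction commit, which flushes every file's ordered data, not just
the file being fsync'ed:

    int ext3_sync_file(struct file *file, struct dentry *dentry,
                       int datasync)
    {
            /* simplified: force and wait for the running
             * transaction, with all of its ordered data */
            return ext3_force_commit(dentry->d_inode->i_sb);
    }

So one heavy writer can stall everybody else's fsync.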
blktrace output shows that the maximum latency for a single write
request to complete is 1.5 seconds, which is similar to what is seen
under "writeback" mode.
I reduced hung_task_timeout_secs to 30 for this report, but vim and
rsyslogd have been seen hung for up to 120 seconds.
INFO: task vim:4476 blocked for more than 30 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vim D 00000001000e3c44 0 4476 3591
ffff8101da80dd28 0000000000000046 0000000000000000 00000000001cc549
ffff8101fe1e8a00 ffff81022f20e740 ffff8101fe1e8c30 0000000600000246
00000000ffffffff 0000000000000246 0000000000000000 0000000000000000
Call Trace:
[<ffffffff8030985b>] log_wait_commit+0xa9/0xfe
[<ffffffff80242347>] ? autoremove_wake_function+0x0/0x38
[<ffffffff80304c19>] journal_stop+0x1a4/0x1ed
[<ffffffff803056c6>] journal_force_commit+0x23/0x25
[<ffffffff802fe337>] ext3_force_commit+0x26/0x28
[<ffffffff802f8756>] ext3_write_inode+0x39/0x3f
[<ffffffff802a58e9>] __writeback_single_inode+0x18c/0x2ab
[<ffffffff802a5a2c>] sync_inode+0x24/0x39
[<ffffffff802f4baf>] ext3_sync_file+0x8f/0xa0
[<ffffffff802a892d>] do_fsync+0x56/0xac
[<ffffffff802a89b1>] __do_fsync+0x2e/0x44
[<ffffffff802a89e2>] sys_fsync+0xb/0xd
[<ffffffff8020b21b>] system_call_after_swapgs+0x7b/0x80
2 locks held by vim/4476:
#0: (&sb->s_type->i_mutex_key#9){--..}, at: [<ffffffff802a891c>] do_fsync+0x45/0xac
#1: (jbd_handle){--..}, at: [<ffffffff8030567c>] journal_start+0xef/0x116
2 locks held by rsyslogd/3266:
#0: (&sb->s_type->i_mutex_key#9){--..}, at: [<ffffffff802a891c>] do_fsync+0x45/0xac
#1: (jbd_handle){--..}, at: [<ffffffff8030567c>] journal_start+0xef/0x116
INFO: task rklogd:3269 blocked for more than 30 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rklogd D 00000001000fbb55 0 3269 1
ffff8102280b9d28 0000000000000046 0000000000000000 00000000001cc553
ffff81022d9b0b40 ffff81022f196540 ffff81022d9b0d70 0000000400000246
00000000ffffffff 0000000000000246 0000000000000000 0000000000000000
Call Trace:
[<ffffffff8030985b>] log_wait_commit+0xa9/0xfe
[<ffffffff80242347>] ? autoremove_wake_function+0x0/0x38
[<ffffffff80304c19>] journal_stop+0x1a4/0x1ed
[<ffffffff803056c6>] journal_force_commit+0x23/0x25
[<ffffffff802fe337>] ext3_force_commit+0x26/0x28
[<ffffffff802f8756>] ext3_write_inode+0x39/0x3f
[<ffffffff802a58e9>] __writeback_single_inode+0x18c/0x2ab
[<ffffffff802a5a2c>] sync_inode+0x24/0x39
[<ffffffff802f4baf>] ext3_sync_file+0x8f/0xa0
[<ffffffff802a892d>] do_fsync+0x56/0xac
[<ffffffff802a89b1>] __do_fsync+0x2e/0x44
[<ffffffff802a89e2>] sys_fsync+0xb/0xd
[<ffffffff8020b21b>] system_call_after_swapgs+0x7b/0x80
qemu-system-x D 00000001000c1809 0 4359 4356
ffff8102111fd838 0000000000000046 0000000000000000 0000000000000046
ffff81021101c080 ffff81022f196540 ffff81021101c2b0 0000000400000002
00000000ffffffff ffffffff8024bc0b 0000000000000000 0000000000000000
Call Trace:
[<ffffffff8024bc0b>] ? mark_held_locks+0x4a/0x6a
[<ffffffff805593d1>] io_schedule+0x28/0x34
[<ffffffff803596dc>] get_request_wait+0x122/0x170
[<ffffffff8055b685>] ? _spin_unlock_irqrestore+0x3f/0x47
[<ffffffff80242347>] ? autoremove_wake_function+0x0/0x38
[<ffffffff80357872>] ? elv_merge+0x1df/0x221
[<ffffffff80359d27>] __make_request+0x33e/0x3dd
[<ffffffff80245115>] ? up_read+0x26/0x2a
[<ffffffff803585ad>] generic_make_request+0x33c/0x377
[<ffffffff802626f3>] ? mempool_alloc_slab+0x11/0x13
[<ffffffff803599e0>] submit_bio+0xce/0xd7
[<ffffffff802a970e>] submit_bh+0xea/0x10e
[<ffffffff802aca88>] sync_dirty_buffer+0x6e/0xd3
[<ffffffff80305e90>] journal_dirty_data+0xeb/0x1ce
[<ffffffff802f6805>] ext3_journal_dirty_data+0x18/0x41
[<ffffffff802f5b74>] walk_page_buffers+0x67/0x8e
[<ffffffff802f67ed>] ? ext3_journal_dirty_data+0x0/0x41
[<ffffffff802f84ad>] ext3_ordered_write_end+0x73/0x110
[<ffffffff8026117d>] generic_file_buffered_write+0x1c2/0x63f
[<ffffffff8024bda1>] ? trace_hardirqs_on+0xf1/0x115
[<ffffffff80233eed>] ? current_fs_time+0x22/0x29
[<ffffffff80261953>] __generic_file_aio_write_nolock+0x359/0x3c3
[<ffffffff80249d4d>] ? debug_mutex_free_waiter+0x46/0x4a
[<ffffffff80261a23>] generic_file_aio_write+0x66/0xc2
[<ffffffff802f4a1e>] ext3_file_write+0x1e/0x9e
[<ffffffff80289912>] do_sync_write+0xe2/0x126
[<ffffffff80242347>] ? autoremove_wake_function+0x0/0x38
[<ffffffff8024c6fa>] ? __lock_acquire+0x4e0/0xc0e
[<ffffffff80246881>] ? getnstimeofday+0x31/0x88
[<ffffffff8028a09b>] vfs_write+0xae/0x137
[<ffffffff8028a6f9>] sys_pwrite64+0x55/0x76
[<ffffffff8020b21b>] system_call_after_swapgs+0x7b/0x80
kjournald D 00000001000d0612 0 1252 2
ffff81022c66fcf0 0000000000000046 0000000000000000 ffff81022c4b6280
ffff81022e90e2c0 ffff81022f196540 ffff81022e90e4f0 000000042c66fd80
00000000ffffffff ffffffff8805d272 0000000000000000 0000000000000000
Call Trace:
[<ffffffff8805d272>] ? :dm_mod:dm_unplug_all+0x2d/0x31
[<ffffffff805593d1>] io_schedule+0x28/0x34
[<ffffffff802aa7e1>] sync_buffer+0x3e/0x42
[<ffffffff8055962e>] __wait_on_bit+0x45/0x77
[<ffffffff802aa7a3>] ? sync_buffer+0x0/0x42
[<ffffffff802aa7a3>] ? sync_buffer+0x0/0x42
[<ffffffff805596ce>] out_of_line_wait_on_bit+0x6e/0x7b
[<ffffffff8024237f>] ? wake_bit_function+0x0/0x2a
[<ffffffff802aa72b>] __wait_on_buffer+0x1f/0x21
[<ffffffff80306a02>] journal_commit_transaction+0x516/0xe70
[<ffffffff8055b685>] ? _spin_unlock_irqrestore+0x3f/0x47
[<ffffffff8024bda1>] ? trace_hardirqs_on+0xf1/0x115
As Peter mentioned, it eventually gets out of this state (after several
minutes) and the fsync instances complete.
* Re: BUG: ext3 hang in transaction commit
From: David Chinner @ 2008-04-04 1:27 UTC
To: Jan Kara; +Cc: David Chinner, lkml
On Thu, Apr 03, 2008 at 12:07:42PM +0200, Jan Kara wrote:
> Hi,
>
> > ia32 XFS QA machine, ext3 root on a raw partition. 2.6.25-rc3.
> >
> > kjournald hung in journal_commit_transaction():
> >
> > Stack traceback for pid 2046
> > 0xf6e0b350 2046 2 0 0 D 0xf6e0b550 kjournald
> > esp eip Function (args)
> > 0xf68d5e70 0xc04c20b2 schedule+0x51e
> > 0xf68d5ec0 0xc04c2394 io_schedule+0x1d
> > 0xf68d5ecc 0xc0179b64 sync_buffer+0x33 (invalid)
> > 0xf68d5ed4 0xc04c2570 __wait_on_bit+0x36 (0xc2000b78, 0xf68d5f00, 0xc0179b31, 0x2)
> > 0xf68d5ef0 0xc04c25ef out_of_line_wait_on_bit+0x58 (0xde89bc18, 0x2, 0xc0179b31, 0x2)
> > 0xf68d5f2c 0xc0179adf __wait_on_buffer+0x19
> > 0xf68d5f38 0xc01cecba journal_commit_transaction+0x40b (0xf6d94a00)
> > 0xf68d5fa0 0xc01d180a kjournald+0xa4 (0xf6d94a00)
> > 0xf68d5fd4 0xc01301f1 kthread+0x3b (invalid)
>
> I suppose this is the wait_on_buffer() on line 444 of fs/jbd/commit.c, isn't it?
No idea. I haven't looked at the code....
> > We're waiting on the last page/buffer in the file, and it doesn't appear
> > to be under writeback....
> We wait for the write of ordered data to finish, which seems never to
> happen. The page isn't under writeback, but that just means we submitted
> the buffer from the commit code (which doesn't change the page state).
> Anyway, the cause is that either IO never finished due to some bug, so
Yes, I certainly believe that is possible. We see it often enough with
XFS....
> the buffer never got unlocked, or we locked the buffer somewhere and
> forgot to unlock it (but I've checked all the relevant places and I think
> they are correct). The traces of all the processes seem harmless - I see
> no trace where we are holding a buffer lock.
> If you happen to hit this again, please let me know and I'll look into
> it further...
Will do.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: BUG: ext3 hang in transaction commit
From: Jan Kara @ 2008-04-04 10:34 UTC
To: Marcelo Tosatti; +Cc: Peter Zijlstra, David Chinner, lkml, marcelo
Hi,
> On Thu, Apr 03, 2008 at 05:08:55PM +0200, Peter Zijlstra wrote:
> >
> > Marcelo has seen something similar: kjournald got stuck and the rest of
> > userspace piled on top of it.
> >
> > The softlockup watchdog triggered because kjournald got stuck in D state
> > for over 2 minutes. However Marcelo told me that it did get going
> > eventually...
> >
> > He had a rather easy to reproduce case IIRC, I'll let him tell you more
> > about it :-)
>
> Run iozone (-t 4) with a sufficiently large file size under a KVM guest
> with a disk file residing on an ext3 filesystem in ordered data mode.
>
> There's nothing special in this testcase; it boils down to the QEMU process
> writing data to a file.
Yes, this just means generating tons of ordered data.
> The issue is not seen under "writeback" mode. While fsync latency in ordered
> data mode is known to be bad, _2 minutes_ feels more like a DoS than a
> latency issue to me.
>
> vmstat 2 output on the host shows that the first write pass (initial
> writers and rewriters) blasts at full disk speed:
OK, here the data is generated nicely ordered on disk...
> 0 2 0 6552980 748972 662876 0 0 58 69838 322 9912 5 6 78 10 0
> 1 1 0 6493620 779452 693596 0 0 10 54884 417 7032 1 3 78 17 0
> 1 2 0 6322968 870556 772156 0 0 58 43042 379 12651 4 7 77 12 0
> 1 1 0 6248416 915004 806428 0 0 20 37600 367 7800 2 4 71 23 0
> 1 0 0 5981148 1070300 906120 0 0 88 64 243 19734 8 10 80 1 0
> 0 4 0 5875992 1165220 906700 0 0 80 62620 332 13192 5 7 76 13 0
> 0 4 0 5867244 1177440 906724 0 0 10 66728 385 2753 1 2 66 31 0
> 0 2 0 5792912 1252668 907900 0 0 86 26288 401 12130 4 6 72 18 0
> 2 0 0 5760280 1290860 908092 0 0 18 34704 487 7130 2 3 80 15 0
>
> But in the "mixed workload" stage write throughtput slows down drastically:
And here we get random access to the file, which we currently translate
into random access at transaction commit (as the commit code writes out
the data in the order in which it was dirtied).
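Roughly (condensed from the submission loop in fs/jbd/commit.c, not
verbatim), the ordered-data list is walked as-is, with no sorting by
block number:

    while (commit_transaction->t_sync_datalist) {
            struct journal_head *jh = commit_transaction->t_sync_datalist;
            struct buffer_head *bh = jh2bh(jh);

            if (buffer_dirty(bh) && !buffer_locked(bh))
                    /* submitted in dirtying order */
                    ll_rw_block(SWRITE, 1, &bh);
            /* ...move jh to t_locked_list and advance... */
    }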
> 0 2 0 5666016 468036 1823052 0 0 0 1248 433 8236 5 5 83 7 0
> 0 2 0 5666016 468036 1823052 0 0 0 3162 899 679 0 0 78 22 0
> 0 2 0 5666016 468036 1823052 0 0 0 3290 864 829 0 0 76 24 0
> 0 2 0 5666024 468036 1823052 0 0 0 3414 925 2170 1 1 79 19 0
> 1 1 0 5665528 468036 1823052 0 0 0 3148 854 7935 6 6 83 4 0
> 0 2 0 5665024 468036 1823052 0 0 0 2570 799 4065 3 4 80 13 0
> 0 2 0 5665024 468036 1823052 0 0 0 2462 779 464 0 0 88 12 0
> 0 2 0 5664908 468036 1823052 0 0 0 2458 794 1028 0 0 88 12 0
> 1 1 0 5664908 468036 1823052 0 0 0 2548 799 4153 2 2 87 9 0
>
> Which can be seen in the iozone performance inside the guest:
>
> Children see throughput for 4 mixed workload = 106073.50 KB/sec
> Parent sees throughput for 4 mixed workload = 2560.13 KB/sec
> Min throughput per process = 3700.33 KB/sec
> Max throughput per process = 66393.62 KB/sec
> Avg throughput per process = 26518.38 KB/sec
> Min xfer = 44424.00 KB
>
> And this is when fsync becomes nasty.
Could you get me movies from Seekwatcher (taken on the host) -
http://oss.oracle.com/~mason/seekwatcher/
That should confirm my suspicion. Actually, if writing data in a bad
order really is the problem, then my rewrite of ordered mode in JBD can
substantially help this workload (I can send you the patches if you dare
to try something really experimental ;).
> blktrace output shows that the maximum latency for a single write
> request to complete is 1.5 seconds, which is similar to what is seen
> under "writeback" mode.
>
> I reduced hung_task_timeout_secs to 30 for this report, but vim and
> rsyslogd have been seen hung for up to 120 seconds.
>
<snip traces of processes waiting for commit>
> As Peter mentioned, it eventually gets out of this state (after several
> minutes) and the fsync instances complete.
Yes, that is just the combined effect of *lots* of ordered data
accumulated in one transaction (we don't limit the amount of ordered data
in a transaction in any way) and writing it out at transaction commit in
random order.
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: BUG: ext3 hang in transaction commit
From: Marcelo Tosatti @ 2008-04-04 15:02 UTC
To: Jan Kara; +Cc: Peter Zijlstra, David Chinner, lkml, marcelo
On Fri, Apr 04, 2008 at 12:34:50PM +0200, Jan Kara wrote:
> > Max throughput per process = 66393.62 KB/sec
> > Avg throughput per process = 26518.38 KB/sec
> > Min xfer = 44424.00 KB
> >
> > And this is when fsync becomes nasty.
> Could you get me movies from Seekwatcher (taken on the host) -
> http://oss.oracle.com/~mason/seekwatcher/
> That should confirm my suspicion. Actually, if writing data in a bad
> order really is the problem, then my rewrite of ordered mode in JBD can
> substantially help this workload (I can send you the patches if you dare
> to try something really experimental ;).
OK, will record the animation.
Will be pleased to try your patches. Where can they be found?
> > blktrace output shows that the maximum latency for a single write
> > request to complete is 1.5 seconds, which is similar to what is seen
> > under "writeback" mode.
> >
> > I reduced hung_task_timeout_secs to 30 for this report, but vim and
> > rsyslogd have been seen hung for up to 120 seconds.
> >
>
> <snip traces of processes waiting for commit>
>
> > As Peter mentioned, it eventually gets out of this state (after several
> > minutes) and the fsync instances complete.
> Yes, that is just the combined effect of *lots* of ordered data
> accumulated in one transaction (we don't limit the amount of ordered data
> in a transaction in any way) and writing it out at transaction commit in
> random order.