From: Akira Fujita <a-fujita@rs.jp.nec.com>
To: Jan Kara <jack@suse.cz>
Cc: ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [BUG] xfstest269 causes deadlock on linux-3.9.0 (ext4)
Date: Mon, 27 May 2013 10:49:11 +0900 [thread overview]
Message-ID: <51A2BB97.7030802@rs.jp.nec.com> (raw)
In-Reply-To: <5192F998.6080706@rs.jp.nec.com>
Hi Jan,
(2013/05/15 11:57), Akira Fujita wrote:
> Hi,
>
> (2013/05/15 6:37), Jan Kara wrote:
>> Hello,
>>
>> On Mon 13-05-13 15:49:24, Akira Fujita wrote:
>>> I ran into the deaclock with xfs_test 269 on linux-3.9.0.
>>> It seems happen between jbd2_log_wait_commit, sleep_on_buffer
>>> and writeback_indoes (Please see ps log below).
>>> Once it occurs we can't touch FS anymore.
>>> In my case 300 - 1000 trials to occur. Is this known issue?
>>>
>>> The following kernels seems to have same problem:
>>> - linux-3.5-rc5
>>> - linux-3.8.5
>>> - linux-3.9-rc7
>>> And now I'm trying it on linux-3.10-rc1.
>>>
>>> # ./check generic/269
>>> FSTYP -- ext4
>>> PLATFORM -- Linux/x86_64 mcds1 3.9.0
>>> MKFS_OPTIONS -- /dev/sda12
>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/sda12 /mnt/mp2
>>>
>>>
>>> # ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:16,comm
>>> PID TID CLS RTPRIO NI PRI PSR %CPU STAT WCHAN COMMAND
>>> 1 1 TS - 0 19 0 0.0 Ss poll_schedule_ti init
>>> 2 2 TS - 0 19 0 0.0 S kthreadd kthreadd
>>> 3 3 TS - 0 19 0 0.0 S smpboot_thread_f ksoftirqd/0
>>> ...
>>> 2391 2391 TS - 0 19 2 0.1 D jbd2_log_wait_co flush-8:0
>>> ...
>>> 22647 22647 TS - 0 19 3 0.0 S worker_thread kworker/3:1
>>> 22655 22655 TS - 0 19 0 0.0 S hrtimer_nanoslee sleep
>>> 22657 22657 TS - 0 19 2 0.0 R+ - ps
>>> 25330 25330 TS - 0 19 0 0.0 S worker_thread kworker/0:0
>>> 28963 28963 TS - 0 19 1 0.0 S+ wait loop_xfstests.s
>>> 28964 28964 TS - 0 19 1 0.0 S+ wait check
>>> 29180 29180 TS - 0 19 3 0.0 S kjournald2 jbd2/sda11-8
>>> 29181 29181 TS - -20 39 3 0.0 S< rescuer_thread ext4-dio-unwrit
>>> 29199 29199 TS - 0 19 3 0.0 S+ wait 269
>>> 29391 29391 TS - 0 19 0 0.6 D sleep_on_buffer jbd2/sda12-8
>>> 29392 29392 TS - -20 39 3 0.0 S< rescuer_thread ext4-dio-unwrit
>>> 29394 29394 TS - 0 19 0 0.0 S wait fsstress
>>> 29505 29505 TS - 0 19 3 0.0 D writeback_inodes fsstress
>>>
>>> # df -T /dev/sda11 /dev/sda12
>>> Filesystem Type 1K-blocks Used Available Use% Mounted on
>>> /dev/sda11 ext4 9857264 22308 9327564 1% /mnt/mp1
>>> /dev/sda12 ext4 499656 499656 0 100% /mnt/mp2
>> Thanks for report. No I don't think this problem has been reported
>> before. Seeing that sda12 is out of space and fsstress hangs in
>> writeback_inodes(), I suspect we have some deadlock in ENOSPC recovery path
>> when we want to flush data to disk to reduce delalloc uncertainty. Can you
>> run 'echo w >/proc/sysrq-trigger' when the deadlock happens and post your
>> dmesg here? Thanks!
>>
>
> Thanks for reply.
> I'll take that information when the deadlock happens again.
Finally, I got the following messages on linux-3.9.0.
Could you take a look?
#echo w > /proc/sysrq-trigger
/var/log/messages:
-----------------------
May 23 19:15:50 mcds1 kernel: [210728.230978] SysRq : Show Blocked State
May 23 19:15:50 mcds1 kernel: [210728.230985] task PC stack pid father
May 23 19:15:50 mcds1 kernel: [210728.230992] flush-8:0 D ffffffff81806d60 3192 2361 2 0x00000000
May 23 19:15:50 mcds1 kernel: [210728.230997] ffff880214d27a08 0000000000000046 0000000000000282 ffff88021608d330
May 23 19:15:50 mcds1 kernel: [210728.231001] 0000000000000000 ffff880215d95ac0 0000000000012980 ffff880214d27fd8
May 23 19:15:50 mcds1 kernel: [210728.231005] ffff880214d26010 0000000000012980 0000000000012980 ffff880214d27fd8
May 23 19:15:50 mcds1 kernel: [210728.231008] Call Trace:
May 23 19:15:50 mcds1 kernel: [210728.231018] [<ffffffff81798ae4>] schedule+0x24/0x70
May 23 19:15:50 mcds1 kernel: [210728.231023] [<ffffffff81204b95>] jbd2_log_wait_commit+0xb5/0x130
May 23 19:15:50 mcds1 kernel: [210728.231029] [<ffffffff81060830>] ? wake_up_bit+0x40/0x40
May 23 19:15:50 mcds1 kernel: [210728.231033] [<ffffffff81204cb4>] ? __jbd2_log_start_commit+0xa4/0xb0
May 23 19:15:50 mcds1 kernel: [210728.231037] [<ffffffff81204e4b>] jbd2_journal_force_commit_nested+0x5b/0xa0
May 23 19:15:50 mcds1 kernel: [210728.231041] [<ffffffff811b75c8>] ext4_da_writepages+0x348/0x5a0
May 23 19:15:50 mcds1 kernel: [210728.231045] [<ffffffff81074e77>] ? check_preempt_wakeup+0x1b7/0x2a0
May 23 19:15:50 mcds1 kernel: [210728.231051] [<ffffffff810f0f5b>] do_writepages+0x1b/0x30
May 23 19:15:50 mcds1 kernel: [210728.231056] [<ffffffff8115d9d4>] __writeback_single_inode+0x44/0x220
May 23 19:15:50 mcds1 kernel: [210728.231061] [<ffffffff8115e93f>] writeback_sb_inodes+0x23f/0x430
May 23 19:15:50 mcds1 kernel: [210728.231065] [<ffffffff8115ed6d>] wb_writeback+0xed/0x2b0
May 23 19:15:50 mcds1 kernel: [210728.231071] [<ffffffff8104c537>] ? lock_timer_base+0x37/0x70
May 23 19:15:50 mcds1 kernel: [210728.231075] [<ffffffff8115efcd>] wb_do_writeback+0x9d/0x240
May 23 19:15:50 mcds1 kernel: [210728.231079] [<ffffffff8104c570>] ? lock_timer_base+0x70/0x70
May 23 19:15:50 mcds1 kernel: [210728.231084] [<ffffffff8115f212>] bdi_writeback_thread+0xa2/0x230
May 23 19:15:50 mcds1 kernel: [210728.231088] [<ffffffff8115f170>] ? wb_do_writeback+0x240/0x240
May 23 19:15:50 mcds1 kernel: [210728.231092] [<ffffffff8115f170>] ? wb_do_writeback+0x240/0x240
May 23 19:15:50 mcds1 kernel: [210728.231095] [<ffffffff81060136>] kthread+0xc6/0xd0
May 23 19:15:50 mcds1 kernel: [210728.231099] [<ffffffff81060070>] ? kthread_freezable_should_stop+0x70/0x70
May 23 19:15:50 mcds1 kernel: [210728.231104] [<ffffffff817a19ac>] ret_from_fork+0x7c/0xb0
May 23 19:15:50 mcds1 kernel: [210728.231108] [<ffffffff81060070>] ? kthread_freezable_should_stop+0x70/0x70
May 23 19:15:50 mcds1 kernel: [210728.231124] jbd2/sda12-8 D ffffffff81806d60 4736 25056 2 0x00000000
May 23 19:15:50 mcds1 kernel: [210728.231128] ffff8801e436db78 0000000000000046 ffff880215c06f00 ffffffff81c10420
May 23 19:15:50 mcds1 kernel: [210728.231131] ffff880200011200 ffff8801f8f296b0 0000000000012980 ffff8801e436dfd8
May 23 19:15:50 mcds1 kernel: [210728.231135] ffff8801e436c010 0000000000012980 0000000000012980 ffff8801e436dfd8
May 23 19:15:50 mcds1 kernel: [210728.231138] Call Trace:
May 23 19:15:50 mcds1 kernel: [210728.231142] [<ffffffff81166c50>] ? __wait_on_buffer+0x30/0x30
May 23 19:15:50 mcds1 kernel: [210728.231146] [<ffffffff81798ae4>] schedule+0x24/0x70
May 23 19:15:50 mcds1 kernel: [210728.231150] [<ffffffff81798bb7>] io_schedule+0x87/0xd0
May 23 19:15:50 mcds1 kernel: [210728.231154] [<ffffffff81166c59>] sleep_on_buffer+0x9/0x10
May 23 19:15:50 mcds1 kernel: [210728.231157] [<ffffffff817971d7>] __wait_on_bit+0x57/0x80
May 23 19:15:50 mcds1 kernel: [210728.231161] [<ffffffff81165f7c>] ? submit_bh+0x13c/0x1f0
May 23 19:15:50 mcds1 kernel: [210728.231164] [<ffffffff81166c50>] ? __wait_on_buffer+0x30/0x30
May 23 19:15:50 mcds1 kernel: [210728.231168] [<ffffffff81797273>] out_of_line_wait_on_bit+0x73/0x90
May 23 19:15:50 mcds1 kernel: [210728.231172] [<ffffffff81060870>] ? autoremove_wake_function+0x40/0x40
May 23 19:15:50 mcds1 kernel: [210728.231175] [<ffffffff81166c46>] __wait_on_buffer+0x26/0x30
May 23 19:15:50 mcds1 kernel: [210728.231179] [<ffffffff811ff130>] jbd2_journal_commit_transaction+0xee0/0x1960
May 23 19:15:50 mcds1 kernel: [210728.231183] [<ffffffff817984fd>] ? __schedule+0x3ad/0x7a0
May 23 19:15:50 mcds1 kernel: [210728.231188] [<ffffffff8104c537>] ? lock_timer_base+0x37/0x70
May 23 19:15:50 mcds1 kernel: [210728.231191] [<ffffffff81204f62>] kjournald2+0xc2/0x240
May 23 19:15:50 mcds1 kernel: [210728.231195] [<ffffffff81060830>] ? wake_up_bit+0x40/0x40
May 23 19:15:50 mcds1 kernel: [210728.231199] [<ffffffff81204ea0>] ? commit_timeout+0x10/0x10
May 23 19:15:50 mcds1 kernel: [210728.231202] [<ffffffff81060136>] kthread+0xc6/0xd0
May 23 19:15:50 mcds1 kernel: [210728.231206] [<ffffffff81060070>] ? kthread_freezable_should_stop+0x70/0x70
May 23 19:15:50 mcds1 kernel: [210728.231210] [<ffffffff817a19ac>] ret_from_fork+0x7c/0xb0
May 23 19:15:50 mcds1 kernel: [210728.231213] [<ffffffff81060070>] ? kthread_freezable_should_stop+0x70/0x70
May 23 19:15:50 mcds1 kernel: [210728.231216] umount D ffffffff81806d60 5448 25241 24864 0x00000000
May 23 19:15:50 mcds1 kernel: [210728.231220] ffff8801f3a89c28 0000000000000086 ffff88021fc929f0 ffff88021608cba0
May 23 19:15:50 mcds1 kernel: [210728.231223] ffff8801f3a89bf8 ffff8801f84d8790 0000000000012980 ffff8801f3a89fd8
May 23 19:15:50 mcds1 kernel: [210728.231226] ffff8801f3a88010 0000000000012980 0000000000012980 ffff8801f3a89fd8
May 23 19:15:50 mcds1 kernel: [210728.231230] Call Trace:
May 23 19:15:50 mcds1 kernel: [210728.231234] [<ffffffff81798ae4>] schedule+0x24/0x70
May 23 19:15:50 mcds1 kernel: [210728.231237] [<ffffffff81796f4d>] schedule_timeout+0x15d/0x1f0
May 23 19:15:50 mcds1 kernel: [210728.231241] [<ffffffff8106f2b5>] ? __cond_resched+0x25/0x40
May 23 19:15:50 mcds1 kernel: [210728.231245] [<ffffffff81798f64>] wait_for_completion+0x94/0x100
May 23 19:15:50 mcds1 kernel: [210728.231248] [<ffffffff81070ca0>] ? try_to_wake_up+0x2a0/0x2a0
May 23 19:15:50 mcds1 kernel: [210728.231252] [<ffffffff8115ddb1>] ? bdi_queue_work+0x81/0x110
May 23 19:15:50 mcds1 kernel: [210728.231256] [<ffffffff8115dec3>] writeback_inodes_sb_nr+0x83/0xb0
May 23 19:15:50 mcds1 kernel: [210728.231261] [<ffffffff8115df4a>] writeback_inodes_sb+0x5a/0x70
May 23 19:15:50 mcds1 kernel: [210728.231264] [<ffffffff8116463a>] __sync_filesystem+0x4a/0x50
May 23 19:15:50 mcds1 kernel: [210728.231267] [<ffffffff81164672>] sync_filesystem+0x32/0x60
May 23 19:15:50 mcds1 kernel: [210728.231273] [<ffffffff81137be6>] generic_shutdown_super+0x36/0xe0
May 23 19:15:50 mcds1 kernel: [210728.231277] [<ffffffff81137cbc>] kill_block_super+0x2c/0x80
May 23 19:15:50 mcds1 kernel: [210728.231281] [<ffffffff811381f5>] deactivate_locked_super+0x45/0x70
May 23 19:15:50 mcds1 kernel: [210728.231284] [<ffffffff81139025>] deactivate_super+0x45/0x60
May 23 19:15:50 mcds1 kernel: [210728.231287] [<ffffffff81152f63>] mntput_no_expire+0xf3/0x150
May 23 19:15:50 mcds1 kernel: [210728.231290] [<ffffffff811537dd>] sys_umount+0xcd/0x3f0
May 23 19:15:50 mcds1 kernel: [210728.231294] [<ffffffff817a1a52>] system_call_fastpath+0x16/0x1b
Regards,
Akira Fujita
next prev parent reply other threads:[~2013-05-27 1:50 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-13 6:49 [BUG] xfstest269 causes deadlock on linux-3.9.0 (ext4) Akira Fujita
2013-05-14 21:37 ` Jan Kara
2013-05-15 2:57 ` Akira Fujita
2013-05-27 1:49 ` Akira Fujita [this message]
2013-05-27 8:16 ` Dmitry Monakhov
2013-05-27 8:33 ` Jan Kara
[not found] ` <51A46C79.4000105@rs.jp.nec.com>
2013-06-24 11:04 ` Jan Kara
2013-06-25 10:06 ` Akira Fujita
2013-06-25 11:27 ` Jan Kara
2013-06-26 8:38 ` Akira Fujita
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51A2BB97.7030802@rs.jp.nec.com \
--to=a-fujita@rs.jp.nec.com \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.