From: Jan Kara <jack@suse.cz>
To: Akira Fujita <a-fujita@rs.jp.nec.com>
Cc: Jan Kara <jack@suse.cz>, Dmitry Monakhov <dmonakhov@openvz.org>,
	ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [BUG] xfstest269 causes deadlock on linux-3.9.0 (ext4)
Date: Tue, 25 Jun 2013 13:27:38 +0200
Message-ID: <20130625112738.GF20471@quack.suse.cz>
In-Reply-To: <51C96BBD.8060701@rs.jp.nec.com>

On Tue 25-06-13 19:06:53, Akira Fujita wrote:
> 
> 
> (2013/06/24 20:04), Jan Kara wrote:
> > On Tue 28-05-13 17:36:09, Akira Fujita wrote:
> >> Hello,
> >>
> >> (2013/05/27 17:33), Jan Kara wrote:
> >>> On Mon 27-05-13 12:16:01, Dmitry Monakhov wrote:
> >>>> On Mon, 27 May 2013 10:49:11 +0900, Akira Fujita <a-fujita@rs.jp.nec.com> wrote:
> >>>>> Hi Jan,
> >>>>>
> >>>>> (2013/05/15 11:57), Akira Fujita wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> (2013/05/15 6:37), Jan Kara wrote:
> >>>>>>>       Hello,
> >>>>>>>
> >>>>>>> On Mon 13-05-13 15:49:24, Akira Fujita wrote:
> >>>>>>>> I ran into a deadlock with xfstests test 269 on linux-3.9.0.
> >>>>>>>> It seems to happen between jbd2_log_wait_commit, sleep_on_buffer
> >>>>>>>> and writeback_inodes (please see the ps log below).
> >>>>>>>> Once it occurs, we can't touch the FS anymore.
> >>>>>>>> In my case it takes 300 - 1000 trials to occur. Is this a known issue?
> >>>>>>>>
> >>>>>>>> The following kernels seem to have the same problem:
> >>>>>>>> - linux-3.5-rc5
> >>>>>>>> - linux-3.8.5
> >>>>>>>> - linux-3.9-rc7
> >>>>>>>> And now I'm trying it on linux-3.10-rc1.
> >>>>>>>>
> >>>>>>>> # ./check generic/269
> >>>>>>>> FSTYP         -- ext4
> >>>>>>>> PLATFORM      -- Linux/x86_64 mcds1 3.9.0
> >>>>>>>> MKFS_OPTIONS  -- /dev/sda12
> >>>>>>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/sda12 /mnt/mp2
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> # ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:16,comm
> >>>>>>>>       PID   TID CLS RTPRIO  NI PRI PSR %CPU STAT WCHAN            COMMAND
> >>>>>>>>         1     1 TS       -   0  19   0  0.0 Ss   poll_schedule_ti init
> >>>>>>>>         2     2 TS       -   0  19   0  0.0 S    kthreadd         kthreadd
> >>>>>>>>         3     3 TS       -   0  19   0  0.0 S    smpboot_thread_f ksoftirqd/0
> >>>>>>>> ...
> >>>>>>>>      2391  2391 TS       -   0  19   2  0.1 D    jbd2_log_wait_co flush-8:0
> >>>>>>>> ...
> >>>>>>>>     22647 22647 TS       -   0  19   3  0.0 S    worker_thread    kworker/3:1
> >>>>>>>>     22655 22655 TS       -   0  19   0  0.0 S    hrtimer_nanoslee sleep
> >>>>>>>>     22657 22657 TS       -   0  19   2  0.0 R+   -                ps
> >>>>>>>>     25330 25330 TS       -   0  19   0  0.0 S    worker_thread    kworker/0:0
> >>>>>>>>     28963 28963 TS       -   0  19   1  0.0 S+   wait             loop_xfstests.s
> >>>>>>>>     28964 28964 TS       -   0  19   1  0.0 S+   wait             check
> >>>>>>>>     29180 29180 TS       -   0  19   3  0.0 S    kjournald2       jbd2/sda11-8
> >>>>>>>>     29181 29181 TS       - -20  39   3  0.0 S<   rescuer_thread   ext4-dio-unwrit
> >>>>>>>>     29199 29199 TS       -   0  19   3  0.0 S+   wait             269
> >>>>>>>>     29391 29391 TS       -   0  19   0  0.6 D    sleep_on_buffer  jbd2/sda12-8
> >>>>>>>>     29392 29392 TS       - -20  39   3  0.0 S<   rescuer_thread   ext4-dio-unwrit
> >>>>>>>>     29394 29394 TS       -   0  19   0  0.0 S    wait             fsstress
> >>>>>>>>     29505 29505 TS       -   0  19   3  0.0 D    writeback_inodes fsstress
> >>>>>>>>
> >>>>>>>> # df -T /dev/sda11 /dev/sda12
> >>>>>>>> Filesystem    Type   1K-blocks      Used Available Use% Mounted on
> >>>>>>>> /dev/sda11    ext4     9857264     22308   9327564   1% /mnt/mp1
> >>>>>>>> /dev/sda12    ext4      499656    499656         0 100% /mnt/mp2
> >>>>>>>       Thanks for the report. No, I don't think this problem has been
> >>>>>>> reported before. Seeing that sda12 is out of space and fsstress hangs
> >>>>>>> in writeback_inodes(), I suspect we have some deadlock in the ENOSPC
> >>>>>>> recovery path when we want to flush data to disk to reduce delalloc
> >>>>>>> uncertainty. Can you run 'echo w >/proc/sysrq-trigger' when the
> >>>>>>> deadlock happens and post your dmesg here? Thanks!
> >>>>>>>
> >>>>>>
> >>>>>> Thanks for the reply.
> >>>>>> I'll take that information when the deadlock happens again.
> >>>>>
> >>>>> Finally, I got the following messages on linux-3.9.0.
> >>>>> Could you take a look?
> >>>> Is it a real deadlock?
> >>>> umount waits for flush-8:0,
> >>>> flush waits for jbd2,
> >>>> jbd2 waits for a bh.
> >>>> All buffer heads will be written sooner or later.
> >>>> It probably will require a significant amount of time, because the fs is
> >>>> too fragmented (that is what this test is all about), but it is not a
> >>>> deadlock.
> >>>     Yeah, that is my analysis as well. I was only wondering whether we
> >>> might have left some buffer locked on some error recovery path (we are
> >>> getting ENOSPC), but it's not obvious where that would be. Akira-san,
> >>> could you also send us a disassembly of your
> >>> jbd2_journal_commit_transaction() function so that we can check where
> >>> exactly offset 0xee0 is in the function?
> >>>
> >>> 								Honza
> >>
> >> I attached the disassembly of jbd2_journal_commit_transaction(),
> >> please check it.
> >    Thanks. I've looked into the disassembly now, and we are waiting for
> > the write of the commit record to finish. We submitted that buffer just
> > before, so it can hardly be an issue like a forgotten locked buffer. Really
> > strange. It looks like some problem in the block layer, the device driver,
> > or the HW itself... I'm thinking about how to debug this further - it would
> > be good if we had blktrace data from just before the moment the hang
> > happens, but given how hard this is to reproduce, the traces would be
> > extremely large...
> > 
> > One idea - can you try whether the problem reproduces with the 'nobarrier'
> > mount option? It could be some problem with flush/FUA handling.
> > 
> 
> OK, I'll try to reproduce this issue with 'nobarrier'
> (and take a blktrace log if I can).
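  For reference, disabling barriers should just be a matter of remounting; a
minimal sketch, using the device and mount point from the report above:

    mount -o remount,nobarrier /dev/sda12 /mnt/mp2
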
  You can restart blktrace every 5 minutes or so, so that the trace files
don't get too big. If we have a trace for the minute before the machine hangs,
it should be enough for debugging.
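
  A rough sketch of such a rotating capture (untested; assumes /dev/sda12 is
the device under test and 5-minute chunks):

    # Restart blktrace every 300 seconds so each trace file stays small.
    i=0
    while :; do
        blktrace -d /dev/sda12 -w 300 -o trace.$i
        i=$((i + 1))
    done

After the hang, only the last one or two trace.* sets need to be kept and fed
to blkparse.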

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

Thread overview: 10+ messages
2013-05-13  6:49 [BUG] xfstest269 causes deadlock on linux-3.9.0 (ext4) Akira Fujita
2013-05-14 21:37 ` Jan Kara
2013-05-15  2:57   ` Akira Fujita
2013-05-27  1:49     ` Akira Fujita
2013-05-27  8:16       ` Dmitry Monakhov
2013-05-27  8:33         ` Jan Kara
     [not found]           ` <51A46C79.4000105@rs.jp.nec.com>
2013-06-24 11:04             ` Jan Kara
2013-06-25 10:06               ` Akira Fujita
2013-06-25 11:27                 ` Jan Kara [this message]
2013-06-26  8:38                   ` Akira Fujita
