Re: generic/269 hangs on lastest upstream kernel

From: Jan Kara <jack@suse.cz>
To: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Cc: Jan Kara <jack@suse.cz>, Theodore Ts'o <tytso@mit.edu>,
	fstests <fstests@vger.kernel.org>
Subject: Re: generic/269 hangs on lastest upstream kernel
Date: Tue, 18 Feb 2020 12:03:11 +0100
Message-ID: <20200218110311.GI16121@quack2.suse.cz>
In-Reply-To: <83cc6208-3fa3-86bb-eb91-77b90b22d98f@cn.fujitsu.com>

On Tue 18-02-20 17:46:54, Yang Xu wrote:
> 
> on 2020/02/18 16:24, Jan Kara wrote:
> > On Tue 18-02-20 11:25:37, Yang Xu wrote:
> > > on 2020/02/14 23:00, Jan Kara wrote:
> > > > On Fri 14-02-20 18:24:50, Yang Xu wrote:
> > > > > on 2020/02/14 5:10, Jan Kara wrote:
> > > > > > On Thu 13-02-20 16:49:21, Yang Xu wrote:
> > > > > > > > > When I test generic/269(ext4) on 5.6.0-rc1 kernel, it hangs.
> > > > > > > > > ----------------------------------------------
> > > > > > > > > dmesg as below:
> > > > > > > > >        76.506753] run fstests generic/269 at 2020-02-11 05:53:44
> > > > > > > > > [   76.955667] EXT4-fs (sdc): mounted filesystem with ordered data mode.
> > > > > > > > > Opts: acl,                           user_xattr
> > > > > > > > > [  100.912511] device virbr0-nic left promiscuous mode
> > > > > > > > > [  100.912520] virbr0: port 1(virbr0-nic) entered disabled state
> > > > > > > > > [  246.801561] INFO: task dd:17284 blocked for more than 122 seconds.
> > > > > > > > > [  246.801564]       Not tainted 5.6.0-rc1 #41
> > > > > > > > > [  246.801565] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > > > > > > > > this mes                           sage.
> > > > > > > > > [  246.801566] dd              D    0 17284  16931 0x00000080
> > > > > > > > > [  246.801568] Call Trace:
> > > > > > > > > [  246.801584]  ? __schedule+0x251/0x690
> > > > > > > > > [  246.801586]  schedule+0x40/0xb0
> > > > > > > > > [  246.801588]  wb_wait_for_completion+0x52/0x80
> > > > > > > > > [  246.801591]  ? finish_wait+0x80/0x80
> > > > > > > > > [  246.801592]  __writeback_inodes_sb_nr+0xaa/0xd0
> > > > > > > > > [  246.801593]  try_to_writeback_inodes_sb+0x3c/0x50
> > > > > > > > 
> > > > > > > > Interesting. Does the hang resolve eventually, or is the machine hung
> > > > > > > > permanently? If the hang is permanent, can you do:
> > > > > > > > 
> > > > > > > > echo w >/proc/sysrq-trigger
> > > > > > > > 
> > > > > > > > and send us the stacktraces from dmesg? Thanks!
> > > > > > > Yes, the hang is permanent; log as below:
> > > > > Full dmesg is attached.
> > > > ...
> > > > 
> > > > Thanks! So the culprit seems to be:
> > > > 
> > > > > [  388.087799] kworker/u12:0   D    0    32      2 0x80004000
> > > > > [  388.087803] Workqueue: writeback wb_workfn (flush-8:32)
> > > > > [  388.087805] Call Trace:
> > > > > [  388.087810]  ? __schedule+0x251/0x690
> > > > > [  388.087811]  ? __switch_to_asm+0x34/0x70
> > > > > [  388.087812]  ? __switch_to_asm+0x34/0x70
> > > > > [  388.087814]  schedule+0x40/0xb0
> > > > > [  388.087816]  schedule_timeout+0x20d/0x310
> > > > > [  388.087818]  io_schedule_timeout+0x19/0x40
> > > > > [  388.087819]  wait_for_completion_io+0x113/0x180
> > > > > [  388.087822]  ? wake_up_q+0xa0/0xa0
> > > > > [  388.087824]  submit_bio_wait+0x5b/0x80
> > > > > [  388.087827]  blkdev_issue_flush+0x81/0xb0
> > > > > [  388.087834]  jbd2_cleanup_journal_tail+0x80/0xa0 [jbd2]
> > > > > [  388.087837]  jbd2_log_do_checkpoint+0xf4/0x3f0 [jbd2]
> > > > > [  388.087840]  __jbd2_log_wait_for_space+0x66/0x190 [jbd2]
> > > > > [  388.087843]  ? finish_wait+0x80/0x80
> > > > > [  388.087845]  add_transaction_credits+0x27d/0x290 [jbd2]
> > > > > [  388.087847]  ? blk_mq_make_request+0x289/0x5d0
> > > > > [  388.087849]  start_this_handle+0x10a/0x510 [jbd2]
> > > > > [  388.087851]  ? _cond_resched+0x15/0x30
> > > > > [  388.087853]  jbd2__journal_start+0xea/0x1f0 [jbd2]
> > > > > [  388.087869]  ? ext4_writepages+0x518/0xd90 [ext4]
> > > > > [  388.087875]  __ext4_journal_start_sb+0x6e/0x130 [ext4]
> > > > > [  388.087883]  ext4_writepages+0x518/0xd90 [ext4]
> > > > > [  388.087886]  ? do_writepages+0x41/0xd0
> > > > > [  388.087893]  ? ext4_mark_inode_dirty+0x1f0/0x1f0 [ext4]
> > > > > [  388.087894]  do_writepages+0x41/0xd0
> > > > > [  388.087896]  ? snprintf+0x49/0x60
> > > > > [  388.087898]  __writeback_single_inode+0x3d/0x340
> > > > > [  388.087899]  writeback_sb_inodes+0x1e5/0x480
> > > > > [  388.087901]  wb_writeback+0xfb/0x2f0
> > > > > [  388.087902]  wb_workfn+0xf0/0x430
> > > > > [  388.087903]  ? __switch_to_asm+0x34/0x70
> > > > > [  388.087905]  ? finish_task_switch+0x75/0x250
> > > > > [  388.087907]  process_one_work+0x1a7/0x370
> > > > > [  388.087909]  worker_thread+0x30/0x380
> > > > > [  388.087911]  ? process_one_work+0x370/0x370
> > > > > [  388.087912]  kthread+0x10c/0x130
> > > > > [  388.087913]  ? kthread_park+0x80/0x80
> > > > > [  388.087914]  ret_from_fork+0x35/0x40
> > > > 
> > > > This process is actually waiting for IO to complete while holding
> > > > checkpoint_mutex, which holds up everybody else. The question is why the IO
> > > > doesn't complete - that's definitely outside of the filesystem. Maybe a bug in
> > > > the block layer, storage driver, or something like that... What does
> > > > 'cat /sys/block/<device-with-xfstests>/inflight' show?
> > > Sorry for the late reply.
> > > The value is 0, which means there is no in-flight data (though that could be
> > > an accounting bug or a storage driver bug, is that right?).
> > > Also, it doesn't hang on my physical machine, only in a VM.
> > 
> > Hum, curious. Just to make sure, did you check sdc (because that appears to
> > be the stuck device)?
> Yes, I checked sdc; its value is 0.
> # cat /sys/block/sdc/inflight
>        0        0

OK, thanks!
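
For reference, the two columns in that file are the number of read and write requests the block layer currently counts as in flight. A minimal sketch of a helper that labels them is below; the name show_inflight is hypothetical, and it takes the path to the inflight file as its only argument.

```shell
# Print the in-flight read and write request counts for a block device.
# $1 is the path to an inflight file, e.g. /sys/block/sdc/inflight.
# (show_inflight is a hypothetical helper name, not an existing tool.)
show_inflight() {
    # The file holds two whitespace-separated integers: reads, then writes.
    read -r reads writes < "$1" || return 1
    echo "reads=$reads writes=$writes"
}
```

Run as 'show_inflight /sys/block/sdc/inflight' while the test is stuck. Two zeros mean no requests are being accounted as in flight at that moment.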

> > > So what should I do as the next step (change the storage disk format)?
> > 
> > I'd try a couple of things:
> > 
> > 1) If you mount ext4 with barrier=0 mount option, does the problem go away?
> Yes. With barrier=0, this case doesn't hang.

OK, so there's some problem with how the block layer is handling flush
bios...

> > 2) Can you run the test and at the same time run 'blktrace -d /dev/sdc' to
> > gather traces? Once the machine is stuck, abort blktrace, process the
> > resulting files with 'blkparse -i sdc' and send the compressed blkparse
> > output here. We should be able to see what was happening with the stuck
> > request in the trace and maybe that will tell us something.
> The log is too big (58M) and our email limit is 5M.

OK, can you put the log somewhere for download? Alternatively, you could
provide only the last, say, 20s of the trace, which should hopefully fit into
the limit...
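
One possible way to cut the trace down is sketched below. It assumes the default blkparse text layout, where each event line starts with "major,minor", field 4 is the timestamp in seconds, and field 7 is the RWBS string (a leading F there marks a flush). The helper names trace_tail and flush_events and the file names in the usage note are hypothetical.

```shell
# Keep only the events from the final N seconds of a blkparse text dump.
# $1: blkparse output file, $2: window in seconds (default 20).
# Assumes the default blkparse layout: field 1 = "major,minor", field 4 = time.
trace_tail() {
    awk -v secs="${2:-20}" '
        # Skip summary and blank lines: events start with "major,minor".
        $1 !~ /^[0-9]+,[0-9]+$/ { next }
        # First pass over the file: remember the latest timestamp.
        NR == FNR { if ($4 + 0 > last) last = $4 + 0; next }
        # Second pass: print events that fall inside the window.
        $4 + 0 >= last - secs
    ' "$1" "$1"
}

# Print only the events whose RWBS field (field 7) starts with F,
# which is assumed here to mark flush requests.
flush_events() {
    awk '$1 ~ /^[0-9]+,[0-9]+$/ && $7 ~ /^F/' "$1"
}
```

For example, after 'blkparse -i sdc > sdc.txt', running 'trace_tail sdc.txt 20 | xz > sdc-tail.txt.xz' should give a much smaller file to attach, and 'flush_events sdc.txt' narrows the view to the flush requests that barrier=0 avoids.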

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
