From: yi.zhang@redhat.com (Yi Zhang)
Subject: [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
Date: Wed, 6 Jun 2018 16:32:12 +0800
Message-ID: <1dd52330-93db-a89a-e1d8-fbda9060d2bd@redhat.com>
In-Reply-To: <CACVXFVM4JAYd79EPMBNg_0NcMg+y2_6s+UcDBgt_8GsEXb-vWA@mail.gmail.com>
Hi Ming,

I just tried 4.17.0 plus Israel's patch, but unfortunately I can still
reproduce the issue below.
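The report does not include the stress script itself. A minimal Python sketch of such a reset loop is shown here. It assumes the host controller under test is nvme0 and that I/O (for example, writeback to a filesystem on the namespace, as in the hung-task trace below) is generated separately. `RESET_ATTR` and `stress_reset` are hypothetical names chosen for illustration; only the `reset_controller` sysfs attribute comes from the thread.

```python
import time
from pathlib import Path

# Assumption: the controller under test is nvme0; adjust for your setup.
RESET_ATTR = Path("/sys/class/nvme/nvme0/reset_controller")


def stress_reset(attr: Path, iterations: int, delay_s: float = 0.0) -> int:
    """Hypothetical helper: write "1" to the reset_controller attribute
    `iterations` times, optionally sleeping between writes.

    Returns the number of writes that completed.
    """
    done = 0
    for _ in range(iterations):
        # Each write asks the nvme host driver to reset the controller.
        with open(attr, "w") as f:
            f.write("1")
        done += 1
        if delay_s:
            time.sleep(delay_s)
    return done
```

Run as root, a call such as `stress_reset(RESET_ATTR, 100, delay_s=1.0)` covers the "fewer than 100 iterations" window mentioned in the earlier message. The helper takes the path as a parameter so it can be exercised against an ordinary file first.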
On 06/05/2018 04:56 PM, Ming Lei wrote:
> On Mon, Jun 4, 2018 at 12:46 AM, Yi Zhang <yi.zhang@redhat.com> wrote:
>> Hi Max/Sagi
>>
>> Thanks for looking into this issue.
>> I just tried Israel's and Sagi's patches; the kernel NULL pointer dereference can no longer be reproduced.
>> However, the reset operation always hangs within fewer than 100 iterations. At that point I found I/O hung on the host side, and "keep-alive timer expired" was logged on the target [2].
>> I also saw some other error messages during my test [1].
>>
>> [1]
>> [ 245.229779] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
>> [ 245.238728] nvme nvme0: Property Set error: 7, offset 0x14
>>
>> [2]
>> Target:
>> [ 647.523888] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:58282b02-64ae-4155-bad5-e37183e148e9.
>> --snip--
>> [ 715.975355] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:58282b02-64ae-4155-bad5-e37183e148e9.
>> [ 731.214264] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
>> [ 731.222435] nvmet: ctrl 1 fatal error occurred!
>>
>> Host:
>> [ 245.219461] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.1.92:4420
>> [ 245.229779] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
>> [ 245.238728] nvme nvme0: Property Set error: 7, offset 0x14
>> [ 245.302714] nvme nvme0: creating 40 I/O queues.
>> [ 246.011104] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.1.92:4420
>> [ 252.159464] nvme nvme0: Property Set error: 7, offset 0x14
>> [ 252.204801] nvme nvme0: creating 40 I/O queues.
>> [ 255.005423] nvme nvme0: Property Set error: 7, offset 0x14
>> --snip--
>> [ 426.191472] nvme nvme0: creating 40 I/O queues.
>> [ 428.994740] nvme nvme0: Property Set error: 7, offset 0x14
>> [ 429.042588] nvme nvme0: creating 40 I/O queues.
>> [ 615.682478] INFO: task kworker/u81:8:685 blocked for more than 120 seconds.
>> [ 615.690891] Not tainted 4.17.0-rc7.Sagi+ #10
>> [ 615.696721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 615.706003] kworker/u81:8 D 0 685 2 0x80000000
>> [ 615.712675] Workqueue: writeback wb_workfn (flush-259:0)
>> [ 615.719150] Call Trace:
>> [ 615.722423] ? __schedule+0x290/0x870
>> [ 615.727053] schedule+0x32/0x80
>> [ 615.731092] io_schedule+0x12/0x40
>> [ 615.735425] blk_mq_get_tag+0x12e/0x260
>> [ 615.740240] ? remove_wait_queue+0x60/0x60
>> [ 615.745343] blk_mq_get_request+0xce/0x390
>> [ 615.750437] ? __blk_mq_sched_bio_merge+0xec/0x190
>> [ 615.756311] blk_mq_make_request+0x11c/0x560
>> [ 615.761608] generic_make_request+0x18c/0x390
>> [ 615.766996] submit_bio+0x6e/0x130
>> [ 615.771320] ? guard_bio_eod+0x2c/0xa0
>> [ 615.776020] submit_bh_wbc+0x157/0x190
>> [ 615.780727] __block_write_full_page+0x14b/0x400
>> [ 615.786406] __writepage+0x19/0x50
>> [ 615.790722] write_cache_pages+0x222/0x470
>> [ 615.795803] ? bdi_set_max_ratio+0x70/0x70
>> [ 615.800895] generic_writepages+0x51/0x80
>> [ 615.805883] ? __wake_up_common_lock+0x87/0xc0
>> [ 615.811356] do_writepages+0x1a/0x70
>> [ 615.815849] __writeback_single_inode+0x3d/0x340
>> [ 615.821508] writeback_sb_inodes+0x24f/0x4b0
>> [ 615.826784] __writeback_inodes_wb+0x87/0xb0
>> [ 615.832060] wb_writeback+0x27c/0x310
>> [ 615.836651] wb_workfn+0x306/0x450
>> [ 615.840952] process_one_work+0x158/0x360
>> [ 615.845932] worker_thread+0x47/0x3e0
>> [ 615.850517] kthread+0xf8/0x130
>> [ 615.854518] ? max_active_store+0x80/0x80
>> [ 615.859480] ? kthread_bind+0x10/0x10
>> [ 615.864063] ret_from_fork+0x35/0x40
> This looks like an already-fixed blk-mq issue; the following merged commit
> might help:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6fc46498784e799d3eb95d83079180e413c4e7d
>
>
> Thanks,
> Ming Lei
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
Thread overview: 10+ messages
[not found] <1119455866.5604170.1527936593726.JavaMail.zimbra@redhat.com>
2018-06-02 11:25 ` [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer Yi Zhang
2018-06-03 12:20 ` Sagi Grimberg
2018-06-03 12:59 ` Max Gurtovoy
2018-06-03 16:46 ` Yi Zhang
2018-06-05 8:56 ` Ming Lei
2018-06-06 8:32 ` Yi Zhang [this message]
2018-06-06 9:48 ` Max Gurtovoy
2018-06-07 3:20 ` Yi Zhang
2018-06-07 8:27 ` Sagi Grimberg
2018-06-07 11:02 ` Yi Zhang