From: Ming Lei <ming.lei@redhat.com>
To: "chenxiang (M)" <chenxiang66@hisilicon.com>
Cc: John Garry <john.garry@huawei.com>,
Hannes Reinecke <hare@suse.com>,
Sumit Saxena <sumit.saxena@broadcom.com>,
Kashyap Desai <kashyap.desai@broadcom.com>,
Bart Van Assche <bvanassche@acm.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
Ewan Milne <emilne@redhat.com>, Long Li <longli@microsoft.com>,
"Martin K . Petersen" <martin.petersen@oracle.com>
Subject: Re: [bug report] Hang on sync after dd
Date: Wed, 2 Dec 2020 15:26:19 +0800
Message-ID: <20201202072619.GA511454@T590>
In-Reply-To: <8cb5cd8e-5a48-dc36-879c-37950e6228c8@hisilicon.com>
On Wed, Dec 02, 2020 at 02:22:10PM +0800, chenxiang (M) wrote:
>
>
> On 2020/12/2 11:22, Ming Lei wrote:
> > On Wed, Dec 02, 2020 at 09:44:48AM +0800, chenxiang (M) wrote:
> > >
> > > On 2020/12/1 20:34, Ming Lei wrote:
> > > > On Mon, Nov 30, 2020 at 11:22:33AM +0000, John Garry wrote:
> > > > > Hi all,
> > > > >
> > > > > Some guys internally upgraded to v5.10-rcX and started to see a hang after
> > > > > dd + sync of a large file:
> > > > > - mount /dev/sda1 (ext4 filesystem) on directory /mnt;
> > > > > - run "dd if=/dev/zero of=test1 bs=1M count=2000" in directory /mnt;
> > > > > - run "sync"
> > > > >
> > > > > and get:
> > > > >
> > > > > [ 367.912761] INFO: task jbd2/sdb1-8:3602 blocked for more than 120 seconds.
> > > > > [ 367.919618] Not tainted 5.10.0-rc1-109488-g32ded76956b6 #948
> > > > > [ 367.925776] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > [ 367.933579] task:jbd2/sdb1-8 state:D stack: 0 pid: 3602 ppid: 2 flags:0x00000028
> > > > > [ 367.941901] Call trace:
> > > > > [ 367.944351] __switch_to+0xb8/0x168
> > > > > [ 367.947840] __schedule+0x30c/0x670
> > > > > [ 367.951326] schedule+0x70/0x108
> > > > > [ 367.954550] io_schedule+0x1c/0xe8
> > > > > [ 367.957948] bit_wait_io+0x18/0x68
> > > > > [ 367.961346] __wait_on_bit+0x78/0xf0
> > > > > [ 367.964919] out_of_line_wait_on_bit+0x8c/0xb0
> > > > > [ 367.969356] __wait_on_buffer+0x30/0x40
> > > > > [ 367.973188] jbd2_journal_commit_transaction+0x1370/0x1958
> > > > > [ 367.978661] kjournald2+0xcc/0x260
> > > > > [ 367.982061] kthread+0x150/0x158
> > > > > [ 367.985288] ret_from_fork+0x10/0x34
> > > > > [ 367.988860] INFO: task sync:3823 blocked for more than 120 seconds.
> > > > > [ 367.995102] Not tainted 5.10.0-rc1-109488-g32ded76956b6 #948
> > > > > [ 368.001265] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > [ 368.009067] task:sync state:D stack: 0 pid: 3823 ppid: 3450 flags:0x00000009
> > > > > [ 368.017397] Call trace:
> > > > > [ 368.019841] __switch_to+0xb8/0x168
> > > > > [ 368.023320] __schedule+0x30c/0x670
> > > > > [ 368.026804] schedule+0x70/0x108
> > > > > [ 368.030025] jbd2_log_wait_commit+0xbc/0x158
> > > > > [ 368.034290] ext4_sync_fs+0x188/0x1c8
> > > > > [ 368.037947] sync_fs_one_sb+0x30/0x40
> > > > > [ 368.041606] iterate_supers+0x9c/0x138
> > > > > [ 368.045350] ksys_sync+0x64/0xc0
> > > > > [ 368.048569] __arm64_sys_sync+0x10/0x20
> > > > > [ 368.052398] el0_svc_common.constprop.3+0x68/0x170
> > > > > [ 368.057177] do_el0_svc+0x24/0x90
> > > > > [ 368.060482] el0_sync_handler+0x118/0x168
> > > > > [ 368.064478] el0_sync+0x158/0x180
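
The reproduction quoted above is just dd followed by sync. If it helps to drive it from a small test program rather than the shell, a minimal C sketch of the same sequence is below. It assumes a POSIX system and that it runs from the ext4 mount point (/mnt in the report); write_zero_file() and reproduce() are hypothetical helper names, not existing kernel or library functions.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations such as sync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `count` blocks of `bs` zero bytes to `path`,
 * roughly what "dd if=/dev/zero of=<path> bs=<bs> count=<count>" does.
 * Returns the number of bytes written, or -1 on error. The data only lands
 * in the page cache here; nothing is forced to disk.
 */
long long write_zero_file(const char *path, size_t bs, long count)
{
    assert(bs > 0 && count >= 0);

    char *buf = calloc(1, bs);              /* zero-filled, like /dev/zero */
    if (!buf)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long long total = 0;
    for (long i = 0; i < count; i++) {
        size_t off = 0;
        while (off < bs) {                  /* cope with short writes */
            ssize_t n = write(fd, buf + off, bs - off);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                close(fd);
                free(buf);
                return -1;
            }
            off += (size_t)n;
        }
        total += (long long)bs;
    }

    close(fd);
    free(buf);
    return total;
}

/* Hypothetical driver for the reported sequence: 2000 x 1 MiB, then sync. */
int reproduce(const char *path)
{
    if (write_zero_file(path, 1024 * 1024, 2000) < 0)
        return -1;
    sync();     /* flush all dirty data; the report shows sync blocking here */
    return 0;
}
```

Calling reproduce("test1") from /mnt mirrors the reported dd run; on an affected kernel one would expect sync(2) not to return, matching the blocked sync task in the log.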
> > > > >
> > > > > The issue was reported here originally:
> > > > > https://lore.kernel.org/linux-ext4/4d18326e-9ca2-d0cb-7cb8-cb56981280da@hisilicon.com/
> > > > >
> > > > > But it looks like an issue related to the recent SCSI MQ work.
> > > > >
> > > > > They can only reproduce it with hisi_sas v3 hw. I could not reproduce it
> > > > > with megaraid_sas on the same dev platform, or with hisi_sas on a similar
> > > > > dev board.
> > > > >
> > > > > Reverting "scsi: core: Only re-run queue in scsi_end_request() if device
> > > > > queue is busy" seems to solve the issue. Also, checking out the commit just
> > > > > before "scsi: hisi_sas: Switch v3 hw to MQ" does not seem to show the issue.
> > > > If the issue can be reproduced, you may try the following patch:
> > > I tried the change, and the issue still occurs.
> > > We found that the number of completed IOs is less than the number
> > > dispatched, but the inflight count in the block device's sysfs (such as
> > > /sys/devices/pci0000:74/0000:74:02.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda/sda1/inflight)
> > > is 0.
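For repeated sampling while the hang is in progress, a small C sketch that reads a block device's inflight attribute may help. The attribute holds two counters, in-flight reads followed by in-flight writes; struct inflight, parse_inflight() and read_inflight() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Counters from a sysfs "inflight" attribute: reads first, then writes. */
struct inflight {
    unsigned int reads;
    unsigned int writes;
};

/*
 * Parse attribute text such as "       0        0\n".
 * Returns 0 on success, -1 if two counters cannot be read.
 */
int parse_inflight(const char *text, struct inflight *out)
{
    assert(text != NULL && out != NULL);
    if (sscanf(text, "%u %u", &out->reads, &out->writes) != 2)
        return -1;
    return 0;
}

/* Open `path` (e.g. /sys/block/sdb/sdb1/inflight), read one line, parse it. */
int read_inflight(const char *path, struct inflight *out)
{
    char line[64];
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *got = fgets(line, sizeof(line), f);
    fclose(f);
    if (!got)
        return -1;
    return parse_inflight(line, out);
}
```

A result of 0 reads and 0 writes while completions still lag dispatches is exactly the mismatch reported here.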
> > Hello chenxiang,
> >
> > Can you collect the debugfs log via the following commands after the io
> > hang is triggered?
> >
> > 1) debugfs log:
> >
> > (cd /sys/kernel/debug/block/sda && find . -type f -exec grep -aH . {} \;)
> >
> > 2) scsi sysfs info:
> >
> > (cd /sys/block/sda/device && find . -type f -exec grep -aH . {} \;)
> >
> > This assumes the disk is /dev/sda.
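Where a shell with find and grep is not available on the target, the two pipelines above can be approximated in C with nftw(3). This is a rough sketch assuming a POSIX system; dump_file(), dump_tree() and visit() are hypothetical names. Files are read until EOF instead of trusting st_size, since debugfs and sysfs attributes report a size that does not reflect their content.

```c
#define _XOPEN_SOURCE 700   /* nftw(), getline() */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Print each non-empty line of `path` as "path:line", which is what
 * "grep -aH ." prints for a single file. Lines are written with fwrite()
 * so embedded NUL bytes are not truncated. Returns the number of lines
 * printed, or -1 if the file cannot be opened.
 */
long dump_file(const char *path, FILE *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    long printed = 0;
    while ((len = getline(&line, &cap, f)) != -1) {
        if (len == 1 && line[0] == '\n')
            continue;                       /* grep '.' skips empty lines */
        fprintf(out, "%s:", path);
        fwrite(line, 1, (size_t)len, out);
        if (line[len - 1] != '\n')
            fputc('\n', out);               /* last line had no newline */
        printed++;
    }
    free(line);
    fclose(f);
    return printed;
}

/* nftw() callback: handle regular files only, like "find -type f". */
static int visit(const char *fpath, const struct stat *sb, int type,
                 struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    if (type == FTW_F)
        dump_file(fpath, stdout);           /* unreadable files are skipped */
    return 0;                               /* keep walking */
}

/* Walk `root` without following symlinks (find's default -P behaviour). */
int dump_tree(const char *root)
{
    return nftw(root, visit, 16, FTW_PHYS);
}
```

Paths are printed relative to the root passed in, so chdir("/sys/kernel/debug/block/sda") followed by dump_tree(".") gives the same "./" prefixes as the find output.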
>
> The issue occurs on /dev/sdb1, and the logs are as follows (please note
> that I have applied the change you provided):
Hello chenxiang,
Please try the following patch:
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 60c7a7d74852..03c6d0620bfd 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1703,8 +1703,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 		break;
 	case BLK_STS_RESOURCE:
 	case BLK_STS_ZONE_RESOURCE:
-		if (atomic_read(&sdev->device_busy) ||
-		    scsi_device_blocked(sdev))
+		if (scsi_device_blocked(sdev))
 			ret = BLK_STS_DEV_RESOURCE;
 		break;
 	default:
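
For readers less familiar with the two status codes: the block layer documents BLK_STS_DEV_RESOURCE as something a driver returns only when it can guarantee the queue will be rerun later, typically because in-flight IO frees the resource on completion; with BLK_STS_RESOURCE, blk-mq cannot rely on that and arranges a rerun itself. The patch above keeps BLK_STS_DEV_RESOURCE only for the device-blocked case. The toy model below (hypothetical names, user-space C, not kernel code) shows why the distinction matters: a request rejected with the "device" variant stays pending until some completion reruns that particular queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for BLK_STS_RESOURCE and BLK_STS_DEV_RESOURCE. */
enum toy_status { TOY_STS_RESOURCE, TOY_STS_DEV_RESOURCE };

struct toy_queue {
    int pending;        /* requests the driver pushed back */
    bool rerun_armed;   /* rerun scheduled by the block layer itself */
};

/* The driver rejected one request with status `s`. */
void toy_reject(struct toy_queue *q, enum toy_status s)
{
    assert(q);
    q->pending++;
    /*
     * RESOURCE: no promise from the driver, so the block layer schedules
     * a rerun on its own. DEV_RESOURCE: the driver promised a completion
     * will rerun the queue, so nothing is armed here.
     */
    if (s == TOY_STS_RESOURCE)
        q->rerun_armed = true;
}

/* Run the queue; the toy assumes resources are free again by now. */
static void toy_run(struct toy_queue *q)
{
    q->pending = 0;
    q->rerun_armed = false;
}

/*
 * Advance time by one step. `completion_reruns` is true when some
 * in-flight request completed and its completion path reran this queue.
 */
void toy_tick(struct toy_queue *q, bool completion_reruns)
{
    if (q->rerun_armed || completion_reruns)
        toy_run(q);
}
```

With TOY_STS_DEV_RESOURCE and no rerunning completion, any number of toy_tick(&q, false) calls leave pending at 1; with TOY_STS_RESOURCE a single tick drains it. This only models the documented contract and is not a diagnosis of the hisi_sas hang.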
Thanks,
Ming