public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.g.garry@oracle.com>
To: Qiang Ma <maqianga@uniontech.com>,
	James.Bottomley@HansenPartnership.com,
	martin.petersen@oracle.com, axboe@kernel.dk, dwagner@suse.de,
	ming.lei@redhat.com, hare@suse.de
Cc: linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] scsi: Don't wait for completion of in-flight requests
Date: Tue, 26 Nov 2024 12:21:24 +0000	[thread overview]
Message-ID: <7c95b86b-68a0-41f8-a09c-3cb4b06fe61a@oracle.com> (raw)
In-Reply-To: <20241126115008.31272-1-maqianga@uniontech.com>

On 26/11/2024 11:50, Qiang Ma wrote:
> Problem:
> When the system disk uses the scsi disk bus, The main
> qemu command line includes:
> ...
> -device virtio-scsi-pci,id=scsi0 \
> -device scsi-hd,scsi-id=1,drive=drive-virtio-disk
> -drive id=drive-virtio-disk,if=none,file=/home/kvm/test.qcow2
> ...
> 
> The dmesg log is as follows::
> 
> [   50.304591][ T4382] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [   50.377002][ T4382] kexec_core: Starting new kernel
> [   50.669775][  T194] psci: CPU1 killed (polled 0 ms)
> [   50.849665][  T194] psci: CPU2 killed (polled 0 ms)
> [   51.109625][  T194] psci: CPU3 killed (polled 0 ms)
> [   51.319594][  T194] psci: CPU4 killed (polled 0 ms)
> [   51.489667][  T194] psci: CPU5 killed (polled 0 ms)
> [   51.709582][  T194] psci: CPU6 killed (polled 0 ms)
> [   51.949508][   T10] psci: CPU7 killed (polled 0 ms)
> [   52.139499][   T10] psci: CPU8 killed (polled 0 ms)
> [   52.289426][   T10] psci: CPU9 killed (polled 0 ms)
> [   52.439552][   T10] psci: CPU10 killed (polled 0 ms)
> [   52.579525][   T10] psci: CPU11 killed (polled 0 ms)
> [   52.709501][   T10] psci: CPU12 killed (polled 0 ms)
> [   52.819509][  T194] psci: CPU13 killed (polled 0 ms)
> [   52.919509][  T194] psci: CPU14 killed (polled 0 ms)
> [  243.214009][  T115] INFO: task kworker/0:1:10 blocked for more than 122 seconds.
> [  243.214810][  T115]       Not tainted 6.6.0+ #1
> [  243.215517][  T115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  243.216390][  T115] task:kworker/0:1     state:D stack:0     pid:10    ppid:2      flags:0x00000008
> [  243.217299][  T115] Workqueue: events vmstat_shepherd
> [  243.217816][  T115] Call trace:
> [  243.218133][  T115]  __switch_to+0x130/0x1e8
> [  243.218568][  T115]  __schedule+0x660/0xcf8
> [  243.219013][  T115]  schedule+0x58/0xf0
> [  243.219402][  T115]  percpu_rwsem_wait+0xb0/0x1d0
> [  243.219880][  T115]  __percpu_down_read+0x40/0xe0
> [  243.220353][  T115]  cpus_read_lock+0x5c/0x70
> [  243.220795][  T115]  vmstat_shepherd+0x40/0x140
> [  243.221250][  T115]  process_one_work+0x170/0x3c0
> [  243.221726][  T115]  worker_thread+0x234/0x3b8
> [  243.222176][  T115]  kthread+0xf0/0x108
> [  243.222564][  T115]  ret_from_fork+0x10/0x20
> ...
> [  243.254080][  T115] INFO: task kworker/0:2:194 blocked for more than 122 seconds.
> [  243.254834][  T115]       Not tainted 6.6.0+ #1
> [  243.255529][  T115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  243.256378][  T115] task:kworker/0:2     state:D stack:0     pid:194   ppid:2      flags:0x00000008
> [  243.257284][  T115] Workqueue: events work_for_cpu_fn
> [  243.257793][  T115] Call trace:
> [  243.258111][  T115]  __switch_to+0x130/0x1e8
> [  243.258541][  T115]  __schedule+0x660/0xcf8
> [  243.258971][  T115]  schedule+0x58/0xf0
> [  243.259360][  T115]  schedule_timeout+0x280/0x2f0
> [  243.259832][  T115]  wait_for_common+0xcc/0x2d8
> [  243.260287][  T115]  wait_for_completion+0x20/0x38
> [  243.260767][  T115]  cpuhp_kick_ap+0xe8/0x278
> [  243.261207][  T115]  cpuhp_kick_ap_work+0x5c/0x188
> [  243.261688][  T115]  _cpu_down+0x120/0x378
> [  243.262103][  T115]  __cpu_down_maps_locked+0x20/0x38
> [  243.262609][  T115]  work_for_cpu_fn+0x24/0x40
> [  243.263059][  T115]  process_one_work+0x170/0x3c0
> [  243.263533][  T115]  worker_thread+0x234/0x3b8
> [  243.263981][  T115]  kthread+0xf0/0x108
> [  243.264405][  T115]  ret_from_fork+0x10/0x20
> [  243.264846][  T115] INFO: task kworker/15:2:639 blocked for more than 122 seconds.
> [  243.265602][  T115]       Not tainted 6.6.0+ #1
> [  243.266296][  T115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  243.267143][  T115] task:kworker/15:2    state:D stack:0     pid:639   ppid:2      flags:0x00000008
> [  243.268044][  T115] Workqueue: events_freezable_power_ disk_events_workfn
> [  243.268727][  T115] Call trace:
> [  243.269051][  T115]  __switch_to+0x130/0x1e8
> [  243.269481][  T115]  __schedule+0x660/0xcf8
> [  243.269903][  T115]  schedule+0x58/0xf0
> [  243.270293][  T115]  schedule_timeout+0x280/0x2f0
> [  243.270763][  T115]  io_schedule_timeout+0x50/0x70
> [  243.271245][  T115]  wait_for_common_io.constprop.0+0xb0/0x298
> [  243.271830][  T115]  wait_for_completion_io+0x1c/0x30
> [  243.272335][  T115]  blk_execute_rq+0x1d8/0x278
> [  243.272793][  T115]  scsi_execute_cmd+0x114/0x238
> [  243.273267][  T115]  sr_check_events+0xc8/0x310 [sr_mod]
> [  243.273808][  T115]  cdrom_check_events+0x2c/0x50 [cdrom]
> [  243.274408][  T115]  sr_block_check_events+0x34/0x48 [sr_mod]
> [  243.274994][  T115]  disk_check_events+0x44/0x1b0
> [  243.275468][  T115]  disk_events_workfn+0x20/0x38
> [  243.275939][  T115]  process_one_work+0x170/0x3c0
> [  243.276410][  T115]  worker_thread+0x234/0x3b8
> [  243.276855][  T115]  kthread+0xf0/0x108
> [  243.277241][  T115]  ret_from_fork+0x10/0x20
> 
> ftrace finds that it enters an endless loop, code as follows:
> 
> if (percpu_ref_tryget(&hctx->queue->q_usage_counter)) {
> 	while (blk_mq_hctx_has_requests(hctx))
> 		msleep(5);
> 	percpu_ref_put(&hctx->queue->q_usage_counter);
> }
> 
> Solution:
> Refer to the loop and dm-rq in patch commit bf0beec0607d
> ("blk-mq: drain I/O when all CPUs in a hctx are offline"),
> add a BLK_MQ_F_STACKING and set it for scsi, so we don't need
> to wait for completion of in-flight requests  to avoid a potential
> hung task.
> 
> Fixes: bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline")
> 
> Signed-off-by: Qiang Ma <maqianga@uniontech.com>
> ---
>   drivers/scsi/scsi_lib.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index adee6f60c966..0a2d5d9327fc 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2065,7 +2065,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>   	tag_set->queue_depth = shost->can_queue;
>   	tag_set->cmd_size = cmd_size;
>   	tag_set->numa_node = dev_to_node(shost->dma_dev);
> -	tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
> +	tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING;

This should not be set for all SCSI hosts. Some SCSI hosts rely on 
bf0beec0607d.


>   	tag_set->flags |=
>   		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
>   	if (shost->queuecommand_may_block)


  reply	other threads:[~2024-11-26 12:21 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-26 11:50 [PATCH] scsi: Don't wait for completion of in-flight requests Qiang Ma
2024-11-26 12:21 ` John Garry [this message]
2024-11-26 13:49   ` Bart Van Assche
2024-11-27 13:53   ` Qiang Ma
2024-11-26 14:15 ` Ming Lei
2024-11-27 14:04   ` Qiang Ma
2024-11-27 15:20     ` Ming Lei
2024-12-03 13:21       ` Qiang Ma
2024-12-03 14:36         ` Ming Lei
2024-12-04  8:45           ` Qiang Ma
2024-12-04 10:02             ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c95b86b-68a0-41f8-a09c-3cb4b06fe61a@oracle.com \
    --to=john.g.garry@oracle.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=axboe@kernel.dk \
    --cc=dwagner@suse.de \
    --cc=hare@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=maqianga@uniontech.com \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox