* [PATCH v3] scsi_debug: add new defer_type for mq_poll
@ 2021-02-09 22:56 Douglas Gilbert
2021-02-10 3:08 ` Kashyap Desai
0 siblings, 1 reply; 7+ messages in thread
From: Douglas Gilbert @ 2021-02-09 22:56 UTC (permalink / raw)
To: linux-scsi; +Cc: martin.petersen, jejb, hare, kashyap.desai
Add a new sdeb_defer_type enumeration value: SDEB_DEFER_POLL, for requests
that have REQ_HIPRI set in the cmd_flags field. It is expected that
these requests will be polled via the mq_poll entry point, which
is driven by calls to blk_poll() in the block layer. Therefore
timer events are not 'wired up' in the normal fashion.
There are still cases with short delays (e.g. < 10 microseconds)
where, by the time the command response processing occurs, the delay
has already been exceeded; in that case the code calls scsi_done()
directly. In such cases there is no window for mq_poll() to be
called.
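In outline, that inline-completion decision looks roughly like the sketch
below (hedged; the function and parameter names are illustrative, not the
exact driver code):

    /* Hedged sketch; identifiers are illustrative, not the exact driver code. */
    static bool complete_inline_if_late(struct scsi_cmnd *cmnd,
                                        ktime_t arrival_ts, ktime_t requested_delay)
    {
            ktime_t elapsed = ktime_sub(ktime_get_boottime(), arrival_ts);

            if (elapsed >= requested_delay) {
                    cmnd->scsi_done(cmnd);  /* mq_poll never sees this command */
                    return true;
            }
            /* otherwise arrival_ts + requested_delay becomes cmpl_ts for polling */
            return false;
    }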
Add an 'mq_polls' counter that increments on each scsi_done() call made
via the mq_poll entry point. It can be used to show (with 'cat
/proc/scsi/scsi_debug/<host_id>') that blk_poll() is causing
completions rather than some other mechanism.
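For example (illustrative only: '3' stands in for the host_id and the counts
shown are made up; the line format comes from the seq_printf() change in
this patch):

    $ grep mq_polls /proc/scsi/scsi_debug/3
    cmnd_count=8192, completions=8192, miss_cpus=0, a_tsf=0, mq_polls=8100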
This patch is against the 5.12/scsi-staging branch, which includes
commit a98d6bdf181eb71bf4686666eaf47c642a061642
("scsi_debug : iouring iopoll support")
and alters it. So this patch is a fix of that patch.
Changes since version 2 [sent 20210206 to linux-scsi list]
- the sdebug_blk_mq_poll() callback didn't cope with the
uncommon case where sqcp->sd_dp is NULL. Fix.
Changes since version 1 [sent 20210201 to linux-scsi list]
- harden SDEB_DEFER_POLL which broke under testing
- add mq_polls counter for debug output
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
drivers/scsi/scsi_debug.c | 141 +++++++++++++++++++++++++-------------
1 file changed, 94 insertions(+), 47 deletions(-)
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 746eec521f79..b9ec3a47fbf1 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -322,17 +322,19 @@ struct sdeb_store_info {
container_of(d, struct sdebug_host_info, dev)
enum sdeb_defer_type {SDEB_DEFER_NONE = 0, SDEB_DEFER_HRT = 1,
- SDEB_DEFER_WQ = 2};
+ SDEB_DEFER_WQ = 2, SDEB_DEFER_POLL = 3};
struct sdebug_defer {
struct hrtimer hrt;
struct execute_work ew;
+ ktime_t cmpl_ts;/* time since boot to complete this cmd */
int sqa_idx; /* index of sdebug_queue array */
int qc_idx; /* index of sdebug_queued_cmd array within sqa_idx */
int hc_idx; /* hostwide tag index */
int issuing_cpu;
bool init_hrt;
bool init_wq;
+ bool init_poll;
bool aborted; /* true when blk_abort_request() already called */
enum sdeb_defer_type defer_t;
};
@@ -357,6 +359,7 @@ static atomic_t sdebug_completions; /* count of deferred completions */
static atomic_t sdebug_miss_cpus; /* submission + completion cpus differ */
static atomic_t sdebug_a_tsf; /* 'almost task set full' counter */
static atomic_t sdeb_inject_pending;
+static atomic_t sdeb_mq_poll_count; /* bumped when mq_poll returns > 0 */
struct opcode_info_t {
u8 num_attached; /* 0 if this is it (i.e. a leaf); use 0xff */
@@ -4730,7 +4733,6 @@ static void sdebug_q_cmd_complete(struct sdebug_defer *sd_dp)
struct scsi_cmnd *scp;
struct sdebug_dev_info *devip;
- sd_dp->defer_t = SDEB_DEFER_NONE;
if (unlikely(aborted))
sd_dp->aborted = false;
qc_idx = sd_dp->qc_idx;
@@ -4745,6 +4747,7 @@ static void sdebug_q_cmd_complete(struct sdebug_defer *sd_dp)
return;
}
spin_lock_irqsave(&sqp->qc_lock, iflags);
+ sd_dp->defer_t = SDEB_DEFER_NONE;
sqcp = &sqp->qc_arr[qc_idx];
scp = sqcp->a_cmnd;
if (unlikely(scp == NULL)) {
@@ -5517,40 +5520,66 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
kt -= d;
}
}
- if (!sd_dp->init_hrt) {
- sd_dp->init_hrt = true;
- sqcp->sd_dp = sd_dp;
- hrtimer_init(&sd_dp->hrt, CLOCK_MONOTONIC,
- HRTIMER_MODE_REL_PINNED);
- sd_dp->hrt.function = sdebug_q_cmd_hrt_complete;
- sd_dp->sqa_idx = sqp - sdebug_q_arr;
- sd_dp->qc_idx = k;
+ sd_dp->cmpl_ts = ktime_add(ns_to_ktime(ns_from_boot), kt);
+ if (cmnd->request->cmd_flags & REQ_HIPRI) {
+ spin_lock_irqsave(&sqp->qc_lock, iflags);
+ if (!sd_dp->init_poll) {
+ sd_dp->init_poll = true;
+ sqcp->sd_dp = sd_dp;
+ sd_dp->sqa_idx = sqp - sdebug_q_arr;
+ sd_dp->qc_idx = k;
+ }
+ sd_dp->defer_t = SDEB_DEFER_POLL;
+ spin_unlock_irqrestore(&sqp->qc_lock, iflags);
+ } else {
+ if (!sd_dp->init_hrt) {
+ sd_dp->init_hrt = true;
+ sqcp->sd_dp = sd_dp;
+ hrtimer_init(&sd_dp->hrt, CLOCK_MONOTONIC,
+ HRTIMER_MODE_REL_PINNED);
+ sd_dp->hrt.function = sdebug_q_cmd_hrt_complete;
+ sd_dp->sqa_idx = sqp - sdebug_q_arr;
+ sd_dp->qc_idx = k;
+ }
+ sd_dp->defer_t = SDEB_DEFER_HRT;
+ /* schedule the invocation of scsi_done() for a later time */
+ hrtimer_start(&sd_dp->hrt, kt, HRTIMER_MODE_REL_PINNED);
}
if (sdebug_statistics)
sd_dp->issuing_cpu = raw_smp_processor_id();
- sd_dp->defer_t = SDEB_DEFER_HRT;
- /* schedule the invocation of scsi_done() for a later time */
- hrtimer_start(&sd_dp->hrt, kt, HRTIMER_MODE_REL_PINNED);
} else { /* jdelay < 0, use work queue */
- if (!sd_dp->init_wq) {
- sd_dp->init_wq = true;
- sqcp->sd_dp = sd_dp;
- sd_dp->sqa_idx = sqp - sdebug_q_arr;
- sd_dp->qc_idx = k;
- INIT_WORK(&sd_dp->ew.work, sdebug_q_cmd_wq_complete);
- }
- if (sdebug_statistics)
- sd_dp->issuing_cpu = raw_smp_processor_id();
- sd_dp->defer_t = SDEB_DEFER_WQ;
if (unlikely((sdebug_opts & SDEBUG_OPT_CMD_ABORT) &&
atomic_read(&sdeb_inject_pending)))
sd_dp->aborted = true;
- schedule_work(&sd_dp->ew.work);
- if (unlikely((sdebug_opts & SDEBUG_OPT_CMD_ABORT) &&
- atomic_read(&sdeb_inject_pending))) {
+ sd_dp->cmpl_ts = ns_to_ktime(ns_from_boot);
+ if (cmnd->request->cmd_flags & REQ_HIPRI) {
+ spin_lock_irqsave(&sqp->qc_lock, iflags);
+ if (!sd_dp->init_poll) {
+ sd_dp->init_poll = true;
+ sqcp->sd_dp = sd_dp;
+ sd_dp->sqa_idx = sqp - sdebug_q_arr;
+ sd_dp->qc_idx = k;
+ }
+ sd_dp->defer_t = SDEB_DEFER_POLL;
+ spin_unlock_irqrestore(&sqp->qc_lock, iflags);
+ } else {
+ if (!sd_dp->init_wq) {
+ sd_dp->init_wq = true;
+ sqcp->sd_dp = sd_dp;
+ sd_dp->sqa_idx = sqp - sdebug_q_arr;
+ sd_dp->qc_idx = k;
+ INIT_WORK(&sd_dp->ew.work, sdebug_q_cmd_wq_complete);
+ }
+ sd_dp->defer_t = SDEB_DEFER_WQ;
+ schedule_work(&sd_dp->ew.work);
+ }
+ if (sdebug_statistics)
+ sd_dp->issuing_cpu = raw_smp_processor_id();
+ if (unlikely(sd_dp->aborted)) {
sdev_printk(KERN_INFO, sdp, "abort request tag %d\n", cmnd->request->tag);
blk_abort_request(cmnd->request);
atomic_set(&sdeb_inject_pending, 0);
+ sd_dp->aborted = false;
}
}
if (unlikely((SDEBUG_OPT_Q_NOISE & sdebug_opts) && scsi_result == device_qfull_result))
@@ -5779,11 +5808,12 @@ static int scsi_debug_show_info(struct seq_file *m, struct Scsi_Host *host)
dix_reads, dix_writes, dif_errors);
seq_printf(m, "usec_in_jiffy=%lu, statistics=%d\n", TICK_NSEC / 1000,
sdebug_statistics);
- seq_printf(m, "cmnd_count=%d, completions=%d, %s=%d, a_tsf=%d\n",
+ seq_printf(m, "cmnd_count=%d, completions=%d, %s=%d, a_tsf=%d, mq_polls=%d\n",
atomic_read(&sdebug_cmnd_count),
atomic_read(&sdebug_completions),
"miss_cpus", atomic_read(&sdebug_miss_cpus),
- atomic_read(&sdebug_a_tsf));
+ atomic_read(&sdebug_a_tsf),
+ atomic_read(&sdeb_mq_poll_count));
seq_printf(m, "submit_queues=%d\n", submit_queues);
for (j = 0, sqp = sdebug_q_arr; j < submit_queues; ++j, ++sqp) {
@@ -7246,52 +7276,68 @@ static int sdebug_map_queues(struct Scsi_Host *shost)
static int sdebug_blk_mq_poll(struct Scsi_Host *shost, unsigned int queue_num)
{
- int qc_idx;
- int retiring = 0;
+ bool first;
+ bool retiring = false;
+ int num_entries = 0;
+ unsigned int qc_idx = 0;
unsigned long iflags;
+ ktime_t kt_from_boot = ktime_get_boottime();
struct sdebug_queue *sqp;
struct sdebug_queued_cmd *sqcp;
struct scsi_cmnd *scp;
struct sdebug_dev_info *devip;
- int num_entries = 0;
+ struct sdebug_defer *sd_dp;
sqp = sdebug_q_arr + queue_num;
+ spin_lock_irqsave(&sqp->qc_lock, iflags);
- do {
- spin_lock_irqsave(&sqp->qc_lock, iflags);
- qc_idx = find_first_bit(sqp->in_use_bm, sdebug_max_queue);
- if (unlikely((qc_idx < 0) || (qc_idx >= sdebug_max_queue)))
- goto out;
+ for (first = true; first || qc_idx + 1 < sdebug_max_queue; ) {
+ if (first) {
+ qc_idx = find_first_bit(sqp->in_use_bm, sdebug_max_queue);
+ first = false;
+ } else {
+ qc_idx = find_next_bit(sqp->in_use_bm, sdebug_max_queue, qc_idx + 1);
+ }
+ if (unlikely(qc_idx >= sdebug_max_queue))
+ break;
sqcp = &sqp->qc_arr[qc_idx];
+ sd_dp = sqcp->sd_dp;
+ if (unlikely(!sd_dp))
+ continue;
scp = sqcp->a_cmnd;
if (unlikely(scp == NULL)) {
- pr_err("scp is NULL, queue_num=%d, qc_idx=%d from %s\n",
+ pr_err("scp is NULL, queue_num=%d, qc_idx=%u from %s\n",
queue_num, qc_idx, __func__);
- goto out;
+ break;
}
+ if (sd_dp->defer_t == SDEB_DEFER_POLL) {
+ if (kt_from_boot < sd_dp->cmpl_ts)
+ continue;
+
+ } else /* ignoring non REQ_HIPRI requests */
+ continue;
devip = (struct sdebug_dev_info *)scp->device->hostdata;
if (likely(devip))
atomic_dec(&devip->num_in_q);
else
pr_err("devip=NULL from %s\n", __func__);
if (unlikely(atomic_read(&retired_max_queue) > 0))
- retiring = 1;
+ retiring = true;
sqcp->a_cmnd = NULL;
if (unlikely(!test_and_clear_bit(qc_idx, sqp->in_use_bm))) {
- pr_err("Unexpected completion sqp %p queue_num=%d qc_idx=%d from %s\n",
+ pr_err("Unexpected completion sqp %p queue_num=%d qc_idx=%u from %s\n",
sqp, queue_num, qc_idx, __func__);
- goto out;
+ break;
}
-
if (unlikely(retiring)) { /* user has reduced max_queue */
int k, retval;
retval = atomic_read(&retired_max_queue);
if (qc_idx >= retval) {
pr_err("index %d too large\n", retval);
- goto out;
+ break;
}
k = find_last_bit(sqp->in_use_bm, retval);
if ((k < sdebug_max_queue) || (k == retval))
@@ -7299,17 +7345,18 @@ static int sdebug_blk_mq_poll(struct Scsi_Host *shost, unsigned int queue_num)
else
atomic_set(&retired_max_queue, k + 1);
}
+ sd_dp->defer_t = SDEB_DEFER_NONE;
spin_unlock_irqrestore(&sqp->qc_lock, iflags);
scp->scsi_done(scp); /* callback to mid level */
+ spin_lock_irqsave(&sqp->qc_lock, iflags);
num_entries++;
- } while (1);
-
-out:
+ }
spin_unlock_irqrestore(&sqp->qc_lock, iflags);
+ if (num_entries > 0)
+ atomic_add(num_entries, &sdeb_mq_poll_count);
return num_entries;
}
-
static int scsi_debug_queuecommand(struct Scsi_Host *shost,
struct scsi_cmnd *scp)
{
--
2.25.1
* Re: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-09 22:56 [PATCH v3] scsi_debug: add new defer_type for mq_poll Douglas Gilbert
@ 2021-02-10 3:08 ` Kashyap Desai
2021-02-10 17:31 ` Douglas Gilbert
2021-02-11 5:56 ` Douglas Gilbert
0 siblings, 2 replies; 7+ messages in thread
From: Kashyap Desai @ 2021-02-10 3:08 UTC (permalink / raw)
To: Douglas Gilbert
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
On Wed, Feb 10, 2021 at 4:26 AM Douglas Gilbert <dgilbert@interlog.com> wrote:
>
> Add a new sdeb_defer_type enumeration: SDEB_DEFER_POLL for requests
> that have REQ_HIPRI set in cmd_flags field. It is expected that
> these requests will be polled via the mq_poll entry point which
> is driven by calls to blk_poll() in the block layer. Therefore
> timer events are not 'wired up' in the normal fashion.
>
> There are still cases with short delays (e.g. < 10 microseconds)
> where by the time the command response processing occurs, the delay
> is already exceeded in which case the code calls scsi_done()
> directly. In such cases there is no window for mq_poll() to be
> called.
>
> Add 'mq_polls' counter that increments on each scsi_done() called
> via the mq_poll entry point. Can be used to show (with 'cat
> /proc/scsi/scsi_debug/<host_id>') that blk_poll() is causing
> completions rather than some other mechanism.
>
>
> This patch is against the 5.12/scsi-staging branch which includes
> a98d6bdf181eb71bf4686666eaf47c642a061642
> scsi_debug : iouring iopoll support
> which it alters. So this patch is a fix of that patch.
>
> Changes since version 2 [sent 20210206 to linux-scsi list]
> - the sdebug_blk_mq_poll() callback didn't cope with the
> uncommon case where sqcp->sd_dp is NULL. Fix.
Hi Doug, I tried this patch on top of the io_uring patch series below -
"[v3,1/4] add io_uring with IOPOLL support in scsi layer"
After applying the patch, I am seeing an IO hang issue - io_wqe_worker
goes into a tight loop.
18210 root      20   0 1316348   3552   2376 R  99.1  0.0   0:24.09 fio
18303 root      20   0       0      0      0 D  78.8  0.0   0:01.75 io_wqe_worker-0
18219 root      20   0       0      0      0 R  71.7  0.0   0:06.59 io_wqe_worker-0
I used the command below -
insmod drivers/scsi/scsi_debug.ko dev_size_mb=1024 sector_size=512
add_host=24 per_host_store=1 ndelay=10000 host_max_queue=32
submit_queues=76 num_parts=1 poll_queues=8
and here is my fio script -
[global]
ioengine=io_uring
hipri=1
direct=1
runtime=30s
rw=randread
norandommap
bs=4k
iodepth=64
[seqprecon]
filename=/dev/sdd
[seqprecon]
filename=/dev/sde
[seqprecon]
filename=/dev/sdf
I will ask Martin to pick all the patches from "[v3,1/4] add io_uring
with IOPOLL support in the scsi layer" except scsi_debug. We can work
on scsi_debug and send a standalone patch.
Kashyap
* Re: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-10 3:08 ` Kashyap Desai
@ 2021-02-10 17:31 ` Douglas Gilbert
2021-02-11 5:56 ` Douglas Gilbert
1 sibling, 0 replies; 7+ messages in thread
From: Douglas Gilbert @ 2021-02-10 17:31 UTC (permalink / raw)
To: Kashyap Desai
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
On 2021-02-09 10:08 p.m., Kashyap Desai wrote:
> On Wed, Feb 10, 2021 at 4:26 AM Douglas Gilbert <dgilbert@interlog.com> wrote:
>>
>> Add a new sdeb_defer_type enumeration: SDEB_DEFER_POLL for requests
>> that have REQ_HIPRI set in cmd_flags field. It is expected that
>> these requests will be polled via the mq_poll entry point which
>> is driven by calls to blk_poll() in the block layer. Therefore
>> timer events are not 'wired up' in the normal fashion.
>>
>> There are still cases with short delays (e.g. < 10 microseconds)
>> where by the time the command response processing occurs, the delay
>> is already exceeded in which case the code calls scsi_done()
>> directly. In such cases there is no window for mq_poll() to be
>> called.
>>
>> Add 'mq_polls' counter that increments on each scsi_done() called
>> via the mq_poll entry point. Can be used to show (with 'cat
>> /proc/scsi/scsi_debug/<host_id>') that blk_poll() is causing
>> completions rather than some other mechanism.
>>
>>
>> This patch is against the 5.12/scsi-staging branch which includes
>> a98d6bdf181eb71bf4686666eaf47c642a061642
>> scsi_debug : iouring iopoll support
>> which it alters. So this patch is a fix of that patch.
>>
>> Changes since version 2 [sent 20210206 to linux-scsi list]
>> - the sdebug_blk_mq_poll() callback didn't cope with the
>> uncommon case where sqcp->sd_dp is NULL. Fix.
>
> Hi Doug, I tried this patch on top of below iouring patch series -
> "[v3,1/4] add io_uring with IOPOLL support in scsi layer"
>
> After applying patch, I am seeing an IO hang issue - I see
> io_wqe_worker goes into a tight loop.
>
> 18210 root      20   0 1316348   3552   2376 R  99.1  0.0   0:24.09 fio
> 18303 root      20   0       0      0      0 D  78.8  0.0   0:01.75 io_wqe_worker-0
> 18219 root      20   0       0      0      0 R  71.7  0.0   0:06.59 io_wqe_worker-0
>
>
> I used below command -
> insmod drivers/scsi/scsi_debug.ko dev_size_mb=1024 sector_size=512
> add_host=24 per_host_store=1 ndelay=10000 host_max_queue=32
> submit_queues=76 num_parts=1 poll_queues=8
>
> and here is my fio script -
> [global]
> ioengine=io_uring
> hipri=1
> direct=1
> runtime=30s
> rw=randread
> norandommap
> bs=4k
> iodepth=64
>
> [seqprecon]
> filename=/dev/sdd
> [seqprecon]
> filename=/dev/sde
> [seqprecon]
> filename=/dev/sdf
>
>
> I will ask Martin to pick all the patches from "[v3,1/4] add io_uring
> with IOPOLL support in the scsi layer" except scsi_debug. We can work
> on scsi_debug and send standalone patch.
Hi Kashyap,
Yes, you are correct, that script certainly fails when sdd, sde
and sdf are scsi_debug devices.
On closer examination, IMO there is a fundamental misunderstanding about
iopoll on one of our parts. I am assuming that all these things are
closely coupled:
1) a request with REQ_HIPRI set in its cmd_flags
2) the associated LLD receiving mq_poll callbacks
3) _something_ in the upper layers calls blk_poll()
And it is point 3) that is missing in the case of that fio script shown
above. To be able to use sd devices, I believe the sd driver needs to
be altered to issue blk_poll() calls when it sees that it is waiting
for completion of request/commands that have REQ_HIPRI set. And that
is not happening when I run the above fio script.
My scsi_debug patch counts the number of times mq_poll returns a value > 0
(i.e. it completed a request) and this is what I see:
# cat /proc/scsi/scsi_debug/3
scsi_debug adapter driver, version 0190 [20200710]
num_tgts=1, shared (ram) size=1024 MB, opts=0x0, every_nth=0
delay=-9999, ndelay=10000, max_luns=1, sector_size=4096 bytes
cylinders=128, heads=64, sectors=32, command aborts=98
RESETs: device=5, target=5, bus=5, host=2
dix_reads=0, dix_writes=0, dif_errors=0
usec_in_jiffy=4000, statistics=0
cmnd_count=0, completions=0, miss_cpus=0, a_tsf=0, mq_polls=0
submit_queues=4
queue 0:
queue 1:
queue 2:
queue 3:
this host_no=3
....
The fact that 'mq_polls=0' tells me that blk_poll() was never called.
Assuming the above fio script works when sdd, sde and sdf are MR
devices, that suggests to me your LLD treats mq_poll callbacks as optional.
The way I have coded the sg and scsi_debug drivers is that when REQ_HIPRI
is given, then blk_poll() calls are _essential_ and said request will
not complete without them.
This code is from drivers/nvme/host/core.c :
static void nvme_execute_rq_polled(struct request_queue *q,
                struct gendisk *bd_disk, struct request *rq, int at_head)
{
        DECLARE_COMPLETION_ONSTACK(wait);

        WARN_ON_ONCE(!test_bit(QUEUE_FLAG_POLL, &q->queue_flags));

        rq->cmd_flags |= REQ_HIPRI;
        rq->end_io_data = &wait;
        blk_execute_rq_nowait(q, bd_disk, rq, at_head, nvme_end_sync_rq);

        while (!completion_done(&wait)) {
                blk_poll(q, request_to_qc_t(rq->mq_hctx, rq), true);
                cond_resched();
        }
}
For requests with REQ_HIPRI set, the sg driver (with blk_poll patch) does
an async version of the synchronous procedure shown in that nvme function.
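For illustration, the asynchronous idea is roughly the sketch below (hedged;
this is not the actual sg driver code, and the struct, its fields and the
helper name are hypothetical): keep the blk_qc_t cookie from submission and
call blk_poll() with spin=false whenever the caller checks for completion,
instead of busy-waiting as the nvme function above does.

    /* Hedged sketch; 'struct my_async_req' and its fields are hypothetical. */
    struct my_async_req {
            struct request_queue *q;
            blk_qc_t cookie;        /* saved when the request was submitted */
            bool done;              /* set by the request's end_io callback */
    };

    static bool my_check_or_poll(struct my_async_req *ar)
    {
            if (ar->done)
                    return true;
            blk_poll(ar->q, ar->cookie, false);     /* may complete it right here */
            return ar->done;
    }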
Doug Gilbert
* Re: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-10 3:08 ` Kashyap Desai
2021-02-10 17:31 ` Douglas Gilbert
@ 2021-02-11 5:56 ` Douglas Gilbert
2021-02-11 12:13 ` Kashyap Desai
1 sibling, 1 reply; 7+ messages in thread
From: Douglas Gilbert @ 2021-02-11 5:56 UTC (permalink / raw)
To: Kashyap Desai
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
On 2021-02-09 10:08 p.m., Kashyap Desai wrote:
> On Wed, Feb 10, 2021 at 4:26 AM Douglas Gilbert <dgilbert@interlog.com> wrote:
>>
>> Add a new sdeb_defer_type enumeration: SDEB_DEFER_POLL for requests
>> that have REQ_HIPRI set in cmd_flags field. It is expected that
>> these requests will be polled via the mq_poll entry point which
>> is driven by calls to blk_poll() in the block layer. Therefore
>> timer events are not 'wired up' in the normal fashion.
>>
>> There are still cases with short delays (e.g. < 10 microseconds)
>> where by the time the command response processing occurs, the delay
>> is already exceeded in which case the code calls scsi_done()
>> directly. In such cases there is no window for mq_poll() to be
>> called.
>>
>> Add 'mq_polls' counter that increments on each scsi_done() called
>> via the mq_poll entry point. Can be used to show (with 'cat
>> /proc/scsi/scsi_debug/<host_id>') that blk_poll() is causing
>> completions rather than some other mechanism.
>>
>>
>> This patch is against the 5.12/scsi-staging branch which includes
>> a98d6bdf181eb71bf4686666eaf47c642a061642
>> scsi_debug : iouring iopoll support
>> which it alters. So this patch is a fix of that patch.
>>
>> Changes since version 2 [sent 20210206 to linux-scsi list]
>> - the sdebug_blk_mq_poll() callback didn't cope with the
>> uncommon case where sqcp->sd_dp is NULL. Fix.
>
> Hi Doug, I tried this patch on top of below iouring patch series -
> "[v3,1/4] add io_uring with IOPOLL support in scsi layer"
>
> After applying patch, I am seeing an IO hang issue - I see
> io_wqe_worker goes into a tight loop.
>
> 18210 root      20   0 1316348   3552   2376 R  99.1  0.0   0:24.09 fio
> 18303 root      20   0       0      0      0 D  78.8  0.0   0:01.75 io_wqe_worker-0
> 18219 root      20   0       0      0      0 R  71.7  0.0   0:06.59 io_wqe_worker-0
>
>
> I used below command -
> insmod drivers/scsi/scsi_debug.ko dev_size_mb=1024 sector_size=512
> add_host=24 per_host_store=1 ndelay=10000 host_max_queue=32
> submit_queues=76 num_parts=1 poll_queues=8
>
> and here is my fio script -
> [global]
> ioengine=io_uring
> hipri=1
> direct=1
> runtime=30s
> rw=randread
> norandommap
> bs=4k
> iodepth=64
Kashyap,
There is another issue here, namely iodepth > host_max_queue (64 > 32)
and in my setup that is not handled well. So there is a problem with
scsi_debug *** or the mid-level in handling this case.
If you have modified the sd driver to call blk_poll() then perhaps
you could try the above test again with a reduced iodepth.
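For instance (just an illustration, reusing your job file with only iodepth
changed so that it stays below the host_max_queue=32 from the insmod line
quoted above):

    [global]
    ioengine=io_uring
    hipri=1
    direct=1
    runtime=30s
    rw=randread
    norandommap
    bs=4k
    iodepth=16

    [seqprecon]
    filename=/dev/sdd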
Doug Gilbert
*** the scsi_debug driver should either yield a TASK SET FULL status or
return SCSI_MLQUEUE_HOST_BUSY from queuecommand() when it has run out of
slots.
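A minimal sketch of that footnote (illustrative, not the current scsi_debug
code; find_free_slot() and report_task_set_full are hypothetical):

    static int sketch_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scp)
    {
            int slot = find_free_slot(shost);       /* hypothetical helper */

            if (slot < 0) {
                    if (!report_task_set_full)
                            return SCSI_MLQUEUE_HOST_BUSY;  /* mid-level retries later */
                    scp->result = SAM_STAT_TASK_SET_FULL;
                    scp->scsi_done(scp);            /* complete with TSF status */
                    return 0;
            }
            /* ... normal submission using 'slot' ... */
            return 0;
    }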
* RE: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-11 5:56 ` Douglas Gilbert
@ 2021-02-11 12:13 ` Kashyap Desai
2021-02-11 16:11 ` Douglas Gilbert
0 siblings, 1 reply; 7+ messages in thread
From: Kashyap Desai @ 2021-02-11 12:13 UTC (permalink / raw)
To: dgilbert
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
> Kashyap,
> There is another issue here, namely iodepth > host_max_queue (64 > 32) and
> in my setup that is not handled well. So there is a problem with scsi_debug
> *** or the mid-level in handling this case.
>
> If you have modified the sd driver to call blk_poll() then perhaps you
> could try the above test again with a reduced iodepth.
Doug -
I debugged the issue and found the fix. We need to remove the code below to
handle this new defer type for mq_poll.
Earlier I did not use REQ_HIPRI to handle mq_poll, but now that you are
managing it through the REQ_HIPRI check, it is safe and required to apply
the patch below.
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index b9ec3a47fbf1..c50de49a2c2f 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -5437,13 +5437,6 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
sd_dp = sqcp->sd_dp;
spin_unlock_irqrestore(&sqp->qc_lock, iflags);
- /* Do not complete IO from default completion path.
- * Let it to be on queue.
- * Completion should happen from mq_poll interface.
- */
- if ((sqp - sdebug_q_arr) >= (submit_queues - poll_queues))
- return 0;
-
if (!sd_dp) {
sd_dp = kzalloc(sizeof(*sd_dp), GFP_ATOMIC);
if (!sd_dp) {
On top of your v3 patch + the above fix, I am able to run IO and there are no
issues. There is no issue with higher QD either, so we are good to go.
Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
So how shall we progress now? Shall I ask Martin to pick all the patches
from "[PATCH v3 0/4] io_uring iopoll in scsi layer", since this patch has a
dependency on my series?
Kashyap
* Re: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-11 12:13 ` Kashyap Desai
@ 2021-02-11 16:11 ` Douglas Gilbert
2021-02-12 11:11 ` Kashyap Desai
0 siblings, 1 reply; 7+ messages in thread
From: Douglas Gilbert @ 2021-02-11 16:11 UTC (permalink / raw)
To: Kashyap Desai
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
On 2021-02-11 7:13 a.m., Kashyap Desai wrote:
>> Kashyap,
>> There is another issue here, namely iodepth > host_max_queue (64 > 32) and
>> in my setup that is not handled well. So there is a problem with
>> scsi_debug
>> *** or the mid-level in handling this case.
>>
>> If you have modified the sd driver to call blk_poll() then perhaps you
>> could try
>> the above test again with a reduced iodepth.
>
> Doug -
>
> I debug the issue and found the fix. We need to remove below code to handle
> this new defer type for mq poll.
> Earlier I did not used " REQ_HIPRI" to handle mq poll, but now you are
> managing it through REQ_HIPRI check, it is safe and required to have below
> patch.
>
>
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index b9ec3a47fbf1..c50de49a2c2f 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -5437,13 +5437,6 @@ static int schedule_resp(struct scsi_cmnd *cmnd,
> struct sdebug_dev_info *devip,
> sd_dp = sqcp->sd_dp;
> spin_unlock_irqrestore(&sqp->qc_lock, iflags);
>
> - /* Do not complete IO from default completion path.
> - * Let it to be on queue.
> - * Completion should happen from mq_poll interface.
> - */
> - if ((sqp - sdebug_q_arr) >= (submit_queues - poll_queues))
> - return 0;
> -
> if (!sd_dp) {
> sd_dp = kzalloc(sizeof(*sd_dp), GFP_ATOMIC);
> if (!sd_dp) {
>
>
> On top of your v3 patch + above fix, I am able to run IO and there are no
> issues. There is no issue with higher QD as well, so we are good to go.
>
> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
>
> So how shall we progress now ? Shall I ask Martin to pick all the patches
> from " [PATCH v3 0/4] io_uring iopoll in scsi layer" since this patch has
> dependency on my series.
Kashyap,
Good catch!
You could roll all the scsi_debug stuff into one patch and put both
our sign-offs on it, or present it as two patches:
- first one: your original plus this fix
- second one: mine with your tested-by on it
Since this is new functionality for the scsi sub-system, it is unlikely
that all the corner cases have been visited. However, with what we have
in place for the next cycle, others have something reasonably solid to
build on.
Doug Gilbert
* RE: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-11 16:11 ` Douglas Gilbert
@ 2021-02-12 11:11 ` Kashyap Desai
0 siblings, 0 replies; 7+ messages in thread
From: Kashyap Desai @ 2021-02-12 11:11 UTC (permalink / raw)
To: dgilbert
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
> > So how shall we progress now ? Shall I ask Martin to pick all the
> > patches from " [PATCH v3 0/4] io_uring iopoll in scsi layer" since
> > this patch has dependency on my series.
>
> Kashyap,
> Good catch!
>
> You could roll all the scsi_debug stuff into one patch and put both our
> sign-offs on it, or present it as two patches:
> - first one: your original plus this fix
> - second one: mine with your tested-by on it
Doug - I will include your patch "scsi_debug: add new defer_type for
mq_poll" (including the recent fix) as a separate patch in my series and
provide a Tested-by tag for it.
>
> Since this is new functionality for the scsi sub-system, it is unlikely
> that all the corner cases have been visited. However, with what we have in
> place for the next cycle, others have something reasonably solid to build on.
Agree.