* [PATCH v3] scsi_debug: add new defer_type for mq_poll
@ 2021-02-09 22:56 Douglas Gilbert
2021-02-10 3:08 ` Kashyap Desai
0 siblings, 1 reply; 7+ messages in thread
From: Douglas Gilbert @ 2021-02-09 22:56 UTC (permalink / raw)
To: linux-scsi; +Cc: martin.petersen, jejb, hare, kashyap.desai
Add a new sdeb_defer_type enumeration value: SDEB_DEFER_POLL, for requests
that have REQ_HIPRI set in the cmd_flags field. It is expected that
these requests will be polled via the mq_poll entry point, which
is driven by calls to blk_poll() in the block layer. Therefore
timer events are not 'wired up' in the normal fashion.
There are still cases with short delays (e.g. < 10 microseconds)
where, by the time the command response processing occurs, the delay
has already been exceeded; in that case the code calls scsi_done()
directly. In such cases there is no window for mq_poll() to be
called.
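In outline, that inline-completion decision looks roughly like the sketch
below (hedged; the function and parameter names are illustrative, not the
exact driver code):

    /* Hedged sketch; identifiers are illustrative, not the exact driver code. */
    static bool complete_inline_if_late(struct scsi_cmnd *cmnd,
                                        ktime_t arrival_ts, ktime_t requested_delay)
    {
            ktime_t elapsed = ktime_sub(ktime_get_boottime(), arrival_ts);

            if (elapsed >= requested_delay) {
                    cmnd->scsi_done(cmnd);  /* mq_poll never sees this command */
                    return true;
            }
            /* otherwise arrival_ts + requested_delay becomes cmpl_ts for polling */
            return false;
    }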
Add an 'mq_polls' counter that increments on each scsi_done() call made
via the mq_poll entry point. It can be used to show (with 'cat
/proc/scsi/scsi_debug/<host_id>') that blk_poll() is causing
completions rather than some other mechanism.
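For example (illustrative only: '3' stands in for the host_id and the counts
shown are made up; the line format comes from the seq_printf() change in
this patch):

    $ grep mq_polls /proc/scsi/scsi_debug/3
    cmnd_count=8192, completions=8192, miss_cpus=0, a_tsf=0, mq_polls=8100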
This patch is against the 5.12/scsi-staging branch, which includes
commit a98d6bdf181eb71bf4686666eaf47c642a061642
("scsi_debug : iouring iopoll support")
and alters it. So this patch is a fix of that patch.
Changes since version 2 [sent 20210206 to linux-scsi list]
- the sdebug_blk_mq_poll() callback didn't cope with the
uncommon case where sqcp->sd_dp is NULL. Fix.
Changes since version 1 [sent 20210201 to linux-scsi list]
- harden SDEB_DEFER_POLL which broke under testing
- add mq_polls counter for debug output
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
drivers/scsi/scsi_debug.c | 141 +++++++++++++++++++++++++-------------
1 file changed, 94 insertions(+), 47 deletions(-)
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 746eec521f79..b9ec3a47fbf1 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -322,17 +322,19 @@ struct sdeb_store_info {
container_of(d, struct sdebug_host_info, dev)
enum sdeb_defer_type {SDEB_DEFER_NONE = 0, SDEB_DEFER_HRT = 1,
- SDEB_DEFER_WQ = 2};
+ SDEB_DEFER_WQ = 2, SDEB_DEFER_POLL = 3};
struct sdebug_defer {
struct hrtimer hrt;
struct execute_work ew;
+ ktime_t cmpl_ts;/* time since boot to complete this cmd */
int sqa_idx; /* index of sdebug_queue array */
int qc_idx; /* index of sdebug_queued_cmd array within sqa_idx */
int hc_idx; /* hostwide tag index */
int issuing_cpu;
bool init_hrt;
bool init_wq;
+ bool init_poll;
bool aborted; /* true when blk_abort_request() already called */
enum sdeb_defer_type defer_t;
};
@@ -357,6 +359,7 @@ static atomic_t sdebug_completions; /* count of deferred completions */
static atomic_t sdebug_miss_cpus; /* submission + completion cpus differ */
static atomic_t sdebug_a_tsf; /* 'almost task set full' counter */
static atomic_t sdeb_inject_pending;
+static atomic_t sdeb_mq_poll_count; /* bumped when mq_poll returns > 0 */
struct opcode_info_t {
u8 num_attached; /* 0 if this is it (i.e. a leaf); use 0xff */
@@ -4730,7 +4733,6 @@ static void sdebug_q_cmd_complete(struct sdebug_defer *sd_dp)
struct scsi_cmnd *scp;
struct sdebug_dev_info *devip;
- sd_dp->defer_t = SDEB_DEFER_NONE;
if (unlikely(aborted))
sd_dp->aborted = false;
qc_idx = sd_dp->qc_idx;
@@ -4745,6 +4747,7 @@ static void sdebug_q_cmd_complete(struct sdebug_defer *sd_dp)
return;
}
spin_lock_irqsave(&sqp->qc_lock, iflags);
+ sd_dp->defer_t = SDEB_DEFER_NONE;
sqcp = &sqp->qc_arr[qc_idx];
scp = sqcp->a_cmnd;
if (unlikely(scp == NULL)) {
@@ -5517,40 +5520,66 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
kt -= d;
}
}
- if (!sd_dp->init_hrt) {
- sd_dp->init_hrt = true;
- sqcp->sd_dp = sd_dp;
- hrtimer_init(&sd_dp->hrt, CLOCK_MONOTONIC,
- HRTIMER_MODE_REL_PINNED);
- sd_dp->hrt.function = sdebug_q_cmd_hrt_complete;
- sd_dp->sqa_idx = sqp - sdebug_q_arr;
- sd_dp->qc_idx = k;
+ sd_dp->cmpl_ts = ktime_add(ns_to_ktime(ns_from_boot), kt);
+ if (cmnd->request->cmd_flags & REQ_HIPRI) {
+ spin_lock_irqsave(&sqp->qc_lock, iflags);
+ if (!sd_dp->init_poll) {
+ sd_dp->init_poll = true;
+ sqcp->sd_dp = sd_dp;
+ sd_dp->sqa_idx = sqp - sdebug_q_arr;
+ sd_dp->qc_idx = k;
+ }
+ sd_dp->defer_t = SDEB_DEFER_POLL;
+ spin_unlock_irqrestore(&sqp->qc_lock, iflags);
+ } else {
+ if (!sd_dp->init_hrt) {
+ sd_dp->init_hrt = true;
+ sqcp->sd_dp = sd_dp;
+ hrtimer_init(&sd_dp->hrt, CLOCK_MONOTONIC,
+ HRTIMER_MODE_REL_PINNED);
+ sd_dp->hrt.function = sdebug_q_cmd_hrt_complete;
+ sd_dp->sqa_idx = sqp - sdebug_q_arr;
+ sd_dp->qc_idx = k;
+ }
+ sd_dp->defer_t = SDEB_DEFER_HRT;
+ /* schedule the invocation of scsi_done() for a later time */
+ hrtimer_start(&sd_dp->hrt, kt, HRTIMER_MODE_REL_PINNED);
}
if (sdebug_statistics)
sd_dp->issuing_cpu = raw_smp_processor_id();
- sd_dp->defer_t = SDEB_DEFER_HRT;
- /* schedule the invocation of scsi_done() for a later time */
- hrtimer_start(&sd_dp->hrt, kt, HRTIMER_MODE_REL_PINNED);
} else { /* jdelay < 0, use work queue */
- if (!sd_dp->init_wq) {
- sd_dp->init_wq = true;
- sqcp->sd_dp = sd_dp;
- sd_dp->sqa_idx = sqp - sdebug_q_arr;
- sd_dp->qc_idx = k;
- INIT_WORK(&sd_dp->ew.work, sdebug_q_cmd_wq_complete);
- }
- if (sdebug_statistics)
- sd_dp->issuing_cpu = raw_smp_processor_id();
- sd_dp->defer_t = SDEB_DEFER_WQ;
if (unlikely((sdebug_opts & SDEBUG_OPT_CMD_ABORT) &&
atomic_read(&sdeb_inject_pending)))
sd_dp->aborted = true;
- schedule_work(&sd_dp->ew.work);
- if (unlikely((sdebug_opts & SDEBUG_OPT_CMD_ABORT) &&
- atomic_read(&sdeb_inject_pending))) {
+ sd_dp->cmpl_ts = ns_to_ktime(ns_from_boot);
+ if (cmnd->request->cmd_flags & REQ_HIPRI) {
+ spin_lock_irqsave(&sqp->qc_lock, iflags);
+ if (!sd_dp->init_poll) {
+ sd_dp->init_poll = true;
+ sqcp->sd_dp = sd_dp;
+ sd_dp->sqa_idx = sqp - sdebug_q_arr;
+ sd_dp->qc_idx = k;
+ }
+ sd_dp->defer_t = SDEB_DEFER_POLL;
+ spin_unlock_irqrestore(&sqp->qc_lock, iflags);
+ } else {
+ if (!sd_dp->init_wq) {
+ sd_dp->init_wq = true;
+ sqcp->sd_dp = sd_dp;
+ sd_dp->sqa_idx = sqp - sdebug_q_arr;
+ sd_dp->qc_idx = k;
+ INIT_WORK(&sd_dp->ew.work, sdebug_q_cmd_wq_complete);
+ }
+ sd_dp->defer_t = SDEB_DEFER_WQ;
+ schedule_work(&sd_dp->ew.work);
+ }
+ if (sdebug_statistics)
+ sd_dp->issuing_cpu = raw_smp_processor_id();
+ if (unlikely(sd_dp->aborted)) {
sdev_printk(KERN_INFO, sdp, "abort request tag %d\n", cmnd->request->tag);
blk_abort_request(cmnd->request);
atomic_set(&sdeb_inject_pending, 0);
+ sd_dp->aborted = false;
}
}
if (unlikely((SDEBUG_OPT_Q_NOISE & sdebug_opts) && scsi_result == device_qfull_result))
@@ -5779,11 +5808,12 @@ static int scsi_debug_show_info(struct seq_file *m, struct Scsi_Host *host)
dix_reads, dix_writes, dif_errors);
seq_printf(m, "usec_in_jiffy=%lu, statistics=%d\n", TICK_NSEC / 1000,
sdebug_statistics);
- seq_printf(m, "cmnd_count=%d, completions=%d, %s=%d, a_tsf=%d\n",
+ seq_printf(m, "cmnd_count=%d, completions=%d, %s=%d, a_tsf=%d, mq_polls=%d\n",
atomic_read(&sdebug_cmnd_count),
atomic_read(&sdebug_completions),
"miss_cpus", atomic_read(&sdebug_miss_cpus),
- atomic_read(&sdebug_a_tsf));
+ atomic_read(&sdebug_a_tsf),
+ atomic_read(&sdeb_mq_poll_count));
seq_printf(m, "submit_queues=%d\n", submit_queues);
for (j = 0, sqp = sdebug_q_arr; j < submit_queues; ++j, ++sqp) {
@@ -7246,52 +7276,68 @@ static int sdebug_map_queues(struct Scsi_Host *shost)
static int sdebug_blk_mq_poll(struct Scsi_Host *shost, unsigned int queue_num)
{
- int qc_idx;
- int retiring = 0;
+ bool first;
+ bool retiring = false;
+ int num_entries = 0;
+ unsigned int qc_idx = 0;
unsigned long iflags;
+ ktime_t kt_from_boot = ktime_get_boottime();
struct sdebug_queue *sqp;
struct sdebug_queued_cmd *sqcp;
struct scsi_cmnd *scp;
struct sdebug_dev_info *devip;
- int num_entries = 0;
+ struct sdebug_defer *sd_dp;
sqp = sdebug_q_arr + queue_num;
+ spin_lock_irqsave(&sqp->qc_lock, iflags);
- do {
- spin_lock_irqsave(&sqp->qc_lock, iflags);
- qc_idx = find_first_bit(sqp->in_use_bm, sdebug_max_queue);
- if (unlikely((qc_idx < 0) || (qc_idx >= sdebug_max_queue)))
- goto out;
+ for (first = true; first || qc_idx + 1 < sdebug_max_queue; ) {
+ if (first) {
+ qc_idx = find_first_bit(sqp->in_use_bm, sdebug_max_queue);
+ first = false;
+ } else {
+ qc_idx = find_next_bit(sqp->in_use_bm, sdebug_max_queue, qc_idx + 1);
+ }
+ if (unlikely(qc_idx >= sdebug_max_queue))
+ break;
sqcp = &sqp->qc_arr[qc_idx];
+ sd_dp = sqcp->sd_dp;
+ if (unlikely(!sd_dp))
+ continue;
scp = sqcp->a_cmnd;
if (unlikely(scp == NULL)) {
- pr_err("scp is NULL, queue_num=%d, qc_idx=%d from %s\n",
+ pr_err("scp is NULL, queue_num=%d, qc_idx=%u from %s\n",
queue_num, qc_idx, __func__);
- goto out;
+ break;
}
+ if (sd_dp->defer_t == SDEB_DEFER_POLL) {
+ if (kt_from_boot < sd_dp->cmpl_ts)
+ continue;
+
+ } else /* ignoring non REQ_HIPRI requests */
+ continue;
devip = (struct sdebug_dev_info *)scp->device->hostdata;
if (likely(devip))
atomic_dec(&devip->num_in_q);
else
pr_err("devip=NULL from %s\n", __func__);
if (unlikely(atomic_read(&retired_max_queue) > 0))
- retiring = 1;
+ retiring = true;
sqcp->a_cmnd = NULL;
if (unlikely(!test_and_clear_bit(qc_idx, sqp->in_use_bm))) {
- pr_err("Unexpected completion sqp %p queue_num=%d qc_idx=%d from %s\n",
+ pr_err("Unexpected completion sqp %p queue_num=%d qc_idx=%u from %s\n",
sqp, queue_num, qc_idx, __func__);
- goto out;
+ break;
}
-
if (unlikely(retiring)) { /* user has reduced max_queue */
int k, retval;
retval = atomic_read(&retired_max_queue);
if (qc_idx >= retval) {
pr_err("index %d too large\n", retval);
- goto out;
+ break;
}
k = find_last_bit(sqp->in_use_bm, retval);
if ((k < sdebug_max_queue) || (k == retval))
@@ -7299,17 +7345,18 @@ static int sdebug_blk_mq_poll(struct Scsi_Host *shost, unsigned int queue_num)
else
atomic_set(&retired_max_queue, k + 1);
}
+ sd_dp->defer_t = SDEB_DEFER_NONE;
spin_unlock_irqrestore(&sqp->qc_lock, iflags);
scp->scsi_done(scp); /* callback to mid level */
+ spin_lock_irqsave(&sqp->qc_lock, iflags);
num_entries++;
- } while (1);
-
-out:
+ }
spin_unlock_irqrestore(&sqp->qc_lock, iflags);
+ if (num_entries > 0)
+ atomic_add(num_entries, &sdeb_mq_poll_count);
return num_entries;
}
-
static int scsi_debug_queuecommand(struct Scsi_Host *shost,
struct scsi_cmnd *scp)
{
--
2.25.1
* Re: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-09 22:56 [PATCH v3] scsi_debug: add new defer_type for mq_poll Douglas Gilbert
@ 2021-02-10 3:08 ` Kashyap Desai
2021-02-10 17:31 ` Douglas Gilbert
2021-02-11 5:56 ` Douglas Gilbert
0 siblings, 2 replies; 7+ messages in thread
From: Kashyap Desai @ 2021-02-10 3:08 UTC (permalink / raw)
To: Douglas Gilbert
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
On Wed, Feb 10, 2021 at 4:26 AM Douglas Gilbert <dgilbert@interlog.com> wrote:
>
> Add a new sdeb_defer_type enumeration: SDEB_DEFER_POLL for requests
> that have REQ_HIPRI set in cmd_flags field. It is expected that
> these requests will be polled via the mq_poll entry point which
> is driven by calls to blk_poll() in the block layer. Therefore
> timer events are not 'wired up' in the normal fashion.
>
> There are still cases with short delays (e.g. < 10 microseconds)
> where by the time the command response processing occurs, the delay
> is already exceeded in which case the code calls scsi_done()
> directly. In such cases there is no window for mq_poll() to be
> called.
>
> Add 'mq_polls' counter that increments on each scsi_done() called
> via the mq_poll entry point. Can be used to show (with 'cat
> /proc/scsi/scsi_debug/<host_id>') that blk_poll() is causing
> completions rather than some other mechanism.
>
>
> This patch is against the 5.12/scsi-staging branch which includes
> a98d6bdf181eb71bf4686666eaf47c642a061642
> scsi_debug : iouring iopoll support
> which it alters. So this patch is a fix of that patch.
>
> Changes since version 2 [sent 20210206 to linux-scsi list]
> - the sdebug_blk_mq_poll() callback didn't cope with the
> uncommon case where sqcp->sd_dp is NULL. Fix.
Hi Doug, I tried this patch on top of the io_uring patch series below -
"[v3,1/4] add io_uring with IOPOLL support in scsi layer"
After applying the patch, I am seeing an IO hang issue - io_wqe_worker
goes into a tight loop.
18210 root      20   0 1316348   3552   2376 R  99.1  0.0   0:24.09 fio
18303 root      20   0       0      0      0 D  78.8  0.0   0:01.75 io_wqe_worker-0
18219 root      20   0       0      0      0 R  71.7  0.0   0:06.59 io_wqe_worker-0
I used the command below -
insmod drivers/scsi/scsi_debug.ko dev_size_mb=1024 sector_size=512
add_host=24 per_host_store=1 ndelay=10000 host_max_queue=32
submit_queues=76 num_parts=1 poll_queues=8
and here is my fio script -
[global]
ioengine=io_uring
hipri=1
direct=1
runtime=30s
rw=randread
norandommap
bs=4k
iodepth=64
[seqprecon]
filename=/dev/sdd
[seqprecon]
filename=/dev/sde
[seqprecon]
filename=/dev/sdf
I will ask Martin to pick all the patches from "[v3,1/4] add io_uring
with IOPOLL support in the scsi layer" except scsi_debug. We can work
on scsi_debug and send a standalone patch.
Kashyap
* Re: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-10 3:08 ` Kashyap Desai
@ 2021-02-10 17:31 ` Douglas Gilbert
2021-02-11 5:56 ` Douglas Gilbert
1 sibling, 0 replies; 7+ messages in thread
From: Douglas Gilbert @ 2021-02-10 17:31 UTC (permalink / raw)
To: Kashyap Desai
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
On 2021-02-09 10:08 p.m., Kashyap Desai wrote:
> On Wed, Feb 10, 2021 at 4:26 AM Douglas Gilbert <dgilbert@interlog.com> wrote:
>>
>> Add a new sdeb_defer_type enumeration: SDEB_DEFER_POLL for requests
>> that have REQ_HIPRI set in cmd_flags field. It is expected that
>> these requests will be polled via the mq_poll entry point which
>> is driven by calls to blk_poll() in the block layer. Therefore
>> timer events are not 'wired up' in the normal fashion.
>>
>> There are still cases with short delays (e.g. < 10 microseconds)
>> where by the time the command response processing occurs, the delay
>> is already exceeded in which case the code calls scsi_done()
>> directly. In such cases there is no window for mq_poll() to be
>> called.
>>
>> Add 'mq_polls' counter that increments on each scsi_done() called
>> via the mq_poll entry point. Can be used to show (with 'cat
>> /proc/scsi/scsi_debug/<host_id>') that blk_poll() is causing
>> completions rather than some other mechanism.
>>
>>
>> This patch is against the 5.12/scsi-staging branch which includes
>> a98d6bdf181eb71bf4686666eaf47c642a061642
>> scsi_debug : iouring iopoll support
>> which it alters. So this patch is a fix of that patch.
>>
>> Changes since version 2 [sent 20210206 to linux-scsi list]
>> - the sdebug_blk_mq_poll() callback didn't cope with the
>> uncommon case where sqcp->sd_dp is NULL. Fix.
>
> Hi Doug, I tried this patch on top of below iouring patch series -
> "[v3,1/4] add io_uring with IOPOLL support in scsi layer"
>
> After applying patch, I am seeing an IO hang issue - I see
> io_wqe_worker goes into a tight loop.
>
> 18210 root      20   0 1316348   3552   2376 R  99.1  0.0   0:24.09 fio
> 18303 root      20   0       0      0      0 D  78.8  0.0   0:01.75 io_wqe_worker-0
> 18219 root      20   0       0      0      0 R  71.7  0.0   0:06.59 io_wqe_worker-0
>
>
> I used below command -
> insmod drivers/scsi/scsi_debug.ko dev_size_mb=1024 sector_size=512
> add_host=24 per_host_store=1 ndelay=10000 host_max_queue=32
> submit_queues=76 num_parts=1 poll_queues=8
>
> and here is my fio script -
> [global]
> ioengine=io_uring
> hipri=1
> direct=1
> runtime=30s
> rw=randread
> norandommap
> bs=4k
> iodepth=64
>
> [seqprecon]
> filename=/dev/sdd
> [seqprecon]
> filename=/dev/sde
> [seqprecon]
> filename=/dev/sdf
>
>
> I will ask Martin to pick all the patches from "[v3,1/4] add io_uring
> with IOPOLL support in the scsi layer" except scsi_debug. We can work
> on scsi_debug and send standalone patch.
Hi Kashyap,
Yes, you are correct, that script certainly fails when sdd, sde
and sdf are scsi_debug devices.
On closer examination, IMO there is a fundamental misunderstanding about
iopoll on one of our parts. I am assuming that all these things are
closely coupled:
1) a request with REQ_HIPRI set in its cmd_flags
2) the associated LLD receiving mq_poll callbacks
3) _something_ in the upper layers calls blk_poll()
And it is point 3) that is missing in the case of that fio script shown
above. To be able to use sd devices, I believe the sd driver needs to
be altered to issue blk_poll() calls when it sees that it is waiting
for completion of request/commands that have REQ_HIPRI set. And that
is not happening when I run the above fio script.
My scsi_debug patch counts the number of times mq_poll returns a value > 0
(i.e. it completed a request) and this is what I see:
# cat /proc/scsi/scsi_debug/3
scsi_debug adapter driver, version 0190 [20200710]
num_tgts=1, shared (ram) size=1024 MB, opts=0x0, every_nth=0
delay=-9999, ndelay=10000, max_luns=1, sector_size=4096 bytes
cylinders=128, heads=64, sectors=32, command aborts=98
RESETs: device=5, target=5, bus=5, host=2
dix_reads=0, dix_writes=0, dif_errors=0
usec_in_jiffy=4000, statistics=0
cmnd_count=0, completions=0, miss_cpus=0, a_tsf=0, mq_polls=0
submit_queues=4
queue 0:
queue 1:
queue 2:
queue 3:
this host_no=3
....
The fact that 'mq_polls=0' tells me that blk_poll() was never called.
Assuming the above fio script works when sdd, sde and sdf are MR
devices, that suggests to me your LLD treats mq_poll callbacks as optional.
The way I have coded the sg and scsi_debug drivers is that when REQ_HIPRI
is given, then blk_poll() calls are _essential_ and said request will
not complete without them.
This code is from drivers/nvme/host/core.c :
static void nvme_execute_rq_polled(struct request_queue *q,
                struct gendisk *bd_disk, struct request *rq, int at_head)
{
        DECLARE_COMPLETION_ONSTACK(wait);

        WARN_ON_ONCE(!test_bit(QUEUE_FLAG_POLL, &q->queue_flags));

        rq->cmd_flags |= REQ_HIPRI;
        rq->end_io_data = &wait;
        blk_execute_rq_nowait(q, bd_disk, rq, at_head, nvme_end_sync_rq);

        while (!completion_done(&wait)) {
                blk_poll(q, request_to_qc_t(rq->mq_hctx, rq), true);
                cond_resched();
        }
}
For requests with REQ_HIPRI set, the sg driver (with blk_poll patch) does
an async version of the synchronous procedure shown in that nvme function.
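For illustration, the asynchronous idea is roughly the sketch below (hedged;
this is not the actual sg driver code, and the struct, its fields and the
helper name are hypothetical): keep the blk_qc_t cookie from submission and
call blk_poll() with spin=false whenever the caller checks for completion,
instead of busy-waiting as the nvme function above does.

    /* Hedged sketch; 'struct my_async_req' and its fields are hypothetical. */
    struct my_async_req {
            struct request_queue *q;
            blk_qc_t cookie;        /* saved when the request was submitted */
            bool done;              /* set by the request's end_io callback */
    };

    static bool my_check_or_poll(struct my_async_req *ar)
    {
            if (ar->done)
                    return true;
            blk_poll(ar->q, ar->cookie, false);     /* may complete it right here */
            return ar->done;
    }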
Doug Gilbert
* Re: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-10 3:08 ` Kashyap Desai
2021-02-10 17:31 ` Douglas Gilbert
@ 2021-02-11 5:56 ` Douglas Gilbert
2021-02-11 12:13 ` Kashyap Desai
1 sibling, 1 reply; 7+ messages in thread
From: Douglas Gilbert @ 2021-02-11 5:56 UTC (permalink / raw)
To: Kashyap Desai
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
On 2021-02-09 10:08 p.m., Kashyap Desai wrote:
> On Wed, Feb 10, 2021 at 4:26 AM Douglas Gilbert <dgilbert@interlog.com> wrote:
>>
>> Add a new sdeb_defer_type enumeration: SDEB_DEFER_POLL for requests
>> that have REQ_HIPRI set in cmd_flags field. It is expected that
>> these requests will be polled via the mq_poll entry point which
>> is driven by calls to blk_poll() in the block layer. Therefore
>> timer events are not 'wired up' in the normal fashion.
>>
>> There are still cases with short delays (e.g. < 10 microseconds)
>> where by the time the command response processing occurs, the delay
>> is already exceeded in which case the code calls scsi_done()
>> directly. In such cases there is no window for mq_poll() to be
>> called.
>>
>> Add 'mq_polls' counter that increments on each scsi_done() called
>> via the mq_poll entry point. Can be used to show (with 'cat
>> /proc/scsi/scsi_debug/<host_id>') that blk_poll() is causing
>> completions rather than some other mechanism.
>>
>>
>> This patch is against the 5.12/scsi-staging branch which includes
>> a98d6bdf181eb71bf4686666eaf47c642a061642
>> scsi_debug : iouring iopoll support
>> which it alters. So this patch is a fix of that patch.
>>
>> Changes since version 2 [sent 20210206 to linux-scsi list]
>> - the sdebug_blk_mq_poll() callback didn't cope with the
>> uncommon case where sqcp->sd_dp is NULL. Fix.
>
> Hi Doug, I tried this patch on top of below iouring patch series -
> "[v3,1/4] add io_uring with IOPOLL support in scsi layer"
>
> After applying patch, I am seeing an IO hang issue - I see
> io_wqe_worker goes into a tight loop.
>
> 18210 root      20   0 1316348   3552   2376 R  99.1  0.0   0:24.09 fio
> 18303 root      20   0       0      0      0 D  78.8  0.0   0:01.75 io_wqe_worker-0
> 18219 root      20   0       0      0      0 R  71.7  0.0   0:06.59 io_wqe_worker-0
>
>
> I used below command -
> insmod drivers/scsi/scsi_debug.ko dev_size_mb=1024 sector_size=512
> add_host=24 per_host_store=1 ndelay=10000 host_max_queue=32
> submit_queues=76 num_parts=1 poll_queues=8
>
> and here is my fio script -
> [global]
> ioengine=io_uring
> hipri=1
> direct=1
> runtime=30s
> rw=randread
> norandommap
> bs=4k
> iodepth=64
Kashyap,
There is another issue here, namely iodepth > host_max_queue (64 > 32)
and in my setup that is not handled well. So there is a problem with
scsi_debug *** or the mid-level in handling this case.
If you have modified the sd driver to call blk_poll() then perhaps
you could try the above test again with a reduced iodepth.
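For instance (just an illustration, reusing your job file with only iodepth
changed so that it stays below the host_max_queue=32 from the insmod line
quoted above):

    [global]
    ioengine=io_uring
    hipri=1
    direct=1
    runtime=30s
    rw=randread
    norandommap
    bs=4k
    iodepth=16

    [seqprecon]
    filename=/dev/sdd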
Doug Gilbert
*** the scsi_debug driver should either yield a TASK SET FULL status or
return SCSI_MLQUEUE_HOST_BUSY from queuecommand() when it has run out of
slots.
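A minimal sketch of that footnote (illustrative, not the current scsi_debug
code; find_free_slot() and report_task_set_full are hypothetical):

    static int sketch_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scp)
    {
            int slot = find_free_slot(shost);       /* hypothetical helper */

            if (slot < 0) {
                    if (!report_task_set_full)
                            return SCSI_MLQUEUE_HOST_BUSY;  /* mid-level retries later */
                    scp->result = SAM_STAT_TASK_SET_FULL;
                    scp->scsi_done(scp);            /* complete with TSF status */
                    return 0;
            }
            /* ... normal submission using 'slot' ... */
            return 0;
    }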
* RE: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-11 5:56 ` Douglas Gilbert
@ 2021-02-11 12:13 ` Kashyap Desai
2021-02-11 16:11 ` Douglas Gilbert
0 siblings, 1 reply; 7+ messages in thread
From: Kashyap Desai @ 2021-02-11 12:13 UTC (permalink / raw)
To: dgilbert
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
> Kashyap,
> There is another issue here, namely iodepth > host_max_queue (64 > 32) and
> in my setup that is not handled well. So there is a problem with scsi_debug
> *** or the mid-level in handling this case.
>
> If you have modified the sd driver to call blk_poll() then perhaps you
> could try the above test again with a reduced iodepth.
Doug -
I debugged the issue and found the fix. We need to remove the code below to
handle this new defer type for mq_poll.
Earlier I did not use REQ_HIPRI to handle mq_poll, but now that you are
managing it through the REQ_HIPRI check, it is safe and required to apply
the patch below.
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index b9ec3a47fbf1..c50de49a2c2f 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -5437,13 +5437,6 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
sd_dp = sqcp->sd_dp;
spin_unlock_irqrestore(&sqp->qc_lock, iflags);
- /* Do not complete IO from default completion path.
- * Let it to be on queue.
- * Completion should happen from mq_poll interface.
- */
- if ((sqp - sdebug_q_arr) >= (submit_queues - poll_queues))
- return 0;
-
if (!sd_dp) {
sd_dp = kzalloc(sizeof(*sd_dp), GFP_ATOMIC);
if (!sd_dp) {
On top of your v3 patch + the above fix, I am able to run IO and there are no
issues. There is no issue with higher QD either, so we are good to go.
Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
So how shall we progress now? Shall I ask Martin to pick all the patches
from "[PATCH v3 0/4] io_uring iopoll in scsi layer", since this patch has a
dependency on my series?
Kashyap
* Re: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-11 12:13 ` Kashyap Desai
@ 2021-02-11 16:11 ` Douglas Gilbert
2021-02-12 11:11 ` Kashyap Desai
0 siblings, 1 reply; 7+ messages in thread
From: Douglas Gilbert @ 2021-02-11 16:11 UTC (permalink / raw)
To: Kashyap Desai
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
On 2021-02-11 7:13 a.m., Kashyap Desai wrote:
>> Kashyap,
>> There is another issue here, namely iodepth > host_max_queue (64 > 32) and
>> in my setup that is not handled well. So there is a problem with
>> scsi_debug
>> *** or the mid-level in handling this case.
>>
>> If you have modified the sd driver to call blk_poll() then perhaps you
>> could try
>> the above test again with a reduced iodepth.
>
> Doug -
>
> I debug the issue and found the fix. We need to remove below code to handle
> this new defer type for mq poll.
> Earlier I did not used " REQ_HIPRI" to handle mq poll, but now you are
> managing it through REQ_HIPRI check, it is safe and required to have below
> patch.
>
>
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index b9ec3a47fbf1..c50de49a2c2f 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -5437,13 +5437,6 @@ static int schedule_resp(struct scsi_cmnd *cmnd,
> struct sdebug_dev_info *devip,
> sd_dp = sqcp->sd_dp;
> spin_unlock_irqrestore(&sqp->qc_lock, iflags);
>
> - /* Do not complete IO from default completion path.
> - * Let it to be on queue.
> - * Completion should happen from mq_poll interface.
> - */
> - if ((sqp - sdebug_q_arr) >= (submit_queues - poll_queues))
> - return 0;
> -
> if (!sd_dp) {
> sd_dp = kzalloc(sizeof(*sd_dp), GFP_ATOMIC);
> if (!sd_dp) {
>
>
> On top of your v3 patch + above fix, I am able to run IO and there are no
> issues. There is no issue with higher QD as well, so we are good to go.
>
> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
>
> So how shall we progress now ? Shall I ask Martin to pick all the patches
> from " [PATCH v3 0/4] io_uring iopoll in scsi layer" since this patch has
> dependency on my series.
Kashyap,
Good catch!
You could roll all the scsi_debug stuff into one patch and put both
our sign-offs on it, or present it as two patches:
- first one: your original plus this fix
- second one: mine with your tested-by on it
Since this is new functionality for the scsi sub-system, it is unlikely
that all the corner cases have been visited. However, with what we have
in place for the next cycle, others have something reasonably solid to
build on.
Doug Gilbert
* RE: [PATCH v3] scsi_debug: add new defer_type for mq_poll
2021-02-11 16:11 ` Douglas Gilbert
@ 2021-02-12 11:11 ` Kashyap Desai
0 siblings, 0 replies; 7+ messages in thread
From: Kashyap Desai @ 2021-02-12 11:11 UTC (permalink / raw)
To: dgilbert
Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
Hannes Reinecke
> > So how shall we progress now ? Shall I ask Martin to pick all the
> > patches from " [PATCH v3 0/4] io_uring iopoll in scsi layer" since
> > this patch has dependency on my series.
>
> Kashyap,
> Good catch!
>
> You could roll all the scsi_debug stuff into one patch and put both our
> sign-offs on it, or present it as two patches:
> - first one: your original plus this fix
> - second one: mine with your tested-by on it
Doug - I will include your patch "scsi_debug: add new defer_type for
mq_poll" (including the recent fix) as a separate patch in my series and
provide a Tested-by tag for it.
>
> Since this is new functionality for the scsi sub-system, it is unlikely
> that all the corner cases have been visited. However, with what we have in
> place for the next cycle, others have something reasonably solid to build on.
Agree.