Linux RDMA and InfiniBand development

Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed

* Re: [PATCH v3 8/9] IB/mlx5: Add helper mlx5_ib_post_send_wait
From: Leon Romanovsky @ 2016-10-27  6:09 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Arnd Bergmann,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477468874-16328-9-git-send-email-binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 3204 bytes --]

On Wed, Oct 26, 2016 at 01:31:13PM +0530, Binoy Jayan wrote:
> Clean up the following common code (to post a list of work requests to the
> send queue of the specified QP) at various places and add a helper function
> 'mlx5_ib_post_send_wait' to implement the same.
>
>  - Initialize 'mlx5_ib_umr_context' on stack
>  - Assign "mlx5_umr_wr:wr:wr_cqe to umr_context.cqe
>  - Acquire the semaphore
>  - call ib_post_send with a single ib_send_wr
>  - wait_for_completion()
>  - Check for umr_context.status
>  - Release the semaphore
>
> As semaphores are going away in the future, moving all of these into the
> shared helper leaves only a single function using the semaphore, which
> can then be rewritten to use something else.
>
> Signed-off-by: Binoy Jayan <binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> ---
>  drivers/infiniband/hw/mlx5/mr.c | 115 +++++++++++-----------------------------
>  1 file changed, 32 insertions(+), 83 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index d4ad672..19c292a 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -856,16 +856,40 @@ static inline void mlx5_ib_init_umr_context(struct mlx5_ib_umr_context *context)
>  	init_completion(&context->done);
>  }
>
> +static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
> +					 struct mlx5_umr_wr *umrwr)
> +{
> +	struct umr_common *umrc = &dev->umrc;
> +	struct ib_send_wr *bad;
> +	int err;
> +	struct mlx5_ib_umr_context umr_context;
> +
> +	mlx5_ib_init_umr_context(&umr_context);
> +	umrwr->wr.wr_cqe = &umr_context.cqe;
> +
> +	down(&umrc->sem);
> +	err = ib_post_send(umrc->qp, &umrwr->wr, &bad);
> +	if (err) {
> +		mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
> +	} else {
> +		wait_for_completion(&umr_context.done);
> +		if (umr_context.status != IB_WC_SUCCESS) {
> +			mlx5_ib_warn(dev, "reg umr failed (%u)\n",
> +				     umr_context.status);
> +			err = -EFAULT;
> +		}
> +	}
> +	up(&umrc->sem);
> +	return err;
> +}
> +
>  static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
>  				  u64 virt_addr, u64 len, int npages,
>  				  int page_shift, int order, int access_flags)
>  {
>  	struct mlx5_ib_dev *dev = to_mdev(pd->device);
>  	struct device *ddev = dev->ib_dev.dma_device;
> -	struct umr_common *umrc = &dev->umrc;
> -	struct mlx5_ib_umr_context umr_context;
>  	struct mlx5_umr_wr umrwr = {};
> -	struct ib_send_wr *bad;
>  	struct mlx5_ib_mr *mr;
>  	struct ib_sge sg;
>  	int size;
> @@ -894,24 +918,12 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
>  	if (err)
>  		goto free_mr;
>
> -	mlx5_ib_init_umr_context(&umr_context);
> -
> -	umrwr.wr.wr_cqe = &umr_context.cqe;
>  	prep_umr_reg_wqe(pd, &umrwr.wr, &sg, dma, npages, mr->mmkey.key,
>  			 page_shift, virt_addr, len, access_flags);
>
> -	down(&umrc->sem);
> -	err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
> -	if (err) {
> -		mlx5_ib_warn(dev, "post send failed, err %d\n", err);
> +	err = mlx5_ib_post_send_wait(dev, &umrwr);
> +	if (err != -EFAULT)
>  		goto unmap_dma;

NAK,
You are breaking driver.

it should be "if (err && err != -EFAULT)"

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v2 8/8] IB/mlx5: Add helper mlx5_ib_post_send_wait
From: Leon Romanovsky @ 2016-10-27  6:05 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Arnd Bergmann,
	linux-rdma, Linux kernel mailing list
In-Reply-To: <CAHv-k__yPiQOeZxDjvaSzG6A8Qs32to9MnKiPpHAvbXMxqE5EA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 659 bytes --]

On Tue, Oct 25, 2016 at 06:46:58PM +0530, Binoy Jayan wrote:
> On 25 October 2016 at 17:56, Leon Romanovsky <leon@kernel.org> wrote:
> > On Tue, Oct 25, 2016 at 05:31:59PM +0530, Binoy Jayan wrote:
>
> > In case of success (err == 0), you will call to unmap_dma instead of
> > normal flow.
> >
> > NAK,
> > Leon Romanovsky <leonro@mellanox.com>
>
> Hi Loen,
>
> Even in the original code, the regular flow seems to reach 'unmap_dma' after
> returning from 'wait_for_completion'().

In original flow, the code executed assignments to mr->mmkey. In you
code, it is skipped.

http://lxr.free-electrons.com/source/drivers/infiniband/hw/mlx5/mr.c#L900

>
> -Binoy

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Hannes Reinecke @ 2016-10-27  5:52 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <5143c240-39af-9fe2-d3e6-ed69f9c20531-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

On 10/27/2016 12:53 AM, Bart Van Assche wrote:
> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
> have finished. This function does *not* wait until all outstanding
> requests have finished (this means invocation of request.end_io()).
> The algorithm used by blk_mq_quiesce_queue() is as follows:
> * Hold either an RCU read lock or an SRCU read lock around
>   .queue_rq() calls. The former is used if .queue_rq() does not
>   block and the latter if .queue_rq() may block.
> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>   synchronize_rcu() to wait for .queue_rq() invocations that
>   started before blk_mq_quiesce_queue() was called.
> * The blk_mq_hctx_stopped() calls that control whether or not
>   .queue_rq() will be called are called with the (S)RCU read lock
>   held. This is necessary to avoid race conditions against
>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>   sequence from another thread.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Cc: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
> Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
> ---
>  block/Kconfig          |  1 +
>  block/blk-mq.c         | 69 +++++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/blk-mq.h |  3 +++
>  include/linux/blkdev.h |  1 +
>  4 files changed, 67 insertions(+), 7 deletions(-)
> 
Hmm. Can't say I like having to have two RCU structure for the same
purpose, but I guess that can't be helped.

Reviewed-by: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare-l3A5Bk7waGM@public.gmane.org			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 04/12] blk-mq: Move more code into blk_mq_direct_issue_request()
From: Hannes Reinecke @ 2016-10-27  5:50 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <32b0bd88-cb8e-754a-89fc-b1825778b05a-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

On 10/27/2016 12:52 AM, Bart Van Assche wrote:
> Move the "hctx stopped" test and the insert request calls into
> blk_mq_direct_issue_request(). Rename that function into
> blk_mq_try_issue_directly() to reflect its new semantics. Pass
> the hctx pointer to that function instead of looking it up a
> second time. These changes avoid that code has to be duplicated
> in the next patch.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
> Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
> ---
>  block/blk-mq.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare-l3A5Bk7waGM@public.gmane.org			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 03/12] blk-mq: Introduce blk_mq_queue_stopped()
From: Hannes Reinecke @ 2016-10-27  5:49 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <f68b2997-8b0d-7aea-2859-5fbda4f6bf71@sandisk.com>

On 10/27/2016 12:52 AM, Bart Van Assche wrote:
> The function blk_queue_stopped() allows to test whether or not a
> traditional request queue has been stopped. Introduce a helper
> function that allows block drivers to query easily whether or not
> one or more hardware contexts of a blk-mq queue have been stopped.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-mq.c         | 20 ++++++++++++++++++++
>  include/linux/blk-mq.h |  1 +
>  2 files changed, 21 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply

* Re: [PATCH 01/12] blk-mq: Do not invoke .queue_rq() for a stopped queue
From: Hannes Reinecke @ 2016-10-27  5:47 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lin,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <1debcf7f-c950-308b-d297-3e48a77e08d7@sandisk.com>

On 10/27/2016 12:50 AM, Bart Van Assche wrote:
> The meaning of the BLK_MQ_S_STOPPED flag is "do not call
> .queue_rq()". Hence modify blk_mq_make_request() such that requests
> are queued instead of issued if a queue has been stopped.
> 
> Reported-by: Ming Lei <tom.leiming@gmail.com>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Ming Lei <tom.leiming@gmail.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>  block/blk-mq.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ddc2eed..b5dcafb 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1332,9 +1332,9 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  		blk_mq_put_ctx(data.ctx);
>  		if (!old_rq)
>  			goto done;
> -		if (!blk_mq_direct_issue_request(old_rq, &cookie))
> -			goto done;
> -		blk_mq_insert_request(old_rq, false, true, true);
> +		if (test_bit(BLK_MQ_S_STOPPED, &data.hctx->state) ||
> +		    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
> +			blk_mq_insert_request(old_rq, false, true, true);
>  		goto done;
>  	}
>  
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply

* Re: [PATCH rdma-core 1/7] libhns: Add initial main frame
From: oulijun @ 2016-10-27  3:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161026162053.GE24898-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

在 2016/10/27 0:20, Jason Gunthorpe 写道:
> On Wed, Oct 26, 2016 at 09:04:02PM +0800, Lijun Ou wrote:
>> +static struct ibv_device *hns_roce_driver_init(const char *uverbs_sys_path,
>> +					       int abi_version)
>> +{
>> +	struct hns_roce_device  *dev;
>> +	char			 value[128];
>> +	int			 i;
>> +
>> +	if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
>> +				value, sizeof(value)) > 0)
>> +		for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
>> +			if (!strcmp(value, acpi_table[i].hid))
>> +				goto found;
> 
> You shouldn't need to do both modalias and compatible, there should be
> an acceptable modalias for the DT version too.
> 
when startup by DT, the content of device/modalias is of:NinfinibandT<NULL>Chisilicon,hns-roce-v1.
it is long and complex. the content of device/of_node/compatible is hisilicon,hns-roce-v1
when startup by APCI, the content of device/modalias is acpi:HISI00D1:
Hence, we decide to adopt the above approach to  distinguish the device. when adding a device of pcie
in v2, we will add a condition branch to distinguish it, as follows:
   if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
			value, sizeof(value)) > 0)
		for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
		if (!strcmp(value, acpi_table[i].hid))
			goto found;

....
  if (ibv_read_sysfs-file(uverbs_sys_path, "device/vendor", value, sizeof(value)) > 0)
	...
  if (ibv_read_sysfs-file(uverbs_sys_path, "device/vendor", value, sizeof(value)) > 0)
 	...

> But I wonder if this isn't generically better to be
> 
>  last_dir(readlink("device/driver")) == "hns"
> 
> instead?
> Jason
> 
I think it is not insteaded. because it will be find the hns in the path(device/driver)
As follows:
when startup by DT, the content of device/driver is:
root@(none)$ cat /sys/class/infiniband/hns_0/device/driver/
bind module/ unbind
c4000000.infiniband/ uevent
root@(none)$ cat /sys/class/infiniband/hns_0/device/driver/
>

when startup by ACPI, the content of device/driver is:
root@(none)$ cat /sys/class/infiniband/hns_0/device/driver/
HISI00D1:00/ bind module/ uevent unbind
root@(none)$ cat /sys/class/infiniband/hns_0/device/driver/
cat: read error: Is a directory

thanks
Lijun Ou
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Bart Van Assche @ 2016-10-27  3:05 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <CACVXFVNyRnJFnsSO9TNksE5GNy7TJKL=iZnqrv38=u0XCrW7jA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 10/26/16 19:48, Ming Lei wrote:
> On Thu, Oct 27, 2016 at 10:40 AM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
>> If you have a look at the later patches in this series then you will see
>> that the dm core and the NVMe driver have been modified such that
>> blk_mq_stop_hw_queues(q) is called immediately before
>> blk_mq_quiesce_queue(q) is called.
>
> Cause any current and future users of blk_mq_quiesce_queue(q)
> have to set the flag via blk_mq_stop_hw_queues(q), why not set
> the flag explicitly in blk_mq_quiesce_queue(q)?

Hello Ming,

I'll leave it to Jens to decide whether I should repost the patch series 
with this change integrated or whether to realize this change with a 
follow-up patch.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Ming Lei @ 2016-10-27  2:48 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <16074653-5e31-27b0-90db-33d36a9df0bc@acm.org>

On Thu, Oct 27, 2016 at 10:40 AM, Bart Van Assche <bvanassche@acm.org> wrote:
> On 10/26/16 19:31, Ming Lei wrote:
>>
>> On Thu, Oct 27, 2016 at 10:04 AM, Bart Van Assche <bvanassche@acm.org>
>> wrote:
>>>
>>> On 10/26/16 18:30, Ming Lei wrote:
>>>>
>>>>
>>>> On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
>>>> <bart.vanassche@sandisk.com> wrote:
>>>>>
>>>>>
>>>>> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
>>>>> have finished. This function does *not* wait until all outstanding
>>>>> requests have finished (this means invocation of request.end_io()).
>>>>> The algorithm used by blk_mq_quiesce_queue() is as follows:
>>>>> * Hold either an RCU read lock or an SRCU read lock around
>>>>>   .queue_rq() calls. The former is used if .queue_rq() does not
>>>>>   block and the latter if .queue_rq() may block.
>>>>> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>>>>>   synchronize_rcu() to wait for .queue_rq() invocations that
>>>>>   started before blk_mq_quiesce_queue() was called.
>>>>> * The blk_mq_hctx_stopped() calls that control whether or not
>>>>>   .queue_rq() will be called are called with the (S)RCU read lock
>>>>>   held. This is necessary to avoid race conditions against
>>>>>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>>>>>   sequence from another thread.
>>>>>
>>>>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>>>> Cc: Christoph Hellwig <hch@lst.de>
>>>>> Cc: Ming Lei <tom.leiming@gmail.com>
>>>>> Cc: Hannes Reinecke <hare@suse.com>
>>>>> Cc: Johannes Thumshirn <jthumshirn@suse.de>
>>>>> ---
>>>>>  block/Kconfig          |  1 +
>>>>>  block/blk-mq.c         | 69
>>>>> +++++++++++++++++++++++++++++++++++++++++++++-----
>>>>>  include/linux/blk-mq.h |  3 +++
>>>>>  include/linux/blkdev.h |  1 +
>>>>>  4 files changed, 67 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/block/Kconfig b/block/Kconfig
>>>>> index 1d4d624..0562ef9 100644
>>>>> --- a/block/Kconfig
>>>>> +++ b/block/Kconfig
>>>>> @@ -5,6 +5,7 @@ menuconfig BLOCK
>>>>>         bool "Enable the block layer" if EXPERT
>>>>>         default y
>>>>>         select SBITMAP
>>>>> +       select SRCU
>>>>>         help
>>>>>          Provide block layer support for the kernel.
>>>>>
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index 0cf21c2..4945437 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue
>>>>> *q)
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>>>>>
>>>>> +/**
>>>>> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have
>>>>> finished
>>>>> + * @q: request queue.
>>>>> + *
>>>>> + * Note: this function does not prevent that the struct request
>>>>> end_io()
>>>>> + * callback function is invoked. Additionally, it is not prevented
>>>>> that
>>>>> + * new queue_rq() calls occur unless the queue has been stopped first.
>>>>> + */
>>>>> +void blk_mq_quiesce_queue(struct request_queue *q)
>>>>> +{
>>>>> +       struct blk_mq_hw_ctx *hctx;
>>>>> +       unsigned int i;
>>>>> +       bool rcu = false;
>>>>
>>>>
>>>>
>>>> Before synchronizing SRCU/RCU, we have to set a per-hctx flag
>>>> or per-queue flag to block comming .queue_rq(), seems I mentioned
>>>> that before:
>>>>
>>>>    https://www.spinics.net/lists/linux-rdma/msg41389.html
>>>
>>>
>>>
>>> Hello Ming,
>>>
>>> Thanks for having included an URL to an archived version of that
>>> discussion.
>>> What I remember about that discussion is that I proposed to use the
>>> existing
>>> flag BLK_MQ_S_STOPPED instead of introducing a
>>> new QUEUE_FLAG_QUIESCING flag and that you agreed with that proposal. See
>>> also https://www.spinics.net/lists/linux-rdma/msg41430.html.
>>
>>
>> Yes, I am fine with either one, but the flag need to set in
>> blk_mq_quiesce_queue(), doesnt't it?
>
>
> Hello Ming,
>
> If you have a look at the later patches in this series then you will see
> that the dm core and the NVMe driver have been modified such that
> blk_mq_stop_hw_queues(q) is called immediately before
> blk_mq_quiesce_queue(q) is called.

Cause any current and future users of blk_mq_quiesce_queue(q)
have to set the flag via blk_mq_stop_hw_queues(q), why not set
the flag explicitly in blk_mq_quiesce_queue(q)?


thanks,
Ming Lei

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Bart Van Assche @ 2016-10-27  2:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <CACVXFVNvdNXm4JXsXo2p1_pZLqDpCos0MTTR+svR98p_vw5PCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 10/26/16 19:31, Ming Lei wrote:
> On Thu, Oct 27, 2016 at 10:04 AM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
>> On 10/26/16 18:30, Ming Lei wrote:
>>>
>>> On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
>>> <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> wrote:
>>>>
>>>> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
>>>> have finished. This function does *not* wait until all outstanding
>>>> requests have finished (this means invocation of request.end_io()).
>>>> The algorithm used by blk_mq_quiesce_queue() is as follows:
>>>> * Hold either an RCU read lock or an SRCU read lock around
>>>>   .queue_rq() calls. The former is used if .queue_rq() does not
>>>>   block and the latter if .queue_rq() may block.
>>>> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>>>>   synchronize_rcu() to wait for .queue_rq() invocations that
>>>>   started before blk_mq_quiesce_queue() was called.
>>>> * The blk_mq_hctx_stopped() calls that control whether or not
>>>>   .queue_rq() will be called are called with the (S)RCU read lock
>>>>   held. This is necessary to avoid race conditions against
>>>>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>>>>   sequence from another thread.
>>>>
>>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>>> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
>>>> Cc: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>>> Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
>>>> Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
>>>> ---
>>>>  block/Kconfig          |  1 +
>>>>  block/blk-mq.c         | 69
>>>> +++++++++++++++++++++++++++++++++++++++++++++-----
>>>>  include/linux/blk-mq.h |  3 +++
>>>>  include/linux/blkdev.h |  1 +
>>>>  4 files changed, 67 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/block/Kconfig b/block/Kconfig
>>>> index 1d4d624..0562ef9 100644
>>>> --- a/block/Kconfig
>>>> +++ b/block/Kconfig
>>>> @@ -5,6 +5,7 @@ menuconfig BLOCK
>>>>         bool "Enable the block layer" if EXPERT
>>>>         default y
>>>>         select SBITMAP
>>>> +       select SRCU
>>>>         help
>>>>          Provide block layer support for the kernel.
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index 0cf21c2..4945437 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>>>>
>>>> +/**
>>>> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have
>>>> finished
>>>> + * @q: request queue.
>>>> + *
>>>> + * Note: this function does not prevent that the struct request end_io()
>>>> + * callback function is invoked. Additionally, it is not prevented that
>>>> + * new queue_rq() calls occur unless the queue has been stopped first.
>>>> + */
>>>> +void blk_mq_quiesce_queue(struct request_queue *q)
>>>> +{
>>>> +       struct blk_mq_hw_ctx *hctx;
>>>> +       unsigned int i;
>>>> +       bool rcu = false;
>>>
>>>
>>> Before synchronizing SRCU/RCU, we have to set a per-hctx flag
>>> or per-queue flag to block comming .queue_rq(), seems I mentioned
>>> that before:
>>>
>>>    https://www.spinics.net/lists/linux-rdma/msg41389.html
>>
>>
>> Hello Ming,
>>
>> Thanks for having included an URL to an archived version of that discussion.
>> What I remember about that discussion is that I proposed to use the existing
>> flag BLK_MQ_S_STOPPED instead of introducing a
>> new QUEUE_FLAG_QUIESCING flag and that you agreed with that proposal. See
>> also https://www.spinics.net/lists/linux-rdma/msg41430.html.
>
> Yes, I am fine with either one, but the flag need to set in
> blk_mq_quiesce_queue(), doesnt't it?

Hello Ming,

If you have a look at the later patches in this series then you will see 
that the dm core and the NVMe driver have been modified such that
blk_mq_stop_hw_queues(q) is called immediately before 
blk_mq_quiesce_queue(q) is called.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Ming Lei @ 2016-10-27  2:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <7690b469-f5b7-ab04-4b6f-fa0d679e0f18@acm.org>

On Thu, Oct 27, 2016 at 10:04 AM, Bart Van Assche <bvanassche@acm.org> wrote:
> On 10/26/16 18:30, Ming Lei wrote:
>>
>> On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
>> <bart.vanassche@sandisk.com> wrote:
>>>
>>> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
>>> have finished. This function does *not* wait until all outstanding
>>> requests have finished (this means invocation of request.end_io()).
>>> The algorithm used by blk_mq_quiesce_queue() is as follows:
>>> * Hold either an RCU read lock or an SRCU read lock around
>>>   .queue_rq() calls. The former is used if .queue_rq() does not
>>>   block and the latter if .queue_rq() may block.
>>> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>>>   synchronize_rcu() to wait for .queue_rq() invocations that
>>>   started before blk_mq_quiesce_queue() was called.
>>> * The blk_mq_hctx_stopped() calls that control whether or not
>>>   .queue_rq() will be called are called with the (S)RCU read lock
>>>   held. This is necessary to avoid race conditions against
>>>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>>>   sequence from another thread.
>>>
>>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Cc: Ming Lei <tom.leiming@gmail.com>
>>> Cc: Hannes Reinecke <hare@suse.com>
>>> Cc: Johannes Thumshirn <jthumshirn@suse.de>
>>> ---
>>>  block/Kconfig          |  1 +
>>>  block/blk-mq.c         | 69
>>> +++++++++++++++++++++++++++++++++++++++++++++-----
>>>  include/linux/blk-mq.h |  3 +++
>>>  include/linux/blkdev.h |  1 +
>>>  4 files changed, 67 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/block/Kconfig b/block/Kconfig
>>> index 1d4d624..0562ef9 100644
>>> --- a/block/Kconfig
>>> +++ b/block/Kconfig
>>> @@ -5,6 +5,7 @@ menuconfig BLOCK
>>>         bool "Enable the block layer" if EXPERT
>>>         default y
>>>         select SBITMAP
>>> +       select SRCU
>>>         help
>>>          Provide block layer support for the kernel.
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 0cf21c2..4945437 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>>>  }
>>>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>>>
>>> +/**
>>> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have
>>> finished
>>> + * @q: request queue.
>>> + *
>>> + * Note: this function does not prevent that the struct request end_io()
>>> + * callback function is invoked. Additionally, it is not prevented that
>>> + * new queue_rq() calls occur unless the queue has been stopped first.
>>> + */
>>> +void blk_mq_quiesce_queue(struct request_queue *q)
>>> +{
>>> +       struct blk_mq_hw_ctx *hctx;
>>> +       unsigned int i;
>>> +       bool rcu = false;
>>
>>
>> Before synchronizing SRCU/RCU, we have to set a per-hctx flag
>> or per-queue flag to block comming .queue_rq(), seems I mentioned
>> that before:
>>
>>    https://www.spinics.net/lists/linux-rdma/msg41389.html
>
>
> Hello Ming,
>
> Thanks for having included an URL to an archived version of that discussion.
> What I remember about that discussion is that I proposed to use the existing
> flag BLK_MQ_S_STOPPED instead of introducing a
> new QUEUE_FLAG_QUIESCING flag and that you agreed with that proposal. See
> also https://www.spinics.net/lists/linux-rdma/msg41430.html.

Yes, I am fine with either one, but the flag need to set in
blk_mq_quiesce_queue(), doesnt't it?


Thanks,
Ming Lei

^ permalink raw reply

* Re: [PATCH rdma-core 5/7] libhns: Add verbs of qp support
From: oulijun @ 2016-10-27  2:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161026162321.GF24898-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

在 2016/10/27 0:23, Jason Gunthorpe 写道:
> On Wed, Oct 26, 2016 at 09:04:06PM +0800, Lijun Ou wrote:
> 
>> +#ifndef min
>> +#define min(a, b) \
>> +	({ typeof (a) _a = (a); \
>> +	typeof (b) _b = (b); \
>> +	_a < _b ? _a : _b; })
>> +#endif
> 
> Nope, use the ccan header
> 
> Jason
> 
thanks your reivew. I will fix it in next patch

thanks
Lijun Ou
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [For help] rdma-roce build quesiton
From: oulijun @ 2016-10-27  2:15 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Linuxarm, linux-rdma
In-Reply-To: <20161026160902.GD24898-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

在 2016/10/27 0:09, Jason Gunthorpe 写道:
> On Wed, Oct 26, 2016 at 03:23:38PM +0800, oulijun wrote:
> 
>> the build is fail and the print log as follows:
>>
>> error: size of unnamed array is negative
>> attr->cap.max_recv_wr = min(context->max_qp_wr, attr->cap.max_recv_wr);
> 
> It is telling you the types are not the same, and this is a source of bugs
> as C has some counter intuitive rules regarding type promotion.
> 
> 1) Audit max_qp_wr and max_recv_wr to see if they really should be
>    different types, if not fix context->max_qp_wr to match
> 
> 2) If they are legitimately different then use
> 
>   min_t(<desired type>, context->max_qp_wr, attr->cap.max_recv_wr);
> 
> Think carefully about what common type is used because both arguments
> will be casted, and the goal is to avoid a loss of precision or
> signdedness in the cast.
> 
> Jason
> 
> .
> 
thanks, I see it.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Bart Van Assche @ 2016-10-27  2:04 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <CACVXFVPWL=izFmUPtpi-+RLrhNX0LFfw2m6eurgMC5GvfDWsxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 10/26/16 18:30, Ming Lei wrote:
> On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
> <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> wrote:
>> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
>> have finished. This function does *not* wait until all outstanding
>> requests have finished (this means invocation of request.end_io()).
>> The algorithm used by blk_mq_quiesce_queue() is as follows:
>> * Hold either an RCU read lock or an SRCU read lock around
>>   .queue_rq() calls. The former is used if .queue_rq() does not
>>   block and the latter if .queue_rq() may block.
>> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>>   synchronize_rcu() to wait for .queue_rq() invocations that
>>   started before blk_mq_quiesce_queue() was called.
>> * The blk_mq_hctx_stopped() calls that control whether or not
>>   .queue_rq() will be called are called with the (S)RCU read lock
>>   held. This is necessary to avoid race conditions against
>>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>>   sequence from another thread.
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
>> Cc: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
>> Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
>> ---
>>  block/Kconfig          |  1 +
>>  block/blk-mq.c         | 69 +++++++++++++++++++++++++++++++++++++++++++++-----
>>  include/linux/blk-mq.h |  3 +++
>>  include/linux/blkdev.h |  1 +
>>  4 files changed, 67 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/Kconfig b/block/Kconfig
>> index 1d4d624..0562ef9 100644
>> --- a/block/Kconfig
>> +++ b/block/Kconfig
>> @@ -5,6 +5,7 @@ menuconfig BLOCK
>>         bool "Enable the block layer" if EXPERT
>>         default y
>>         select SBITMAP
>> +       select SRCU
>>         help
>>          Provide block layer support for the kernel.
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 0cf21c2..4945437 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>>  }
>>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>>
>> +/**
>> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
>> + * @q: request queue.
>> + *
>> + * Note: this function does not prevent that the struct request end_io()
>> + * callback function is invoked. Additionally, it is not prevented that
>> + * new queue_rq() calls occur unless the queue has been stopped first.
>> + */
>> +void blk_mq_quiesce_queue(struct request_queue *q)
>> +{
>> +       struct blk_mq_hw_ctx *hctx;
>> +       unsigned int i;
>> +       bool rcu = false;
>
> Before synchronizing SRCU/RCU, we have to set a per-hctx flag
> or per-queue flag to block comming .queue_rq(), seems I mentioned
> that before:
>
>    https://www.spinics.net/lists/linux-rdma/msg41389.html

Hello Ming,

Thanks for having included an URL to an archived version of that 
discussion. What I remember about that discussion is that I proposed to 
use the existing flag BLK_MQ_S_STOPPED instead of introducing a
new QUEUE_FLAG_QUIESCING flag and that you agreed with that proposal. 
See also https://www.spinics.net/lists/linux-rdma/msg41430.html.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 02/12] blk-mq: Introduce blk_mq_hctx_stopped()
From: Ming Lei @ 2016-10-27  1:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <0de50789-e3b7-0a07-73c1-4fb87b1f957e@sandisk.com>

On Thu, Oct 27, 2016 at 6:51 AM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> Multiple functions test the BLK_MQ_S_STOPPED bit so introduce
> a helper function that performs this test.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>

Reviewed-by: Ming Lei <tom.leiming@gmail.com>

> ---
>  block/blk-mq.c | 12 ++++++------
>  block/blk-mq.h |  5 +++++
>  2 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b5dcafb..b52b3a6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -787,7 +787,7 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>         struct list_head *dptr;
>         int queued;
>
> -       if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
> +       if (unlikely(blk_mq_hctx_stopped(hctx)))
>                 return;
>
>         WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> @@ -912,8 +912,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>
>  void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
>  {
> -       if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state) ||
> -           !blk_mq_hw_queue_mapped(hctx)))
> +       if (unlikely(blk_mq_hctx_stopped(hctx) ||
> +                    !blk_mq_hw_queue_mapped(hctx)))
>                 return;
>
>         if (!async && !(hctx->flags & BLK_MQ_F_BLOCKING)) {
> @@ -938,7 +938,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>         queue_for_each_hw_ctx(q, hctx, i) {
>                 if ((!blk_mq_hctx_has_pending(hctx) &&
>                     list_empty_careful(&hctx->dispatch)) ||
> -                   test_bit(BLK_MQ_S_STOPPED, &hctx->state))
> +                   blk_mq_hctx_stopped(hctx))
>                         continue;
>
>                 blk_mq_run_hw_queue(hctx, async);
> @@ -988,7 +988,7 @@ void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async)
>         int i;
>
>         queue_for_each_hw_ctx(q, hctx, i) {
> -               if (!test_bit(BLK_MQ_S_STOPPED, &hctx->state))
> +               if (!blk_mq_hctx_stopped(hctx))
>                         continue;
>
>                 clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
> @@ -1332,7 +1332,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>                 blk_mq_put_ctx(data.ctx);
>                 if (!old_rq)
>                         goto done;
> -               if (test_bit(BLK_MQ_S_STOPPED, &data.hctx->state) ||
> +               if (blk_mq_hctx_stopped(data.hctx) ||
>                     blk_mq_direct_issue_request(old_rq, &cookie) != 0)
>                         blk_mq_insert_request(old_rq, false, true, true);
>                 goto done;
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index e5d2524..ac772da 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -100,6 +100,11 @@ static inline void blk_mq_set_alloc_data(struct blk_mq_alloc_data *data,
>         data->hctx = hctx;
>  }
>
> +static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
> +{
> +       return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
> +}
> +
>  static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
>  {
>         return hctx->nr_ctx && hctx->tags;
> --
> 2.10.1
>



-- 
Ming Lei

^ permalink raw reply

* Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()
From: Ming Lei @ 2016-10-27  1:30 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <5143c240-39af-9fe2-d3e6-ed69f9c20531-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
<bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> wrote:
> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
> have finished. This function does *not* wait until all outstanding
> requests have finished (this means invocation of request.end_io()).
> The algorithm used by blk_mq_quiesce_queue() is as follows:
> * Hold either an RCU read lock or an SRCU read lock around
>   .queue_rq() calls. The former is used if .queue_rq() does not
>   block and the latter if .queue_rq() may block.
> * blk_mq_quiesce_queue() calls synchronize_srcu() or
>   synchronize_rcu() to wait for .queue_rq() invocations that
>   started before blk_mq_quiesce_queue() was called.
> * The blk_mq_hctx_stopped() calls that control whether or not
>   .queue_rq() will be called are called with the (S)RCU read lock
>   held. This is necessary to avoid race conditions against
>   the "blk_mq_stop_hw_queues(q); blk_mq_quiesce_queue(q);"
>   sequence from another thread.
>
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Cc: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
> Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
> ---
>  block/Kconfig          |  1 +
>  block/blk-mq.c         | 69 +++++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/blk-mq.h |  3 +++
>  include/linux/blkdev.h |  1 +
>  4 files changed, 67 insertions(+), 7 deletions(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 1d4d624..0562ef9 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -5,6 +5,7 @@ menuconfig BLOCK
>         bool "Enable the block layer" if EXPERT
>         default y
>         select SBITMAP
> +       select SRCU
>         help
>          Provide block layer support for the kernel.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0cf21c2..4945437 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -115,6 +115,31 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>
> +/**
> + * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
> + * @q: request queue.
> + *
> + * Note: this function does not prevent that the struct request end_io()
> + * callback function is invoked. Additionally, it is not prevented that
> + * new queue_rq() calls occur unless the queue has been stopped first.
> + */
> +void blk_mq_quiesce_queue(struct request_queue *q)
> +{
> +       struct blk_mq_hw_ctx *hctx;
> +       unsigned int i;
> +       bool rcu = false;

Before synchronizing SRCU/RCU, we have to set a per-hctx flag
or per-queue flag to block comming .queue_rq(), seems I mentioned
that before:

   https://www.spinics.net/lists/linux-rdma/msg41389.html

> +
> +       queue_for_each_hw_ctx(q, hctx, i) {
> +               if (hctx->flags & BLK_MQ_F_BLOCKING)
> +                       synchronize_srcu(&hctx->queue_rq_srcu);
> +               else
> +                       rcu = true;
> +       }
> +       if (rcu)
> +               synchronize_rcu();
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
> +
>  void blk_mq_wake_waiters(struct request_queue *q)
>  {
>         struct blk_mq_hw_ctx *hctx;
> @@ -778,7 +803,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
>   * of IO. In particular, we'd like FIFO behaviour on handling existing
>   * items on the hctx->dispatch list. Ignore that for now.
>   */
> -static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> +static void blk_mq_process_rq_list(struct blk_mq_hw_ctx *hctx)
>  {
>         struct request_queue *q = hctx->queue;
>         struct request *rq;
> @@ -790,9 +815,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>         if (unlikely(blk_mq_hctx_stopped(hctx)))
>                 return;
>
> -       WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> -               cpu_online(hctx->next_cpu));
> -
>         hctx->run++;
>
>         /*
> @@ -883,6 +905,24 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>         }
>  }
>
> +static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> +{
> +       int srcu_idx;
> +
> +       WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> +               cpu_online(hctx->next_cpu));
> +
> +       if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> +               rcu_read_lock();
> +               blk_mq_process_rq_list(hctx);
> +               rcu_read_unlock();
> +       } else {
> +               srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> +               blk_mq_process_rq_list(hctx);
> +               srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> +       }
> +}
> +
>  /*
>   * It'd be great if the workqueue API had a way to pass
>   * in a mask and had some smarts for more clever placement.
> @@ -1293,7 +1333,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>         const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
>         struct blk_map_ctx data;
>         struct request *rq;
> -       unsigned int request_count = 0;
> +       unsigned int request_count = 0, srcu_idx;
>         struct blk_plug *plug;
>         struct request *same_queue_rq = NULL;
>         blk_qc_t cookie;
> @@ -1336,7 +1376,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>                 blk_mq_bio_to_request(rq, bio);
>
>                 /*
> -                * We do limited pluging. If the bio can be merged, do that.
> +                * We do limited plugging. If the bio can be merged, do that.
>                  * Otherwise the existing request in the plug list will be
>                  * issued. So the plug list will have one request at most
>                  */
> @@ -1356,7 +1396,16 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>                 blk_mq_put_ctx(data.ctx);
>                 if (!old_rq)
>                         goto done;
> -               blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
> +
> +               if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
> +                       rcu_read_lock();
> +                       blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
> +                       rcu_read_unlock();
> +               } else {
> +                       srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
> +                       blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
> +                       srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
> +               }
>                 goto done;
>         }
>
> @@ -1635,6 +1684,9 @@ static void blk_mq_exit_hctx(struct request_queue *q,
>         if (set->ops->exit_hctx)
>                 set->ops->exit_hctx(hctx, hctx_idx);
>
> +       if (hctx->flags & BLK_MQ_F_BLOCKING)
> +               cleanup_srcu_struct(&hctx->queue_rq_srcu);
> +
>         blk_mq_remove_cpuhp(hctx);
>         blk_free_flush_queue(hctx->fq);
>         sbitmap_free(&hctx->ctx_map);
> @@ -1715,6 +1767,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
>                                    flush_start_tag + hctx_idx, node))
>                 goto free_fq;
>
> +       if (hctx->flags & BLK_MQ_F_BLOCKING)
> +               init_srcu_struct(&hctx->queue_rq_srcu);
> +
>         return 0;
>
>   free_fq:
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index aa93000..61be48b 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -3,6 +3,7 @@
>
>  #include <linux/blkdev.h>
>  #include <linux/sbitmap.h>
> +#include <linux/srcu.h>
>
>  struct blk_mq_tags;
>  struct blk_flush_queue;
> @@ -35,6 +36,8 @@ struct blk_mq_hw_ctx {
>
>         struct blk_mq_tags      *tags;
>
> +       struct srcu_struct      queue_rq_srcu;
> +
>         unsigned long           queued;
>         unsigned long           run;
>  #define BLK_MQ_MAX_DISPATCH_ORDER      7
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c47c358..8259d87 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -824,6 +824,7 @@ extern void __blk_run_queue(struct request_queue *q);
>  extern void __blk_run_queue_uncond(struct request_queue *q);
>  extern void blk_run_queue(struct request_queue *);
>  extern void blk_run_queue_async(struct request_queue *q);
> +extern void blk_mq_quiesce_queue(struct request_queue *q);
>  extern int blk_rq_map_user(struct request_queue *, struct request *,
>                            struct rq_map_data *, void __user *, unsigned long,
>                            gfp_t);
> --
> 2.10.1
>



-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4 0/12] Fix race conditions related to stopping block layer queues
From: Jens Axboe @ 2016-10-26 23:28 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lin,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <b22edafc-725f-0419-d074-34d35d57d126@sandisk.com>

On 10/26/2016 04:49 PM, Bart Van Assche wrote:
> Hello Jens,
>
> Multiple block drivers need the functionality to stop a request queue
> and to wait until all ongoing request_fn() / queue_rq() calls have
> finished without waiting until all outstanding requests have finished.
> Hence this patch series that introduces the blk_mq_quiesce_queue()
> function. The dm-mq, SRP and NVMe patches in this patch series are three
> examples of where these functions are useful. These patches have been
> tested on top of kernel v4.9-rc1. The following tests have been run to
> verify this patch series:
> - Mike's mptest suite that stress-tests dm-multipath.
> - My own srp-test suite that stress-tests SRP on top of dm-multipath.
> - fio on top of the NVMeOF host driver that was connected to the NVMeOF
>   target driver on the same host.
> - Laurence verified the previous version (v3) of this patch series by
>   running it through the Red Hat SRP and NVMe test suites.

This looks pretty good, I'll run some testing on it tomorrow and do a
full review, hopefully we can get it applied sooner rather than later
for the next series.

-- 
Jens Axboe


^ permalink raw reply

* Trouble enabling iSER for ConnectX-4 Lx
From: Robert LeBlanc @ 2016-10-26 23:13 UTC (permalink / raw)
  To: linux-rdma

We have some ConnectX-4 Lx cards that I'm trying to test RoCE and iSER
on. I downloaded and installed the Mellanox drivers with VMA [0]. I
was able to run the ib_read_bw tests over the adapters after
installing the infiniband-diags and perftest RPMs. When I went to
configure LIO for iSER, I'm getting the message "Cannot change iser"
on step 6 in the procedure here [1] which I've done many times with
Infiniband without issues. I navigated to
/sys/kernel/config/target/iscsi/{iqn}/tpgt_1/np/{portal_ip:port} and
sure enough, I can't write '1' into iser. The kernel is not giving any
messages and the ib_isert module is loaded. This is on 4.4.27,
Mellanox driver 3.4-1.0.0.3 built with `./install --add-kernel-support
--skip-repo --tmpdir /root/junk --vma`

# mstflint -d 4:00.0 q
Image type:          FS3
FW Version:          14.16.1020
FW Release Date:     20.6.2016
Rom Info:            type=UEFI version=14.10.16
                    type=PXE version=3.4.812 devid=4117
Description:         UID                GuidsNumber
Base GUID:           0cc47a000089f706        4
Base MAC:            00000cc47a89f706        4
Image VSD:
Device VSD:
PSID:                SM_2001000001034

# ibstatus
Infiniband device 'mlx5_0' port 1 status:
       default gid:     fe80:0000:0000:0000:0ec4:7aff:fe89:f706
       base lid:        0x0
       sm lid:          0x0
       state:           4: ACTIVE
       phys state:      5: LinkUp
       rate:            25 Gb/sec (1X EDR)
       link_layer:      Ethernet

Infiniband device 'mlx5_1' port 1 status:
       default gid:     fe80:0000:0000:0000:0ec4:7aff:fe89:f707
       base lid:        0x0
       sm lid:          0x0
       state:           4: ACTIVE
       phys state:      5: LinkUp
       rate:            25 Gb/sec (1X EDR)
       link_layer:      Ethernet

Any ideas of what I'm doing wrong here? I don't have any experience
with RoCE, so I'm sure I'm doing something wrong. And the manual has
nothing about configuring RoCE other than enabling --vma when
installing the drivers [2].

Thanks,
Robert LeBlanc

[0] http://www.mellanox.com/page/products_dyn?product_family=27
[1] https://community.mellanox.com/docs/DOC-1472
[2] http://www.mellanox.com/related-docs/prod_software/Mellanox_EN_for_Linux_User_Manual_v3_40.pdf
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 12/12] nvme: Fix a race condition related to stopping queues
From: Bart Van Assche @ 2016-10-26 22:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <b22edafc-725f-0419-d074-34d35d57d126-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
returns.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Keith Busch <keith.busch-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/nvme/host/core.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b662416..d6ab9a0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -201,13 +201,7 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
 
 void nvme_requeue_req(struct request *req)
 {
-	unsigned long flags;
-
-	blk_mq_requeue_request(req, false);
-	spin_lock_irqsave(req->q->queue_lock, flags);
-	if (!blk_mq_queue_stopped(req->q))
-		blk_mq_kick_requeue_list(req->q);
-	spin_unlock_irqrestore(req->q->queue_lock, flags);
+	blk_mq_requeue_request(req, !blk_mq_queue_stopped(req->q));
 }
 EXPORT_SYMBOL_GPL(nvme_requeue_req);
 
@@ -2079,8 +2073,11 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		blk_mq_cancel_requeue_work(ns->queue);
-		blk_mq_stop_hw_queues(ns->queue);
+		struct request_queue *q = ns->queue;
+
+		blk_mq_cancel_requeue_work(q);
+		blk_mq_stop_hw_queues(q);
+		blk_mq_quiesce_queue(q);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 11/12] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-26 22:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <b22edafc-725f-0419-d074-34d35d57d126-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of
QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations
that became superfluous because of this change. Change
blk_queue_stopped() tests into blk_mq_queue_stopped().

This patch fixes a race condition: using queue_flag_clear_unlocked()
is not safe if any other function that manipulates the queue flags
can be called concurrently, e.g. blk_cleanup_queue().

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Keith Busch <keith.busch-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
---
 drivers/nvme/host/core.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7bb73ba..b662416 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -205,7 +205,7 @@ void nvme_requeue_req(struct request *req)
 
 	blk_mq_requeue_request(req, false);
 	spin_lock_irqsave(req->q->queue_lock, flags);
-	if (!blk_queue_stopped(req->q))
+	if (!blk_mq_queue_stopped(req->q))
 		blk_mq_kick_requeue_list(req->q);
 	spin_unlock_irqrestore(req->q->queue_lock, flags);
 }
@@ -2079,10 +2079,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		spin_lock_irq(ns->queue->queue_lock);
-		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
-		spin_unlock_irq(ns->queue->queue_lock);
-
 		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
 	}
@@ -2096,7 +2092,6 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 10/12] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Bart Van Assche @ 2016-10-26 22:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <b22edafc-725f-0419-d074-34d35d57d126@sandisk.com>

Ensure that if scsi-mq is enabled that scsi_wait_for_queuecommand()
waits until ongoing shost->hostt->queuecommand() calls have finished.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Doug Ledford <dledford@redhat.com>
---
 drivers/scsi/scsi_lib.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c1561e7..9b8b19e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2724,8 +2724,6 @@ EXPORT_SYMBOL_GPL(sdev_evt_send_simple);
 /**
  * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
  * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
- *
- * To do: add support for scsi-mq in this function.
  */
 static int scsi_request_fn_active(struct Scsi_Host *shost)
 {
@@ -2744,12 +2742,20 @@ static int scsi_request_fn_active(struct Scsi_Host *shost)
 	return request_fn_active;
 }
 
+static void scsi_mq_wait_for_queuecommand(struct Scsi_Host *shost)
+{
+	struct scsi_device *sdev;
+
+	shost_for_each_device(sdev, shost)
+		blk_mq_quiesce_queue(sdev->request_queue);
+}
+
 /**
  * scsi_wait_for_queuecommand() - wait for ongoing queuecommand() calls
  * @shost: SCSI host pointer.
  *
  * Wait until the ongoing shost->hostt->queuecommand() calls that are
- * invoked from scsi_request_fn() have finished.
+ * invoked from either scsi_request_fn() or scsi_queue_rq() have finished.
  *
  * To do: avoid that scsi_send_eh_cmnd() calls queuecommand() after
  * scsi_internal_device_block() has blocked a SCSI device and remove and also
@@ -2757,8 +2763,12 @@ static int scsi_request_fn_active(struct Scsi_Host *shost)
  */
 void scsi_wait_for_queuecommand(struct Scsi_Host *shost)
 {
-	while (scsi_request_fn_active(shost))
-		msleep(20);
+	if (shost->use_blk_mq) {
+		scsi_mq_wait_for_queuecommand(shost);
+	} else {
+		while (scsi_request_fn_active(shost))
+			msleep(20);
+	}
 }
 EXPORT_SYMBOL(scsi_wait_for_queuecommand);
 
-- 
2.10.1


^ permalink raw reply related

* [PATCH 09/12] SRP transport: Move queuecommand() wait code to SCSI core
From: Bart Van Assche @ 2016-10-26 22:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <b22edafc-725f-0419-d074-34d35d57d126@sandisk.com>

Additionally, rename srp_wait_for_queuecommand() into
scsi_wait_for_queuecommand() and add a comment about the
queuecommand() call from scsi_send_eh_cmnd().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Doug Ledford <dledford@redhat.com>
---
 drivers/scsi/scsi_lib.c           | 41 +++++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_transport_srp.c | 35 ++-------------------------------
 include/scsi/scsi_host.h          |  1 +
 3 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ab5b06f..c1561e7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2722,6 +2722,47 @@ void sdev_evt_send_simple(struct scsi_device *sdev,
 EXPORT_SYMBOL_GPL(sdev_evt_send_simple);
 
 /**
+ * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
+ * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
+ *
+ * To do: add support for scsi-mq in this function.
+ */
+static int scsi_request_fn_active(struct Scsi_Host *shost)
+{
+	struct scsi_device *sdev;
+	struct request_queue *q;
+	int request_fn_active = 0;
+
+	shost_for_each_device(sdev, shost) {
+		q = sdev->request_queue;
+
+		spin_lock_irq(q->queue_lock);
+		request_fn_active += q->request_fn_active;
+		spin_unlock_irq(q->queue_lock);
+	}
+
+	return request_fn_active;
+}
+
+/**
+ * scsi_wait_for_queuecommand() - wait for ongoing queuecommand() calls
+ * @shost: SCSI host pointer.
+ *
+ * Wait until the ongoing shost->hostt->queuecommand() calls that are
+ * invoked from scsi_request_fn() have finished.
+ *
+ * To do: avoid that scsi_send_eh_cmnd() calls queuecommand() after
+ * scsi_internal_device_block() has blocked a SCSI device and remove and also
+ * remove the rport mutex lock and unlock calls from srp_queuecommand().
+ */
+void scsi_wait_for_queuecommand(struct Scsi_Host *shost)
+{
+	while (scsi_request_fn_active(shost))
+		msleep(20);
+}
+EXPORT_SYMBOL(scsi_wait_for_queuecommand);
+
+/**
  *	scsi_device_quiesce - Block user issued commands.
  *	@sdev:	scsi device to quiesce.
  *
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index e3cd3ec..8b190dc 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,7 +24,6 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
-#include <linux/delay.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -402,36 +401,6 @@ static void srp_reconnect_work(struct work_struct *work)
 	}
 }
 
-/**
- * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
- * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
- *
- * To do: add support for scsi-mq in this function.
- */
-static int scsi_request_fn_active(struct Scsi_Host *shost)
-{
-	struct scsi_device *sdev;
-	struct request_queue *q;
-	int request_fn_active = 0;
-
-	shost_for_each_device(sdev, shost) {
-		q = sdev->request_queue;
-
-		spin_lock_irq(q->queue_lock);
-		request_fn_active += q->request_fn_active;
-		spin_unlock_irq(q->queue_lock);
-	}
-
-	return request_fn_active;
-}
-
-/* Wait until ongoing shost->hostt->queuecommand() calls have finished. */
-static void srp_wait_for_queuecommand(struct Scsi_Host *shost)
-{
-	while (scsi_request_fn_active(shost))
-		msleep(20);
-}
-
 static void __rport_fail_io_fast(struct srp_rport *rport)
 {
 	struct Scsi_Host *shost = rport_to_shost(rport);
@@ -446,7 +415,7 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
 	/* Involve the LLD if possible to terminate all I/O on the rport. */
 	i = to_srp_internal(shost->transportt);
 	if (i->f->terminate_rport_io) {
-		srp_wait_for_queuecommand(shost);
+		scsi_wait_for_queuecommand(shost);
 		i->f->terminate_rport_io(rport);
 	}
 }
@@ -576,7 +545,7 @@ int srp_reconnect_rport(struct srp_rport *rport)
 	if (res)
 		goto out;
 	scsi_target_block(&shost->shost_gendev);
-	srp_wait_for_queuecommand(shost);
+	scsi_wait_for_queuecommand(shost);
 	res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
 	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
 		 dev_name(&shost->shost_gendev), rport->state, res);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 7e4cd53..0e2c361 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -789,6 +789,7 @@ extern void scsi_remove_host(struct Scsi_Host *);
 extern struct Scsi_Host *scsi_host_get(struct Scsi_Host *);
 extern void scsi_host_put(struct Scsi_Host *t);
 extern struct Scsi_Host *scsi_host_lookup(unsigned short);
+extern void scsi_wait_for_queuecommand(struct Scsi_Host *shost);
 extern const char *scsi_host_state_name(enum scsi_host_state);
 extern void scsi_cmd_get_serial(struct Scsi_Host *, struct scsi_cmnd *);
 
-- 
2.10.1


^ permalink raw reply related

* [PATCH 08/12] dm: Fix a race condition related to stopping and starting queues
From: Bart Van Assche @ 2016-10-26 22:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <b22edafc-725f-0419-d074-34d35d57d126@sandisk.com>

Ensure that all ongoing dm_mq_queue_rq() and dm_mq_requeue_request()
calls have stopped before setting the "queue stopped" flag. This
allows to remove the "queue stopped" test from dm_mq_queue_rq() and
dm_mq_requeue_request(). This patch fixes a race condition because
dm_mq_queue_rq() is called without holding the queue lock and hence
BLK_MQ_S_STOPPED can be set at any time while dm_mq_queue_rq() is
in progress. This patch prevents that the following hang occurs
sporadically when using dm-mq:

INFO: task systemd-udevd:10111 blocked for more than 480 seconds.
Call Trace:
 [<ffffffff8161f397>] schedule+0x37/0x90
 [<ffffffff816239ef>] schedule_timeout+0x27f/0x470
 [<ffffffff8161e76f>] io_schedule_timeout+0x9f/0x110
 [<ffffffff8161fb36>] bit_wait_io+0x16/0x60
 [<ffffffff8161f929>] __wait_on_bit_lock+0x49/0xa0
 [<ffffffff8114fe69>] __lock_page+0xb9/0xc0
 [<ffffffff81165d90>] truncate_inode_pages_range+0x3e0/0x760
 [<ffffffff81166120>] truncate_inode_pages+0x10/0x20
 [<ffffffff81212a20>] kill_bdev+0x30/0x40
 [<ffffffff81213d41>] __blkdev_put+0x71/0x360
 [<ffffffff81214079>] blkdev_put+0x49/0x170
 [<ffffffff812141c0>] blkdev_close+0x20/0x30
 [<ffffffff811d48e8>] __fput+0xe8/0x1f0
 [<ffffffff811d4a29>] ____fput+0x9/0x10
 [<ffffffff810842d3>] task_work_run+0x83/0xb0
 [<ffffffff8106606e>] do_exit+0x3ee/0xc40
 [<ffffffff8106694b>] do_group_exit+0x4b/0xc0
 [<ffffffff81073d9a>] get_signal+0x2ca/0x940
 [<ffffffff8101bf43>] do_signal+0x23/0x660
 [<ffffffff810022b3>] exit_to_usermode_loop+0x73/0xb0
 [<ffffffff81002cb0>] syscall_return_slowpath+0xb0/0xc0
 [<ffffffff81624e33>] entry_SYSCALL_64_fastpath+0xa6/0xa8

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-rq.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index d47a504..107ed19 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -105,6 +105,8 @@ static void dm_mq_stop_queue(struct request_queue *q)
 	/* Avoid that requeuing could restart the queue. */
 	blk_mq_cancel_requeue_work(q);
 	blk_mq_stop_hw_queues(q);
+	/* Wait until dm_mq_queue_rq() has finished. */
+	blk_mq_quiesce_queue(q);
 }
 
 void dm_stop_queue(struct request_queue *q)
@@ -887,17 +889,6 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		dm_put_live_table(md, srcu_idx);
 	}
 
-	/*
-	 * On suspend dm_stop_queue() handles stopping the blk-mq
-	 * request_queue BUT: even though the hw_queues are marked
-	 * BLK_MQ_S_STOPPED at that point there is still a race that
-	 * is allowing block/blk-mq.c to call ->queue_rq against a
-	 * hctx that it really shouldn't.  The following check guards
-	 * against this rarity (albeit _not_ race-free).
-	 */
-	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
-		return BLK_MQ_RQ_QUEUE_BUSY;
-
 	if (ti->type->busy && ti->type->busy(ti))
 		return BLK_MQ_RQ_QUEUE_BUSY;
 
-- 
2.10.1


^ permalink raw reply related

* [PATCH 07/12] dm: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-26 22:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <b22edafc-725f-0419-d074-34d35d57d126@sandisk.com>

Instead of manipulating both QUEUE_FLAG_STOPPED and BLK_MQ_S_STOPPED
in the dm start and stop queue functions, only manipulate the latter
flag. Change blk_queue_stopped() tests into blk_mq_queue_stopped().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-rq.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index fbd37b4..d47a504 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -75,12 +75,6 @@ static void dm_old_start_queue(struct request_queue *q)
 
 static void dm_mq_start_queue(struct request_queue *q)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	queue_flag_clear(QUEUE_FLAG_STOPPED, q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
-
 	blk_mq_start_stopped_hw_queues(q, true);
 	blk_mq_kick_requeue_list(q);
 }
@@ -105,16 +99,8 @@ static void dm_old_stop_queue(struct request_queue *q)
 
 static void dm_mq_stop_queue(struct request_queue *q)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (blk_queue_stopped(q)) {
-		spin_unlock_irqrestore(q->queue_lock, flags);
+	if (blk_mq_queue_stopped(q))
 		return;
-	}
-
-	queue_flag_set(QUEUE_FLAG_STOPPED, q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	/* Avoid that requeuing could restart the queue. */
 	blk_mq_cancel_requeue_work(q);
@@ -341,7 +327,7 @@ static void __dm_mq_kick_requeue_list(struct request_queue *q, unsigned long mse
 	unsigned long flags;
 
 	spin_lock_irqsave(q->queue_lock, flags);
-	if (!blk_queue_stopped(q))
+	if (!blk_mq_queue_stopped(q))
 		blk_mq_delay_kick_requeue_list(q, msecs);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
-- 
2.10.1


^ permalink raw reply related

* [PATCH 06/12] blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()
From: Bart Van Assche @ 2016-10-26 22:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <b22edafc-725f-0419-d074-34d35d57d126-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
are followed by kicking the requeue list. Hence add an argument to
these two functions that allows to kick the requeue list. This was
proposed by Christoph Hellwig.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
---
 block/blk-flush.c            |  5 +----
 block/blk-mq.c               | 10 +++++++---
 drivers/block/xen-blkfront.c |  2 +-
 drivers/md/dm-rq.c           |  2 +-
 drivers/nvme/host/core.c     |  2 +-
 drivers/scsi/scsi_lib.c      |  4 +---
 include/linux/blk-mq.h       |  5 +++--
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 6a14b68..a834aed 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -134,10 +134,7 @@ static void blk_flush_restore_request(struct request *rq)
 static bool blk_flush_queue_rq(struct request *rq, bool add_front)
 {
 	if (rq->q->mq_ops) {
-		struct request_queue *q = rq->q;
-
-		blk_mq_add_to_requeue_list(rq, add_front);
-		blk_mq_kick_requeue_list(q);
+		blk_mq_add_to_requeue_list(rq, add_front, true);
 		return false;
 	} else {
 		if (add_front)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4945437..688bcc3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -492,12 +492,12 @@ static void __blk_mq_requeue_request(struct request *rq)
 	}
 }
 
-void blk_mq_requeue_request(struct request *rq)
+void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
 	__blk_mq_requeue_request(rq);
 
 	BUG_ON(blk_queued_rq(rq));
-	blk_mq_add_to_requeue_list(rq, true);
+	blk_mq_add_to_requeue_list(rq, true, kick_requeue_list);
 }
 EXPORT_SYMBOL(blk_mq_requeue_request);
 
@@ -535,7 +535,8 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	blk_mq_start_hw_queues(q);
 }
 
-void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
+void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
+				bool kick_requeue_list)
 {
 	struct request_queue *q = rq->q;
 	unsigned long flags;
@@ -554,6 +555,9 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
 		list_add_tail(&rq->queuelist, &q->requeue_list);
 	}
 	spin_unlock_irqrestore(&q->requeue_lock, flags);
+
+	if (kick_requeue_list)
+		blk_mq_kick_requeue_list(q);
 }
 EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
 
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9908597..1ca702d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2043,7 +2043,7 @@ static int blkif_recover(struct blkfront_info *info)
 		/* Requeue pending requests (flush or discard) */
 		list_del_init(&req->queuelist);
 		BUG_ON(req->nr_phys_segments > segs);
-		blk_mq_requeue_request(req);
+		blk_mq_requeue_request(req, false);
 	}
 	blk_mq_kick_requeue_list(info->rq);
 
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index dc75bea..fbd37b4 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -354,7 +354,7 @@ EXPORT_SYMBOL(dm_mq_kick_requeue_list);
 
 static void dm_mq_delay_requeue_request(struct request *rq, unsigned long msecs)
 {
-	blk_mq_requeue_request(rq);
+	blk_mq_requeue_request(rq, false);
 	__dm_mq_kick_requeue_list(rq->q, msecs);
 }
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 79e679d..7bb73ba 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -203,7 +203,7 @@ void nvme_requeue_req(struct request *req)
 {
 	unsigned long flags;
 
-	blk_mq_requeue_request(req);
+	blk_mq_requeue_request(req, false);
 	spin_lock_irqsave(req->q->queue_lock, flags);
 	if (!blk_queue_stopped(req->q))
 		blk_mq_kick_requeue_list(req->q);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2cca9cf..ab5b06f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -86,10 +86,8 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd)
 {
 	struct scsi_device *sdev = cmd->device;
-	struct request_queue *q = cmd->request->q;
 
-	blk_mq_requeue_request(cmd->request);
-	blk_mq_kick_requeue_list(q);
+	blk_mq_requeue_request(cmd->request, true);
 	put_device(&sdev->sdev_gendev);
 }
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 61be48b..76f6319 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -218,8 +218,9 @@ void blk_mq_start_request(struct request *rq);
 void blk_mq_end_request(struct request *rq, int error);
 void __blk_mq_end_request(struct request *rq, int error);
 
-void blk_mq_requeue_request(struct request *rq);
-void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
+void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
+void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
+				bool kick_requeue_list);
 void blk_mq_cancel_requeue_work(struct request_queue *q);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox