Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
* Re: [PATCH rdma-core v2 4/4] redhat/spec: build split rpm packages
From: Jason Gunthorpe @ 2016-10-28 20:57 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161028172503.GA28451-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Fri, Oct 28, 2016 at 11:25:03AM -0600, Jason Gunthorpe wrote:

> > Nah, I think things look good, and we can always keep tweaking as needed,
> > this is a solid update to work on top of.
> 
> Okay, I'll patch in the changes from this discussion and send the
> pull.

I added two commits:

https://github.com/linux-rdma/rdma-core/pull/30

This bit seems important:

--- a/redhat/rdma-core.spec
+++ b/redhat/rdma-core.spec
@@ -267,7 +267,7 @@ install -D -m0644 redhat/rdma.fixup-mtrr.awk %{buildroot}%{_libexecdir}/rdma-fix
 install -D -m0755 redhat/rdma.mlx4-setup.sh %{buildroot}%{_libexecdir}/mlx4-setup.sh
 
 # ibacm
-%{buildroot}/%{_bindir}/ib_acme -D . -O
+bin/ib_acme -D . -O
 install -D -m0644 ibacm_opts.cfg %{buildroot}%{_sysconfdir}/rdma/
 install -D -m0644 redhat/ibacm.service %{buildroot}%{_unitdir}/
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 11/12] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Keith Busch @ 2016-10-28 21:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <655c62ca-5da8-3c8f-9e56-016b7d518267@sandisk.com>

On Fri, Oct 28, 2016 at 11:51:35AM -0700, Bart Van Assche wrote:
> I think it is wrong that kicking the requeue list starts stopped queues
> because this makes it impossible to stop request processing without setting
> an additional flag next to BLK_MQ_S_STOPPED. Can you have a look at the
> attached two patches? These patches survive my dm-multipath and SCSI tests.

Hi Bart,

These look good to me, and succesful on my NVMe tests.

Thanks,
Keith


> From e93799f726485a3eeee98837c992c5c0068d7180 Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@sandisk.com>
> Date: Fri, 28 Oct 2016 10:48:58 -0700
> Subject: [PATCH 1/2] block: Avoid that requeueing starts stopped queues
> 
> Since blk_mq_requeue_work() starts stopped queues and since
> execution of this function can be scheduled after a queue has
> been stopped it is not possible to stop queues without using
> an additional state variable to track whether or not the queue
> has been stopped. Hence modify blk_mq_requeue_work() such that it
> does not start stopped queues. My conclusion after a review of
> the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list()
> callers is as follows:
> * In the dm driver starting and stopping queues should only happen
>   if __dm_suspend() or __dm_resume() is called and not if the
>   requeue list is processed.
> * In the SCSI core queue stopping and starting should only be
>   performed by the scsi_internal_device_block() and
>   scsi_internal_device_unblock() functions but not by any other
>   function.
> * In the NVMe core only the functions that call
>   blk_mq_start_stopped_hw_queues() explicitly should start stopped
>   queues.
> * A blk_mq_start_stopped_hwqueues() call must be added in the
>   xen-blkfront driver in its blkif_recover() function.
> ---
>  block/blk-mq.c               | 6 +-----
>  drivers/block/xen-blkfront.c | 1 +
>  drivers/md/dm-rq.c           | 7 +------
>  drivers/scsi/scsi_lib.c      | 1 -
>  4 files changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a49b8af..24dfd0d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -528,11 +528,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  		blk_mq_insert_request(rq, false, false, false);
>  	}
>  
> -	/*
> -	 * Use the start variant of queue running here, so that running
> -	 * the requeue work will kick stopped queues.
> -	 */
> -	blk_mq_start_hw_queues(q);
> +	blk_mq_run_hw_queues(q, false);
>  }
>  
>  void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 1ca702d..a3e1727 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info *info)
>  		BUG_ON(req->nr_phys_segments > segs);
>  		blk_mq_requeue_request(req, false);
>  	}
> +	blk_mq_start_stopped_hwqueues(info->rq);
>  	blk_mq_kick_requeue_list(info->rq);
>  
>  	while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 107ed19..b951ae83 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -326,12 +326,7 @@ static void dm_old_requeue_request(struct request *rq)
>  
>  static void __dm_mq_kick_requeue_list(struct request_queue *q, unsigned long msecs)
>  {
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(q->queue_lock, flags);
> -	if (!blk_mq_queue_stopped(q))
> -		blk_mq_delay_kick_requeue_list(q, msecs);
> -	spin_unlock_irqrestore(q->queue_lock, flags);
> +	blk_mq_delay_kick_requeue_list(q, msecs);
>  }
>  
>  void dm_mq_kick_requeue_list(struct mapped_device *md)
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 4cddaff..94f54ac 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1939,7 +1939,6 @@ static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>  out:
>  	switch (ret) {
>  	case BLK_MQ_RQ_QUEUE_BUSY:
> -		blk_mq_stop_hw_queue(hctx);
>  		if (atomic_read(&sdev->device_busy) == 0 &&
>  		    !scsi_device_blocked(sdev))
>  			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
> -- 
> 2.10.1
> 

> From 47eec3bdcf4b673e3ab606543cb3acdf7f4de593 Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@sandisk.com>
> Date: Fri, 28 Oct 2016 10:50:04 -0700
> Subject: [PATCH 2/2] blk-mq: Remove blk_mq_cancel_requeue_work()
> 
> Since blk_mq_requeue_work() no longer restarts stopped queues
> canceling requeue work is no longer needed to prevent that a
> stopped queue would be restarted. Hence remove this function.
> ---
>  block/blk-mq.c           | 6 ------
>  drivers/md/dm-rq.c       | 2 --
>  drivers/nvme/host/core.c | 1 -
>  include/linux/blk-mq.h   | 1 -
>  4 files changed, 10 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 24dfd0d..1aa79e5 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -557,12 +557,6 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
>  }
>  EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
>  
> -void blk_mq_cancel_requeue_work(struct request_queue *q)
> -{
> -	cancel_delayed_work_sync(&q->requeue_work);
> -}
> -EXPORT_SYMBOL_GPL(blk_mq_cancel_requeue_work);
> -
>  void blk_mq_kick_requeue_list(struct request_queue *q)
>  {
>  	kblockd_schedule_delayed_work(&q->requeue_work, 0);
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index b951ae83..7f426ab 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -102,8 +102,6 @@ static void dm_mq_stop_queue(struct request_queue *q)
>  	if (blk_mq_queue_stopped(q))
>  		return;
>  
> -	/* Avoid that requeuing could restart the queue. */
> -	blk_mq_cancel_requeue_work(q);
>  	blk_mq_stop_hw_queues(q);
>  	/* Wait until dm_mq_queue_rq() has finished. */
>  	blk_mq_quiesce_queue(q);
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d6ab9a0..a67e815 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2075,7 +2075,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>  		struct request_queue *q = ns->queue;
>  
> -		blk_mq_cancel_requeue_work(q);
>  		blk_mq_stop_hw_queues(q);
>  		blk_mq_quiesce_queue(q);
>  	}
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 76f6319..35a0af5 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -221,7 +221,6 @@ void __blk_mq_end_request(struct request *rq, int error);
>  void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
>  void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
>  				bool kick_requeue_list);
> -void blk_mq_cancel_requeue_work(struct request_queue *q);
>  void blk_mq_kick_requeue_list(struct request_queue *q);
>  void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
>  void blk_mq_abort_requeue_list(struct request_queue *q);
> -- 
> 2.10.1

^ permalink raw reply

* Re: [PATCH rdma-core v2 4/4] redhat/spec: build split rpm packages
From: Jarod Wilson @ 2016-10-28 21:55 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161028172503.GA28451-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Fri, Oct 28, 2016 at 11:25:03AM -0600, Jason Gunthorpe wrote:
> On Fri, Oct 28, 2016 at 01:11:47PM -0400, Jarod Wilson wrote:
> > On Thu, Oct 27, 2016 at 03:10:59PM -0600, Jason Gunthorpe wrote:
> > > On Thu, Oct 20, 2016 at 11:33:57AM -0400, Jarod Wilson wrote:
> > > > @@ -7,10 +7,11 @@ Summary: RDMA core userspace libraries and daemons
> > > >  #  providers/ipathverbs/ Dual licensed using a BSD license with an extra patent clause
> > > >  #  providers/rxe/ Incorporates code from ipathverbs and contains the patent clause
> > > >  #  providers/hfi1verbs Uses the 3 Clause BSD license
> > > > -License: (GPLv2 or BSD) and (GPLv2 or PathScale-BSD)
> > > > +License: GPLv2 or BSD
> > > 
> > > Is this Ok? The Fedora guidelines I read suggested the PathScale
> > > license would need to be assigned a short tag, and I'd be surprised if
> > > 'BSD' is the right tag due to the patent stuff..
> > 
> > Our standalone libipathverbs has just "GPLv2 or BSD",
> 
> I suspect that was a mistake, the difference in the pathscale license
> is subtle and several other people assumed it was the cannonical 'BSD'
> text...

I sent something off to the Fedora legal mailing list, we'll see what they
have to say. I didn't find anything interesting in the initial package
review, nobody questioned the license there, but that could easily have
been an oversight.

-- 
Jarod Wilson
jarod-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] build: Fix build script to use correct cmake cmd
From: Jason Gunthorpe @ 2016-10-28 22:09 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Dennis Dalessandro,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161025101200.GP25013-2ukJVAZIZ/Y@public.gmane.org>

On Tue, Oct 25, 2016 at 01:12:00PM +0300, Leon Romanovsky wrote:

> > Maybe this is a good time to ask if anyone is interested in the docker
> > stuff I have - eg should I make it pushable? It is easy to use, but
> > you need to have docker installed.
> 
> I would be happy to get it and be more confident in my local tests.

Okay, let me look at it some more, I think it could be streamlined a
bit.

> > The docker script is able to run almost-travis locally, as well as do
> > clean package builds for all distros.
> 
> It looks like it goes beyond rdma-core definition. Do we want to put it
> separately or anyway integrate into main library?

It is a 500 line script, I'd keep it internal. The latest version has
learned to use different spec files depend on the distro build too, so
there is a bunch of subtle integration.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device
From: Hefty, Sean @ 2016-10-28 22:53 UTC (permalink / raw)
  To: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Doug Ledford, Jason Gunthorpe, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon, Leon Romanovsky
In-Reply-To: <1477579398-6875-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

> The current code creates an IDR per type. Since types are currently
> common for all vendors and known in advance, this was good enough.
> However, the proposed ioctl based infrastructure allows each vendor
> to declare only some of the common types and declare its own specific
> types.
> 
> Thus, we decided to implement IDR to be per device and refactor it to
> use a new file.

I think this needs to be more abstract.  I would consider introducing the concept of an 'ioctl provider', with the idr per ioctl provider.  You could then make each ib_device an ioctl provider.  (Just embed the structure).  I believe this will be necessary to support the rdma_cm, ib_cm, as well as devices that export different sets of ioctls, where an ib_device isn't necessarily available.

Essentially, I would treat plugging into the uABI independent from plugging into the kernel verbs API.  Otherwise, I think we'll end up with multiple ioctl 'frameworks'.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [Bug 185551] New: rxe_rdma: RTNL: assertion failed at net/core/ethtool.c (550)
From: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r @ 2016-10-28 23:46 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

https://bugzilla.kernel.org/show_bug.cgi?id=185551

            Bug ID: 185551
           Summary: rxe_rdma: RTNL: assertion failed at net/core/ethtool.c
                    (550)
           Product: Drivers
           Version: 2.5
    Kernel Version: 4.8.5-1.el7.elrepo.x86_64
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Infiniband/RDMA
          Assignee: drivers_infiniband-rdma-ztI5WcYan/vQLgFONoPN62D2FQJk+8+b@public.gmane.org
          Reporter: v.badalyan-mR0gyJWGnJgvJsYlp49lxw@public.gmane.org
        Regression: No

#rxe_cfg start
#rxe_cfg add eno1.4004
#rxe_cfg 

[root@intel2-i5 ~]# rxe_cfg
  Name       Link  Driver  Speed  NMTU  IPv4_addr  RDEV  RMTU
  DevNet     yes   bridge
  eno1       yes   e1000e
  eno1.4003  yes   802.1Q
  eno1.4004  yes   802.1Q                          rxe0  4096  (5)
  enp2s1     yes   r8169
  ovirtmgmt  yes   bridge


[  174.931213] rxe: loaded
[  174.950852] RTNL: assertion failed at net/core/ethtool.c (550)
[  174.950882] CPU: 2 PID: 3158 Comm: sh Not tainted 4.8.5-1.el7.elrepo.x86_64
#1
[  174.950883] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  174.950884]  0000000000000286 000000007b249d8e ffff8802fbfe3a98
ffffffff81353eef
[  174.950886]  ffff88030c174000 ffff8802fbfe3b18 ffff8802fbfe3af8
ffffffff816109f7
[  174.950888]  ffffea000befc600 00000000024080c0 0000000000000001
ffffffff82164820
[  174.950889] Call Trace:
[  174.950894]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  174.950897]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  174.950900]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  174.950904]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  174.950906]  [<ffffffff8136023e>] ? vsnprintf+0x34e/0x4d0
[  174.950908]  [<ffffffffa09226bd>] rxe_port_immutable+0x2d/0x70 [rdma_rxe]
[  174.950919]  [<ffffffffa088426c>] ib_register_device+0x27c/0x500 [ib_core]
[  174.950921]  [<ffffffff811f4cee>] ? __kmalloc+0x1ee/0x200
[  174.950923]  [<ffffffff8137307d>] ? list_del+0xd/0x30
[  174.950925]  [<ffffffffa09236a1>] rxe_register_device+0x2a1/0x300 [rdma_rxe]
[  174.950927]  [<ffffffffa091b674>] rxe_add+0x544/0x5c0 [rdma_rxe]
[  174.950929]  [<ffffffffa09281a3>] rxe_net_add+0x43/0xa0 [rdma_rxe]
[  174.950931]  [<ffffffffa0928685>] rxe_param_set_add+0x65/0x148 [rdma_rxe]
[  174.950934]  [<ffffffff810a0678>] param_attr_store+0x68/0xd0
[  174.950935]  [<ffffffff8109fa0d>] module_attr_store+0x1d/0x30
[  174.950937]  [<ffffffff8129cbba>] sysfs_kf_write+0x3a/0x50
[  174.950937]  [<ffffffff8129c6e0>] kernfs_fop_write+0x120/0x1b0
[  174.950939]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  174.950941]  [<ffffffff810cebdf>] ? percpu_down_read+0x1f/0x50
[  174.950942]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  174.950944]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  174.950945]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  174.950947]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  174.950948]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  174.950967] RTNL: assertion failed at net/core/ethtool.c (550)
[  174.950985] CPU: 2 PID: 3158 Comm: sh Not tainted 4.8.5-1.el7.elrepo.x86_64
#1
[  174.950985] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  174.950986]  0000000000000286 000000007b249d8e ffff8802fbfe39d8
ffffffff81353eef
[  174.950987]  ffff88030c174000 ffff8802fbfe3a58 ffff8802fbfe3a38
ffffffff816109f7
[  174.950988]  ffff8802fbfe3a58 ffff8802fbfe3a48 ffff8802fbfe3a30
ffff8802fbfe3a98
[  174.950990] Call Trace:
[  174.950991]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  174.950993]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  174.950998]  [<ffffffffa08885d0>] ? _add_netdev_ips+0x2f0/0x380 [ib_core]
[  174.950999]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  174.951001]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  174.951003]  [<ffffffff8160ef61>] ? netdev_run_todo+0x61/0x320
[  174.951007]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  174.951008]  [<ffffffff811f462b>] ? kmem_cache_alloc_trace+0x14b/0x1b0
[  174.951012]  [<ffffffffa08863e7>] ib_cache_update+0xc7/0x350 [ib_core]
[  174.951016]  [<ffffffffa0884677>] ? ib_enum_roce_netdev+0xe7/0x100 [ib_core]
[  174.951020]  [<ffffffffa08870a1>] ib_cache_setup_one+0x271/0x400 [ib_core]
[  174.951024]  [<ffffffffa08842ce>] ib_register_device+0x2de/0x500 [ib_core]
[  174.951025]  [<ffffffff811f4cee>] ? __kmalloc+0x1ee/0x200
[  174.951026]  [<ffffffff8137307d>] ? list_del+0xd/0x30
[  174.951029]  [<ffffffffa09236a1>] rxe_register_device+0x2a1/0x300 [rdma_rxe]
[  174.951030]  [<ffffffffa091b674>] rxe_add+0x544/0x5c0 [rdma_rxe]
[  174.951032]  [<ffffffffa09281a3>] rxe_net_add+0x43/0xa0 [rdma_rxe]
[  174.951034]  [<ffffffffa0928685>] rxe_param_set_add+0x65/0x148 [rdma_rxe]
[  174.951036]  [<ffffffff810a0678>] param_attr_store+0x68/0xd0
[  174.951037]  [<ffffffff8109fa0d>] module_attr_store+0x1d/0x30
[  174.951038]  [<ffffffff8129cbba>] sysfs_kf_write+0x3a/0x50
[  174.951039]  [<ffffffff8129c6e0>] kernfs_fop_write+0x120/0x1b0
[  174.951041]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  174.951042]  [<ffffffff810cebdf>] ? percpu_down_read+0x1f/0x50
[  174.951043]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  174.951044]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  174.951045]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  174.951047]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  174.951048]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  174.951084] RTNL: assertion failed at net/core/ethtool.c (550)
[  174.951102] CPU: 2 PID: 3158 Comm: sh Not tainted 4.8.5-1.el7.elrepo.x86_64
#1
[  174.951102] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  174.951103]  0000000000000286 000000007b249d8e ffff8802fbfe39d8
ffffffff81353eef
[  174.951104]  ffff88030c174000 ffff8802fbfe3a58 ffff8802fbfe3a38
ffffffff816109f7
[  174.951105]  ffffffff81299ec6 0000000000000000 ffff8802fbfe3a10
ffffffff81299f20
[  174.951107] Call Trace:
[  174.951109]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  174.951110]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  174.951112]  [<ffffffff81299ec6>] ? kernfs_leftmost_descendant+0x36/0x50
[  174.951113]  [<ffffffff81299f20>] ? kernfs_next_descendant_post+0x40/0x50
[  174.951115]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  174.951117]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  174.951118]  [<ffffffff8129b191>] ? kernfs_create_dir_ns+0x51/0x80
[  174.951119]  [<ffffffff8129d5a0>] ? sysfs_create_dir_ns+0x40/0x90
[  174.951123]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  174.951125]  [<ffffffff811afb92>] ? kfree_const+0x22/0x30
[  174.951129]  [<ffffffffa0882d4e>] add_port+0x3e/0x480 [ib_core]
[  174.951133]  [<ffffffffa088326c>] ib_device_register_sysfs+0xdc/0x140
[ib_core]
[  174.951136]  [<ffffffffa0884360>] ib_register_device+0x370/0x500 [ib_core]
[  174.951137]  [<ffffffff811f4cee>] ? __kmalloc+0x1ee/0x200
[  174.951139]  [<ffffffff8137307d>] ? list_del+0xd/0x30
[  174.951141]  [<ffffffffa09236a1>] rxe_register_device+0x2a1/0x300 [rdma_rxe]
[  174.951142]  [<ffffffffa091b674>] rxe_add+0x544/0x5c0 [rdma_rxe]
[  174.951144]  [<ffffffffa09281a3>] rxe_net_add+0x43/0xa0 [rdma_rxe]
[  174.951146]  [<ffffffffa0928685>] rxe_param_set_add+0x65/0x148 [rdma_rxe]
[  174.951148]  [<ffffffff810a0678>] param_attr_store+0x68/0xd0
[  174.951149]  [<ffffffff8109fa0d>] module_attr_store+0x1d/0x30
[  174.951150]  [<ffffffff8129cbba>] sysfs_kf_write+0x3a/0x50
[  174.951151]  [<ffffffff8129c6e0>] kernfs_fop_write+0x120/0x1b0
[  174.951153]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  174.951154]  [<ffffffff810cebdf>] ? percpu_down_read+0x1f/0x50
[  174.951155]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  174.951156]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  174.951157]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  174.951159]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  174.951160]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  174.954198] rxe: set rxe0 active
[  174.954199] rxe: added rxe0 to eno1.4004
[  174.954384] RTNL: assertion failed at net/core/ethtool.c (550)
[  174.954408] CPU: 3 PID: 90 Comm: kworker/3:1 Not tainted
4.8.5-1.el7.elrepo.x86_64 #1
[  174.954409] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  174.954423] Workqueue: infiniband ib_cache_task [ib_core]
[  174.954425]  0000000000000286 00000000c968401e ffff88030dabfc58
ffffffff81353eef
[  174.954427]  ffff88030c174000 ffff88030dabfcd8 ffff88030dabfcb8
ffffffff816109f7
[  174.954428]  ffff8803110c0800 ffff880311663240 0000000000000000
ffff88030dabfd80
[  174.954430] Call Trace:
[  174.954434]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  174.954437]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  174.954439]  [<ffffffff810bf41e>] ? load_balance+0x19e/0x9b0
[  174.954441]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  174.954445]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  174.954448]  [<ffffffff810b5f2e>] ? account_entity_dequeue+0xae/0xd0
[  174.954449]  [<ffffffff810b942a>] ? dequeue_entity+0x19a/0x5d0
[  174.954453]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  174.954455]  [<ffffffff811f462b>] ? kmem_cache_alloc_trace+0x14b/0x1b0
[  174.954459]  [<ffffffffa08863e7>] ib_cache_update+0xc7/0x350 [ib_core]
[  174.954463]  [<ffffffffa088668a>] ib_cache_task+0x1a/0x30 [ib_core]
[  174.954465]  [<ffffffff8109ad82>] process_one_work+0x152/0x400
[  174.954467]  [<ffffffff8109b675>] worker_thread+0x125/0x4b0
[  174.954468]  [<ffffffff8109b550>] ? rescuer_thread+0x380/0x380
[  174.954470]  [<ffffffff810a1168>] kthread+0xd8/0xf0
[  174.954472]  [<ffffffff8173ad3f>] ret_from_fork+0x1f/0x40
[  174.954473]  [<ffffffff810a1090>] ? kthread_park+0x60/0x60
[  174.996381] Rounding down aligned max_sectors from 4294967295 to 4294967288
[  174.997259] RTNL: assertion failed at net/core/ethtool.c (550)
[  174.997284] CPU: 1 PID: 3212 Comm: modprobe Not tainted
4.8.5-1.el7.elrepo.x86_64 #1
[  174.997285] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  174.997286]  0000000000000286 0000000092f5ac1c ffff8802ed3af9f8
ffffffff81353eef
[  174.997288]  ffff88030c174000 ffff8802ed3afa78 ffff8802ed3afa58
ffffffff816109f7
[  174.997289]  ffff880313000d00 ffff8802ed3afad8 ffffffff811f41cf
ffff88031fa9c6c0
[  174.997291] Call Trace:
[  174.997296]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  174.997299]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  174.997301]  [<ffffffff811f41cf>] ? ___slab_alloc+0x34f/0x4c0
[  174.997303]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  174.997307]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  174.997316]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  174.997318]  [<ffffffffa092283a>] ? post_one_recv.isra.14+0xaa/0x110
[rdma_rxe]
[  174.997320]  [<ffffffffa09d6670>] srpt_refresh_port+0x80/0x1a0 [ib_srpt]
[  174.997322]  [<ffffffffa09229d2>] ? rxe_post_srq_recv+0x72/0xa0 [rdma_rxe]
[  174.997324]  [<ffffffffa09d9196>] srpt_add_one+0x276/0x440 [ib_srpt]
[  174.997326]  [<ffffffffa09d6150>] ?
srpt_tpg_attrib_srp_max_rdma_size_show+0x30/0x30 [ib_srpt]
[  174.997330]  [<ffffffffa0883c3f>] ib_register_client+0x5f/0xc0 [ib_core]
[  174.997331]  [<ffffffffa08ec000>] ? 0xffffffffa08ec000
[  174.997333]  [<ffffffffa08ec07c>] srpt_init_module+0x7c/0x1000 [ib_srpt]
[  174.997334]  [<ffffffffa08ec000>] ? 0xffffffffa08ec000
[  174.997336]  [<ffffffff81002190>] do_one_initcall+0x50/0x190
[  174.997337]  [<ffffffff811f462b>] ? kmem_cache_alloc_trace+0x14b/0x1b0
[  174.997338]  [<ffffffff8118cb8c>] do_init_module+0x60/0x1f1
[  174.997340]  [<ffffffff8110ea81>] load_module+0x1f31/0x2700
[  174.997342]  [<ffffffff8110b980>] ? __symbol_put+0x60/0x60
[  174.997344]  [<ffffffff812fc18d>] ? ima_post_read_file+0x3d/0x80
[  174.997346]  [<ffffffff812ca41b>] ? security_kernel_post_read_file+0x6b/0x80
[  174.997347]  [<ffffffff8110f476>] SYSC_finit_module+0xa6/0xf0
[  174.997349]  [<ffffffff8110f4de>] SyS_finit_module+0xe/0x10
[  174.997350]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  174.997352]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  175.001504] iscsi: registered transport (iser)
[  175.040113] RPC: Registered rdma transport module.
[  175.040114] RPC: Registered rdma backchannel transport module.
[  175.064344] RTNL: assertion failed at net/core/ethtool.c (550)
[  175.064369] CPU: 0 PID: 3187 Comm: ibv_devinfo Not tainted
4.8.5-1.el7.elrepo.x86_64 #1
[  175.064370] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  175.064371]  0000000000000286 00000000e5af710d ffff8802fbf97b98
ffffffff81353eef
[  175.064373]  ffff88030c174000 ffff8802fbf97c18 ffff8802fbf97bf8
ffffffff816109f7
[  175.064375]  ffffffff81235e1b ffff8802f2f3acc0 ffff8802f2f7af00
ffff8802fbf97c5c
[  175.064376] Call Trace:
[  175.064381]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  175.064384]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  175.064387]  [<ffffffff81235e1b>] ? __d_alloc+0x12b/0x1d0
[  175.064390]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  175.064393]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  175.064396]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  175.064397]  [<ffffffff81232f5c>] ? dentry_free+0x3c/0x90
[  175.064399]  [<ffffffff81234150>] ? __dentry_kill+0x100/0x150
[  175.064409]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  175.064410]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  175.064411]  [<ffffffff8121e2f4>] ? put_filp+0x44/0x50
[  175.064413]  [<ffffffffa08c7e5f>] ib_uverbs_query_port+0x5f/0x150
[ib_uverbs]
[  175.064415]  [<ffffffffa08c43dc>] ib_uverbs_write+0x18c/0x3f0 [ib_uverbs]
[  175.064416]  [<ffffffff8122cc15>] ? do_filp_open+0xa5/0x100
[  175.064418]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  175.064419]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  175.064421]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  175.064423]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  175.064424]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  175.064426]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  175.103154] RTNL: assertion failed at net/core/ethtool.c (550)
[  175.103181] CPU: 1 PID: 3289 Comm: ibv_devinfo Not tainted
4.8.5-1.el7.elrepo.x86_64 #1
[  175.103182] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  175.103183]  0000000000000286 0000000057c77a87 ffff8802fbf97b98
ffffffff81353eef
[  175.103185]  ffff88030c174000 ffff8802fbf97c18 ffff8802fbf97bf8
ffffffff816109f7
[  175.103186]  ffffffff81235e1b ffff8802f2f3acc0 ffff8802f2f19f00
ffff8802fbf97c5c
[  175.103188] Call Trace:
[  175.103193]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  175.103196]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  175.103199]  [<ffffffff81235e1b>] ? __d_alloc+0x12b/0x1d0
[  175.103202]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  175.103206]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  175.103209]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  175.103210]  [<ffffffff81232f5c>] ? dentry_free+0x3c/0x90
[  175.103211]  [<ffffffff81234150>] ? __dentry_kill+0x100/0x150
[  175.103222]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  175.103223]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  175.103224]  [<ffffffff8121e2f4>] ? put_filp+0x44/0x50
[  175.103226]  [<ffffffffa08c7e5f>] ib_uverbs_query_port+0x5f/0x150
[ib_uverbs]
[  175.103228]  [<ffffffffa08c43dc>] ib_uverbs_write+0x18c/0x3f0 [ib_uverbs]
[  175.103230]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  175.103231]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  175.103233]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  175.103235]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  175.103236]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  175.103238]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  204.011985] RTNL: assertion failed at net/core/ethtool.c (550)
[  204.012012] CPU: 0 PID: 3354 Comm: ibv_devinfo Not tainted
4.8.5-1.el7.elrepo.x86_64 #1
[  204.012012] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  204.012013]  0000000000000286 0000000069598e2b ffff8802ed337b98
ffffffff81353eef
[  204.012015]  ffff88030c174000 ffff8802ed337c18 ffff8802ed337bf8
ffffffff816109f7
[  204.012017]  ffffffff81235e1b ffff8802f2f3acc0 ffff8802f2fdf0c0
ffff8802ed337c5c
[  204.012018] Call Trace:
[  204.012023]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  204.012026]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  204.012029]  [<ffffffff81235e1b>] ? __d_alloc+0x12b/0x1d0
[  204.012032]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  204.012036]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  204.012039]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  204.012040]  [<ffffffff81232f5c>] ? dentry_free+0x3c/0x90
[  204.012041]  [<ffffffff81234150>] ? __dentry_kill+0x100/0x150
[  204.012052]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  204.012053]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  204.012054]  [<ffffffff8121e2f4>] ? put_filp+0x44/0x50
[  204.012057]  [<ffffffffa08c7e5f>] ib_uverbs_query_port+0x5f/0x150
[ib_uverbs]
[  204.012058]  [<ffffffffa08c43dc>] ib_uverbs_write+0x18c/0x3f0 [ib_uverbs]
[  204.012060]  [<ffffffff8122cc15>] ? do_filp_open+0xa5/0x100
[  204.012061]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  204.012063]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  204.012065]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  204.012066]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  204.012067]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  204.012069]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v5 0/14] Fix race conditions related to stopping block layer queues
From: Bart Van Assche @ 2016-10-29  0:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org

Hello Jens,

Multiple block drivers need the functionality to stop a request queue 
and to wait until all ongoing request_fn() / queue_rq() calls have 
finished without waiting until all outstanding requests have finished. 
Hence this patch series that introduces the blk_mq_quiesce_queue() 
function. The dm-mq, SRP and NVMe patches in this patch series are three 
examples of where these functions are useful. These patches have been 
tested on top of kernel v4.9-rc2. The following tests have been run to 
verify this patch series:
- Mike's mptest suite that stress-tests dm-multipath.
- My own srp-test suite that stress-tests SRP on top of dm-multipath.
- fio on top of the NVMeOF host driver that was connected to the NVMeOF
   target driver on the same host.
- Laurence verified the previous version (v3) of this patch series by
   running it through the Red Hat SRP and NVMe test suites.

The changes compared to the third version of this patch series are:
- Added a blk_mq_stop_hw_queues() call in blk_mq_quiesce_queue() as
   requested by Ming Lei.
- Modified scsi_unblock_target() such that it waits until
   .queuecommand() finished. Unexported scsi_wait_for_queuecommand().
- Reordered the two NVMe patches.
- Added a patch that avoids that blk_mq_requeue_work() restarts stopped
   queues.
- Added a patch that removes blk_mq_cancel_requeue_work().

Changes between v4 and v3:
- Left out the dm changes from the patch that introduces
   blk_mq_hctx_stopped() because a later patch deletes the changed code
   from the dm core.
- Moved the blk_mq_hctx_stopped() declaration from a public to a
   private block layer header file.
- Added a new patch that moves more code into
   blk_mq_direct_issue_request(). This patch avoids that a new function
   has to be introduced to avoid code duplication.
- Explained the implemented algorithm in the patch that introduces
   blk_mq_quiesce_queue() in the description of the patch that
   introduces this function.
- Added "select SRCU" to the patch that introduces
   blk_mq_quiesce_queue() to avoid build failures.
- Documented the shost argument in the scsi_wait_for_queuecommand()
   kerneldoc header.
- Fixed an unintended behavior change in the last patch of this series.

Changes between v3 and v2:
- Changed the order of the patches in this patch series.
- Added several new patches: a patch that avoids that .queue_rq() gets
   invoked from the direct submission path if a queue has been stopped
   and also a patch that introduces the helper function
   blk_mq_hctx_stopped().
- blk_mq_quiesce_queue() has been reworked (thanks to Ming Lin and Sagi
   for their feedback).
- A bool 'kick' argument has been added to blk_mq_requeue_request().
- As proposed by Christoph, the code that waits for queuecommand() has
   been moved from the SRP transport driver to the SCSI core.

Changes between v2 and v1:
- Dropped the non-blk-mq changes from this patch series.
- Added support for harware queues with BLK_MQ_F_BLOCKING set.
- Added a call stack to the description of the dm race fix patch.
- Dropped the non-scsi-mq changes from the SRP patch.
- Added a patch that introduces blk_mq_queue_stopped() in the dm driver.

The individual patches in this series are:

0001-blk-mq-Do-not-invoke-.queue_rq-for-a-stopped-queue.patch
0002-blk-mq-Introduce-blk_mq_hctx_stopped.patch
0003-blk-mq-Introduce-blk_mq_queue_stopped.patch
0004-blk-mq-Move-more-code-into-blk_mq_direct_issue_reque.patch
0005-blk-mq-Avoid-that-requeueing-starts-stopped-queues.patch
0006-blk-mq-Remove-blk_mq_cancel_requeue_work.patch
0007-blk-mq-Introduce-blk_mq_quiesce_queue.patch
0008-blk-mq-Add-a-kick_requeue_list-argument-to-blk_mq_re.patch
0009-dm-Use-BLK_MQ_S_STOPPED-instead-of-QUEUE_FLAG_STOPPE.patch
0010-dm-Fix-a-race-condition-related-to-stopping-and-star.patch
0011-SRP-transport-Move-queuecommand-wait-code-to-SCSI-co.patch
0012-SRP-transport-scsi-mq-Wait-for-.queue_rq-if-necessar.patch
0013-nvme-Fix-a-race-condition-related-to-stopping-queues.patch
0014-nvme-Use-BLK_MQ_S_STOPPED-instead-of-QUEUE_FLAG_STOP.patch

Thanks,

Bart.

^ permalink raw reply

* [PATCH v5 01/14] blk-mq: Do not invoke .queue_rq() for a stopped queue
From: Bart Van Assche @ 2016-10-29  0:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

The meaning of the BLK_MQ_S_STOPPED flag is "do not call
.queue_rq()". Hence modify blk_mq_make_request() such that requests
are queued instead of issued if a queue has been stopped.

Reported-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org>
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f3d27a6..ad459e4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1332,9 +1332,9 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		if (!blk_mq_direct_issue_request(old_rq, &cookie))
-			goto done;
-		blk_mq_insert_request(old_rq, false, true, true);
+		if (test_bit(BLK_MQ_S_STOPPED, &data.hctx->state) ||
+		    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
+			blk_mq_insert_request(old_rq, false, true, true);
 		goto done;
 	}
 
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 02/14] blk-mq: Introduce blk_mq_hctx_stopped()
From: Bart Van Assche @ 2016-10-29  0:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Multiple functions test the BLK_MQ_S_STOPPED bit so introduce
a helper function that performs this test.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 12 ++++++------
 block/blk-mq.h |  5 +++++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ad459e4..bc1f462 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -787,7 +787,7 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	struct list_head *dptr;
 	int queued;
 
-	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
 
 	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
@@ -912,8 +912,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 {
-	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state) ||
-	    !blk_mq_hw_queue_mapped(hctx)))
+	if (unlikely(blk_mq_hctx_stopped(hctx) ||
+		     !blk_mq_hw_queue_mapped(hctx)))
 		return;
 
 	if (!async && !(hctx->flags & BLK_MQ_F_BLOCKING)) {
@@ -938,7 +938,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if ((!blk_mq_hctx_has_pending(hctx) &&
 		    list_empty_careful(&hctx->dispatch)) ||
-		    test_bit(BLK_MQ_S_STOPPED, &hctx->state))
+		    blk_mq_hctx_stopped(hctx))
 			continue;
 
 		blk_mq_run_hw_queue(hctx, async);
@@ -988,7 +988,7 @@ void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async)
 	int i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
-		if (!test_bit(BLK_MQ_S_STOPPED, &hctx->state))
+		if (!blk_mq_hctx_stopped(hctx))
 			continue;
 
 		clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
@@ -1332,7 +1332,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		if (test_bit(BLK_MQ_S_STOPPED, &data.hctx->state) ||
+		if (blk_mq_hctx_stopped(data.hctx) ||
 		    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
 			blk_mq_insert_request(old_rq, false, true, true);
 		goto done;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index e5d2524..ac772da 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -100,6 +100,11 @@ static inline void blk_mq_set_alloc_data(struct blk_mq_alloc_data *data,
 	data->hctx = hctx;
 }
 
+static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
+{
+	return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
+}
+
 static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
 {
 	return hctx->nr_ctx && hctx->tags;
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 03/14] blk-mq: Introduce blk_mq_queue_stopped()
From: Bart Van Assche @ 2016-10-29  0:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

The function blk_queue_stopped() allows to test whether or not a
traditional request queue has been stopped. Introduce a helper
function that allows block drivers to query easily whether or not
one or more hardware contexts of a blk-mq queue have been stopped.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c         | 20 ++++++++++++++++++++
 include/linux/blk-mq.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bc1f462..283e0eb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -946,6 +946,26 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 }
 EXPORT_SYMBOL(blk_mq_run_hw_queues);
 
+/**
+ * blk_mq_queue_stopped() - check whether one or more hctxs have been stopped
+ * @q: request queue.
+ *
+ * The caller is responsible for serializing this function against
+ * blk_mq_{start,stop}_hw_queue().
+ */
+bool blk_mq_queue_stopped(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	int i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		if (blk_mq_hctx_stopped(hctx))
+			return true;
+
+	return false;
+}
+EXPORT_SYMBOL(blk_mq_queue_stopped);
+
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	cancel_work(&hctx->run_work);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 535ab2e..aa93000 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -223,6 +223,7 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs
 void blk_mq_abort_requeue_list(struct request_queue *q);
 void blk_mq_complete_request(struct request *rq, int error);
 
+bool blk_mq_queue_stopped(struct request_queue *q);
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
 void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx);
 void blk_mq_stop_hw_queues(struct request_queue *q);
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 04/14] blk-mq: Move more code into blk_mq_direct_issue_request()
From: Bart Van Assche @ 2016-10-29  0:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Move the "hctx stopped" test and the insert request calls into
blk_mq_direct_issue_request(). Rename that function into
blk_mq_try_issue_directly() to reflect its new semantics. Pass
the hctx pointer to that function instead of looking it up a
second time. These changes avoid that code has to be duplicated
in the next patch.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 283e0eb..c8ae970 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1243,11 +1243,11 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 	return rq;
 }
 
-static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
+static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
+				      struct request *rq, blk_qc_t *cookie)
 {
 	int ret;
 	struct request_queue *q = rq->q;
-	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
 	struct blk_mq_queue_data bd = {
 		.rq = rq,
 		.list = NULL,
@@ -1255,6 +1255,9 @@ static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
 	};
 	blk_qc_t new_cookie = blk_tag_to_qc_t(rq->tag, hctx->queue_num);
 
+	if (blk_mq_hctx_stopped(hctx))
+		goto insert;
+
 	/*
 	 * For OK queue, we are done. For error, kill it. Any other
 	 * error (busy), just add it to our list as we previously
@@ -1263,7 +1266,7 @@ static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
 	ret = q->mq_ops->queue_rq(hctx, &bd);
 	if (ret == BLK_MQ_RQ_QUEUE_OK) {
 		*cookie = new_cookie;
-		return 0;
+		return;
 	}
 
 	__blk_mq_requeue_request(rq);
@@ -1272,10 +1275,11 @@ static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
 		*cookie = BLK_QC_T_NONE;
 		rq->errors = -EIO;
 		blk_mq_end_request(rq, rq->errors);
-		return 0;
+		return;
 	}
 
-	return -1;
+insert:
+	blk_mq_insert_request(rq, false, true, true);
 }
 
 /*
@@ -1352,9 +1356,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		if (blk_mq_hctx_stopped(data.hctx) ||
-		    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
-			blk_mq_insert_request(old_rq, false, true, true);
+		blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
 		goto done;
 	}
 
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 05/14] blk-mq: Avoid that requeueing starts stopped queues
From: Bart Van Assche @ 2016-10-29  0:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Since blk_mq_requeue_work() starts stopped queues and since
execution of this function can be scheduled after a queue has
been stopped it is not possible to stop queues without using
an additional state variable to track whether or not the queue
has been stopped. Hence modify blk_mq_requeue_work() such that it
does not start stopped queues. My conclusion after a review of
the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list()
callers is as follows:
* In the dm driver starting and stopping queues should only happen
  if __dm_suspend() or __dm_resume() is called and not if the
  requeue list is processed.
* In the SCSI core queue stopping and starting should only be
  performed by the scsi_internal_device_block() and
  scsi_internal_device_unblock() functions but not by any other
  function. Although the blk_mq_stop_hw_queue() call in
  scsi_queue_rq() may help to reduce CPU load if a LLD queue is
  full, figuring out whether or not a queue should be restarted
  when requeueing a command would require to introduce additional
  locking in scsi_mq_requeue_cmd() to avoid a race with
  scsi_internal_device_block(). Avoid this complexity by removing
  the blk_mq_stop_hw_queue() call from scsi_queue_rq().
* In the NVMe core only the functions that call
  blk_mq_start_stopped_hw_queues() explicitly should start stopped
  queues.
* A blk_mq_start_stopped_hwqueues() call must be added in the
  xen-blkfront driver in its blkif_recover() function.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Roger Pau Monné <roger.pau-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
Cc: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: James Bottomley <jejb-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 block/blk-mq.c               | 6 +-----
 drivers/block/xen-blkfront.c | 1 +
 drivers/md/dm-rq.c           | 7 +------
 drivers/scsi/scsi_lib.c      | 1 -
 4 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c8ae970..fe367b5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -503,11 +503,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		blk_mq_insert_request(rq, false, false, false);
 	}
 
-	/*
-	 * Use the start variant of queue running here, so that running
-	 * the requeue work will kick stopped queues.
-	 */
-	blk_mq_start_hw_queues(q);
+	blk_mq_run_hw_queues(q, false);
 }
 
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9908597..60fff99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info *info)
 		BUG_ON(req->nr_phys_segments > segs);
 		blk_mq_requeue_request(req);
 	}
+	blk_mq_start_stopped_hwqueues(info->rq);
 	blk_mq_kick_requeue_list(info->rq);
 
 	while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1d0d2ad..1794de5 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -338,12 +338,7 @@ static void dm_old_requeue_request(struct request *rq)
 
 static void __dm_mq_kick_requeue_list(struct request_queue *q, unsigned long msecs)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (!blk_queue_stopped(q))
-		blk_mq_delay_kick_requeue_list(q, msecs);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_mq_delay_kick_requeue_list(q, msecs);
 }
 
 void dm_mq_kick_requeue_list(struct mapped_device *md)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2cca9cf..126a784 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1941,7 +1941,6 @@ static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 out:
 	switch (ret) {
 	case BLK_MQ_RQ_QUEUE_BUSY:
-		blk_mq_stop_hw_queue(hctx);
 		if (atomic_read(&sdev->device_busy) == 0 &&
 		    !scsi_device_blocked(sdev))
 			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 06/14] blk-mq: Remove blk_mq_cancel_requeue_work()
From: Bart Van Assche @ 2016-10-29  0:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Since blk_mq_requeue_work() no longer restarts stopped queues
canceling requeue work is no longer needed to prevent that a
stopped queue would be restarted. Hence remove this function.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Keith Busch <keith.busch-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
---
 block/blk-mq.c           | 6 ------
 drivers/md/dm-rq.c       | 2 --
 drivers/nvme/host/core.c | 1 -
 include/linux/blk-mq.h   | 1 -
 4 files changed, 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index fe367b5..534128a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -528,12 +528,6 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
 }
 EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
 
-void blk_mq_cancel_requeue_work(struct request_queue *q)
-{
-	cancel_delayed_work_sync(&q->requeue_work);
-}
-EXPORT_SYMBOL_GPL(blk_mq_cancel_requeue_work);
-
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
 	kblockd_schedule_delayed_work(&q->requeue_work, 0);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1794de5..2b82496 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -116,8 +116,6 @@ static void dm_mq_stop_queue(struct request_queue *q)
 	queue_flag_set(QUEUE_FLAG_STOPPED, q);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
-	/* Avoid that requeuing could restart the queue. */
-	blk_mq_cancel_requeue_work(q);
 	blk_mq_stop_hw_queues(q);
 }
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 79e679d..ab5f59e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2083,7 +2083,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
 		spin_unlock_irq(ns->queue->queue_lock);
 
-		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index aa93000..a85a20f 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -217,7 +217,6 @@ void __blk_mq_end_request(struct request *rq, int error);
 
 void blk_mq_requeue_request(struct request *rq);
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
-void blk_mq_cancel_requeue_work(struct request_queue *q);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
 void blk_mq_abort_requeue_list(struct request_queue *q);
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 07/14] blk-mq: Introduce blk_mq_quiesce_queue()
From: Bart Van Assche @ 2016-10-29  0:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (this means invocation of request.end_io()).
The algorithm used by blk_mq_quiesce_queue() is as follows:
* Hold either an RCU read lock or an SRCU read lock around
  .queue_rq() calls. The former is used if .queue_rq() does not
  block and the latter if .queue_rq() may block.
* blk_mq_quiesce_queue() first calls blk_mq_stop_hw_queues()
  followed by synchronize_srcu() or synchronize_rcu(). The latter
  call waits for .queue_rq() invocations that started before
  blk_mq_quiesce_queue() was called.
* The blk_mq_hctx_stopped() calls that control whether or not
  .queue_rq() will be called are called with the (S)RCU read lock
  held. This is necessary to avoid race conditions against
  blk_mq_quiesce_queue().

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
---
 block/Kconfig          |  1 +
 block/blk-mq.c         | 71 +++++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/blk-mq.h |  3 +++
 include/linux/blkdev.h |  1 +
 4 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..0562ef9 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -5,6 +5,7 @@ menuconfig BLOCK
        bool "Enable the block layer" if EXPERT
        default y
        select SBITMAP
+       select SRCU
        help
 	 Provide block layer support for the kernel.
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 534128a..96015a9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -115,6 +115,33 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * @q: request queue.
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Additionally, it is not prevented that
+ * new queue_rq() calls occur unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+	bool rcu = false;
+
+	blk_mq_stop_hw_queues(q);
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (hctx->flags & BLK_MQ_F_BLOCKING)
+			synchronize_srcu(&hctx->queue_rq_srcu);
+		else
+			rcu = true;
+	}
+	if (rcu)
+		synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
@@ -768,7 +795,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
  * of IO. In particular, we'd like FIFO behaviour on handling existing
  * items on the hctx->dispatch list. Ignore that for now.
  */
-static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+static void blk_mq_process_rq_list(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct request *rq;
@@ -780,9 +807,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
 
-	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
-		cpu_online(hctx->next_cpu));
-
 	hctx->run++;
 
 	/*
@@ -873,6 +897,24 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	}
 }
 
+static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+{
+	int srcu_idx;
+
+	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
+		cpu_online(hctx->next_cpu));
+
+	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
+		rcu_read_lock();
+		blk_mq_process_rq_list(hctx);
+		rcu_read_unlock();
+	} else {
+		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
+		blk_mq_process_rq_list(hctx);
+		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
+	}
+}
+
 /*
  * It'd be great if the workqueue API had a way to pass
  * in a mask and had some smarts for more clever placement.
@@ -1283,7 +1325,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
 	struct blk_map_ctx data;
 	struct request *rq;
-	unsigned int request_count = 0;
+	unsigned int request_count = 0, srcu_idx;
 	struct blk_plug *plug;
 	struct request *same_queue_rq = NULL;
 	blk_qc_t cookie;
@@ -1326,7 +1368,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_bio_to_request(rq, bio);
 
 		/*
-		 * We do limited pluging. If the bio can be merged, do that.
+		 * We do limited plugging. If the bio can be merged, do that.
 		 * Otherwise the existing request in the plug list will be
 		 * issued. So the plug list will have one request at most
 		 */
@@ -1346,7 +1388,16 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+
+		if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
+			rcu_read_lock();
+			blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+			rcu_read_unlock();
+		} else {
+			srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
+			blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+			srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
+		}
 		goto done;
 	}
 
@@ -1625,6 +1676,9 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
 
+	if (hctx->flags & BLK_MQ_F_BLOCKING)
+		cleanup_srcu_struct(&hctx->queue_rq_srcu);
+
 	blk_mq_remove_cpuhp(hctx);
 	blk_free_flush_queue(hctx->fq);
 	sbitmap_free(&hctx->ctx_map);
@@ -1705,6 +1759,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
 				   flush_start_tag + hctx_idx, node))
 		goto free_fq;
 
+	if (hctx->flags & BLK_MQ_F_BLOCKING)
+		init_srcu_struct(&hctx->queue_rq_srcu);
+
 	return 0;
 
  free_fq:
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a85a20f..ed20ac7 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -3,6 +3,7 @@
 
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
+#include <linux/srcu.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -35,6 +36,8 @@ struct blk_mq_hw_ctx {
 
 	struct blk_mq_tags	*tags;
 
+	struct srcu_struct	queue_rq_srcu;
+
 	unsigned long		queued;
 	unsigned long		run;
 #define BLK_MQ_MAX_DISPATCH_ORDER	7
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..8259d87 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -824,6 +824,7 @@ extern void __blk_run_queue(struct request_queue *q);
 extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
+extern void blk_mq_quiesce_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *, unsigned long,
 			   gfp_t);
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 08/14] blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()
From: Bart Van Assche @ 2016-10-29  0:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
are followed by kicking the requeue list. Hence add an argument to
these two functions that allows to kick the requeue list. This was
proposed by Christoph Hellwig.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
---
 block/blk-flush.c            |  5 +----
 block/blk-mq.c               | 10 +++++++---
 drivers/block/xen-blkfront.c |  2 +-
 drivers/md/dm-rq.c           |  2 +-
 drivers/nvme/host/core.c     |  2 +-
 drivers/scsi/scsi_lib.c      |  4 +---
 include/linux/blk-mq.h       |  5 +++--
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 3c882cb..7e9950b 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -134,10 +134,7 @@ static void blk_flush_restore_request(struct request *rq)
 static bool blk_flush_queue_rq(struct request *rq, bool add_front)
 {
 	if (rq->q->mq_ops) {
-		struct request_queue *q = rq->q;
-
-		blk_mq_add_to_requeue_list(rq, add_front);
-		blk_mq_kick_requeue_list(q);
+		blk_mq_add_to_requeue_list(rq, add_front, true);
 		return false;
 	} else {
 		if (add_front)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 96015a9..b7753ae 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -494,12 +494,12 @@ static void __blk_mq_requeue_request(struct request *rq)
 	}
 }
 
-void blk_mq_requeue_request(struct request *rq)
+void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
 	__blk_mq_requeue_request(rq);
 
 	BUG_ON(blk_queued_rq(rq));
-	blk_mq_add_to_requeue_list(rq, true);
+	blk_mq_add_to_requeue_list(rq, true, kick_requeue_list);
 }
 EXPORT_SYMBOL(blk_mq_requeue_request);
 
@@ -533,7 +533,8 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	blk_mq_run_hw_queues(q, false);
 }
 
-void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
+void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
+				bool kick_requeue_list)
 {
 	struct request_queue *q = rq->q;
 	unsigned long flags;
@@ -552,6 +553,9 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
 		list_add_tail(&rq->queuelist, &q->requeue_list);
 	}
 	spin_unlock_irqrestore(&q->requeue_lock, flags);
+
+	if (kick_requeue_list)
+		blk_mq_kick_requeue_list(q);
 }
 EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
 
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 60fff99..a3e1727 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2043,7 +2043,7 @@ static int blkif_recover(struct blkfront_info *info)
 		/* Requeue pending requests (flush or discard) */
 		list_del_init(&req->queuelist);
 		BUG_ON(req->nr_phys_segments > segs);
-		blk_mq_requeue_request(req);
+		blk_mq_requeue_request(req, false);
 	}
 	blk_mq_start_stopped_hwqueues(info->rq);
 	blk_mq_kick_requeue_list(info->rq);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 2b82496..4b62e74 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -347,7 +347,7 @@ EXPORT_SYMBOL(dm_mq_kick_requeue_list);
 
 static void dm_mq_delay_requeue_request(struct request *rq, unsigned long msecs)
 {
-	blk_mq_requeue_request(rq);
+	blk_mq_requeue_request(rq, false);
 	__dm_mq_kick_requeue_list(rq->q, msecs);
 }
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ab5f59e..8403996 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -203,7 +203,7 @@ void nvme_requeue_req(struct request *req)
 {
 	unsigned long flags;
 
-	blk_mq_requeue_request(req);
+	blk_mq_requeue_request(req, false);
 	spin_lock_irqsave(req->q->queue_lock, flags);
 	if (!blk_queue_stopped(req->q))
 		blk_mq_kick_requeue_list(req->q);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 126a784..b4f682c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -86,10 +86,8 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd)
 {
 	struct scsi_device *sdev = cmd->device;
-	struct request_queue *q = cmd->request->q;
 
-	blk_mq_requeue_request(cmd->request);
-	blk_mq_kick_requeue_list(q);
+	blk_mq_requeue_request(cmd->request, true);
 	put_device(&sdev->sdev_gendev);
 }
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ed20ac7..35a0af5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -218,8 +218,9 @@ void blk_mq_start_request(struct request *rq);
 void blk_mq_end_request(struct request *rq, int error);
 void __blk_mq_end_request(struct request *rq, int error);
 
-void blk_mq_requeue_request(struct request *rq);
-void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
+void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
+void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
+				bool kick_requeue_list);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
 void blk_mq_abort_requeue_list(struct request_queue *q);
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 09/14] dm: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-29  0:22 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Instead of manipulating both QUEUE_FLAG_STOPPED and BLK_MQ_S_STOPPED
in the dm start and stop queue functions, only manipulate the latter
flag. Change blk_queue_stopped() tests into blk_mq_queue_stopped().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/md/dm-rq.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 4b62e74..0103031 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -75,12 +75,6 @@ static void dm_old_start_queue(struct request_queue *q)
 
 static void dm_mq_start_queue(struct request_queue *q)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	queue_flag_clear(QUEUE_FLAG_STOPPED, q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
-
 	blk_mq_start_stopped_hw_queues(q, true);
 	blk_mq_kick_requeue_list(q);
 }
@@ -105,16 +99,8 @@ static void dm_old_stop_queue(struct request_queue *q)
 
 static void dm_mq_stop_queue(struct request_queue *q)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (blk_queue_stopped(q)) {
-		spin_unlock_irqrestore(q->queue_lock, flags);
+	if (blk_mq_queue_stopped(q))
 		return;
-	}
-
-	queue_flag_set(QUEUE_FLAG_STOPPED, q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	blk_mq_stop_hw_queues(q);
 }
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 10/14] dm: Fix a race condition related to stopping and starting queues
From: Bart Van Assche @ 2016-10-29  0:22 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Ensure that all ongoing dm_mq_queue_rq() and dm_mq_requeue_request()
calls have stopped before setting the "queue stopped" flag. This
allows to remove the "queue stopped" test from dm_mq_queue_rq() and
dm_mq_requeue_request(). This patch fixes a race condition because
dm_mq_queue_rq() is called without holding the queue lock and hence
BLK_MQ_S_STOPPED can be set at any time while dm_mq_queue_rq() is
in progress. This patch prevents that the following hang occurs
sporadically when using dm-mq:

INFO: task systemd-udevd:10111 blocked for more than 480 seconds.
Call Trace:
 [<ffffffff8161f397>] schedule+0x37/0x90
 [<ffffffff816239ef>] schedule_timeout+0x27f/0x470
 [<ffffffff8161e76f>] io_schedule_timeout+0x9f/0x110
 [<ffffffff8161fb36>] bit_wait_io+0x16/0x60
 [<ffffffff8161f929>] __wait_on_bit_lock+0x49/0xa0
 [<ffffffff8114fe69>] __lock_page+0xb9/0xc0
 [<ffffffff81165d90>] truncate_inode_pages_range+0x3e0/0x760
 [<ffffffff81166120>] truncate_inode_pages+0x10/0x20
 [<ffffffff81212a20>] kill_bdev+0x30/0x40
 [<ffffffff81213d41>] __blkdev_put+0x71/0x360
 [<ffffffff81214079>] blkdev_put+0x49/0x170
 [<ffffffff812141c0>] blkdev_close+0x20/0x30
 [<ffffffff811d48e8>] __fput+0xe8/0x1f0
 [<ffffffff811d4a29>] ____fput+0x9/0x10
 [<ffffffff810842d3>] task_work_run+0x83/0xb0
 [<ffffffff8106606e>] do_exit+0x3ee/0xc40
 [<ffffffff8106694b>] do_group_exit+0x4b/0xc0
 [<ffffffff81073d9a>] get_signal+0x2ca/0x940
 [<ffffffff8101bf43>] do_signal+0x23/0x660
 [<ffffffff810022b3>] exit_to_usermode_loop+0x73/0xb0
 [<ffffffff81002cb0>] syscall_return_slowpath+0xb0/0xc0
 [<ffffffff81624e33>] entry_SYSCALL_64_fastpath+0xa6/0xa8

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/dm-rq.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 0103031..f9f37ad 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -102,7 +102,7 @@ static void dm_mq_stop_queue(struct request_queue *q)
 	if (blk_mq_queue_stopped(q))
 		return;
 
-	blk_mq_stop_hw_queues(q);
+	blk_mq_quiesce_queue(q);
 }
 
 void dm_stop_queue(struct request_queue *q)
@@ -883,17 +883,6 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		dm_put_live_table(md, srcu_idx);
 	}
 
-	/*
-	 * On suspend dm_stop_queue() handles stopping the blk-mq
-	 * request_queue BUT: even though the hw_queues are marked
-	 * BLK_MQ_S_STOPPED at that point there is still a race that
-	 * is allowing block/blk-mq.c to call ->queue_rq against a
-	 * hctx that it really shouldn't.  The following check guards
-	 * against this rarity (albeit _not_ race-free).
-	 */
-	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
-		return BLK_MQ_RQ_QUEUE_BUSY;
-
 	if (ti->type->busy && ti->type->busy(ti))
 		return BLK_MQ_RQ_QUEUE_BUSY;
 
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 11/14] SRP transport: Move queuecommand() wait code to SCSI core
From: Bart Van Assche @ 2016-10-29  0:22 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Additionally, rename srp_wait_for_queuecommand() into
scsi_wait_for_queuecommand() and add a comment about the
queuecommand() call from scsi_send_eh_cmnd().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Doug Ledford <dledford@redhat.com>
---
 drivers/scsi/scsi_lib.c           | 38 ++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_transport_srp.c | 41 ++++++---------------------------------
 2 files changed, 44 insertions(+), 35 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b4f682c..3ab9c87 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2721,6 +2721,39 @@ void sdev_evt_send_simple(struct scsi_device *sdev,
 EXPORT_SYMBOL_GPL(sdev_evt_send_simple);
 
 /**
+ * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
+ * @sdev: SCSI device to count the number of scsi_request_fn() callers for.
+ */
+static int scsi_request_fn_active(struct scsi_device *sdev)
+{
+	struct request_queue *q = sdev->request_queue;
+	int request_fn_active;
+
+	WARN_ON_ONCE(sdev->host->use_blk_mq);
+
+	spin_lock_irq(q->queue_lock);
+	request_fn_active = q->request_fn_active;
+	spin_unlock_irq(q->queue_lock);
+
+	return request_fn_active;
+}
+
+/**
+ * scsi_wait_for_queuecommand() - wait for ongoing queuecommand() calls
+ * @shost: SCSI host pointer.
+ *
+ * Wait until the ongoing shost->hostt->queuecommand() calls that are
+ * invoked from scsi_request_fn() have finished.
+ */
+static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
+{
+	WARN_ON_ONCE(sdev->host->use_blk_mq);
+
+	while (scsi_request_fn_active(sdev))
+		msleep(20);
+}
+
+/**
  *	scsi_device_quiesce - Block user issued commands.
  *	@sdev:	scsi device to quiesce.
  *
@@ -2814,6 +2847,10 @@ EXPORT_SYMBOL(scsi_target_resume);
  *	(which must be a legal transition).  When the device is in this
  *	state, all commands are deferred until the scsi lld reenables
  *	the device with scsi_device_unblock or device_block_tmo fires.
+ *
+ * To do: avoid that scsi_send_eh_cmnd() calls queuecommand() after
+ * scsi_internal_device_block() has blocked a SCSI device and also
+ * remove the rport mutex lock and unlock calls from srp_queuecommand().
  */
 int
 scsi_internal_device_block(struct scsi_device *sdev)
@@ -2841,6 +2878,7 @@ scsi_internal_device_block(struct scsi_device *sdev)
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_stop_queue(q);
 		spin_unlock_irqrestore(q->queue_lock, flags);
+		scsi_wait_for_queuecommand(sdev);
 	}
 
 	return 0;
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index e3cd3ec..b48328a 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,7 +24,6 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
-#include <linux/delay.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -402,36 +401,6 @@ static void srp_reconnect_work(struct work_struct *work)
 	}
 }
 
-/**
- * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
- * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
- *
- * To do: add support for scsi-mq in this function.
- */
-static int scsi_request_fn_active(struct Scsi_Host *shost)
-{
-	struct scsi_device *sdev;
-	struct request_queue *q;
-	int request_fn_active = 0;
-
-	shost_for_each_device(sdev, shost) {
-		q = sdev->request_queue;
-
-		spin_lock_irq(q->queue_lock);
-		request_fn_active += q->request_fn_active;
-		spin_unlock_irq(q->queue_lock);
-	}
-
-	return request_fn_active;
-}
-
-/* Wait until ongoing shost->hostt->queuecommand() calls have finished. */
-static void srp_wait_for_queuecommand(struct Scsi_Host *shost)
-{
-	while (scsi_request_fn_active(shost))
-		msleep(20);
-}
-
 static void __rport_fail_io_fast(struct srp_rport *rport)
 {
 	struct Scsi_Host *shost = rport_to_shost(rport);
@@ -441,14 +410,17 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
 
 	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
 		return;
+	/*
+	 * Call scsi_target_block() to wait for ongoing shost->queuecommand()
+	 * calls before invoking i->f->terminate_rport_io().
+	 */
+	scsi_target_block(rport->dev.parent);
 	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
 
 	/* Involve the LLD if possible to terminate all I/O on the rport. */
 	i = to_srp_internal(shost->transportt);
-	if (i->f->terminate_rport_io) {
-		srp_wait_for_queuecommand(shost);
+	if (i->f->terminate_rport_io)
 		i->f->terminate_rport_io(rport);
-	}
 }
 
 /**
@@ -576,7 +548,6 @@ int srp_reconnect_rport(struct srp_rport *rport)
 	if (res)
 		goto out;
 	scsi_target_block(&shost->shost_gendev);
-	srp_wait_for_queuecommand(shost);
 	res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
 	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
 		 dev_name(&shost->shost_gendev), rport->state, res);
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 12/14] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Bart Van Assche @ 2016-10-29  0:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Ensure that if scsi-mq is enabled that scsi_internal_device_block()
waits until ongoing shost->hostt->queuecommand() calls have finished.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Doug Ledford <dledford@redhat.com>
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3ab9c87..1febc52 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2873,7 +2873,7 @@ scsi_internal_device_block(struct scsi_device *sdev)
 	 * request queue. 
 	 */
 	if (q->mq_ops) {
-		blk_mq_stop_hw_queues(q);
+		blk_mq_quiesce_queue(q);
 	} else {
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_stop_queue(q);
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 13/14] nvme: Fix a race condition related to stopping queues
From: Bart Van Assche @ 2016-10-29  0:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
returns.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Keith Busch <keith.busch-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8403996..fe15d94 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2083,7 +2083,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
 		spin_unlock_irq(ns->queue->queue_lock);
 
-		blk_mq_stop_hw_queues(ns->queue);
+		blk_mq_quiesce_queue(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-29  0:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of
QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations
that became superfluous because of this change. Change
blk_queue_stopped() tests into blk_mq_queue_stopped().

This patch fixes a race condition: using queue_flag_clear_unlocked()
is not safe if any other function that manipulates the queue flags
can be called concurrently, e.g. blk_cleanup_queue().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fe15d94..45dd237 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -201,13 +201,7 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
 
 void nvme_requeue_req(struct request *req)
 {
-	unsigned long flags;
-
-	blk_mq_requeue_request(req, false);
-	spin_lock_irqsave(req->q->queue_lock, flags);
-	if (!blk_queue_stopped(req->q))
-		blk_mq_kick_requeue_list(req->q);
-	spin_unlock_irqrestore(req->q->queue_lock, flags);
+	blk_mq_requeue_request(req, !blk_mq_queue_stopped(req->q));
 }
 EXPORT_SYMBOL_GPL(nvme_requeue_req);
 
@@ -2078,13 +2072,8 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		spin_lock_irq(ns->queue->queue_lock);
-		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
-		spin_unlock_irq(ns->queue->queue_lock);
-
+	list_for_each_entry(ns, &ctrl->namespaces, list)
 		blk_mq_quiesce_queue(ns->queue);
-	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
@@ -2095,7 +2084,6 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
-- 
2.10.1


^ permalink raw reply related

* Re: [PATCH rdma-core 1/7] libhns: Add initial main frame
From: oulijun @ 2016-10-29  1:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161028164030.GA17289-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

在 2016/10/29 0:40, Jason Gunthorpe 写道:
> On Fri, Oct 28, 2016 at 03:59:47PM +0800, oulijun wrote:
> 
>> total 0
>> lrwxrwxrwx    1 root     root             0 Oct 27 11:07 driver -> ../../../../bus/platform/drivers/hns_roce
>>
>> but I think it is the standard approach. because my device(hip06) is
>> only platform device and the other device(hip07/hip0x0 will be pcie
>> device, it will be distinguished separately.  Hence, we adpot the
>> origin approach.
> 
> You have to parse out 'hns_roce' at the end of the readlink result and
> compare against that, drop the 'bus/platform/drivers' stuff
> 
> Your PCI and Platform device should both have the same driver name.
> 
> Jason
> 
> .
> 
Hi, Jason
   I could not express clearly. We hope that the only copy of libhns will be used for the diff
erent type chip(hip06, hip07, ...), and it can distinguish the hardware of these chip at the same time.
Hence, i think that your plan will not attain our demand.

We hope that the only one userspace library file named libhns-rdmav2.so will be used for the different hardware version(hip06, hip07, ...),
because there are only little change between their userspace drivers. So we need to distinguish hardware version.

We can't distinguish them if only matching driver name "hns_roce".

thanks
Lijun Ou

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v2 rdma-core 0/7] libhns: userspace library for hns
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA

This patch series introduces userspace library for hns RoCE driver.

changes v1 -> v2:
1. Delete the min() definition and instead of ccan header
2. Delete the CHECK_C_SOURCE_COMPILES
3. sort the c file in rdma_provider()
4. Delete the unused code in hns_roce_u_db.h

Lijun Ou (7):
  libhns: Add initial main frame
  libhns: Add verbs of querying device and querying port
  libhns: Add verbs of pd and mr support
  libhns: Add verbs of cq support
  libhns: Add verbs of qp support
  libhns: Add verbs of post_send and post_recv support
  libhns: Add consolidated repo for userspace library of hns

 CMakeLists.txt                   |   1 +
 MAINTAINERS                      |   6 +
 README.md                        |   1 +
 providers/hns/CMakeLists.txt     |   6 +
 providers/hns/hns_roce_u.c       | 228 +++++++++++
 providers/hns/hns_roce_u.h       | 255 ++++++++++++
 providers/hns/hns_roce_u_abi.h   |  69 ++++
 providers/hns/hns_roce_u_buf.c   |  61 +++
 providers/hns/hns_roce_u_db.h    |  54 +++
 providers/hns/hns_roce_u_hw_v1.c | 839 +++++++++++++++++++++++++++++++++++++++
 providers/hns/hns_roce_u_hw_v1.h | 242 +++++++++++
 providers/hns/hns_roce_u_verbs.c | 525 ++++++++++++++++++++++++
 12 files changed, 2287 insertions(+)
 create mode 100644 providers/hns/CMakeLists.txt
 create mode 100644 providers/hns/hns_roce_u.c
 create mode 100644 providers/hns/hns_roce_u.h
 create mode 100644 providers/hns/hns_roce_u_abi.h
 create mode 100644 providers/hns/hns_roce_u_buf.c
 create mode 100644 providers/hns/hns_roce_u_db.h
 create mode 100644 providers/hns/hns_roce_u_hw_v1.c
 create mode 100644 providers/hns/hns_roce_u_hw_v1.h
 create mode 100644 providers/hns/hns_roce_u_verbs.c

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v2 rdma-core 1/7] libhns: Add initial main frame
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1477731826-10787-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces initial main frame for
userspace library of hns.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v2:
- No change over the v1

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c     | 163 +++++++++++++++++++++++++++++++++++++++++
 providers/hns/hns_roce_u.h     |  86 ++++++++++++++++++++++
 providers/hns/hns_roce_u_abi.h |  43 +++++++++++
 3 files changed, 292 insertions(+)
 create mode 100644 providers/hns/hns_roce_u.c
 create mode 100644 providers/hns/hns_roce_u.h
 create mode 100644 providers/hns/hns_roce_u_abi.h

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
new file mode 100644
index 0000000..bda4dd8
--- /dev/null
+++ b/providers/hns/hns_roce_u.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "hns_roce_u.h"
+#include "hns_roce_u_abi.h"
+
+#define HID_LEN			15
+#define DEV_MATCH_LEN		128
+
+static const struct {
+	char	 hid[HID_LEN];
+} acpi_table[] = {
+	{"acpi:HISI00D1:"},
+	{},
+};
+
+static const struct {
+	char	 compatible[DEV_MATCH_LEN];
+} dt_table[] = {
+	{"hisilicon,hns-roce-v1"},
+	{},
+};
+
+static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
+						  int cmd_fd)
+{
+	int i;
+	struct ibv_get_context cmd;
+	struct ibv_device_attr dev_attrs;
+	struct hns_roce_context *context;
+	struct hns_roce_alloc_ucontext_resp resp;
+	struct hns_roce_device *hr_dev = to_hr_dev(ibdev);
+
+	context = calloc(1, sizeof(*context));
+	if (!context)
+		return NULL;
+
+	context->ibv_ctx.cmd_fd = cmd_fd;
+	if (ibv_cmd_get_context(&context->ibv_ctx, &cmd, sizeof(cmd),
+				&resp.ibv_resp, sizeof(resp)))
+		goto err_free;
+
+	context->num_qps = resp.qp_tab_size;
+	context->qp_table_shift = ffs(context->num_qps) - 1 -
+				  HNS_ROCE_QP_TABLE_BITS;
+	context->qp_table_mask = (1 << context->qp_table_shift) - 1;
+
+	pthread_mutex_init(&context->qp_table_mutex, NULL);
+	for (i = 0; i < HNS_ROCE_QP_TABLE_SIZE; ++i)
+		context->qp_table[i].refcnt = 0;
+
+	context->uar = mmap(NULL, to_hr_dev(ibdev)->page_size,
+			    PROT_READ | PROT_WRITE, MAP_SHARED, cmd_fd, 0);
+	if (context->uar == MAP_FAILED) {
+		fprintf(stderr, PFX "Warning: failed to mmap() uar page.\n");
+		goto err_free;
+	}
+
+	pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
+
+	context->max_qp_wr = dev_attrs.max_qp_wr;
+	context->max_sge = dev_attrs.max_sge;
+	context->max_cqe = dev_attrs.max_cqe;
+
+	return &context->ibv_ctx;
+
+err_free:
+	free(context);
+	return NULL;
+}
+
+static void hns_roce_free_context(struct ibv_context *ibctx)
+{
+	struct hns_roce_context *context = to_hr_ctx(ibctx);
+
+	munmap(context->uar, to_hr_dev(ibctx->device)->page_size);
+
+	context->uar = NULL;
+
+	free(context);
+	context = NULL;
+}
+
+static struct ibv_device_ops hns_roce_dev_ops = {
+	.alloc_context = hns_roce_alloc_context,
+	.free_context	= hns_roce_free_context
+};
+
+static struct ibv_device *hns_roce_driver_init(const char *uverbs_sys_path,
+					       int abi_version)
+{
+	struct hns_roce_device  *dev;
+	char			 value[128];
+	int			 i;
+
+	if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
+				value, sizeof(value)) > 0)
+		for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
+			if (!strcmp(value, acpi_table[i].hid))
+				goto found;
+
+	if (ibv_read_sysfs_file(uverbs_sys_path, "device/of_node/compatible",
+				value, sizeof(value)) > 0)
+		for (i = 0; i < sizeof(dt_table) / sizeof(dt_table[0]); ++i)
+			if (!strcmp(value, dt_table[i].compatible))
+				goto found;
+
+	return NULL;
+
+found:
+	dev = malloc(sizeof(struct hns_roce_device));
+	if (!dev) {
+		fprintf(stderr, PFX "Fatal: couldn't allocate device for %s\n",
+			uverbs_sys_path);
+		return NULL;
+	}
+
+	dev->ibv_dev.ops = hns_roce_dev_ops;
+	dev->page_size   = sysconf(_SC_PAGESIZE);
+	return &dev->ibv_dev;
+}
+
+static __attribute__((constructor)) void hns_roce_register_driver(void)
+{
+	ibv_register_driver("hns", hns_roce_driver_init);
+}
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
new file mode 100644
index 0000000..3eef171
--- /dev/null
+++ b/providers/hns/hns_roce_u.h
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_H
+#define _HNS_ROCE_U_H
+
+#include <stddef.h>
+
+#include <infiniband/driver.h>
+#include <infiniband/arch.h>
+#include <infiniband/verbs.h>
+#include <ccan/container_of.h>
+
+#define HNS_ROCE_HW_VER1		('h' << 24 | 'i' << 16 | '0' << 8 | '6')
+
+#define PFX				"hns: "
+
+enum {
+	HNS_ROCE_QP_TABLE_BITS		= 8,
+	HNS_ROCE_QP_TABLE_SIZE		= 1 << HNS_ROCE_QP_TABLE_BITS,
+};
+
+struct hns_roce_device {
+	struct ibv_device		ibv_dev;
+	int				page_size;
+};
+
+struct hns_roce_context {
+	struct ibv_context		ibv_ctx;
+	void				*uar;
+	pthread_spinlock_t		uar_lock;
+
+	struct {
+		int			refcnt;
+	} qp_table[HNS_ROCE_QP_TABLE_SIZE];
+
+	pthread_mutex_t			qp_table_mutex;
+
+	int				num_qps;
+	int				qp_table_shift;
+	int				qp_table_mask;
+	unsigned int			max_qp_wr;
+	unsigned int			max_sge;
+	int				max_cqe;
+};
+
+static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
+{
+	return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
+}
+
+static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
+{
+	return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
+}
+
+#endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
new file mode 100644
index 0000000..4bfc8fa
--- /dev/null
+++ b/providers/hns/hns_roce_u_abi.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_ABI_H
+#define _HNS_ROCE_U_ABI_H
+
+#include <infiniband/kern-abi.h>
+
+struct hns_roce_alloc_ucontext_resp {
+	struct ibv_get_context_resp	ibv_resp;
+	__u32				qp_tab_size;
+};
+
+#endif /* _HNS_ROCE_U_ABI_H */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 rdma-core 2/7] libhns: Add verbs of querying device and querying port
From: Lijun Ou @ 2016-10-29  9:03 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1477731826-10787-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces query verbs for querying device
and querying port.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v2:
- No change over the v1

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c       |  7 ++++
 providers/hns/hns_roce_u.h       |  4 +++
 providers/hns/hns_roce_u_verbs.c | 73 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+)
 create mode 100644 providers/hns/hns_roce_u_verbs.c

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index bda4dd8..c0f6fe9 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -95,12 +95,19 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 
 	pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
 
+	context->ibv_ctx.ops.query_device  = hns_roce_u_query_device;
+	context->ibv_ctx.ops.query_port    = hns_roce_u_query_port;
+
+	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
+		goto tptr_free;
+
 	context->max_qp_wr = dev_attrs.max_qp_wr;
 	context->max_sge = dev_attrs.max_sge;
 	context->max_cqe = dev_attrs.max_cqe;
 
 	return &context->ibv_ctx;
 
+tptr_free:
 err_free:
 	free(context);
 	return NULL;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 3eef171..aa58ee6 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -83,4 +83,8 @@ static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
 	return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
 }
 
+int hns_roce_u_query_device(struct ibv_context *context,
+			    struct ibv_device_attr *attr);
+int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
+			  struct ibv_port_attr *attr);
 #endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
new file mode 100644
index 0000000..be55fe8
--- /dev/null
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <pthread.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "hns_roce_u.h"
+
+int hns_roce_u_query_device(struct ibv_context *context,
+			    struct ibv_device_attr *attr)
+{
+	int ret;
+	struct ibv_query_device cmd;
+	unsigned long raw_fw_ver;
+	unsigned int major, minor, sub_minor;
+
+	ret = ibv_cmd_query_device(context, attr, &raw_fw_ver, &cmd,
+				   sizeof(cmd));
+	if (ret)
+		return ret;
+
+	major	   = (raw_fw_ver >> 32) & 0xffff;
+	minor	   = (raw_fw_ver >> 16) & 0xffff;
+	sub_minor = raw_fw_ver & 0xffff;
+
+	snprintf(attr->fw_ver, sizeof(attr->fw_ver), "%d.%d.%03d", major, minor,
+		 sub_minor);
+
+	return 0;
+}
+
+int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
+			  struct ibv_port_attr *attr)
+{
+	struct ibv_query_port cmd;
+
+	return ibv_cmd_query_port(context, port, attr, &cmd, sizeof(cmd));
+}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox