Linux RDMA and InfiniBand development
* [PATCH] qedr: Fix possible memory leak in qedr_create_qp()
From: Wei Yongjun @ 2016-10-28 16:33 UTC
  To: Doug Ledford, Sean Hefty, Hal Rosenstock, Ram Amrani,
	Rajesh Borundia
  Cc: Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA

From: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

'qp' is allocated with kzalloc() in qedr_create_qp() and must be freed on
every error path before returning; otherwise the memory is leaked.

Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/hw/qedr/verbs.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index a615142..b60f145 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1477,6 +1477,7 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
 	struct qedr_ucontext *ctx = NULL;
 	struct qedr_create_qp_ureq ureq;
 	struct qedr_qp *qp;
+	struct ib_qp *ibqp;
 	int rc = 0;
 
 	DP_DEBUG(dev, QEDR_MSG_QP, "create qp: called from %s, pd=%p\n",
@@ -1486,13 +1487,13 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
 	if (rc)
 		return ERR_PTR(rc);
 
+	if (attrs->srq)
+		return ERR_PTR(-EINVAL);
+
 	qp = kzalloc(sizeof(*qp), GFP_KERNEL);
 	if (!qp)
 		return ERR_PTR(-ENOMEM);
 
-	if (attrs->srq)
-		return ERR_PTR(-EINVAL);
-
 	DP_DEBUG(dev, QEDR_MSG_QP,
 		 "create qp: sq_cq=%p, sq_icid=%d, rq_cq=%p, rq_icid=%d\n",
 		 get_qedr_cq(attrs->send_cq),
@@ -1508,7 +1509,10 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
 			       "create qp: unexpected udata when creating GSI QP\n");
 			goto err0;
 		}
-		return qedr_create_gsi_qp(dev, attrs, qp);
+		ibqp = qedr_create_gsi_qp(dev, attrs, qp);
+		if (IS_ERR(ibqp))
+			kfree(qp);
+		return ibqp;
 	}
 
 	memset(&in_params, 0, sizeof(in_params));

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
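
The pattern the patch above applies (validate arguments before allocating, and free the object on an early-return error path) can be sketched in userspace C. This is only an illustration under stated assumptions, not the qedr code: struct obj, obj_alloc(), obj_free(), setup_special(), obj_create() and the live_allocs counter are all hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; none of this is driver code. */
struct obj {
	int special;
};

static int live_allocs;	/* outstanding objects, used only to spot leaks */

static struct obj *obj_alloc(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o)
		live_allocs++;
	return o;
}

static void obj_free(struct obj *o)
{
	if (o) {
		live_allocs--;
		free(o);
	}
}

/* Stand-in for a sub-constructor that may fail; returns 0 or -errno. */
static int setup_special(struct obj *o, int fail)
{
	if (fail)
		return -EIO;
	o->special = 1;
	return 0;
}

static int obj_create(int unsupported, int special, int fail, struct obj **out)
{
	struct obj *o;
	int rc;

	*out = NULL;

	/* Validate arguments before allocating, so this exit owns nothing. */
	if (unsupported)
		return -EINVAL;

	o = obj_alloc();
	if (!o)
		return -ENOMEM;

	if (special) {
		rc = setup_special(o, fail);
		if (rc) {
			obj_free(o);	/* release on the early-return error path */
			return rc;
		}
	}

	*out = o;
	return 0;
}
```

Calling obj_create() with the failing combinations and checking that live_allocs returns to zero is a cheap way to confirm that no exit path keeps the allocation.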


* Re: [PATCH rdma-core 1/7] libhns: Add initial main frame
From: Jason Gunthorpe @ 2016-10-28 16:40 UTC
  To: oulijun
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <58130573.4010902-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

On Fri, Oct 28, 2016 at 03:59:47PM +0800, oulijun wrote:

> total 0
> lrwxrwxrwx    1 root     root             0 Oct 27 11:07 driver -> ../../../../bus/platform/drivers/hns_roce
> 
> but I think it is the standard approach, because my device (hip06) is
> only a platform device and the other devices (hip07/hip0x) will be PCIe
> devices, so they will be distinguished separately. Hence, we adopted the
> original approach.

You have to parse out 'hns_roce' at the end of the readlink result and
compare against that; drop the 'bus/platform/drivers' part.

Your PCI and Platform device should both have the same driver name.

Jason

* Re: [PATCH rdma-rc 01/12] IB/mlx5: Replace numerical constant with predefined MACRO
From: Or Gerlitz @ 2016-10-28 16:48 UTC
  To: Leon Romanovsky
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Max Gurtovoy
In-Reply-To: <20161028145534.GM3617-2ukJVAZIZ/Y@public.gmane.org>

On Fri, Oct 28, 2016 at 5:55 PM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On Fri, Oct 28, 2016 at 03:54:25PM +0300, Or Gerlitz wrote:

>> By all means (or no means, choose), this is a fix. If you are telling

> Since change of numeric value to the same value defined by macro is not
> a fix but improvement [..] I have no plans to change the patch or/and drop it.

RU listening? it's a -next patch, not rc

* Re: [PATCH rdma-rc 12/12] IB/mlx5: Limit mkey page size to 2GB
From: Or Gerlitz @ 2016-10-28 16:53 UTC
  To: Majd Dibbiny
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Maor Gottlieb
In-Reply-To: <31841A29-5CE1-45A9-999A-1FAAA3CDFD4F-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

On Fri, Oct 28, 2016 at 4:21 PM, Majd Dibbiny <majd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On Oct 28, 2016, at 4:02 PM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> In case we are working with 1GB hugepages, it is possible to find more than
> two contiguous pages, and end up with more than 2GB.

got it
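
The arithmetic behind the point above can be sketched in C. This is a guess at the shape of the calculation, not the mlx5 code: MAX_MKEY_PAGE_SHIFT (31, i.e. 2GB, taken from the patch subject), floor_log2() and capped_page_shift() are hypothetical names, and the real driver may derive the page size differently.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G			(UINT64_C(1) << 30)
#define MAX_MKEY_PAGE_SHIFT	31	/* assumed cap: 1 << 31 bytes == 2GB */

/* Largest shift such that (1 << shift) <= bytes; bytes must be non-zero. */
static unsigned int floor_log2(uint64_t bytes)
{
	unsigned int shift = 0;

	while (bytes >>= 1)
		shift++;
	return shift;
}

/* Page shift for a physically contiguous run, clamped to the assumed cap. */
static unsigned int capped_page_shift(uint64_t contiguous_bytes)
{
	unsigned int shift = floor_log2(contiguous_bytes);

	return shift > MAX_MKEY_PAGE_SHIFT ? MAX_MKEY_PAGE_SHIFT : shift;
}
```

With four contiguous 1GB hugepages the uncapped shift would be 32 (4GB), which is the case the quoted explanation is worried about; the clamp keeps it at 31.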

* Re: [PATCH rdma-core v2 4/4] redhat/spec: build split rpm packages
From: Jarod Wilson @ 2016-10-28 17:11 UTC
  To: Jason Gunthorpe; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161027211059.GA7224-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Thu, Oct 27, 2016 at 03:10:59PM -0600, Jason Gunthorpe wrote:
> On Thu, Oct 20, 2016 at 11:33:57AM -0400, Jarod Wilson wrote:
> > @@ -7,10 +7,11 @@ Summary: RDMA core userspace libraries and daemons
> >  #  providers/ipathverbs/ Dual licensed using a BSD license with an extra patent clause
> >  #  providers/rxe/ Incorporates code from ipathverbs and contains the patent clause
> >  #  providers/hfi1verbs Uses the 3 Clause BSD license
> > -License: (GPLv2 or BSD) and (GPLv2 or PathScale-BSD)
> > +License: GPLv2 or BSD
> 
> Is this Ok? The Fedora guidelines I read suggested the PathScale
> license would need to be assigned a short tag, and I'd be surprised if
> 'BSD' is the right tag due to the patent stuff..

Our standalone libipathverbs has just "GPLv2 or BSD", and I didn't see
anything specific about PathScale, but I may not have been looking hard
enough or in the right place in the Fedora packaging guidelines. Where did
you see that?

> >  Url: http://openfabrics.org/
> 
> I guess we should change this url to
> https://github.com/linux-rdma/rdma-core ?

Either one works for me.

> >  Source: rdma-core-%{version}.tgz
> > -BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root
> > +# https://github.com/linux-rdma/rdma-core
> > +BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX)
> 
> I always wondered why there was so much variability in spec files
> here.. I followed the Fedora guidelines, should we copy the above into
> the other spec file?

I believe the current Fedora guidelines actually say "just omit
BuildRoot", because rpm will figure out a sane default by itself. The one
with mktemp was introduced by the security-conscious/paranoid, I just
copied it over from another of the specs I was merging together here, not
sure what the "best" route is here now.

> > @@ -19,20 +20,15 @@ BuildRequires: pkgconfig
> >  BuildRequires: pkgconfig(libnl-3.0)
> >  BuildRequires: pkgconfig(libnl-route-3.0)
> >  BuildRequires: valgrind-devel
> > +BuildRequires: libnl3-devel
> 
> ?
> 
> Isn't pkgconfig(libnl-3.0) the same thing?

Oops, probably. Another copy-paste addition from another spec.

> >%define systemd_dep systemd-units
> >%if 0%{?fedora} >= 18
> >%define systemd_dep systemd
> >%endif
> 
> The source package probably doesn't even build on FC 18.. can probably
> remove this

Works for me. FC18 is 3-4 years old now, and nobody sane should be running
it for anything useful that they want to deploy new software on.

> > +Summary: InfiniBand Communication Manager Assistant
> > +Requires(post): %{systemd_dep}
> > +Requires(preun): %{systemd_dep}
> > +Requires(postun): %{systemd_dep}
> 
> I suppose we need these and related in the other spec file too?
> Looks like this spec file isn't going to work on C6, so you can
> probably drop the other systemd compat stuff:

Ah, yes, probably a good idea to add these there too. I'd meant to add all
the stuff that should go in both specs *before* the split/copy, but alas, I
failed. :)

> --- a/redhat/rdma-core.spec
> +++ b/redhat/rdma-core.spec
> @@ -202,13 +202,6 @@ discover and use SCSI devices via the SCSI RDMA Protocol over InfiniBand.
>  
>  %build
>  
> -# Detect if systemd is supported on this system
> -%if 0%{?_unitdir:1}
> -%define my_unitdir %{_unitdir}
> -%else
> -%define my_unitdir /tmp/
> -%endif
> -
>  # New RPM defines _rundir, usually as /run
>  %if 0%{?_rundir:1}
>  %else
> @@ -228,7 +221,7 @@ discover and use SCSI devices via the SCSI RDMA Protocol over InfiniBand.
>           -DCMAKE_INSTALL_INFODIR:PATH=%{_infodir} \
>           -DCMAKE_INSTALL_MANDIR:PATH=%{_mandir} \
>           -DCMAKE_INSTALL_SYSCONFDIR:PATH=%{_sysconfdir} \
> -        -DCMAKE_INSTALL_SYSTEMD_SERVICEDIR:PATH=%{my_unitdir} \
> +        -DCMAKE_INSTALL_SYSTEMD_SERVICEDIR:PATH=%{_unitdir} \
>          -DCMAKE_INSTALL_INITDDIR:PATH=%{_initrddir} \
>          -DCMAKE_INSTALL_RUNDIR:PATH=%{_rundir} \
>          -DCMAKE_INSTALL_DOCDIR:PATH=%{_docdir}/%{name}-%{version}
> @@ -276,8 +269,6 @@ install -D -m0644 redhat/srp_daemon.service %{buildroot}%{_unitdir}/
>  
>  %if 0%{?_unitdir:1}
>  rm -rf %{buildroot}/%{_initrddir}/
> -%else
> -rm -rf %{buildroot}/%{my_unitdir}/
>  %endif
>  
>  %post -p /sbin/ldconfig

Looks good to me. And yeah, we have no plans to update the el6 rdma stack
with rdma-core and driver refreshes at this stage in its lifecycle, so
there's really no sense in pretending to maybe support it.

> > +%package -n librdmacm-utils
> > +Summary: Examples for the librdmacm library
> > +Requires: librdmacm%{?_isa} = %{version}-%{release}
> 
> Why the requires? Shouldn't auto shlib dependencies take care of that?

Probably. I think this was another legacy bit copied over from a
stand-alone spec file.

> Anyhow, this all looks fine to me, I put a branch here, with one
> change to make the debian packaging work after the README.md change:
> 
> https://github.com/jgunthorpe/rdma-plumbing/tree/redhat-packaging
> 
> If you want to make any final adjustments let me know, otherwise I
> will send this on..

Nah, I think things look good, and we can always keep tweaking as needed,
this is a solid update to work on top of.

-- 
Jarod Wilson
jarod-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org


* RDMA developer gatherings around Kernel Summit and Linux Plumbers in Santa Fe
From: Christoph Lameter @ 2016-10-28 17:17 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, skc-YOWKrPYUwWM, ira.weiny-ral2JQCrhuEAvxtiuMwx3w,
	Jason Gunthorpe, john.fleck-ral2JQCrhuEAvxtiuMwx3w,
	leon-DgEjT+Ai2ygdnm+yROfE0A, liranl-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

We have two days dedicated to meetings about RDMA technology:

- RDMA workshop on Tuesday, 1st of November

	Meeting in Sweeney AB from 9am till 5pm.
	See https://www.linuxplumbersconf.org/2016/ocw/events/LPC2016/schedule

	This is part of the Kernel Summit and the Linux Plumbers
	Conference. Only open to KS and LPC attendees with an invitation.

	Sessions:

		 Implementing a new Linux RDMA provider		Knut Omang
		 New IOCTL ABI					Matan Barak
		 Consolidation of RDMA User-space code		Jason Gunthorpe
		 Debuggability and Tracing			Leon Romanovsky
		 Integration with other subsystems		Leon Romanovsky
		 Lunch Break
		 Containers and RDMA				Liran Liss
		 New Fabrics					Ira Weiny
		 Future and Roadmap for RDMA			Doug Ledford
		 User Mode Ethernet Verbs			Tzahi Oved
		 Dual Licensing					Susan Coulter

	Note that some speakers have not filled out their speaker profile yet
	and some abstracts still need to be filled out. Those sessions list me
	as the speaker until we can put the information about the real speaker
	in there.

- RDMA Summit on Saturday the 5th of November

	This event will be held at

	Hilton Santa Fe
	100 Sandoval St.
	Santa Fe, NM 87501
	Phone: (505)988-2811

	Meeting in the Canyon Ballroom 9am till 4pm.

	This event is sponsored by the Open Fabrics Alliance and will be
	open to all.

The workshop will focus more on the technical topics (some of which
could be carried over to Saturday if necessary, because we have more
time then). We will summarize what happened on Tuesday for those who
were not able to participate earlier.

Saturday sessions 9am till 4pm. 12-1pm Lunchtime

	9am	Refine TODO list for consolidated library - Jason Gunthorpe
	10am	Submission process for multi subsystem drivers - Doug Ledford
	11am	Multicast features and gaps - Christoph Lameter

	1pm	Licensing carryover - Susan/Christoph
	2pm	Standard network tools, integrating to the regular network stack - Christoph
	3pm	Open Discussion/Reserve Session	- TBD
	4pm	Closing Session	- TBD



* Re: [PATCH rdma-core v2 4/4] redhat/spec: build split rpm packages
From: Jason Gunthorpe @ 2016-10-28 17:25 UTC
  To: Jarod Wilson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161028171147.GJ42084-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Fri, Oct 28, 2016 at 01:11:47PM -0400, Jarod Wilson wrote:
> On Thu, Oct 27, 2016 at 03:10:59PM -0600, Jason Gunthorpe wrote:
> > On Thu, Oct 20, 2016 at 11:33:57AM -0400, Jarod Wilson wrote:
> > > @@ -7,10 +7,11 @@ Summary: RDMA core userspace libraries and daemons
> > >  #  providers/ipathverbs/ Dual licensed using a BSD license with an extra patent clause
> > >  #  providers/rxe/ Incorporates code from ipathverbs and contains the patent clause
> > >  #  providers/hfi1verbs Uses the 3 Clause BSD license
> > > -License: (GPLv2 or BSD) and (GPLv2 or PathScale-BSD)
> > > +License: GPLv2 or BSD
> > 
> > Is this Ok? The Fedora guidelines I read suggested the PathScale
> > license would need to be assigned a short tag, and I'd be surprised if
> > 'BSD' is the right tag due to the patent stuff..
> 
> Our standalone libipathverbs has just "GPLv2 or BSD",

I suspect that was a mistake, the difference in the pathscale license
is subtle and several other people assumed it was the canonical 'BSD'
text...

> and I didn't see anything specific about PathScale, but I may not
> have been looking hard enough or in the right place in the Fedora
> packaging guidelines. Where did you see that?

https://fedoraproject.org/wiki/Licensing:Main?rd=Licensing

'All software in Fedora must be under licenses in the Fedora licensing list.'

The Pathscale license is not in the licensing list. The patent clause
seems legally significant enough to at least ask what tag is OK.

Also, we now have CC0 and MIT licensed files, so I guess a 'and CC0
and MIT' is appropriate too?

> Nah, I think things look good, and we can always keep tweaking as needed,
> this is a solid update to work on top of.

Okay, I'll patch in the changes from this discussion and send the
pull.

Jason

* Re: [PATCH for-next 00/14][PULL request] Mellanox mlx5 core driver updates 2016-10-25
From: David Miller @ 2016-10-28 17:53 UTC
  To: saeedm-VPRAkNaXOzVWk0Htik3J/w
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w,
	talal-VPRAkNaXOzVWk0Htik3J/w, matanb-VPRAkNaXOzVWk0Htik3J/w
In-Reply-To: <1477407617-20745-1-git-send-email-saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>


I really dislike pull requests of this form.

You add lots of data structures and helper functions but no actual
users of these facilities to the driver.

Do this instead:

	1) Add TSAR infrastructure
	2) Add use of TSAR facilities to the driver

That's one pull request.

I don't care if this is hard, or if there are entanglements with
Infiniband or whatever, you must submit changes in this manner.

I will not accept additions to a driver that don't even get really
used.

* RE: [PATCH rdma-core 2/4] Move rdma_netlink compat into CMake
From: Nikolova, Tatyana E @ 2016-10-28 18:00 UTC
  To: Jason Gunthorpe, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Steve Wise
In-Reply-To: <1477609570-8087-3-git-send-email-jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

Hi Jason,

The patch looks good.

I don't think that the RDMA_NL_I40IW enum below is necessary.

+enum {
+	RDMA_NL_RDMA_CM = 1,
+	RDMA_NL_IWCM,
+	RDMA_NL_RSVD,
+	RDMA_NL_LS,	/* RDMA Local Services */
+	RDMA_NL_I40IW,
+	RDMA_NL_NUM_CLIENTS
+};

Thank you,
Tatyana

-----Original Message-----
From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
Sent: Thursday, October 27, 2016 6:06 PM
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Subject: [PATCH rdma-core 2/4] Move rdma_netlink compat into CMake

Detect if the distro's rdma_netlink.h is new enough, if not replace it with the built in copy, and eliminate the two loose copies of the header.

The built in copy is from v4.8

Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
---
 buildlib/RDMA_LinuxHeaders.cmake           |   2 +-
 buildlib/fixup-include/rdma-rdma_netlink.h | 225 +++++++++++++++++++++++++++++
 ibacm/src/acm.c                            |   3 -
 ibacm/src/acm_netlink.h                    | 128 ----------------
 iwpmd/iwarp_pm.h                           |   2 +-
 iwpmd/iwarp_pm_common.c                    |   5 -
 iwpmd/iwarp_pm_server.c                    |   4 +-
 iwpmd/iwpm_netlink.h                       | 214 ---------------------------
 8 files changed, 229 insertions(+), 354 deletions(-)
 create mode 100644 buildlib/fixup-include/rdma-rdma_netlink.h
 delete mode 100644 ibacm/src/acm_netlink.h
 delete mode 100644 iwpmd/iwpm_netlink.h

Steve,

Can you check if the changes to iwpmd/iwarp_pm_server.c make sense?
Should we do something to fix the kernel header?

diff --git a/buildlib/RDMA_LinuxHeaders.cmake b/buildlib/RDMA_LinuxHeaders.cmake
index bd16d8deca72..c67b0a6113d2 100644
--- a/buildlib/RDMA_LinuxHeaders.cmake
+++ b/buildlib/RDMA_LinuxHeaders.cmake
@@ -80,6 +80,6 @@ rdma_check_kheader("rdma/ib_user_verbs.h" "${DEFAULT_TEST}")
 rdma_check_kheader("rdma/ib_user_sa.h" "${DEFAULT_TEST}")
 rdma_check_kheader("rdma/ib_user_cm.h" "${DEFAULT_TEST}")
 rdma_check_kheader("rdma/ib_user_mad.h" "${DEFAULT_TEST}")
-rdma_check_kheader("rdma/rdma_netlink.h" "${DEFAULT_TEST}")
+rdma_check_kheader("rdma/rdma_netlink.h" "int main(int argc,const char *argv[]) { return RDMA_NL_IWPM_REMOTE_INFO && RDMA_NL_IWCM; }")
 rdma_check_kheader("rdma/rdma_user_cm.h" "${DEFAULT_TEST}")
 rdma_check_kheader("rdma/rdma_user_rxe.h" "${DEFAULT_TEST}")
diff --git a/buildlib/fixup-include/rdma-rdma_netlink.h b/buildlib/fixup-include/rdma-rdma_netlink.h
new file mode 100644
index 000000000000..02fe8390c18f
--- /dev/null
+++ b/buildlib/fixup-include/rdma-rdma_netlink.h
@@ -0,0 +1,225 @@
+#ifndef _UAPI_RDMA_NETLINK_H
+#define _UAPI_RDMA_NETLINK_H
+
+#include <linux/types.h>
+
+enum {
+	RDMA_NL_RDMA_CM = 1,
+	RDMA_NL_IWCM,
+	RDMA_NL_RSVD,
+	RDMA_NL_LS,	/* RDMA Local Services */
+	RDMA_NL_I40IW,
+	RDMA_NL_NUM_CLIENTS
+};
+
+enum {
+	RDMA_NL_GROUP_CM = 1,
+	RDMA_NL_GROUP_IWPM,
+	RDMA_NL_GROUP_LS,
+	RDMA_NL_NUM_GROUPS
+};
+
+#define RDMA_NL_GET_CLIENT(type) ((type & (((1 << 6) - 1) << 10)) >> 10)
+#define RDMA_NL_GET_OP(type) (type & ((1 << 10) - 1))
+#define RDMA_NL_GET_TYPE(client, op) ((client << 10) + op)
+
+enum {
+	RDMA_NL_RDMA_CM_ID_STATS = 0,
+	RDMA_NL_RDMA_CM_NUM_OPS
+};
+
+enum {
+	RDMA_NL_RDMA_CM_ATTR_SRC_ADDR = 1,
+	RDMA_NL_RDMA_CM_ATTR_DST_ADDR,
+	RDMA_NL_RDMA_CM_NUM_ATTR,
+};
+
+/* iwarp port mapper op-codes */
+enum {
+	RDMA_NL_IWPM_REG_PID = 0,
+	RDMA_NL_IWPM_ADD_MAPPING,
+	RDMA_NL_IWPM_QUERY_MAPPING,
+	RDMA_NL_IWPM_REMOVE_MAPPING,
+	RDMA_NL_IWPM_REMOTE_INFO,
+	RDMA_NL_IWPM_HANDLE_ERR,
+	RDMA_NL_IWPM_MAPINFO,
+	RDMA_NL_IWPM_MAPINFO_NUM,
+	RDMA_NL_IWPM_NUM_OPS
+};
+
+struct rdma_cm_id_stats {
+	__u32	qp_num;
+	__u32	bound_dev_if;
+	__u32	port_space;
+	__s32	pid;
+	__u8	cm_state;
+	__u8	node_type;
+	__u8	port_num;
+	__u8	qp_type;
+};
+
+enum {
+	IWPM_NLA_REG_PID_UNSPEC = 0,
+	IWPM_NLA_REG_PID_SEQ,
+	IWPM_NLA_REG_IF_NAME,
+	IWPM_NLA_REG_IBDEV_NAME,
+	IWPM_NLA_REG_ULIB_NAME,
+	IWPM_NLA_REG_PID_MAX
+};
+
+enum {
+	IWPM_NLA_RREG_PID_UNSPEC = 0,
+	IWPM_NLA_RREG_PID_SEQ,
+	IWPM_NLA_RREG_IBDEV_NAME,
+	IWPM_NLA_RREG_ULIB_NAME,
+	IWPM_NLA_RREG_ULIB_VER,
+	IWPM_NLA_RREG_PID_ERR,
+	IWPM_NLA_RREG_PID_MAX
+
+};
+
+enum {
+	IWPM_NLA_MANAGE_MAPPING_UNSPEC = 0,
+	IWPM_NLA_MANAGE_MAPPING_SEQ,
+	IWPM_NLA_MANAGE_ADDR,
+	IWPM_NLA_MANAGE_MAPPED_LOC_ADDR,
+	IWPM_NLA_RMANAGE_MAPPING_ERR,
+	IWPM_NLA_RMANAGE_MAPPING_MAX
+};
+
+#define IWPM_NLA_MANAGE_MAPPING_MAX 3
+#define IWPM_NLA_QUERY_MAPPING_MAX  4
+#define IWPM_NLA_MAPINFO_SEND_MAX   3
+
+enum {
+	IWPM_NLA_QUERY_MAPPING_UNSPEC = 0,
+	IWPM_NLA_QUERY_MAPPING_SEQ,
+	IWPM_NLA_QUERY_LOCAL_ADDR,
+	IWPM_NLA_QUERY_REMOTE_ADDR,
+	IWPM_NLA_RQUERY_MAPPED_LOC_ADDR,
+	IWPM_NLA_RQUERY_MAPPED_REM_ADDR,
+	IWPM_NLA_RQUERY_MAPPING_ERR,
+	IWPM_NLA_RQUERY_MAPPING_MAX
+};
+
+enum {
+	IWPM_NLA_MAPINFO_REQ_UNSPEC = 0,
+	IWPM_NLA_MAPINFO_ULIB_NAME,
+	IWPM_NLA_MAPINFO_ULIB_VER,
+	IWPM_NLA_MAPINFO_REQ_MAX
+};
+
+enum {
+	IWPM_NLA_MAPINFO_UNSPEC = 0,
+	IWPM_NLA_MAPINFO_LOCAL_ADDR,
+	IWPM_NLA_MAPINFO_MAPPED_ADDR,
+	IWPM_NLA_MAPINFO_MAX
+};
+
+enum {
+	IWPM_NLA_MAPINFO_NUM_UNSPEC = 0,
+	IWPM_NLA_MAPINFO_SEQ,
+	IWPM_NLA_MAPINFO_SEND_NUM,
+	IWPM_NLA_MAPINFO_ACK_NUM,
+	IWPM_NLA_MAPINFO_NUM_MAX
+};
+
+enum {
+	IWPM_NLA_ERR_UNSPEC = 0,
+	IWPM_NLA_ERR_SEQ,
+	IWPM_NLA_ERR_CODE,
+	IWPM_NLA_ERR_MAX
+};
+
+/*
+ * Local service operations:
+ *   RESOLVE - The client requests the local service to resolve a path.
+ *   SET_TIMEOUT - The local service requests the client to set the timeout.
+ *   IP_RESOLVE - The client requests the local service to resolve an IP to GID.
+ */
+enum {
+	RDMA_NL_LS_OP_RESOLVE = 0,
+	RDMA_NL_LS_OP_SET_TIMEOUT,
+	RDMA_NL_LS_OP_IP_RESOLVE,
+	RDMA_NL_LS_NUM_OPS
+};
+
+/* Local service netlink message flags */
+#define RDMA_NL_LS_F_ERR	0x0100	/* Failed response */
+
+/*
+ * Local service resolve operation family header.
+ * The layout for the resolve operation:
+ *    nlmsg header
+ *    family header
+ *    attributes
+ */
+
+/*
+ * Local service path use:
+ * Specify how the path(s) will be used.
+ *   ALL - For connected CM operation (6 pathrecords)
+ *   UNIDIRECTIONAL - For unidirectional UD (1 pathrecord)
+ *   GMP - For miscellaneous GMP like operation (at least 1 reversible
+ *         pathrecord)
+ */
+enum {
+	LS_RESOLVE_PATH_USE_ALL = 0,
+	LS_RESOLVE_PATH_USE_UNIDIRECTIONAL,
+	LS_RESOLVE_PATH_USE_GMP,
+	LS_RESOLVE_PATH_USE_MAX
+};
+
+#define LS_DEVICE_NAME_MAX 64
+
+struct rdma_ls_resolve_header {
+	__u8 device_name[LS_DEVICE_NAME_MAX];
+	__u8 port_num;
+	__u8 path_use;
+};
+
+struct rdma_ls_ip_resolve_header {
+	__u32 ifindex;
+};
+
+/* Local service attribute type */
+#define RDMA_NLA_F_MANDATORY	(1 << 13)
+#define RDMA_NLA_TYPE_MASK	(~(NLA_F_NESTED | NLA_F_NET_BYTEORDER | \
+				  RDMA_NLA_F_MANDATORY))
+
+/*
+ * Local service attributes:
+ *   Attr Name       Size                       Byte order
+ *   -----------------------------------------------------
+ *   PATH_RECORD     struct ib_path_rec_data
+ *   TIMEOUT         u32                        cpu
+ *   SERVICE_ID      u64                        cpu
+ *   DGID            u8[16]                     BE
+ *   SGID            u8[16]                     BE
+ *   TCLASS          u8
+ *   PKEY            u16                        cpu
+ *   QOS_CLASS       u16                        cpu
+ *   IPV4            u32                        BE
+ *   IPV6            u8[16]                     BE
+ */
+enum {
+	LS_NLA_TYPE_UNSPEC = 0,
+	LS_NLA_TYPE_PATH_RECORD,
+	LS_NLA_TYPE_TIMEOUT,
+	LS_NLA_TYPE_SERVICE_ID,
+	LS_NLA_TYPE_DGID,
+	LS_NLA_TYPE_SGID,
+	LS_NLA_TYPE_TCLASS,
+	LS_NLA_TYPE_PKEY,
+	LS_NLA_TYPE_QOS_CLASS,
+	LS_NLA_TYPE_IPV4,
+	LS_NLA_TYPE_IPV6,
+	LS_NLA_TYPE_MAX
+};
+
+/* Local service DGID/SGID attribute: big endian */
+struct rdma_nla_ls_gid {
+	__u8		gid[16];
+};
+
+#endif /* _UAPI_RDMA_NETLINK_H */
diff --git a/ibacm/src/acm.c b/ibacm/src/acm.c
index cc7dd065f69c..5f4068f619b4 100644
--- a/ibacm/src/acm.c
+++ b/ibacm/src/acm.c
@@ -61,9 +61,6 @@
 #include <ccan/list.h>
 #include "acm_mad.h"
 #include "acm_util.h"
-#if !defined(RDMA_NL_LS_F_ERR)
-	#include "acm_netlink.h"
-#endif
 
 #define src_out     data[0]
 #define src_index   data[1]
diff --git a/ibacm/src/acm_netlink.h b/ibacm/src/acm_netlink.h
deleted file mode 100644
index 867ae8c838fc..000000000000
diff --git a/iwpmd/iwarp_pm.h b/iwpmd/iwarp_pm.h
index b5a5a457a423..fc09e4fd752a 100644
--- a/iwpmd/iwarp_pm.h
+++ b/iwpmd/iwarp_pm.h
@@ -53,7 +53,7 @@
 #include <syslog.h>
 #include <netlink/msg.h>
 #include <ccan/list.h>
-#include "iwpm_netlink.h"
+#include <rdma/rdma_netlink.h>
 
 #define IWARP_PM_PORT          3935
 #define IWARP_PM_VER_SHIFT     6
diff --git a/iwpmd/iwarp_pm_common.c b/iwpmd/iwarp_pm_common.c
index 58b1089a1998..941e0406ade7 100644
--- a/iwpmd/iwarp_pm_common.c
+++ b/iwpmd/iwarp_pm_common.c
@@ -33,11 +33,6 @@
 
 #include "iwarp_pm.h"
 
-/* Necessary only for SLES11 */
-#if !defined (NETLINK_RDMA)
-	#define NETLINK_RDMA	        20
-#endif
-
 /* iwpm config params */
 static const char * iwpm_param_names[IWPM_PARAM_NUM] =
 	{ "nl_sock_rbuf_size" };
diff --git a/iwpmd/iwarp_pm_server.c b/iwpmd/iwarp_pm_server.c
index ab90c6c4b077..ef541c8175ed 100644
--- a/iwpmd/iwarp_pm_server.c
+++ b/iwpmd/iwarp_pm_server.c
@@ -1214,8 +1214,8 @@ static int init_iwpm_clients(__u32 iwarp_clients[])
 {
 	int client_num = 2;
 
-	iwarp_clients[0] = RDMA_NL_NES;
-	iwarp_clients[1] = RDMA_NL_C4IW;
+	iwarp_clients[0] = RDMA_NL_IWCM;
+	iwarp_clients[1] = RDMA_NL_IWCM+1; /* Legacy RDMA_NL_C4IW for old kernels */
 
 	return client_num;
 }
diff --git a/iwpmd/iwpm_netlink.h b/iwpmd/iwpm_netlink.h
deleted file mode 100644
index 0edcb620de99..000000000000
--
2.1.4
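
For readers following the discussion of the client enum, here is a small sketch in C of how the type-encoding macros in the quoted header pack a client and an op into one message type. The three macros and the enum values are copied from the patch above; only the subset needed for the demonstration is reproduced, and the helper names iwpm_remote_info_type() and legacy_c4iw_client() are hypothetical, added purely for illustration.

```c
#include <assert.h>

/* Subset of the client enum from the quoted header. */
enum {
	RDMA_NL_RDMA_CM = 1,
	RDMA_NL_IWCM,
	RDMA_NL_RSVD,
	RDMA_NL_LS
};

/* Subset of the iwarp port mapper op-codes from the quoted header. */
enum {
	RDMA_NL_IWPM_REG_PID = 0,
	RDMA_NL_IWPM_ADD_MAPPING,
	RDMA_NL_IWPM_QUERY_MAPPING,
	RDMA_NL_IWPM_REMOVE_MAPPING,
	RDMA_NL_IWPM_REMOTE_INFO
};

/* The client sits in bits 10..15 of the type, the op in bits 0..9. */
#define RDMA_NL_GET_CLIENT(type) ((type & (((1 << 6) - 1) << 10)) >> 10)
#define RDMA_NL_GET_OP(type) (type & ((1 << 10) - 1))
#define RDMA_NL_GET_TYPE(client, op) ((client << 10) + op)

/* Hypothetical helper: message type for an IWCM "remote info" request. */
static int iwpm_remote_info_type(void)
{
	return RDMA_NL_GET_TYPE(RDMA_NL_IWCM, RDMA_NL_IWPM_REMOTE_INFO);
}

/* Hypothetical helper: the second client slot used by iwpmd in the patch. */
static int legacy_c4iw_client(void)
{
	return RDMA_NL_IWCM + 1;
}
```

In this header RDMA_NL_IWCM + 1 lands on the RDMA_NL_RSVD slot, which is consistent with the patch comment about keeping the legacy RDMA_NL_C4IW value working on old kernels.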


* Re: [PATCH 11/12] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-28 18:51 UTC
  To: Keith Busch
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <20161028160145.GC5621@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 2248 bytes --]

On 10/28/2016 08:51 AM, Keith Busch wrote:
> On Wed, Oct 26, 2016 at 03:56:04PM -0700, Bart Van Assche wrote:
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 7bb73ba..b662416 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -205,7 +205,7 @@ void nvme_requeue_req(struct request *req)
>>
>>  	blk_mq_requeue_request(req, false);
>>  	spin_lock_irqsave(req->q->queue_lock, flags);
>> -	if (!blk_queue_stopped(req->q))
>> +	if (!blk_mq_queue_stopped(req->q))
>>  		blk_mq_kick_requeue_list(req->q);
>>  	spin_unlock_irqrestore(req->q->queue_lock, flags);
>>  }
>> @@ -2079,10 +2079,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>>
>>  	mutex_lock(&ctrl->namespaces_mutex);
>>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>> -		spin_lock_irq(ns->queue->queue_lock);
>> -		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
>> -		spin_unlock_irq(ns->queue->queue_lock);
>> -
>>  		blk_mq_cancel_requeue_work(ns->queue);
>>  		blk_mq_stop_hw_queues(ns->queue);
>
> There's actually a reason the queue stoppage is using a different flag
> than blk_mq_queue_stopped: kicking the queue starts stopped queues.
> The driver has to ensure the requeue work can't be kicked prior to
> cancelling the current requeue work. Once we know requeue work isn't
> running and can't restart again, then we're safe to stop the hw queues.
>
> It's a pretty obscure condition, requiring the controller post an
> error completion at the same time the driver decides to reset the
> controller. Here's the sequence with the wrong outcome:
>
>  CPU A                   CPU B
>  -----                   -----
> nvme_stop_queues         nvme_requeue_req
>  blk_mq_stop_hw_queues    if (blk_mq_queue_stopped) <- returns false
>   blk_mq_stop_hw_queue     blk_mq_kick_requeue_list
>                          blk_mq_requeue_work
>                           blk_mq_start_hw_queues

Hello Keith,

I think it is wrong that kicking the requeue list starts stopped queues 
because this makes it impossible to stop request processing without 
setting an additional flag next to BLK_MQ_S_STOPPED. Can you have a look 
at the attached two patches? These patches survive my dm-multipath and 
SCSI tests.

Thanks,

Bart.
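
The difference between "starting" and "running" a queue that this exchange turns on can be modelled with a toy in C. This is a deliberately simplified userspace model, not the blk-mq implementation: struct toy_queue and every toy_* function are hypothetical, and locking and workqueue scheduling are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a queue: a stopped flag plus counters. */
struct toy_queue {
	bool stopped;
	int pending;	/* requests waiting to be dispatched */
	int dispatched;	/* requests handed to the "hardware" */
};

/* "Run": dispatch pending work, but only if the queue is not stopped. */
static void toy_run(struct toy_queue *q)
{
	if (q->stopped)
		return;
	q->dispatched += q->pending;
	q->pending = 0;
}

/* "Start": clear the stopped flag, then run. */
static void toy_start(struct toy_queue *q)
{
	q->stopped = false;
	toy_run(q);
}

static void toy_stop(struct toy_queue *q)
{
	q->stopped = true;
}

/* Requeue work using the start variant: it undoes an earlier stop. */
static void toy_requeue_work_start(struct toy_queue *q)
{
	q->pending++;
	toy_start(q);
}

/* Requeue work using the run variant: a stopped queue stays stopped. */
static void toy_requeue_work_run(struct toy_queue *q)
{
	q->pending++;
	toy_run(q);
}
```

If requeue work scheduled before a stop executes after it, the start variant silently restarts the queue, which is the outcome in the CPU A / CPU B sequence quoted above; with the run variant the request simply stays pending until something starts the queue explicitly.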



[-- Attachment #2: 0001-block-Avoid-that-requeueing-starts-stopped-queues.patch --]
[-- Type: text/x-patch, Size: 3538 bytes --]

From e93799f726485a3eeee98837c992c5c0068d7180 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Fri, 28 Oct 2016 10:48:58 -0700
Subject: [PATCH 1/2] block: Avoid that requeueing starts stopped queues

Since blk_mq_requeue_work() starts stopped queues and since
execution of this function can be scheduled after a queue has
been stopped it is not possible to stop queues without using
an additional state variable to track whether or not the queue
has been stopped. Hence modify blk_mq_requeue_work() such that it
does not start stopped queues. My conclusion after a review of
the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list()
callers is as follows:
* In the dm driver starting and stopping queues should only happen
  if __dm_suspend() or __dm_resume() is called and not if the
  requeue list is processed.
* In the SCSI core queue stopping and starting should only be
  performed by the scsi_internal_device_block() and
  scsi_internal_device_unblock() functions but not by any other
  function.
* In the NVMe core only the functions that call
  blk_mq_start_stopped_hw_queues() explicitly should start stopped
  queues.
* A blk_mq_start_stopped_hw_queues() call must be added in the
  xen-blkfront driver in its blkif_recover() function.
---
 block/blk-mq.c               | 6 +-----
 drivers/block/xen-blkfront.c | 1 +
 drivers/md/dm-rq.c           | 7 +------
 drivers/scsi/scsi_lib.c      | 1 -
 4 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a49b8af..24dfd0d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -528,11 +528,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		blk_mq_insert_request(rq, false, false, false);
 	}
 
-	/*
-	 * Use the start variant of queue running here, so that running
-	 * the requeue work will kick stopped queues.
-	 */
-	blk_mq_start_hw_queues(q);
+	blk_mq_run_hw_queues(q, false);
 }
 
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 1ca702d..a3e1727 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info *info)
 		BUG_ON(req->nr_phys_segments > segs);
 		blk_mq_requeue_request(req, false);
 	}
+	blk_mq_start_stopped_hwqueues(info->rq);
 	blk_mq_kick_requeue_list(info->rq);
 
 	while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 107ed19..b951ae83 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -326,12 +326,7 @@ static void dm_old_requeue_request(struct request *rq)
 
 static void __dm_mq_kick_requeue_list(struct request_queue *q, unsigned long msecs)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (!blk_mq_queue_stopped(q))
-		blk_mq_delay_kick_requeue_list(q, msecs);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_mq_delay_kick_requeue_list(q, msecs);
 }
 
 void dm_mq_kick_requeue_list(struct mapped_device *md)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 4cddaff..94f54ac 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1939,7 +1939,6 @@ static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 out:
 	switch (ret) {
 	case BLK_MQ_RQ_QUEUE_BUSY:
-		blk_mq_stop_hw_queue(hctx);
 		if (atomic_read(&sdev->device_busy) == 0 &&
 		    !scsi_device_blocked(sdev))
 			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
-- 
2.10.1


[-- Attachment #3: 0002-blk-mq-Remove-blk_mq_cancel_requeue_work.patch --]
[-- Type: text/x-patch, Size: 2638 bytes --]

From 47eec3bdcf4b673e3ab606543cb3acdf7f4de593 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Fri, 28 Oct 2016 10:50:04 -0700
Subject: [PATCH 2/2] blk-mq: Remove blk_mq_cancel_requeue_work()

Since blk_mq_requeue_work() no longer restarts stopped queues
canceling requeue work is no longer needed to prevent that a
stopped queue would be restarted. Hence remove this function.
---
 block/blk-mq.c           | 6 ------
 drivers/md/dm-rq.c       | 2 --
 drivers/nvme/host/core.c | 1 -
 include/linux/blk-mq.h   | 1 -
 4 files changed, 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 24dfd0d..1aa79e5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -557,12 +557,6 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 }
 EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
 
-void blk_mq_cancel_requeue_work(struct request_queue *q)
-{
-	cancel_delayed_work_sync(&q->requeue_work);
-}
-EXPORT_SYMBOL_GPL(blk_mq_cancel_requeue_work);
-
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
 	kblockd_schedule_delayed_work(&q->requeue_work, 0);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index b951ae83..7f426ab 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -102,8 +102,6 @@ static void dm_mq_stop_queue(struct request_queue *q)
 	if (blk_mq_queue_stopped(q))
 		return;
 
-	/* Avoid that requeuing could restart the queue. */
-	blk_mq_cancel_requeue_work(q);
 	blk_mq_stop_hw_queues(q);
 	/* Wait until dm_mq_queue_rq() has finished. */
 	blk_mq_quiesce_queue(q);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d6ab9a0..a67e815 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2075,7 +2075,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		struct request_queue *q = ns->queue;
 
-		blk_mq_cancel_requeue_work(q);
 		blk_mq_stop_hw_queues(q);
 		blk_mq_quiesce_queue(q);
 	}
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 76f6319..35a0af5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -221,7 +221,6 @@ void __blk_mq_end_request(struct request *rq, int error);
 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 				bool kick_requeue_list);
-void blk_mq_cancel_requeue_work(struct request_queue *q);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
 void blk_mq_abort_requeue_list(struct request_queue *q);
-- 
2.10.1


^ permalink raw reply related

* Re: [PATCH 2/3] iopmem : Add a block device driver for PCIe attached IO memory.
From: Logan Gunthorpe @ 2016-10-28 19:22 UTC (permalink / raw)
  To: Christoph Hellwig, Stephen Bates
  Cc: jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	sbates-Rgftl6RXld5BDgjK7y7TUQ, haggaie-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-y27Ovi1pjclAfugRpC6u6w, corbet-T1hC0tSOHrs,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	jim.macdonald-FgSLVYC75IpWk0Htik3J/w,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, axboe-b10kYP2dOMg
In-Reply-To: <20161028064556.GA3231-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>

Hi Christoph,

Thanks so much for the detailed review of the code, even though, by the
sounds of things, we will be moving to device dax and most of this is
moot. Still, it's great to get some feedback and learn a few things.

I've given some responses below.

On 28/10/16 12:45 AM, Christoph Hellwig wrote:
>> + * This driver is heavily based on drivers/block/pmem.c.
>> + * Copyright (c) 2014, Intel Corporation.
>> + * Copyright (C) 2007 Nick Piggin
>> + * Copyright (C) 2007 Novell Inc.
> 
> Is there anything left of it actually?  I didn't spot anything
> obvious.  Never mind that we don't have a file with that name anymore :)

Yes, actually there are still a lot of similarities with the current
pmem.c. Though, yes, the path was an oversight. Some of this code is
getting pretty old (it started from an out-of-tree version of pmem.c)
and we've tried our best to track as many of the changes to pmem.c
as possible. This proved to be difficult. Note: this is now the nvdimm
pmem and not the dax pmem (drivers/nvdimm/pmem.c).

>> +  /*
>> +   * We can only access the iopmem device with full 32-bit word
>> +   * accesses which cannot be guaranteed by the regular memcpy
>> +   */
> 
> Odd comment formatting. 

Oops. I'm surprised checkpatch didn't pick up on that.

> 
>> +static void memcpy_from_iopmem(void *dst, const void *src, size_t sz)
>> +{
>> +	u64 *wdst = dst;
>> +	const u64 *wsrc = src;
>> +	u64 tmp;
>> +
>> +	while (sz >= sizeof(*wdst)) {
>> +		*wdst++ = *wsrc++;
>> +		sz -= sizeof(*wdst);
>> +	}
>> +
>> +	if (!sz)
>> +		return;
>> +
>> +	tmp = *wsrc;
>> +	memcpy(wdst, &tmp, sz);
>> +}
> 
> And then we do a memcpy here anyway.  And no volatile whatsoever, so
> the compiler could do anything to it.  I definitely feel a bit uneasy
> about having this in the driver as well.  Can we define the exact
> semantics for this and define it by the system, possibly in an arch
> specific way?

Yeah, you're right. We should have reviewed this function a bit more.
Anyway, I'd be interested in learning a better approach to forcing a
copy from a mapped BAR with larger widths.
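
For discussion, a minimal userspace sketch of the idea: every load from
the source goes through a volatile 64-bit pointer, and the tail is
handled by one more full-width load whose extra bytes are discarded. The
name copy_from_wide() is hypothetical. The sketch assumes the source is
8-byte aligned and padded to a multiple of 8 bytes, and it assumes a
target where a volatile 64-bit load is a single access. It is not a
replacement for the kernel's I/O accessors, which is where an in-tree
version would more likely belong.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy sz bytes from src to dst using only full 64-bit loads from src.
 * Assumption: src is 8-byte aligned and its backing region is padded to a
 * multiple of 8 bytes, so the final full-word load stays inside the region.
 */
void copy_from_wide(void *dst, const volatile uint64_t *src, size_t sz)
{
	unsigned char *out = dst;
	uint64_t tmp;

	while (sz >= sizeof(tmp)) {
		tmp = *src++;                   /* volatile load, not elided */
		memcpy(out, &tmp, sizeof(tmp)); /* dst is ordinary memory */
		out += sizeof(tmp);
		sz -= sizeof(tmp);
	}

	if (sz) {
		tmp = *src;                     /* one more full-width load */
		memcpy(out, &tmp, sz);          /* keep only the requested bytes */
	}
}
```

The memcpy() calls only ever touch the local temporary and the
destination, so the compiler is free to optimise them without changing
how the source is accessed.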


>> +static void iopmem_do_bvec(struct iopmem_device *iopmem, struct page *page,
>> +			   unsigned int len, unsigned int off, bool is_write,
>> +			   sector_t sector)
>> +{
>> +	phys_addr_t iopmem_off = sector * 512;
>> +	void *iopmem_addr = iopmem->virt_addr + iopmem_off;
>> +
>> +	if (!is_write) {
>> +		read_iopmem(page, off, iopmem_addr, len);
>> +		flush_dcache_page(page);
>> +	} else {
>> +		flush_dcache_page(page);
>> +		write_iopmem(iopmem_addr, page, off, len);
>> +	}
> 
> How about moving the address and offset calculation as well as the
> cache flushing into read_iopmem/write_iopmem and removing this function?

Could do. This was copied from the existing pmem.c and once the bad_pmem
stuff was stripped out this function became relatively simple.


> 
>> +static blk_qc_t iopmem_make_request(struct request_queue *q, struct bio *bio)
>> +{
>> +	struct iopmem_device *iopmem = q->queuedata;
>> +	struct bio_vec bvec;
>> +	struct bvec_iter iter;
>> +
>> +	bio_for_each_segment(bvec, bio, iter) {
>> +		iopmem_do_bvec(iopmem, bvec.bv_page, bvec.bv_len,
>> +			    bvec.bv_offset, op_is_write(bio_op(bio)),
>> +			    iter.bi_sector);
> 
> op_is_write just checks the data direction.  I'd feel much more
> comfortable with a switch on the op, e.g.

That makes sense. This was also copied from pmem.c, so this same change
may make sense there too.
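
A rough userspace sketch of what a switch on the op could look like.
The enum model_op values, struct model_dev and model_handle_op() are
hypothetical and only stand in for the kernel's REQ_OP_* codes and the
driver's real data path. The point is that an operation the driver does
not implement is rejected, whereas a direction-only test would treat it
as either a read or a write.

```c
#include <errno.h>

/* Hypothetical operation codes; the kernel's own are the REQ_OP_* values. */
enum model_op {
	MODEL_OP_READ,
	MODEL_OP_WRITE,
	MODEL_OP_DISCARD,
	MODEL_OP_FLUSH,
};

/* Hypothetical per-device counters, only here to make the effect visible. */
struct model_dev {
	unsigned int reads;
	unsigned int writes;
};

/*
 * Dispatch on the exact operation. Anything the driver does not implement
 * is rejected instead of being treated as a read or a write.
 */
int model_handle_op(struct model_dev *dev, enum model_op op)
{
	switch (op) {
	case MODEL_OP_READ:
		dev->reads++;
		return 0;
	case MODEL_OP_WRITE:
		dev->writes++;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```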


>> +static long iopmem_direct_access(struct block_device *bdev, sector_t sector,
>> +			       void **kaddr, pfn_t *pfn, long size)
>> +{
>> +	struct iopmem_device *iopmem = bdev->bd_queue->queuedata;
>> +	resource_size_t offset = sector * 512;
>> +
>> +	if (!iopmem)
>> +		return -ENODEV;
> 
> I don't think this can ever happen, can it?

Yes, I think now that's the case. This is probably a holdover from a
previous version.

> Just use ida_simple_get/ida_simple_remove instead to take care
> of the locking and preloading, and get rid of these two functions.

Thanks, noted. That would be much better. I never found a simple example
of that when I was looking, though I expected there should have been.

> 
>> +static int iopmem_attach_disk(struct iopmem_device *iopmem)
>> +{
>> +	struct gendisk *disk;
>> +	int nid = dev_to_node(iopmem->dev);
>> +	struct request_queue *q = iopmem->queue;
>> +
>> +	blk_queue_write_cache(q, true, true);
> 
> You don't handle flush commands or the fua bit in make_request, so
> this setting seems wrong.

Yup, ok. I'm afraid this is a case of copying without complete
comprehension.

> 
>> +	int err = 0;
>> +	int nid = dev_to_node(&pdev->dev);
>> +
>> +	if (pci_enable_device_mem(pdev) < 0) {
> 
> propagate the actual error code, please.

Hmm, yup. Not sure why that was missed.
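
For reference, the pattern being asked for, as a small userspace sketch.
model_enable() is a hypothetical stand-in for the enable call, and the
-ENODEV in the "flattened" variant is an assumed placeholder, since the
quoted hunk does not show what the driver returned on failure.

```c
#include <errno.h>

/* Hypothetical stand-in for a call such as pci_enable_device_mem(). */
int model_enable(int simulate_error)
{
	return simulate_error ? -simulate_error : 0;
}

/* Any failure is flattened into one hard-coded code (assumed -ENODEV). */
int model_probe_flattened(int simulate_error)
{
	if (model_enable(simulate_error) < 0)
		return -ENODEV;
	return 0;
}

/* Keep the callee's return value and hand it back unchanged. */
int model_probe_propagating(int simulate_error)
{
	int err = model_enable(simulate_error);

	if (err < 0)
		return err;
	return 0;
}
```

With a simulated EIO failure, the first variant reports -ENODEV and the
second reports -EIO, which is the information the caller actually needs.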

Thanks,

Logan

^ permalink raw reply

* Re: [PATCH rdma-core v2 4/4] redhat/spec: build split rpm packages
From: Jason Gunthorpe @ 2016-10-28 20:57 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161028172503.GA28451-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Fri, Oct 28, 2016 at 11:25:03AM -0600, Jason Gunthorpe wrote:

> > Nah, I think things look good, and we can always keep tweaking as needed,
> > this is a solid update to work on top of.
> 
> Okay, I'll patch in the changes from this discussion and send the
> pull.

I added two commits:

https://github.com/linux-rdma/rdma-core/pull/30

This bit seems important:

--- a/redhat/rdma-core.spec
+++ b/redhat/rdma-core.spec
@@ -267,7 +267,7 @@ install -D -m0644 redhat/rdma.fixup-mtrr.awk %{buildroot}%{_libexecdir}/rdma-fix
 install -D -m0755 redhat/rdma.mlx4-setup.sh %{buildroot}%{_libexecdir}/mlx4-setup.sh
 
 # ibacm
-%{buildroot}/%{_bindir}/ib_acme -D . -O
+bin/ib_acme -D . -O
 install -D -m0644 ibacm_opts.cfg %{buildroot}%{_sysconfdir}/rdma/
 install -D -m0644 redhat/ibacm.service %{buildroot}%{_unitdir}/

^ permalink raw reply

* Re: [PATCH 11/12] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Keith Busch @ 2016-10-28 21:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <655c62ca-5da8-3c8f-9e56-016b7d518267@sandisk.com>

On Fri, Oct 28, 2016 at 11:51:35AM -0700, Bart Van Assche wrote:
> I think it is wrong that kicking the requeue list starts stopped queues
> because this makes it impossible to stop request processing without setting
> an additional flag next to BLK_MQ_S_STOPPED. Can you have a look at the
> attached two patches? These patches survive my dm-multipath and SCSI tests.

Hi Bart,

These look good to me, and were successful in my NVMe tests.

Thanks,
Keith


> From e93799f726485a3eeee98837c992c5c0068d7180 Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@sandisk.com>
> Date: Fri, 28 Oct 2016 10:48:58 -0700
> Subject: [PATCH 1/2] block: Avoid that requeueing starts stopped queues
> 
> Since blk_mq_requeue_work() starts stopped queues and since
> execution of this function can be scheduled after a queue has
> been stopped it is not possible to stop queues without using
> an additional state variable to track whether or not the queue
> has been stopped. Hence modify blk_mq_requeue_work() such that it
> does not start stopped queues. My conclusion after a review of
> the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list()
> callers is as follows:
> * In the dm driver starting and stopping queues should only happen
>   if __dm_suspend() or __dm_resume() is called and not if the
>   requeue list is processed.
> * In the SCSI core queue stopping and starting should only be
>   performed by the scsi_internal_device_block() and
>   scsi_internal_device_unblock() functions but not by any other
>   function.
> * In the NVMe core only the functions that call
>   blk_mq_start_stopped_hw_queues() explicitly should start stopped
>   queues.
> * A blk_mq_start_stopped_hw_queues() call must be added in the
>   xen-blkfront driver in its blkif_recover() function.
> ---
>  block/blk-mq.c               | 6 +-----
>  drivers/block/xen-blkfront.c | 1 +
>  drivers/md/dm-rq.c           | 7 +------
>  drivers/scsi/scsi_lib.c      | 1 -
>  4 files changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a49b8af..24dfd0d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -528,11 +528,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  		blk_mq_insert_request(rq, false, false, false);
>  	}
>  
> -	/*
> -	 * Use the start variant of queue running here, so that running
> -	 * the requeue work will kick stopped queues.
> -	 */
> -	blk_mq_start_hw_queues(q);
> +	blk_mq_run_hw_queues(q, false);
>  }
>  
>  void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 1ca702d..a3e1727 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info *info)
>  		BUG_ON(req->nr_phys_segments > segs);
>  		blk_mq_requeue_request(req, false);
>  	}
> +	blk_mq_start_stopped_hwqueues(info->rq);
>  	blk_mq_kick_requeue_list(info->rq);
>  
>  	while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 107ed19..b951ae83 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -326,12 +326,7 @@ static void dm_old_requeue_request(struct request *rq)
>  
>  static void __dm_mq_kick_requeue_list(struct request_queue *q, unsigned long msecs)
>  {
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(q->queue_lock, flags);
> -	if (!blk_mq_queue_stopped(q))
> -		blk_mq_delay_kick_requeue_list(q, msecs);
> -	spin_unlock_irqrestore(q->queue_lock, flags);
> +	blk_mq_delay_kick_requeue_list(q, msecs);
>  }
>  
>  void dm_mq_kick_requeue_list(struct mapped_device *md)
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 4cddaff..94f54ac 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1939,7 +1939,6 @@ static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>  out:
>  	switch (ret) {
>  	case BLK_MQ_RQ_QUEUE_BUSY:
> -		blk_mq_stop_hw_queue(hctx);
>  		if (atomic_read(&sdev->device_busy) == 0 &&
>  		    !scsi_device_blocked(sdev))
>  			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
> -- 
> 2.10.1
> 

> From 47eec3bdcf4b673e3ab606543cb3acdf7f4de593 Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@sandisk.com>
> Date: Fri, 28 Oct 2016 10:50:04 -0700
> Subject: [PATCH 2/2] blk-mq: Remove blk_mq_cancel_requeue_work()
> 
> Since blk_mq_requeue_work() no longer restarts stopped queues
> canceling requeue work is no longer needed to prevent that a
> stopped queue would be restarted. Hence remove this function.
> ---
>  block/blk-mq.c           | 6 ------
>  drivers/md/dm-rq.c       | 2 --
>  drivers/nvme/host/core.c | 1 -
>  include/linux/blk-mq.h   | 1 -
>  4 files changed, 10 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 24dfd0d..1aa79e5 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -557,12 +557,6 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
>  }
>  EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
>  
> -void blk_mq_cancel_requeue_work(struct request_queue *q)
> -{
> -	cancel_delayed_work_sync(&q->requeue_work);
> -}
> -EXPORT_SYMBOL_GPL(blk_mq_cancel_requeue_work);
> -
>  void blk_mq_kick_requeue_list(struct request_queue *q)
>  {
>  	kblockd_schedule_delayed_work(&q->requeue_work, 0);
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index b951ae83..7f426ab 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -102,8 +102,6 @@ static void dm_mq_stop_queue(struct request_queue *q)
>  	if (blk_mq_queue_stopped(q))
>  		return;
>  
> -	/* Avoid that requeuing could restart the queue. */
> -	blk_mq_cancel_requeue_work(q);
>  	blk_mq_stop_hw_queues(q);
>  	/* Wait until dm_mq_queue_rq() has finished. */
>  	blk_mq_quiesce_queue(q);
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d6ab9a0..a67e815 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2075,7 +2075,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>  		struct request_queue *q = ns->queue;
>  
> -		blk_mq_cancel_requeue_work(q);
>  		blk_mq_stop_hw_queues(q);
>  		blk_mq_quiesce_queue(q);
>  	}
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 76f6319..35a0af5 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -221,7 +221,6 @@ void __blk_mq_end_request(struct request *rq, int error);
>  void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
>  void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
>  				bool kick_requeue_list);
> -void blk_mq_cancel_requeue_work(struct request_queue *q);
>  void blk_mq_kick_requeue_list(struct request_queue *q);
>  void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
>  void blk_mq_abort_requeue_list(struct request_queue *q);
> -- 
> 2.10.1

^ permalink raw reply

* Re: [PATCH rdma-core v2 4/4] redhat/spec: build split rpm packages
From: Jarod Wilson @ 2016-10-28 21:55 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161028172503.GA28451-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Fri, Oct 28, 2016 at 11:25:03AM -0600, Jason Gunthorpe wrote:
> On Fri, Oct 28, 2016 at 01:11:47PM -0400, Jarod Wilson wrote:
> > On Thu, Oct 27, 2016 at 03:10:59PM -0600, Jason Gunthorpe wrote:
> > > On Thu, Oct 20, 2016 at 11:33:57AM -0400, Jarod Wilson wrote:
> > > > @@ -7,10 +7,11 @@ Summary: RDMA core userspace libraries and daemons
> > > >  #  providers/ipathverbs/ Dual licensed using a BSD license with an extra patent clause
> > > >  #  providers/rxe/ Incorporates code from ipathverbs and contains the patent clause
> > > >  #  providers/hfi1verbs Uses the 3 Clause BSD license
> > > > -License: (GPLv2 or BSD) and (GPLv2 or PathScale-BSD)
> > > > +License: GPLv2 or BSD
> > > 
> > > Is this Ok? The Fedora guidelines I read suggested the PathScale
> > > license would need to be assigned a short tag, and I'd be surprised if
> > > 'BSD' is the right tag due to the patent stuff..
> > 
> > Our standalone libipathverbs has just "GPLv2 or BSD",
> 
> I suspect that was a mistake, the difference in the pathscale license
> is subtle and several other people assumed it was the canonical 'BSD'
> text...

I sent something off to the Fedora legal mailing list, we'll see what they
have to say. I didn't find anything interesting in the initial package
review, nobody questioned the license there, but that could easily have
been an oversight.

-- 
Jarod Wilson
jarod-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org


^ permalink raw reply

* Re: [PATCH] build: Fix build script to use correct cmake cmd
From: Jason Gunthorpe @ 2016-10-28 22:09 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Dennis Dalessandro,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161025101200.GP25013-2ukJVAZIZ/Y@public.gmane.org>

On Tue, Oct 25, 2016 at 01:12:00PM +0300, Leon Romanovsky wrote:

> > Maybe this is a good time to ask if anyone is interested in the docker
> > stuff I have - eg should I make it pushable? It is easy to use, but
> > you need to have docker installed.
> 
> I would be happy to get it and be more confident in my local tests.

Okay, let me look at it some more, I think it could be streamlined a
bit.

> > The docker script is able to run almost-travis locally, as well as do
> > clean package builds for all distros.
> 
> It looks like it goes beyond rdma-core definition. Do we want to put it
> separately or anyway integrate into main library?

It is a 500 line script, I'd keep it internal. The latest version has
learned to use different spec files depending on the distro build too, so
there is a bunch of subtle integration.

Jason

^ permalink raw reply

* RE: [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device
From: Hefty, Sean @ 2016-10-28 22:53 UTC (permalink / raw)
  To: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Doug Ledford, Jason Gunthorpe, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon, Leon Romanovsky
In-Reply-To: <1477579398-6875-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

> The current code creates an IDR per type. Since types are currently
> common for all vendors and known in advance, this was good enough.
> However, the proposed ioctl based infrastructure allows each vendor
> to declare only some of the common types and declare its own specific
> types.
> 
> Thus, we decided to implement IDR to be per device and refactor it to
> use a new file.

I think this needs to be more abstract.  I would consider introducing
the concept of an 'ioctl provider', with the idr per ioctl provider.
You could then make each ib_device an ioctl provider.  (Just embed the
structure).  I believe this will be necessary to support the rdma_cm,
ib_cm, as well as devices that export different sets of ioctls, where an
ib_device isn't necessarily available.

Essentially, I would treat plugging into the uABI independent from
plugging into the kernel verbs API.  Otherwise, I think we'll end up with
multiple ioctl 'frameworks'.
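
A rough userspace sketch of the embedding idea, to show the shape rather
than propose an interface. Everything here is hypothetical: struct
ioctl_provider, struct model_ib_device and the provider_* helpers are
illustration only, and the fixed-size pointer array merely stands in for
a per-provider idr.

```c
#include <stddef.h>

#define MODEL_MAX_OBJS 8

/* Hypothetical provider: owns its own id space, independent of verbs. */
struct ioctl_provider {
	void *objs[MODEL_MAX_OBJS];   /* stand-in for a per-provider idr */
};

/* Hypothetical device that becomes a provider by embedding one. */
struct model_ib_device {
	const char *name;
	struct ioctl_provider provider;
};

/* Same idea as the kernel's container_of(), spelled out for userspace. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Store obj in the first free slot; returns the id, or -1 when full. */
int provider_add(struct ioctl_provider *p, void *obj)
{
	for (int id = 0; id < MODEL_MAX_OBJS; id++) {
		if (!p->objs[id]) {
			p->objs[id] = obj;
			return id;
		}
	}
	return -1;
}

/* Look an object up by id; returns NULL for unknown or out-of-range ids. */
void *provider_find(struct ioctl_provider *p, int id)
{
	if (id < 0 || id >= MODEL_MAX_OBJS)
		return NULL;
	return p->objs[id];
}

/* Recover the device from its embedded provider. */
struct model_ib_device *provider_to_device(struct ioctl_provider *p)
{
	return model_container_of(p, struct model_ib_device, provider);
}
```

Code on the uABI side would only ever see a struct ioctl_provider, so
something like the rdma_cm could supply one without having an ib_device,
while verbs devices get back to their own structure through the
container_of-style helper.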

- Sean


^ permalink raw reply

* [Bug 185551] New: rxe_rdma: RTNL: assertion failed at net/core/ethtool.c (550)
From: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r @ 2016-10-28 23:46 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

https://bugzilla.kernel.org/show_bug.cgi?id=185551

            Bug ID: 185551
           Summary: rxe_rdma: RTNL: assertion failed at net/core/ethtool.c
                    (550)
           Product: Drivers
           Version: 2.5
    Kernel Version: 4.8.5-1.el7.elrepo.x86_64
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Infiniband/RDMA
          Assignee: drivers_infiniband-rdma-ztI5WcYan/vQLgFONoPN62D2FQJk+8+b@public.gmane.org
          Reporter: v.badalyan-mR0gyJWGnJgvJsYlp49lxw@public.gmane.org
        Regression: No

#rxe_cfg start
#rxe_cfg add eno1.4004
#rxe_cfg 

[root@intel2-i5 ~]# rxe_cfg
  Name       Link  Driver  Speed  NMTU  IPv4_addr  RDEV  RMTU
  DevNet     yes   bridge
  eno1       yes   e1000e
  eno1.4003  yes   802.1Q
  eno1.4004  yes   802.1Q                          rxe0  4096  (5)
  enp2s1     yes   r8169
  ovirtmgmt  yes   bridge


[  174.931213] rxe: loaded
[  174.950852] RTNL: assertion failed at net/core/ethtool.c (550)
[  174.950882] CPU: 2 PID: 3158 Comm: sh Not tainted 4.8.5-1.el7.elrepo.x86_64
#1
[  174.950883] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  174.950884]  0000000000000286 000000007b249d8e ffff8802fbfe3a98
ffffffff81353eef
[  174.950886]  ffff88030c174000 ffff8802fbfe3b18 ffff8802fbfe3af8
ffffffff816109f7
[  174.950888]  ffffea000befc600 00000000024080c0 0000000000000001
ffffffff82164820
[  174.950889] Call Trace:
[  174.950894]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  174.950897]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  174.950900]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  174.950904]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  174.950906]  [<ffffffff8136023e>] ? vsnprintf+0x34e/0x4d0
[  174.950908]  [<ffffffffa09226bd>] rxe_port_immutable+0x2d/0x70 [rdma_rxe]
[  174.950919]  [<ffffffffa088426c>] ib_register_device+0x27c/0x500 [ib_core]
[  174.950921]  [<ffffffff811f4cee>] ? __kmalloc+0x1ee/0x200
[  174.950923]  [<ffffffff8137307d>] ? list_del+0xd/0x30
[  174.950925]  [<ffffffffa09236a1>] rxe_register_device+0x2a1/0x300 [rdma_rxe]
[  174.950927]  [<ffffffffa091b674>] rxe_add+0x544/0x5c0 [rdma_rxe]
[  174.950929]  [<ffffffffa09281a3>] rxe_net_add+0x43/0xa0 [rdma_rxe]
[  174.950931]  [<ffffffffa0928685>] rxe_param_set_add+0x65/0x148 [rdma_rxe]
[  174.950934]  [<ffffffff810a0678>] param_attr_store+0x68/0xd0
[  174.950935]  [<ffffffff8109fa0d>] module_attr_store+0x1d/0x30
[  174.950937]  [<ffffffff8129cbba>] sysfs_kf_write+0x3a/0x50
[  174.950937]  [<ffffffff8129c6e0>] kernfs_fop_write+0x120/0x1b0
[  174.950939]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  174.950941]  [<ffffffff810cebdf>] ? percpu_down_read+0x1f/0x50
[  174.950942]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  174.950944]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  174.950945]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  174.950947]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  174.950948]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  174.950967] RTNL: assertion failed at net/core/ethtool.c (550)
[  174.950985] CPU: 2 PID: 3158 Comm: sh Not tainted 4.8.5-1.el7.elrepo.x86_64
#1
[  174.950985] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  174.950986]  0000000000000286 000000007b249d8e ffff8802fbfe39d8
ffffffff81353eef
[  174.950987]  ffff88030c174000 ffff8802fbfe3a58 ffff8802fbfe3a38
ffffffff816109f7
[  174.950988]  ffff8802fbfe3a58 ffff8802fbfe3a48 ffff8802fbfe3a30
ffff8802fbfe3a98
[  174.950990] Call Trace:
[  174.950991]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  174.950993]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  174.950998]  [<ffffffffa08885d0>] ? _add_netdev_ips+0x2f0/0x380 [ib_core]
[  174.950999]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  174.951001]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  174.951003]  [<ffffffff8160ef61>] ? netdev_run_todo+0x61/0x320
[  174.951007]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  174.951008]  [<ffffffff811f462b>] ? kmem_cache_alloc_trace+0x14b/0x1b0
[  174.951012]  [<ffffffffa08863e7>] ib_cache_update+0xc7/0x350 [ib_core]
[  174.951016]  [<ffffffffa0884677>] ? ib_enum_roce_netdev+0xe7/0x100 [ib_core]
[  174.951020]  [<ffffffffa08870a1>] ib_cache_setup_one+0x271/0x400 [ib_core]
[  174.951024]  [<ffffffffa08842ce>] ib_register_device+0x2de/0x500 [ib_core]
[  174.951025]  [<ffffffff811f4cee>] ? __kmalloc+0x1ee/0x200
[  174.951026]  [<ffffffff8137307d>] ? list_del+0xd/0x30
[  174.951029]  [<ffffffffa09236a1>] rxe_register_device+0x2a1/0x300 [rdma_rxe]
[  174.951030]  [<ffffffffa091b674>] rxe_add+0x544/0x5c0 [rdma_rxe]
[  174.951032]  [<ffffffffa09281a3>] rxe_net_add+0x43/0xa0 [rdma_rxe]
[  174.951034]  [<ffffffffa0928685>] rxe_param_set_add+0x65/0x148 [rdma_rxe]
[  174.951036]  [<ffffffff810a0678>] param_attr_store+0x68/0xd0
[  174.951037]  [<ffffffff8109fa0d>] module_attr_store+0x1d/0x30
[  174.951038]  [<ffffffff8129cbba>] sysfs_kf_write+0x3a/0x50
[  174.951039]  [<ffffffff8129c6e0>] kernfs_fop_write+0x120/0x1b0
[  174.951041]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  174.951042]  [<ffffffff810cebdf>] ? percpu_down_read+0x1f/0x50
[  174.951043]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  174.951044]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  174.951045]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  174.951047]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  174.951048]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  174.951084] RTNL: assertion failed at net/core/ethtool.c (550)
[  174.951102] CPU: 2 PID: 3158 Comm: sh Not tainted 4.8.5-1.el7.elrepo.x86_64
#1
[  174.951102] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  174.951103]  0000000000000286 000000007b249d8e ffff8802fbfe39d8
ffffffff81353eef
[  174.951104]  ffff88030c174000 ffff8802fbfe3a58 ffff8802fbfe3a38
ffffffff816109f7
[  174.951105]  ffffffff81299ec6 0000000000000000 ffff8802fbfe3a10
ffffffff81299f20
[  174.951107] Call Trace:
[  174.951109]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  174.951110]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  174.951112]  [<ffffffff81299ec6>] ? kernfs_leftmost_descendant+0x36/0x50
[  174.951113]  [<ffffffff81299f20>] ? kernfs_next_descendant_post+0x40/0x50
[  174.951115]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  174.951117]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  174.951118]  [<ffffffff8129b191>] ? kernfs_create_dir_ns+0x51/0x80
[  174.951119]  [<ffffffff8129d5a0>] ? sysfs_create_dir_ns+0x40/0x90
[  174.951123]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  174.951125]  [<ffffffff811afb92>] ? kfree_const+0x22/0x30
[  174.951129]  [<ffffffffa0882d4e>] add_port+0x3e/0x480 [ib_core]
[  174.951133]  [<ffffffffa088326c>] ib_device_register_sysfs+0xdc/0x140
[ib_core]
[  174.951136]  [<ffffffffa0884360>] ib_register_device+0x370/0x500 [ib_core]
[  174.951137]  [<ffffffff811f4cee>] ? __kmalloc+0x1ee/0x200
[  174.951139]  [<ffffffff8137307d>] ? list_del+0xd/0x30
[  174.951141]  [<ffffffffa09236a1>] rxe_register_device+0x2a1/0x300 [rdma_rxe]
[  174.951142]  [<ffffffffa091b674>] rxe_add+0x544/0x5c0 [rdma_rxe]
[  174.951144]  [<ffffffffa09281a3>] rxe_net_add+0x43/0xa0 [rdma_rxe]
[  174.951146]  [<ffffffffa0928685>] rxe_param_set_add+0x65/0x148 [rdma_rxe]
[  174.951148]  [<ffffffff810a0678>] param_attr_store+0x68/0xd0
[  174.951149]  [<ffffffff8109fa0d>] module_attr_store+0x1d/0x30
[  174.951150]  [<ffffffff8129cbba>] sysfs_kf_write+0x3a/0x50
[  174.951151]  [<ffffffff8129c6e0>] kernfs_fop_write+0x120/0x1b0
[  174.951153]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  174.951154]  [<ffffffff810cebdf>] ? percpu_down_read+0x1f/0x50
[  174.951155]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  174.951156]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  174.951157]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  174.951159]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  174.951160]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  174.954198] rxe: set rxe0 active
[  174.954199] rxe: added rxe0 to eno1.4004
[  174.954384] RTNL: assertion failed at net/core/ethtool.c (550)
[  174.954408] CPU: 3 PID: 90 Comm: kworker/3:1 Not tainted
4.8.5-1.el7.elrepo.x86_64 #1
[  174.954409] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  174.954423] Workqueue: infiniband ib_cache_task [ib_core]
[  174.954425]  0000000000000286 00000000c968401e ffff88030dabfc58
ffffffff81353eef
[  174.954427]  ffff88030c174000 ffff88030dabfcd8 ffff88030dabfcb8
ffffffff816109f7
[  174.954428]  ffff8803110c0800 ffff880311663240 0000000000000000
ffff88030dabfd80
[  174.954430] Call Trace:
[  174.954434]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  174.954437]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  174.954439]  [<ffffffff810bf41e>] ? load_balance+0x19e/0x9b0
[  174.954441]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  174.954445]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  174.954448]  [<ffffffff810b5f2e>] ? account_entity_dequeue+0xae/0xd0
[  174.954449]  [<ffffffff810b942a>] ? dequeue_entity+0x19a/0x5d0
[  174.954453]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  174.954455]  [<ffffffff811f462b>] ? kmem_cache_alloc_trace+0x14b/0x1b0
[  174.954459]  [<ffffffffa08863e7>] ib_cache_update+0xc7/0x350 [ib_core]
[  174.954463]  [<ffffffffa088668a>] ib_cache_task+0x1a/0x30 [ib_core]
[  174.954465]  [<ffffffff8109ad82>] process_one_work+0x152/0x400
[  174.954467]  [<ffffffff8109b675>] worker_thread+0x125/0x4b0
[  174.954468]  [<ffffffff8109b550>] ? rescuer_thread+0x380/0x380
[  174.954470]  [<ffffffff810a1168>] kthread+0xd8/0xf0
[  174.954472]  [<ffffffff8173ad3f>] ret_from_fork+0x1f/0x40
[  174.954473]  [<ffffffff810a1090>] ? kthread_park+0x60/0x60
[  174.996381] Rounding down aligned max_sectors from 4294967295 to 4294967288
[  174.997259] RTNL: assertion failed at net/core/ethtool.c (550)
[  174.997284] CPU: 1 PID: 3212 Comm: modprobe Not tainted
4.8.5-1.el7.elrepo.x86_64 #1
[  174.997285] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  174.997286]  0000000000000286 0000000092f5ac1c ffff8802ed3af9f8
ffffffff81353eef
[  174.997288]  ffff88030c174000 ffff8802ed3afa78 ffff8802ed3afa58
ffffffff816109f7
[  174.997289]  ffff880313000d00 ffff8802ed3afad8 ffffffff811f41cf
ffff88031fa9c6c0
[  174.997291] Call Trace:
[  174.997296]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  174.997299]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  174.997301]  [<ffffffff811f41cf>] ? ___slab_alloc+0x34f/0x4c0
[  174.997303]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  174.997307]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  174.997316]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  174.997318]  [<ffffffffa092283a>] ? post_one_recv.isra.14+0xaa/0x110
[rdma_rxe]
[  174.997320]  [<ffffffffa09d6670>] srpt_refresh_port+0x80/0x1a0 [ib_srpt]
[  174.997322]  [<ffffffffa09229d2>] ? rxe_post_srq_recv+0x72/0xa0 [rdma_rxe]
[  174.997324]  [<ffffffffa09d9196>] srpt_add_one+0x276/0x440 [ib_srpt]
[  174.997326]  [<ffffffffa09d6150>] ?
srpt_tpg_attrib_srp_max_rdma_size_show+0x30/0x30 [ib_srpt]
[  174.997330]  [<ffffffffa0883c3f>] ib_register_client+0x5f/0xc0 [ib_core]
[  174.997331]  [<ffffffffa08ec000>] ? 0xffffffffa08ec000
[  174.997333]  [<ffffffffa08ec07c>] srpt_init_module+0x7c/0x1000 [ib_srpt]
[  174.997334]  [<ffffffffa08ec000>] ? 0xffffffffa08ec000
[  174.997336]  [<ffffffff81002190>] do_one_initcall+0x50/0x190
[  174.997337]  [<ffffffff811f462b>] ? kmem_cache_alloc_trace+0x14b/0x1b0
[  174.997338]  [<ffffffff8118cb8c>] do_init_module+0x60/0x1f1
[  174.997340]  [<ffffffff8110ea81>] load_module+0x1f31/0x2700
[  174.997342]  [<ffffffff8110b980>] ? __symbol_put+0x60/0x60
[  174.997344]  [<ffffffff812fc18d>] ? ima_post_read_file+0x3d/0x80
[  174.997346]  [<ffffffff812ca41b>] ? security_kernel_post_read_file+0x6b/0x80
[  174.997347]  [<ffffffff8110f476>] SYSC_finit_module+0xa6/0xf0
[  174.997349]  [<ffffffff8110f4de>] SyS_finit_module+0xe/0x10
[  174.997350]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  174.997352]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  175.001504] iscsi: registered transport (iser)
[  175.040113] RPC: Registered rdma transport module.
[  175.040114] RPC: Registered rdma backchannel transport module.
[  175.064344] RTNL: assertion failed at net/core/ethtool.c (550)
[  175.064369] CPU: 0 PID: 3187 Comm: ibv_devinfo Not tainted
4.8.5-1.el7.elrepo.x86_64 #1
[  175.064370] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  175.064371]  0000000000000286 00000000e5af710d ffff8802fbf97b98
ffffffff81353eef
[  175.064373]  ffff88030c174000 ffff8802fbf97c18 ffff8802fbf97bf8
ffffffff816109f7
[  175.064375]  ffffffff81235e1b ffff8802f2f3acc0 ffff8802f2f7af00
ffff8802fbf97c5c
[  175.064376] Call Trace:
[  175.064381]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  175.064384]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  175.064387]  [<ffffffff81235e1b>] ? __d_alloc+0x12b/0x1d0
[  175.064390]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  175.064393]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  175.064396]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  175.064397]  [<ffffffff81232f5c>] ? dentry_free+0x3c/0x90
[  175.064399]  [<ffffffff81234150>] ? __dentry_kill+0x100/0x150
[  175.064409]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  175.064410]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  175.064411]  [<ffffffff8121e2f4>] ? put_filp+0x44/0x50
[  175.064413]  [<ffffffffa08c7e5f>] ib_uverbs_query_port+0x5f/0x150
[ib_uverbs]
[  175.064415]  [<ffffffffa08c43dc>] ib_uverbs_write+0x18c/0x3f0 [ib_uverbs]
[  175.064416]  [<ffffffff8122cc15>] ? do_filp_open+0xa5/0x100
[  175.064418]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  175.064419]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  175.064421]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  175.064423]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  175.064424]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  175.064426]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  175.103154] RTNL: assertion failed at net/core/ethtool.c (550)
[  175.103181] CPU: 1 PID: 3289 Comm: ibv_devinfo Not tainted
4.8.5-1.el7.elrepo.x86_64 #1
[  175.103182] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  175.103183]  0000000000000286 0000000057c77a87 ffff8802fbf97b98
ffffffff81353eef
[  175.103185]  ffff88030c174000 ffff8802fbf97c18 ffff8802fbf97bf8
ffffffff816109f7
[  175.103186]  ffffffff81235e1b ffff8802f2f3acc0 ffff8802f2f19f00
ffff8802fbf97c5c
[  175.103188] Call Trace:
[  175.103193]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  175.103196]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  175.103199]  [<ffffffff81235e1b>] ? __d_alloc+0x12b/0x1d0
[  175.103202]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  175.103206]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  175.103209]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  175.103210]  [<ffffffff81232f5c>] ? dentry_free+0x3c/0x90
[  175.103211]  [<ffffffff81234150>] ? __dentry_kill+0x100/0x150
[  175.103222]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  175.103223]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  175.103224]  [<ffffffff8121e2f4>] ? put_filp+0x44/0x50
[  175.103226]  [<ffffffffa08c7e5f>] ib_uverbs_query_port+0x5f/0x150
[ib_uverbs]
[  175.103228]  [<ffffffffa08c43dc>] ib_uverbs_write+0x18c/0x3f0 [ib_uverbs]
[  175.103230]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  175.103231]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  175.103233]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  175.103235]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  175.103236]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  175.103238]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25
[  204.011985] RTNL: assertion failed at net/core/ethtool.c (550)
[  204.012012] CPU: 0 PID: 3354 Comm: ibv_devinfo Not tainted
4.8.5-1.el7.elrepo.x86_64 #1
[  204.012012] Hardware name:                  /DH67CL, BIOS
BLH6710H.86A.0160.2012.1204.1156 12/04/2012
[  204.012013]  0000000000000286 0000000069598e2b ffff8802ed337b98
ffffffff81353eef
[  204.012015]  ffff88030c174000 ffff8802ed337c18 ffff8802ed337bf8
ffffffff816109f7
[  204.012017]  ffffffff81235e1b ffff8802f2f3acc0 ffff8802f2fdf0c0
ffff8802ed337c5c
[  204.012018] Call Trace:
[  204.012023]  [<ffffffff81353eef>] dump_stack+0x63/0x84
[  204.012026]  [<ffffffff816109f7>] __ethtool_get_link_ksettings+0x1b7/0x1c0
[  204.012029]  [<ffffffff81235e1b>] ? __d_alloc+0x12b/0x1d0
[  204.012032]  [<ffffffffa0041485>] vlan_ethtool_get_link_ksettings+0x15/0x20
[8021q]
[  204.012036]  [<ffffffffa09225b0>] rxe_query_port+0xb0/0x190 [rdma_rxe]
[  204.012039]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  204.012040]  [<ffffffff81232f5c>] ? dentry_free+0x3c/0x90
[  204.012041]  [<ffffffff81234150>] ? __dentry_kill+0x100/0x150
[  204.012052]  [<ffffffffa0883dca>] ib_query_port+0xca/0x170 [ib_core]
[  204.012053]  [<ffffffff810eaebd>] ? call_rcu_sched+0x1d/0x20
[  204.012054]  [<ffffffff8121e2f4>] ? put_filp+0x44/0x50
[  204.012057]  [<ffffffffa08c7e5f>] ib_uverbs_query_port+0x5f/0x150
[ib_uverbs]
[  204.012058]  [<ffffffffa08c43dc>] ib_uverbs_write+0x18c/0x3f0 [ib_uverbs]
[  204.012060]  [<ffffffff8122cc15>] ? do_filp_open+0xa5/0x100
[  204.012061]  [<ffffffff8121b737>] __vfs_write+0x37/0x140
[  204.012063]  [<ffffffff8121be52>] vfs_write+0xb2/0x1b0
[  204.012065]  [<ffffffff8100365d>] ? syscall_trace_enter+0x1dd/0x2c0
[  204.012066]  [<ffffffff8121d2a5>] SyS_write+0x55/0xc0
[  204.012067]  [<ffffffff81003a47>] do_syscall_64+0x67/0x160
[  204.012069]  [<ffffffff8173abe1>] entry_SYSCALL64_slow_path+0x25/0x25


^ permalink raw reply

* [PATCH v5 0/14] Fix race conditions related to stopping block layer queues
From: Bart Van Assche @ 2016-10-29  0:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org

Hello Jens,

Multiple block drivers need the functionality to stop a request queue 
and to wait until all ongoing request_fn() / queue_rq() calls have 
finished without waiting until all outstanding requests have finished. 
Hence this patch series that introduces the blk_mq_quiesce_queue() 
function. The dm-mq, SRP and NVMe patches in this patch series are three 
examples of where these functions are useful. These patches have been 
tested on top of kernel v4.9-rc2. The following tests have been run to 
verify this patch series:
- Mike's mptest suite that stress-tests dm-multipath.
- My own srp-test suite that stress-tests SRP on top of dm-multipath.
- fio on top of the NVMeOF host driver that was connected to the NVMeOF
   target driver on the same host.
- Laurence verified the previous version (v3) of this patch series by
   running it through the Red Hat SRP and NVMe test suites.
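
To illustrate the semantics described above (stop the queue, wait for the
.queue_rq() calls that are in progress, but do not wait for outstanding
requests), here is a minimal user-space sketch. It is a toy model and not
the kernel implementation: struct toy_queue and every toy_*() function are
names made up for this example, and the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_queue {
	bool stopped;    /* models BLK_MQ_S_STOPPED */
	int dispatching; /* .queue_rq() calls currently executing */
	int outstanding; /* requests handed to the driver, not yet completed */
};

/* Enter the dispatch path; refused once the queue has been stopped. */
static bool toy_dispatch_begin(struct toy_queue *q)
{
	if (q->stopped)
		return false;
	q->dispatching++;
	return true;
}

/* Leave the dispatch path; the request itself is still outstanding. */
static void toy_dispatch_end(struct toy_queue *q)
{
	assert(q->dispatching > 0);
	q->dispatching--;
	q->outstanding++;
}

/* Refuse new dispatches from now on. */
static void toy_stop(struct toy_queue *q)
{
	q->stopped = true;
}

/*
 * A quiesce operation may return as soon as this is true: the queue is
 * stopped and no dispatch call is running. Outstanding requests are
 * deliberately not part of the condition.
 */
static bool toy_quiesced(const struct toy_queue *q)
{
	return q->stopped && q->dispatching == 0;
}
```

In this model a queue with one dispatch in flight is not quiesced right
after toy_stop(); it becomes quiesced once toy_dispatch_end() has run, even
though the request is still counted as outstanding.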

The changes compared to the fourth version of this patch series are:
- Added a blk_mq_stop_hw_queues() call in blk_mq_quiesce_queue() as
   requested by Ming Lei.
- Modified scsi_unblock_target() such that it waits until
   .queuecommand() finished. Unexported scsi_wait_for_queuecommand().
- Reordered the two NVMe patches.
- Added a patch that prevents blk_mq_requeue_work() from restarting
   stopped queues.
- Added a patch that removes blk_mq_cancel_requeue_work().

Changes between v4 and v3:
- Left out the dm changes from the patch that introduces
   blk_mq_hctx_stopped() because a later patch deletes the changed code
   from the dm core.
- Moved the blk_mq_hctx_stopped() declaration from a public to a
   private block layer header file.
- Added a new patch that moves more code into
   blk_mq_direct_issue_request(). This patch makes it unnecessary to
   introduce a new function to avoid code duplication.
- Explained the implemented algorithm in the description of the patch
   that introduces blk_mq_quiesce_queue().
- Added "select SRCU" to the patch that introduces
   blk_mq_quiesce_queue() to avoid build failures.
- Documented the shost argument in the scsi_wait_for_queuecommand()
   kerneldoc header.
- Fixed an unintended behavior change in the last patch of this series.

Changes between v3 and v2:
- Changed the order of the patches in this patch series.
- Added several new patches: a patch that avoids that .queue_rq() gets
   invoked from the direct submission path if a queue has been stopped
   and also a patch that introduces the helper function
   blk_mq_hctx_stopped().
- blk_mq_quiesce_queue() has been reworked (thanks to Ming Lin and Sagi
   for their feedback).
- A bool 'kick' argument has been added to blk_mq_requeue_request().
- As proposed by Christoph, the code that waits for queuecommand() has
   been moved from the SRP transport driver to the SCSI core.

Changes between v2 and v1:
- Dropped the non-blk-mq changes from this patch series.
- Added support for hardware queues with BLK_MQ_F_BLOCKING set.
- Added a call stack to the description of the dm race fix patch.
- Dropped the non-scsi-mq changes from the SRP patch.
- Added a patch that introduces blk_mq_queue_stopped() in the dm driver.

The individual patches in this series are:

0001-blk-mq-Do-not-invoke-.queue_rq-for-a-stopped-queue.patch
0002-blk-mq-Introduce-blk_mq_hctx_stopped.patch
0003-blk-mq-Introduce-blk_mq_queue_stopped.patch
0004-blk-mq-Move-more-code-into-blk_mq_direct_issue_reque.patch
0005-blk-mq-Avoid-that-requeueing-starts-stopped-queues.patch
0006-blk-mq-Remove-blk_mq_cancel_requeue_work.patch
0007-blk-mq-Introduce-blk_mq_quiesce_queue.patch
0008-blk-mq-Add-a-kick_requeue_list-argument-to-blk_mq_re.patch
0009-dm-Use-BLK_MQ_S_STOPPED-instead-of-QUEUE_FLAG_STOPPE.patch
0010-dm-Fix-a-race-condition-related-to-stopping-and-star.patch
0011-SRP-transport-Move-queuecommand-wait-code-to-SCSI-co.patch
0012-SRP-transport-scsi-mq-Wait-for-.queue_rq-if-necessar.patch
0013-nvme-Fix-a-race-condition-related-to-stopping-queues.patch
0014-nvme-Use-BLK_MQ_S_STOPPED-instead-of-QUEUE_FLAG_STOP.patch

Thanks,

Bart.

^ permalink raw reply

* [PATCH v5 01/14] blk-mq: Do not invoke .queue_rq() for a stopped queue
From: Bart Van Assche @ 2016-10-29  0:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

The meaning of the BLK_MQ_S_STOPPED flag is "do not call
.queue_rq()". Hence modify blk_mq_make_request() such that requests
are queued instead of issued if a queue has been stopped.
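
The rule can be pictured with a small user-space model: a stopped queue
never sees a direct issue, and the request is queued for later instead.
This is only a sketch. struct toy_hctx, toy_submit() and the TOY_* values
are hypothetical names, not kernel API, and the direct issue is assumed to
always succeed.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_result { TOY_ISSUED, TOY_QUEUED };

struct toy_hctx {
	bool stopped; /* models the BLK_MQ_S_STOPPED bit */
	int issued;   /* times the modelled .queue_rq() was invoked */
	int queued;   /* requests parked for a later queue run */
};

/* Issue directly only if the hardware context has not been stopped. */
static enum toy_result toy_submit(struct toy_hctx *hctx)
{
	assert(hctx);
	if (hctx->stopped) {
		hctx->queued++;
		return TOY_QUEUED;
	}
	hctx->issued++;
	return TOY_ISSUED;
}
```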

Reported-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org>
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f3d27a6..ad459e4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1332,9 +1332,9 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		if (!blk_mq_direct_issue_request(old_rq, &cookie))
-			goto done;
-		blk_mq_insert_request(old_rq, false, true, true);
+		if (test_bit(BLK_MQ_S_STOPPED, &data.hctx->state) ||
+		    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
+			blk_mq_insert_request(old_rq, false, true, true);
 		goto done;
 	}
 
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 02/14] blk-mq: Introduce blk_mq_hctx_stopped()
From: Bart Van Assche @ 2016-10-29  0:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Multiple functions test the BLK_MQ_S_STOPPED bit so introduce
a helper function that performs this test.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 12 ++++++------
 block/blk-mq.h |  5 +++++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ad459e4..bc1f462 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -787,7 +787,7 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	struct list_head *dptr;
 	int queued;
 
-	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
 
 	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
@@ -912,8 +912,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 {
-	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state) ||
-	    !blk_mq_hw_queue_mapped(hctx)))
+	if (unlikely(blk_mq_hctx_stopped(hctx) ||
+		     !blk_mq_hw_queue_mapped(hctx)))
 		return;
 
 	if (!async && !(hctx->flags & BLK_MQ_F_BLOCKING)) {
@@ -938,7 +938,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if ((!blk_mq_hctx_has_pending(hctx) &&
 		    list_empty_careful(&hctx->dispatch)) ||
-		    test_bit(BLK_MQ_S_STOPPED, &hctx->state))
+		    blk_mq_hctx_stopped(hctx))
 			continue;
 
 		blk_mq_run_hw_queue(hctx, async);
@@ -988,7 +988,7 @@ void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async)
 	int i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
-		if (!test_bit(BLK_MQ_S_STOPPED, &hctx->state))
+		if (!blk_mq_hctx_stopped(hctx))
 			continue;
 
 		clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
@@ -1332,7 +1332,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		if (test_bit(BLK_MQ_S_STOPPED, &data.hctx->state) ||
+		if (blk_mq_hctx_stopped(data.hctx) ||
 		    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
 			blk_mq_insert_request(old_rq, false, true, true);
 		goto done;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index e5d2524..ac772da 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -100,6 +100,11 @@ static inline void blk_mq_set_alloc_data(struct blk_mq_alloc_data *data,
 	data->hctx = hctx;
 }
 
+static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
+{
+	return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
+}
+
 static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
 {
 	return hctx->nr_ctx && hctx->tags;
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 03/14] blk-mq: Introduce blk_mq_queue_stopped()
From: Bart Van Assche @ 2016-10-29  0:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

The function blk_queue_stopped() makes it possible to test whether or not a
traditional request queue has been stopped. Introduce a helper
function that allows block drivers to query easily whether or not
one or more hardware contexts of a blk-mq queue have been stopped.
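
A user-space sketch of the "one or more hardware contexts stopped" check
may help to show the intended semantics. It is an illustration under
assumptions: toy_test_bit() stands in for the kernel's test_bit(), the bit
number TOY_S_STOPPED is chosen for this example only, and the hardware
contexts are kept in a plain array instead of being walked with
queue_for_each_hw_ctx().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_S_STOPPED 0 /* bit number chosen for this example only */

struct toy_hw_ctx {
	unsigned long state;
};

/* Stand-in for the kernel's test_bit(), limited to a single word. */
static bool toy_test_bit(unsigned int nr, const unsigned long *word)
{
	return (*word >> nr) & 1UL;
}

static bool toy_hctx_stopped(const struct toy_hw_ctx *hctx)
{
	return toy_test_bit(TOY_S_STOPPED, &hctx->state);
}

/* True as soon as one context is stopped; false for an empty array. */
static bool toy_queue_stopped(const struct toy_hw_ctx *hctxs, size_t n)
{
	assert(hctxs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (toy_hctx_stopped(&hctxs[i]))
			return true;
	return false;
}
```

As with the real helper, the answer is only meaningful if the caller
serializes the check against code that starts or stops the contexts.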

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c         | 20 ++++++++++++++++++++
 include/linux/blk-mq.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bc1f462..283e0eb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -946,6 +946,26 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 }
 EXPORT_SYMBOL(blk_mq_run_hw_queues);
 
+/**
+ * blk_mq_queue_stopped() - check whether one or more hctxs have been stopped
+ * @q: request queue.
+ *
+ * The caller is responsible for serializing this function against
+ * blk_mq_{start,stop}_hw_queue().
+ */
+bool blk_mq_queue_stopped(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	int i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		if (blk_mq_hctx_stopped(hctx))
+			return true;
+
+	return false;
+}
+EXPORT_SYMBOL(blk_mq_queue_stopped);
+
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	cancel_work(&hctx->run_work);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 535ab2e..aa93000 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -223,6 +223,7 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs
 void blk_mq_abort_requeue_list(struct request_queue *q);
 void blk_mq_complete_request(struct request *rq, int error);
 
+bool blk_mq_queue_stopped(struct request_queue *q);
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
 void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx);
 void blk_mq_stop_hw_queues(struct request_queue *q);
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 04/14] blk-mq: Move more code into blk_mq_direct_issue_request()
From: Bart Van Assche @ 2016-10-29  0:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Move the "hctx stopped" test and the insert request calls into
blk_mq_direct_issue_request(). Rename that function into
blk_mq_try_issue_directly() to reflect its new semantics. Pass
the hctx pointer to that function instead of looking it up a
second time. These changes avoid having to duplicate code
in the next patch.
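
The resulting control flow has three outcomes, which the following
user-space sketch tries to capture. It is a simplified model, not the
kernel code: struct toy_ctx, the TOY_* enumerators and both functions are
hypothetical, and the driver callback simply returns a preset value.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible results of the modelled driver callback. */
enum toy_rc { TOY_RC_OK, TOY_RC_BUSY, TOY_RC_ERROR };

/* What finally happened to the request. */
enum toy_fate { TOY_ISSUED, TOY_FAILED, TOY_INSERTED };

struct toy_ctx {
	bool stopped;          /* models blk_mq_hctx_stopped() */
	enum toy_rc driver_rc; /* what the modelled .queue_rq() returns */
	int driver_calls;      /* how often the driver callback ran */
};

static enum toy_rc toy_queue_rq(struct toy_ctx *ctx)
{
	ctx->driver_calls++;
	return ctx->driver_rc;
}

static enum toy_fate toy_try_issue_directly(struct toy_ctx *ctx)
{
	assert(ctx);
	if (ctx->stopped)
		return TOY_INSERTED; /* the driver is never called */

	switch (toy_queue_rq(ctx)) {
	case TOY_RC_OK:
		return TOY_ISSUED;
	case TOY_RC_ERROR:
		return TOY_FAILED;   /* the request is ended with an error */
	default:
		return TOY_INSERTED; /* busy: fall back to insertion */
	}
}
```

Because the "stopped" test and the insertion both live inside the helper,
a caller in this model needs a single call and no fallback logic of its
own.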

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 283e0eb..c8ae970 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1243,11 +1243,11 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 	return rq;
 }
 
-static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
+static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
+				      struct request *rq, blk_qc_t *cookie)
 {
 	int ret;
 	struct request_queue *q = rq->q;
-	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
 	struct blk_mq_queue_data bd = {
 		.rq = rq,
 		.list = NULL,
@@ -1255,6 +1255,9 @@ static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
 	};
 	blk_qc_t new_cookie = blk_tag_to_qc_t(rq->tag, hctx->queue_num);
 
+	if (blk_mq_hctx_stopped(hctx))
+		goto insert;
+
 	/*
 	 * For OK queue, we are done. For error, kill it. Any other
 	 * error (busy), just add it to our list as we previously
@@ -1263,7 +1266,7 @@ static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
 	ret = q->mq_ops->queue_rq(hctx, &bd);
 	if (ret == BLK_MQ_RQ_QUEUE_OK) {
 		*cookie = new_cookie;
-		return 0;
+		return;
 	}
 
 	__blk_mq_requeue_request(rq);
@@ -1272,10 +1275,11 @@ static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
 		*cookie = BLK_QC_T_NONE;
 		rq->errors = -EIO;
 		blk_mq_end_request(rq, rq->errors);
-		return 0;
+		return;
 	}
 
-	return -1;
+insert:
+	blk_mq_insert_request(rq, false, true, true);
 }
 
 /*
@@ -1352,9 +1356,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		if (blk_mq_hctx_stopped(data.hctx) ||
-		    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
-			blk_mq_insert_request(old_rq, false, true, true);
+		blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
 		goto done;
 	}
 
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 05/14] blk-mq: Avoid that requeueing starts stopped queues
From: Bart Van Assche @ 2016-10-29  0:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3@sandisk.com>

Since blk_mq_requeue_work() starts stopped queues and since
execution of this function can be scheduled after a queue has
been stopped it is not possible to stop queues without using
an additional state variable to track whether or not the queue
has been stopped. Hence modify blk_mq_requeue_work() such that it
does not start stopped queues. My conclusion after a review of
the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list()
callers is as follows:
* In the dm driver starting and stopping queues should only happen
  if __dm_suspend() or __dm_resume() is called and not if the
  requeue list is processed.
* In the SCSI core queue stopping and starting should only be
  performed by the scsi_internal_device_block() and
  scsi_internal_device_unblock() functions but not by any other
  function. Although the blk_mq_stop_hw_queue() call in
  scsi_queue_rq() may help to reduce CPU load if a LLD queue is
  full, figuring out whether or not a queue should be restarted
  when requeueing a command would require introducing additional
  locking in scsi_mq_requeue_cmd() to avoid a race with
  scsi_internal_device_block(). Avoid this complexity by removing
  the blk_mq_stop_hw_queue() call from scsi_queue_rq().
* In the NVMe core only the functions that call
  blk_mq_start_stopped_hw_queues() explicitly should start stopped
  queues.
* A blk_mq_start_stopped_hw_queues() call must be added in the
  xen-blkfront driver in its blkif_recover() function.
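
The problem and the fix can be sketched with a toy user-space model. All
names below (struct toy_requeue_q and the toy_*() functions) are
hypothetical, and the model collapses the requeue list and the dispatch
path into two counters. It only shows why a "start" call inside the
requeue work defeats an earlier stop, while a "run" call respects it.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_requeue_q {
	bool stopped;   /* set when the driver stops the queue */
	int requeued;   /* requests waiting on the requeue list */
	int dispatched; /* requests handed to the driver */
};

/* Dispatch everything pending unless the queue is stopped. */
static void toy_run(struct toy_requeue_q *q)
{
	assert(q->requeued >= 0);
	if (q->stopped)
		return;
	q->dispatched += q->requeued;
	q->requeued = 0;
}

/* Old behaviour: the requeue work used the "start" variant. */
static void toy_requeue_work_old(struct toy_requeue_q *q)
{
	q->stopped = false; /* silently undoes an earlier stop */
	toy_run(q);
}

/* New behaviour: the requeue work only runs queues that are not stopped. */
static void toy_requeue_work_new(struct toy_requeue_q *q)
{
	toy_run(q);
}
```

With the old variant a queue that was stopped before the work ran ends up
started and drained; with the new variant the requests stay on the list
until the owner of the stopped state restarts the queue.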

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Roger Pau Monné <roger.pau-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
Cc: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: James Bottomley <jejb-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 block/blk-mq.c               | 6 +-----
 drivers/block/xen-blkfront.c | 1 +
 drivers/md/dm-rq.c           | 7 +------
 drivers/scsi/scsi_lib.c      | 1 -
 4 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c8ae970..fe367b5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -503,11 +503,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		blk_mq_insert_request(rq, false, false, false);
 	}
 
-	/*
-	 * Use the start variant of queue running here, so that running
-	 * the requeue work will kick stopped queues.
-	 */
-	blk_mq_start_hw_queues(q);
+	blk_mq_run_hw_queues(q, false);
 }
 
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9908597..60fff99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info *info)
 		BUG_ON(req->nr_phys_segments > segs);
 		blk_mq_requeue_request(req);
 	}
+	blk_mq_start_stopped_hw_queues(info->rq, true);
 	blk_mq_kick_requeue_list(info->rq);
 
 	while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1d0d2ad..1794de5 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -338,12 +338,7 @@ static void dm_old_requeue_request(struct request *rq)
 
 static void __dm_mq_kick_requeue_list(struct request_queue *q, unsigned long msecs)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (!blk_queue_stopped(q))
-		blk_mq_delay_kick_requeue_list(q, msecs);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_mq_delay_kick_requeue_list(q, msecs);
 }
 
 void dm_mq_kick_requeue_list(struct mapped_device *md)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2cca9cf..126a784 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1941,7 +1941,6 @@ static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 out:
 	switch (ret) {
 	case BLK_MQ_RQ_QUEUE_BUSY:
-		blk_mq_stop_hw_queue(hctx);
 		if (atomic_read(&sdev->device_busy) == 0 &&
 		    !scsi_device_blocked(sdev))
 			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
-- 
2.10.1


^ permalink raw reply related

* [PATCH v5 06/14] blk-mq: Remove blk_mq_cancel_requeue_work()
From: Bart Van Assche @ 2016-10-29  0:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Since blk_mq_requeue_work() no longer restarts stopped queues,
canceling requeue work is no longer needed to prevent a stopped
queue from being restarted. Hence remove this function.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Keith Busch <keith.busch-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
---
 block/blk-mq.c           | 6 ------
 drivers/md/dm-rq.c       | 2 --
 drivers/nvme/host/core.c | 1 -
 include/linux/blk-mq.h   | 1 -
 4 files changed, 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index fe367b5..534128a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -528,12 +528,6 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
 }
 EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
 
-void blk_mq_cancel_requeue_work(struct request_queue *q)
-{
-	cancel_delayed_work_sync(&q->requeue_work);
-}
-EXPORT_SYMBOL_GPL(blk_mq_cancel_requeue_work);
-
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
 	kblockd_schedule_delayed_work(&q->requeue_work, 0);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1794de5..2b82496 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -116,8 +116,6 @@ static void dm_mq_stop_queue(struct request_queue *q)
 	queue_flag_set(QUEUE_FLAG_STOPPED, q);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
-	/* Avoid that requeuing could restart the queue. */
-	blk_mq_cancel_requeue_work(q);
 	blk_mq_stop_hw_queues(q);
 }
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 79e679d..ab5f59e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2083,7 +2083,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
 		spin_unlock_irq(ns->queue->queue_lock);
 
-		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index aa93000..a85a20f 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -217,7 +217,6 @@ void __blk_mq_end_request(struct request *rq, int error);
 
 void blk_mq_requeue_request(struct request *rq);
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
-void blk_mq_cancel_requeue_work(struct request_queue *q);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
 void blk_mq_abort_requeue_list(struct request_queue *q);
-- 
2.10.1


* [PATCH v5 07/14] blk-mq: Introduce blk_mq_quiesce_queue()
From: Bart Van Assche @ 2016-10-29  0:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <7460e8b2-2cfd-c0d5-7ae7-7f662d89dad3-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished, that is, until request.end_io() has been invoked.
The algorithm used by blk_mq_quiesce_queue() is as follows:
* Hold either an RCU read lock or an SRCU read lock around
  .queue_rq() calls. The former is used if .queue_rq() does not
  block and the latter if .queue_rq() may block.
* blk_mq_quiesce_queue() first calls blk_mq_stop_hw_queues()
  followed by synchronize_srcu() or synchronize_rcu(). The
  synchronize_*() call waits for .queue_rq() invocations that
  started before blk_mq_quiesce_queue() was called.
* The blk_mq_hctx_stopped() calls that control whether or not
  .queue_rq() will be called are called with the (S)RCU read lock
  held. This is necessary to avoid race conditions against
  blk_mq_quiesce_queue().

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>
Cc: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
---
 block/Kconfig          |  1 +
 block/blk-mq.c         | 71 +++++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/blk-mq.h |  3 +++
 include/linux/blkdev.h |  1 +
 4 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..0562ef9 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -5,6 +5,7 @@ menuconfig BLOCK
        bool "Enable the block layer" if EXPERT
        default y
        select SBITMAP
+       select SRCU
        help
 	 Provide block layer support for the kernel.
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 534128a..96015a9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -115,6 +115,33 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * @q: request queue.
+ *
+ * Note: this function does not prevent the struct request end_io() callback
+ * function from being invoked. Additionally, new queue_rq() calls are not
+ * prevented unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+	bool rcu = false;
+
+	blk_mq_stop_hw_queues(q);
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (hctx->flags & BLK_MQ_F_BLOCKING)
+			synchronize_srcu(&hctx->queue_rq_srcu);
+		else
+			rcu = true;
+	}
+	if (rcu)
+		synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
@@ -768,7 +795,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
  * of IO. In particular, we'd like FIFO behaviour on handling existing
  * items on the hctx->dispatch list. Ignore that for now.
  */
-static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+static void blk_mq_process_rq_list(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct request *rq;
@@ -780,9 +807,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
 
-	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
-		cpu_online(hctx->next_cpu));
-
 	hctx->run++;
 
 	/*
@@ -873,6 +897,24 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	}
 }
 
+static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+{
+	int srcu_idx;
+
+	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
+		cpu_online(hctx->next_cpu));
+
+	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
+		rcu_read_lock();
+		blk_mq_process_rq_list(hctx);
+		rcu_read_unlock();
+	} else {
+		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
+		blk_mq_process_rq_list(hctx);
+		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
+	}
+}
+
 /*
  * It'd be great if the workqueue API had a way to pass
  * in a mask and had some smarts for more clever placement.
@@ -1283,7 +1325,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
 	struct blk_map_ctx data;
 	struct request *rq;
-	unsigned int request_count = 0;
+	unsigned int request_count = 0, srcu_idx;
 	struct blk_plug *plug;
 	struct request *same_queue_rq = NULL;
 	blk_qc_t cookie;
@@ -1326,7 +1368,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_bio_to_request(rq, bio);
 
 		/*
-		 * We do limited pluging. If the bio can be merged, do that.
+		 * We do limited plugging. If the bio can be merged, do that.
 		 * Otherwise the existing request in the plug list will be
 		 * issued. So the plug list will have one request at most
 		 */
@@ -1346,7 +1388,16 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+
+		if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
+			rcu_read_lock();
+			blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+			rcu_read_unlock();
+		} else {
+			srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
+			blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+			srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
+		}
 		goto done;
 	}
 
@@ -1625,6 +1676,9 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
 
+	if (hctx->flags & BLK_MQ_F_BLOCKING)
+		cleanup_srcu_struct(&hctx->queue_rq_srcu);
+
 	blk_mq_remove_cpuhp(hctx);
 	blk_free_flush_queue(hctx->fq);
 	sbitmap_free(&hctx->ctx_map);
@@ -1705,6 +1759,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
 				   flush_start_tag + hctx_idx, node))
 		goto free_fq;
 
+	if (hctx->flags & BLK_MQ_F_BLOCKING)
+		init_srcu_struct(&hctx->queue_rq_srcu);
+
 	return 0;
 
  free_fq:
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a85a20f..ed20ac7 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -3,6 +3,7 @@
 
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
+#include <linux/srcu.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -35,6 +36,8 @@ struct blk_mq_hw_ctx {
 
 	struct blk_mq_tags	*tags;
 
+	struct srcu_struct	queue_rq_srcu;
+
 	unsigned long		queued;
 	unsigned long		run;
 #define BLK_MQ_MAX_DISPATCH_ORDER	7
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..8259d87 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -824,6 +824,7 @@ extern void __blk_run_queue(struct request_queue *q);
 extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
+extern void blk_mq_quiesce_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *, unsigned long,
 			   gfp_t);
-- 
2.10.1


