Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
* Re: [PATCH v2] infiniband: remove WARN that is not kernel bug
From: Leon Romanovsky @ 2016-11-21 17:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dmitry Vyukov, syzkaller, Valdis.Kletnieks-PjAqaU27lzQ,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML
In-Reply-To: <20161121165253.GA22237-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 523 bytes --]

On Mon, Nov 21, 2016 at 09:52:53AM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 21, 2016 at 02:14:08PM +0200, Leon Romanovsky wrote:
> > >
> > > In ib_ucm_write function there is a wrong prefix:
> > >
> > > + pr_err_once("ucm_write: process %d (%s) tried to do something hinky\n",
> >
> > I did it intentionally to have the same errors for all flows.
>
> Lets actually use a good message too please?
>
>  pr_err_once("ucm_write: process %d (%s) changed security contexts after opening FD, this is not allowed.\n",
>
> Jason

[-- Attachment #1.2: 0001-IB-core-qib-Remove-WARN-that-is-not-kernel-bug.patch --]
[-- Type: text/x-diff, Size: 3630 bytes --]

From 70f95b2d35aea42e5b97e7d27ab2f4e8effcbe67 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Date: Mon, 21 Nov 2016 13:30:59 +0200
Subject: [PATCH rdma-next V2] IB/{core, qib}: Remove WARN that is not kernel bug

WARNINGs mean kernel bugs, in this case, they are placed
to mark programming errors and/or malicious attempts.

BUG/WARNs that are not kernel bugs hinder automated testing efforts.

Signed-off-by: Dmitry Vyukov <dvyukov-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---
 * Improved error prints.
---
 drivers/infiniband/core/ucm.c            | 5 ++++-
 drivers/infiniband/core/ucma.c           | 5 ++++-
 drivers/infiniband/core/uverbs_main.c    | 5 ++++-
 drivers/infiniband/hw/qib/qib_file_ops.c | 5 ++++-
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 7713ef0..579f9a7 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -1104,8 +1104,11 @@ static ssize_t ib_ucm_write(struct file *filp, const char __user *buf,
 	struct ib_ucm_cmd_hdr hdr;
 	ssize_t result;

-	if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
+	if (!ib_safe_file_access(filp)) {
+		pr_err_once("ucm_write: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
+			    task_tgid_vnr(current), current->comm);
 		return -EACCES;
+	}

 	if (len < sizeof(hdr))
 		return -EINVAL;
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 9520154..e12f8fa 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1584,8 +1584,11 @@ static ssize_t ucma_write(struct file *filp, const char __user *buf,
 	struct rdma_ucm_cmd_hdr hdr;
 	ssize_t ret;

-	if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
+	if (!ib_safe_file_access(filp)) {
+		pr_err_once("ucma_write: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
+			    task_tgid_vnr(current), current->comm);
 		return -EACCES;
+	}

 	if (len < sizeof(hdr))
 		return -EINVAL;
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 44b1104..38c79ad 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -746,8 +746,11 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	int srcu_key;
 	ssize_t ret;

-	if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
+	if (!ib_safe_file_access(filp)) {
+		pr_err_once("uverbs_write: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
+			    task_tgid_vnr(current), current->comm);
 		return -EACCES;
+	}

 	if (count < sizeof hdr)
 		return -EINVAL;
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index 382466a..2d1eacf 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -2066,8 +2066,11 @@ static ssize_t qib_write(struct file *fp, const char __user *data,
 	ssize_t ret = 0;
 	void *dest;

-	if (WARN_ON_ONCE(!ib_safe_file_access(fp)))
+	if (!ib_safe_file_access(fp)) {
+		pr_err_once("qib_write: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
+			    task_tgid_vnr(current), current->comm);
 		return -EACCES;
+	}

 	if (count < sizeof(cmd.type)) {
 		ret = -EINVAL;
--
2.7.4


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply related

* Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of allocating memory using APIs
From: Leon Romanovsky @ 2016-11-21 17:14 UTC (permalink / raw)
  To: Salil Mehta
  Cc: dledford@redhat.com, Huwei (Xavier), oulijun,
	mehta.salil.lnk@gmail.com, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
	Zhangping (ZP)
In-Reply-To: <F4CC6FACFEB3C54C9141D49AD221F7F91A7AD7DF@lhreml503-mbx>

[-- Attachment #1: Type: text/plain, Size: 4775 bytes --]

On Mon, Nov 21, 2016 at 04:12:38PM +0000, Salil Mehta wrote:
> > -----Original Message-----
> > From: Leon Romanovsky [mailto:leon@kernel.org]
> > Sent: Wednesday, November 16, 2016 8:36 AM
> > To: Salil Mehta
> > Cc: dledford@redhat.com; Huwei (Xavier); oulijun;
> > mehta.salil.lnk@gmail.com; linux-rdma@vger.kernel.org;
> > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm;
> > Zhangping (ZP)
> > Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> > allocating memory using APIs
> >
> > On Tue, Nov 15, 2016 at 03:52:46PM +0000, Salil Mehta wrote:
> > > > -----Original Message-----
> > > > From: Leon Romanovsky [mailto:leon@kernel.org]
> > > > Sent: Wednesday, November 09, 2016 7:22 AM
> > > > To: Salil Mehta
> > > > Cc: dledford@redhat.com; Huwei (Xavier); oulijun;
> > > > mehta.salil.lnk@gmail.com; linux-rdma@vger.kernel.org;
> > > > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm;
> > > > Zhangping (ZP)
> > > > Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> > > > allocating memory using APIs
> > > >
> > > > On Fri, Nov 04, 2016 at 04:36:25PM +0000, Salil Mehta wrote:
> > > > > From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
> > > > >
> > > > > This patch modified the logic of allocating memory using APIs in
> > > > > hns RoCE driver. We used kcalloc instead of kmalloc_array and
> > > > > bitmap_zero. And When kcalloc failed, call vzalloc to alloc
> > > > > memory.
> > > > >
> > > > > Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
> > > > > Signed-off-by: Ping Zhang <zhangping5@huawei.com>
> > > > > Signed-off-by: Salil Mehta  <salil.mehta@huawei.com>
> > > > > ---
> > > > >  drivers/infiniband/hw/hns/hns_roce_mr.c |   15 ++++++++-------
> > > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > > index fb87883..d3dfb5f 100644
> > > > > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > > @@ -137,11 +137,12 @@ static int hns_roce_buddy_init(struct
> > > > hns_roce_buddy *buddy, int max_order)
> > > > >
> > > > >  	for (i = 0; i <= buddy->max_order; ++i) {
> > > > >  		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> > > > > -		buddy->bits[i] = kmalloc_array(s, sizeof(long),
> > > > GFP_KERNEL);
> > > > > -		if (!buddy->bits[i])
> > > > > -			goto err_out_free;
> > > > > -
> > > > > -		bitmap_zero(buddy->bits[i], 1 << (buddy->max_order -
> > i));
> > > > > +		buddy->bits[i] = kcalloc(s, sizeof(long),
> > GFP_KERNEL);
> > > > > +		if (!buddy->bits[i]) {
> > > > > +			buddy->bits[i] = vzalloc(s * sizeof(long));
> > > >
> > > > I wonder, why don't you use directly vzalloc instead of kcalloc
> > > > fallback?
> > > As we know we will have physical contiguous pages if the kcalloc
> > > call succeeds. This will give us a chance to have better performance
> > > over the allocations which are just virtually contiguous through the
> > > function vzalloc(). Therefore, later has only been used as a fallback
> > > when our memory request cannot be entertained through kcalloc.
> > >
> > > Are you suggesting that there will not be much performance penalty
> > > if we use just vzalloc ?
> >
> > Not exactly,
> > I asked it, because we have similar code in our drivers and this
> > construction looks strange to me.
> >
> > 1. If performance is critical, we will use kmalloc.
> > 2. If performance is not critical, we will use vmalloc.
> >
> > But in this case, such construction shows me that we can live with
> > vmalloc performance and kmalloc allocation are not really needed.
> >
> > In your specific case, I'm not sure that kcalloc will ever fail.
> Performance is definitely critical here. Though, I agree this is bit
> unusual way of memory allocation. In actual, we were encountering
> memory alloc failures using kmalloc (if you see allocation amount
> is on the higher side and is exponential) so we ended up using
> vmalloc as fall back - It is very naïve allocation scheme.

I understand it, we did the same, see our mlx5_vzalloc call.
BTW, we used __GFP_NOWARN flag, which you should consider to use
in your case too.

>
> Maybe we need to rethink this allocation scheme part? Also, I can pull
> back this particular patch for now or just live with vzalloc() till
> we figure out proper solution to this?

It is up to you, I don't think that you should drop it, AFAIK, there is
no other proper solution.

>
> >
> > Thanks
> >
> >
> > >
> > > >
> > > > > +			if (!buddy->bits[i])
> > > > > +				goto err_out_free;
> > > > > +		}
> > > > >  	}

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v5 2/9] IB/core: Replace semaphore sm_sem with an atomic wait
From: Christoph Hellwig @ 2016-11-21 16:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linus Torvalds, Binoy Jayan, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Mustafa Ismail, Lijun Ou, Nicholas Bellinger, Leon Romanovsky,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Tatyana E Nikolova,
	Doug Ledford, Jenny Derzhavetz, Sagi Grimberg, Sean Hefty,
	Faisal Latif, Hal Rosenstock, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Wei Hu(Xavier), Mark Brown, Mark Bloch, Steve Wise
In-Reply-To: <4374986.OFD8bCgUa1@wuerfel>

On Mon, Nov 21, 2016 at 05:52:27PM +0100, Arnd Bergmann wrote:
> I think a mutex would't work here, since fops->open() and fops->close()
> are not called from the same context and lockdep will complain
> about that.
> 
> Version of the series had replaced the semaphore with a completion
> here, which worked correctly, but one reviewer suggested using
> the wait_event() instead since it's confusing to have a completion
> starting out in 'completed' state.

On the other hand the existing semaphore works perfectly fine, and
is way easier to understand than any of the other models, which also
don't provide any benefit at all.

Please stop messing with perfectly fine semaphores now - there are
plenty that are much better replaced with mutexes or completions,
but this (and various others are not).  Leave them alone and do
something useful instead.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] infiniband: remove WARN that is not kernel bug
From: Jason Gunthorpe @ 2016-11-21 16:52 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Dmitry Vyukov, syzkaller, Valdis.Kletnieks, dledford, sean.hefty,
	Hal Rosenstock, linux-rdma, LKML
In-Reply-To: <20161121121408.GC4158@leon.nu>

On Mon, Nov 21, 2016 at 02:14:08PM +0200, Leon Romanovsky wrote:
> >
> > In ib_ucm_write function there is a wrong prefix:
> >
> > + pr_err_once("ucm_write: process %d (%s) tried to do something hinky\n",
> 
> I did it intentionally to have the same errors for all flows.

Lets actually use a good message too please?

 pr_err_once("ucm_write: process %d (%s) changed security contexts after opening FD, this is not allowed.\n",

Jason

^ permalink raw reply

* Re: [PATCH v5 2/9] IB/core: Replace semaphore sm_sem with an atomic wait
From: Arnd Bergmann @ 2016-11-21 16:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Binoy Jayan, linux-rdma, Mustafa Ismail, Lijun Ou,
	Nicholas Bellinger, Leon Romanovsky, target-devel,
	Tatyana E Nikolova, Doug Ledford, Jenny Derzhavetz, Sagi Grimberg,
	Sean Hefty, Faisal Latif, Hal Rosenstock, linux-kernel,
	Wei Hu(Xavier), Mark Brown, Mark Bloch, Steve Wise, Ira Weiny
In-Reply-To: <CA+55aFxGjaqduhRCyk0mVxEA7aqQ-omdG8SBreZ=x5cW2ovngQ@mail.gmail.com>

On Monday, November 21, 2016 7:57:53 AM CET Linus Torvalds wrote:
> Don't do this.
> 
> Never ever do your own locking primitives. You will get the memory ordering
> wrong. And even if you get it right, why do it?
> 
> If you want to get rid of semaphores, and replace them with a mutex, that's
> OK. But don't replace them with something more complex like an open coded
> waiting model.

I think a mutex would't work here, since fops->open() and fops->close()
are not called from the same context and lockdep will complain
about that.

Version of the series had replaced the semaphore with a completion
here, which worked correctly, but one reviewer suggested using
the wait_event() instead since it's confusing to have a completion
starting out in 'completed' state.

	Arnd

^ permalink raw reply

* RE: [PATCH for-next 03/11] IB/hns: Optimize the logic of allocating memory using APIs
From: Salil Mehta @ 2016-11-21 16:12 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford@redhat.com, Huwei (Xavier), oulijun,
	mehta.salil.lnk@gmail.com, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
	Zhangping (ZP)
In-Reply-To: <20161116083602.GH4240@leon.nu>

> -----Original Message-----
> From: Leon Romanovsky [mailto:leon@kernel.org]
> Sent: Wednesday, November 16, 2016 8:36 AM
> To: Salil Mehta
> Cc: dledford@redhat.com; Huwei (Xavier); oulijun;
> mehta.salil.lnk@gmail.com; linux-rdma@vger.kernel.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm;
> Zhangping (ZP)
> Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> allocating memory using APIs
> 
> On Tue, Nov 15, 2016 at 03:52:46PM +0000, Salil Mehta wrote:
> > > -----Original Message-----
> > > From: Leon Romanovsky [mailto:leon@kernel.org]
> > > Sent: Wednesday, November 09, 2016 7:22 AM
> > > To: Salil Mehta
> > > Cc: dledford@redhat.com; Huwei (Xavier); oulijun;
> > > mehta.salil.lnk@gmail.com; linux-rdma@vger.kernel.org;
> > > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm;
> > > Zhangping (ZP)
> > > Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> > > allocating memory using APIs
> > >
> > > On Fri, Nov 04, 2016 at 04:36:25PM +0000, Salil Mehta wrote:
> > > > From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
> > > >
> > > > This patch modified the logic of allocating memory using APIs in
> > > > hns RoCE driver. We used kcalloc instead of kmalloc_array and
> > > > bitmap_zero. And When kcalloc failed, call vzalloc to alloc
> > > > memory.
> > > >
> > > > Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
> > > > Signed-off-by: Ping Zhang <zhangping5@huawei.com>
> > > > Signed-off-by: Salil Mehta  <salil.mehta@huawei.com>
> > > > ---
> > > >  drivers/infiniband/hw/hns/hns_roce_mr.c |   15 ++++++++-------
> > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > index fb87883..d3dfb5f 100644
> > > > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > @@ -137,11 +137,12 @@ static int hns_roce_buddy_init(struct
> > > hns_roce_buddy *buddy, int max_order)
> > > >
> > > >  	for (i = 0; i <= buddy->max_order; ++i) {
> > > >  		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> > > > -		buddy->bits[i] = kmalloc_array(s, sizeof(long),
> > > GFP_KERNEL);
> > > > -		if (!buddy->bits[i])
> > > > -			goto err_out_free;
> > > > -
> > > > -		bitmap_zero(buddy->bits[i], 1 << (buddy->max_order -
> i));
> > > > +		buddy->bits[i] = kcalloc(s, sizeof(long),
> GFP_KERNEL);
> > > > +		if (!buddy->bits[i]) {
> > > > +			buddy->bits[i] = vzalloc(s * sizeof(long));
> > >
> > > I wonder, why don't you use directly vzalloc instead of kcalloc
> > > fallback?
> > As we know we will have physical contiguous pages if the kcalloc
> > call succeeds. This will give us a chance to have better performance
> > over the allocations which are just virtually contiguous through the
> > function vzalloc(). Therefore, later has only been used as a fallback
> > when our memory request cannot be entertained through kcalloc.
> >
> > Are you suggesting that there will not be much performance penalty
> > if we use just vzalloc ?
> 
> Not exactly,
> I asked it, because we have similar code in our drivers and this
> construction looks strange to me.
> 
> 1. If performance is critical, we will use kmalloc.
> 2. If performance is not critical, we will use vmalloc.
> 
> But in this case, such construction shows me that we can live with
> vmalloc performance and kmalloc allocation are not really needed.
> 
> In your specific case, I'm not sure that kcalloc will ever fail.
Performance is definitely critical here. Though, I agree this is bit
unusual way of memory allocation. In actual, we were encountering
memory alloc failures using kmalloc (if you see allocation amount
is on the higher side and is exponential) so we ended up using
vmalloc as fall back - It is very naïve allocation scheme.

Maybe we need to rethink this allocation scheme part? Also, I can pull
back this particular patch for now or just live with vzalloc() till
we figure out proper solution to this? 

> 
> Thanks
> 
> 
> >
> > >
> > > > +			if (!buddy->bits[i])
> > > > +				goto err_out_free;
> > > > +		}
> > > >  	}

^ permalink raw reply

* Re: [PATCH v6 12/16] IB/pvrdma: Add Queue Pair support
From: Yuval Shaia @ 2016-11-21 15:51 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	pv-drivers-pghWNbHTmq7QT0dZR+AlfA, jhansen-pghWNbHTmq7QT0dZR+AlfA,
	asarwade-pghWNbHTmq7QT0dZR+AlfA,
	georgezhang-pghWNbHTmq7QT0dZR+AlfA,
	bryantan-pghWNbHTmq7QT0dZR+AlfA
In-Reply-To: <6a643e92376856394d45638d80a90619d3abac37.1475458407.git.aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>

On Sun, Oct 02, 2016 at 07:10:32PM -0700, Adit Ranadive wrote:
> +
> +/**
> + * pvrdma_post_send - post send work request entries on a QP
> + * @ibqp: the QP
> + * @wr: work request list to post
> + * @bad_wr: the first bad WR returned
> + *
> + * @return: 0 on success, otherwise errno returned.
> + */
> +int pvrdma_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
> +		     struct ib_send_wr **bad_wr)
> +{
> +	struct pvrdma_qp *qp = to_vqp(ibqp);
> +	struct pvrdma_dev *dev = to_vdev(ibqp->device);
> +	unsigned long flags;
> +	struct pvrdma_sq_wqe_hdr *wqe_hdr;
> +	struct pvrdma_sge *sge;
> +	int i, index;
> +	int nreq;
> +	int ret;
> +
> +	/*
> +	 * In states lower than RTS, we can fail immediately. In other states,
> +	 * just post and let the device figure it out.
> +	 */
> +	if (qp->state < IB_QPS_RTS) {
> +		*bad_wr = wr;
> +		return -EINVAL;
> +	}
> +
> +	spin_lock_irqsave(&qp->sq.lock, flags);
> +
> +	index = pvrdma_idx(&qp->sq.ring->prod_tail, qp->sq.wqe_cnt);
> +	for (nreq = 0; wr; nreq++, wr = wr->next) {

nreq is not in used so better remove it.

> +		unsigned int tail;
> +
> +		if (unlikely(!pvrdma_idx_ring_has_space(
> +				qp->sq.ring, qp->sq.wqe_cnt, &tail))) {
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "send queue is full\n");
> +			*bad_wr = wr;
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +
> +		if (unlikely(wr->num_sge > qp->sq.max_sg || wr->num_sge < 0)) {
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "send SGE overflow\n");
> +			*bad_wr = wr;
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		if (unlikely(wr->opcode < 0)) {
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "invalid send opcode\n");
> +			*bad_wr = wr;
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		/*
> +		 * Only support UD, RC.
> +		 * Need to check opcode table for thorough checking.
> +		 * opcode		_UD	_UC	_RC
> +		 * _SEND		x	x	x
> +		 * _SEND_WITH_IMM	x	x	x
> +		 * _RDMA_WRITE			x	x
> +		 * _RDMA_WRITE_WITH_IMM		x	x
> +		 * _LOCAL_INV			x	x
> +		 * _SEND_WITH_INV		x	x
> +		 * _RDMA_READ				x
> +		 * _ATOMIC_CMP_AND_SWP			x
> +		 * _ATOMIC_FETCH_AND_ADD		x
> +		 * _MASK_ATOMIC_CMP_AND_SWP		x
> +		 * _MASK_ATOMIC_FETCH_AND_ADD		x
> +		 * _REG_MR				x
> +		 *
> +		 */
> +		if (qp->ibqp.qp_type != IB_QPT_UD &&
> +		    qp->ibqp.qp_type != IB_QPT_RC &&
> +			wr->opcode != IB_WR_SEND) {
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "unsupported queuepair type\n");
> +			*bad_wr = wr;
> +			ret = -EINVAL;
> +			goto out;
> +		} else if (qp->ibqp.qp_type == IB_QPT_UD ||
> +			   qp->ibqp.qp_type == IB_QPT_GSI) {
> +			if (wr->opcode != IB_WR_SEND &&
> +			    wr->opcode != IB_WR_SEND_WITH_IMM) {
> +				dev_warn_ratelimited(&dev->pdev->dev,
> +						     "invalid send opcode\n");
> +				*bad_wr = wr;
> +				ret = -EINVAL;
> +				goto out;
> +			}
> +		}
> +
> +		wqe_hdr = (struct pvrdma_sq_wqe_hdr *)get_sq_wqe(qp, index);
> +		memset(wqe_hdr, 0, sizeof(*wqe_hdr));
> +		wqe_hdr->wr_id = wr->wr_id;
> +		wqe_hdr->num_sge = wr->num_sge;
> +		wqe_hdr->opcode = ib_wr_opcode_to_pvrdma(wr->opcode);
> +		wqe_hdr->send_flags = ib_send_flags_to_pvrdma(wr->send_flags);
> +		if (wr->opcode == IB_WR_SEND_WITH_IMM ||
> +		    wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM)
> +			wqe_hdr->ex.imm_data = wr->ex.imm_data;
> +
> +		switch (qp->ibqp.qp_type) {
> +		case IB_QPT_GSI:
> +		case IB_QPT_UD:
> +			if (unlikely(!ud_wr(wr)->ah)) {
> +				dev_warn_ratelimited(&dev->pdev->dev,
> +						     "invalid address handle\n");
> +				*bad_wr = wr;
> +				ret = -EINVAL;
> +				goto out;
> +			}
> +
> +			/*
> +			 * Use qkey from qp context if high order bit set,
> +			 * otherwise from work request.
> +			 */
> +			wqe_hdr->wr.ud.remote_qpn = ud_wr(wr)->remote_qpn;
> +			wqe_hdr->wr.ud.remote_qkey =
> +				ud_wr(wr)->remote_qkey & 0x80000000 ?
> +				qp->qkey : ud_wr(wr)->remote_qkey;
> +			wqe_hdr->wr.ud.av = to_vah(ud_wr(wr)->ah)->av;
> +
> +			break;
> +		case IB_QPT_RC:
> +			switch (wr->opcode) {
> +			case IB_WR_RDMA_READ:
> +			case IB_WR_RDMA_WRITE:
> +			case IB_WR_RDMA_WRITE_WITH_IMM:
> +				wqe_hdr->wr.rdma.remote_addr =
> +					rdma_wr(wr)->remote_addr;
> +				wqe_hdr->wr.rdma.rkey = rdma_wr(wr)->rkey;
> +				break;
> +			case IB_WR_LOCAL_INV:
> +			case IB_WR_SEND_WITH_INV:
> +				wqe_hdr->ex.invalidate_rkey =
> +					wr->ex.invalidate_rkey;
> +				break;
> +			case IB_WR_ATOMIC_CMP_AND_SWP:
> +			case IB_WR_ATOMIC_FETCH_AND_ADD:
> +				wqe_hdr->wr.atomic.remote_addr =
> +					atomic_wr(wr)->remote_addr;
> +				wqe_hdr->wr.atomic.rkey = atomic_wr(wr)->rkey;
> +				wqe_hdr->wr.atomic.compare_add =
> +					atomic_wr(wr)->compare_add;
> +				if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP)
> +					wqe_hdr->wr.atomic.swap =
> +						atomic_wr(wr)->swap;
> +				break;
> +			case IB_WR_REG_MR:
> +				ret = set_reg_seg(wqe_hdr, reg_wr(wr));
> +				if (ret < 0) {
> +					dev_warn_ratelimited(&dev->pdev->dev,
> +							     "Failed to set fast register work request\n");
> +					*bad_wr = wr;
> +					goto out;
> +				}
> +				break;
> +			default:
> +				break;
> +			}
> +
> +			break;
> +		default:
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "invalid queuepair type\n");
> +			ret = -EINVAL;
> +			*bad_wr = wr;
> +			goto out;
> +		}
> +
> +		sge = (struct pvrdma_sge *)(wqe_hdr + 1);
> +		for (i = 0; i < wr->num_sge; i++) {
> +			/* Need to check wqe_size 0 or max size */
> +			sge->addr = wr->sg_list[i].addr;
> +			sge->length = wr->sg_list[i].length;
> +			sge->lkey = wr->sg_list[i].lkey;
> +			sge++;
> +		}
> +
> +		/* Make sure wqe is written before index update */
> +		smp_wmb();
> +
> +		index++;
> +		if (unlikely(index >= qp->sq.wqe_cnt))
> +			index = 0;
> +		/* Update shared sq ring */
> +		pvrdma_idx_ring_inc(&qp->sq.ring->prod_tail,
> +				    qp->sq.wqe_cnt);
> +	}
> +
> +	ret = 0;
> +
> +out:
> +	spin_unlock_irqrestore(&qp->sq.lock, flags);
> +
> +	if (!ret)
> +		pvrdma_write_uar_qp(dev, PVRDMA_UAR_QP_SEND | qp->qp_handle);
> +
> +	return ret;
> +}
> +
> +/**
> + * pvrdma_post_receive - post receive work request entries on a QP
> + * @ibqp: the QP
> + * @wr: the work request list to post
> + * @bad_wr: the first bad WR returned
> + *
> + * @return: 0 on success, otherwise errno returned.
> + */
> +int pvrdma_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
> +		     struct ib_recv_wr **bad_wr)
> +{
> +	struct pvrdma_dev *dev = to_vdev(ibqp->device);
> +	unsigned long flags;
> +	struct pvrdma_qp *qp = to_vqp(ibqp);
> +	struct pvrdma_rq_wqe_hdr *wqe_hdr;
> +	struct pvrdma_sge *sge;
> +	int index, nreq;
> +	int ret = 0;
> +	int i;
> +
> +	/*
> +	 * In the RESET state, we can fail immediately. For other states,
> +	 * just post and let the device figure it out.
> +	 */
> +	if (qp->state == IB_QPS_RESET) {
> +		*bad_wr = wr;
> +		return -EINVAL;
> +	}
> +
> +	spin_lock_irqsave(&qp->rq.lock, flags);
> +
> +	index = pvrdma_idx(&qp->rq.ring->prod_tail, qp->rq.wqe_cnt);
> +	for (nreq = 0; wr; nreq++, wr = wr->next) {

ditto

> +		unsigned int tail;
> +
> +		if (unlikely(wr->num_sge > qp->rq.max_sg ||
> +			     wr->num_sge < 0)) {
> +			ret = -EINVAL;
> +			*bad_wr = wr;
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "recv SGE overflow\n");
> +			goto out;
> +		}
> +
> +		if (unlikely(!pvrdma_idx_ring_has_space(
> +				qp->rq.ring, qp->rq.wqe_cnt, &tail))) {
> +			ret = -ENOMEM;
> +			*bad_wr = wr;
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "recv queue full\n");
> +			goto out;
> +		}
> +
> +		wqe_hdr = (struct pvrdma_rq_wqe_hdr *)get_rq_wqe(qp, index);
> +		wqe_hdr->wr_id = wr->wr_id;
> +		wqe_hdr->num_sge = wr->num_sge;
> +		wqe_hdr->total_len = 0;
> +
> +		sge = (struct pvrdma_sge *)(wqe_hdr + 1);
> +		for (i = 0; i < wr->num_sge; i++) {
> +			sge->addr = wr->sg_list[i].addr;
> +			sge->length = wr->sg_list[i].length;
> +			sge->lkey = wr->sg_list[i].lkey;
> +			sge++;
> +		}
> +
> +		/* Make sure wqe is written before index update */
> +		smp_wmb();
> +
> +		index++;
> +		if (unlikely(index >= qp->rq.wqe_cnt))
> +			index = 0;
> +		/* Update shared rq ring */
> +		pvrdma_idx_ring_inc(&qp->rq.ring->prod_tail,
> +				    qp->rq.wqe_cnt);
> +	}
> +
> +	spin_unlock_irqrestore(&qp->rq.lock, flags);
> +
> +	pvrdma_write_uar_qp(dev, PVRDMA_UAR_QP_RECV | qp->qp_handle);
> +
> +	return ret;
> +
> +out:
> +	spin_unlock_irqrestore(&qp->rq.lock, flags);
> +
> +	return ret;
> +}
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 5/9] IB/isert: Replace semaphore sem with completion
From: Arnd Bergmann @ 2016-11-21 14:50 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Binoy Jayan, Doug Ledford, Sean Hefty, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier), Faisal Latif, Mustafa Ismail, Mark Brown,
	linux-rdma
In-Reply-To: <5489f4ef-02fb-5207-6261-c08ee805acae@grimberg.me>

On Monday, November 21, 2016 2:33:02 PM CET Sagi Grimberg wrote:
> >>> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> >>> index 6dd43f6..de80f56 100644
> >>> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> >>> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> >>> @@ -619,7 +619,7 @@
> >>>       mutex_unlock(&isert_np->mutex);
> >>>
> >>>       isert_info("np %p: Allow accept_np to continue\n", isert_np);
> >>> -     up(&isert_np->sem);
> >>> +     complete(&isert_np->comp);
> >>>  }
> >>>
> >>>  static void
> >>> @@ -2311,7 +2311,7 @@ struct rdma_cm_id *
> >>>               isert_err("Unable to allocate struct isert_np\n");
> >>>               return -ENOMEM;
> >>>       }
> >>> -     sema_init(&isert_np->sem, 0);
> >>> +     init_completion(&isert_np->comp);
> >>
> >> This is still racy, a connect event can complete just before we
> >> init the completion and *will* get lost...
> >>
> >> This code started off with a waitqueue which exposes the same
> >> problem, see:
> >> 531b7bf4bd79 Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
> >>
> >> So, still NAK from me...
> >
> > I don't see what you mean here. The code using a waitqueue had no
> > counter but the completion does.
> 
> The problem here is that init_completion sets the counter to zero
> and between this thread wakes up and until it initializes comp->done
> we might have another connect event completing it and I fail to
> see how this is handled correctly.

But the sema_init()/init_completion() is done right after the
structure is allocated, so nothing should have a pointer to it
at this time.

Later when we do the down()/wait_for_completion(), that will
just decrement the counter rather than initialize, and converting
from one to the other doesn't change anything here.

	Arnd

^ permalink raw reply

* Re: [PATCH v5 5/9] IB/isert: Replace semaphore sem with completion
From: Sagi Grimberg @ 2016-11-21 12:33 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Binoy Jayan, Doug Ledford, Sean Hefty, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier), Faisal Latif, Mustafa Ismail, Mark Brown,
	linux-rdma
In-Reply-To: <2625189.HB0S3sfnY7@wuerfel>


>>> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
>>> index 6dd43f6..de80f56 100644
>>> --- a/drivers/infiniband/ulp/isert/ib_isert.c
>>> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
>>> @@ -619,7 +619,7 @@
>>>       mutex_unlock(&isert_np->mutex);
>>>
>>>       isert_info("np %p: Allow accept_np to continue\n", isert_np);
>>> -     up(&isert_np->sem);
>>> +     complete(&isert_np->comp);
>>>  }
>>>
>>>  static void
>>> @@ -2311,7 +2311,7 @@ struct rdma_cm_id *
>>>               isert_err("Unable to allocate struct isert_np\n");
>>>               return -ENOMEM;
>>>       }
>>> -     sema_init(&isert_np->sem, 0);
>>> +     init_completion(&isert_np->comp);
>>
>> This is still racy, a connect event can complete just before we
>> init the completion and *will* get lost...
>>
>> This code started off with a waitqueue which exposes the same
>> problem, see:
>> 531b7bf4bd79 Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
>>
>> So, still NAK from me...
>
> I don't see what you mean here. The code using a waitqueue had no
> counter but the completion does.

The problem here is that init_completion sets the counter to zero
and between this thread wakes up and until it initializes comp->done
we might have another connect event completing it and I fail to
see how this is handled correctly.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] infiniband: remove WARN that is not kernel bug
From: Leon Romanovsky @ 2016-11-21 12:14 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Jason Gunthorpe, Valdis.Kletnieks-PjAqaU27lzQ,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML
In-Reply-To: <CACT4Y+Y1sL-8K7HhZ5M5ABVPG0kM=dKZh+6gvuNRvzXFtiA5aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 2805 bytes --]

On Mon, Nov 21, 2016 at 12:48:35PM +0100, Dmitry Vyukov wrote:
> On Mon, Nov 21, 2016 at 12:44 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > On Mon, Nov 21, 2016 at 11:30:21AM +0100, Dmitry Vyukov wrote:
> >> On Mon, Nov 21, 2016 at 11:25 AM, Miroslav Benes <mbenes-AlSwsSmVLrQ@public.gmane.org> wrote:
> >> > On Mon, 21 Nov 2016, Dmitry Vyukov wrote:
> >> >
> >> >> WARNINGs mean kernel bugs.
> >> >> The one in ucma_write() points to user programming error
> >> >> or a malicious attempt. This is not a kernel bug, remove it.
> >> >>
> >> >> BUG/WARNs that are not kernel bugs hinder automated testing effots.
> >> >>
> >> >> Signed-off-by: Dmitry Vyukov <dvyukov-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> >> >> Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >> >> Cc: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >> >> Cc: Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >> >> Cc: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >> >> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> >> Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> >> Cc: syzkaller-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> >> >>
> >> >> ---
> >> >> Changes since v1:
> >> >>  - added printk_once
> >> >> ---
> >> >>  drivers/infiniband/core/ucma.c | 5 ++++-
> >> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
> >> >> index 9520154..405d0ce 100644
> >> >> --- a/drivers/infiniband/core/ucma.c
> >> >> +++ b/drivers/infiniband/core/ucma.c
> >> >> @@ -1584,8 +1584,11 @@ static ssize_t ucma_write(struct file *filp, const char __user *buf,
> >> >>       struct rdma_ucm_cmd_hdr hdr;
> >> >>       ssize_t ret;
> >> >>
> >> >> -     if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
> >> >> +     if (!ib_safe_file_access(filp)) {
> >> >> +             printk_once("ucma_write: process %d (%s) tried to do something hinky\n",
> >> >> +                     task_tgid_vnr(current), current->comm);
> >> >>               return -EACCES;
> >> >> +     }
> >> >>
> >> >>       if (len < sizeof(hdr))
> >> >>               return -EINVAL;
> >> >
> >> > FWIW, WARN_ON_ONCE came with commit e6bd18f57aad ("IB/security: Restrict
> >> > use of the write() interface"). Would it make sense to change the other
> >> > places as well?
> >>
> >>
> >> I guess so.
> >> Can I ask somebody of infiniband maintainers to take care of this?
> >
> > Please see below,
> > Hope it helps.
>
> In ib_ucm_write function there is a wrong prefix:
>
> + pr_err_once("ucm_write: process %d (%s) tried to do something hinky\n",

I did it intentionally to have the same errors for all flows.

>
> Otherwise looks good. Thanks.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v2] infiniband: remove WARN that is not kernel bug
From: Dmitry Vyukov @ 2016-11-21 11:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: syzkaller, Jason Gunthorpe, Valdis.Kletnieks, dledford,
	sean.hefty, Hal Rosenstock, linux-rdma, LKML
In-Reply-To: <20161121114417.GA4158@leon.nu>

On Mon, Nov 21, 2016 at 12:44 PM, Leon Romanovsky <leon@kernel.org> wrote:
> On Mon, Nov 21, 2016 at 11:30:21AM +0100, Dmitry Vyukov wrote:
>> On Mon, Nov 21, 2016 at 11:25 AM, Miroslav Benes <mbenes@suse.cz> wrote:
>> > On Mon, 21 Nov 2016, Dmitry Vyukov wrote:
>> >
>> >> WARNINGs mean kernel bugs.
>> >> The one in ucma_write() points to user programming error
>> >> or a malicious attempt. This is not a kernel bug, remove it.
>> >>
>> >> BUG/WARNs that are not kernel bugs hinder automated testing effots.
>> >>
>> >> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>> >> Cc: Doug Ledford <dledford@redhat.com>
>> >> Cc: Sean Hefty <sean.hefty@intel.com>
>> >> Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
>> >> Cc: Leon Romanovsky <leon@kernel.org>
>> >> Cc: linux-rdma@vger.kernel.org
>> >> Cc: linux-kernel@vger.kernel.org
>> >> Cc: syzkaller@googlegroups.com
>> >>
>> >> ---
>> >> Changes since v1:
>> >>  - added printk_once
>> >> ---
>> >>  drivers/infiniband/core/ucma.c | 5 ++++-
>> >>  1 file changed, 4 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
>> >> index 9520154..405d0ce 100644
>> >> --- a/drivers/infiniband/core/ucma.c
>> >> +++ b/drivers/infiniband/core/ucma.c
>> >> @@ -1584,8 +1584,11 @@ static ssize_t ucma_write(struct file *filp, const char __user *buf,
>> >>       struct rdma_ucm_cmd_hdr hdr;
>> >>       ssize_t ret;
>> >>
>> >> -     if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
>> >> +     if (!ib_safe_file_access(filp)) {
>> >> +             printk_once("ucma_write: process %d (%s) tried to do something hinky\n",
>> >> +                     task_tgid_vnr(current), current->comm);
>> >>               return -EACCES;
>> >> +     }
>> >>
>> >>       if (len < sizeof(hdr))
>> >>               return -EINVAL;
>> >
>> > FWIW, WARN_ON_ONCE came with commit e6bd18f57aad ("IB/security: Restrict
>> > use of the write() interface"). Would it make sense to change the other
>> > places as well?
>>
>>
>> I guess so.
>> Can I ask somebody of infiniband maintainers to take care of this?
>
> Please see below,
> Hope it helps.

In ib_ucm_write function there is a wrong prefix:

+ pr_err_once("ucm_write: process %d (%s) tried to do something hinky\n",

Otherwise looks good. Thanks.

^ permalink raw reply

* Re: [PATCH v2] infiniband: remove WARN that is not kernel bug
From: Leon Romanovsky @ 2016-11-21 11:44 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Jason Gunthorpe, Valdis.Kletnieks-PjAqaU27lzQ,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML
In-Reply-To: <CACT4Y+ZO9sTbBGwNUUMkAQ5m0N-4aT0S9xypCRbBUch1pQNw9g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 2469 bytes --]

On Mon, Nov 21, 2016 at 11:30:21AM +0100, Dmitry Vyukov wrote:
> On Mon, Nov 21, 2016 at 11:25 AM, Miroslav Benes <mbenes-AlSwsSmVLrQ@public.gmane.org> wrote:
> > On Mon, 21 Nov 2016, Dmitry Vyukov wrote:
> >
> >> WARNINGs mean kernel bugs.
> >> The one in ucma_write() points to user programming error
> >> or a malicious attempt. This is not a kernel bug, remove it.
> >>
> >> BUG/WARNs that are not kernel bugs hinder automated testing effots.
> >>
> >> Signed-off-by: Dmitry Vyukov <dvyukov-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> >> Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >> Cc: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >> Cc: Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >> Cc: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Cc: syzkaller-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> >>
> >> ---
> >> Changes since v1:
> >>  - added printk_once
> >> ---
> >>  drivers/infiniband/core/ucma.c | 5 ++++-
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
> >> index 9520154..405d0ce 100644
> >> --- a/drivers/infiniband/core/ucma.c
> >> +++ b/drivers/infiniband/core/ucma.c
> >> @@ -1584,8 +1584,11 @@ static ssize_t ucma_write(struct file *filp, const char __user *buf,
> >>       struct rdma_ucm_cmd_hdr hdr;
> >>       ssize_t ret;
> >>
> >> -     if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
> >> +     if (!ib_safe_file_access(filp)) {
> >> +             printk_once("ucma_write: process %d (%s) tried to do something hinky\n",
> >> +                     task_tgid_vnr(current), current->comm);
> >>               return -EACCES;
> >> +     }
> >>
> >>       if (len < sizeof(hdr))
> >>               return -EINVAL;
> >
> > FWIW, WARN_ON_ONCE came with commit e6bd18f57aad ("IB/security: Restrict
> > use of the write() interface"). Would it make sense to change the other
> > places as well?
>
>
> I guess so.
> Can I ask somebody of infiniband maintainers to take care of this?

Please see below,
Hope it helps.


> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #1.2: 0001-IB-core-qib-Remove-WARN-that-is-not-kernel-bug.patch --]
[-- Type: text/x-diff, Size: 3383 bytes --]

From e0bfe4771591f9ffbcaa4e66d6257ed95e2d7188 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Date: Mon, 21 Nov 2016 13:30:59 +0200
Subject: [PATCH] IB/{core, qib}: Remove WARN that is not kernel bug

WARNINGs mean kernel bugs, in this case, they are placed
to mark programming errors and/or malicious attempts.

BUG/WARNs that are not kernel bugs hinder automated testing efforts.

Signed-off-by: Dmitry Vyukov <dvyukov-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/ucm.c            | 5 ++++-
 drivers/infiniband/core/ucma.c           | 5 ++++-
 drivers/infiniband/core/uverbs_main.c    | 5 ++++-
 drivers/infiniband/hw/qib/qib_file_ops.c | 5 ++++-
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 7713ef0..0076b55 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -1104,8 +1104,11 @@ static ssize_t ib_ucm_write(struct file *filp, const char __user *buf,
 	struct ib_ucm_cmd_hdr hdr;
 	ssize_t result;

-	if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
+	if (!ib_safe_file_access(filp)) {
+		pr_err_once("ucm_write: process %d (%s) tried to do something hinky\n",
+			    task_tgid_vnr(current), current->comm);
 		return -EACCES;
+	}

 	if (len < sizeof(hdr))
 		return -EINVAL;
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 9520154..31fa27f 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1584,8 +1584,11 @@ static ssize_t ucma_write(struct file *filp, const char __user *buf,
 	struct rdma_ucm_cmd_hdr hdr;
 	ssize_t ret;

-	if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
+	if (!ib_safe_file_access(filp)) {
+		pr_err_once("ucma_write: process %d (%s) tried to do something hinky\n",
+			    task_tgid_vnr(current), current->comm);
 		return -EACCES;
+	}

 	if (len < sizeof(hdr))
 		return -EINVAL;
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 44b1104..329eb15 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -746,8 +746,11 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	int srcu_key;
 	ssize_t ret;

-	if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
+	if (!ib_safe_file_access(filp)) {
+		pr_err_once("uverbs_write: process %d (%s) tried to do something hinky\n",
+			    task_tgid_vnr(current), current->comm);
 		return -EACCES;
+	}

 	if (count < sizeof hdr)
 		return -EINVAL;
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index 382466a..b43ea52 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -2066,8 +2066,11 @@ static ssize_t qib_write(struct file *fp, const char __user *data,
 	ssize_t ret = 0;
 	void *dest;

-	if (WARN_ON_ONCE(!ib_safe_file_access(fp)))
+	if (!ib_safe_file_access(fp)) {
+		pr_err_once("qib_write: process %d (%s) tried to do something hinky\n",
+			    task_tgid_vnr(current), current->comm);
 		return -EACCES;
+	}

 	if (count < sizeof(cmd.type)) {
 		ret = -EINVAL;
--
2.7.4


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply related

* Re: [PATCH v2] infiniband: remove WARN that is not kernel bug
From: Dmitry Vyukov @ 2016-11-21 10:30 UTC (permalink / raw)
  To: syzkaller
  Cc: Jason Gunthorpe, Valdis.Kletnieks-PjAqaU27lzQ,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, Hal Rosenstock,
	leon-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	LKML
In-Reply-To: <alpine.LNX.2.00.1611211123450.22752-ztGlSCb7Y1iN3ZZ/Hiejyg@public.gmane.org>

On Mon, Nov 21, 2016 at 11:25 AM, Miroslav Benes <mbenes-AlSwsSmVLrQ@public.gmane.org> wrote:
> On Mon, 21 Nov 2016, Dmitry Vyukov wrote:
>
>> WARNINGs mean kernel bugs.
>> The one in ucma_write() points to user programming error
>> or a malicious attempt. This is not a kernel bug, remove it.
>>
>> BUG/WARNs that are not kernel bugs hinder automated testing effots.
>>
>> Signed-off-by: Dmitry Vyukov <dvyukov-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> Cc: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>> Cc: Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> Cc: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Cc: syzkaller-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>>
>> ---
>> Changes since v1:
>>  - added printk_once
>> ---
>>  drivers/infiniband/core/ucma.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
>> index 9520154..405d0ce 100644
>> --- a/drivers/infiniband/core/ucma.c
>> +++ b/drivers/infiniband/core/ucma.c
>> @@ -1584,8 +1584,11 @@ static ssize_t ucma_write(struct file *filp, const char __user *buf,
>>       struct rdma_ucm_cmd_hdr hdr;
>>       ssize_t ret;
>>
>> -     if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
>> +     if (!ib_safe_file_access(filp)) {
>> +             printk_once("ucma_write: process %d (%s) tried to do something hinky\n",
>> +                     task_tgid_vnr(current), current->comm);
>>               return -EACCES;
>> +     }
>>
>>       if (len < sizeof(hdr))
>>               return -EINVAL;
>
> FWIW, WARN_ON_ONCE came with commit e6bd18f57aad ("IB/security: Restrict
> use of the write() interface"). Would it make sense to change the other
> places as well?


I guess so.
Can I ask somebody of infiniband maintainers to take care of this?
I just hit the warning in my automated testing environment when a
thread executed key_add in between open and write, then spent some
time debugging to figure out that this is an "invalid user input"
rather than a kernel bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] infiniband: remove WARN that is not kernel bug
From: Miroslav Benes @ 2016-11-21 10:25 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: jgunthorpe, Valdis.Kletnieks, dledford, sean.hefty,
	hal.rosenstock, leon, linux-rdma, linux-kernel, syzkaller
In-Reply-To: <1479723531-17940-1-git-send-email-dvyukov@google.com>

On Mon, 21 Nov 2016, Dmitry Vyukov wrote:

> WARNINGs mean kernel bugs.
> The one in ucma_write() points to user programming error
> or a malicious attempt. This is not a kernel bug, remove it.
> 
> BUG/WARNs that are not kernel bugs hinder automated testing effots.
> 
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Sean Hefty <sean.hefty@intel.com>
> Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: linux-rdma@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: syzkaller@googlegroups.com
> 
> ---
> Changes since v1:
>  - added printk_once
> ---
>  drivers/infiniband/core/ucma.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
> index 9520154..405d0ce 100644
> --- a/drivers/infiniband/core/ucma.c
> +++ b/drivers/infiniband/core/ucma.c
> @@ -1584,8 +1584,11 @@ static ssize_t ucma_write(struct file *filp, const char __user *buf,
>  	struct rdma_ucm_cmd_hdr hdr;
>  	ssize_t ret;
>  
> -	if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
> +	if (!ib_safe_file_access(filp)) {
> +		printk_once("ucma_write: process %d (%s) tried to do something hinky\n",
> +			task_tgid_vnr(current), current->comm);
>  		return -EACCES;
> +	}
>  
>  	if (len < sizeof(hdr))
>  		return -EINVAL;

FWIW, WARN_ON_ONCE came with commit e6bd18f57aad ("IB/security: Restrict 
use of the write() interface"). Would it make sense to change the other 
places as well?

Regards,
Miroslav

^ permalink raw reply

* Re: [PATCH v5 5/9] IB/isert: Replace semaphore sem with completion
From: Arnd Bergmann @ 2016-11-21 10:22 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Binoy Jayan, Doug Ledford, Sean Hefty, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier), Faisal Latif, Mustafa Ismail, Mark Brown,
	linux-rdma
In-Reply-To: <e1e4788d-9250-d4c3-937f-262bd9264a76@grimberg.me>

On Monday, November 21, 2016 9:36:22 AM CET Sagi Grimberg wrote:
> > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> > index 6dd43f6..de80f56 100644
> > --- a/drivers/infiniband/ulp/isert/ib_isert.c
> > +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> > @@ -619,7 +619,7 @@
> >       mutex_unlock(&isert_np->mutex);
> >
> >       isert_info("np %p: Allow accept_np to continue\n", isert_np);
> > -     up(&isert_np->sem);
> > +     complete(&isert_np->comp);
> >  }
> >
> >  static void
> > @@ -2311,7 +2311,7 @@ struct rdma_cm_id *
> >               isert_err("Unable to allocate struct isert_np\n");
> >               return -ENOMEM;
> >       }
> > -     sema_init(&isert_np->sem, 0);
> > +     init_completion(&isert_np->comp);
> 
> This is still racy, a connect event can complete just before we
> init the completion and *will* get lost...
> 
> This code started off with a waitqueue which exposes the same
> problem, see:
> 531b7bf4bd79 Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
> 
> So, still NAK from me...

I don't see what you mean here. The code using a waitqueue had no
counter but the completion does. I had suggested that Binoy add
a comment into the code about this, as it is a rarely used
property of completions, but it does seem entirely valid to me.

	Arnd

^ permalink raw reply

* Re: [PATCH] infiniband: remove WARN that is not kernel bug
From: Dmitry Vyukov @ 2016-11-21 10:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Valdis.Kletnieks-PjAqaU27lzQ, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, Hal Rosenstock,
	leon-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	LKML, syzkaller
In-Reply-To: <20161119185732.GF22775-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Sat, Nov 19, 2016 at 7:57 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Fri, Nov 18, 2016 at 09:22:42PM -0500, Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org wrote:
>> On Fri, 18 Nov 2016 12:24:37 +0100, Dmitry Vyukov said:
>> > WARNINGs mean kernel bugs.
>> > The one in ucma_write() points to user programming error
>> > or a malicious attempt. This is not a kernel bug, remove it.
>>
>> > -   if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
>> > +   if (!ib_safe_file_access(filp))
>> >             return -EACCES;
>>
>> In that case, wouldn't this be better?
>>
>>       if (!ib_safe_file_access(filp)) {
>>               printk_once("Process %d (%s) tried to do something hinky", pid, comm);
>>               return _EACCESS;
>>       }
>>
>> so the sysadmin becomes aware of the malicious attempt?
>
> Yes please

Mailed v2
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v2] infiniband: remove WARN that is not kernel bug
From: Dmitry Vyukov @ 2016-11-21 10:18 UTC (permalink / raw)
  To: jgunthorpe, Valdis.Kletnieks, dledford, sean.hefty,
	hal.rosenstock, leon
  Cc: Dmitry Vyukov, linux-rdma, linux-kernel, syzkaller

WARNINGs mean kernel bugs.
The one in ucma_write() points to user programming error
or a malicious attempt. This is not a kernel bug, remove it.

BUG/WARNs that are not kernel bugs hinder automated testing effots.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: syzkaller@googlegroups.com

---
Changes since v1:
 - added printk_once
---
 drivers/infiniband/core/ucma.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 9520154..405d0ce 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1584,8 +1584,11 @@ static ssize_t ucma_write(struct file *filp, const char __user *buf,
 	struct rdma_ucm_cmd_hdr hdr;
 	ssize_t ret;
 
-	if (WARN_ON_ONCE(!ib_safe_file_access(filp)))
+	if (!ib_safe_file_access(filp)) {
+		printk_once("ucma_write: process %d (%s) tried to do something hinky\n",
+			task_tgid_vnr(current), current->comm);
 		return -EACCES;
+	}
 
 	if (len < sizeof(hdr))
 		return -EINVAL;
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* Re: [PATCH v5 5/9] IB/isert: Replace semaphore sem with completion
From: Sagi Grimberg @ 2016-11-21  7:36 UTC (permalink / raw)
  To: Binoy Jayan, Doug Ledford, Sean Hefty, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier)
  Cc: Faisal Latif, Mustafa Ismail, Mark Brown, Arnd Bergmann,
	linux-rdma, linux-kernel, target-devel
In-Reply-To: <1479708496-9828-6-git-send-email-binoy.jayan@linaro.org>


> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index 6dd43f6..de80f56 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -619,7 +619,7 @@
>  	mutex_unlock(&isert_np->mutex);
>
>  	isert_info("np %p: Allow accept_np to continue\n", isert_np);
> -	up(&isert_np->sem);
> +	complete(&isert_np->comp);
>  }
>
>  static void
> @@ -2311,7 +2311,7 @@ struct rdma_cm_id *
>  		isert_err("Unable to allocate struct isert_np\n");
>  		return -ENOMEM;
>  	}
> -	sema_init(&isert_np->sem, 0);
> +	init_completion(&isert_np->comp);

This is still racy, a connect event can complete just before we
init the completion and *will* get lost...

This code started off with a waitqueue which exposes the same
problem, see:
531b7bf4bd79 Target/iser: Fix iscsit_accept_np and rdma_cm racy flow

So, still NAK from me...

^ permalink raw reply

* [PATCH v5 9/9] IB/mlx5: Replace semaphore umr_common:sem with wait_event
From: Binoy Jayan @ 2016-11-21  6:08 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Sagi Grimberg, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier)
  Cc: Faisal Latif, Mustafa Ismail, Mark Brown, Arnd Bergmann,
	linux-rdma, linux-kernel, target-devel, Binoy Jayan
In-Reply-To: <1479708496-9828-1-git-send-email-binoy.jayan@linaro.org>

Remove semaphore umr_common:sem used to limit concurrent access to umr qp
and introduce an atomic value 'users' to keep track of the same. Use a
wait_event to block when the limit is reached.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 drivers/infiniband/hw/mlx5/main.c    | 6 +-----
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 7 ++++++-
 drivers/infiniband/hw/mlx5/mr.c      | 6 ++++--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 63036c7..9de716c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2437,10 +2437,6 @@ static void destroy_umrc_res(struct mlx5_ib_dev *dev)
 	ib_dealloc_pd(dev->umrc.pd);
 }
 
-enum {
-	MAX_UMR_WR = 128,
-};
-
 static int create_umr_res(struct mlx5_ib_dev *dev)
 {
 	struct ib_qp_init_attr *init_attr = NULL;
@@ -2520,7 +2516,7 @@ static int create_umr_res(struct mlx5_ib_dev *dev)
 	dev->umrc.cq = cq;
 	dev->umrc.pd = pd;
 
-	sema_init(&dev->umrc.sem, MAX_UMR_WR);
+	init_waitqueue_head(&dev->umrc.wq);
 	ret = mlx5_mr_cache_init(dev);
 	if (ret) {
 		mlx5_ib_warn(dev, "mr cache init failed %d\n", ret);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index dcdcd19..de31b5f 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -533,7 +533,12 @@ struct umr_common {
 	struct ib_qp	*qp;
 	/* control access to UMR QP
 	 */
-	struct semaphore	sem;
+	wait_queue_head_t	wq;
+	atomic_t		users;
+};
+
+enum {
+	MAX_UMR_WR = 128,
 };
 
 enum {
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 1593856..dfaf6f6 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -867,7 +867,8 @@ static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
 	mlx5_ib_init_umr_context(&umr_context);
 	umrwr->wr.wr_cqe = &umr_context.cqe;
 
-	down(&umrc->sem);
+	/* limit number of concurrent ib_post_send() on qp */
+	wait_event(umrc->wq, atomic_add_unless(&umrc->users, 1, MAX_UMR_WR));
 	err = ib_post_send(umrc->qp, &umrwr->wr, &bad);
 	if (err) {
 		mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
@@ -879,7 +880,8 @@ static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
 			err = -EFAULT;
 		}
 	}
-	up(&umrc->sem);
+	atomic_dec(&umrc->users);
+	wake_up(&umrc->wq);
 	return err;
 }
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

* [PATCH v5 8/9] IB/mlx5: Add helper mlx5_ib_post_send_wait
From: Binoy Jayan @ 2016-11-21  6:08 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Sagi Grimberg, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier)
  Cc: Faisal Latif, Mustafa Ismail, Mark Brown, Arnd Bergmann,
	linux-rdma, linux-kernel, target-devel, Binoy Jayan
In-Reply-To: <1479708496-9828-1-git-send-email-binoy.jayan@linaro.org>

Clean up the following common code (to post a list of work requests to the
send queue of the specified QP) at various places and add a helper function
'mlx5_ib_post_send_wait' to implement the same.

 - Initialize 'mlx5_ib_umr_context' on stack
 - Assign "mlx5_umr_wr:wr:wr_cqe to umr_context.cqe
 - Acquire the semaphore
 - call ib_post_send with a single ib_send_wr
 - wait_for_completion()
 - Check for umr_context.status
 - Release the semaphore

As semaphores are going away in the future, moving all of these into the
shared helper leaves only a single function using the semaphore, which
can then be rewritten to use something else.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 drivers/infiniband/hw/mlx5/mr.c | 115 +++++++++++-----------------------------
 1 file changed, 32 insertions(+), 83 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index d4ad672..1593856 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -856,16 +856,40 @@ static inline void mlx5_ib_init_umr_context(struct mlx5_ib_umr_context *context)
 	init_completion(&context->done);
 }
 
+static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
+					 struct mlx5_umr_wr *umrwr)
+{
+	struct umr_common *umrc = &dev->umrc;
+	struct ib_send_wr *bad;
+	int err;
+	struct mlx5_ib_umr_context umr_context;
+
+	mlx5_ib_init_umr_context(&umr_context);
+	umrwr->wr.wr_cqe = &umr_context.cqe;
+
+	down(&umrc->sem);
+	err = ib_post_send(umrc->qp, &umrwr->wr, &bad);
+	if (err) {
+		mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
+	} else {
+		wait_for_completion(&umr_context.done);
+		if (umr_context.status != IB_WC_SUCCESS) {
+			mlx5_ib_warn(dev, "reg umr failed (%u)\n",
+				     umr_context.status);
+			err = -EFAULT;
+		}
+	}
+	up(&umrc->sem);
+	return err;
+}
+
 static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
 				  u64 virt_addr, u64 len, int npages,
 				  int page_shift, int order, int access_flags)
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct device *ddev = dev->ib_dev.dma_device;
-	struct umr_common *umrc = &dev->umrc;
-	struct mlx5_ib_umr_context umr_context;
 	struct mlx5_umr_wr umrwr = {};
-	struct ib_send_wr *bad;
 	struct mlx5_ib_mr *mr;
 	struct ib_sge sg;
 	int size;
@@ -894,24 +918,12 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
 	if (err)
 		goto free_mr;
 
-	mlx5_ib_init_umr_context(&umr_context);
-
-	umrwr.wr.wr_cqe = &umr_context.cqe;
 	prep_umr_reg_wqe(pd, &umrwr.wr, &sg, dma, npages, mr->mmkey.key,
 			 page_shift, virt_addr, len, access_flags);
 
-	down(&umrc->sem);
-	err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
-	if (err) {
-		mlx5_ib_warn(dev, "post send failed, err %d\n", err);
+	err = mlx5_ib_post_send_wait(dev, &umrwr);
+	if (err && err != -EFAULT)
 		goto unmap_dma;
-	} else {
-		wait_for_completion(&umr_context.done);
-		if (umr_context.status != IB_WC_SUCCESS) {
-			mlx5_ib_warn(dev, "reg umr failed\n");
-			err = -EFAULT;
-		}
-	}
 
 	mr->mmkey.iova = virt_addr;
 	mr->mmkey.size = len;
@@ -920,7 +932,6 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
 	mr->live = 1;
 
 unmap_dma:
-	up(&umrc->sem);
 	dma_unmap_single(ddev, dma, size, DMA_TO_DEVICE);
 
 	kfree(mr_pas);
@@ -940,13 +951,10 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index, int npages,
 {
 	struct mlx5_ib_dev *dev = mr->dev;
 	struct device *ddev = dev->ib_dev.dma_device;
-	struct umr_common *umrc = &dev->umrc;
-	struct mlx5_ib_umr_context umr_context;
 	struct ib_umem *umem = mr->umem;
 	int size;
 	__be64 *pas;
 	dma_addr_t dma;
-	struct ib_send_wr *bad;
 	struct mlx5_umr_wr wr;
 	struct ib_sge sg;
 	int err = 0;
@@ -1011,10 +1019,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index, int npages,
 
 		dma_sync_single_for_device(ddev, dma, size, DMA_TO_DEVICE);
 
-		mlx5_ib_init_umr_context(&umr_context);
-
 		memset(&wr, 0, sizeof(wr));
-		wr.wr.wr_cqe = &umr_context.cqe;
 
 		sg.addr = dma;
 		sg.length = ALIGN(npages * sizeof(u64),
@@ -1031,19 +1036,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index, int npages,
 		wr.mkey = mr->mmkey.key;
 		wr.target.offset = start_page_index;
 
-		down(&umrc->sem);
-		err = ib_post_send(umrc->qp, &wr.wr, &bad);
-		if (err) {
-			mlx5_ib_err(dev, "UMR post send failed, err %d\n", err);
-		} else {
-			wait_for_completion(&umr_context.done);
-			if (umr_context.status != IB_WC_SUCCESS) {
-				mlx5_ib_err(dev, "UMR completion failed, code %d\n",
-					    umr_context.status);
-				err = -EFAULT;
-			}
-		}
-		up(&umrc->sem);
+		err = mlx5_ib_post_send_wait(dev, &wr);
 	}
 	dma_unmap_single(ddev, dma, size, DMA_TO_DEVICE);
 
@@ -1210,39 +1203,14 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 static int unreg_umr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
 {
 	struct mlx5_core_dev *mdev = dev->mdev;
-	struct umr_common *umrc = &dev->umrc;
-	struct mlx5_ib_umr_context umr_context;
 	struct mlx5_umr_wr umrwr = {};
-	struct ib_send_wr *bad;
-	int err;
 
 	if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
 		return 0;
 
-	mlx5_ib_init_umr_context(&umr_context);
-
-	umrwr.wr.wr_cqe = &umr_context.cqe;
 	prep_umr_unreg_wqe(dev, &umrwr.wr, mr->mmkey.key);
 
-	down(&umrc->sem);
-	err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
-	if (err) {
-		up(&umrc->sem);
-		mlx5_ib_dbg(dev, "err %d\n", err);
-		goto error;
-	} else {
-		wait_for_completion(&umr_context.done);
-		up(&umrc->sem);
-	}
-	if (umr_context.status != IB_WC_SUCCESS) {
-		mlx5_ib_warn(dev, "unreg umr failed\n");
-		err = -EFAULT;
-		goto error;
-	}
-	return 0;
-
-error:
-	return err;
+	return mlx5_ib_post_send_wait(dev, &umrwr);
 }
 
 static int rereg_umr(struct ib_pd *pd, struct mlx5_ib_mr *mr, u64 virt_addr,
@@ -1251,19 +1219,13 @@ static int rereg_umr(struct ib_pd *pd, struct mlx5_ib_mr *mr, u64 virt_addr,
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct device *ddev = dev->ib_dev.dma_device;
-	struct mlx5_ib_umr_context umr_context;
-	struct ib_send_wr *bad;
 	struct mlx5_umr_wr umrwr = {};
 	struct ib_sge sg;
-	struct umr_common *umrc = &dev->umrc;
 	dma_addr_t dma = 0;
 	__be64 *mr_pas = NULL;
 	int size;
 	int err;
 
-	mlx5_ib_init_umr_context(&umr_context);
-
-	umrwr.wr.wr_cqe = &umr_context.cqe;
 	umrwr.wr.send_flags = MLX5_IB_SEND_UMR_FAIL_IF_FREE;
 
 	if (flags & IB_MR_REREG_TRANS) {
@@ -1291,21 +1253,8 @@ static int rereg_umr(struct ib_pd *pd, struct mlx5_ib_mr *mr, u64 virt_addr,
 	}
 
 	/* post send request to UMR QP */
-	down(&umrc->sem);
-	err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
+	err = mlx5_ib_post_send_wait(dev, &umrwr);
 
-	if (err) {
-		mlx5_ib_warn(dev, "post send failed, err %d\n", err);
-	} else {
-		wait_for_completion(&umr_context.done);
-		if (umr_context.status != IB_WC_SUCCESS) {
-			mlx5_ib_warn(dev, "reg umr failed (%u)\n",
-				     umr_context.status);
-			err = -EFAULT;
-		}
-	}
-
-	up(&umrc->sem);
 	if (flags & IB_MR_REREG_TRANS) {
 		dma_unmap_single(ddev, dma, size, DMA_TO_DEVICE);
 		kfree(mr_pas);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

* [PATCH v5 7/9] IB/mthca: Replace counting semaphore event_sem with wait_event
From: Binoy Jayan @ 2016-11-21  6:08 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Sagi Grimberg, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier)
  Cc: Faisal Latif, Mustafa Ismail, Mark Brown, Arnd Bergmann,
	linux-rdma, linux-kernel, target-devel, Binoy Jayan
In-Reply-To: <1479708496-9828-1-git-send-email-binoy.jayan@linaro.org>

Counting semaphores are going away in the future, so replace the semaphore
mthca_cmd::event_sem with a conditional wait_event.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 drivers/infiniband/hw/mthca/mthca_cmd.c | 47 ++++++++++++++++++++++-----------
 drivers/infiniband/hw/mthca/mthca_dev.h |  3 ++-
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 49c6e19..d6a048a 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -405,6 +405,34 @@ void mthca_cmd_event(struct mthca_dev *dev,
 	complete(&context->done);
 }
 
+static inline struct mthca_cmd_context *
+mthca_try_get_context(struct mthca_cmd *cmd)
+{
+	struct mthca_cmd_context *context = NULL;
+
+	spin_lock(&cmd->context_lock);
+
+	if (cmd->free_head < 0)
+		goto out;
+
+	context = &cmd->context[cmd->free_head];
+	context->token += cmd->token_mask + 1;
+	cmd->free_head = context->next;
+out:
+	spin_unlock(&cmd->context_lock);
+	return context;
+}
+
+/* wait for and acquire a free context */
+static inline struct mthca_cmd_context *
+mthca_get_free_context(struct mthca_cmd *cmd)
+{
+	struct mthca_cmd_context *context;
+
+	wait_event(cmd->wq, (context = mthca_try_get_context(cmd)));
+	return context;
+}
+
 static int mthca_cmd_wait(struct mthca_dev *dev,
 			  u64 in_param,
 			  u64 *out_param,
@@ -417,15 +445,7 @@ static int mthca_cmd_wait(struct mthca_dev *dev,
 	int err = 0;
 	struct mthca_cmd_context *context;
 
-	down(&dev->cmd.event_sem);
-
-	spin_lock(&dev->cmd.context_lock);
-	BUG_ON(dev->cmd.free_head < 0);
-	context = &dev->cmd.context[dev->cmd.free_head];
-	context->token += dev->cmd.token_mask + 1;
-	dev->cmd.free_head = context->next;
-	spin_unlock(&dev->cmd.context_lock);
-
+	context = mthca_get_free_context(&dev->cmd);
 	init_completion(&context->done);
 
 	err = mthca_cmd_post(dev, in_param,
@@ -458,8 +478,8 @@ static int mthca_cmd_wait(struct mthca_dev *dev,
 	context->next = dev->cmd.free_head;
 	dev->cmd.free_head = context - dev->cmd.context;
 	spin_unlock(&dev->cmd.context_lock);
+	wake_up(&dev->cmd.wq);
 
-	up(&dev->cmd.event_sem);
 	return err;
 }
 
@@ -571,7 +591,7 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
 	dev->cmd.context[dev->cmd.max_cmds - 1].next = -1;
 	dev->cmd.free_head = 0;
 
-	sema_init(&dev->cmd.event_sem, dev->cmd.max_cmds);
+	init_waitqueue_head(&dev->cmd.wq);
 	spin_lock_init(&dev->cmd.context_lock);
 
 	for (dev->cmd.token_mask = 1;
@@ -590,12 +610,9 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
  */
 void mthca_cmd_use_polling(struct mthca_dev *dev)
 {
-	int i;
-
 	dev->cmd.flags &= ~MTHCA_CMD_USE_EVENTS;
 
-	for (i = 0; i < dev->cmd.max_cmds; ++i)
-		down(&dev->cmd.event_sem);
+	dev->cmd.free_head = -1;
 
 	kfree(dev->cmd.context);
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index 87ab964..2fc86db 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -46,6 +46,7 @@
 #include <linux/list.h>
 #include <linux/semaphore.h>
 
+#include <rdma/ib_sa.h>
 #include "mthca_provider.h"
 #include "mthca_doorbell.h"
 
@@ -121,7 +122,7 @@ struct mthca_cmd {
 	struct pci_pool          *pool;
 	struct mutex              hcr_mutex;
 	struct mutex		  poll_mutex;
-	struct semaphore 	  event_sem;
+	wait_queue_head_t	  wq;
 	int              	  max_cmds;
 	spinlock_t                context_lock;
 	int                       free_head;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

* [PATCH v5 6/9] IB/hns: Replace counting semaphore event_sem with wait_event
From: Binoy Jayan @ 2016-11-21  6:08 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Sagi Grimberg, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier)
  Cc: Faisal Latif, Mustafa Ismail, Mark Brown, Arnd Bergmann,
	linux-rdma, linux-kernel, target-devel, Binoy Jayan
In-Reply-To: <1479708496-9828-1-git-send-email-binoy.jayan@linaro.org>

Counting semaphores are going away in the future, so replace the semaphore
mthca_cmd::event_sem with a conditional wait_event.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c    | 46 ++++++++++++++++++++---------
 drivers/infiniband/hw/hns/hns_roce_device.h |  2 +-
 2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 51a0675..12ef3d8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -189,6 +189,34 @@ void hns_roce_cmd_event(struct hns_roce_dev *hr_dev, u16 token, u8 status,
 	complete(&context->done);
 }
 
+static inline struct hns_roce_cmd_context *
+hns_roce_try_get_context(struct hns_roce_cmdq *cmd)
+{
+	struct hns_roce_cmd_context *context = NULL;
+
+	spin_lock(&cmd->context_lock);
+
+	if (cmd->free_head < 0)
+		goto out;
+
+	context = &cmd->context[cmd->free_head];
+	context->token += cmd->token_mask + 1;
+	cmd->free_head = context->next;
+out:
+	spin_unlock(&cmd->context_lock);
+	return context;
+}
+
+/* wait for and acquire a free context */
+static inline struct hns_roce_cmd_context *
+hns_roce_get_free_context(struct hns_roce_cmdq *cmd)
+{
+	struct hns_roce_cmd_context *context;
+
+	wait_event(cmd->wq, (context = hns_roce_try_get_context(cmd)));
+	return context;
+}
+
 /* this should be called with "use_events" */
 static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
 				    u64 out_param, unsigned long in_modifier,
@@ -200,13 +228,7 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
 	struct hns_roce_cmd_context *context;
 	int ret = 0;
 
-	spin_lock(&cmd->context_lock);
-	WARN_ON(cmd->free_head < 0);
-	context = &cmd->context[cmd->free_head];
-	context->token += cmd->token_mask + 1;
-	cmd->free_head = context->next;
-	spin_unlock(&cmd->context_lock);
-
+	context = hns_roce_get_free_context(cmd);
 	init_completion(&context->done);
 
 	ret = hns_roce_cmd_mbox_post_hw(hr_dev, in_param, out_param,
@@ -238,6 +260,7 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
 	context->next = cmd->free_head;
 	cmd->free_head = context - cmd->context;
 	spin_unlock(&cmd->context_lock);
+	wake_up(&cmd->wq);
 
 	return ret;
 }
@@ -248,10 +271,8 @@ static int hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
 {
 	int ret = 0;
 
-	down(&hr_dev->cmd.event_sem);
 	ret = __hns_roce_cmd_mbox_wait(hr_dev, in_param, out_param,
 				       in_modifier, op_modifier, op, timeout);
-	up(&hr_dev->cmd.event_sem);
 
 	return ret;
 }
@@ -313,7 +334,7 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
 	hr_cmd->context[hr_cmd->max_cmds - 1].next = -1;
 	hr_cmd->free_head = 0;
 
-	sema_init(&hr_cmd->event_sem, hr_cmd->max_cmds);
+	init_waitqueue_head(&hr_cmd->wq);
 	spin_lock_init(&hr_cmd->context_lock);
 
 	hr_cmd->token_mask = CMD_TOKEN_MASK;
@@ -325,12 +346,9 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
 void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
 {
 	struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
-	int i;
 
 	hr_cmd->use_events = 0;
-
-	for (i = 0; i < hr_cmd->max_cmds; ++i)
-		down(&hr_cmd->event_sem);
+	hr_cmd->free_head = -1;
 
 	kfree(hr_cmd->context);
 }
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 2afe075..ac95f52 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -364,7 +364,7 @@ struct hns_roce_cmdq {
 	* Event mode: cmd register mutex protection,
 	* ensure to not exceed max_cmds and user use limit region
 	*/
-	struct semaphore	event_sem;
+	wait_queue_head_t       wq;
 	int			max_cmds;
 	spinlock_t		context_lock;
 	int			free_head;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

* [PATCH v5 5/9] IB/isert: Replace semaphore sem with completion
From: Binoy Jayan @ 2016-11-21  6:08 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Sagi Grimberg, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier)
  Cc: Faisal Latif, Mustafa Ismail, Mark Brown, Arnd Bergmann,
	linux-rdma, linux-kernel, target-devel, Binoy Jayan
In-Reply-To: <1479708496-9828-1-git-send-email-binoy.jayan@linaro.org>

The semaphore 'sem' in isert_device is used as completion, but in a
counting fashion as isert_connected_handler could be called multiple times
during which it allows for that number of waiters (isert_accept_np) to
continue without blocking, each consuming one node out from the list
isert_np-pending in the same order in which they were enqueued (FIFO). So,
convert it to struct completion. Semaphores are going away in the future.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 6 +++---
 drivers/infiniband/ulp/isert/ib_isert.h | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 6dd43f6..de80f56 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -619,7 +619,7 @@
 	mutex_unlock(&isert_np->mutex);
 
 	isert_info("np %p: Allow accept_np to continue\n", isert_np);
-	up(&isert_np->sem);
+	complete(&isert_np->comp);
 }
 
 static void
@@ -2311,7 +2311,7 @@ struct rdma_cm_id *
 		isert_err("Unable to allocate struct isert_np\n");
 		return -ENOMEM;
 	}
-	sema_init(&isert_np->sem, 0);
+	init_completion(&isert_np->comp);
 	mutex_init(&isert_np->mutex);
 	INIT_LIST_HEAD(&isert_np->accepted);
 	INIT_LIST_HEAD(&isert_np->pending);
@@ -2427,7 +2427,7 @@ struct rdma_cm_id *
 	int ret;
 
 accept_wait:
-	ret = down_interruptible(&isert_np->sem);
+	ret = wait_for_completion_interruptible(&isert_np->comp);
 	if (ret)
 		return -ENODEV;
 
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index c02ada5..a1277c0 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -3,6 +3,7 @@
 #include <linux/in6.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/rdma_cm.h>
+#include <linux/completion.h>
 #include <rdma/rw.h>
 #include <scsi/iser.h>
 
@@ -190,7 +191,7 @@ struct isert_device {
 
 struct isert_np {
 	struct iscsi_np         *np;
-	struct semaphore	sem;
+	struct completion	comp;
 	struct rdma_cm_id	*cm_id;
 	struct mutex		mutex;
 	struct list_head	accepted;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

* [PATCH v5 4/9] IB/mthca: Replace semaphore poll_sem with mutex
From: Binoy Jayan @ 2016-11-21  6:08 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Sagi Grimberg, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier)
  Cc: Faisal Latif, Mustafa Ismail, Mark Brown, Arnd Bergmann,
	linux-rdma, linux-kernel, target-devel, Binoy Jayan
In-Reply-To: <1479708496-9828-1-git-send-email-binoy.jayan@linaro.org>

The semaphore 'poll_sem' is a simple mutex, so it should be written as one.
Semaphores are going away in the future. So replace it with a mutex. Also,
remove mutex_[un]lock from mthca_cmd_use_events and mthca_cmd_use_polling
respectively.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 drivers/infiniband/hw/mthca/mthca_cmd.c | 10 +++-------
 drivers/infiniband/hw/mthca/mthca_cmd.h |  1 +
 drivers/infiniband/hw/mthca/mthca_dev.h |  2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index c7f49bb..49c6e19 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -347,7 +347,7 @@ static int mthca_cmd_poll(struct mthca_dev *dev,
 	unsigned long end;
 	u8 status;
 
-	down(&dev->cmd.poll_sem);
+	mutex_lock(&dev->cmd.poll_mutex);
 
 	err = mthca_cmd_post(dev, in_param,
 			     out_param ? *out_param : 0,
@@ -382,7 +382,7 @@ static int mthca_cmd_poll(struct mthca_dev *dev,
 	}
 
 out:
-	up(&dev->cmd.poll_sem);
+	mutex_unlock(&dev->cmd.poll_mutex);
 	return err;
 }
 
@@ -520,7 +520,7 @@ static int mthca_cmd_imm(struct mthca_dev *dev,
 int mthca_cmd_init(struct mthca_dev *dev)
 {
 	mutex_init(&dev->cmd.hcr_mutex);
-	sema_init(&dev->cmd.poll_sem, 1);
+	mutex_init(&dev->cmd.poll_mutex);
 	dev->cmd.flags = 0;
 
 	dev->hcr = ioremap(pci_resource_start(dev->pdev, 0) + MTHCA_HCR_BASE,
@@ -582,8 +582,6 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
 
 	dev->cmd.flags |= MTHCA_CMD_USE_EVENTS;
 
-	down(&dev->cmd.poll_sem);
-
 	return 0;
 }
 
@@ -600,8 +598,6 @@ void mthca_cmd_use_polling(struct mthca_dev *dev)
 		down(&dev->cmd.event_sem);
 
 	kfree(dev->cmd.context);
-
-	up(&dev->cmd.poll_sem);
 }
 
 struct mthca_mailbox *mthca_alloc_mailbox(struct mthca_dev *dev,
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.h b/drivers/infiniband/hw/mthca/mthca_cmd.h
index d2e5b19..a7f197e 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.h
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.h
@@ -35,6 +35,7 @@
 #ifndef MTHCA_CMD_H
 #define MTHCA_CMD_H
 
+#include <linux/mutex.h>
 #include <rdma/ib_verbs.h>
 
 #define MTHCA_MAILBOX_SIZE 4096
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index 4393a02..87ab964 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -120,7 +120,7 @@ enum {
 struct mthca_cmd {
 	struct pci_pool          *pool;
 	struct mutex              hcr_mutex;
-	struct semaphore 	  poll_sem;
+	struct mutex		  poll_mutex;
 	struct semaphore 	  event_sem;
 	int              	  max_cmds;
 	spinlock_t                context_lock;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

* [PATCH v5 3/9] IB/hns: Replace semaphore poll_sem with mutex
From: Binoy Jayan @ 2016-11-21  6:08 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Sagi Grimberg, Hal Rosenstock,
	Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	Nicholas Bellinger, Jenny Derzhavetz, Ira Weiny, Steve Wise,
	Mark Bloch, Tatyana E Nikolova, Matan Barak, Lijun Ou,
	Wei Hu(Xavier)
  Cc: Faisal Latif, Mustafa Ismail, Mark Brown, Arnd Bergmann,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Binoy Jayan
In-Reply-To: <1479708496-9828-1-git-send-email-binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

The semaphore 'poll_sem' is a simple mutex, so it should be written as one.
Semaphores are going away in the future. So replace it with a mutex. Also,
remove mutex_[un]lock from mthca_cmd_use_events and mthca_cmd_use_polling
respectively.

Signed-off-by: Binoy Jayan <binoy.jayan-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c    | 11 ++++-------
 drivers/infiniband/hw/hns/hns_roce_device.h |  3 ++-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 2a0b6c0..51a0675 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -119,7 +119,7 @@ static int hns_roce_cmd_mbox_post_hw(struct hns_roce_dev *hr_dev, u64 in_param,
 	return ret;
 }
 
-/* this should be called with "poll_sem" */
+/* this should be called with "poll_mutex" */
 static int __hns_roce_cmd_mbox_poll(struct hns_roce_dev *hr_dev, u64 in_param,
 				    u64 out_param, unsigned long in_modifier,
 				    u8 op_modifier, u16 op,
@@ -167,10 +167,10 @@ static int hns_roce_cmd_mbox_poll(struct hns_roce_dev *hr_dev, u64 in_param,
 {
 	int ret;
 
-	down(&hr_dev->cmd.poll_sem);
+	mutex_lock(&hr_dev->cmd.poll_mutex);
 	ret = __hns_roce_cmd_mbox_poll(hr_dev, in_param, out_param, in_modifier,
 				       op_modifier, op, timeout);
-	up(&hr_dev->cmd.poll_sem);
+	mutex_unlock(&hr_dev->cmd.poll_mutex);
 
 	return ret;
 }
@@ -275,7 +275,7 @@ int hns_roce_cmd_init(struct hns_roce_dev *hr_dev)
 	struct device *dev = &hr_dev->pdev->dev;
 
 	mutex_init(&hr_dev->cmd.hcr_mutex);
-	sema_init(&hr_dev->cmd.poll_sem, 1);
+	mutex_init(&hr_dev->cmd.poll_mutex);
 	hr_dev->cmd.use_events = 0;
 	hr_dev->cmd.toggle = 1;
 	hr_dev->cmd.max_cmds = CMD_MAX_NUM;
@@ -319,8 +319,6 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
 	hr_cmd->token_mask = CMD_TOKEN_MASK;
 	hr_cmd->use_events = 1;
 
-	down(&hr_cmd->poll_sem);
-
 	return 0;
 }
 
@@ -335,7 +333,6 @@ void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
 		down(&hr_cmd->event_sem);
 
 	kfree(hr_cmd->context);
-	up(&hr_cmd->poll_sem);
 }
 
 struct hns_roce_cmd_mailbox
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 3417315..2afe075 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -34,6 +34,7 @@
 #define _HNS_ROCE_DEVICE_H
 
 #include <rdma/ib_verbs.h>
+#include <linux/mutex.h>
 
 #define DRV_NAME "hns_roce"
 
@@ -358,7 +359,7 @@ struct hns_roce_cmdq {
 	struct dma_pool		*pool;
 	u8 __iomem		*hcr;
 	struct mutex		hcr_mutex;
-	struct semaphore	poll_sem;
+	struct mutex		poll_mutex;
 	/*
 	* Event mode: cmd register mutex protection,
 	* ensure to not exceed max_cmds and user use limit region
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox