kernel-janitors.vger.kernel.org archive mirror
* [patch] infiniband: uverbs: limit the number of entries
@ 2010-10-07  7:16 Dan Carpenter
  2010-10-07 16:16 ` Jason Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Dan Carpenter @ 2010-10-07  7:16 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Sean Hefty, Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

If we don't limit cmd.ne then the multiplications can overflow.  This
will allocate a small amount of RAM successfully for the "resp" and
"wc" buffers.  The heap will get corrupted when we call ib_poll_cq().

Documentation/infiniband/user_verbs.txt suggests this function is meant
for unprivileged access.

I chose to limit the number of entries to 1000.  That limits the
allocations to 52kb of RAM at the most.  I didn't want to choose a
lower number and break userspace for someone.

Also we don't necessarily fill the "resp" buffer so I changed the
kmalloc() to a kzalloc() to avoid an information leak.

CC: stable@kernel.org
Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -162,6 +162,7 @@ void ib_uverbs_srq_event_handler(struct ib_event *event, void *context_ptr);
 void ib_uverbs_event_handler(struct ib_event_handler *handler,
 			     struct ib_event *event);
 
+#define UVERBS_MAX_NUM_ENTRIES 1000
 #define IB_UVERBS_DECLARE_CMD(name)					\
 	ssize_t ib_uverbs_##name(struct ib_uverbs_file *file,		\
 				 const char __user *buf, int in_len,	\
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -906,12 +906,15 @@ ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
+	if (cmd.ne > UVERBS_MAX_NUM_ENTRIES)
+		return -EINVAL;
+
 	wc = kmalloc(cmd.ne * sizeof *wc, GFP_KERNEL);
 	if (!wc)
 		return -ENOMEM;
 
 	rsize = sizeof *resp + cmd.ne * sizeof(struct ib_uverbs_wc);
-	resp = kmalloc(rsize, GFP_KERNEL);
+	resp = kzalloc(rsize, GFP_KERNEL);
 	if (!resp) {
 		ret = -ENOMEM;
 		goto out_wc;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [patch] infiniband: uverbs: limit the number of entries
  2010-10-07  7:16 [patch] infiniband: uverbs: limit the number of entries Dan Carpenter
@ 2010-10-07 16:16 ` Jason Gunthorpe
       [not found]   ` <20101007161649.GD21206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2010-10-07 16:16 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 07, 2010 at 09:16:10AM +0200, Dan Carpenter wrote:
> If we don't limit cmd.ne then the multiplications can overflow.  This
> will allocate a small amount of RAM successfully for the "resp" and
> "wc" buffers.  The heap will get corrupted when we call ib_poll_cq().

I think you could cap the number of returned entries to
UVERBS_MAX_NUM_ENTRIES rather than return EINVAL. That might be more
compatible with user space..

Jason


* Re: [patch] infiniband: uverbs: limit the number of entries
       [not found]   ` <20101007161649.GD21206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-10-07 16:59     ` Dan Carpenter
  2010-10-08  7:59       ` Nicolas Palix
  2010-10-09 23:16       ` [patch] " Jason Gunthorpe
  0 siblings, 2 replies; 15+ messages in thread
From: Dan Carpenter @ 2010-10-07 16:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 07, 2010 at 10:16:49AM -0600, Jason Gunthorpe wrote:
> On Thu, Oct 07, 2010 at 09:16:10AM +0200, Dan Carpenter wrote:
> > If we don't limit cmd.ne then the multiplications can overflow.  This
> > will allocate a small amount of RAM successfully for the "resp" and
> > "wc" buffers.  The heap will get corrupted when we call ib_poll_cq().
> 
> I think you could cap the number of returned entries to
> UVERBS_MAX_NUM_ENTRIES rather than return EINVAL. That might be more
> compatible with user space..
> 

Good idea.  I don't actually have this hardware, so I can't test it, but
that definitely sounds reasonable.

If we did that then UVERBS_MAX_NUM_ENTRIES could be lower than 1000.
What is a reasonable number?

regards,
dan carpenter



* Re: [patch] infiniband: uverbs: limit the number of entries
  2010-10-07 16:59     ` Dan Carpenter
@ 2010-10-08  7:59       ` Nicolas Palix
       [not found]         ` <AANLkTin5zou2JHsdDyhGESuxyPonOs3kLo9Th0vg-kd8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2010-10-09 23:16       ` [patch] " Jason Gunthorpe
  1 sibling, 1 reply; 15+ messages in thread
From: Nicolas Palix @ 2010-10-08  7:59 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Jason Gunthorpe, Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma, kernel-janitors

Hi,

On Thu, Oct 7, 2010 at 6:59 PM, Dan Carpenter <error27@gmail.com> wrote:
> On Thu, Oct 07, 2010 at 10:16:49AM -0600, Jason Gunthorpe wrote:
>> On Thu, Oct 07, 2010 at 09:16:10AM +0200, Dan Carpenter wrote:
>> > If we don't limit cmd.ne then the multiplications can overflow.  This
>> > will allocate a small amount of RAM successfully for the "resp" and
>> > "wc" buffers.  The heap will get corrupted when we call ib_poll_cq().
>>
>> I think you could cap the number of returned entries to
>> UVERBS_MAX_NUM_ENTRIES rather than return EINVAL. That might be more
>> compatible with user space..
>>
>
> Good idea.  I don't actually have this hardware, so I can't test it, but
> that definitely sounds reasonable.
>
> If we did that then UVERBS_MAX_NUM_ENTRIES could be lower than 1000.
> What is a reasonable number?

You can also use kcalloc to allocate wc.

>
> regards,
> dan carpenter
>



-- 
Nicolas Palix
Tel: +33 6 81 07 91 72
--
To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [patch v2] infiniband: uverbs: limit the number of entries
       [not found]         ` <AANLkTin5zou2JHsdDyhGESuxyPonOs3kLo9Th0vg-kd8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-10-08 14:25           ` Dan Carpenter
  0 siblings, 0 replies; 15+ messages in thread
From: Dan Carpenter @ 2010-10-08 14:25 UTC (permalink / raw)
  To: Nicolas Palix
  Cc: Jason Gunthorpe, Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma, kernel-janitors


If we don't limit cmd.ne then the multiplications can overflow.  This
will allocate a small amount of RAM successfully for the "resp" and
"wc" buffers.  The heap will get corrupted when we call ib_poll_cq().

Documentation/infiniband/user_verbs.txt suggests this function is meant
for unprivileged access.

I chose to limit the number of entries to 500.  That limits the
allocations to 26 kb of RAM at the most.

Also we don't necessarily fill the "resp" buffer so I changed the
kmalloc() to a kzalloc() to avoid an information leak.  And I changed
the wc allocation to kcalloc() as a cleanup.

CC: stable@kernel.org
Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -162,6 +162,7 @@ void ib_uverbs_srq_event_handler(struct ib_event *event, void *context_ptr);
 void ib_uverbs_event_handler(struct ib_event_handler *handler,
 			     struct ib_event *event);
 
+#define UVERBS_MAX_NUM_ENTRIES 500
 #define IB_UVERBS_DECLARE_CMD(name)					\
 	ssize_t ib_uverbs_##name(struct ib_uverbs_file *file,		\
 				 const char __user *buf, int in_len,	\
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -906,12 +906,15 @@ ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
+	if (cmd.ne > UVERBS_MAX_NUM_ENTRIES)
+		cmd.ne = UVERBS_MAX_NUM_ENTRIES;
+
-	wc = kmalloc(cmd.ne * sizeof *wc, GFP_KERNEL);
+	wc = kcalloc(cmd.ne, sizeof *wc, GFP_KERNEL);
 	if (!wc)
 		return -ENOMEM;
 
 	rsize = sizeof *resp + cmd.ne * sizeof(struct ib_uverbs_wc);
-	resp = kmalloc(rsize, GFP_KERNEL);
+	resp = kzalloc(rsize, GFP_KERNEL);
 	if (!resp) {
 		ret = -ENOMEM;
 		goto out_wc;



* Re: [patch] infiniband: uverbs: limit the number of entries
  2010-10-07 16:59     ` Dan Carpenter
  2010-10-08  7:59       ` Nicolas Palix
@ 2010-10-09 23:16       ` Jason Gunthorpe
       [not found]         ` <20101009231607.GA24649-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2010-10-09 23:16 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 07, 2010 at 06:59:47PM +0200, Dan Carpenter wrote:
> On Thu, Oct 07, 2010 at 10:16:49AM -0600, Jason Gunthorpe wrote:
> > On Thu, Oct 07, 2010 at 09:16:10AM +0200, Dan Carpenter wrote:
> > > If we don't limit cmd.ne then the multiplications can overflow.  This
> > > will allocate a small amount of RAM successfully for the "resp" and
> > > "wc" buffers.  The heap will get corrupted when we call ib_poll_cq().
> > 
> > I think you could cap the number of returned entries to
> > UVERBS_MAX_NUM_ENTRIES rather than return EINVAL. That might be more
> > compatible with user space..
> > 
> 
> Good idea.  I don't actually have this hardware, so I can't test it, but
> that definitely sounds reasonable.

I was just writing some code related to this, and I'm not so sure
anymore. The problem is that the CQ has to be fully drained to properly
synchronize with the edge triggered poll. I was just about to write
a check that assumes it is drained based on returning not equal to
what was requested rather than calling it again and possibly getting 0
back.

I'm wondering if other apps have done something like this? Things
will 'work' but it introduces a subtle race condition with CQ draining
and CQ poll events. 

I guess the ideal thing would be to code this routine to handle
an arbitrary count without allocating arbitrary memory - by copying
to user space in limited batches.

Jason


* [patch v3] infiniband: uverbs: handle large number of entries
       [not found]         ` <20101009231607.GA24649-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-10-12 11:31           ` Dan Carpenter
  2010-10-12 21:01             ` Jason Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Dan Carpenter @ 2010-10-12 11:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

In the original code there was a potential integer overflow if you
passed in a large cmd.ne.  The calls to kmalloc() would allocate smaller
buffers than intended, leading to memory corruption.

There was also an information leak.

Documentation/infiniband/user_verbs.txt suggests this function is meant
for unprivileged access.

Jason Gunthorpe suggested that I should modify it to pass the data to
the user bit by bit and avoid the kmalloc() entirely.

CC: stable@kernel.org
Signed-off-by: Dan Carpenter <error27@gmail.com>
---
Please, please, check this.  I think I've done it right, but I don't
have the hardware and cannot test it.

It's strange to me that we return "in_len" on success.

struct ib_uverbs_poll_cq_resp is used by userspace libraries right?
Otherwise I could delete it.

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 6fcfbeb..b0788b6 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -891,68 +891,89 @@ out:
 	return ret ? ret : in_len;
 }
 
+static int copy_header_to_user(void __user *dest, u32 count)
+{
+	u32 header[2];  /* the second u32 is reserved */
+
+	memset(header, 0, sizeof(header));
+	if (copy_to_user(dest, header, sizeof(header)))
+		return -EFAULT;
+	return 0;
+}
+
+static int copy_wc_to_user(void __user *dest, struct ib_wc *wc)
+{
+	struct ib_uverbs_wc tmp;
+
+	memset(&tmp, 0, sizeof(tmp));
+
+	tmp.wr_id	   = wc->wr_id;
+	tmp.status	   = wc->status;
+	tmp.opcode	   = wc->opcode;
+	tmp.vendor_err	   = wc->vendor_err;
+	tmp.byte_len	   = wc->byte_len;
+	tmp.ex.imm_data	   = (__u32 __force) wc->ex.imm_data;
+	tmp.qp_num	   = wc->qp->qp_num;
+	tmp.src_qp	   = wc->src_qp;
+	tmp.wc_flags	   = wc->wc_flags;
+	tmp.pkey_index	   = wc->pkey_index;
+	tmp.slid	   = wc->slid;
+	tmp.sl		   = wc->sl;
+	tmp.dlid_path_bits = wc->dlid_path_bits;
+	tmp.port_num	   = wc->port_num;
+
+	if (copy_to_user(dest, &tmp, sizeof(tmp)))
+		return -EFAULT;
+	return 0;
+}
+
 ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
 			  const char __user *buf, int in_len,
 			  int out_len)
 {
 	struct ib_uverbs_poll_cq       cmd;
-	struct ib_uverbs_poll_cq_resp *resp;
+	u8 __user                     *header_ptr;
+	u8 __user                     *data_ptr;
 	struct ib_cq                  *cq;
-	struct ib_wc                  *wc;
-	int                            ret = 0;
+	struct ib_wc                   wc;
+	u32                            count = 0;
+	int                            ret;
 	int                            i;
-	int                            rsize;
 
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	wc = kmalloc(cmd.ne * sizeof *wc, GFP_KERNEL);
-	if (!wc)
-		return -ENOMEM;
-
-	rsize = sizeof *resp + cmd.ne * sizeof(struct ib_uverbs_wc);
-	resp = kmalloc(rsize, GFP_KERNEL);
-	if (!resp) {
-		ret = -ENOMEM;
-		goto out_wc;
-	}
-
 	cq = idr_read_cq(cmd.cq_handle, file->ucontext, 0);
-	if (!cq) {
-		ret = -EINVAL;
-		goto out;
-	}
+	if (!cq)
+		return -EINVAL;
 
-	resp->count = ib_poll_cq(cq, cmd.ne, wc);
+	/* we copy a struct ib_uverbs_poll_cq_resp to user space */
+	header_ptr = (void __user *)(unsigned long)cmd.response;
+	data_ptr = header_ptr + sizeof(u32) * 2;
 
-	put_cq_read(cq);
+	for (i = 0; i < cmd.ne; i++) {
+		ret = ib_poll_cq(cq, 1, &wc);
+		if (ret < 0)
+			goto out_put;
+		if (!ret)
+			break;
 
-	for (i = 0; i < resp->count; i++) {
-		resp->wc[i].wr_id 	   = wc[i].wr_id;
-		resp->wc[i].status 	   = wc[i].status;
-		resp->wc[i].opcode 	   = wc[i].opcode;
-		resp->wc[i].vendor_err 	   = wc[i].vendor_err;
-		resp->wc[i].byte_len 	   = wc[i].byte_len;
-		resp->wc[i].ex.imm_data    = (__u32 __force) wc[i].ex.imm_data;
-		resp->wc[i].qp_num 	   = wc[i].qp->qp_num;
-		resp->wc[i].src_qp 	   = wc[i].src_qp;
-		resp->wc[i].wc_flags 	   = wc[i].wc_flags;
-		resp->wc[i].pkey_index 	   = wc[i].pkey_index;
-		resp->wc[i].slid 	   = wc[i].slid;
-		resp->wc[i].sl 		   = wc[i].sl;
-		resp->wc[i].dlid_path_bits = wc[i].dlid_path_bits;
-		resp->wc[i].port_num 	   = wc[i].port_num;
+		ret = copy_wc_to_user(data_ptr, &wc);
+		if (ret)
+			goto out_put;
+		data_ptr += sizeof(struct ib_uverbs_wc);
+		count++;
 	}
 
-	if (copy_to_user((void __user *) (unsigned long) cmd.response, resp, rsize))
-		ret = -EFAULT;
+	ret = copy_header_to_user(header_ptr, count);
+	if (ret)
+		goto out_put;
 
-out:
-	kfree(resp);
+	ret = in_len;
 
-out_wc:
-	kfree(wc);
-	return ret ? ret : in_len;
+out_put:
+	put_cq_read(cq);
+	return ret;
 }
 
 ssize_t ib_uverbs_req_notify_cq(struct ib_uverbs_file *file,


* Re: [patch v3] infiniband: uverbs: handle large number of entries
  2010-10-12 11:31           ` [patch v3] infiniband: uverbs: handle large " Dan Carpenter
@ 2010-10-12 21:01             ` Jason Gunthorpe
       [not found]               ` <20101012210118.GR24268-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2010-10-12 21:01 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

On Tue, Oct 12, 2010 at 01:31:17PM +0200, Dan Carpenter wrote:
> In the original code there was a potential integer overflow if you
> passed in a large cmd.ne.  The calls to kmalloc() would allocate smaller
> buffers than intended, leading to memory corruption.

Keep in mind these are probably performance-sensitive APIs. I was
imagining batching a small number and then copy_to_user? No idea what
the various performance trade-offs are.

> Please, please, check this.  I think I've done it right, but I don't
> have the hardware and cannot test it.

Nor do I.  I actually don't know what hardware uses this path? The
Mellanox cards use a user-space only version.
 
Maybe an iwarp card? I kinda recall some recent messages concerning
memory allocations in these paths for iwarp. I wonder if removing the
allocation is such a big win the larger number of copy_to_user calls
does not matter?

> It's strange to me that we return "in_len" on success.

Agree..

> +static int copy_header_to_user(void __user *dest, u32 count)
> +{
> +	u32 header[2];  /* the second u32 is reserved */
> +
> +	memset(header, 0, sizeof(header));

Don't you need header[0] = count ?

Maybe:
  u32 header[2] = {count};

And let the compiler zero the other word optimally. Also, I'm not sure it
matters here, since you are zeroing user memory that isn't currently used.

> +static int copy_wc_to_user(void __user *dest, struct ib_wc *wc)
> +{
> +	struct ib_uverbs_wc tmp;
> +
> +	memset(&tmp, 0, sizeof(tmp));

I'd really like to see that memset go away for performance. Again
maybe use named initializers and let the compiler zero the
uninitialized (does it zero padding, I wonder?). Or pre-zero this
memory outside the loop..

Jason


* Re: [patch v3] infiniband: uverbs: handle large number of entries
       [not found]               ` <20101012210118.GR24268-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-10-13  9:05                 ` Dan Carpenter
  2010-10-13  9:13                 ` [patch v4] " Dan Carpenter
  1 sibling, 0 replies; 15+ messages in thread
From: Dan Carpenter @ 2010-10-13  9:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

On Tue, Oct 12, 2010 at 03:01:18PM -0600, Jason Gunthorpe wrote:
> On Tue, Oct 12, 2010 at 01:31:17PM +0200, Dan Carpenter wrote:
> > In the original code there was a potential integer overflow if you
> > passed in a large cmd.ne.  The calls to kmalloc() would allocate smaller
> > buffers than intended, leading to memory corruption.
> 
> Keep in mind these are probably performance-sensitive APIs. I was
> imagining batching a small number and then copy_to_user? No idea what
> the various performance trade-offs are.
> 
> > Please, please, check this.  I think I've done it right, but I don't
> > have the hardware and cannot test it.
> 
> Nor do I.  I actually don't know what hardware uses this path? The
> Mellanox cards use a user-space only version.
>  
> Maybe an iwarp card? I kinda recall some recent messages concerning
> memory allocations in these paths for iwarp. I wonder if removing the
> allocation is such a big win the larger number of copy_to_user calls
> does not matter?
> 

Who knows?

The reason I'm writing this is to fix a potential security issue, but I
think that viewed from a holistic perspective this patch is also a
performance improvement over the original code because it avoids the big
kmalloc()s.  Doing the copy_to_user() in batches of PAGE_SIZE might be
better but it's more complicated and I'm very lazy... :/  If someone
steps up to do the benchmarks then I might take a look at it.

> > It's strange to me that we return "in_len" on success.
> 
> Agree..
> 
> > +static int copy_header_to_user(void __user *dest, u32 count)
> > +{
> > +	u32 header[2];  /* the second u32 is reserved */
> > +
> > +	memset(header, 0, sizeof(header));
> 
> Don't you need header[0] = count ?
> 

Yes.  Thank you for catching that.

> Maybe:
>   u32 header[2] = {count};
> 
> And let the compiler zero the other word optimally. Also, I'm not sure it
> matters here, since you are zeroing user memory that isn't currently used.

It does matter, because we don't want to leak information to the user.

> 
> > +static int copy_wc_to_user(void __user *dest, struct ib_wc *wc)
> > +{
> > +	struct ib_uverbs_wc tmp;
> > +
> > +	memset(&tmp, 0, sizeof(tmp));
> 
> I'd really like to see that memset go away for performance. Again
> maybe use named initializers and let the compiler zero the
> uninitialized (does it zero padding, I wonder?). Or pre-zero this
> memory outside the loop..
> 

Good idea.  Yes, it does do zero padding.

regards,
dan carpenter


* [patch v4] infiniband: uverbs: handle large number of entries
       [not found]               ` <20101012210118.GR24268-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2010-10-13  9:05                 ` Dan Carpenter
@ 2010-10-13  9:13                 ` Dan Carpenter
  2010-11-23  7:10                   ` Dan Carpenter
  1 sibling, 1 reply; 15+ messages in thread
From: Dan Carpenter @ 2010-10-13  9:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

In the original code there was a potential integer overflow if you
passed in a large cmd.ne.  The calls to kmalloc() would allocate smaller
buffers than intended, leading to memory corruption. There was also an 
information leak if resp wasn't all used.

Documentation/infiniband/user_verbs.txt suggests this function is meant
for unprivileged access.

Special thanks to Jason Gunthorpe for his help and advice.

CC: stable@kernel.org
Signed-off-by: Dan Carpenter <error27@gmail.com>
---
v4:  Fixed a bug where count wasn't sent to the user.
     Removed the calls to memset() and used C99 initializers instead.

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 6fcfbeb..5fc1e29 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -891,68 +891,88 @@ out:
 	return ret ? ret : in_len;
 }
 
+static int copy_header_to_user(void __user *dest, u32 count)
+{
+	u32 header[2] = {count, 0};  /* the second u32 is reserved */
+
+	if (copy_to_user(dest, header, sizeof(header)))
+		return -EFAULT;
+	return 0;
+}
+
+static int copy_wc_to_user(void __user *dest, struct ib_wc *wc)
+{
+	struct ib_uverbs_wc tmp = {
+	.wr_id		= wc->wr_id,
+	.status		= wc->status,
+	.opcode		= wc->opcode,
+	.vendor_err	= wc->vendor_err,
+	.byte_len	= wc->byte_len,
+	.ex = {
+		.imm_data = (__u32 __force) wc->ex.imm_data,
+	},
+	.qp_num		= wc->qp->qp_num,
+	.src_qp		= wc->src_qp,
+	.wc_flags	= wc->wc_flags,
+	.pkey_index	= wc->pkey_index,
+	.slid		= wc->slid,
+	.sl		= wc->sl,
+	.dlid_path_bits = wc->dlid_path_bits,
+	.port_num	= wc->port_num,
+	};
+
+	if (copy_to_user(dest, &tmp, sizeof(tmp)))
+		return -EFAULT;
+	return 0;
+}
+
 ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
 			  const char __user *buf, int in_len,
 			  int out_len)
 {
 	struct ib_uverbs_poll_cq       cmd;
-	struct ib_uverbs_poll_cq_resp *resp;
+	u8 __user                     *header_ptr;
+	u8 __user                     *data_ptr;
 	struct ib_cq                  *cq;
-	struct ib_wc                  *wc;
-	int                            ret = 0;
+	struct ib_wc                   wc;
+	u32                            count = 0;
+	int                            ret;
 	int                            i;
-	int                            rsize;
 
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	wc = kmalloc(cmd.ne * sizeof *wc, GFP_KERNEL);
-	if (!wc)
-		return -ENOMEM;
-
-	rsize = sizeof *resp + cmd.ne * sizeof(struct ib_uverbs_wc);
-	resp = kmalloc(rsize, GFP_KERNEL);
-	if (!resp) {
-		ret = -ENOMEM;
-		goto out_wc;
-	}
-
 	cq = idr_read_cq(cmd.cq_handle, file->ucontext, 0);
-	if (!cq) {
-		ret = -EINVAL;
-		goto out;
-	}
+	if (!cq)
+		return -EINVAL;
 
-	resp->count = ib_poll_cq(cq, cmd.ne, wc);
+	/* we copy a struct ib_uverbs_poll_cq_resp to user space */
+	header_ptr = (void __user *)(unsigned long)cmd.response;
+	data_ptr = header_ptr + sizeof(u32) * 2;
 
-	put_cq_read(cq);
+	for (i = 0; i < cmd.ne; i++) {
+		ret = ib_poll_cq(cq, 1, &wc);
+		if (ret < 0)
+			goto out_put;
+		if (!ret)
+			break;
 
-	for (i = 0; i < resp->count; i++) {
-		resp->wc[i].wr_id 	   = wc[i].wr_id;
-		resp->wc[i].status 	   = wc[i].status;
-		resp->wc[i].opcode 	   = wc[i].opcode;
-		resp->wc[i].vendor_err 	   = wc[i].vendor_err;
-		resp->wc[i].byte_len 	   = wc[i].byte_len;
-		resp->wc[i].ex.imm_data    = (__u32 __force) wc[i].ex.imm_data;
-		resp->wc[i].qp_num 	   = wc[i].qp->qp_num;
-		resp->wc[i].src_qp 	   = wc[i].src_qp;
-		resp->wc[i].wc_flags 	   = wc[i].wc_flags;
-		resp->wc[i].pkey_index 	   = wc[i].pkey_index;
-		resp->wc[i].slid 	   = wc[i].slid;
-		resp->wc[i].sl 		   = wc[i].sl;
-		resp->wc[i].dlid_path_bits = wc[i].dlid_path_bits;
-		resp->wc[i].port_num 	   = wc[i].port_num;
+		ret = copy_wc_to_user(data_ptr, &wc);
+		if (ret)
+			goto out_put;
+		data_ptr += sizeof(struct ib_uverbs_wc);
+		count++;
 	}
 
-	if (copy_to_user((void __user *) (unsigned long) cmd.response, resp, rsize))
-		ret = -EFAULT;
+	ret = copy_header_to_user(header_ptr, count);
+	if (ret)
+		goto out_put;
 
-out:
-	kfree(resp);
+	ret = in_len;
 
-out_wc:
-	kfree(wc);
-	return ret ? ret : in_len;
+out_put:
+	put_cq_read(cq);
+	return ret;
 }
 
 ssize_t ib_uverbs_req_notify_cq(struct ib_uverbs_file *file,


* Re: [patch v4] infiniband: uverbs: handle large number of entries
  2010-10-13  9:13                 ` [patch v4] " Dan Carpenter
@ 2010-11-23  7:10                   ` Dan Carpenter
  2010-11-24 22:07                     ` Roland Dreier
  0 siblings, 1 reply; 15+ messages in thread
From: Dan Carpenter @ 2010-11-23  7:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 13, 2010 at 11:13:12AM +0200, Dan Carpenter wrote:
> In the original code there was a potential integer overflow if you
> passed in a large cmd.ne.  The calls to kmalloc() would allocate smaller
> buffers than intended, leading to memory corruption. There was also an 
> information leak if resp wasn't all used.
> 
> Documentation/infiniband/user_verbs.txt suggests this function is meant
> for unprivileged access.
> 
> Special thanks to Jason Gunthorpe for his help and advice.
> 

Crap!  Apparently C99 initialization zeroes out the holes most of the
time, but not all the time.

http://lkml.org/lkml/2010/11/22/367

I'm still waiting for some GCC people to chime in about what the rules
are here, but it looks like I may need to add memsets to this patch.

regards,
dan carpenter




* Re: [patch v4] infiniband: uverbs: handle large number of entries
  2010-11-23  7:10                   ` Dan Carpenter
@ 2010-11-24 22:07                     ` Roland Dreier
       [not found]                       ` <adahbf6gytv.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Roland Dreier @ 2010-11-24 22:07 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Jason Gunthorpe, Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

 > Crap!  Apparently c99 initialization zeroes out the holes most of the
 > time but not all the time.

I think in this case we would be OK, since the structs involved here
don't have holes.  Right?

However, I think this final patch is not really what I would like here.
I didn't follow the discussion carefully, but it seems things ended up
off track.

poll_cq is a data path operation where performance matters, and the
whole point of passing in cmd.ne (the number of completion entries to
poll) is to allow the low level driver to batch things up to save
overhead.  So converting this into a loop that polls one at a time is
not the best thing to do.

What was wrong with:

 - Fixing the potential integer overflow by capping cmd.ne at some high
   but safe value (1000 say)

 - Fixing the information leaks by setting resp->reserved = 0 and inside
   the loop resp->wc[i].reserved = 0, and then only copying the actual
   number of completions polled to userspace

 - And as a bonus handling the (nearly impossible) case of ib_poll_cq
   returning a negative value

IOW a patch like the below (compile tested only):

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b342248..ec6e434 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -903,17 +903,17 @@ ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
 	struct ib_wc                  *wc;
 	int                            ret = 0;
 	int                            i;
-	int                            rsize;
 
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
+	cmd.ne = min_t(u32, cmd.ne, IB_UVERBS_POLL_CQ_MAX_ENTRIES);
+
 	wc = kmalloc(cmd.ne * sizeof *wc, GFP_KERNEL);
 	if (!wc)
 		return -ENOMEM;
 
-	rsize = sizeof *resp + cmd.ne * sizeof(struct ib_uverbs_wc);
-	resp = kmalloc(rsize, GFP_KERNEL);
+	resp = kmalloc(sizeof *resp + cmd.ne * sizeof *resp->wc, GFP_KERNEL);
 	if (!resp) {
 		ret = -ENOMEM;
 		goto out_wc;
@@ -925,10 +925,17 @@ ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
 		goto out;
 	}
 
-	resp->count = ib_poll_cq(cq, cmd.ne, wc);
+	ret = ib_poll_cq(cq, cmd.ne, wc);
+	if (ret < 0) {
+		ret = -EIO;
+		goto out;
+	}
 
 	put_cq_read(cq);
 
+	resp->count    = ret;
+	resp->reserved = 0;
+
 	for (i = 0; i < resp->count; i++) {
 		resp->wc[i].wr_id 	   = wc[i].wr_id;
 		resp->wc[i].status 	   = wc[i].status;
@@ -944,9 +951,11 @@ ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
 		resp->wc[i].sl 		   = wc[i].sl;
 		resp->wc[i].dlid_path_bits = wc[i].dlid_path_bits;
 		resp->wc[i].port_num 	   = wc[i].port_num;
+		resp->wc[i].reserved	   = 0;
 	}
 
-	if (copy_to_user((void __user *) (unsigned long) cmd.response, resp, rsize))
+	if (copy_to_user((void __user *) (unsigned long) cmd.response, resp,
+			 sizeof *resp + resp->count * sizeof *resp->wc))
 		ret = -EFAULT;
 
 out:
diff --git a/include/rdma/ib_user_verbs.h b/include/rdma/ib_user_verbs.h
index fe5b051..61b0286 100644
--- a/include/rdma/ib_user_verbs.h
+++ b/include/rdma/ib_user_verbs.h
@@ -278,6 +278,10 @@ struct ib_uverbs_resize_cq_resp {
 	__u64 driver_data[0];
 };
 
+enum {
+	IB_UVERBS_POLL_CQ_MAX_ENTRIES	= 1000
+};
+
 struct ib_uverbs_poll_cq {
 	__u64 response;
 	__u32 cq_handle;


* Re: [patch v4] infiniband: uverbs: handle large number of entries
       [not found]                       ` <adahbf6gytv.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-11-24 22:18                         ` Jason Gunthorpe
       [not found]                           ` <20101124221845.GH2369-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2010-11-24 22:18 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Dan Carpenter, Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 24, 2010 at 02:07:40PM -0800, Roland Dreier wrote:
 
>  - Fixing the potential integer overflow by capping cmd.ne at some high
>    but safe value (1000 say)

This breaks how userspace wants to use poll, i.e. if you return anything
less than what I asked for, that means there is no more work to do.
Apps make this assumption, and it is a reasonable thing to do.
Otherwise apps need a dummy call to ibv_poll to avoid races.

So if you are worried about how many times ib_poll_cq is called then
bound the kzalloc size and wrap the whole thing in a loop, but
realistically I have to think the performance trade off of
kzalloc/free vs calling ib_poll more often is not entirely obvious.

Who uses this path anyhow?

Jason


* Re: [patch v4] infiniband: uverbs: handle large number of entries
       [not found]                           ` <20101124221845.GH2369-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-11-25  4:05                             ` Roland Dreier
       [not found]                               ` <adad3pugi90.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Roland Dreier @ 2010-11-25  4:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dan Carpenter, Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

 > This breaks how userspace wants to use poll, i.e. if you return anything
 > less than what I asked for, that means there is no more work to do.
 > Apps make this assumption, and it is a reasonable thing to do.
 > Otherwise apps need a dummy call to ibv_poll to avoid races.

OIC.  forgot about that.  hmm...

 > So if you are worried about how many times ib_poll_cq is called then
 > bound the kzalloc size and wrap the whole thing in a loop, but
 > realistically I have to think the performance trade off of
 > kzalloc/free vs calling ib_poll more often is not entirely obvious.

That's true... maybe doing things one at a time but avoiding the allocs
is the right tradeoff.

 > Who uses this path anyhow?

AFAICT, cxgb3, cxgb4, nes, ipath and qib.

 - R.


* Re: [patch v4] infiniband: uverbs: handle large number of entries
       [not found]                               ` <adad3pugi90.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-11-25  4:13                                 ` Jason Gunthorpe
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2010-11-25  4:13 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Dan Carpenter, Roland Dreier, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 24, 2010 at 08:05:47PM -0800, Roland Dreier wrote:

>  > So if you are worried about how many times ib_poll_cq is called then
>  > bound the kzalloc size and wrap the whole thing in a loop, but
>  > realistically I have to think the performance trade off of
>  > kzalloc/free vs calling ib_poll more often is not entirely obvious.
> 
> That's true... maybe doing things one at a time but avoiding the allocs
> is the right tradeoff.

Hmm, considering your list is everything but Mellanox, maybe it makes
much more sense to push the copy_to_user down into the driver -
i.e. an ibv_poll_cq_user - then the driver can construct each CQ entry
on the stack and copy it to userspace, avoiding the double copy, the
allocation, and any fixed overhead of ibv_poll_cq.

A bigger change to be sure, but remember this old thread:

http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg05114.html

2x improvement by removing allocs on the post path..

Jason



Thread overview: 15+ messages
2010-10-07  7:16 [patch] infiniband: uverbs: limit the number of entries Dan Carpenter
2010-10-07 16:16 ` Jason Gunthorpe
     [not found]   ` <20101007161649.GD21206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-10-07 16:59     ` Dan Carpenter
2010-10-08  7:59       ` Nicolas Palix
     [not found]         ` <AANLkTin5zou2JHsdDyhGESuxyPonOs3kLo9Th0vg-kd8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-10-08 14:25           ` [patch v2] " Dan Carpenter
2010-10-09 23:16       ` [patch] " Jason Gunthorpe
     [not found]         ` <20101009231607.GA24649-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-10-12 11:31           ` [patch v3] infiniband: uverbs: handle large " Dan Carpenter
2010-10-12 21:01             ` Jason Gunthorpe
     [not found]               ` <20101012210118.GR24268-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-10-13  9:05                 ` Dan Carpenter
2010-10-13  9:13                 ` [patch v4] " Dan Carpenter
2010-11-23  7:10                   ` Dan Carpenter
2010-11-24 22:07                     ` Roland Dreier
     [not found]                       ` <adahbf6gytv.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-11-24 22:18                         ` Jason Gunthorpe
     [not found]                           ` <20101124221845.GH2369-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-11-25  4:05                             ` Roland Dreier
     [not found]                               ` <adad3pugi90.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-11-25  4:13                                 ` Jason Gunthorpe
