From: Rusty Russell <rusty@rustcorp.com.au>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Max Krasnyansky <maxk@qualcomm.com>,
	virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 5/5] tun: vringfd xmit support.
Date: Sat, 19 Apr 2008 01:15:15 +1000
Message-ID: <200804190115.15983.rusty@rustcorp.com.au>
In-Reply-To: <20080418043120.ff78eab5.akpm@linux-foundation.org>

On Friday 18 April 2008 21:31:20 Andrew Morton wrote:
> On Fri, 18 Apr 2008 14:43:24 +1000 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > +		/* How many pages will this take? */
> > +		npages = 1 + (base + len - 1)/PAGE_SIZE - base/PAGE_SIZE;
>
> Brain hurts.  I hope you got that right.

I tested it when I wrote it, but just wrote a tester again:

base		len	npages
0               1       1
0xfff           1       1
0x1000          1       1
0               4096    1
0x1             4096    2
0xfff           4096    2
0x1000          4096    1
0xfffff000      4096    1
0xfffff000      4097    4293918722

> > +		if (unlikely(num_pg + npages > MAX_SKB_FRAGS)) {
> > +			err = -ENOSPC;
> > +			goto fail;
> > +		}
> > +		n = get_user_pages(current, current->mm, base, npages,
> > +				   0, 0, pages, NULL);
>
> What is the maximum number of pages which an unprivileged user can
> concurrently pin with this code?

Since only root can open the tun device, it's currently OK.  The old code
kmalloced and copied: is there some mm-fu reason why pinning userspace memory
is worse?

But I actually think it's OK even for non-root, since these become skbs, which
means they go into either an outgoing device queue or a socket queue, which is
accounted for exactly this reason.

> > +		if (unlikely(n < 0)) {
> > +			err = n;
> > +			goto fail;
> > +		}
> > +
> > +		/* Transfer pages to the frag array */
> > +		for (j = 0; j < n; j++) {
> > +			f[num_pg].page = pages[j];
> > +			if (j == 0) {
> > +				f[num_pg].page_offset = offset_in_page(base);
> > +				f[num_pg].size = min(len, PAGE_SIZE -
> > +						     f[num_pg].page_offset);
> > +			} else {
> > +				f[num_pg].page_offset = 0;
> > +				f[num_pg].size = min(len, PAGE_SIZE);
> > +			}
> > +			len -= f[num_pg].size;
> > +			base += f[num_pg].size;
> > +			num_pg++;
> > +		}
>
> This loop is a fancy way of doing
>
> 		num_pg = n;

Damn, you had me reworking this until I realized why.  It's not: we're
inside a loop, doing one iovec array element at a time.

> > +		if (unlikely(n != npages)) {
> > +			err = -EFAULT;
> > +			goto fail;
> > +		}
>
> why not do this immediately after running get_user_pages()?

To simplify the failure path.  Hmm, I would use release_pages here...

> > +fail:
> > +	for (i = 0; i < num_pg; i++)
> > +		put_page(f[i].page);
>
> release_pages() could be a tad more efficient, but it's only error-path.

... but I didn't know that existed.  Had to include pagemap.h, and it's not
exported.  It seems to be a useful interface; see patch.

Cheers,
Rusty.

Subject: Export release_pages; nice undo for get_user_pages.

Andrew Morton suggests tun/tap use release_pages, but it's not
exported.  It's not clear to me why this is in swap.c, but it exists
even without CONFIG_SWAP, so that's OK.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r abd2ad431e5c mm/swap.c
--- a/mm/swap.c	Sat Apr 19 00:34:54 2008 +1000
+++ b/mm/swap.c	Sat Apr 19 01:11:40 2008 +1000
@@ -346,6 +346,7 @@ void release_pages(struct page **pages, 
 
 	pagevec_free(&pages_to_free);
 }
+EXPORT_SYMBOL(release_pages);
 
 /*
  * The pages which we're about to release may be in the deferred lru-addition
