netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] tun: fix aio
@ 2009-04-20 11:25 Michael S. Tsirkin
  2009-04-20 12:09 ` Herbert Xu
  2009-04-21 12:50 ` Michael S. Tsirkin
  0 siblings, 2 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-04-20 11:25 UTC (permalink / raw)
  To: David S. Miller, netdev, Herbert Xu, Rusty Russell

tun device currently fills in aio_read/aio_write, however the implementation
corrupts the input iovec through const struct iovec *. As a side effect,
attempts to use io_submit/io_getevents on a tun/tap device result in EINVAL
(after actually sending/consuming a packet).

The following patches make tun_chr_aio_read and tun_chr_aio_write not modify
the iovec so that io_submit/io_getevents succeed, becoming functionally
equivalent to write/read on the device.  I am working on another patch set
supporting zero-copy transmit over tun, which will rely on this functionality,
but I think the patches might be already useful by themself. What do you think?
Please review.

Note: I started out just allocating and copying the iovec rather than adding
yet another skb-iterating routine, but this turned out to add small but
measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns
(let me know if you want to see that version of the patch).

Michael S. Tsirkin (3):
  net: skb_copy_datagram_const_iovec()
  tun: fix tun_chr_aio_read so that aio works
  tun: fix tun_chr_aio_write so that aio works

 drivers/net/tun.c      |   24 ++++++----
 include/linux/skbuff.h |    8 +++-
 include/linux/socket.h |    6 ++-
 net/core/datagram.c    |  112 +++++++++++++++++++++++++++++++++++++++++++++---
 net/core/iovec.c       |   33 +++++++++++++-
 5 files changed, 161 insertions(+), 22 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] tun: fix aio
  2009-04-20 11:25 [PATCH 0/3] tun: fix aio Michael S. Tsirkin
@ 2009-04-20 12:09 ` Herbert Xu
  2009-04-20 12:21   ` Michael S. Tsirkin
  2009-04-21 12:50 ` Michael S. Tsirkin
  1 sibling, 1 reply; 9+ messages in thread
From: Herbert Xu @ 2009-04-20 12:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, netdev, rusty

Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Note: I started out just allocating and copying the iovec rather than adding
> yet another skb-iterating routine, but this turned out to add small but
> measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns
> (let me know if you want to see that version of the patch).

Can you please post the copying version as well so we can compare?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] tun: fix aio
  2009-04-20 12:09 ` Herbert Xu
@ 2009-04-20 12:21   ` Michael S. Tsirkin
  2009-04-27  1:18     ` Rusty Russell
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-04-20 12:21 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, netdev, rusty

On Mon, Apr 20, 2009 at 08:09:30PM +0800, Herbert Xu wrote:
> Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > Note: I started out just allocating and copying the iovec rather than adding
> > yet another skb-iterating routine, but this turned out to add small but
> > measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns
> > (let me know if you want to see that version of the patch).
> 
> Can you please post the copying version as well so we can compare?

Sure. Here it is: much smaller, but slightly slower.

	tun: fix aio read/aio write

	tun device uses skb_copy_datagram_from_iovec for write
	and memcpy_to_iovec for read, which modify the iovec,
	violating the contract for aio_read/aio_write which
	get const iovec * and are assumed not to modify it.
	As a result, attempts to perform io_submut on tap device
	fail with -EINVAL.

	A simple fix for this is to copy the iovec and work on a copy.

	Signed-off-by: Michael S. Tsirkin <m.s.tsirkin@gmail.com>

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 16716ae..9059738 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -657,11 +657,12 @@ static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
 	return count;
 }
 
-static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
+static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *uiv,
 			      unsigned long count, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
 	struct tun_struct *tun = tun_get(file);
+	struct iovec *iv;
 	ssize_t result;
 
 	if (!tun)
@@ -669,9 +670,17 @@ static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
 
 	DBG(KERN_INFO "%s: tun_chr_write %ld\n", tun->dev->name, count);
 
-	result = tun_get_user(tun, (struct iovec *)iv, iov_length(iv, count),
-			      file->f_flags & O_NONBLOCK);
+ 	iv = kmalloc(count * sizeof *iv, GFP_KERNEL);
+	if (!iv) {
+		result = -ENOMEM;
+		goto out;
+	}
+	memcpy(iv, uiv, count * sizeof *iv);
 
+	result = tun_get_user(tun, iv, iov_length(iv, count),
+			      file->f_flags & O_NONBLOCK);
+	kfree(iv);
+out:
 	tun_put(tun);
 	return result;
 }
@@ -679,7 +688,8 @@ static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
 /* Put packet to the user space buffer */
 static __inline__ ssize_t tun_put_user(struct tun_struct *tun,
 				       struct sk_buff *skb,
-				       struct iovec *iv, int len)
+				       struct iovec *iv,
+				       ssize_t len)
 {
 	struct tun_pi pi = { 0, skb->protocol };
 	ssize_t total = 0;
@@ -742,7 +752,7 @@ static __inline__ ssize_t tun_put_user(struct tun_struct *tun,
 	return total;
 }
 
-static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
+static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *uiv,
 			    unsigned long count, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
@@ -751,17 +761,24 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 	DECLARE_WAITQUEUE(wait, current);
 	struct sk_buff *skb;
 	ssize_t len, ret = 0;
+	struct iovec *iv;
 
 	if (!tun)
 		return -EBADFD;
 
 	DBG(KERN_INFO "%s: tun_chr_read\n", tun->dev->name);
 
-	len = iov_length(iv, count);
+	len = iov_length(uiv, count);
 	if (len < 0) {
 		ret = -EINVAL;
 		goto out;
 	}
+	iv = kmalloc(count * sizeof *iv, GFP_KERNEL);
+	if (!iv) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	memcpy(iv, uiv, count * sizeof *iv);
 
 	add_wait_queue(&tfile->read_wait, &wait);
 	while (len) {
@@ -788,14 +805,14 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 		}
 		netif_wake_queue(tun->dev);
 
-		ret = tun_put_user(tun, skb, (struct iovec *) iv, len);
+		ret = tun_put_user(tun, skb, iv, len);
 		kfree_skb(skb);
 		break;
 	}
 
 	current->state = TASK_RUNNING;
 	remove_wait_queue(&tfile->read_wait, &wait);
-
+	kfree(iv);
 out:
 	tun_put(tun);
 	return ret;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] tun: fix aio
  2009-04-20 11:25 [PATCH 0/3] tun: fix aio Michael S. Tsirkin
  2009-04-20 12:09 ` Herbert Xu
@ 2009-04-21 12:50 ` Michael S. Tsirkin
  2009-04-21 13:02   ` David Miller
  1 sibling, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-04-21 12:50 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David S. Miller, netdev, Herbert Xu, Rusty Russell

On Mon, Apr 20, 2009 at 02:25:27PM +0300, Michael S. Tsirkin wrote:
> The following patches make tun_chr_aio_read and tun_chr_aio_write not modify
> the iovec so that io_submit/io_getevents succeed, becoming functionally
> equivalent to write/read on the device.

What are the thoughts on these patches?
Is this a reasonable thing to do?

Thanks,
-- 
MST

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] tun: fix aio
  2009-04-21 12:50 ` Michael S. Tsirkin
@ 2009-04-21 13:02   ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2009-04-21 13:02 UTC (permalink / raw)
  To: m.s.tsirkin; +Cc: mst, netdev, herbert, rusty

From: "Michael S. Tsirkin" <m.s.tsirkin@gmail.com>
Date: Tue, 21 Apr 2009 15:50:21 +0300

> On Mon, Apr 20, 2009 at 02:25:27PM +0300, Michael S. Tsirkin wrote:
>> The following patches make tun_chr_aio_read and tun_chr_aio_write not modify
>> the iovec so that io_submit/io_getevents succeed, becoming functionally
>> equivalent to write/read on the device.
> 
> What are the thoughts on these patches?
> Is this a reasonable thing to do?

Man, you're real impatient.

I just applied your patches to net-next-2.6 and was going to let you
know that I had done this.

Sit tight next time, if you changes are under review they'll be
happily sitting right here:

http://patchwork.ozlabs.org/project/netdev/list/

where you can monitor their status until they're either applied
or rejected.

:-P

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] tun: fix aio
  2009-04-20 12:21   ` Michael S. Tsirkin
@ 2009-04-27  1:18     ` Rusty Russell
  2009-04-27  9:34       ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Rusty Russell @ 2009-04-27  1:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Herbert Xu, davem, netdev

On Mon, 20 Apr 2009 09:51:38 pm Michael S. Tsirkin wrote:
> On Mon, Apr 20, 2009 at 08:09:30PM +0800, Herbert Xu wrote:
> > Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > Note: I started out just allocating and copying the iovec rather than adding
> > > yet another skb-iterating routine, but this turned out to add small but
> > > measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns
> > > (let me know if you want to see that version of the patch).
> > 
> > Can you please post the copying version as well so we can compare?
> 
> Sure. Here it is: much smaller, but slightly slower.

Which could probably be fixed by using an on-stack version for a iovec
of less than a certain size...

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] tun: fix aio
  2009-04-27  1:18     ` Rusty Russell
@ 2009-04-27  9:34       ` Michael S. Tsirkin
  2009-05-05  3:18         ` Rusty Russell
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-04-27  9:34 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Herbert Xu, davem, netdev

On Mon, Apr 27, 2009 at 10:48:37AM +0930, Rusty Russell wrote:
> On Mon, 20 Apr 2009 09:51:38 pm Michael S. Tsirkin wrote:
> > On Mon, Apr 20, 2009 at 08:09:30PM +0800, Herbert Xu wrote:
> > > Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > Note: I started out just allocating and copying the iovec rather than adding
> > > > yet another skb-iterating routine, but this turned out to add small but
> > > > measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns
> > > > (let me know if you want to see that version of the patch).
> > > 
> > > Can you please post the copying version as well so we can compare?
> > 
> > Sure. Here it is: much smaller, but slightly slower.
> 
> Which could probably be fixed by using an on-stack version for a iovec
> of less than a certain size...

I agree that for large message sizes the malloc would probably be
dwarfed by the cost of memory copy. However a large iovec might pass a
small message, might it not?

-- 
MST

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] tun: fix aio
  2009-04-27  9:34       ` Michael S. Tsirkin
@ 2009-05-05  3:18         ` Rusty Russell
  2009-05-05  8:40           ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Rusty Russell @ 2009-05-05  3:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Herbert Xu, davem, netdev

On Mon, 27 Apr 2009 07:04:55 pm Michael S. Tsirkin wrote:
> On Mon, Apr 27, 2009 at 10:48:37AM +0930, Rusty Russell wrote:
> > > Sure. Here it is: much smaller, but slightly slower.
> > 
> > Which could probably be fixed by using an on-stack version for a iovec
> > of less than a certain size...
> 
> I agree that for large message sizes the malloc would probably be
> dwarfed by the cost of memory copy. However a large iovec might pass a
> small message, might it not?

Sorry, I didn't make myself clear.  Something like:

	struct iovec smalliov[512 / sizeof(struct iovec)];

	if (count < ARRAY_SIZE(smalliov)) {
		iv = smalliov;
	} else {
		iv = kmalloc(...)
	}

	...

	if (iv != smalliov)
		kfree(iv);

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] tun: fix aio
  2009-05-05  3:18         ` Rusty Russell
@ 2009-05-05  8:40           ` Michael S. Tsirkin
  0 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-05-05  8:40 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Herbert Xu, davem, netdev

On Tue, May 05, 2009 at 12:48:13PM +0930, Rusty Russell wrote:
> On Mon, 27 Apr 2009 07:04:55 pm Michael S. Tsirkin wrote:
> > On Mon, Apr 27, 2009 at 10:48:37AM +0930, Rusty Russell wrote:
> > > > Sure. Here it is: much smaller, but slightly slower.
> > > 
> > > Which could probably be fixed by using an on-stack version for a iovec
> > > of less than a certain size...
> > 
> > I agree that for large message sizes the malloc would probably be
> > dwarfed by the cost of memory copy. However a large iovec might pass a
> > small message, might it not?
> 
> Sorry, I didn't make myself clear.  Something like:
> 
> 	struct iovec smalliov[512 / sizeof(struct iovec)];
> 
> 	if (count < ARRAY_SIZE(smalliov)) {
> 		iv = smalliov;
> 	} else {
> 		iv = kmalloc(...)
> 	}
> 
> 	...
> 
> 	if (iv != smalliov)
> 		kfree(iv);
> 
> Cheers,
> Rusty.

What I was trying to say is that this special-cases iovecs with count >
64. But I think that such iovecs might be used to still pass around
relatively small messages (theoretically as small as 64 bytes) and then
the overhead of the kmalloc/kfree is noticeable.

-- 
MST

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-05-05  8:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-20 11:25 [PATCH 0/3] tun: fix aio Michael S. Tsirkin
2009-04-20 12:09 ` Herbert Xu
2009-04-20 12:21   ` Michael S. Tsirkin
2009-04-27  1:18     ` Rusty Russell
2009-04-27  9:34       ` Michael S. Tsirkin
2009-05-05  3:18         ` Rusty Russell
2009-05-05  8:40           ` Michael S. Tsirkin
2009-04-21 12:50 ` Michael S. Tsirkin
2009-04-21 13:02   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).