* [PATCH 0/3] tun: fix aio @ 2009-04-20 11:25 Michael S. Tsirkin 2009-04-20 12:09 ` Herbert Xu 2009-04-21 12:50 ` Michael S. Tsirkin 0 siblings, 2 replies; 9+ messages in thread From: Michael S. Tsirkin @ 2009-04-20 11:25 UTC (permalink / raw) To: David S. Miller, netdev, Herbert Xu, Rusty Russell tun device currently fills in aio_read/aio_write, however the implementation corrupts the input iovec through const struct iovec *. As a side effect, attempts to use io_submit/io_getevents on a tun/tap device result in EINVAL (after actually sending/consuming a packet). The following patches make tun_chr_aio_read and tun_chr_aio_write not modify the iovec so that io_submit/io_getevents succeed, becoming functionally equivalent to write/read on the device. I am working on another patch set supporting zero-copy transmit over tun, which will rely on this functionality, but I think the patches might be already useful by themself. What do you think? Please review. Note: I started out just allocating and copying the iovec rather than adding yet another skb-iterating routine, but this turned out to add small but measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns (let me know if you want to see that version of the patch). Michael S. Tsirkin (3): net: skb_copy_datagram_const_iovec() tun: fix tun_chr_aio_read so that aio works tun: fix tun_chr_aio_write so that aio works drivers/net/tun.c | 24 ++++++---- include/linux/skbuff.h | 8 +++- include/linux/socket.h | 6 ++- net/core/datagram.c | 112 +++++++++++++++++++++++++++++++++++++++++++++--- net/core/iovec.c | 33 +++++++++++++- 5 files changed, 161 insertions(+), 22 deletions(-) ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio 2009-04-20 11:25 [PATCH 0/3] tun: fix aio Michael S. Tsirkin @ 2009-04-20 12:09 ` Herbert Xu 2009-04-20 12:21 ` Michael S. Tsirkin 2009-04-21 12:50 ` Michael S. Tsirkin 1 sibling, 1 reply; 9+ messages in thread From: Herbert Xu @ 2009-04-20 12:09 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: davem, netdev, rusty Michael S. Tsirkin <mst@redhat.com> wrote: > > Note: I started out just allocating and copying the iovec rather than adding > yet another skb-iterating routine, but this turned out to add small but > measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns > (let me know if you want to see that version of the patch). Can you please post the copying version as well so we can compare? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio 2009-04-20 12:09 ` Herbert Xu @ 2009-04-20 12:21 ` Michael S. Tsirkin 2009-04-27 1:18 ` Rusty Russell 0 siblings, 1 reply; 9+ messages in thread From: Michael S. Tsirkin @ 2009-04-20 12:21 UTC (permalink / raw) To: Herbert Xu; +Cc: davem, netdev, rusty On Mon, Apr 20, 2009 at 08:09:30PM +0800, Herbert Xu wrote: > Michael S. Tsirkin <mst@redhat.com> wrote: > > > > Note: I started out just allocating and copying the iovec rather than adding > > yet another skb-iterating routine, but this turned out to add small but > > measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns > > (let me know if you want to see that version of the patch). > > Can you please post the copying version as well so we can compare? Sure. Here it is: much smaller, but slightly slower. tun: fix aio read/aio write tun device uses skb_copy_datagram_from_iovec for write and memcpy_to_iovec for read, which modify the iovec, violating the contract for aio_read/aio_write which get const iovec * and are assumed not to modify it. As a result, attempts to perform io_submut on tap device fail with -EINVAL. A simple fix for this is to copy the iovec and work on a copy. Signed-off-by: Michael S. Tsirkin <m.s.tsirkin@gmail.com> diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 16716ae..9059738 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -657,11 +657,12 @@ static __inline__ ssize_t tun_get_user(struct tun_struct *tun, return count; } -static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv, +static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *uiv, unsigned long count, loff_t pos) { struct file *file = iocb->ki_filp; struct tun_struct *tun = tun_get(file); + struct iovec *iv; ssize_t result; if (!tun) @@ -669,9 +670,17 @@ static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv, DBG(KERN_INFO "%s: tun_chr_write %ld\n", tun->dev->name, count); - result = tun_get_user(tun, (struct iovec *)iv, iov_length(iv, count), - file->f_flags & O_NONBLOCK); + iv = kmalloc(count * sizeof *iv, GFP_KERNEL); + if (!iv) { + result = -ENOMEM; + goto out; + } + memcpy(iv, uiv, count * sizeof *iv); + result = tun_get_user(tun, iv, iov_length(iv, count), + file->f_flags & O_NONBLOCK); + kfree(iv); +out: tun_put(tun); return result; } @@ -679,7 +688,8 @@ static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv, /* Put packet to the user space buffer */ static __inline__ ssize_t tun_put_user(struct tun_struct *tun, struct sk_buff *skb, - struct iovec *iv, int len) + struct iovec *iv, + ssize_t len) { struct tun_pi pi = { 0, skb->protocol }; ssize_t total = 0; @@ -742,7 +752,7 @@ static __inline__ ssize_t tun_put_user(struct tun_struct *tun, return total; } -static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv, +static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *uiv, unsigned long count, loff_t pos) { struct file *file = iocb->ki_filp; @@ -751,17 +761,24 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv, DECLARE_WAITQUEUE(wait, current); struct sk_buff *skb; ssize_t len, ret = 0; + struct iovec *iv; if (!tun) return -EBADFD; DBG(KERN_INFO "%s: tun_chr_read\n", tun->dev->name); - len = iov_length(iv, count); + len = iov_length(uiv, count); if (len < 0) { ret = -EINVAL; goto out; } + iv = kmalloc(count * sizeof *iv, GFP_KERNEL); + if (!iv) { + ret = -ENOMEM; + goto out; + } + memcpy(iv, uiv, count * sizeof *iv); add_wait_queue(&tfile->read_wait, &wait); while (len) { @@ -788,14 +805,14 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv, } netif_wake_queue(tun->dev); - ret = tun_put_user(tun, skb, (struct iovec *) iv, len); + ret = tun_put_user(tun, skb, iv, len); kfree_skb(skb); break; } current->state = TASK_RUNNING; remove_wait_queue(&tfile->read_wait, &wait); - + kfree(iv); out: tun_put(tun); return ret; ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio 2009-04-20 12:21 ` Michael S. Tsirkin @ 2009-04-27 1:18 ` Rusty Russell 2009-04-27 9:34 ` Michael S. Tsirkin 0 siblings, 1 reply; 9+ messages in thread From: Rusty Russell @ 2009-04-27 1:18 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Herbert Xu, davem, netdev On Mon, 20 Apr 2009 09:51:38 pm Michael S. Tsirkin wrote: > On Mon, Apr 20, 2009 at 08:09:30PM +0800, Herbert Xu wrote: > > Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > Note: I started out just allocating and copying the iovec rather than adding > > > yet another skb-iterating routine, but this turned out to add small but > > > measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns > > > (let me know if you want to see that version of the patch). > > > > Can you please post the copying version as well so we can compare? > > Sure. Here it is: much smaller, but slightly slower. Which could probably be fixed by using an on-stack version for a iovec of less than a certain size... Cheers, Rusty. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio 2009-04-27 1:18 ` Rusty Russell @ 2009-04-27 9:34 ` Michael S. Tsirkin 2009-05-05 3:18 ` Rusty Russell 0 siblings, 1 reply; 9+ messages in thread From: Michael S. Tsirkin @ 2009-04-27 9:34 UTC (permalink / raw) To: Rusty Russell; +Cc: Herbert Xu, davem, netdev On Mon, Apr 27, 2009 at 10:48:37AM +0930, Rusty Russell wrote: > On Mon, 20 Apr 2009 09:51:38 pm Michael S. Tsirkin wrote: > > On Mon, Apr 20, 2009 at 08:09:30PM +0800, Herbert Xu wrote: > > > Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > Note: I started out just allocating and copying the iovec rather than adding > > > > yet another skb-iterating routine, but this turned out to add small but > > > > measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns > > > > (let me know if you want to see that version of the patch). > > > > > > Can you please post the copying version as well so we can compare? > > > > Sure. Here it is: much smaller, but slightly slower. > > Which could probably be fixed by using an on-stack version for a iovec > of less than a certain size... I agree that for large message sizes the malloc would probably be dwarfed by the cost of memory copy. However a large iovec might pass a small message, might it not? -- MST ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio 2009-04-27 9:34 ` Michael S. Tsirkin @ 2009-05-05 3:18 ` Rusty Russell 2009-05-05 8:40 ` Michael S. Tsirkin 0 siblings, 1 reply; 9+ messages in thread From: Rusty Russell @ 2009-05-05 3:18 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Herbert Xu, davem, netdev On Mon, 27 Apr 2009 07:04:55 pm Michael S. Tsirkin wrote: > On Mon, Apr 27, 2009 at 10:48:37AM +0930, Rusty Russell wrote: > > > Sure. Here it is: much smaller, but slightly slower. > > > > Which could probably be fixed by using an on-stack version for a iovec > > of less than a certain size... > > I agree that for large message sizes the malloc would probably be > dwarfed by the cost of memory copy. However a large iovec might pass a > small message, might it not? Sorry, I didn't make myself clear. Something like: struct iovec smalliov[512 / sizeof(struct iovec)]; if (count < ARRAY_SIZE(smalliov)) { iv = smalliov; } else { iv = kmalloc(...) } ... if (iv != smalliov) kfree(iv); Cheers, Rusty. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio 2009-05-05 3:18 ` Rusty Russell @ 2009-05-05 8:40 ` Michael S. Tsirkin 0 siblings, 0 replies; 9+ messages in thread From: Michael S. Tsirkin @ 2009-05-05 8:40 UTC (permalink / raw) To: Rusty Russell; +Cc: Herbert Xu, davem, netdev On Tue, May 05, 2009 at 12:48:13PM +0930, Rusty Russell wrote: > On Mon, 27 Apr 2009 07:04:55 pm Michael S. Tsirkin wrote: > > On Mon, Apr 27, 2009 at 10:48:37AM +0930, Rusty Russell wrote: > > > > Sure. Here it is: much smaller, but slightly slower. > > > > > > Which could probably be fixed by using an on-stack version for a iovec > > > of less than a certain size... > > > > I agree that for large message sizes the malloc would probably be > > dwarfed by the cost of memory copy. However a large iovec might pass a > > small message, might it not? > > Sorry, I didn't make myself clear. Something like: > > struct iovec smalliov[512 / sizeof(struct iovec)]; > > if (count < ARRAY_SIZE(smalliov)) { > iv = smalliov; > } else { > iv = kmalloc(...) > } > > ... > > if (iv != smalliov) > kfree(iv); > > Cheers, > Rusty. What I was trying to say is that this special-cases iovecs with count > 64. But I think that such iovecs might be used to still pass around relatively small messages (theoretically as small as 64 bytes) and then the overhead of the kmalloc/kfree is noticeable. -- MST ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio 2009-04-20 11:25 [PATCH 0/3] tun: fix aio Michael S. Tsirkin 2009-04-20 12:09 ` Herbert Xu @ 2009-04-21 12:50 ` Michael S. Tsirkin 2009-04-21 13:02 ` David Miller 1 sibling, 1 reply; 9+ messages in thread From: Michael S. Tsirkin @ 2009-04-21 12:50 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: David S. Miller, netdev, Herbert Xu, Rusty Russell On Mon, Apr 20, 2009 at 02:25:27PM +0300, Michael S. Tsirkin wrote: > The following patches make tun_chr_aio_read and tun_chr_aio_write not modify > the iovec so that io_submit/io_getevents succeed, becoming functionally > equivalent to write/read on the device. What are the thoughts on these patches? Is this a reasonable thing to do? Thanks, -- MST ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio 2009-04-21 12:50 ` Michael S. Tsirkin @ 2009-04-21 13:02 ` David Miller 0 siblings, 0 replies; 9+ messages in thread From: David Miller @ 2009-04-21 13:02 UTC (permalink / raw) To: m.s.tsirkin; +Cc: mst, netdev, herbert, rusty From: "Michael S. Tsirkin" <m.s.tsirkin@gmail.com> Date: Tue, 21 Apr 2009 15:50:21 +0300 > On Mon, Apr 20, 2009 at 02:25:27PM +0300, Michael S. Tsirkin wrote: >> The following patches make tun_chr_aio_read and tun_chr_aio_write not modify >> the iovec so that io_submit/io_getevents succeed, becoming functionally >> equivalent to write/read on the device. > > What are the thoughts on these patches? > Is this a reasonable thing to do? Man, you're real impatient. I just applied your patches to net-next-2.6 and was going to let you know that I had done this. Sit tight next time, if you changes are under review they'll be happily sitting right here: http://patchwork.ozlabs.org/project/netdev/list/ where you can monitor their status until they're either applied or rejected. :-P ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-05-05 8:41 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-04-20 11:25 [PATCH 0/3] tun: fix aio Michael S. Tsirkin 2009-04-20 12:09 ` Herbert Xu 2009-04-20 12:21 ` Michael S. Tsirkin 2009-04-27 1:18 ` Rusty Russell 2009-04-27 9:34 ` Michael S. Tsirkin 2009-05-05 3:18 ` Rusty Russell 2009-05-05 8:40 ` Michael S. Tsirkin 2009-04-21 12:50 ` Michael S. Tsirkin 2009-04-21 13:02 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).