* [PATCH 0/3] tun: fix aio
@ 2009-04-20 11:25 Michael S. Tsirkin
2009-04-20 12:09 ` Herbert Xu
2009-04-21 12:50 ` Michael S. Tsirkin
0 siblings, 2 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-04-20 11:25 UTC (permalink / raw)
To: David S. Miller, netdev, Herbert Xu, Rusty Russell
The tun device currently fills in aio_read/aio_write; however, the implementation modifies the input iovec even though it receives it as const struct iovec *. As a side effect, attempts to use io_submit/io_getevents on a tun/tap device fail with EINVAL
(after actually sending/consuming a packet).
The following patches make tun_chr_aio_read and tun_chr_aio_write not modify
the iovec so that io_submit/io_getevents succeed, becoming functionally
equivalent to write/read on the device. I am working on another patch set
supporting zero-copy transmit over tun, which will rely on this functionality,
but I think the patches might already be useful by themselves. What do you think?
Please review.
Note: I started out just allocating and copying the iovec rather than adding
yet another skb-iterating routine, but this turned out to add small but
measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns
(let me know if you want to see that version of the patch).
Michael S. Tsirkin (3):
net: skb_copy_datagram_const_iovec()
tun: fix tun_chr_aio_read so that aio works
tun: fix tun_chr_aio_write so that aio works
drivers/net/tun.c | 24 ++++++----
include/linux/skbuff.h | 8 +++-
include/linux/socket.h | 6 ++-
net/core/datagram.c | 112 +++++++++++++++++++++++++++++++++++++++++++++---
net/core/iovec.c | 33 +++++++++++++-
5 files changed, 161 insertions(+), 22 deletions(-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio
2009-04-20 11:25 [PATCH 0/3] tun: fix aio Michael S. Tsirkin
@ 2009-04-20 12:09 ` Herbert Xu
2009-04-20 12:21 ` Michael S. Tsirkin
2009-04-21 12:50 ` Michael S. Tsirkin
1 sibling, 1 reply; 9+ messages in thread
From: Herbert Xu @ 2009-04-20 12:09 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, netdev, rusty
Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Note: I started out just allocating and copying the iovec rather than adding
> yet another skb-iterating routine, but this turned out to add small but
> measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns
> (let me know if you want to see that version of the patch).
Can you please post the copying version as well so we can compare?
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio
2009-04-20 12:09 ` Herbert Xu
@ 2009-04-20 12:21 ` Michael S. Tsirkin
2009-04-27 1:18 ` Rusty Russell
0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-04-20 12:21 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, netdev, rusty
On Mon, Apr 20, 2009 at 08:09:30PM +0800, Herbert Xu wrote:
> Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > Note: I started out just allocating and copying the iovec rather than adding
> > yet another skb-iterating routine, but this turned out to add small but
> > measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns
> > (let me know if you want to see that version of the patch).
>
> Can you please post the copying version as well so we can compare?
Sure. Here it is: much smaller, but slightly slower.
tun: fix aio read/aio write
The tun device uses skb_copy_datagram_from_iovec for write
and memcpy_to_iovec for read, both of which modify the iovec,
violating the contract for aio_read/aio_write, which
take a const iovec * and are assumed not to modify it.
As a result, attempts to perform io_submit on a tap device
fail with -EINVAL.
A simple fix for this is to copy the iovec and work on a copy.
Signed-off-by: Michael S. Tsirkin <m.s.tsirkin@gmail.com>
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 16716ae..9059738 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -657,11 +657,12 @@ static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
return count;
}
-static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
+static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *uiv,
unsigned long count, loff_t pos)
{
struct file *file = iocb->ki_filp;
struct tun_struct *tun = tun_get(file);
+ struct iovec *iv;
ssize_t result;
if (!tun)
@@ -669,9 +670,17 @@ static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
DBG(KERN_INFO "%s: tun_chr_write %ld\n", tun->dev->name, count);
- result = tun_get_user(tun, (struct iovec *)iv, iov_length(iv, count),
- file->f_flags & O_NONBLOCK);
+ iv = kmalloc(count * sizeof *iv, GFP_KERNEL);
+ if (!iv) {
+ result = -ENOMEM;
+ goto out;
+ }
+ memcpy(iv, uiv, count * sizeof *iv);
+ result = tun_get_user(tun, iv, iov_length(iv, count),
+ file->f_flags & O_NONBLOCK);
+ kfree(iv);
+out:
tun_put(tun);
return result;
}
@@ -679,7 +688,8 @@ static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
/* Put packet to the user space buffer */
static __inline__ ssize_t tun_put_user(struct tun_struct *tun,
struct sk_buff *skb,
- struct iovec *iv, int len)
+ struct iovec *iv,
+ ssize_t len)
{
struct tun_pi pi = { 0, skb->protocol };
ssize_t total = 0;
@@ -742,7 +752,7 @@ static __inline__ ssize_t tun_put_user(struct tun_struct *tun,
return total;
}
-static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
+static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *uiv,
unsigned long count, loff_t pos)
{
struct file *file = iocb->ki_filp;
@@ -751,17 +761,24 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
DECLARE_WAITQUEUE(wait, current);
struct sk_buff *skb;
ssize_t len, ret = 0;
+ struct iovec *iv;
if (!tun)
return -EBADFD;
DBG(KERN_INFO "%s: tun_chr_read\n", tun->dev->name);
- len = iov_length(iv, count);
+ len = iov_length(uiv, count);
if (len < 0) {
ret = -EINVAL;
goto out;
}
+ iv = kmalloc(count * sizeof *iv, GFP_KERNEL);
+ if (!iv) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ memcpy(iv, uiv, count * sizeof *iv);
add_wait_queue(&tfile->read_wait, &wait);
while (len) {
@@ -788,14 +805,14 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
}
netif_wake_queue(tun->dev);
- ret = tun_put_user(tun, skb, (struct iovec *) iv, len);
+ ret = tun_put_user(tun, skb, iv, len);
kfree_skb(skb);
break;
}
current->state = TASK_RUNNING;
remove_wait_queue(&tfile->read_wait, &wait);
-
+ kfree(iv);
out:
tun_put(tun);
return ret;
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio
2009-04-20 11:25 [PATCH 0/3] tun: fix aio Michael S. Tsirkin
2009-04-20 12:09 ` Herbert Xu
@ 2009-04-21 12:50 ` Michael S. Tsirkin
2009-04-21 13:02 ` David Miller
1 sibling, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-04-21 12:50 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: David S. Miller, netdev, Herbert Xu, Rusty Russell
On Mon, Apr 20, 2009 at 02:25:27PM +0300, Michael S. Tsirkin wrote:
> The following patches make tun_chr_aio_read and tun_chr_aio_write not modify
> the iovec so that io_submit/io_getevents succeed, becoming functionally
> equivalent to write/read on the device.
What are the thoughts on these patches?
Is this a reasonable thing to do?
Thanks,
--
MST
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio
2009-04-21 12:50 ` Michael S. Tsirkin
@ 2009-04-21 13:02 ` David Miller
0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2009-04-21 13:02 UTC (permalink / raw)
To: m.s.tsirkin; +Cc: mst, netdev, herbert, rusty
From: "Michael S. Tsirkin" <m.s.tsirkin@gmail.com>
Date: Tue, 21 Apr 2009 15:50:21 +0300
> On Mon, Apr 20, 2009 at 02:25:27PM +0300, Michael S. Tsirkin wrote:
>> The following patches make tun_chr_aio_read and tun_chr_aio_write not modify
>> the iovec so that io_submit/io_getevents succeed, becoming functionally
>> equivalent to write/read on the device.
>
> What are the thoughts on these patches?
> Is this a reasonable thing to do?
Man, you're real impatient.
I just applied your patches to net-next-2.6 and was going to let you
know that I had done this.
Sit tight next time; if your changes are under review, they'll be
happily sitting right here:
http://patchwork.ozlabs.org/project/netdev/list/
where you can monitor their status until they're either applied
or rejected.
:-P
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio
2009-04-20 12:21 ` Michael S. Tsirkin
@ 2009-04-27 1:18 ` Rusty Russell
2009-04-27 9:34 ` Michael S. Tsirkin
0 siblings, 1 reply; 9+ messages in thread
From: Rusty Russell @ 2009-04-27 1:18 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Herbert Xu, davem, netdev
On Mon, 20 Apr 2009 09:51:38 pm Michael S. Tsirkin wrote:
> On Mon, Apr 20, 2009 at 08:09:30PM +0800, Herbert Xu wrote:
> > Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > Note: I started out just allocating and copying the iovec rather than adding
> > > yet another skb-iterating routine, but this turned out to add small but
> > > measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns
> > > (let me know if you want to see that version of the patch).
> >
> > Can you please post the copying version as well so we can compare?
>
> Sure. Here it is: much smaller, but slightly slower.
Which could probably be fixed by using an on-stack version for an iovec
of less than a certain size...
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio
2009-04-27 1:18 ` Rusty Russell
@ 2009-04-27 9:34 ` Michael S. Tsirkin
2009-05-05 3:18 ` Rusty Russell
0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-04-27 9:34 UTC (permalink / raw)
To: Rusty Russell; +Cc: Herbert Xu, davem, netdev
On Mon, Apr 27, 2009 at 10:48:37AM +0930, Rusty Russell wrote:
> On Mon, 20 Apr 2009 09:51:38 pm Michael S. Tsirkin wrote:
> > On Mon, Apr 20, 2009 at 08:09:30PM +0800, Herbert Xu wrote:
> > > Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > Note: I started out just allocating and copying the iovec rather than adding
> > > > yet another skb-iterating routine, but this turned out to add small but
> > > > measurable overhead on data path: tx time per packet jumped from 6500 to 6700 ns
> > > > (let me know if you want to see that version of the patch).
> > >
> > > Can you please post the copying version as well so we can compare?
> >
> > Sure. Here it is: much smaller, but slightly slower.
>
> Which could probably be fixed by using an on-stack version for an iovec
> of less than a certain size...
I agree that for large message sizes the malloc would probably be
dwarfed by the cost of memory copy. However a large iovec might pass a
small message, might it not?
--
MST
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio
2009-04-27 9:34 ` Michael S. Tsirkin
@ 2009-05-05 3:18 ` Rusty Russell
2009-05-05 8:40 ` Michael S. Tsirkin
0 siblings, 1 reply; 9+ messages in thread
From: Rusty Russell @ 2009-05-05 3:18 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Herbert Xu, davem, netdev
On Mon, 27 Apr 2009 07:04:55 pm Michael S. Tsirkin wrote:
> On Mon, Apr 27, 2009 at 10:48:37AM +0930, Rusty Russell wrote:
> > > Sure. Here it is: much smaller, but slightly slower.
> >
> > Which could probably be fixed by using an on-stack version for an iovec
> > of less than a certain size...
>
> I agree that for large message sizes the malloc would probably be
> dwarfed by the cost of memory copy. However a large iovec might pass a
> small message, might it not?
Sorry, I didn't make myself clear. Something like:
struct iovec smalliov[512 / sizeof(struct iovec)];
if (count < ARRAY_SIZE(smalliov)) {
iv = smalliov;
} else {
iv = kmalloc(...)
}
...
if (iv != smalliov)
kfree(iv);
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] tun: fix aio
2009-05-05 3:18 ` Rusty Russell
@ 2009-05-05 8:40 ` Michael S. Tsirkin
0 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2009-05-05 8:40 UTC (permalink / raw)
To: Rusty Russell; +Cc: Herbert Xu, davem, netdev
On Tue, May 05, 2009 at 12:48:13PM +0930, Rusty Russell wrote:
> On Mon, 27 Apr 2009 07:04:55 pm Michael S. Tsirkin wrote:
> > On Mon, Apr 27, 2009 at 10:48:37AM +0930, Rusty Russell wrote:
> > > > Sure. Here it is: much smaller, but slightly slower.
> > >
> > > Which could probably be fixed by using an on-stack version for an iovec
> > > of less than a certain size...
> >
> > I agree that for large message sizes the malloc would probably be
> > dwarfed by the cost of memory copy. However a large iovec might pass a
> > small message, might it not?
>
> Sorry, I didn't make myself clear. Something like:
>
> struct iovec smalliov[512 / sizeof(struct iovec)];
>
> if (count < ARRAY_SIZE(smalliov)) {
> iv = smalliov;
> } else {
> iv = kmalloc(...)
> }
>
> ...
>
> if (iv != smalliov)
> kfree(iv);
>
> Cheers,
> Rusty.
What I was trying to say is that this special-cases iovecs with count >
64. But I think that such iovecs might still be used to pass around
relatively small messages (theoretically as small as 64 bytes), and then
the overhead of the kmalloc/kfree becomes noticeable.
--
MST
^ permalink raw reply [flat|nested] 9+ messages in thread