From: Max Krasnyanskiy <maxk@qualcomm.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org
Subject: Re: [PATCH RFC 3/5] tun: vringfd receive support.
Date: Thu, 10 Apr 2008 10:18:52 -0700
Message-ID: <47FE4BFC.6000705@qualcomm.com>
In-Reply-To: <200804101544.17736.rusty@rustcorp.com.au>

Rusty Russell wrote:
> On Wednesday 09 April 2008 05:49:15 Max Krasnyansky wrote:
>> Rusty Russell wrote:
>>> This patch modifies tun to allow a vringfd to specify the receive
>>> buffer.  Because we can't copy to userspace in bh context, we queue
>>> like normal then use the "pull" hook to actually do the copy.
>>>
>>> More thought needs to be put into the possible races with ring
>>> registration and a simultaneous close, for example (see FIXME).
>>>
>>> We use struct virtio_net_hdr prepended to packets in the ring to allow
>>> userspace to receive GSO packets in future (at the moment, the tun
>>> driver doesn't tell the stack it can handle them, so these cases are
>>> never taken).
>> In general the code looks good. The only thing I could not convince myself
>> of is whether having a generic ring buffer makes sense or not.
>> At least the TUN driver would be more efficient if it had its own simple
>> ring implementation. Less indirection, fewer callbacks, fewer if()s, etc.
>> TUN already has a file descriptor, and having two additional fds for the rx
>> and tx rings is a waste (think of a VPN server that has to have a bunch of
>> TUN fds). Also as I mentioned before Jamal and I wanted to expose some of
>> the SKB fields through TUN device. With the rx/tx rings the natural way of
>> doing that would be the ring descriptor itself. It can of course be done
>> the same way we copy proto info (PI) and GSO stuff before the packet but
>> that means more copy_to_user() calls and yet more checks.
>>
>> So, what am I missing? Why do we need a generic ring for TUN? I looked
>> at the lguest code a bit and it seems that we need a bunch of network
>> specific code anyway. The cool thing is that you can now mmap the rings
>> into the guest directly, but the same thing can be done with TUN-specific
>> rings.
> 
> I started modifying tun to do this directly, but it ended up with a whole heap 
> of code just for the rings, and a lot of current code (eg. read, write, poll) 
> ended up inside an 'if (tun->rings) ... else {'.  Having a natural poll() 
> interface for the rings made more sense, so being their own fds fell out 
> naturally.
Hmm, the version I sent you a while ago (remember, I sent you an attachment 
with a prototype of the new tun driver and the user-space code) was not that bad 
in that area. I mean, it did not touch the existing read()/write() path. The 
difference was that it allocated the rings and the data buffer in the kernel 
and mapped them into user space, which is not what you guys need, but that's a 
separate thing.
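For illustration only, here is a minimal sketch of what a TUN-specific receive ring could look like, with per-packet metadata carried in the ring descriptor itself rather than copied in front of each packet. It assumes a single producer (the kernel) and a single consumer (user space) sharing mapped memory. Every name here (`tun_rx_desc`, `tun_rx_ring`, `ring_put`, `ring_get`, and the choice of descriptor fields) is hypothetical; this is not the prototype mentioned above and not an existing kernel API. It is written as user-space C11, so ordering comes from `<stdatomic.h>`; kernel code would use barriers such as smp_wmb()/smp_rmb() instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical rx descriptor: SKB-like metadata travels in the
 * descriptor instead of a header copied in front of every packet. */
struct tun_rx_desc {
    uint32_t offset;    /* packet start within the shared data buffer */
    uint32_t len;       /* packet length in bytes */
    uint16_t protocol;  /* e.g. 0x0800 for IPv4 */
    uint16_t flags;     /* hypothetical per-packet flags */
    uint16_t gso_size;  /* 0 if not a GSO packet */
    uint16_t gso_type;
};

#define RING_SIZE 8u
_Static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
               "RING_SIZE must be a power of two");

struct tun_rx_ring {
    _Atomic uint32_t head;  /* free-running; written only by the producer */
    _Atomic uint32_t tail;  /* free-running; written only by the consumer */
    struct tun_rx_desc desc[RING_SIZE];
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int ring_put(struct tun_rx_ring *r, const struct tun_rx_desc *d)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    assert(head - tail <= RING_SIZE);  /* never more than RING_SIZE in flight */
    if (head - tail == RING_SIZE)
        return -1;
    r->desc[head & (RING_SIZE - 1)] = *d;
    /* Make the descriptor visible before publishing the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

/* Consumer side: returns 0 on success, -1 if the ring is empty. */
static int ring_get(struct tun_rx_ring *r, struct tun_rx_desc *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return -1;
    *out = r->desc[tail & (RING_SIZE - 1)];
    /* Hand the slot back to the producer only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 0;
}
```

With this layout the consumer reads fields such as `protocol` or `gso_size` straight from the descriptor, with no extra per-packet header to copy or parse. The trade-off is that the descriptor layout becomes part of the user-space ABI.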

The fd thing could be an issue. As I mentioned, the example would be a VPN 
server (OpenVPN, etc.) with a bunch of client connections (typically one tun 
device per connection).

> I decided to float this version because it does minimal damage to tun, and I
> know that other people have wanted rings before: I'd like to know if this is
> likely to be generic enough for them.
I see.

I'll try to spend some time on this in the near future and take a crack at a 
version with TUN-specific rings. Although I've said that many times now, and 
it may not happen in the near enough future :). In the meantime, if your 
current version helps you guys a lot, I do not mind us putting it in. We can 
always add another mode or something that uses internal rings and gradually 
obsolete the old read()/write() path and the generic rings.

Max
