From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
netdev <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: QEMU PIC indirection patch for in-kernel APIC work
Date: Mon, 09 Apr 2007 16:38:18 +0300
Message-ID: <461A41CA.9080201@qumranet.com>
In-Reply-To: <1176111984.11664.90.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
Rusty Russell wrote:
> On Mon, 2007-04-09 at 10:10 +0300, Avi Kivity wrote:
>
>> Rusty Russell wrote:
>>
>>> I'm a little puzzled by your response. Hmm...
>>>
>>> lguest's userspace network frontend does exactly as many copies as
>>> Ingo's in-host-kernel code. One from the Guest, one to the Guest.
>>>
>> kvm pvnet is suboptimal now. The number of copies could be reduced by
>> two (to zero), by constructing an skb that points to guest memory.
>> Right now, this can only be done in-kernel.
>>
>
> Sorry, you lost me here. You mean both input and output copies can be
> eliminated? Or are you talking about another two copies somewhere?
>
On the transmit path, current kvm pvnet has two copies:
  1. on the guest side, the driver copies the skb data into the shared ring
  2. on the host side, the device copies the data from the ring into a
     newly allocated skb
Both of these copies can be eliminated with a host-side kernel driver. With
current userspace interfaces, only one copy can be eliminated.
Similar logic applies to receive, except that one copy must remain.
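A minimal sketch of the two transmit-path copies listed above, next to the copyless variant. This is a userspace toy model, not kvm pvnet code: `CopyCounter`, `transmit_two_copies` and `transmit_zero_copy` are hypothetical names, and a Python `memoryview` merely stands in for an skb that points at guest memory.

```python
class CopyCounter:
    """Counts how many times packet data is duplicated."""

    def __init__(self):
        self.copies = 0

    def copy(self, data):
        self.copies += 1
        return bytearray(data)  # bytearray() always allocates a new buffer


def transmit_two_copies(packet, counter):
    """Model of the current path: guest -> shared ring -> new host skb."""
    ring_slot = counter.copy(packet)     # copy 1: guest driver fills the ring
    host_skb = counter.copy(ring_slot)   # copy 2: host device drains the ring
    return host_skb


def transmit_zero_copy(packet, counter):
    """Model of the copyless path: the host buffer references guest memory."""
    return memoryview(packet)            # no data is duplicated


if __name__ == "__main__":
    pkt = bytearray(b"x" * 1500)
    c = CopyCounter()
    transmit_two_copies(pkt, c)
    print("copies on current path:", c.copies)
    c = CopyCounter()
    transmit_zero_copy(pkt, c)
    print("copies on copyless path:", c.copies)
```

The copyless variant is only safe if the guest leaves the packet untouched until the host has finished with it, which is why completion notification matters for it.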
> But I don't get this "we can enhance the kernel but not userspace" vibe
> 8(
>
I've been waiting for network aio since ~2003. If it arrives in the
next few days, I'm all for it; much more than kvm can use it
profitably. But I'm not going to write that interface myself.
Moreover, some things just don't lend themselves to a userspace
abstraction. If we want to expose tso (tcp segmentation offload), we
can easily do so with a kernel driver since the kernel interfaces are
all tso aware. Tacking on tso awareness to tun/tap is doable, but at
the very least weird.
>
>> With current userspace networking interfaces, one cannot build a network
>> device that has less than one copy on transmit, because sendmsg() *must*
>> copy the data (as there is no completion notification).
>>
>
> Why are you talking about sendmsg()? Perhaps this is where we're
> getting tangled up.
>
> We're dealing with the tun/tap device here, not a socket.
>
>
Hmm. tun actually has aio_write implemented, but it seems synchronous.
The read path seems synchronous as well.
If these are made truly asynchronous, and the write path is also made
copyless, then we might have something workable. I still
cringe at having a pagetable walk in order to deliver a 1500-byte packet.
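To illustrate why a copyless write needs to be asynchronous with a completion notification, here is a toy model. `DeferredDevice` and its methods are hypothetical names and have nothing to do with the real tun/tap code; the only assumption is that the device transmits queued buffers some time after the submit call returns.

```python
class DeferredDevice:
    """Toy device that transmits queued buffers later, not at submit time."""

    def __init__(self):
        self.queue = []   # pending (buffer, completion callback) pairs
        self.wire = []    # what actually got transmitted

    def submit_copy(self, buf):
        # Copying interface: the caller may reuse buf as soon as this returns.
        self.queue.append((bytes(buf), None))

    def submit_zero_copy(self, buf, on_complete=None):
        # Copyless interface: the device keeps a reference to the caller's
        # buffer, so the caller must wait for on_complete before reusing it.
        self.queue.append((memoryview(buf), on_complete))

    def flush(self):
        # Transmit everything queued so far and signal completions.
        for data, on_complete in self.queue:
            self.wire.append(bytes(data))
            if on_complete is not None:
                on_complete()
        self.queue.clear()


if __name__ == "__main__":
    dev = DeferredDevice()
    buf = bytearray(b"hello")
    dev.submit_zero_copy(buf)   # no completion callback
    buf[0] = ord("J")           # caller reuses the buffer too early
    dev.flush()
    print(dev.wire[-1])         # the modified data went out on the wire
```

With the copying call the early reuse is harmless; with the copyless call it is only safe once the completion callback has fired.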
>> sendfilev(),
>> even if it existed, cannot be used: it is copyless, but lacks completion
>> notification. It is useful only on unchanging data like read-only files.
>>
>
> Again, sendfile is a *much* harder problem than sending a single packet
> once, which is the question here.
>
sendfile() is a *different* problem. It doesn't need completion because
the data is assumed not to change under it.
Consider that the guest may be issuing a megabyte-sized sendfile() which
is broken into 17 tso frames. We need to preserve the large structures
as much as possible or we end up repeating the simple "single packet
once" path 700 times.
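A rough check of the figures above. The sizes are assumptions, not taken from the message: a 1 MiB sendfile(), a 65535-byte maximum TSO frame carrying 40 bytes of IPv4 and TCP headers, and 1500 bytes of data per small packet.

```python
def units_needed(total_bytes: int, payload_per_unit: int) -> int:
    """Ceiling division: how many frames or packets carry total_bytes."""
    return -(-total_bytes // payload_per_unit)


SENDFILE_BYTES = 1 << 20           # assumed: "megabyte-sized" means 1 MiB
HEADER_BYTES = 40                  # assumed: IPv4 + TCP headers, no options
TSO_PAYLOAD = 65535 - HEADER_BYTES # assumed: 64 KiB maximum TSO frame
SMALL_PACKET = 1500                # assumed: 1500 data bytes per packet

tso_frames = units_needed(SENDFILE_BYTES, TSO_PAYLOAD)
small_packets = units_needed(SENDFILE_BYTES, SMALL_PACKET)

print(tso_frames)     # 17
print(small_packets)  # 700
```

Under these assumptions the per-packet path runs about 40 times more often than the per-frame path for the same amount of data.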
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.