From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
netdev <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: QEMU PIC indirection patch for in-kernel APIC work
Date: Thu, 12 Apr 2007 06:32:25 +0300 [thread overview]
Message-ID: <461DA849.50406@qumranet.com> (raw)
In-Reply-To: <1176334200.14322.133.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
Rusty Russell wrote:
> On Wed, 2007-04-11 at 17:28 +0300, Avi Kivity wrote:
>
>> Rusty Russell wrote:
>>
>>> On Wed, 2007-04-11 at 07:26 +0300, Avi Kivity wrote:
>>>
>>>
>>>> Nope. Being async is critical for copyless networking:
>>>>
>>>>
>> With async operations, the saga continues like this: the host-side
>> driver allocates an skb, get_page()s and attaches the data to the new
>> skb, this skb crosses the bridge, trickles into the real ethernet
>> device, gets queued there, sent, interrupts fire, triggering async
>> completion. On this completion, we send a virtual interrupt to the
>> guest, which tells it to destroy the skb and reclaim the pages attached
>> to it.
>>
>
> Hi Avi!
>
> Thanks for spelling it out, I now understand your POV. I had
> considered it obvious that a (non-async) write which didn't copy would
> block until the skb was finished with, which is easy to code up within
> the tap device itself. Otherwise it's actually an async write without a
> notification mechanism, which I agree is broken.
>
>
I hadn't considered an always-blocking (or unbuffered) networking API.
It's very counter to current APIs, but does make sense with things like
syslets. Without syslets, I don't think it's very useful as you need
some artificial threads to keep things humming along.
(How would userspace specify it? O_DIRECT when opening the tap?)
I don't think there's a lot of difference between implementing aio or
always-blocking copyless writes for tap. They just differ in how they
sleep and in how to access user pages.
> Note though: if the guest can change the packet headers they can
> subvert some firewall rules and possibly crash the host. None of the
> networking code I wrote expects packets to change in flight 8(
>
> This applies to a userspace or kernelspace driver.
>
>
Umm, right. We could write-protect the packets (which would be very
expensive). We could set the evil bit on guest-originated packets, and
rewrite the entire networking stack to copy any part which is inspected
if the evil bit is set. We need more head-scratching on this.
>>> Yes, and this is already present in the tap device. Anthony suggested a
>>> slightly nasty hack for multiple sg packets in one writev()/readv, which
>>> could also give us batching.
>>>
>> No need for hacks if we get list aio support one day.
>>
>
> As you point out though, aio is not something we want to hold our breath
> for. Plus, aio never makes things simpler, and complexity kills
> puppies.
>
The puppies had better stay away from qemu then, as it is completely async.
Always-blocking writes won't reduce complexity. Suddenly you need a
thread for each request batch and some pleasant code for joining the
threads when done. Syslets do make it go away, though they're more for
the mostly-nonblocking-with-occasional-blockage stuff rather than the
always blocking thingie you describe.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
next prev parent reply other threads:[~2007-04-12 3:32 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <4613B438.60107@codemonkey.ws>
[not found] ` <4613B89F.8090806@qumranet.com>
[not found] ` <4613BC6B.1070708@codemonkey.ws>
[not found] ` <4613BF07.50606@qumranet.com>
[not found] ` <4613C993.9020405@codemonkey.ws>
[not found] ` <4613CC01.1090500@qumranet.com>
[not found] ` <4613CDB2.4000903@codemonkey.ws>
[not found] ` <4613D001.3040606@qumranet.com>
[not found] ` <20070404200112.GA6070@elte.hu>
[not found] ` <4614098F.2030307@us.ibm.com>
[not found] ` <20070404212103.GA19026@elte.hu>
2007-04-04 23:19 ` [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work Rusty Russell
2007-04-05 7:17 ` Avi Kivity
2007-04-06 1:02 ` Rusty Russell
2007-04-08 5:36 ` Avi Kivity
[not found] ` <46187F4E.1080807-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-08 9:04 ` Muli Ben-Yehuda
2007-04-09 2:50 ` Rusty Russell
[not found] ` <1176087018.11664.65.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2007-04-09 7:10 ` Avi Kivity
[not found] ` <4619E6DC.3010804-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-09 9:46 ` Rusty Russell
[not found] ` <1176111984.11664.90.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2007-04-09 13:38 ` Avi Kivity
[not found] ` <461A41CA.9080201-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-10 8:07 ` Evgeniy Polyakov
2007-04-10 8:19 ` [kvm-devel] " Avi Kivity
[not found] ` <461B48A8.1060904-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-10 8:58 ` Evgeniy Polyakov
2007-04-10 11:21 ` [kvm-devel] " Avi Kivity
[not found] ` <461B7334.8090807-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-10 11:50 ` Evgeniy Polyakov
2007-04-10 12:17 ` [kvm-devel] " Avi Kivity
[not found] ` <461B8069.6070007-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-10 12:30 ` Evgeniy Polyakov
[not found] ` <20070410123034.GA11493-9fLWQ3dKdXwox3rIn2DAYQ@public.gmane.org>
2007-04-10 12:49 ` Avi Kivity
2007-04-11 3:53 ` [kvm-devel] " Rusty Russell
[not found] ` <1176263593.26372.84.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2007-04-11 4:26 ` Avi Kivity
[not found] ` <461C6360.1060908-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-11 13:23 ` Rusty Russell
[not found] ` <1176297794.14322.72.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2007-04-11 14:28 ` Avi Kivity
[not found] ` <461CF098.3090003-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-11 23:30 ` Rusty Russell
[not found] ` <1176334200.14322.133.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2007-04-12 3:32 ` Avi Kivity [this message]
2007-04-16 0:22 ` [kvm-devel] " Rusty Russell
2007-04-16 5:13 ` Avi Kivity
[not found] ` <1175728768.12230.593.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2007-04-05 9:30 ` Ingo Molnar
[not found] ` <20070405093033.GC25448-X9Un+BFzKDI@public.gmane.org>
2007-04-05 9:58 ` Avi Kivity
2007-04-05 10:26 ` [kvm-devel] " Ingo Molnar
2007-04-05 11:26 ` Avi Kivity
[not found] ` <4614DCE1.70905-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-05 11:36 ` Ingo Molnar
2007-04-06 1:16 ` [kvm-devel] " Rusty Russell
2007-04-06 18:59 ` Ingo Molnar
2007-04-05 10:55 ` Ingo Molnar
2007-04-05 14:32 ` [kvm-devel] " Anthony Liguori
2007-04-06 10:37 ` Ingo Molnar
2007-04-06 11:07 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=461DA849.50406@qumranet.com \
--to=avi-atkuwr5tajbwk0htik3j/w@public.gmane.org \
--cc=kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
--cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).