From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
netdev <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: QEMU PIC indirection patch for in-kernel APIC work
Date: Wed, 11 Apr 2007 17:28:40 +0300
Message-ID: <461CF098.3090003@qumranet.com>
In-Reply-To: <1176297794.14322.72.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
Rusty Russell wrote:
> On Wed, 2007-04-11 at 07:26 +0300, Avi Kivity wrote:
>
>> Nope. Being async is critical for copyless networking:
>>
>> - in the transmit path, so need to stop the sender (guest) from touching
>> the memory until it's on the wire. This means 100% of packets sent will
>> be blocked.
>>
>
> Hi Avi,
>
> You keep saying stuff like this, and I keep ignoring it. OK, I'll
> bite:
>
> Why would we try to prevent the sender from altering the packets?
>
>
To avoid data corruption.

The guest wants to send a packet. It calls write(), which causes an skb
to be allocated, data to be copied into it, the entire networking stack
gets into gear, and the guest-side driver instructs the "device" to send
the packet.

With async operations, the saga continues like this: the host-side
driver allocates an skb, get_page()s and attaches the data to the new
skb, this skb crosses the bridge, trickles into the real ethernet
device, gets queued there, sent, interrupts fire, triggering async
completion. On this completion, we send a virtual interrupt to the
guest, which tells it to destroy the skb and reclaim the pages attached
to it.

Without async operations, we don't have a hook to notify the guest when
to reclaim the skb. If we do it too soon, the skb can be reclaimed and
the memory reused before the real device gets to see it, so we end up
sending data that we did not intend. The only way to avoid it is to
copy the data somewhere safe, but that is exactly what we don't want to do.
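The reclaim-too-soon problem can be shown with a toy model. This is only a
sketch in Python, not kernel code: Page, HostNic, submit() and transmit_one()
are made-up names, and the refs counter merely stands in for get_page() and
put_page(). The device reads the guest's memory only at transmit time, so a
guest that reuses the page before completion puts the wrong bytes on the wire.

```python
from collections import deque


class Page:
    """Toy guest page; refs stands in for the get_page()/put_page() count."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.refs = 1  # the guest's own reference


class HostNic:
    """Hypothetical host-side device that queues references and copies nothing."""

    def __init__(self):
        self.queue = deque()
        self.wire = []  # what actually went out, in order

    def submit(self, page, on_complete):
        page.refs += 1  # pin the page instead of copying it
        self.queue.append((page, on_complete))

    def transmit_one(self):
        page, on_complete = self.queue.popleft()
        self.wire.append(bytes(page.data))  # memory is read only now
        page.refs -= 1
        on_complete(page)  # async completion, i.e. the virtual interrupt


def send_without_completion(nic):
    """Guest reuses the page right after submit: the wrong data is sent."""
    page = Page(b"packet-1")
    nic.submit(page, lambda p: None)
    page.data[:] = b"garbage!"  # reclaimed too soon
    nic.transmit_one()
    return nic.wire[-1]


def send_with_completion(nic):
    """Guest waits for the completion hook before touching the page again."""
    page = Page(b"packet-1")
    completed = []
    nic.submit(page, completed.append)
    nic.transmit_one()
    if completed == [page]:
        page.data[:] = b"garbage!"  # safe: the device is done with it
    return nic.wire[-1]
```

In the first function the wire sees the overwritten bytes; in the second it
sees the original packet, with no copy made in either case.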
>> - multiple packets per operation (for interrupt mitigation) (like
>> lio_listio)
>>
>
> The benefits for interrupt mitigation are less clear to me in a virtual
> environment (scheduling tends to make it happen anyway); I'd want to
> benchmark it.
>
>
Yes, the guest will probably submit multiple packets in one hypercall.
It would be nice for the userspace driver to be able to submit them to
the host kernel in one syscall.
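To make the cost difference concrete, here is a small sketch that only counts
kernel entries. FakeHostKernel, send_one() and send_batch() are hypothetical
stand-ins, not an existing interface; send_batch() plays the role a
lio_listio-style call would play.

```python
class FakeHostKernel:
    """Hypothetical host kernel interface that counts how often it is entered."""

    def __init__(self):
        self.syscalls = 0
        self.sent = []

    def send_one(self, packet: bytes) -> None:
        self.syscalls += 1
        self.sent.append(packet)

    def send_batch(self, packets) -> None:
        # One entry into the kernel for the whole list.
        self.syscalls += 1
        self.sent.extend(packets)


def forward_per_packet(kernel, packets):
    """Userspace driver forwarding a guest batch one syscall at a time."""
    for packet in packets:
        kernel.send_one(packet)


def forward_batched(kernel, packets):
    """Userspace driver forwarding the same guest batch in a single syscall."""
    kernel.send_batch(list(packets))
```

For a guest batch of N packets the first path enters the kernel N times and
the second once; both deliver the same packets in the same order.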
> Some kind of batching to reduce syscall overhead, perhaps, but TSO would
> go a fair way towards that anyway (probably not enough).
>
>
For some workloads, sure.
>> - scatter/gather packets (iovecs)
>>
>
> Yes, and this is already present in the tap device. Anthony suggested a
> slightly nasty hack for multiple sg packets in one writev()/readv, which
> could also give us batching.
>
>
No need for hacks if we get list aio support one day.
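For reference, a single scatter/gather write of one packet looks roughly like
this from userspace. The sketch uses Python's os.writev() on a pipe rather
than a real tap device, assumes a POSIX system, and the helper names are made
up. It covers only one packet per call; it does not attempt the
multiple-packets-per-writev() trick.

```python
import os


def gather_write(fd: int, header: bytes, payload: bytes) -> int:
    """Send header and payload from separate buffers in one writev() call."""
    return os.writev(fd, [header, payload])


def demo():
    """Write two buffers to a pipe and read back what arrived."""
    read_fd, write_fd = os.pipe()
    try:
        written = gather_write(write_fd, b"hdr:", b"payload")
        return written, os.read(read_fd, 64)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

The two buffers are never joined in userspace; the kernel walks the iovec.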
>> - configurable wakeup (by packet count/timeout) for queue management
>>
>
> I'm not convinced that this is a showstopper, though.
>
It probably isn't. It's free with aio though.
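A wakeup policy of the kind meant here (fire after so many packets, or after
a timeout since the first pending one) is small. The sketch below is an
illustration only: WakeupPolicy and its method names are hypothetical, and
time is passed in explicitly so the behaviour is easy to follow.

```python
class WakeupPolicy:
    """Fire a wakeup after max_packets are pending or timeout has elapsed."""

    def __init__(self, max_packets: int, timeout: float):
        self.max_packets = max_packets
        self.timeout = timeout
        self.pending = 0
        self.first_ts = None  # arrival time of the oldest pending packet

    def on_packet(self, now: float) -> bool:
        """Record a packet; return True if the consumer should be woken."""
        if self.pending == 0:
            self.first_ts = now
        self.pending += 1
        return self._maybe_fire(now)

    def on_tick(self, now: float) -> bool:
        """Called periodically; return True if the timeout has expired."""
        return self._maybe_fire(now)

    def _maybe_fire(self, now: float) -> bool:
        if self.pending == 0:
            return False
        full = self.pending >= self.max_packets
        stale = now - self.first_ts >= self.timeout
        if full or stale:
            self.pending = 0
            self.first_ts = None
            return True
        return False
```

With max_packets=3 and timeout=1.0, three back-to-back packets trigger one
wakeup, and a lone packet triggers one a second after it arrived.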
>
>> - hacks (tso)
>>
>
> I'd usually go for a batch interface over TSO, but if the card we're
> sending to actually does TSO then TSO will probably win.
>
Sure, if tso helps a regular host then it should help one that happens
to be running a virtual machine.
>
>> Most of these can be provided by a combination of the pending aio work,
>> the pending aio/fd integration, and the not-so-pending tap aio work. As
>> the first two are available as patches and the third is limited to the
>> tap device, it is not unreasonable to try it out. Maybe it will turn
>> out not to be as difficult as I predicted just a few lines above.
>>
>
> Indeed, I don't think we're asking for a revolution a-la VJ-style
> channels. But I'm still itching to get back to that, and this might yet
> provide an excuse 8)
>
I'll be happy if this can be made to work. It will make the paravirt
guest-side driver work in kvm-less setups, which are useful for testing,
and of course reduction in kernel code is beneficial. It will be slower
than in-kernel, but if we get the batching right, perhaps not
significantly slower. I'm mostly concerned that this depends on code
that has eluded merging for such a long time.
--
error compiling committee.c: too many arguments to function