From: Avi Kivity <avi@qumranet.com>
To: Mark McLoughlin <markmc@redhat.com>
Cc: kvm@vger.kernel.org, Herbert Xu <herbert@gondor.apana.org.au>,
Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [PATCH 0/9][RFC] KVM virtio_net performance
Date: Tue, 12 Aug 2008 16:35:12 +0300
Message-ID: <48A19190.8020806@qumranet.com>
In-Reply-To: <1218484598.12581.49.camel@muff>
Mark McLoughlin wrote:
> Hi Avi,
>
> Sorry, I got distracted from this ...
>
>
So did I :)
>>> 1) The length of the tx mitigation timer makes quite a difference to
>>> throughput achieved; we probably need a good heuristic for
>>> adjusting this on the fly.
>>>
>>>
>> The tx mitigation timer is just one part of the equation; the other is
>> the virtio ring window size, which is now fixed.
>>
>> Using a maximum sized window is good when the guest and host are running
>> flat out, doing nothing but networking. When throughput drops (because
>> the guest is spending cpu on processing, or simply because the other
>> side is not keeping up), we need to drop the window size so as to
>> retain acceptable latencies.
>>
>> The tx timer can then be set to "a bit after the end of the window",
>> acting as a safety belt in case the throughput changes.
>>
>
> i.e. the tx timer should give just enough time for a flat out guest to
> fill the ring, and no more?
>
> Yep, that's basically what lguest's tx timer heuristic is aiming for
> AFAICT.
>
>
Yes, but that's not enough. If networking is slow (for whatever reason)
we need to drop the window size, to make sure the timer never fires
under steady state circumstances.

Thinking about it, we could have an explicit "worst case latency"
parameter (instead of the implicit "flat out guest fills ring") and set
the timer to that. Then adjust the window size to be as large as we can
make it without seeing the timer expire.
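
To illustrate the idea, here is a minimal sketch in C of such a controller.
It is not existing qemu code: struct tx_window, tx_window_init() and
tx_window_update() are hypothetical names, and the policy (halve the window
when the timer fires, grow it by one packet otherwise) is only one assumed
way of "adjusting the window"; the thread does not settle on a policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical controller state; none of these names exist in qemu. */
struct tx_window {
    uint32_t window;      /* packets to batch before flushing */
    uint32_t min_window;  /* never shrink below this */
    uint32_t max_window;  /* ring size: upper bound for the window */
    uint32_t latency_us;  /* explicit worst-case latency = timer period */
};

static void tx_window_init(struct tx_window *w, uint32_t ring_size,
                           uint32_t latency_us)
{
    assert(ring_size >= 1);
    w->min_window = 1;
    w->max_window = ring_size;
    w->window = ring_size;      /* start optimistic: flat-out guest */
    w->latency_us = latency_us;
}

/*
 * Call once per flush.  timer_fired is true when the flush was forced by
 * the latency timer rather than by the window filling up.
 * Returns the window to use for the next batch.
 */
static uint32_t tx_window_update(struct tx_window *w, bool timer_fired)
{
    if (timer_fired) {
        /* The guest did not fill the window in time: halve it (assumed
         * policy), but never go below the minimum. */
        w->window /= 2;
        if (w->window < w->min_window)
            w->window = w->min_window;
    } else if (w->window < w->max_window) {
        /* The window filled before the deadline: probe one step larger. */
        w->window++;
    }
    return w->window;
}
```

With a controller like this the timer period stays fixed at the latency
bound, and only the window moves; in steady state the timer should rarely
fire, which is the property asked for above.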
>>> 4) Dropping the global mutex while reading GSO packets from the tap
>>> interface gives a nice speedup. This highlights the global mutex
>>> as a general performance issue.
>>>
>>>
>>>
>> Not sure whether this is safe. What's stopping the guest from accessing
>> virtio and changing some state?
>>
>
> With the current code, the virtio state should be consistent before we
> drop the mutex. The I/O thread would only drop the lock while it reads
> into the tap buffer and then grab the lock again before popping a buffer
> from the ring and copying to it.
>
>
Right, tap_send() is called outside virtio-net context.
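
The ordering described above (read with the lock dropped, touch the ring
only with the lock held) can be sketched as follows. This is a toy model
under stated assumptions, not the qemu code: global_lock, struct toy_ring,
toy_ring_pop() and recv_via_bounce() are made-up names, and a pthread mutex
stands in for the global mutex.

```c
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE  2048
#define RING_SIZE 4

/* Stand-in for the global mutex (hypothetical name). */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy stand-in for a guest receive buffer. */
struct guest_buf {
    char   data[BUF_SIZE];
    size_t len;
};

/* Toy stand-in for the ring: hands out buffers in order. */
struct toy_ring {
    struct guest_buf bufs[RING_SIZE];
    unsigned next;              /* index of the next free buffer */
};

static struct guest_buf *toy_ring_pop(struct toy_ring *r)
{
    return r->next < RING_SIZE ? &r->bufs[r->next++] : NULL;
}

/*
 * Must be called with global_lock held; returns with it held.
 * The read() goes into a private bounce buffer while the lock is dropped,
 * so no guest-visible state is referenced during the unlocked window.
 */
static ssize_t recv_via_bounce(int fd, struct toy_ring *ring)
{
    char bounce[BUF_SIZE];
    struct guest_buf *gb;
    ssize_t n;

    pthread_mutex_unlock(&global_lock);
    n = read(fd, bounce, sizeof(bounce));
    pthread_mutex_lock(&global_lock);

    if (n <= 0)
        return n;

    /* Lock is held again: only now pop a buffer and copy into it. */
    gb = toy_ring_pop(ring);
    if (gb == NULL)
        return -1;              /* no buffer available; toy model drops */

    memcpy(gb->data, bounce, (size_t)n);
    gb->len = (size_t)n;
    return n;
}
```

The price of this ordering is the extra copy out of the bounce buffer. A
zero-copy variant would have to pop the buffer before the read(), which is
the case discussed next.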
> With Anthony's zero-copy patch, the situation is less clear - we pop a
> buffer from the avail, drop the lock, read() into the buffer, grab the
> lock and then push the buffer back onto the used ring. While the mutex
> is released, the guest could e.g. reset the ring and release the buffer
> which we're in the process of read()ing into.
>
> So, yes - dropping the mutex during read() in the zero-copy patch isn't
> safe.
>
> Another potential concern is that if we drop the mutex, the guest thread
> could delete an I/O handler while the I/O thread is in the I/O handler
> loop in main_loop_wait(). However, this seems to have been coded to
> handle this situation - the I/O handler would only be marked as deleted,
> and ignored by the loop.
>
I think it's safe. Still I don't feel good about it.
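
For reference, the "mark as deleted, skip in the loop" pattern mentioned
above looks roughly like the sketch below. It is a single-threaded toy with
hypothetical names (struct io_handler, handler_add(), handler_remove(),
run_handlers()), not the main_loop_wait() code; in the threaded case the
flag would be set and tested with the global mutex held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLERS 8

struct io_handler {
    void (*fn)(void *opaque);
    void *opaque;
    bool  in_use;
    bool  deleted;      /* set by handler_remove(); reclaimed by the loop */
};

static struct io_handler handlers[MAX_HANDLERS];

/* Returns the slot index, or -1 if the table is full. */
static int handler_add(void (*fn)(void *opaque), void *opaque)
{
    for (int i = 0; i < MAX_HANDLERS; i++) {
        if (!handlers[i].in_use) {
            handlers[i] = (struct io_handler){ fn, opaque, true, false };
            return i;
        }
    }
    return -1;
}

/* Safe to call from inside a handler: only marks, never frees. */
static void handler_remove(int id)
{
    assert(id >= 0 && id < MAX_HANDLERS);
    handlers[id].deleted = true;
}

/* One pass over the table; returns how many handlers were called. */
static int run_handlers(void)
{
    int called = 0;

    for (int i = 0; i < MAX_HANDLERS; i++) {
        struct io_handler *h = &handlers[i];

        if (!h->in_use)
            continue;
        if (!h->deleted) {
            h->fn(h->opaque);
            called++;
        }
        /* Reclaim only here, so the loop never follows a stale entry. */
        if (h->deleted) {
            h->in_use = false;
            h->deleted = false;
        }
    }
    return called;
}

/* Example callbacks used to exercise the loop. */
static void count_cb(void *opaque)  { (*(int *)opaque)++; }
static void remove_cb(void *opaque) { handler_remove(*(int *)opaque); }
```

If remove_cb runs in slot 0 and removes the handler in slot 1, the same
pass skips slot 1 instead of calling it, and the slot becomes reusable
afterwards.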
>
>>> 5) Eliminating an extra copy on the host->guest path only makes a
>>> barely measurable difference.
>>>
>>>
>>>
>> That's expected on a host->guest test. Zero copy is mostly important
>> for guest->external, and with zerocopy already enabled in the guest
>> (sendfile or nfs server workloads).
>>
>
> Hmm, could you elaborate on that?
>
> The copy we're eliminating here is an intermediate copy from tapfd into
> a buffer before copying to a guest buffer. It doesn't give you zero-copy
> as we still copy from kernel space to user space and vice-versa.
>
So long as we're only eliminating intermediate copies, each elimination
does not bring us much. It's the elimination of the last copy that brings
the benefit (actually, the last copy for each separate L2 cache; need to
test on loaded multisocket hosts).

There are broadly three workload categories wrt copying:

- server-side static http/nfs/smb protocols, serving to clients outside
  the host: guest is already copyless; serving from cache-cold buffers
- normal network servers that actually process their data: you'll have a
  guest kernel/user copy, and in any case, guest protocol processing means
  that the potential for gains from eliminating copies is limited
- guest/host benchmarks, which do a copy from a single buffer which is
  always in L1: zero copy is not going to show a gain (perhaps the opposite)

To get the first workload type optimized, we have to get the entire path
copyless.
--
error compiling committee.c: too many arguments to function