From: Avi Kivity <avi@qumranet.com>
To: qemu-devel@nongnu.org, Anthony Liguori <aliguori@us.ibm.com>,
kvm-devel@lists.sourceforge.net,
Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations
Date: Mon, 21 Apr 2008 09:39:31 +0300
Message-ID: <480C36A3.6010900@qumranet.com>
In-Reply-To: <20080420233913.GA23292@shareable.org>

Jamie Lokier wrote:
> Avi Kivity wrote:
>
>>> Does that mean "for the majority of deployments, the slow version is
>>> sufficient. The few that care about performance can use Linux AIO?"
>>>
>> In essence, yes. s/slow/slower/ and s/performance/ultimate block device
>> performance/.
>>
>> Many deployments don't care at all about block device performance; they
>> care mostly about networking performance.
>>
>
> That's interesting. I'd have expected block device performance to be
> important for most things, for the same reason that disk performance
> is (well, reasonably) important for non-virtual machines.
>
>
Seek time is important. Bandwidth is somewhat important. But for one-
and two-spindle workloads (the majority), the cpu utilization induced
by getting requests to the disk is not important, and that's what we're
optimizing here.

Disks work at around 300 Hz. Processors at around 3 GHz. That's seven
orders of magnitude difference. Even if you spent 100 usec calculating
the next best seek, and even if that saved you only 10% of seeks, it
would still be a win. And of course modern processors spend a few
microseconds at most getting a request out.

You really need 50+ disks or a large write-back cache to make
microoptimizations around the submission path felt.
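
To put rough numbers on the argument above, here is a back-of-envelope
sketch. The inputs are just the round figures from this mail (300 Hz,
3 GHz, 100 usec, 10%), not measurements, and the variable names are made
up for illustration.

```python
import math

# Round figures quoted in this mail; none of them are measured values.
DISK_OPS_PER_SEC = 300        # "disks work at around 300 Hz"
CPU_CYCLES_PER_SEC = 3e9      # "processors at around 3 GHz"

# How many powers of ten separate the two rates.
orders_of_magnitude = math.log10(CPU_CYCLES_PER_SEC / DISK_OPS_PER_SEC)

seek_time_s = 1.0 / DISK_OPS_PER_SEC   # roughly 3.3 ms per seek
scheduling_cost_s = 100e-6             # 100 usec of cpu spent per request
seek_fraction_saved = 0.10             # only 10% of seeks are avoided

# Average disk time saved per request, and what is left after paying
# for the scheduling work.
expected_saving_s = seek_fraction_saved * seek_time_s
net_gain_s = expected_saving_s - scheduling_cost_s

print(f"cpu/disk rate ratio: about 10^{round(orders_of_magnitude)}")
print(f"net gain per request: about {net_gain_s * 1e6:.0f} usec")
```

Under these assumptions the saved seek time per request is larger than
the 100 usec spent deciding, which is why the cpu cost of the
submission path barely registers on a one- or two-spindle setup.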
> But as you say next:
>
>
>>> I'm under the impression that the entire and only point of Linux AIO
>>> is that it's faster than POSIX AIO on Linux.
>>>
>> It is. I estimate posix aio adds a few microseconds above linux aio per
>> I/O request, when using O_DIRECT. Assuming 10 microseconds, you will
>> need 10,000 I/O requests per second per vcpu to have a 10% performance
>> difference. That's definitely rare.
>>
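
For reference, the arithmetic behind that estimate can be written out
as below. This is only a sketch: the 10 usec per-request overhead is the
assumed figure from the quoted text, and cpu_fraction() is a
hypothetical helper, not anything in QEMU.

```python
def cpu_fraction(extra_usec_per_request, requests_per_sec):
    """Fraction of one cpu consumed by the extra per-request overhead."""
    return extra_usec_per_request * requests_per_sec / 1e6


def requests_for_fraction(extra_usec_per_request, fraction):
    """Request rate at which the overhead reaches the given cpu fraction."""
    return fraction * 1e6 / extra_usec_per_request


# Assumed: posix aio costs 10 usec more than linux aio per request.
overhead_at_10k_iops = cpu_fraction(10, 10_000)
iops_for_10_percent = requests_for_fraction(10, 0.10)

print(f"overhead at 10,000 IOPS: {overhead_at_10k_iops:.0%} of one cpu")
print(f"IOPS needed for a 10% hit: {iops_for_10_percent:.0f}")
```

At a few hundred requests per second, which is what one or two
spindles deliver, the same formula gives a fraction of a percent.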
>
> Oh, I didn't realise the difference was so small.
>
> At such a tiny difference, I'm wondering why Linux-AIO exists at all,
> as it complicates the kernel rather a lot. I can see the theoretical
> appeal, but if performance is so marginal, I'm surprised it's in
> there.
>
>
Linux aio exists, but that's all that can be said for it. It works
mostly for raw disks, doesn't integrate with networking, and doesn't
advance at the same pace as the rest of the kernel. I believe only
databases use it (and a userspace filesystem I wrote some time ago).
> I'm also surprised the Glibc implementation of AIO using ordinary
> threads is so close to it.
Why are you surprised?

Actually the glibc implementation could be improved from what I've
heard. My estimates are for a thread pool implementation, but there is
no reason why glibc couldn't achieve exactly the same performance.
> And then, I'm wondering why use AIO it
> all: it suggests QEMU would run about as fast doing synchronous I/O in
> a few dedicated I/O threads.
>
>
Posix aio is the unix API for this; why not use it?

>> Also, I'd presume that those that need 10K IOPS and above will not place
>> their high throughput images on a filesystem; rather on a separate SAN LUN.
>>
>
> Does the separate LUN make any difference? I thought O_DIRECT on a
> filesystem was meant to be pretty close to block device performance.
>
On a good extent-based filesystem like XFS you will get good performance
(though more cpu overhead due to needing to go through additional
mapping layers). Old clunkers like ext3 will require additional seeks or
a ton of cache (1 GB per 1 TB).
> I base this on messages here and there which say swapping to a file is
> about as fast as swapping to a block device, nowadays.
>
Swapping to a file preloads the block mapping into memory, so the
filesystem is not involved at all in the I/O path.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.