From: Andrea Arcangeli <aarcange@redhat.com>
To: Jens Axboe <qemu@kernel.dk>
Cc: qemu-devel@nongnu.org, Gerd Hoffmann <kraxel@redhat.com>,
kvm-devel <kvm@vger.kernel.org>
Subject: Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool
Date: Fri, 12 Dec 2008 15:13:33 +0100
Message-ID: <20081212141333.GJ6809@random.random>
In-Reply-To: <20081212115420.GR23742@kernel.dk>
On Fri, Dec 12, 2008 at 12:54:21PM +0100, Jens Axboe wrote:
> I agree completely. The buffered aio patches got pretty involved though,
> it wasn't real pretty in the end. So it never got merged. Looks like the
> most realistic way forward is some variant of syslet (or the acall stuff
> that Zach has been working on), which is largely a cop out and will
> never perform as well.
It'll at least perform better than a brand new userland pool of threads
for each task that needs aio functionality, and it can be optimized
later if we want ;).
But I'm surprised: the aio patches in 2.4 were very clean, we didn't
have to break filesystems, it was really nicely done work, enterprise
quality as demonstrated by the several databases running on it for
years. Ironically the O_DIRECT part didn't work at the
time... because effectively the O_DIRECT part is more difficult. So
2.6 has the hard stuff done and misses the simpler stuff. I guess the
simpler stuff is harder to merge as it has more users.
Well I hope it'll be fixed... for kvm/qemu we definitely require aio
for buffered reads too (buffered writes aren't a big deal but reads
are). For the parent images it makes sense to run them in buffered
mode even on servers using O_DIRECT, so basically we can't use
linux-aio until this is fixed somehow.
In the meantime I think it'd be better to return -EINVAL (so userland
can fall back to a userland thread pool) instead of just behaving
synchronously, which can break GUI and interactive behavior...
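A minimal sketch of the fallback described above, assuming the native
path refuses buffered file descriptors with -EINVAL rather than blocking.
The names native_aio_read, pool_worker and submit_read are hypothetical,
the native path is only a stub, and a single pthread stands in for the
userland thread pool:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for O_DIRECT */
#endif
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

struct read_req {
    int fd;
    void *buf;
    size_t len;
    off_t off;
    ssize_t result;             /* bytes read, or -errno */
};

/* Hypothetical stand-in for the native aio path: it refuses buffered
 * fds with -EINVAL. Real submission is not modelled here. */
static int native_aio_read(struct read_req *req)
{
    int fl = fcntl(req->fd, F_GETFL);

    if (fl < 0)
        return -errno;
    if (!(fl & O_DIRECT))
        return -EINVAL;         /* buffered fd: caller must fall back */
    return -ENOSYS;             /* O_DIRECT submission left out of the sketch */
}

/* Worker that performs the blocking read off the caller's thread. */
static void *pool_worker(void *arg)
{
    struct read_req *req = arg;
    ssize_t n = pread(req->fd, req->buf, req->len, req->off);

    req->result = n < 0 ? -errno : n;
    return NULL;
}

/* Try the native path first; on -EINVAL hand the request to a thread.
 * Returns 0 on successful submission or a negative errno value. */
static int submit_read(struct read_req *req, pthread_t *worker,
                       int *used_fallback)
{
    int ret = native_aio_read(req);

    *used_fallback = 0;
    if (ret != -EINVAL)
        return ret;
    *used_fallback = 1;
    return -pthread_create(worker, NULL, pool_worker, req);
}
```

The caller would pthread_join the worker (or get a completion
notification from a real pool) before looking at req->result.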
> I added CLONE_IO some time ago to avoid that, so it's perfectly possible
> to share cfq io contexts with threads or processes even in userspace!
It's available in recent kernels, I see! So the fix is easy. The only
problem is how to pass CLONE_IO to pthread_create... We'll have to
make a Linux-only change and call clone by hand under some #ifdef
CLONE_IO.
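A rough sketch of what "call clone by hand under some #ifdef CLONE_IO"
could look like. The names io_worker and spawn_io_worker, the stack size
and the exact flag set are assumptions for illustration, not the actual
qemu change. The worker only sets a flag so the example stays safe: a
raw clone() child sharing the address space has no TLS of its own, so
real worker code would have to avoid most of libc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for clone() and the CLONE_* flags */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define WORKER_STACK_SIZE (256 * 1024)  /* arbitrary size for the sketch */

static volatile int worker_ran;

/* Runs in the cloned task; shares memory with the parent via CLONE_VM. */
static int io_worker(void *arg)
{
    (void)arg;
    worker_ran = 1;
    return 0;
}

/* Start one worker that shares the caller's io context when the
 * headers provide CLONE_IO, then wait for it. Returns 0 on success. */
static int spawn_io_worker(void)
{
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD;
    int status;
    char *stack;
    pid_t pid;

#ifdef CLONE_IO
    flags |= CLONE_IO;          /* share the io context with the parent */
#endif

    stack = malloc(WORKER_STACK_SIZE);
    if (!stack)
        return -1;

    /* The stack grows down on the usual targets, so pass its top. */
    pid = clone(io_worker, stack + WORKER_STACK_SIZE, flags, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    if (waitpid(pid, &status, 0) < 0) {
        free(stack);
        return -1;
    }
    free(stack);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

On headers without CLONE_IO the worker is still created, just without
the shared io context, so the same code builds on older systems.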