From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool
Date: Fri, 12 Dec 2008 15:13:33 +0100
Message-ID: <20081212141333.GJ6809@random.random>
References: <20081211131222.GA14908@random.random>
 <494130B5.2080800@redhat.com>
 <20081211155335.GE14908@random.random>
 <49413B9C.3030703@redhat.com>
 <20081211164947.GD6809@random.random>
 <49414BC9.5090905@redhat.com>
 <20081211181116.GE6809@random.random>
 <20081212082309.GI23742@kernel.dk>
 <20081212115133.GI6809@random.random>
 <20081212115420.GR23742@kernel.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: qemu-devel@nongnu.org, Gerd Hoffmann, kvm-devel
To: Jens Axboe
Received: from mx2.redhat.com ([66.187.237.31]:45295 "EHLO mx2.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1757040AbYLLONr (ORCPT ); Fri, 12 Dec 2008 09:13:47 -0500
Content-Disposition: inline
In-Reply-To: <20081212115420.GR23742@kernel.dk>
Sender: kvm-owner@vger.kernel.org

On Fri, Dec 12, 2008 at 12:54:21PM +0100, Jens Axboe wrote:
> I agree completely. The buffered aio patches got pretty involved though,
> it wasn't real pretty in the end. So it never got merged. Looks like the
> most realistic way forward is some variant of syslet (or the acall stuff
> that Zach has been working on), which is largely a cop out and will
> never perform as well.

It'll at least perform better than a brand new userland pool of threads
for each task that needs aio functionality, and it can be optimized
later if we want ;).

But I'm surprised: the aio patches in 2.4 were very clean, we didn't
have to break filesystems, and it was really nicely done work,
enterprise quality as demonstrated by the several databases running on
it for years. Ironically the O_DIRECT part didn't work at the time...
because effectively the O_DIRECT part is more difficult. So 2.6 has the
hard stuff done and misses the simpler stuff.
I guess the simpler stuff is harder to merge as it has more users.

Well, I hope it'll be fixed... for kvm/qemu we definitely require aio
for buffered reads too (buffered writes aren't a big deal but reads
are). For the parent images it makes sense to run them in buffered mode
even on servers using O_DIRECT, so basically we can't use linux-aio
until this is fixed somehow. In the meantime I think it'd be better to
return -EINVAL (so userland can fall back to the userland thread pool)
instead of just behaving synchronously, which can break GUI and
interactive behavior...

> I added CLONE_IO some time ago to avoid that, so it's perfectly possible
> to share cfq io contexts with threads or processes even in userspace!

It's available in recent kernels I see! So the fix is easy. The only
problem is how to pass CLONE_IO to pthread_create... We'll have to make
a linux-only change and call clone by hand under some #ifdef CLONE_IO.
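
To make the "call clone by hand" idea concrete, here is a minimal
sketch, not actual qemu code. It assumes glibc's clone() wrapper and a
downward-growing stack; spawn_aio_worker, join_aio_worker, demo_worker
and WORKER_STACK_SIZE are hypothetical names made up for illustration.
It deliberately leaves out CLONE_THREAD so the worker can be reaped
with a plain waitpid(), which also means the worker gets no pthread
TLS and should stay away from libc facilities that depend on it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define WORKER_STACK_SIZE (64 * 1024)   /* assumption: plenty for a tiny worker */

/* Flags for a thread-like worker sharing memory, fs info and fds. */
static unsigned int aio_worker_clone_flags(void)
{
    unsigned int flags = CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD;
#ifdef CLONE_IO
    /* Share the parent's I/O context where the headers know the flag. */
    flags |= CLONE_IO;
#endif
    return flags;
}

/* Hypothetical worker body: stores a marker value and exits with 0. */
static int demo_worker(void *arg)
{
    *(volatile int *)arg = 42;
    return 0;
}

/* Start fn(arg) in a new task; returns its pid, or -1 on failure. */
static pid_t spawn_aio_worker(int (*fn)(void *), void *arg, void **stack_out)
{
    char *stack = malloc(WORKER_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* clone() wants the top of the stack on downward-growing stacks. */
    pid_t pid = clone(fn, stack + WORKER_STACK_SIZE,
                      (int)aio_worker_clone_flags(), arg);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    *stack_out = stack;
    return pid;
}

/* Wait for the worker, release its stack, return its exit status or -1. */
static int join_aio_worker(pid_t pid, void *stack)
{
    int status = 0;

    assert(stack != NULL);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    free(stack);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pair spawn_aio_worker(demo_worker, &result, &stack) with
join_aio_worker(pid, stack). On headers that lack CLONE_IO the #ifdef
simply drops the flag, so the worker still runs but with its own I/O
context.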