From: Kevin Wolf
Date: Thu, 06 Sep 2012 12:29:30 +0200
To: Paolo Bonzini
Cc: Anthony Liguori, Anand Avati, Stefan Hajnoczi, Vijay Bellur, Amar Tumballi, qemu-devel@nongnu.org, Blue Swirl, bharata@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend
Message-ID: <50487B0A.6010908@redhat.com>
In-Reply-To: <50487881.9030207@redhat.com>
References: <20120809130010.GA7960@in.ibm.com> <20120809130216.GC7960@in.ibm.com> <20120905074106.GA28080@in.ibm.com> <20120905095431.GB28080@in.ibm.com> <50484F63.5050406@redhat.com> <504867A9.3050808@redhat.com> <50486F24.2010106@redhat.com> <504875F6.9090302@redhat.com> <50487881.9030207@redhat.com>

On 06.09.2012 12:18, Paolo Bonzini wrote:
> On 06/09/2012 12:07, Kevin Wolf wrote:
>>> The AIOCB is already invalid at the time the callback is entered, so
>>> we could release it before the call. However, not all implementations
>>> of AIO are ready for that and I'm not really in the mood for large
>>> scale refactoring...
>>
>> By the way, what I'd really want to see in the end is to get rid of
>> qemu_aio_flush() and replace it by .bdrv_drain() callbacks in each
>> BlockDriver. The way we're doing it today is a layering violation.
>
> That's quite difficult. Completion of an I/O operation can trigger
> another I/O operation on another block device, and so on until we go
> back to the first device (think of a hypothetical RAID-5 device).

You always have a tree of BDSes, and children should only ever trigger
completion of I/O operations in their parents. Am I missing anything?

>> Doesn't change anything about this problem, though. So the options
>> that we have are:
>>
>> 1. Delay the callback using a BH. Doing this in each driver is ugly.
>>    But is there actually more than one possible callback in today's
>>    coroutine world? I only see bdrv_co_io_em_complete(), which could
>>    reenter the coroutine from a BH.
>
> Easy and safe, but it feels a bit like a timebomb. Also, I'm not
> entirely sure of _why_ the bottom half works. :)

Hm, "safe" and "time bomb" are contradictory in my book. :-)

The bottom half works because we're not reentering the qcow2_create
coroutine immediately, so the gluster AIO callback can complete all of
its cleanup work without being interrupted by code that might wait on
this particular request and create a deadlock that way.

>> 2. Delay the callback by just calling it later when the cleanup has
>>    been completed and .io_flush() can return 0. You say that it's
>>    hard to implement for some drivers, except if the AIOCBs are
>>    leaked until the end of functions like qcow2_create().
>
> ... which is what we do in posix-aio-compat.c; nobody screamed so far.

True.
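To make the BH variant concrete, a rough sketch of the deferral (the
GlusterAIOCB layout and the function names here are illustrative, not
taken from the actual patch; it assumes QEMU's qemu_bh_new() /
qemu_bh_schedule() / qemu_bh_delete() API and glosses over the
cross-thread hand-off from the glfs callback, which the patch handles
with a pipe):

    typedef struct GlusterAIOCB {
        BlockDriverAIOCB common;    /* must be first: carries cb/opaque */
        int ret;                    /* result to report to the caller */
        QEMUBH *bh;                 /* deferral point for the callback */
    } GlusterAIOCB;

    /* Runs from the main loop, after the gluster completion path has
     * fully unwound. */
    static void qemu_gluster_complete_aio(void *opaque)
    {
        GlusterAIOCB *acb = opaque;

        qemu_bh_delete(acb->bh);
        acb->bh = NULL;

        /* All gluster-side cleanup has finished by now, so reentering
         * the waiting coroutine here cannot deadlock on this request. */
        acb->common.cb(acb->common.opaque, acb->ret);
        qemu_aio_release(acb);
    }

    /* Invoked by libgfapi when the request completes; instead of
     * calling the completion callback directly, only schedule a BH. */
    static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret,
                                     void *arg)
    {
        GlusterAIOCB *acb = arg;

        acb->ret = ret;
        acb->bh = qemu_bh_new(qemu_gluster_complete_aio, acb);
        qemu_bh_schedule(acb->bh);
    }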
Would be easy to fix in posix-aio-compat, though. Or can a callback
expect that the AIOCB is still valid?

> Not really hard, it just has to be assessed for each driver
> separately. We can just do it in gluster and refactor it later.

Okay, so let's keep it as an option for now.

>> 3. Add a delay only later in functions like bdrv_drain_all() that
>>    assume that the request has completed. Are there more of this
>>    type? AIOCBs are leaked until a bdrv_drain_all() call. Does it
>>    work with draining specific BDSes instead of everything?
>>
>> Unless I forgot some important point, it almost looks like option 1
>> is the easiest and safest.
>
> I agree with your opinion, but I would feel better if I understood
> better why it works. (2) can be done easily in each driver (no
> ugliness) and refactored later.

I think option 2 must be done in each driver by design, or do you see
even a theoretical way to do it generically?

Kevin
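For comparison, option 2 would roughly amount to reordering the
completion path so that all driver bookkeeping (including whatever
.io_flush() inspects) is done before the guest-visible callback runs.
A sketch, reusing the GlusterAIOCB from above; the BDRVGlusterState
field and the helper names are assumptions for illustration, not the
patch's actual code:

    /* Registered as the .io_flush() handler for the gluster fd;
     * qemu_aio_flush() keeps waiting while this returns nonzero. */
    static int qemu_gluster_aio_flush_cb(void *opaque)
    {
        BDRVGlusterState *s = opaque;

        return s->qemu_aio_count > 0;
    }

    static void qemu_gluster_complete_aio(GlusterAIOCB *acb,
                                          BDRVGlusterState *s)
    {
        /* Finish the driver-side cleanup first, so that .io_flush()
         * already reports the request as gone... */
        s->qemu_aio_count--;

        /* ...and only then invoke the callback. The AIOCB stays valid
         * (i.e. is "leaked") until the callback has returned. */
        acb->common.cb(acb->common.opaque, acb->ret);
        qemu_aio_release(acb);
    }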