From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53829) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cACQ8-0001dz-VD for qemu-devel@nongnu.org; Fri, 25 Nov 2016 04:03:52 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cACQ7-0006Sk-R1 for qemu-devel@nongnu.org; Fri, 25 Nov 2016 04:03:48 -0500 References: <20161115063715.12561-1-pbutsykin@virtuozzo.com> <20161115063715.12561-2-pbutsykin@virtuozzo.com> <20161123142851.GB5068@noname.redhat.com> <5836C7DB.5000109@virtuozzo.com> <20161124123620.GB4535@noname.redhat.com> From: Pavel Butsykin Message-ID: <583702E1.3050707@virtuozzo.com> Date: Thu, 24 Nov 2016 18:10:25 +0300 MIME-Version: 1.0 In-Reply-To: <20161124123620.GB4535@noname.redhat.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v1 01/18] block/io: add bdrv_aio_{preadv, pwritev} List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, den@openvz.org, famz@redhat.com, stefanha@redhat.com, mreitz@redhat.com, eblake@redhat.com On 24.11.2016 15:36, Kevin Wolf wrote: > Am 24.11.2016 um 11:58 hat Pavel Butsykin geschrieben: >> On 23.11.2016 17:28, Kevin Wolf wrote: >>> Am 15.11.2016 um 07:36 hat Pavel Butsykin geschrieben: >>>> It's just byte-based wrappers over bdrv_co_aio_prw_vector(), which provide >>>> a byte-based interface for AIO read/write. >>>> >>>> Signed-off-by: Pavel Butsykin >>> >>> I'm in the process to phase out the last users of bdrv_aio_*() so that >>> this set of interfaces can be removed. I'm doing this because it's an >>> unnecessary redundancy, we have too many wrapper functions that expose >>> the same functionality with different syntax. So let's not add new >>> users. >>> >>> At first sight, you don't even seem to use bdrv_aio_preadv() for actual >>> parallelism, but you often have a pattern like this: >>> >>> void foo_cb(void *opaque) >>> { >>> ... >>> qemu_coroutine_enter(acb->co); >>> } >>> >>> void caller() >>> { >>> ... >>> acb = bdrv_aio_preadv(...); >>> qemu_coroutine_yield(); >>> } >>> >>> The code will actually become a lot simpler if you use bdrv_co_preadv() >>> instead because you don't have to have a callback, but you get pure >>> sequential code. >>> >>> The part that actually has some parallelism, pcache_readahead_request(), >>> already creates its own coroutine, so it runs in the background without >>> using callback-style interfaces. >> >> I used bdrv_co_preadv(), because it conveniently solves the partial >> cache hit. To solve the partial cache hit, we need to split a request >> into smaller parts, make asynchronous requests and wait for all >> requests in one place. >> >> Do you propose to create a coroutine for each part of request? It >> seemed to me that bdrv_co_preadv() is a wrapper that allows us to get >> rid of the same code. > > It's actually the other way round, bdrv_co_preadv() is the "native" > block layer API, and bdrv_aio_*() are wrappers providing an alternative > interface. > Sorry, I mixed up bdrv_co_preadv() and drv_aio_preadv() :) Also I forgot that in this version, pcache can't split one request into many small requests(as in RFC version). > I'm looking at pcache_co_readahead(), for example. It looks like this: > > bdrv_aio_preadv(bs->file, node->common.offset, &readahead_acb.qiov, > node->common.bytes, pcache_aio_readahead_cb, > &readahead_acb); > qemu_coroutine_yield(); > > And then we have pcache_aio_readahead_cb(), which ends in: > > qemu_coroutine_enter(acb->co); > > So here the callback style doesn't buy you anything, it just rips the > code apart in two function. There is no parallelism here anyway, > pcache_co_readahead() doesn't do anything until the callback reenters > it. This is a very obvious example where bdrv_co_preadv() will simplify > the code. > I agree. > It's similar with the other bdrv_aio_preadv() calls, which are in > pcache_co_preadv(): > > if (bytes > s->max_aio_size) { > bdrv_aio_preadv(bs->file, offset, qiov, bytes, > pcache_aio_read_cb, &acb); > goto out; > } > > update_req_stats(s->req_stats, offset, bytes); > > status = pcache_lookup_data(&acb); > if (status == CACHE_MISS) { > bdrv_aio_preadv(bs->file, offset, qiov, bytes, > pcache_aio_read_cb, &acb); > } else if (status == PARTIAL_CACHE_HIT) { > assert(acb.part.qiov.niov != 0); > bdrv_aio_preadv(bs->file, acb.part.offset, &acb.part.qiov, > acb.part.bytes, pcache_aio_read_cb, &acb); > } > > pcache_readahead_request(&acb); > > if (status == CACHE_HIT && --acb.ref == 0) { > return 0; > } > > out: > qemu_coroutine_yield(); > > Here you have mainly the pcache_readahead_request() call between > bdrv_aio_preadv() and the yield. It only spawns a new coroutine, which > works in the background, so I think you can move it to before the reads > and then the reads can trivially become bdrv_co_preadv() and the > callback can again be inlined instead of ripping the function in two > parts. > > > The bdrv_aio_pwritev() call in pcache_co_pwritev() is just the same > thing and using the coroutine version results in obvious code > improvements. > > > And I think this are all uses of bdrv_aio_*() in the pcache driver, so > converting it to use bdrv_co_*() instead isn't only possible, but will > improve the legibility of your code, too. It's a clear win in all three > places. > Yes, you're right, this scheme was acceptable when the caller was bdrv_aio_*(), but now it just confuses. Thanks for the explanation! > Kevin >