From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:53829)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbutsykin@virtuozzo.com>) id 1cACQ8-0001dz-VD
	for qemu-devel@nongnu.org; Fri, 25 Nov 2016 04:03:52 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbutsykin@virtuozzo.com>) id 1cACQ7-0006Sk-R1
	for qemu-devel@nongnu.org; Fri, 25 Nov 2016 04:03:48 -0500
References: <20161115063715.12561-1-pbutsykin@virtuozzo.com>
	<20161115063715.12561-2-pbutsykin@virtuozzo.com>
	<20161123142851.GB5068@noname.redhat.com>
	<5836C7DB.5000109@virtuozzo.com>
	<20161124123620.GB4535@noname.redhat.com>
From: Pavel Butsykin <pbutsykin@virtuozzo.com>
Message-ID: <583702E1.3050707@virtuozzo.com>
Date: Thu, 24 Nov 2016 18:10:25 +0300
MIME-Version: 1.0
In-Reply-To: <20161124123620.GB4535@noname.redhat.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v1 01/18] block/io: add bdrv_aio_{preadv,
 pwritev}
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, den@openvz.org, famz@redhat.com, stefanha@redhat.com, mreitz@redhat.com, eblake@redhat.com

On 24.11.2016 15:36, Kevin Wolf wrote:
> Am 24.11.2016 um 11:58 hat Pavel Butsykin geschrieben:
>> On 23.11.2016 17:28, Kevin Wolf wrote:
>>> Am 15.11.2016 um 07:36 hat Pavel Butsykin geschrieben:
>>>> It's just byte-based wrappers over bdrv_co_aio_prw_vector(), which provide
>>>>   a byte-based interface for AIO read/write.
>>>>
>>>> Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
>>>
>>> I'm in the process to phase out the last users of bdrv_aio_*() so that
>>> this set of interfaces can be removed. I'm doing this because it's an
>>> unnecessary redundancy, we have too many wrapper functions that expose
>>> the same functionality with different syntax. So let's not add new
>>> users.
>>>
>>> At first sight, you don't even seem to use bdrv_aio_preadv() for actual
>>> parallelism, but you often have a pattern like this:
>>>
>>>      void foo_cb(void *opaque)
>>>      {
>>>          ...
>>>          qemu_coroutine_enter(acb->co);
>>>      }
>>>
>>>      void caller()
>>>      {
>>>          ...
>>>          acb = bdrv_aio_preadv(...);
>>>          qemu_coroutine_yield();
>>>      }
>>>
>>> The code will actually become a lot simpler if you use bdrv_co_preadv()
>>> instead because you don't have to have a callback, but you get pure
>>> sequential code.
>>>
>>> The part that actually has some parallelism, pcache_readahead_request(),
>>> already creates its own coroutine, so it runs in the background without
>>> using callback-style interfaces.
>>
>> I used bdrv_co_preadv(), because it conveniently solves the partial
>> cache hit. To solve the partial cache hit, we need to split a request
>> into smaller parts, make asynchronous requests and wait for all
>> requests in one place.
>>
>> Do you propose to create a coroutine for each part of request? It
>> seemed to me that bdrv_co_preadv() is a wrapper that allows us to get
>> rid of the same code.
>
> It's actually the other way round, bdrv_co_preadv() is the "native"
> block layer API, and bdrv_aio_*() are wrappers providing an alternative
> interface.
>

Sorry, I mixed up bdrv_co_preadv() and drv_aio_preadv() :)

Also I forgot that in this version, pcache can't split one request into 
many small requests(as in RFC version).

> I'm looking at pcache_co_readahead(), for example. It looks like this:
>
>      bdrv_aio_preadv(bs->file, node->common.offset, &readahead_acb.qiov,
>                      node->common.bytes, pcache_aio_readahead_cb,
>                      &readahead_acb);
>      qemu_coroutine_yield();
>
> And then we have pcache_aio_readahead_cb(), which ends in:
>
>      qemu_coroutine_enter(acb->co);
>
> So here the callback style doesn't buy you anything, it just rips the
> code apart in two function. There is no parallelism here anyway,
> pcache_co_readahead() doesn't do anything until the callback reenters
> it. This is a very obvious example where bdrv_co_preadv() will simplify
> the code.
>

I agree.

> It's similar with the other bdrv_aio_preadv() calls, which are in
> pcache_co_preadv():
>
>          if (bytes > s->max_aio_size) {
>              bdrv_aio_preadv(bs->file, offset, qiov, bytes,
>                              pcache_aio_read_cb, &acb);
>              goto out;
>          }
>
>          update_req_stats(s->req_stats, offset, bytes);
>
>          status = pcache_lookup_data(&acb);
>          if (status == CACHE_MISS) {
>              bdrv_aio_preadv(bs->file, offset, qiov, bytes,
>                              pcache_aio_read_cb, &acb);
>          } else if (status == PARTIAL_CACHE_HIT) {
>              assert(acb.part.qiov.niov != 0);
>              bdrv_aio_preadv(bs->file, acb.part.offset, &acb.part.qiov,
>                              acb.part.bytes, pcache_aio_read_cb, &acb);
>          }
>
>          pcache_readahead_request(&acb);
>
>          if (status == CACHE_HIT && --acb.ref == 0) {
>              return 0;
>          }
>
>      out:
>          qemu_coroutine_yield();
>
> Here you have mainly the pcache_readahead_request() call between
> bdrv_aio_preadv() and the yield. It only spawns a new coroutine, which
> works in the background, so I think you can move it to before the reads
> and then the reads can trivially become bdrv_co_preadv() and the
> callback can again be inlined instead of ripping the function in two
> parts.
>
>
> The bdrv_aio_pwritev() call in pcache_co_pwritev() is just the same
> thing and using the coroutine version results in obvious code
> improvements.
>
>
> And I think this are all uses of bdrv_aio_*() in the pcache driver, so
> converting it to use bdrv_co_*() instead isn't only possible, but will
> improve the legibility of your code, too. It's a clear win in all three
> places.
>

Yes, you're right, this scheme was acceptable when the caller was
bdrv_aio_*(), but now it just confuses. Thanks for the explanation!

> Kevin
>