From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51241)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eyakovlev@virtuozzo.com>) id 1bLh3V-0000Ti-CK
	for qemu-devel@nongnu.org; Fri, 08 Jul 2016 21:27:42 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eyakovlev@virtuozzo.com>) id 1bLh3S-0003lS-56
	for qemu-devel@nongnu.org; Fri, 08 Jul 2016 21:27:41 -0400
References: <1467643124-29778-1-git-send-email-den@openvz.org>
	<1467643124-29778-5-git-send-email-den@openvz.org>
	<577EE010.7090106@redhat.com>
From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Message-ID: <577FC46F.50307@virtuozzo.com>
Date: Fri, 8 Jul 2016 18:19:11 +0300
MIME-Version: 1.0
In-Reply-To: <577EE010.7090106@redhat.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when
 storage is clean
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eric Blake <eblake@redhat.com>, "Denis V. Lunev" <den@openvz.org>, qemu-block@nongnu.org, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <famz@redhat.com>, Max Reitz <mreitz@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>, John Snow <jsnow@redhat.com>


On 08.07.2016 02:04, Eric Blake wrote:
> On 07/04/2016 08:38 AM, Denis V. Lunev wrote:
>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>
>> Some guests (win2008 server for example) do a lot of unnecessary
>> flushing when underlying media has not changed. This adds additional
>> overhead on host when calling fsync/fdatasync.
>>
>> This change introduces a write generation scheme in BlockDriverState.
>> Current write generation is checked against last flushed generation to
>> avoid unnessesary flushes.
>>
>> The problem with excessive flushing was found by a performance test
>> which does parallel directory tree creation (from 2 processes).
>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>> Each loop creates 10^3 directories with 10 files in each.
>>
>> +++ b/block/io.c
>> @@ -1294,6 +1294,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
>>       }
>>       bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
>>   
>> +    ++bs->write_gen;
> Why pre-increment?  Most code uses post-increment when done as a
> statement in isolation.

Just a habit of mine, from C++ days, where you can never be sure if 
someone overloaded post-increment operator and it will then generate a 
temporary object because post-increment is defined to return previous 
value. Now it's just a muscle memory :)

>
>>       bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
>>   
>>       if (bs->wr_highest_offset < offset + bytes) {
>> @@ -2211,6 +2212,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>   {
>>       int ret;
>>       BdrvTrackedRequest req;
>> +    int current_gen = bs->write_gen;
>>   
>>       if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
>>           bdrv_is_sg(bs)) {
>> @@ -2219,6 +2221,12 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>   
>>       tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
>>   
>> +    /* Wait until any previous flushes are completed */
>> +    while (bs->flush_started_gen != bs->flushed_gen) {
> Should this be an inequality, as in s/!=/</, in case several flushes can
> be started in parallel and where the later flush ends up finishing
> before the earlier flush?

flush_started_gen is always ahead of flushed_gen or is equal to it no 
matter how many requests we have in flight. using != allows us to ignore 
checking for unsigned overflow (which you also mention in this email).

>
>> +        qemu_co_queue_wait(&bs->flush_queue);
>> +    }
>> +    bs->flush_started_gen = current_gen;
>> +
>>       /* Write back all layers by calling one driver function */
>>       if (bs->drv->bdrv_co_flush) {
>>           ret = bs->drv->bdrv_co_flush(bs);
>> @@ -2239,6 +2247,11 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>           goto flush_parent;
>>       }
>>   
>> +    /* Check if we really need to flush anything */
>> +    if (bs->flushed_gen == current_gen) {
> Likewise, if you are tracking generations, should this be s/==/<=/ (am I
> getting the direction correct)?

Same here - '==' is so that we don't have to worry about unsigned overflow.

>
>> +++ b/include/block/block_int.h
>> @@ -420,6 +420,11 @@ struct BlockDriverState {
>>                            note this is a reference count */
>>       bool probed;
>>   
>> +    CoQueue flush_queue;            /* Serializing flush queue */
>> +    unsigned int write_gen;         /* Current data generation */
>> +    unsigned int flush_started_gen; /* Generation for which flush has started */
>> +    unsigned int flushed_gen;       /* Flushed write generation */
> Should these be 64-bit integers to avoid risk of overflow after just
> 2^32 flush attempts?
>

We don't have to care about unsigned overflow. It has a well defined 
behavior and we only check if generations are equal or not.