References: <1466780802-30424-1-git-send-email-den@openvz.org>
 <1466780802-30424-2-git-send-email-den@openvz.org>
 <94839058-7044-cc98-45f5-170ac8906690@redhat.com>
From: "Denis V. Lunev"
Message-ID: <577387A0.9020006@openvz.org>
Date: Wed, 29 Jun 2016 11:32:32 +0300
In-Reply-To: <94839058-7044-cc98-45f5-170ac8906690@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/3] block: ignore flush requests when storage is clean
To: Paolo Bonzini, qemu-block@nongnu.org, qemu-devel@nongnu.org
Cc: Kevin Wolf, Evgeny Yakovlev, Max Reitz, Stefan Hajnoczi, Fam Zheng

On 06/29/2016 10:36 AM, Paolo Bonzini wrote:
>
> On 28/06/2016 23:01, Paolo Bonzini wrote:
>>
>> On 24/06/2016 17:06, Denis V. Lunev wrote:
>>> From: Evgeny Yakovlev
>>>
>>> Some guests (win2008 server, for example) do a lot of unnecessary
>>> flushing when the underlying media has not changed. This adds
>>> additional overhead on the host when calling fsync/fdatasync.
>>>
>>> This change introduces a dirty flag in BlockDriverState which is set
>>> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
>>> avoid unnecessary flushing when the storage is clean.
>>>
>>> The problem with excessive flushing was found by a performance test
>>> which does parallel directory tree creation (from 2 processes).
>>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>>> Each loop creates 10^3 directories with 10 files in each.
>>>
>>> Signed-off-by: Evgeny Yakovlev
>>> Signed-off-by: Denis V. Lunev
>>> CC: Kevin Wolf
>>> CC: Max Reitz
>>> CC: Stefan Hajnoczi
>>> CC: Fam Zheng
>>> CC: John Snow
>>> ---
>>>  block.c                   |  1 +
>>>  block/dirty-bitmap.c      |  3 +++
>>>  block/io.c                | 19 +++++++++++++++++++
>>>  include/block/block_int.h |  2 ++
>>>  4 files changed, 25 insertions(+)
>>>
>>> diff --git a/block.c b/block.c
>>> index f4648e9..e36f148 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -2582,6 +2582,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>>>          ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>>>          bdrv_dirty_bitmap_truncate(bs);
>>>          bdrv_parent_cb_resize(bs);
>>> +        bs->dirty = true; /* file node sync is needed after truncate */
>>>      }
>>>      return ret;
>>>  }
>>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>>> index 4902ca5..54e0413 100644
>>> --- a/block/dirty-bitmap.c
>>> +++ b/block/dirty-bitmap.c
>>> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>>>          }
>>>          hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>>>      }
>>> +
>>> +    /* Set global block driver dirty flag even if bitmap is disabled */
>>> +    bs->dirty = true;
>>>  }
>>>
>>>  /**
>>> diff --git a/block/io.c b/block/io.c
>>> index 7cf3645..8078af2 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -2239,6 +2239,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>>          goto flush_parent;
>>>      }
>>>
>>> +    /* Check if storage is actually dirty before flushing to disk */
>>> +    if (!bs->dirty) {
>>> +        /* Flush requests are appended to the tracked request list in
>>> +         * order, so the most recent request is at the head of the list.
>>> +         * The following code uses this ordering to wait for the most
>>> +         * recent flush request to complete, to ensure that requests
>>> +         * return in order. */
>>> +        BdrvTrackedRequest *prev_req;
>>> +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
>>> +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
>>> +                continue;
>>> +            }
>>> +
>>> +            qemu_co_queue_wait(&prev_req->wait_queue);
>>> +            break;
>>> +        }
>>> +        goto flush_parent;
>>
>> Can you just have a CoQueue specific to flushes, where a completing
>> flush does a restart_all on the CoQueue?
>
> Something like this:
>
>     current_gen = bs->write_gen;
>     if (bs->flush_started_gen >= current_gen) {
>         while (bs->flushed_gen < current_gen) {
>             qemu_co_queue_wait(&bs->flush_queue);
>         }
>         return;
>     }
>
>     bs->flush_started_gen = current_gen;
>     ...
>     if (current_gen > bs->flushed_gen) {
>         bs->flushed_gen = current_gen;
>         qemu_co_queue_restart_all(&bs->flush_queue);
>     }
>
> Paolo
>
I have had exactly this in mind originally, but the current queue with
tracked requests is also useful. If it is going to be removed in 2.8,
the generation approach would also work.

Thank you,
    Den