From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51241) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bLh3V-0000Ti-CK for qemu-devel@nongnu.org; Fri, 08 Jul 2016 21:27:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bLh3S-0003lS-56 for qemu-devel@nongnu.org; Fri, 08 Jul 2016 21:27:41 -0400 References: <1467643124-29778-1-git-send-email-den@openvz.org> <1467643124-29778-5-git-send-email-den@openvz.org> <577EE010.7090106@redhat.com> From: Evgeny Yakovlev Message-ID: <577FC46F.50307@virtuozzo.com> Date: Fri, 8 Jul 2016 18:19:11 +0300 MIME-Version: 1.0 In-Reply-To: <577EE010.7090106@redhat.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake , "Denis V. Lunev" , qemu-block@nongnu.org, qemu-devel@nongnu.org Cc: Kevin Wolf , Fam Zheng , Max Reitz , Stefan Hajnoczi , John Snow On 08.07.2016 02:04, Eric Blake wrote: > On 07/04/2016 08:38 AM, Denis V. Lunev wrote: >> From: Evgeny Yakovlev >> >> Some guests (win2008 server for example) do a lot of unnecessary >> flushing when underlying media has not changed. This adds additional >> overhead on host when calling fsync/fdatasync. >> >> This change introduces a write generation scheme in BlockDriverState. >> Current write generation is checked against last flushed generation to >> avoid unnessesary flushes. >> >> The problem with excessive flushing was found by a performance test >> which does parallel directory tree creation (from 2 processes). >> Results improved from 0.424 loops/sec to 0.432 loops/sec. >> Each loop creates 10^3 directories with 10 files in each. >> >> +++ b/block/io.c >> @@ -1294,6 +1294,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs, >> } >> bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE); >> >> + ++bs->write_gen; > Why pre-increment? Most code uses post-increment when done as a > statement in isolation. Just a habit of mine, from C++ days, where you can never be sure if someone overloaded post-increment operator and it will then generate a temporary object because post-increment is defined to return previous value. Now it's just a muscle memory :) > >> bdrv_set_dirty(bs, start_sector, end_sector - start_sector); >> >> if (bs->wr_highest_offset < offset + bytes) { >> @@ -2211,6 +2212,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) >> { >> int ret; >> BdrvTrackedRequest req; >> + int current_gen = bs->write_gen; >> >> if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) || >> bdrv_is_sg(bs)) { >> @@ -2219,6 +2221,12 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) >> >> tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH); >> >> + /* Wait until any previous flushes are completed */ >> + while (bs->flush_started_gen != bs->flushed_gen) { > Should this be an inequality, as in s/!=/ be started in parallel and where the later flush ends up finishing > before the earlier flush? flush_started_gen is always ahead of flushed_gen or is equal to it no matter how many requests we have in flight. using != allows us to ignore checking for unsigned overflow (which you also mention in this email). > >> + qemu_co_queue_wait(&bs->flush_queue); >> + } >> + bs->flush_started_gen = current_gen; >> + >> /* Write back all layers by calling one driver function */ >> if (bs->drv->bdrv_co_flush) { >> ret = bs->drv->bdrv_co_flush(bs); >> @@ -2239,6 +2247,11 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) >> goto flush_parent; >> } >> >> + /* Check if we really need to flush anything */ >> + if (bs->flushed_gen == current_gen) { > Likewise, if you are tracking generations, should this be s/==/<=/ (am I > getting the direction correct)? Same here - '==' is so that we don't have to worry about unsigned overflow. > >> +++ b/include/block/block_int.h >> @@ -420,6 +420,11 @@ struct BlockDriverState { >> note this is a reference count */ >> bool probed; >> >> + CoQueue flush_queue; /* Serializing flush queue */ >> + unsigned int write_gen; /* Current data generation */ >> + unsigned int flush_started_gen; /* Generation for which flush has started */ >> + unsigned int flushed_gen; /* Flushed write generation */ > Should these be 64-bit integers to avoid risk of overflow after just > 2^32 flush attempts? > We don't have to care about unsigned overflow. It has a well defined behavior and we only check if generations are equal or not.