qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] block: ignore flush requests when storage is clean
@ 2016-06-24 15:06 Denis V. Lunev
  2016-06-24 15:06 ` [Qemu-devel] [PATCH 1/3] " Denis V. Lunev
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Denis V. Lunev @ 2016-06-24 15:06 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

Changes from v2:
- Better comments
- Rebased on latest master

Changes from v1:
- Flush requests that should be skipped will now wait for completion
  of any previous requests already in flight
- Fixed IDE and AHCI tests to dirty media for new flush behaviour
- Fixed a problem in IDE CMD_FLUSH_CACHE failure handling

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>

Evgeny Yakovlev (3):
  block: ignore flush requests when storage is clean
  ide: ignore retry_unit check for non-retry operations
  tests: in IDE and AHCI tests perform DMA write before flushing

 block.c                   |  1 +
 block/dirty-bitmap.c      |  3 +++
 block/io.c                | 19 +++++++++++++++++++
 hw/ide/core.c             |  3 ++-
 include/block/block_int.h |  2 ++
 tests/ahci-test.c         | 34 ++++++++++++++++++++++++++++++++--
 tests/ide-test.c          | 43 +++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 102 insertions(+), 3 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 1/3] block: ignore flush requests when storage is clean
  2016-06-24 15:06 [Qemu-devel] [PATCH 0/3] block: ignore flush requests when storage is clean Denis V. Lunev
@ 2016-06-24 15:06 ` Denis V. Lunev
  2016-06-24 15:31   ` Eric Blake
  2016-06-28 21:01   ` Paolo Bonzini
  2016-06-24 15:06 ` [Qemu-devel] [PATCH 2/3] ide: ignore retry_unit check for non-retry operations Denis V. Lunev
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Denis V. Lunev @ 2016-06-24 15:06 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a dirty flag in BlockDriverState which is set
in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
avoid unnesessary flushing when storage is clean.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 block.c                   |  1 +
 block/dirty-bitmap.c      |  3 +++
 block/io.c                | 19 +++++++++++++++++++
 include/block/block_int.h |  2 ++
 4 files changed, 25 insertions(+)

diff --git a/block.c b/block.c
index f4648e9..e36f148 100644
--- a/block.c
+++ b/block.c
@@ -2582,6 +2582,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
         ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
         bdrv_dirty_bitmap_truncate(bs);
         bdrv_parent_cb_resize(bs);
+        bs->dirty = true; /* file node sync is needed after truncate */
     }
     return ret;
 }
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 4902ca5..54e0413 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
         }
         hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
     }
+
+    /* Set global block driver dirty flag even if bitmap is disabled */
+    bs->dirty = true;
 }
 
 /**
diff --git a/block/io.c b/block/io.c
index 7cf3645..8078af2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2239,6 +2239,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
         goto flush_parent;
     }
 
+    /* Check if storage is actually dirty before flushing to disk */
+    if (!bs->dirty) {
+        /* Flush requests are appended to tracked request list in order so that
+         * most recent request is at the head of the list. Following code uses
+         * this ordering to wait for the most recent flush request to complete
+         * to ensure that requests return in order */
+        BdrvTrackedRequest *prev_req;
+        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
+            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
+                continue;
+            }
+
+            qemu_co_queue_wait(&prev_req->wait_queue);
+            break;
+        }
+        goto flush_parent;
+    }
+    bs->dirty = false;
+
     BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
     if (bs->drv->bdrv_co_flush_to_disk) {
         ret = bs->drv->bdrv_co_flush_to_disk(bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2057156..616058b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -418,6 +418,8 @@ struct BlockDriverState {
     int sg;        /* if true, the device is a /dev/sg* */
     int copy_on_read; /* if true, copy read backing sectors into image
                          note this is a reference count */
+
+    bool dirty;
     bool probed;
 
     BlockDriver *drv; /* NULL means no media */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 2/3] ide: ignore retry_unit check for non-retry operations
  2016-06-24 15:06 [Qemu-devel] [PATCH 0/3] block: ignore flush requests when storage is clean Denis V. Lunev
  2016-06-24 15:06 ` [Qemu-devel] [PATCH 1/3] " Denis V. Lunev
@ 2016-06-24 15:06 ` Denis V. Lunev
  2016-06-28 20:56   ` Paolo Bonzini
  2016-06-24 15:06 ` [Qemu-devel] [PATCH 3/3] tests: in IDE and AHCI tests perform DMA write before flushing Denis V. Lunev
  2016-06-24 15:22 ` [Qemu-devel] [PATCH 0/3] block: ignore flush requests when storage is clean Eric Blake
  3 siblings, 1 reply; 13+ messages in thread
From: Denis V. Lunev @ 2016-06-24 15:06 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

The following sequence of tests discovered a problem in IDE emulation:
1. Send DMA write to IDE device 0
2. Send CMD_FLUSH_CACHE to same IDE device which will be failed by block
layer using blkdebug script in tests/ide-test:test_retry_flush

When doing DMA request ide/core.c will set s->retry_unit to s->unit in
ide_start_dma. When dma completes ide_set_inactive sets retry_unit to -1.
After that ide_flush_cache runs and fails thanks to blkdebug.
ide_flush_cb calls ide_handle_rw_error which asserts that s->retry_unit
== s->unit. But s->retry_unit is still -1 after previous DMA completion
and flush does not use anything related to retry.

This patch restricts retry unit assertion only to ops that actually use
retry logic.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 hw/ide/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 029f6b9..17f884b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -779,7 +779,8 @@ int ide_handle_rw_error(IDEState *s, int error, int op)
     BlockErrorAction action = blk_get_error_action(s->blk, is_read, error);
 
     if (action == BLOCK_ERROR_ACTION_STOP) {
-        assert(s->bus->retry_unit == s->unit);
+        assert(!(IS_IDE_RETRY_DMA(op) || IS_IDE_RETRY_PIO(op))
+                || s->bus->retry_unit == s->unit);
         s->bus->error_status = op;
     } else if (action == BLOCK_ERROR_ACTION_REPORT) {
         block_acct_failed(blk_get_stats(s->blk), &s->acct);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 3/3] tests: in IDE and AHCI tests perform DMA write before flushing
  2016-06-24 15:06 [Qemu-devel] [PATCH 0/3] block: ignore flush requests when storage is clean Denis V. Lunev
  2016-06-24 15:06 ` [Qemu-devel] [PATCH 1/3] " Denis V. Lunev
  2016-06-24 15:06 ` [Qemu-devel] [PATCH 2/3] ide: ignore retry_unit check for non-retry operations Denis V. Lunev
@ 2016-06-24 15:06 ` Denis V. Lunev
  2016-06-24 15:22 ` [Qemu-devel] [PATCH 0/3] block: ignore flush requests when storage is clean Eric Blake
  3 siblings, 0 replies; 13+ messages in thread
From: Denis V. Lunev @ 2016-06-24 15:06 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

Due to previous change flushed on clean disks stopped generating
flush_to_disk events and IDE and AHCI tests that test flush commands began
to fail.

This change adds additional DMA writes to affected tests before sending
flush cache commands so that bdrv_flush actually generates flush_to_disk
event.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 tests/ahci-test.c | 34 ++++++++++++++++++++++++++++++++--
 tests/ide-test.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 57dc44c..d707714 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -1063,11 +1063,34 @@ static void test_dma_fragmented(void)
     g_free(tx);
 }
 
+/*
+ * Write sector 0 with random data to make AHCI storage dirty
+ * Needed for flush tests so that flushes actually go though the block layer
+ */
+static void make_dirty(AHCIQState* ahci, uint8_t port)
+{
+    uint64_t ptr;
+    unsigned bufsize = 512;
+
+    ptr = ahci_alloc(ahci, bufsize);
+    g_assert(ptr);
+
+    ahci_guest_io(ahci, port, CMD_WRITE_DMA, ptr, bufsize, 0);
+    ahci_free(ahci, ptr);
+}
+
 static void test_flush(void)
 {
     AHCIQState *ahci;
+    uint8_t port;
 
     ahci = ahci_boot_and_enable(NULL);
+
+    port = ahci_port_select(ahci);
+    ahci_port_clear(ahci, port);
+
+    make_dirty(ahci, port);
+
     ahci_test_flush(ahci);
     ahci_shutdown(ahci);
 }
@@ -1087,10 +1110,13 @@ static void test_flush_retry(void)
                                 debug_path,
                                 tmp_path, imgfmt);
 
-    /* Issue Flush Command and wait for error */
     port = ahci_port_select(ahci);
     ahci_port_clear(ahci, port);
 
+    /* Issue write so that flush actually goes to disk */
+    make_dirty(ahci, port);
+
+    /* Issue Flush Command and wait for error */
     cmd = ahci_guest_io_halt(ahci, port, CMD_FLUSH_CACHE, 0, 0, 0);
     ahci_guest_io_resume(ahci, cmd);
 
@@ -1343,9 +1369,13 @@ static void test_flush_migrate(void)
 
     set_context(src->parent);
 
-    /* Issue Flush Command */
     px = ahci_port_select(src);
     ahci_port_clear(src, px);
+
+    /* Dirty device so that flush reaches disk */
+    make_dirty(src, px);
+
+    /* Issue Flush Command */
     cmd = ahci_command_create(CMD_FLUSH_CACHE);
     ahci_command_commit(src, cmd, px);
     ahci_command_issue_async(src, cmd);
diff --git a/tests/ide-test.c b/tests/ide-test.c
index fed1b2e..8466d0f 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -499,6 +499,39 @@ static void test_identify(void)
     ide_test_quit();
 }
 
+/*
+ * Write sector 0 with random data to make IDE storage dirty
+ * Needed for flush tests so that flushes actually go though the block layer
+ */
+static void make_dirty(uint8_t device)
+{
+    uint8_t status;
+    size_t len = 512;
+    uintptr_t guest_buf;
+    void* buf;
+
+    guest_buf = guest_alloc(guest_malloc, len);
+    buf = g_malloc(len);
+    g_assert(guest_buf);
+    g_assert(buf);
+
+    memwrite(guest_buf, buf, len);
+
+    PrdtEntry prdt[] = {
+        {
+            .addr = cpu_to_le32(guest_buf),
+            .size = cpu_to_le32(len | PRDT_EOT),
+        },
+    };
+
+    status = send_dma_request(CMD_WRITE_DMA, 0, 1, prdt,
+                              ARRAY_SIZE(prdt), NULL);
+    g_assert_cmphex(status, ==, BM_STS_INTR);
+    assert_bit_clear(inb(IDE_BASE + reg_status), DF | ERR);
+
+    g_free(buf);
+}
+
 static void test_flush(void)
 {
     uint8_t data;
@@ -507,6 +540,11 @@ static void test_flush(void)
         "-drive file=blkdebug::%s,if=ide,cache=writeback,format=raw",
         tmp_path);
 
+    qtest_irq_intercept_in(global_qtest, "ioapic");
+
+    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
+    make_dirty(0);
+
     /* Delay the completion of the flush request until we explicitly do it */
     g_free(hmp("qemu-io ide0-hd0 \"break flush_to_os A\""));
 
@@ -549,6 +587,11 @@ static void test_retry_flush(const char *machine)
         "rerror=stop,werror=stop",
         debug_path, tmp_path);
 
+    qtest_irq_intercept_in(global_qtest, "ioapic");
+
+    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
+    make_dirty(0);
+
     /* FLUSH CACHE command on device 0*/
     outb(IDE_BASE + reg_device, 0);
     outb(IDE_BASE + reg_command, CMD_FLUSH_CACHE);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] block: ignore flush requests when storage is clean
  2016-06-24 15:06 [Qemu-devel] [PATCH 0/3] block: ignore flush requests when storage is clean Denis V. Lunev
                   ` (2 preceding siblings ...)
  2016-06-24 15:06 ` [Qemu-devel] [PATCH 3/3] tests: in IDE and AHCI tests perform DMA write before flushing Denis V. Lunev
@ 2016-06-24 15:22 ` Eric Blake
  3 siblings, 0 replies; 13+ messages in thread
From: Eric Blake @ 2016-06-24 15:22 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Evgeny Yakovlev, Max Reitz,
	Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 496 bytes --]

On 06/24/2016 09:06 AM, Denis V. Lunev wrote:
> Changes from v2:
> - Better comments
> - Rebased on latest master

Remember, 'git send-email -v3' will automatically put v3 in all your
subject lines, making it easier to see that this is an updated series.

Rebasing on Kevin's block branch may be more appropriate:

http://repo.or.cz/qemu/kevin.git/shortlog/refs/heads/block

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] block: ignore flush requests when storage is clean
  2016-06-24 15:06 ` [Qemu-devel] [PATCH 1/3] " Denis V. Lunev
@ 2016-06-24 15:31   ` Eric Blake
  2016-06-24 15:54     ` Evgeny Yakovlev
  2016-06-28 21:01   ` Paolo Bonzini
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Blake @ 2016-06-24 15:31 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Evgeny Yakovlev, Max Reitz,
	Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 1847 bytes --]

On 06/24/2016 09:06 AM, Denis V. Lunev wrote:
> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> 
> Some guests (win2008 server for example) do a lot of unnecessary
> flushing when underlying media has not changed. This adds additional
> overhead on host when calling fsync/fdatasync.
> 
> This change introduces a dirty flag in BlockDriverState which is set
> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
> avoid unnesessary flushing when storage is clean.

s/unnesessary/unnecessary/ (I pointed this out against v2
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg05817.html,
which makes me wonder if anything else was missed)

> 
> The problem with excessive flushing was found by a performance test
> which does parallel directory tree creation (from 2 processes).
> Results improved from 0.424 loops/sec to 0.432 loops/sec.
> Each loop creates 10^3 directories with 10 files in each.
> 
> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> ---

> +++ b/include/block/block_int.h
> @@ -418,6 +418,8 @@ struct BlockDriverState {
>      int sg;        /* if true, the device is a /dev/sg* */
>      int copy_on_read; /* if true, copy read backing sectors into image
>                           note this is a reference count */
> +
> +    bool dirty;
>      bool probed;
>  

Conflicts with the current state of Kevin's block branch (due to my
reordering and conversion of bool parameters); so you'll want to rebase.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] block: ignore flush requests when storage is clean
  2016-06-24 15:31   ` Eric Blake
@ 2016-06-24 15:54     ` Evgeny Yakovlev
  0 siblings, 0 replies; 13+ messages in thread
From: Evgeny Yakovlev @ 2016-06-24 15:54 UTC (permalink / raw)
  To: Eric Blake, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Max Reitz, Stefan Hajnoczi, John Snow


On 24.06.2016 18:31, Eric Blake wrote:
> On 06/24/2016 09:06 AM, Denis V. Lunev wrote:
>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>
>> Some guests (win2008 server for example) do a lot of unnecessary
>> flushing when underlying media has not changed. This adds additional
>> overhead on host when calling fsync/fdatasync.
>>
>> This change introduces a dirty flag in BlockDriverState which is set
>> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
>> avoid unnesessary flushing when storage is clean.
> s/unnesessary/unnecessary/ (I pointed this out against v2
> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg05817.html,
> which makes me wonder if anything else was missed)

Yeah, i fixed that but messed up committing a change in commit message. 
Will be fixed in rebased version.

>> The problem with excessive flushing was found by a performance test
>> which does parallel directory tree creation (from 2 processes).
>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>> Each loop creates 10^3 directories with 10 files in each.
>>
>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>> ---
>> +++ b/include/block/block_int.h
>> @@ -418,6 +418,8 @@ struct BlockDriverState {
>>       int sg;        /* if true, the device is a /dev/sg* */
>>       int copy_on_read; /* if true, copy read backing sectors into image
>>                            note this is a reference count */
>> +
>> +    bool dirty;
>>       bool probed;
>>   
> Conflicts with the current state of Kevin's block branch (due to my
> reordering and conversion of bool parameters); so you'll want to rebase.
>

Ok

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] ide: ignore retry_unit check for non-retry operations
  2016-06-24 15:06 ` [Qemu-devel] [PATCH 2/3] ide: ignore retry_unit check for non-retry operations Denis V. Lunev
@ 2016-06-28 20:56   ` Paolo Bonzini
  2016-06-29  8:35     ` Evgeny Yakovlev
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2016-06-28 20:56 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Evgeny Yakovlev, Max Reitz,
	Stefan Hajnoczi, John Snow



On 24/06/2016 17:06, Denis V. Lunev wrote:
> When doing DMA request ide/core.c will set s->retry_unit to s->unit in
> ide_start_dma. When dma completes ide_set_inactive sets retry_unit to -1.
> After that ide_flush_cache runs and fails thanks to blkdebug.
> ide_flush_cb calls ide_handle_rw_error which asserts that s->retry_unit
> == s->unit. But s->retry_unit is still -1 after previous DMA completion
> and flush does not use anything related to retry.

Wouldn't the assertion fail for a PIO read/write too?  Perhaps
retry_unit should be set to s->unit in ide_transfer_start too.

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] block: ignore flush requests when storage is clean
  2016-06-24 15:06 ` [Qemu-devel] [PATCH 1/3] " Denis V. Lunev
  2016-06-24 15:31   ` Eric Blake
@ 2016-06-28 21:01   ` Paolo Bonzini
  2016-06-29  7:36     ` Paolo Bonzini
  1 sibling, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2016-06-28 21:01 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Evgeny Yakovlev, Max Reitz,
	Stefan Hajnoczi, John Snow



On 24/06/2016 17:06, Denis V. Lunev wrote:
> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> 
> Some guests (win2008 server for example) do a lot of unnecessary
> flushing when underlying media has not changed. This adds additional
> overhead on host when calling fsync/fdatasync.
> 
> This change introduces a dirty flag in BlockDriverState which is set
> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
> avoid unnesessary flushing when storage is clean.
> 
> The problem with excessive flushing was found by a performance test
> which does parallel directory tree creation (from 2 processes).
> Results improved from 0.424 loops/sec to 0.432 loops/sec.
> Each loop creates 10^3 directories with 10 files in each.
> 
> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> ---
>  block.c                   |  1 +
>  block/dirty-bitmap.c      |  3 +++
>  block/io.c                | 19 +++++++++++++++++++
>  include/block/block_int.h |  2 ++
>  4 files changed, 25 insertions(+)
> 
> diff --git a/block.c b/block.c
> index f4648e9..e36f148 100644
> --- a/block.c
> +++ b/block.c
> @@ -2582,6 +2582,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>          ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>          bdrv_dirty_bitmap_truncate(bs);
>          bdrv_parent_cb_resize(bs);
> +        bs->dirty = true; /* file node sync is needed after truncate */
>      }
>      return ret;
>  }
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 4902ca5..54e0413 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>          }
>          hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>      }
> +
> +    /* Set global block driver dirty flag even if bitmap is disabled */
> +    bs->dirty = true;
>  }
>  
>  /**
> diff --git a/block/io.c b/block/io.c
> index 7cf3645..8078af2 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2239,6 +2239,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>          goto flush_parent;
>      }
>  
> +    /* Check if storage is actually dirty before flushing to disk */
> +    if (!bs->dirty) {
> +        /* Flush requests are appended to tracked request list in order so that
> +         * most recent request is at the head of the list. Following code uses
> +         * this ordering to wait for the most recent flush request to complete
> +         * to ensure that requests return in order */
> +        BdrvTrackedRequest *prev_req;
> +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
> +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
> +                continue;
> +            }
> +
> +            qemu_co_queue_wait(&prev_req->wait_queue);
> +            break;
> +        }
> +        goto flush_parent;

Can you just have a CoQueue specific to flushes, where a completing
flush does a restart_all on the CoQueue?

Flushes are never serialising, so there's no reason for them to be in
tracked_requests (I posted patches a while ago that instead use a simple
atomic counter, but they will only be in 2.8).

Paolo

> +    }
> +    bs->dirty = false;
> +
>      BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
>      if (bs->drv->bdrv_co_flush_to_disk) {
>          ret = bs->drv->bdrv_co_flush_to_disk(bs);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 2057156..616058b 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -418,6 +418,8 @@ struct BlockDriverState {
>      int sg;        /* if true, the device is a /dev/sg* */
>      int copy_on_read; /* if true, copy read backing sectors into image
>                           note this is a reference count */
> +
> +    bool dirty;
>      bool probed;
>  
>      BlockDriver *drv; /* NULL means no media */
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] block: ignore flush requests when storage is clean
  2016-06-28 21:01   ` Paolo Bonzini
@ 2016-06-29  7:36     ` Paolo Bonzini
  2016-06-29  8:32       ` Denis V. Lunev
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2016-06-29  7:36 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Evgeny Yakovlev, Max Reitz, Stefan Hajnoczi,
	Fam Zheng



On 28/06/2016 23:01, Paolo Bonzini wrote:
> 
> 
> On 24/06/2016 17:06, Denis V. Lunev wrote:
>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>
>> Some guests (win2008 server for example) do a lot of unnecessary
>> flushing when underlying media has not changed. This adds additional
>> overhead on host when calling fsync/fdatasync.
>>
>> This change introduces a dirty flag in BlockDriverState which is set
>> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
>> avoid unnesessary flushing when storage is clean.
>>
>> The problem with excessive flushing was found by a performance test
>> which does parallel directory tree creation (from 2 processes).
>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>> Each loop creates 10^3 directories with 10 files in each.
>>
>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>> ---
>>  block.c                   |  1 +
>>  block/dirty-bitmap.c      |  3 +++
>>  block/io.c                | 19 +++++++++++++++++++
>>  include/block/block_int.h |  2 ++
>>  4 files changed, 25 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index f4648e9..e36f148 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2582,6 +2582,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>>          ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>>          bdrv_dirty_bitmap_truncate(bs);
>>          bdrv_parent_cb_resize(bs);
>> +        bs->dirty = true; /* file node sync is needed after truncate */
>>      }
>>      return ret;
>>  }
>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>> index 4902ca5..54e0413 100644
>> --- a/block/dirty-bitmap.c
>> +++ b/block/dirty-bitmap.c
>> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>>          }
>>          hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>>      }
>> +
>> +    /* Set global block driver dirty flag even if bitmap is disabled */
>> +    bs->dirty = true;
>>  }
>>  
>>  /**
>> diff --git a/block/io.c b/block/io.c
>> index 7cf3645..8078af2 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -2239,6 +2239,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>          goto flush_parent;
>>      }
>>  
>> +    /* Check if storage is actually dirty before flushing to disk */
>> +    if (!bs->dirty) {
>> +        /* Flush requests are appended to tracked request list in order so that
>> +         * most recent request is at the head of the list. Following code uses
>> +         * this ordering to wait for the most recent flush request to complete
>> +         * to ensure that requests return in order */
>> +        BdrvTrackedRequest *prev_req;
>> +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
>> +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
>> +                continue;
>> +            }
>> +
>> +            qemu_co_queue_wait(&prev_req->wait_queue);
>> +            break;
>> +        }
>> +        goto flush_parent;
> 
> Can you just have a CoQueue specific to flushes, where a completing
> flush does a restart_all on the CoQueue?

Something like this:

    current_gen = bs->write_gen;
    if (bs->flush_started_gen >= current_gen) {
        while (bs->flushed_gen < current_gen) {
            qemu_co_queue_wait(&bs->flush_queue);
        }
        return;
    }

    bs->flush_started_gen = current_gen;
    ...
    if (current_gen < bs->flushed_gen) {
        bs->flushed_gen = current_gen;
        qemu_co_queue_restart_all(&bs->flush_queue);
    }

Paolo

> Flushes are never serialising, so there's no reason for them to be in
> tracked_requests (I posted patches a while ago that instead use a simple
> atomic counter, but they will only be in 2.8).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] block: ignore flush requests when storage is clean
  2016-06-29  7:36     ` Paolo Bonzini
@ 2016-06-29  8:32       ` Denis V. Lunev
  0 siblings, 0 replies; 13+ messages in thread
From: Denis V. Lunev @ 2016-06-29  8:32 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-block, qemu-devel
  Cc: Kevin Wolf, Evgeny Yakovlev, Max Reitz, Stefan Hajnoczi,
	Fam Zheng

On 06/29/2016 10:36 AM, Paolo Bonzini wrote:
>
> On 28/06/2016 23:01, Paolo Bonzini wrote:
>>
>> On 24/06/2016 17:06, Denis V. Lunev wrote:
>>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>>
>>> Some guests (win2008 server for example) do a lot of unnecessary
>>> flushing when underlying media has not changed. This adds additional
>>> overhead on host when calling fsync/fdatasync.
>>>
>>> This change introduces a dirty flag in BlockDriverState which is set
>>> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
>>> avoid unnesessary flushing when storage is clean.
>>>
>>> The problem with excessive flushing was found by a performance test
>>> which does parallel directory tree creation (from 2 processes).
>>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>>> Each loop creates 10^3 directories with 10 files in each.
>>>
>>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Kevin Wolf <kwolf@redhat.com>
>>> CC: Max Reitz <mreitz@redhat.com>
>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>> CC: Fam Zheng <famz@redhat.com>
>>> CC: John Snow <jsnow@redhat.com>
>>> ---
>>>   block.c                   |  1 +
>>>   block/dirty-bitmap.c      |  3 +++
>>>   block/io.c                | 19 +++++++++++++++++++
>>>   include/block/block_int.h |  2 ++
>>>   4 files changed, 25 insertions(+)
>>>
>>> diff --git a/block.c b/block.c
>>> index f4648e9..e36f148 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -2582,6 +2582,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>>>           ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>>>           bdrv_dirty_bitmap_truncate(bs);
>>>           bdrv_parent_cb_resize(bs);
>>> +        bs->dirty = true; /* file node sync is needed after truncate */
>>>       }
>>>       return ret;
>>>   }
>>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>>> index 4902ca5..54e0413 100644
>>> --- a/block/dirty-bitmap.c
>>> +++ b/block/dirty-bitmap.c
>>> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>>>           }
>>>           hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>>>       }
>>> +
>>> +    /* Set global block driver dirty flag even if bitmap is disabled */
>>> +    bs->dirty = true;
>>>   }
>>>   
>>>   /**
>>> diff --git a/block/io.c b/block/io.c
>>> index 7cf3645..8078af2 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -2239,6 +2239,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>>           goto flush_parent;
>>>       }
>>>   
>>> +    /* Check if storage is actually dirty before flushing to disk */
>>> +    if (!bs->dirty) {
>>> +        /* Flush requests are appended to tracked request list in order so that
>>> +         * most recent request is at the head of the list. Following code uses
>>> +         * this ordering to wait for the most recent flush request to complete
>>> +         * to ensure that requests return in order */
>>> +        BdrvTrackedRequest *prev_req;
>>> +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
>>> +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
>>> +                continue;
>>> +            }
>>> +
>>> +            qemu_co_queue_wait(&prev_req->wait_queue);
>>> +            break;
>>> +        }
>>> +        goto flush_parent;
>> Can you just have a CoQueue specific to flushes, where a completing
>> flush does a restart_all on the CoQueue?
> Something like this:
>
>      current_gen = bs->write_gen;
>      if (bs->flush_started_gen >= current_gen) {
>          while (bs->flushed_gen < current_gen) {
>              qemu_co_queue_wait(&bs->flush_queue);
>          }
>          return;
>      }
>
>      bs->flush_started_gen = current_gen;
>      ...
>      if (current_gen < bs->flushed_gen) {
>          bs->flushed_gen = current_gen;
>          qemu_co_queue_restart_all(&bs->flush_queue);
>      }
>
> Paolo
>
I have had exactly this inn mind originally but current queue
with tracked requests is also useful. If it is going to be removed
in 2.8, generation approach would also work.

Thank you,
     Den

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] ide: ignore retry_unit check for non-retry operations
  2016-06-28 20:56   ` Paolo Bonzini
@ 2016-06-29  8:35     ` Evgeny Yakovlev
  2016-06-29 10:41       ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Evgeny Yakovlev @ 2016-06-29  8:35 UTC (permalink / raw)
  To: Paolo Bonzini, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Max Reitz, Stefan Hajnoczi, John Snow



On 28.06.2016 23:56, Paolo Bonzini wrote:
>
> On 24/06/2016 17:06, Denis V. Lunev wrote:
>> When doing DMA request ide/core.c will set s->retry_unit to s->unit in
>> ide_start_dma. When dma completes ide_set_inactive sets retry_unit to -1.
>> After that ide_flush_cache runs and fails thanks to blkdebug.
>> ide_flush_cb calls ide_handle_rw_error which asserts that s->retry_unit
>> == s->unit. But s->retry_unit is still -1 after previous DMA completion
>> and flush does not use anything related to retry.
> Wouldn't the assertion fail for a PIO read/write too?  Perhaps
> retry_unit should be set to s->unit in ide_transfer_start too.

If PIO follows DMA and fails then yes, it looks like it will trigger an 
assert. I am not sure about setting retry_unit in ide_transfer_start. It 
looks like currently only DMA I/O entries touch retry_unit at all. Does 
that mean that PIO, flush, etc do not support retries by design and we 
need to add more exceptions to assert check or is it a real bug in how 
retries are initialized?

>
> Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] ide: ignore retry_unit check for non-retry operations
  2016-06-29  8:35     ` Evgeny Yakovlev
@ 2016-06-29 10:41       ` Paolo Bonzini
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2016-06-29 10:41 UTC (permalink / raw)
  To: Evgeny Yakovlev, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi, Max Reitz



On 29/06/2016 10:35, Evgeny Yakovlev wrote:
>>>
>> Wouldn't the assertion fail for a PIO read/write too?  Perhaps
>> retry_unit should be set to s->unit in ide_transfer_start too.
> 
> If PIO follows DMA and fails then yes, it looks like it will trigger an
> assert. I am not sure about setting retry_unit in ide_transfer_start. It
> looks like currently only DMA I/O entries touch retry_unit at all. Does
> that mean that PIO, flush, etc do not support retries by design and we
> need to add more exceptions to assert check or is it a real bug in how
> retries are initialized?

Both PIO and flush do support retries, so I think it is a real bug.

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-06-29 10:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-24 15:06 [Qemu-devel] [PATCH 0/3] block: ignore flush requests when storage is clean Denis V. Lunev
2016-06-24 15:06 ` [Qemu-devel] [PATCH 1/3] " Denis V. Lunev
2016-06-24 15:31   ` Eric Blake
2016-06-24 15:54     ` Evgeny Yakovlev
2016-06-28 21:01   ` Paolo Bonzini
2016-06-29  7:36     ` Paolo Bonzini
2016-06-29  8:32       ` Denis V. Lunev
2016-06-24 15:06 ` [Qemu-devel] [PATCH 2/3] ide: ignore retry_unit check for non-retry operations Denis V. Lunev
2016-06-28 20:56   ` Paolo Bonzini
2016-06-29  8:35     ` Evgeny Yakovlev
2016-06-29 10:41       ` Paolo Bonzini
2016-06-24 15:06 ` [Qemu-devel] [PATCH 3/3] tests: in IDE and AHCI tests perform DMA write before flushing Denis V. Lunev
2016-06-24 15:22 ` [Qemu-devel] [PATCH 0/3] block: ignore flush requests when storage is clean Eric Blake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).