qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance
@ 2014-12-30  9:28 Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 01/19] iotests, parallels: quote TEST_IMG in 076 test to be path-safe Denis V. Lunev
                   ` (9 more replies)
  0 siblings, 10 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30  9:28 UTC (permalink / raw)
  Cc: Kevin Wolf, Roman Kagan, Jeff Cody, qemu-devel, Stefan Hajnoczi,
	Denis V. Lunev

This patchset provides an ability to create of/write to Parallels
images and some testing of the new code. Writes are not optimized
now at all, we just modify catalog_bitmap and write those changes
to the image itself at once. This will be improved in next steps.

This patchset consists of not problematic part of the previous
patchset aka
   [PATCH v4 0/16] parallels format support improvements
and new write/create code but it does not contradict questionable code
with XML handling there.

Kevin, I would appreciate if you will quick look into proof-of-concept
patch aka
   [RFC PATCH 1/1] block/parallels: new concept for DiskDescriptor.xml
to validate that approach for DiskDescriptor.xml problem.

Changes from v1:
- patches 13-19 added, which boosts performance from 800 KiB/sec to
  near native performance

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Roman Kagan <rkagan@parallels.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>

P.S. Should I add myself to MAINTAINERS of this driver?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 01/19] iotests, parallels: quote TEST_IMG in 076 test to be path-safe
  2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
@ 2014-12-30  9:28 ` Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 02/19] block/parallels: rename parallels_header to ParallelsHeader Denis V. Lunev
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30  9:28 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

suggested by Jeff Cody

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Roman Kagan <rkagan@parallels.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/076 | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index ed2be35..0139976 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -49,31 +49,31 @@ nb_sectors_offset=$((0x24))
 echo
 echo "== Read from a valid v1 image =="
 _use_sample_img parallels-v1.bz2
-{ $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0x11 0 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== Negative catalog size =="
 _use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$catalog_entries_offset" "\xff\xff\xff\xff"
-{ $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read 0 512" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== Overflow in catalog allocation =="
 _use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$nb_sectors_offset" "\xff\xff\xff\xff"
 poke_file "$TEST_IMG" "$catalog_entries_offset" "\x01\x00\x00\x40"
-{ $QEMU_IO -c "read 64M 64M" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read 64M 64M" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== Zero sectors per track =="
 _use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$tracks_offset" "\x00\x00\x00\x00"
-{ $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read 0 512" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== Read from a valid v2 image =="
 _use_sample_img parallels-v2.bz2
-{ $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0x11 0 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 # success, all done
 echo "*** done"
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 02/19] block/parallels: rename parallels_header to ParallelsHeader
  2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 01/19] iotests, parallels: quote TEST_IMG in 076 test to be path-safe Denis V. Lunev
@ 2014-12-30  9:28 ` Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 03/19] block/parallels: switch to bdrv_read Denis V. Lunev
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30  9:28 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

this follows QEMU coding convention

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Roman Kagan <rkagan@parallels.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 4f9cd8d..dca0df6 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -35,7 +35,7 @@
 #define HEADER_SIZE 64
 
 // always little-endian
-struct parallels_header {
+typedef struct ParallelsHeader {
     char magic[16]; // "WithoutFreeSpace"
     uint32_t version;
     uint32_t heads;
@@ -46,7 +46,7 @@ struct parallels_header {
     uint32_t inuse;
     uint32_t data_off;
     char padding[12];
-} QEMU_PACKED;
+} QEMU_PACKED ParallelsHeader;
 
 typedef struct BDRVParallelsState {
     CoMutex lock;
@@ -61,7 +61,7 @@ typedef struct BDRVParallelsState {
 
 static int parallels_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
-    const struct parallels_header *ph = (const void *)buf;
+    const ParallelsHeader *ph = (const void *)buf;
 
     if (buf_size < HEADER_SIZE)
         return 0;
@@ -79,7 +79,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
 {
     BDRVParallelsState *s = bs->opaque;
     int i;
-    struct parallels_header ph;
+    ParallelsHeader ph;
     int ret;
 
     bs->read_only = 1; // no write support yet
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 03/19] block/parallels: switch to bdrv_read
  2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 01/19] iotests, parallels: quote TEST_IMG in 076 test to be path-safe Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 02/19] block/parallels: rename parallels_header to ParallelsHeader Denis V. Lunev
@ 2014-12-30  9:28 ` Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 04/19] block/parallels: read up to cluster end in one go Denis V. Lunev
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30  9:28 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi,
	Roman Kagan

From: Roman Kagan <rkagan@parallels.com>

Switch the .bdrv_read method implementation from using bdrv_pread() to
bdrv_read() on the underlying file, since the latter is subject to i/o
throttling while the former is not.

Besides, since bdrv_read() operates in sectors rather than bytes, adjust
the helper functions to do so too.

Signed-off-by: Roman Kagan <rkagan@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index dca0df6..baefd3e 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -146,9 +146,8 @@ fail:
     return ret;
 }
 
-static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
+static int64_t seek_to_sector(BDRVParallelsState *s, int64_t sector_num)
 {
-    BDRVParallelsState *s = bs->opaque;
     uint32_t index, offset;
 
     index = sector_num / s->tracks;
@@ -157,24 +156,27 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
     /* not allocated */
     if ((index >= s->catalog_size) || (s->catalog_bitmap[index] == 0))
         return -1;
-    return
-        ((uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset) * 512;
+    return (uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset;
 }
 
 static int parallels_read(BlockDriverState *bs, int64_t sector_num,
                     uint8_t *buf, int nb_sectors)
 {
+    BDRVParallelsState *s = bs->opaque;
+
     while (nb_sectors > 0) {
-        int64_t position = seek_to_sector(bs, sector_num);
+        int64_t position = seek_to_sector(s, sector_num);
         if (position >= 0) {
-            if (bdrv_pread(bs->file, position, buf, 512) != 512)
-                return -1;
+            int ret = bdrv_read(bs->file, position, buf, 1);
+            if (ret < 0) {
+                return ret;
+            }
         } else {
-            memset(buf, 0, 512);
+            memset(buf, 0, BDRV_SECTOR_SIZE);
         }
         nb_sectors--;
         sector_num++;
-        buf += 512;
+        buf += BDRV_SECTOR_SIZE;
     }
     return 0;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 04/19] block/parallels: read up to cluster end in one go
  2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
                   ` (2 preceding siblings ...)
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 03/19] block/parallels: switch to bdrv_read Denis V. Lunev
@ 2014-12-30  9:28 ` Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 05/19] block/parallels: add get_block_status Denis V. Lunev
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30  9:28 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi,
	Roman Kagan

From: Roman Kagan <rkagan@parallels.com>

Teach parallels_read() to do reads in coarser granularity than just a
single sector: if requested, read up to the cluster end in one go.

Signed-off-by: Roman Kagan <rkagan@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index baefd3e..8770c82 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -159,6 +159,13 @@ static int64_t seek_to_sector(BDRVParallelsState *s, int64_t sector_num)
     return (uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset;
 }
 
+static int cluster_remainder(BDRVParallelsState *s, int64_t sector_num,
+        int nb_sectors)
+{
+    int ret = s->tracks - sector_num % s->tracks;
+    return MIN(nb_sectors, ret);
+}
+
 static int parallels_read(BlockDriverState *bs, int64_t sector_num,
                     uint8_t *buf, int nb_sectors)
 {
@@ -166,17 +173,18 @@ static int parallels_read(BlockDriverState *bs, int64_t sector_num,
 
     while (nb_sectors > 0) {
         int64_t position = seek_to_sector(s, sector_num);
+        int n = cluster_remainder(s, sector_num, nb_sectors);
         if (position >= 0) {
-            int ret = bdrv_read(bs->file, position, buf, 1);
+            int ret = bdrv_read(bs->file, position, buf, n);
             if (ret < 0) {
                 return ret;
             }
         } else {
-            memset(buf, 0, BDRV_SECTOR_SIZE);
+            memset(buf, 0, n << BDRV_SECTOR_BITS);
         }
-        nb_sectors--;
-        sector_num++;
-        buf += BDRV_SECTOR_SIZE;
+        nb_sectors -= n;
+        sector_num += n;
+        buf += n << BDRV_SECTOR_BITS;
     }
     return 0;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 05/19] block/parallels: add get_block_status
  2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
                   ` (3 preceding siblings ...)
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 04/19] block/parallels: read up to cluster end in one go Denis V. Lunev
@ 2014-12-30  9:28 ` Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 06/19] block/parallels: provide _co_readv routine for parallels format driver Denis V. Lunev
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30  9:28 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi,
	Roman Kagan

From: Roman Kagan <rkagan@parallels.com>

Implement VFS method for get_block_status to Parallels format driver.

Signed-off-by: Roman Kagan <rkagan@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 8770c82..b469984 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -166,6 +166,26 @@ static int cluster_remainder(BDRVParallelsState *s, int64_t sector_num,
     return MIN(nb_sectors, ret);
 }
 
+static int64_t coroutine_fn parallels_co_get_block_status(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, int *pnum)
+{
+    BDRVParallelsState *s = bs->opaque;
+    int64_t offset;
+
+    qemu_co_mutex_lock(&s->lock);
+    offset = seek_to_sector(s, sector_num);
+    qemu_co_mutex_unlock(&s->lock);
+
+    *pnum = cluster_remainder(s, sector_num, nb_sectors);
+
+    if (offset < 0) {
+        return 0;
+    }
+
+    return (offset << BDRV_SECTOR_BITS) |
+        BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+}
+
 static int parallels_read(BlockDriverState *bs, int64_t sector_num,
                     uint8_t *buf, int nb_sectors)
 {
@@ -213,6 +233,7 @@ static BlockDriver bdrv_parallels = {
     .bdrv_open		= parallels_open,
     .bdrv_read          = parallels_co_read,
     .bdrv_close		= parallels_close,
+    .bdrv_co_get_block_status = parallels_co_get_block_status,
 };
 
 static void bdrv_parallels_init(void)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 06/19] block/parallels: provide _co_readv routine for parallels format driver
  2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
                   ` (4 preceding siblings ...)
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 05/19] block/parallels: add get_block_status Denis V. Lunev
@ 2014-12-30  9:28 ` Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 07/19] block/parallels: replace magic constants 4, 64 with proper sizeofs Denis V. Lunev
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30  9:28 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

Main approach is taken from qcow2_co_readv.

The patch drops coroutine lock for the duration of IO operation and
peforms normal scatter-gather IO using standard QEMU backend.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Roman Kagan <rkagan@parallels.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 46 +++++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index b469984..64b169b 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -186,37 +186,45 @@ static int64_t coroutine_fn parallels_co_get_block_status(BlockDriverState *bs,
         BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }
 
-static int parallels_read(BlockDriverState *bs, int64_t sector_num,
-                    uint8_t *buf, int nb_sectors)
+static coroutine_fn int parallels_co_readv(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
 {
     BDRVParallelsState *s = bs->opaque;
+    uint64_t bytes_done = 0;
+    QEMUIOVector hd_qiov;
+    int ret = 0;
 
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+
+    qemu_co_mutex_lock(&s->lock);
     while (nb_sectors > 0) {
         int64_t position = seek_to_sector(s, sector_num);
         int n = cluster_remainder(s, sector_num, nb_sectors);
-        if (position >= 0) {
-            int ret = bdrv_read(bs->file, position, buf, n);
+        int nbytes = n << BDRV_SECTOR_BITS;
+
+        if (position < 0) {
+            qemu_iovec_memset(qiov, bytes_done, 0, nbytes);
+        } else {
+            qemu_iovec_reset(&hd_qiov);
+            qemu_iovec_concat(&hd_qiov, qiov, bytes_done, nbytes);
+
+            qemu_co_mutex_unlock(&s->lock);
+            ret = bdrv_co_readv(bs->file, position, n, &hd_qiov);
+            qemu_co_mutex_lock(&s->lock);
+
             if (ret < 0) {
-                return ret;
+                goto fail;
             }
-        } else {
-            memset(buf, 0, n << BDRV_SECTOR_BITS);
         }
+
         nb_sectors -= n;
         sector_num += n;
-        buf += n << BDRV_SECTOR_BITS;
+        bytes_done += nbytes;
     }
-    return 0;
-}
-
-static coroutine_fn int parallels_co_read(BlockDriverState *bs, int64_t sector_num,
-                                          uint8_t *buf, int nb_sectors)
-{
-    int ret;
-    BDRVParallelsState *s = bs->opaque;
-    qemu_co_mutex_lock(&s->lock);
-    ret = parallels_read(bs, sector_num, buf, nb_sectors);
     qemu_co_mutex_unlock(&s->lock);
+
+fail:
+    qemu_iovec_destroy(&hd_qiov);
     return ret;
 }
 
@@ -231,9 +239,9 @@ static BlockDriver bdrv_parallels = {
     .instance_size	= sizeof(BDRVParallelsState),
     .bdrv_probe		= parallels_probe,
     .bdrv_open		= parallels_open,
-    .bdrv_read          = parallels_co_read,
     .bdrv_close		= parallels_close,
     .bdrv_co_get_block_status = parallels_co_get_block_status,
+    .bdrv_co_readv  = parallels_co_readv,
 };
 
 static void bdrv_parallels_init(void)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 07/19] block/parallels: replace magic constants 4, 64 with proper sizeofs
  2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
                   ` (5 preceding siblings ...)
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 06/19] block/parallels: provide _co_readv routine for parallels format driver Denis V. Lunev
@ 2014-12-30  9:28 ` Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 08/19] block/parallels: _co_writev callback for Parallels format Denis V. Lunev
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30  9:28 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

simple purification..

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Roman Kagan <rkagan@parallels.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 64b169b..306f2e3 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -32,7 +32,6 @@
 #define HEADER_MAGIC "WithoutFreeSpace"
 #define HEADER_MAGIC2 "WithouFreSpacExt"
 #define HEADER_VERSION 2
-#define HEADER_SIZE 64
 
 // always little-endian
 typedef struct ParallelsHeader {
@@ -63,7 +62,7 @@ static int parallels_probe(const uint8_t *buf, int buf_size, const char *filenam
 {
     const ParallelsHeader *ph = (const void *)buf;
 
-    if (buf_size < HEADER_SIZE)
+    if (buf_size < sizeof(ParallelsHeader))
         return 0;
 
     if ((!memcmp(ph->magic, HEADER_MAGIC, 16) ||
@@ -116,7 +115,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     s->catalog_size = le32_to_cpu(ph.catalog_entries);
-    if (s->catalog_size > INT_MAX / 4) {
+    if (s->catalog_size > INT_MAX / sizeof(uint32_t)) {
         error_setg(errp, "Catalog too large");
         ret = -EFBIG;
         goto fail;
@@ -127,7 +126,8 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
-    ret = bdrv_pread(bs->file, 64, s->catalog_bitmap, s->catalog_size * 4);
+    ret = bdrv_pread(bs->file, sizeof(ParallelsHeader),
+                     s->catalog_bitmap, s->catalog_size * sizeof(uint32_t));
     if (ret < 0) {
         goto fail;
     }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 08/19] block/parallels: _co_writev callback for Parallels format
  2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
                   ` (6 preceding siblings ...)
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 07/19] block/parallels: replace magic constants 4, 64 with proper sizeofs Denis V. Lunev
@ 2014-12-30  9:28 ` Denis V. Lunev
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 09/19] iotests, parallels: test for write into Parallels image Denis V. Lunev
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
  9 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30  9:28 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

Support write on Parallels images. The code is almost the same as one
in the previous patch implemented scatter-gather IO for read.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Roman Kagan <rkagan@parallels.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 306f2e3..dca3d0b 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -81,8 +81,6 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
     ParallelsHeader ph;
     int ret;
 
-    bs->read_only = 1; // no write support yet
-
     ret = bdrv_pread(bs->file, 0, &ph, sizeof(ph));
     if (ret < 0) {
         goto fail;
@@ -159,6 +157,37 @@ static int64_t seek_to_sector(BDRVParallelsState *s, int64_t sector_num)
     return (uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset;
 }
 
+static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
+{
+    BDRVParallelsState *s = bs->opaque;
+    uint32_t idx, offset, tmp;
+    int64_t pos;
+    int ret;
+
+    idx = sector_num / s->tracks;
+    offset = sector_num % s->tracks;
+
+    if (idx >= s->catalog_size) {
+        return -EINVAL;
+    }
+    if (s->catalog_bitmap[idx] != 0) {
+        return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + offset;
+    }
+
+    pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
+    bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
+    s->catalog_bitmap[idx] = pos / s->off_multiplier;
+
+    tmp = cpu_to_le32(s->catalog_bitmap[idx]);
+
+    ret = bdrv_pwrite_sync(bs->file,
+            sizeof(ParallelsHeader) + idx * sizeof(tmp), &tmp, sizeof(tmp));
+    if (ret < 0) {
+        return ret;
+    }
+    return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + offset;
+}
+
 static int cluster_remainder(BDRVParallelsState *s, int64_t sector_num,
         int nb_sectors)
 {
@@ -186,6 +215,49 @@ static int64_t coroutine_fn parallels_co_get_block_status(BlockDriverState *bs,
         BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }
 
+static coroutine_fn int parallels_co_writev(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
+{
+    BDRVParallelsState *s = bs->opaque;
+    uint64_t bytes_done = 0;
+    QEMUIOVector hd_qiov;
+    int ret = 0;
+
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+
+    qemu_co_mutex_lock(&s->lock);
+    while (nb_sectors > 0) {
+        int64_t position = allocate_sector(bs, sector_num);
+        int n = cluster_remainder(s, sector_num, nb_sectors);
+        int nbytes = n << BDRV_SECTOR_BITS;
+
+        if (position < 0) {
+            ret = (int)position;
+            break;
+        }
+
+        qemu_iovec_reset(&hd_qiov);
+        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, nbytes);
+
+        qemu_co_mutex_unlock(&s->lock);
+        ret = bdrv_co_writev(bs->file, position, n, &hd_qiov);
+        qemu_co_mutex_lock(&s->lock);
+
+        if (ret < 0) {
+            goto fail;
+        }
+
+        nb_sectors -= n;
+        sector_num += n;
+        bytes_done += nbytes;
+    }
+    qemu_co_mutex_unlock(&s->lock);
+
+fail:
+    qemu_iovec_destroy(&hd_qiov);
+    return ret;
+}
+
 static coroutine_fn int parallels_co_readv(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
 {
@@ -242,6 +314,7 @@ static BlockDriver bdrv_parallels = {
     .bdrv_close		= parallels_close,
     .bdrv_co_get_block_status = parallels_co_get_block_status,
     .bdrv_co_readv  = parallels_co_readv,
+    .bdrv_co_writev = parallels_co_writev,
 };
 
 static void bdrv_parallels_init(void)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 09/19] iotests, parallels: test for write into Parallels image
  2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
                   ` (7 preceding siblings ...)
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 08/19] block/parallels: _co_writev callback for Parallels format Denis V. Lunev
@ 2014-12-30  9:28 ` Denis V. Lunev
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
  9 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30  9:28 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Roman Kagan <rkagan@parallels.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/076     |  5 +++++
 tests/qemu-iotests/076.out | 10 ++++++++++
 2 files changed, 15 insertions(+)

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index 0139976..c9b55a9 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -74,6 +74,11 @@ echo
 echo "== Read from a valid v2 image =="
 _use_sample_img parallels-v2.bz2
 { $QEMU_IO -c "read -P 0x11 0 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "write -P 0x21 1024k 1k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "write -P 0x22 1025k 1k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0x21 1024k 1k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0x22 1025k 1k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0 1026k 62k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index 32ade08..b0000ae 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -19,4 +19,14 @@ no file open, try 'help open'
 == Read from a valid v2 image ==
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 1048576
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 1049600
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 1048576
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 1049600
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 63488/63488 bytes at offset 1050624
+62 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation
  2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
                   ` (8 preceding siblings ...)
  2014-12-30  9:28 ` [Qemu-devel] [PATCH 09/19] iotests, parallels: test for write into Parallels image Denis V. Lunev
@ 2014-12-30 10:07 ` Denis V. Lunev
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 11/19] iotests, parallels: test for newly created parallels image via qemu-img Denis V. Lunev
                     ` (8 more replies)
  9 siblings, 9 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30 10:07 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

Do not even care to create WithoutFreeSpace image, it is obsolete.
Always create WithouFreSpacExt one.

The code also does not spend a lot of efforts to fill cylinders and
heads fields, they are not used actually in a real life neither in
QEMU nor in Parallels products.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Roman Kagan <rkagan@parallels.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index dca3d0b..bea1217 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -33,6 +33,9 @@
 #define HEADER_MAGIC2 "WithouFreSpacExt"
 #define HEADER_VERSION 2
 
+#define DEFAULT_CLUSTER_SIZE 1048576        /* 1 MiB */
+
+
 // always little-endian
 typedef struct ParallelsHeader {
     char magic[16]; // "WithoutFreeSpace"
@@ -300,12 +303,103 @@ fail:
     return ret;
 }
 
+static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
+{
+    int64_t total_size, cl_size;
+    uint8_t tmp[BDRV_SECTOR_SIZE];
+    Error *local_err = NULL;
+    BlockDriverState *file;
+    uint32_t cat_entries, cat_sectors;
+    ParallelsHeader header;
+    int ret;
+
+    total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+                          BDRV_SECTOR_SIZE);
+    cl_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
+                          DEFAULT_CLUSTER_SIZE), BDRV_SECTOR_SIZE);
+
+    ret = bdrv_create_file(filename, opts, &local_err);
+    if (ret < 0) {
+        error_propagate(errp, local_err);
+        return ret;
+    }
+
+    file = NULL;
+    ret = bdrv_open(&file, filename, NULL, NULL,
+                    BDRV_O_RDWR | BDRV_O_PROTOCOL, NULL, &local_err);
+    if (ret < 0) {
+        error_propagate(errp, local_err);
+        return ret;
+    }
+    ret = bdrv_truncate(file, 0);
+    if (ret < 0) {
+        goto exit;
+    }
+
+    cat_entries = DIV_ROUND_UP(total_size, cl_size);
+    cat_sectors = DIV_ROUND_UP(cat_entries * sizeof(uint32_t) +
+                               sizeof(ParallelsHeader), cl_size);
+    cat_sectors = (cat_sectors *  cl_size) >> BDRV_SECTOR_BITS;
+
+    memset(&header, 0, sizeof(header));
+    memcpy(header.magic, HEADER_MAGIC2, sizeof(header.magic));
+    header.version = cpu_to_le32(HEADER_VERSION);
+    /* don't care much about geometry, it is not used on image level */
+    header.heads = cpu_to_le32(16);
+    header.cylinders = cpu_to_le32(total_size / BDRV_SECTOR_SIZE / 16 / 32);
+    header.tracks = cpu_to_le32(cl_size >> BDRV_SECTOR_BITS);
+    header.catalog_entries = cpu_to_le32(cat_entries);
+    header.nb_sectors = cpu_to_le64(DIV_ROUND_UP(total_size, BDRV_SECTOR_SIZE));
+    header.data_off = cpu_to_le32(cat_sectors);
+
+    /* write all the data */
+    memset(tmp, 0, sizeof(tmp));
+    memcpy(tmp, &header, sizeof(header));
+
+    ret = bdrv_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE);
+    if (ret < 0) {
+        goto exit;
+    }
+    ret = bdrv_write_zeroes(file, 1, cat_sectors - 1, 0);
+    if (ret < 0) {
+        goto exit;
+    }
+    ret = 0;
+
+done:
+    bdrv_unref(file);
+    return ret;
+
+exit:
+    error_setg_errno(errp, -ret, "Failed to create Parallels image");
+    goto done;
+}
+
 static void parallels_close(BlockDriverState *bs)
 {
     BDRVParallelsState *s = bs->opaque;
     g_free(s->catalog_bitmap);
 }
 
+static QemuOptsList parallels_create_opts = {
+    .name = "parallels-create-opts",
+    .head = QTAILQ_HEAD_INITIALIZER(parallels_create_opts.head),
+    .desc = {
+        {
+            .name = BLOCK_OPT_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "Virtual disk size",
+        },
+        {
+            .name = BLOCK_OPT_CLUSTER_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "Parallels image cluster size",
+            .def_value_str = stringify(DEFAULT_CLUSTER_SIZE),
+        },
+        { /* end of list */ }
+    }
+};
+
 static BlockDriver bdrv_parallels = {
     .format_name	= "parallels",
     .instance_size	= sizeof(BDRVParallelsState),
@@ -315,6 +409,9 @@ static BlockDriver bdrv_parallels = {
     .bdrv_co_get_block_status = parallels_co_get_block_status,
     .bdrv_co_readv  = parallels_co_readv,
     .bdrv_co_writev = parallels_co_writev,
+
+    .bdrv_create    = parallels_create,
+    .create_opts    = &parallels_create_opts,
 };
 
 static void bdrv_parallels_init(void)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 11/19] iotests, parallels: test for newly created parallels image via qemu-img
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
@ 2014-12-30 10:07   ` Denis V. Lunev
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 12/19] parallels: change copyright information in the image header Denis V. Lunev
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30 10:07 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Roman Kagan <rkagan@parallels.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/115     | 68 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/115.out | 24 ++++++++++++++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 93 insertions(+)
 create mode 100755 tests/qemu-iotests/115
 create mode 100644 tests/qemu-iotests/115.out

diff --git a/tests/qemu-iotests/115 b/tests/qemu-iotests/115
new file mode 100755
index 0000000..f45afa7
--- /dev/null
+++ b/tests/qemu-iotests/115
@@ -0,0 +1,68 @@
+#!/bin/bash
+#
+# parallels format validation tests (created by QEMU)
+#
+# Copyright (C) 2014 Denis V. Lunev <den@openvz.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=den@openvz.org
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt parallels
+_supported_proto file
+_supported_os Linux
+
+size=64M
+CLUSTER_SIZE=64k
+IMGFMT=parallels
+_make_test_img $size
+
+echo == read empty image ==
+{ $QEMU_IO -c "read -P 0 32k 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+echo == write more than 1 block in a row ==
+{ $QEMU_IO -c "write -P 0x11 32k 128k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+echo == read less than block ==
+{ $QEMU_IO -c "read -P 0x11 32k 32k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+echo == read exactly 1 block ==
+{ $QEMU_IO -c "read -P 0x11 64k 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+echo == read more than 1 block ==
+{ $QEMU_IO -c "read -P 0x11 32k 128k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+echo == check that there is no trash after written ==
+{ $QEMU_IO -c "read -P 0 160k 32k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+echo == check that there is no trash before written ==
+{ $QEMU_IO -c "read -P 0 0 32k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/115.out b/tests/qemu-iotests/115.out
new file mode 100644
index 0000000..6a73104
--- /dev/null
+++ b/tests/qemu-iotests/115.out
@@ -0,0 +1,24 @@
+QA output created by 115
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+== read empty image ==
+read 65536/65536 bytes at offset 32768
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== write more than 1 block in a row ==
+wrote 131072/131072 bytes at offset 32768
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== read less than block ==
+read 32768/32768 bytes at offset 32768
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== read exactly 1 block ==
+read 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== read more than 1 block ==
+read 131072/131072 bytes at offset 32768
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check that there is no trash after written ==
+read 32768/32768 bytes at offset 163840
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check that there is no trash before written ==
+read 32768/32768 bytes at offset 0
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index a4742c6..77377b3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -115,3 +115,4 @@
 111 rw auto quick
 113 rw auto quick
 114 rw auto quick
+115 rw auto quick
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 12/19] parallels: change copyright information in the image header
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 11/19] iotests, parallels: test for newly created parallels image via qemu-img Denis V. Lunev
@ 2014-12-30 10:07   ` Denis V. Lunev
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 13/19] block/parallels: store ParallelsHeader to the BDRVParallelsState Denis V. Lunev
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30 10:07 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Roman Kagan <rkagan@parallels.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index bea1217..e3abf4e 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -2,8 +2,12 @@
  * Block driver for Parallels disk image format
  *
  * Copyright (c) 2007 Alex Beregszaszi
+ * Copyright (c) 2014 Denis V. Lunev <den@openvz.org>
  *
- * This code is based on comparing different disk images created by Parallels.
+ * This code was originally based on comparing different disk images created
+ * by Parallels. Currently it is based on opened OpenVZ sources
+ * available at
+ *     http://git.openvz.org/?p=ploop;a=summary
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 13/19] block/parallels: store ParallelsHeader to the BDRVParallelsState
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 11/19] iotests, parallels: test for newly created parallels image via qemu-img Denis V. Lunev
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 12/19] parallels: change copyright information in the image header Denis V. Lunev
@ 2014-12-30 10:07   ` Denis V. Lunev
  2015-01-13 14:13     ` Roman Kagan
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 14/19] block/parallels: create catalog_offset helper Denis V. Lunev
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30 10:07 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

This would be useful for the future for speed optimizations of new block
creation in the image. At the moment each write to the catalog bitmap
results in read-modify-write transaction. It would be beneficial to
write by pages or sectors. Though in order to do that for the begining
of the image we should keep the header somethere to obtain first sector
of the image properly. BDRVParallelsState would be a good place for that.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index e3abf4e..f79ddff 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -57,6 +57,8 @@ typedef struct ParallelsHeader {
 typedef struct BDRVParallelsState {
     CoMutex lock;
 
+    ParallelsHeader ph;
+
     uint32_t *catalog_bitmap;
     unsigned int catalog_size;
 
@@ -85,29 +87,28 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
 {
     BDRVParallelsState *s = bs->opaque;
     int i;
-    ParallelsHeader ph;
     int ret;
 
-    ret = bdrv_pread(bs->file, 0, &ph, sizeof(ph));
+    ret = bdrv_pread(bs->file, 0, &s->ph, sizeof(s->ph));
     if (ret < 0) {
         goto fail;
     }
 
-    bs->total_sectors = le64_to_cpu(ph.nb_sectors);
+    bs->total_sectors = le64_to_cpu(s->ph.nb_sectors);
 
-    if (le32_to_cpu(ph.version) != HEADER_VERSION) {
+    if (le32_to_cpu(s->ph.version) != HEADER_VERSION) {
         goto fail_format;
     }
-    if (!memcmp(ph.magic, HEADER_MAGIC, 16)) {
+    if (!memcmp(s->ph.magic, HEADER_MAGIC, 16)) {
         s->off_multiplier = 1;
         bs->total_sectors = 0xffffffff & bs->total_sectors;
-    } else if (!memcmp(ph.magic, HEADER_MAGIC2, 16)) {
-        s->off_multiplier = le32_to_cpu(ph.tracks);
+    } else if (!memcmp(s->ph.magic, HEADER_MAGIC2, 16)) {
+        s->off_multiplier = le32_to_cpu(s->ph.tracks);
     } else {
         goto fail_format;
     }
 
-    s->tracks = le32_to_cpu(ph.tracks);
+    s->tracks = le32_to_cpu(s->ph.tracks);
     if (s->tracks == 0) {
         error_setg(errp, "Invalid image: Zero sectors per track");
         ret = -EINVAL;
@@ -119,7 +120,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
-    s->catalog_size = le32_to_cpu(ph.catalog_entries);
+    s->catalog_size = le32_to_cpu(s->ph.catalog_entries);
     if (s->catalog_size > INT_MAX / sizeof(uint32_t)) {
         error_setg(errp, "Catalog too large");
         ret = -EFBIG;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 14/19] block/parallels: create catalog_offset helper
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
                     ` (2 preceding siblings ...)
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 13/19] block/parallels: store ParallelsHeader to the BDRVParallelsState Denis V. Lunev
@ 2014-12-30 10:07   ` Denis V. Lunev
  2015-01-13 14:14     ` Roman Kagan
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 15/19] block/parallels: rename catalog_ names to bat_ Denis V. Lunev
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30 10:07 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

to calculate entry offset inside catalog bitmap in parallels image.
This is a matter of convinience.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index f79ddff..d072276 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -67,6 +67,12 @@ typedef struct BDRVParallelsState {
     unsigned int off_multiplier;
 } BDRVParallelsState;
 
+
+static uint32_t catalog_offset(uint32_t index)
+{
+    return sizeof(ParallelsHeader) + sizeof(uint32_t) * index;
+}
+
 static int parallels_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
     const ParallelsHeader *ph = (const void *)buf;
@@ -188,8 +194,7 @@ static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
 
     tmp = cpu_to_le32(s->catalog_bitmap[idx]);
 
-    ret = bdrv_pwrite_sync(bs->file,
-            sizeof(ParallelsHeader) + idx * sizeof(tmp), &tmp, sizeof(tmp));
+    ret = bdrv_pwrite_sync(bs->file, catalog_offset(idx), &tmp, sizeof(tmp));
     if (ret < 0) {
         return ret;
     }
@@ -342,8 +347,7 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
     }
 
     cat_entries = DIV_ROUND_UP(total_size, cl_size);
-    cat_sectors = DIV_ROUND_UP(cat_entries * sizeof(uint32_t) +
-                               sizeof(ParallelsHeader), cl_size);
+    cat_sectors = DIV_ROUND_UP(catalog_offset(cat_entries), cl_size);
     cat_sectors = (cat_sectors *  cl_size) >> BDRV_SECTOR_BITS;
 
     memset(&header, 0, sizeof(header));
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 15/19] block/parallels: rename catalog_ names to bat_
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
                     ` (3 preceding siblings ...)
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 14/19] block/parallels: create catalog_offset helper Denis V. Lunev
@ 2014-12-30 10:07   ` Denis V. Lunev
  2015-01-13 14:15     ` Roman Kagan
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 16/19] block/parallels: no need to flush on each block allocation table update Denis V. Lunev
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30 10:07 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

BAT means 'block allocation table'. Thus this name is clean and shorter
on writing.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 48 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index d072276..ddc3aee 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -47,7 +47,7 @@ typedef struct ParallelsHeader {
     uint32_t heads;
     uint32_t cylinders;
     uint32_t tracks;
-    uint32_t catalog_entries;
+    uint32_t bat_entries;
     uint64_t nb_sectors;
     uint32_t inuse;
     uint32_t data_off;
@@ -59,8 +59,8 @@ typedef struct BDRVParallelsState {
 
     ParallelsHeader ph;
 
-    uint32_t *catalog_bitmap;
-    unsigned int catalog_size;
+    uint32_t *bat;
+    unsigned int bat_size;
 
     unsigned int tracks;
 
@@ -68,7 +68,7 @@ typedef struct BDRVParallelsState {
 } BDRVParallelsState;
 
 
-static uint32_t catalog_offset(uint32_t index)
+static uint32_t bat_offset(uint32_t index)
 {
     return sizeof(ParallelsHeader) + sizeof(uint32_t) * index;
 }
@@ -126,26 +126,26 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
-    s->catalog_size = le32_to_cpu(s->ph.catalog_entries);
-    if (s->catalog_size > INT_MAX / sizeof(uint32_t)) {
+    s->bat_size = le32_to_cpu(s->ph.bat_entries);
+    if (s->bat_size > INT_MAX / sizeof(uint32_t)) {
         error_setg(errp, "Catalog too large");
         ret = -EFBIG;
         goto fail;
     }
-    s->catalog_bitmap = g_try_new(uint32_t, s->catalog_size);
-    if (s->catalog_size && s->catalog_bitmap == NULL) {
+    s->bat = g_try_new(uint32_t, s->bat_size);
+    if (s->bat_size && s->bat == NULL) {
         ret = -ENOMEM;
         goto fail;
     }
 
     ret = bdrv_pread(bs->file, sizeof(ParallelsHeader),
-                     s->catalog_bitmap, s->catalog_size * sizeof(uint32_t));
+                     s->bat, s->bat_size * sizeof(uint32_t));
     if (ret < 0) {
         goto fail;
     }
 
-    for (i = 0; i < s->catalog_size; i++)
-        le32_to_cpus(&s->catalog_bitmap[i]);
+    for (i = 0; i < s->bat_size; i++)
+        le32_to_cpus(&s->bat[i]);
 
     qemu_co_mutex_init(&s->lock);
     return 0;
@@ -154,7 +154,7 @@ fail_format:
     error_setg(errp, "Image not in Parallels format");
     ret = -EINVAL;
 fail:
-    g_free(s->catalog_bitmap);
+    g_free(s->bat);
     return ret;
 }
 
@@ -166,9 +166,9 @@ static int64_t seek_to_sector(BDRVParallelsState *s, int64_t sector_num)
     offset = sector_num % s->tracks;
 
     /* not allocated */
-    if ((index >= s->catalog_size) || (s->catalog_bitmap[index] == 0))
+    if ((index >= s->bat_size) || (s->bat[index] == 0))
         return -1;
-    return (uint64_t)s->catalog_bitmap[index] * s->off_multiplier + offset;
+    return (uint64_t)s->bat[index] * s->off_multiplier + offset;
 }
 
 static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
@@ -181,24 +181,24 @@ static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
     idx = sector_num / s->tracks;
     offset = sector_num % s->tracks;
 
-    if (idx >= s->catalog_size) {
+    if (idx >= s->bat_size) {
         return -EINVAL;
     }
-    if (s->catalog_bitmap[idx] != 0) {
-        return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + offset;
+    if (s->bat[idx] != 0) {
+        return (uint64_t)s->bat[idx] * s->off_multiplier + offset;
     }
 
     pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
     bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
-    s->catalog_bitmap[idx] = pos / s->off_multiplier;
+    s->bat[idx] = pos / s->off_multiplier;
 
-    tmp = cpu_to_le32(s->catalog_bitmap[idx]);
+    tmp = cpu_to_le32(s->bat[idx]);
 
-    ret = bdrv_pwrite_sync(bs->file, catalog_offset(idx), &tmp, sizeof(tmp));
+    ret = bdrv_pwrite_sync(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
     if (ret < 0) {
         return ret;
     }
-    return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + offset;
+    return (uint64_t)s->bat[idx] * s->off_multiplier + offset;
 }
 
 static int cluster_remainder(BDRVParallelsState *s, int64_t sector_num,
@@ -347,7 +347,7 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
     }
 
     cat_entries = DIV_ROUND_UP(total_size, cl_size);
-    cat_sectors = DIV_ROUND_UP(catalog_offset(cat_entries), cl_size);
+    cat_sectors = DIV_ROUND_UP(bat_offset(cat_entries), cl_size);
     cat_sectors = (cat_sectors *  cl_size) >> BDRV_SECTOR_BITS;
 
     memset(&header, 0, sizeof(header));
@@ -357,7 +357,7 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
     header.heads = cpu_to_le32(16);
     header.cylinders = cpu_to_le32(total_size / BDRV_SECTOR_SIZE / 16 / 32);
     header.tracks = cpu_to_le32(cl_size >> BDRV_SECTOR_BITS);
-    header.catalog_entries = cpu_to_le32(cat_entries);
+    header.bat_entries = cpu_to_le32(cat_entries);
     header.nb_sectors = cpu_to_le64(DIV_ROUND_UP(total_size, BDRV_SECTOR_SIZE));
     header.data_off = cpu_to_le32(cat_sectors);
 
@@ -387,7 +387,7 @@ exit:
 static void parallels_close(BlockDriverState *bs)
 {
     BDRVParallelsState *s = bs->opaque;
-    g_free(s->catalog_bitmap);
+    g_free(s->bat);
 }
 
 static QemuOptsList parallels_create_opts = {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 16/19] block/parallels: no need to flush on each block allocation table update
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
                     ` (4 preceding siblings ...)
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 15/19] block/parallels: rename catalog_ names to bat_ Denis V. Lunev
@ 2014-12-30 10:07   ` Denis V. Lunev
  2015-01-13 14:50     ` Roman Kagan
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 17/19] block/parallels: delay writing to BAT till bdrv_co_flush_to_os Denis V. Lunev
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30 10:07 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

>From the point of guest each write to real disk prior to disk barrier
operation could be lost. Therefore there is no problem that "not synced"
new block is lost due to not updated allocation table if QEMU is crashed.

This patch improves writing performance of
  qemu-img create -f parallels -o cluster_size=64k ./1.hds 64G
  qemu-io -f parallels -c "write -P 0x11 0 1024k" 1.hds
from 45 Mb/sec to 160 Mb/sec on my SSD disk. The gain on rotational media
is much more sufficient, from 800 Kb/sec to 45 Mb/sec.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index ddc3aee..46cf031 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -194,7 +194,7 @@ static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
 
     tmp = cpu_to_le32(s->bat[idx]);
 
-    ret = bdrv_pwrite_sync(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
+    ret = bdrv_pwrite(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
     if (ret < 0) {
         return ret;
     }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 17/19] block/parallels: delay writing to BAT till bdrv_co_flush_to_os
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
                     ` (5 preceding siblings ...)
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 16/19] block/parallels: no need to flush on each block allocation table update Denis V. Lunev
@ 2014-12-30 10:07   ` Denis V. Lunev
  2015-01-14 13:03     ` Roman Kagan
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 18/19] block/parallels: add prealloc-mode and prealloc-size open paramemets Denis V. Lunev
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 19/19] block/parallels: optimize linear image expansion Denis V. Lunev
  8 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30 10:07 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

The idea is that we do not need to immediately sync BAT to the image as
from the guest point of view there is a possibility that IO is lost
even in the physical controller until flush command was finished.
bdrv_co_flush_to_os is exactly the right place for this purpose.

Technically the patch aligns writes into BAT to MAX(bdrv_align, 4096),
which elliminates read-modify-write transactions on BAT update and
cache ready-to-write content in a special buffer in BDRVParallelsState.

This buffer possibly contains ParallelsHeader if the first page of the
image should be modified. The header occupies first 64 bytes of the image
and the BAT starts immediately after it.

It is also possible that BAT end is not aligned to the cluster size.
ParallelsHeader->data_off is not specified for this case. We should write
only part of the cache in that case.

This patch speed ups
  qemu-img create -f parallels -o cluster_size=64k ./1.hds 64G
  qemu-io -f parallels -c "write -P 0x11 0 1024k" 1.hds
writing from 50-60 Mb/sec to 80-90 Mb/sec on rotational media and
from 160 Mb/sec to 190 Mb/sec on SSD disk.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 94 insertions(+), 5 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 46cf031..18b9267 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -62,6 +62,11 @@ typedef struct BDRVParallelsState {
     uint32_t *bat;
     unsigned int bat_size;
 
+    uint32_t *bat_cache;
+    unsigned bat_cache_size;
+    int bat_cache_off;
+    int data_off;
+
     unsigned int tracks;
 
     unsigned int off_multiplier;
@@ -148,6 +153,17 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
         le32_to_cpus(&s->bat[i]);
 
     qemu_co_mutex_init(&s->lock);
+
+    s->bat_cache_off = -1;
+    if (bs->open_flags & BDRV_O_RDWR) {
+        s->bat_cache_size = MAX(bdrv_opt_mem_align(bs->file), 4096);
+        s->bat_cache = qemu_blockalign(bs->file, s->bat_cache_size);
+    }
+    s->data_off = le32_to_cpu(s->ph.data_off) * BDRV_SECTOR_SIZE;
+    if (s->data_off == 0) {
+        s->data_off = ROUND_UP(bat_offset(s->bat_size), BDRV_SECTOR_SIZE);
+    }
+
     return 0;
 
 fail_format:
@@ -171,10 +187,71 @@ static int64_t seek_to_sector(BDRVParallelsState *s, int64_t sector_num)
     return (uint64_t)s->bat[index] * s->off_multiplier + offset;
 }
 
+static int write_bat_cache(BlockDriverState *bs)
+{
+    BDRVParallelsState *s = bs->opaque;
+    int size, off;
+
+    if (s->bat_cache_off == -1) {
+        /* no cached data */
+        return 0;
+    }
+
+    size = s->bat_cache_size;
+    if (size + s->bat_cache_off > s->data_off) {
+        /* avoid writing to the first data block */
+        size = s->data_off - s->bat_cache_off;
+    }
+
+    off = s->bat_cache_off;
+    s->bat_cache_off = -1;
+    return bdrv_pwrite(bs->file, off, s->bat_cache, size);
+}
+
+static int cache_bat(BlockDriverState *bs, uint32_t idx, uint32_t new_data_off)
+{
+    int ret, i, off, cache_off;
+    int64_t first_idx, last_idx;
+    BDRVParallelsState *s = bs->opaque;
+    uint32_t *cache = s->bat_cache;
+
+    off = bat_offset(idx);
+    cache_off = (off / s->bat_cache_size) * s->bat_cache_size;
+
+    if (s->bat_cache_off != -1 && s->bat_cache_off != cache_off) {
+        ret = write_bat_cache(bs);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    first_idx = idx - (off - cache_off) / sizeof(uint32_t);
+    last_idx = first_idx + s->bat_cache_size / sizeof(uint32_t);
+    if (first_idx < 0) {
+        memcpy(s->bat_cache, &s->ph, sizeof(s->ph));
+        first_idx = 0;
+        cache = s->bat_cache + sizeof(s->ph) / sizeof(uint32_t);
+    }
+
+    if (last_idx > s->bat_size) {
+        memset(cache + s->bat_size - first_idx, 0,
+               sizeof(uint32_t) * (last_idx - s->bat_size));
+    }
+
+    for (i = 0; i < last_idx - first_idx; i++) {
+        cache[i] = cpu_to_le32(s->bat[first_idx + i]);
+    }
+    cache[idx - first_idx] = cpu_to_le32(new_data_off);
+    s->bat[idx] = new_data_off;
+
+    s->bat_cache_off = cache_off;
+    return 0;
+}
+
 static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
 {
     BDRVParallelsState *s = bs->opaque;
-    uint32_t idx, offset, tmp;
+    uint32_t idx, offset;
     int64_t pos;
     int ret;
 
@@ -190,17 +267,27 @@ static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
 
     pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
     bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
-    s->bat[idx] = pos / s->off_multiplier;
-
-    tmp = cpu_to_le32(s->bat[idx]);
 
-    ret = bdrv_pwrite(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
+    ret = cache_bat(bs, idx, pos / s->off_multiplier);
     if (ret < 0) {
         return ret;
     }
     return (uint64_t)s->bat[idx] * s->off_multiplier + offset;
 }
 
+static coroutine_fn int parallels_co_flush_to_os(BlockDriverState *bs)
+{
+    BDRVParallelsState *s = bs->opaque;
+    int ret;
+
+    qemu_co_mutex_lock(&s->lock);
+    ret = write_bat_cache(bs);
+    qemu_co_mutex_unlock(&s->lock);
+
+    return ret;
+}
+
+
 static int cluster_remainder(BDRVParallelsState *s, int64_t sector_num,
         int nb_sectors)
 {
@@ -387,6 +474,7 @@ exit:
 static void parallels_close(BlockDriverState *bs)
 {
     BDRVParallelsState *s = bs->opaque;
+    qemu_vfree(s->bat_cache);
     g_free(s->bat);
 }
 
@@ -416,6 +504,7 @@ static BlockDriver bdrv_parallels = {
     .bdrv_open		= parallels_open,
     .bdrv_close		= parallels_close,
     .bdrv_co_get_block_status = parallels_co_get_block_status,
+    .bdrv_co_flush_to_os      = parallels_co_flush_to_os,
     .bdrv_co_readv  = parallels_co_readv,
     .bdrv_co_writev = parallels_co_writev,
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 18/19] block/parallels: add prealloc-mode and prealloc-size open paramemets
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
                     ` (6 preceding siblings ...)
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 17/19] block/parallels: delay writing to BAT till bdrv_co_flush_to_os Denis V. Lunev
@ 2014-12-30 10:07   ` Denis V. Lunev
  2015-01-14 14:26     ` Roman Kagan
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 19/19] block/parallels: optimize linear image expansion Denis V. Lunev
  8 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30 10:07 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

This is preparational commit for tweaks in Parallels image expansion.
The idea is that enlarge via truncate by one data block is slow. It
would be much better to use fallocate via bdrv_write_zeroes and
expand by some significant amount at once.

This patch just adds proper parameters into BDRVParallelsState and
performs options parsing in parallels_open.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 18b9267..12a9cea 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -30,6 +30,7 @@
 #include "qemu-common.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
+#include "qapi/util.h"
 
 /**************************************************************/
 
@@ -54,6 +55,20 @@ typedef struct ParallelsHeader {
     char padding[12];
 } QEMU_PACKED ParallelsHeader;
 
+
+typedef enum ParallelsPreallocMode {
+    PRL_PREALLOC_MODE_FALLOCATE = 0,
+    PRL_PREALLOC_MODE_TRUNCATE = 1,
+    PRL_PREALLOC_MODE_MAX = 2,
+} ParallelsPreallocMode;
+
+static const char *prealloc_mode_lookup[] = {
+    "falloc",
+    "truncate",
+    NULL,
+};
+
+
 typedef struct BDRVParallelsState {
     CoMutex lock;
 
@@ -67,12 +82,40 @@ typedef struct BDRVParallelsState {
     int bat_cache_off;
     int data_off;
 
+    uint64_t prealloc_size;
+    ParallelsPreallocMode prealloc_mode;
+
     unsigned int tracks;
 
     unsigned int off_multiplier;
 } BDRVParallelsState;
 
 
+#define PARALLELS_OPT_PREALLOC_MODE     "prealloc-mode"
+#define PARALLELS_OPT_PREALLOC_SIZE     "prealloc-size"
+
+static QemuOptsList parallels_runtime_opts = {
+    .name = "parallels",
+    .head = QTAILQ_HEAD_INITIALIZER(parallels_runtime_opts.head),
+    .desc = {
+        {
+            .name = PARALLELS_OPT_PREALLOC_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "Preallocation size on image expansion",
+            .def_value_str = "128MiB",
+        },
+        {
+            .name = PARALLELS_OPT_PREALLOC_MODE,
+            .type = QEMU_OPT_STRING,
+            .help = "Preallocation mode on image expansion "
+                    "(allowed values: falloc, truncate)",
+            .def_value_str = "falloc",
+        },
+        { /* end of list */ },
+    },
+};
+
+
 static uint32_t bat_offset(uint32_t index)
 {
     return sizeof(ParallelsHeader) + sizeof(uint32_t) * index;
@@ -99,6 +142,9 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
     BDRVParallelsState *s = bs->opaque;
     int i;
     int ret;
+    QemuOpts *opts = NULL;
+    Error *local_err = NULL;
+    char *buf;
 
     ret = bdrv_pread(bs->file, 0, &s->ph, sizeof(s->ph));
     if (ret < 0) {
@@ -149,6 +195,27 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    opts = qemu_opts_create(&parallels_runtime_opts, NULL, 0, &local_err);
+    if (local_err != NULL) {
+        goto fail_options;
+    }
+
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err != NULL) {
+        goto fail_options;
+    }
+
+    s->prealloc_size =
+        qemu_opt_get_size_del(opts, PARALLELS_OPT_PREALLOC_SIZE, 0);
+    s->prealloc_size = MAX(s->tracks, s->prealloc_size >> BDRV_SECTOR_BITS);
+    buf = qemu_opt_get_del(opts, PARALLELS_OPT_PREALLOC_MODE);
+    s->prealloc_mode = qapi_enum_parse(prealloc_mode_lookup, buf,
+            PRL_PREALLOC_MODE_MAX, PRL_PREALLOC_MODE_FALLOCATE, &local_err);
+    g_free(buf);
+    if (local_err != NULL) {
+        goto fail_options;
+    }
+
     for (i = 0; i < s->bat_size; i++)
         le32_to_cpus(&s->bat[i]);
 
@@ -172,6 +239,11 @@ fail_format:
 fail:
     g_free(s->bat);
     return ret;
+
+fail_options:
+    error_propagate(errp, local_err);
+    ret = -EINVAL;
+    goto fail;
 }
 
 static int64_t seek_to_sector(BDRVParallelsState *s, int64_t sector_num)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 19/19] block/parallels: optimize linear image expansion
  2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
                     ` (7 preceding siblings ...)
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 18/19] block/parallels: add prealloc-mode and prealloc-size open paramemets Denis V. Lunev
@ 2014-12-30 10:07   ` Denis V. Lunev
  2015-01-14 17:56     ` Roman Kagan
  8 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2014-12-30 10:07 UTC (permalink / raw)
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

Plain image expansion spends a lot of time to update image file size.
This seriously affects the performance. The following simple test
  qemu_img create -f parallels -o cluster_size=64k ./1.hds 64G
  qemu_io -n -c "write -P 0x11 0 1024M" ./1.hds
could be improved if the format driver will pre-allocate some space
in the image file with a reasonable chunk.

This patch preallocates 128 Mb using bdrv_write_zeroes, which should
normally use fallocate() call inside. Fallback to older truncate()
could be used as a fallback using image open options thanks to the
previous patch.

The benefit is around 15%.

This patch is final in this series. Block driver has near native
performance now.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 12a9cea..5ec4a0d 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -82,6 +82,7 @@ typedef struct BDRVParallelsState {
     int bat_cache_off;
     int data_off;
 
+    int64_t  prealloc_off;
     uint64_t prealloc_size;
     ParallelsPreallocMode prealloc_mode;
 
@@ -216,9 +217,19 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail_options;
     }
 
-    for (i = 0; i < s->bat_size; i++)
+    for (i = 0; i < s->bat_size; i++) {
+        int64_t off;
         le32_to_cpus(&s->bat[i]);
 
+        if (s->bat[i] == 0) {
+            continue;
+        }
+        off = s->bat[i] * s->off_multiplier;
+        if (off >= s->prealloc_off) {
+            s->prealloc_off = off + s->tracks;
+        }
+    }
+
     qemu_co_mutex_init(&s->lock);
 
     s->bat_cache_off = -1;
@@ -230,6 +241,9 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
     if (s->data_off == 0) {
         s->data_off = ROUND_UP(bat_offset(s->bat_size), BDRV_SECTOR_SIZE);
     }
+    if (s->prealloc_off == 0) {
+        s->prealloc_off = s->data_off >> BDRV_SECTOR_BITS;
+    }
 
     return 0;
 
@@ -338,7 +352,19 @@ static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
     }
 
     pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
-    bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
+    if (s->prealloc_off + s->tracks > pos) {
+        if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE)
+            ret = bdrv_write_zeroes(bs->file, s->prealloc_off,
+                                    s->prealloc_size, 0);
+        else
+            ret = bdrv_truncate(bs->file,
+                    (s->prealloc_off + s->prealloc_size) << BDRV_SECTOR_BITS);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+    pos = s->prealloc_off;
+    s->prealloc_off += s->tracks;
 
     ret = cache_bat(bs, idx, pos / s->off_multiplier);
     if (ret < 0) {
@@ -546,6 +572,11 @@ exit:
 static void parallels_close(BlockDriverState *bs)
 {
     BDRVParallelsState *s = bs->opaque;
+
+    if (bs->open_flags & BDRV_O_RDWR) {
+        bdrv_truncate(bs->file, s->prealloc_off << BDRV_SECTOR_BITS);
+    }
+
     qemu_vfree(s->bat_cache);
     g_free(s->bat);
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 13/19] block/parallels: store ParallelsHeader to the BDRVParallelsState
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 13/19] block/parallels: store ParallelsHeader to the BDRVParallelsState Denis V. Lunev
@ 2015-01-13 14:13     ` Roman Kagan
  0 siblings, 0 replies; 38+ messages in thread
From: Roman Kagan @ 2015-01-13 14:13 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Tue, Dec 30, 2014 at 01:07:06PM +0300, Denis V. Lunev wrote:
> This would be useful for the future for speed optimizations of new block
> creation in the image. At the moment each write to the catalog bitmap
> results in read-modify-write transaction. It would be beneficial to
> write by pages or sectors. Though in order to do that for the begining
> of the image we should keep the header somethere to obtain first sector
> of the image properly. BDRVParallelsState would be a good place for that.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/parallels.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)

Acked-by: Roman Kagan <rkagan@parallels.com>

Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 14/19] block/parallels: create catalog_offset helper
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 14/19] block/parallels: create catalog_offset helper Denis V. Lunev
@ 2015-01-13 14:14     ` Roman Kagan
  0 siblings, 0 replies; 38+ messages in thread
From: Roman Kagan @ 2015-01-13 14:14 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Tue, Dec 30, 2014 at 01:07:07PM +0300, Denis V. Lunev wrote:
> to calculate entry offset inside catalog bitmap in parallels image.
> This is a matter of convinience.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/parallels.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)

Acked-by: Roman Kagan <rkagan@parallels.com>

Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 15/19] block/parallels: rename catalog_ names to bat_
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 15/19] block/parallels: rename catalog_ names to bat_ Denis V. Lunev
@ 2015-01-13 14:15     ` Roman Kagan
  0 siblings, 0 replies; 38+ messages in thread
From: Roman Kagan @ 2015-01-13 14:15 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Tue, Dec 30, 2014 at 01:07:08PM +0300, Denis V. Lunev wrote:
> BAT means 'block allocation table'. Thus this name is clean and shorter
> on writing.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/parallels.c | 48 ++++++++++++++++++++++++------------------------
>  1 file changed, 24 insertions(+), 24 deletions(-)

Acked-by: Roman Kagan <rkagan@parallels.com>

Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 16/19] block/parallels: no need to flush on each block allocation table update
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 16/19] block/parallels: no need to flush on each block allocation table update Denis V. Lunev
@ 2015-01-13 14:50     ` Roman Kagan
  2015-01-13 15:17       ` Denis V. Lunev
  0 siblings, 1 reply; 38+ messages in thread
From: Roman Kagan @ 2015-01-13 14:50 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Tue, Dec 30, 2014 at 01:07:09PM +0300, Denis V. Lunev wrote:
> From the point of guest each write to real disk prior to disk barrier
> operation could be lost. Therefore there is no problem that "not synced"
> new block is lost due to not updated allocation table if QEMU is crashed.
> 
> This patch improves writing performance of
>   qemu-img create -f parallels -o cluster_size=64k ./1.hds 64G
>   qemu-io -f parallels -c "write -P 0x11 0 1024k" 1.hds
> from 45 Mb/sec to 160 Mb/sec on my SSD disk. The gain on rotational media
> is much more sufficient, from 800 Kb/sec to 45 Mb/sec.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/parallels.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/parallels.c b/block/parallels.c
> index ddc3aee..46cf031 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -194,7 +194,7 @@ static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
>  
>      tmp = cpu_to_le32(s->bat[idx]);
>  
> -    ret = bdrv_pwrite_sync(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
> +    ret = bdrv_pwrite(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
>      if (ret < 0) {
>          return ret;
>      }

AFAICT this worsens the problem that existed before this patch:

if the preceding bdrv_truncate(+cluster_size) gets reordered with this
bat entry update, there's a risk that on poweroff (snapshot, etc.) the
bat entry gets written to the storage while the file size is not
updated.  As a result, the bat entry in the file would point at a
cluster which hadn't been allocated yet.  When a new block is eventually
added to the file, you'd have two bat entries pointing at the same
cluster.

The _sync version used to leave this window quite narrow due to the
flush following the write.  The patch makes the reordering more likely.

I'm afraid the only reliable way to handle it is to put a barrier
between truncate and bat update, and mitigate the costs by batching the
file expansions, as you seem to do in the followup patches.


Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 16/19] block/parallels: no need to flush on each block allocation table update
  2015-01-13 14:50     ` Roman Kagan
@ 2015-01-13 15:17       ` Denis V. Lunev
  2015-01-13 20:16         ` Denis V. Lunev
  0 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2015-01-13 15:17 UTC (permalink / raw)
  To: Roman Kagan, Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 13/01/15 17:50, Roman Kagan wrote:
> On Tue, Dec 30, 2014 at 01:07:09PM +0300, Denis V. Lunev wrote:
>>  From the point of guest each write to real disk prior to disk barrier
>> operation could be lost. Therefore there is no problem that "not synced"
>> new block is lost due to not updated allocation table if QEMU is crashed.
>>
>> This patch improves writing performance of
>>    qemu-img create -f parallels -o cluster_size=64k ./1.hds 64G
>>    qemu-io -f parallels -c "write -P 0x11 0 1024k" 1.hds
>> from 45 Mb/sec to 160 Mb/sec on my SSD disk. The gain on rotational media
>> is much more sufficient, from 800 Kb/sec to 45 Mb/sec.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>   block/parallels.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/parallels.c b/block/parallels.c
>> index ddc3aee..46cf031 100644
>> --- a/block/parallels.c
>> +++ b/block/parallels.c
>> @@ -194,7 +194,7 @@ static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
>>   
>>       tmp = cpu_to_le32(s->bat[idx]);
>>   
>> -    ret = bdrv_pwrite_sync(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
>> +    ret = bdrv_pwrite(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
>>       if (ret < 0) {
>>           return ret;
>>       }
> AFAICT this worsens the problem that existed before this patch:
>
> if the preceding bdrv_truncate(+cluster_size) gets reordered with this
> bat entry update, there's a risk that on poweroff (snapshot, etc.) the
> bat entry gets written to the storage while the file size is not
> updated.  As a result, the bat entry in the file would point at a
> cluster which hadn't been allocated yet.  When a new block is eventually
> added to the file, you'd have two bat entries pointing at the same
> cluster.
>
> The _sync version used to leave this window quite narrow due to the
> flush following the write.  The patch makes the reordering more likely.
>
> I'm afraid the only reliable way to handle it is to put a barrier
> between truncate and bat update, and mitigate the costs by batching the
> file expansions, as you seem to do in the followup patches.
>
>
> Roman.
no, I think that this will not happen.

Here we are under the lock which was was taken in
parallels_co_writev and thus 2 contexts are impossible.
On the other hand bdrv_truncate does not delegate
the job to the other context and is performed
immediately. This from my point of view means
that truncate is performed always first and from
the safe context.

Regards,
     Den

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 16/19] block/parallels: no need to flush on each block allocation table update
  2015-01-13 15:17       ` Denis V. Lunev
@ 2015-01-13 20:16         ` Denis V. Lunev
  2015-01-14 12:01           ` Roman Kagan
  0 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2015-01-13 20:16 UTC (permalink / raw)
  To: Roman Kagan, Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 13/01/15 18:17, Denis V. Lunev wrote:
> On 13/01/15 17:50, Roman Kagan wrote:
>> On Tue, Dec 30, 2014 at 01:07:09PM +0300, Denis V. Lunev wrote:
>>>  From the point of guest each write to real disk prior to disk barrier
>>> operation could be lost. Therefore there is no problem that "not 
>>> synced"
>>> new block is lost due to not updated allocation table if QEMU is 
>>> crashed.
>>>
>>> This patch improves writing performance of
>>>    qemu-img create -f parallels -o cluster_size=64k ./1.hds 64G
>>>    qemu-io -f parallels -c "write -P 0x11 0 1024k" 1.hds
>>> from 45 Mb/sec to 160 Mb/sec on my SSD disk. The gain on rotational 
>>> media
>>> is much more sufficient, from 800 Kb/sec to 45 Mb/sec.
>>>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Kevin Wolf <kwolf@redhat.com>
>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>   block/parallels.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/parallels.c b/block/parallels.c
>>> index ddc3aee..46cf031 100644
>>> --- a/block/parallels.c
>>> +++ b/block/parallels.c
>>> @@ -194,7 +194,7 @@ static int64_t allocate_sector(BlockDriverState 
>>> *bs, int64_t sector_num)
>>>         tmp = cpu_to_le32(s->bat[idx]);
>>>   -    ret = bdrv_pwrite_sync(bs->file, bat_offset(idx), &tmp, 
>>> sizeof(tmp));
>>> +    ret = bdrv_pwrite(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
>>>       if (ret < 0) {
>>>           return ret;
>>>       }
>> AFAICT this worsens the problem that existed before this patch:
>>
>> if the preceding bdrv_truncate(+cluster_size) gets reordered with this
>> bat entry update, there's a risk that on poweroff (snapshot, etc.) the
>> bat entry gets written to the storage while the file size is not
>> updated.  As a result, the bat entry in the file would point at a
>> cluster which hadn't been allocated yet.  When a new block is eventually
>> added to the file, you'd have two bat entries pointing at the same
>> cluster.
>>
>> The _sync version used to leave this window quite narrow due to the
>> flush following the write.  The patch makes the reordering more likely.
>>
>> I'm afraid the only reliable way to handle it is to put a barrier
>> between truncate and bat update, and mitigate the costs by batching the
>> file expansions, as you seem to do in the followup patches.
>>
>>
>> Roman.
> no, I think that this will not happen.
>
> Here we are under the lock which was was taken in
> parallels_co_writev and thus 2 contexts are impossible.
> On the other hand bdrv_truncate does not delegate
> the job to the other context and is performed
> immediately. This from my point of view means
> that truncate is performed always first and from
> the safe context.
>
> Regards,
>     Den


Thinking a bit more on this. IMHO there is no problem
with a normal workflow, the problem could come
from a crash when the file state and a BAT state will
be not consistent and the BAT entry will point out
of the file (could point out of the file).

But it is not possible to fix at all this way. Single sync
does not provide much warranty. This should be
dealt in the other way, like done in QCOW2.

We should write proper magic into ParallelsHeader->inuse
on open and call sync immediately. On subsequent
open of the broken image we should perform
check consistency. The magic should be set to 0
on clean close.

I think that this stuff could be implemented separately
in the next patchset. If you this that this is mandatory,
OK, I can do that, be this will increase patchset a lot.
Check consistency is not an easy stuff.

Regards,
     Den

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 16/19] block/parallels: no need to flush on each block allocation table update
  2015-01-13 20:16         ` Denis V. Lunev
@ 2015-01-14 12:01           ` Roman Kagan
  2015-01-14 12:30             ` Denis V. Lunev
  0 siblings, 1 reply; 38+ messages in thread
From: Roman Kagan @ 2015-01-14 12:01 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Tue, Jan 13, 2015 at 11:16:04PM +0300, Denis V. Lunev wrote:
> On 13/01/15 18:17, Denis V. Lunev wrote:
> >On 13/01/15 17:50, Roman Kagan wrote:
> >>On Tue, Dec 30, 2014 at 01:07:09PM +0300, Denis V. Lunev wrote:
> >>>--- a/block/parallels.c
> >>>+++ b/block/parallels.c
> >>>@@ -194,7 +194,7 @@ static int64_t allocate_sector(BlockDriverState
> >>>*bs, int64_t sector_num)
> >>>        tmp = cpu_to_le32(s->bat[idx]);
> >>>  -    ret = bdrv_pwrite_sync(bs->file, bat_offset(idx), &tmp,
> >>>sizeof(tmp));
> >>>+    ret = bdrv_pwrite(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
> >>>      if (ret < 0) {
> >>>          return ret;
> >>>      }
> >>if the preceding bdrv_truncate(+cluster_size) gets reordered with this
> >>bat entry update, there's a risk that on poweroff (snapshot, etc.) the
> >>bat entry gets written to the storage while the file size is not
> >>updated.  As a result, the bat entry in the file would point at a
> >>cluster which hadn't been allocated yet.  When a new block is eventually
> >>added to the file, you'd have two bat entries pointing at the same
> >>cluster.
> >>
> >>The _sync version used to leave this window quite narrow due to the
> >>flush following the write.  The patch makes the reordering more likely.
> >>
> >>I'm afraid the only reliable way to handle it is to put a barrier
> >>between truncate and bat update, and mitigate the costs by batching the
> >>file expansions, as you seem to do in the followup patches.
>
> Thinking a bit more on this. IMHO there is no problem
> with a normal workflow, the problem could come
> from a crash when the file state and a BAT state will
> be not consistent and the BAT entry will point out
> of the file (could point out of the file).

Yes that's exactly the scenario I've been talking about.

> But it is not possible to fix at all this way. Single sync
> does not provide much warranty.

How's that?

> We should write proper magic into ParallelsHeader->inuse
> on open and call sync immediately. On subsequent
> open of the broken image we should perform
> check consistency. The magic should be set to 0
> on clean close.

This seems to address the problem, but only if this protocol is adhered
to by all the tools that may access the image (Parallels Desktop,
Parallels Server, ploop, etc.).  Furthermore, if it is, it *has* to be
adhered to by this code, too, while adding write support, otherwise
we'll break the assumptions of those other tools.

> I think that this stuff could be implemented separately
> in the next patchset. If you this that this is mandatory,
> OK, I can do that, be this will increase patchset a lot.
> Check consistency is not an easy stuff.

I think as a 1st approximation, instead of a full-fledged consistency
check and repair, just refusing write access to the image that has
->inuse set would suffice.

Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 16/19] block/parallels: no need to flush on each block allocation table update
  2015-01-14 12:01           ` Roman Kagan
@ 2015-01-14 12:30             ` Denis V. Lunev
  0 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2015-01-14 12:30 UTC (permalink / raw)
  To: Roman Kagan, Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 14/01/15 15:01, Roman Kagan wrote:
> On Tue, Jan 13, 2015 at 11:16:04PM +0300, Denis V. Lunev wrote:
>> On 13/01/15 18:17, Denis V. Lunev wrote:
>>> On 13/01/15 17:50, Roman Kagan wrote:
>>>> On Tue, Dec 30, 2014 at 01:07:09PM +0300, Denis V. Lunev wrote:
>>>>> --- a/block/parallels.c
>>>>> +++ b/block/parallels.c
>>>>> @@ -194,7 +194,7 @@ static int64_t allocate_sector(BlockDriverState
>>>>> *bs, int64_t sector_num)
>>>>>         tmp = cpu_to_le32(s->bat[idx]);
>>>>>   -    ret = bdrv_pwrite_sync(bs->file, bat_offset(idx), &tmp,
>>>>> sizeof(tmp));
>>>>> +    ret = bdrv_pwrite(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
>>>>>       if (ret < 0) {
>>>>>           return ret;
>>>>>       }
>>>> if the preceding bdrv_truncate(+cluster_size) gets reordered with this
>>>> bat entry update, there's a risk that on poweroff (snapshot, etc.) the
>>>> bat entry gets written to the storage while the file size is not
>>>> updated.  As a result, the bat entry in the file would point at a
>>>> cluster which hadn't been allocated yet.  When a new block is eventually
>>>> added to the file, you'd have two bat entries pointing at the same
>>>> cluster.
>>>>
>>>> The _sync version used to leave this window quite narrow due to the
>>>> flush following the write.  The patch makes the reordering more likely.
>>>>
>>>> I'm afraid the only reliable way to handle it is to put a barrier
>>>> between truncate and bat update, and mitigate the costs by batching the
>>>> file expansions, as you seem to do in the followup patches.
>> Thinking a bit more on this. IMHO there is no problem
>> with a normal workflow, the problem could come
>> from a crash when the file state and a BAT state will
>> be not consistent and the BAT entry will point out
>> of the file (could point out of the file).
> Yes that's exactly the scenario I've been talking about.
>
>> But it is not possible to fix at all this way. Single sync
>> does not provide much warranty.
> How's that?
>
>> We should write proper magic into ParallelsHeader->inuse
>> on open and call sync immediately. On subsequent
>> open of the broken image we should perform
>> check consistency. The magic should be set to 0
>> on clean close.
> This seems to address the problem, but only if this protocol is adhered
> to by all the tools that may access the image (Parallels Desktop,
> Parallels Server, ploop, etc.).  Furthermore, if it is, it *has* to be
> adhered to by this code, too, while adding write support, otherwise
> we'll break the assumptions of those other tools.
>
>> I think that this stuff could be implemented separately
>> in the next patchset. If you this that this is mandatory,
>> OK, I can do that, be this will increase patchset a lot.
>> Check consistency is not an easy stuff.
> I think as a 1st approximation, instead of a full-fledged consistency
> check and repair, just refusing write access to the image that has
> ->inuse set would suffice.
>
> Roman.
ok, seems reasonable

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 17/19] block/parallels: delay writing to BAT till bdrv_co_flush_to_os
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 17/19] block/parallels: delay writing to BAT till bdrv_co_flush_to_os Denis V. Lunev
@ 2015-01-14 13:03     ` Roman Kagan
  2015-01-14 13:08       ` Denis V. Lunev
  0 siblings, 1 reply; 38+ messages in thread
From: Roman Kagan @ 2015-01-14 13:03 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Tue, Dec 30, 2014 at 01:07:10PM +0300, Denis V. Lunev wrote:
> The idea is that we do not need to immediately sync BAT to the image as
> from the guest point of view there is a possibility that IO is lost
> even in the physical controller until flush command was finished.
> bdrv_co_flush_to_os is exactly the right place for this purpose.
> 
> Technically the patch aligns writes into BAT to MAX(bdrv_align, 4096),
> which elliminates read-modify-write transactions on BAT update and
> cache ready-to-write content in a special buffer in BDRVParallelsState.
> 
> This buffer possibly contains ParallelsHeader if the first page of the
> image should be modified. The header occupies first 64 bytes of the image
> and the BAT starts immediately after it.
> 
> It is also possible that BAT end is not aligned to the cluster size.
> ParallelsHeader->data_off is not specified for this case. We should write
> only part of the cache in that case.
> 
> This patch speed ups
>   qemu-img create -f parallels -o cluster_size=64k ./1.hds 64G
>   qemu-io -f parallels -c "write -P 0x11 0 1024k" 1.hds
> writing from 50-60 Mb/sec to 80-90 Mb/sec on rotational media and
> from 160 Mb/sec to 190 Mb/sec on SSD disk.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/parallels.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 94 insertions(+), 5 deletions(-)
> 
> diff --git a/block/parallels.c b/block/parallels.c
> index 46cf031..18b9267 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -62,6 +62,11 @@ typedef struct BDRVParallelsState {
>      uint32_t *bat;
>      unsigned int bat_size;
>  
> +    uint32_t *bat_cache;
> +    unsigned bat_cache_size;
> +    int bat_cache_off;
> +    int data_off;
> +
>      unsigned int tracks;
>  
>      unsigned int off_multiplier;
> @@ -148,6 +153,17 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
>          le32_to_cpus(&s->bat[i]);
>  
>      qemu_co_mutex_init(&s->lock);
> +
> +    s->bat_cache_off = -1;
> +    if (bs->open_flags & BDRV_O_RDWR) {
> +        s->bat_cache_size = MAX(bdrv_opt_mem_align(bs->file), 4096);
> +        s->bat_cache = qemu_blockalign(bs->file, s->bat_cache_size);
> +    }
> +    s->data_off = le32_to_cpu(s->ph.data_off) * BDRV_SECTOR_SIZE;
> +    if (s->data_off == 0) {
> +        s->data_off = ROUND_UP(bat_offset(s->bat_size), BDRV_SECTOR_SIZE);
> +    }
> +
>      return 0;
>  
>  fail_format:
> @@ -171,10 +187,71 @@ static int64_t seek_to_sector(BDRVParallelsState *s, int64_t sector_num)
>      return (uint64_t)s->bat[index] * s->off_multiplier + offset;
>  }
>  
> +static int write_bat_cache(BlockDriverState *bs)
> +{
> +    BDRVParallelsState *s = bs->opaque;
> +    int size, off;
> +
> +    if (s->bat_cache_off == -1) {
> +        /* no cached data */
> +        return 0;
> +    }
> +
> +    size = s->bat_cache_size;
> +    if (size + s->bat_cache_off > s->data_off) {
> +        /* avoid writing to the first data block */
> +        size = s->data_off - s->bat_cache_off;
> +    }
> +
> +    off = s->bat_cache_off;
> +    s->bat_cache_off = -1;
> +    return bdrv_pwrite(bs->file, off, s->bat_cache, size);
> +}
> +
> +static int cache_bat(BlockDriverState *bs, uint32_t idx, uint32_t new_data_off)
> +{
> +    int ret, i, off, cache_off;
> +    int64_t first_idx, last_idx;
> +    BDRVParallelsState *s = bs->opaque;
> +    uint32_t *cache = s->bat_cache;
> +
> +    off = bat_offset(idx);
> +    cache_off = (off / s->bat_cache_size) * s->bat_cache_size;
> +
> +    if (s->bat_cache_off != -1 && s->bat_cache_off != cache_off) {
> +        ret = write_bat_cache(bs);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
> +    first_idx = idx - (off - cache_off) / sizeof(uint32_t);
> +    last_idx = first_idx + s->bat_cache_size / sizeof(uint32_t);
> +    if (first_idx < 0) {
> +        memcpy(s->bat_cache, &s->ph, sizeof(s->ph));
> +        first_idx = 0;
> +        cache = s->bat_cache + sizeof(s->ph) / sizeof(uint32_t);
> +    }
> +
> +    if (last_idx > s->bat_size) {
> +        memset(cache + s->bat_size - first_idx, 0,
> +               sizeof(uint32_t) * (last_idx - s->bat_size));
> +    }
> +
> +    for (i = 0; i < last_idx - first_idx; i++) {
> +        cache[i] = cpu_to_le32(s->bat[first_idx + i]);
> +    }

You're re-populating the bat_cache on every bat update.  Why?

Shouldn't this be done only with empty cache, i.e. under
if (s->bat_cache_off == -1)?

> +    cache[idx - first_idx] = cpu_to_le32(new_data_off);
> +    s->bat[idx] = new_data_off;
> +
> +    s->bat_cache_off = cache_off;
> +    return 0;
> +}
> +
>  static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
>  {
>      BDRVParallelsState *s = bs->opaque;
> -    uint32_t idx, offset, tmp;
> +    uint32_t idx, offset;
>      int64_t pos;
>      int ret;
>  
> @@ -190,17 +267,27 @@ static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
>  
>      pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
>      bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
> -    s->bat[idx] = pos / s->off_multiplier;
> -
> -    tmp = cpu_to_le32(s->bat[idx]);
>  
> -    ret = bdrv_pwrite(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
> +    ret = cache_bat(bs, idx, pos / s->off_multiplier);
>      if (ret < 0) {
>          return ret;
>      }
>      return (uint64_t)s->bat[idx] * s->off_multiplier + offset;
>  }
>  
> +static coroutine_fn int parallels_co_flush_to_os(BlockDriverState *bs)
> +{
> +    BDRVParallelsState *s = bs->opaque;
> +    int ret;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    ret = write_bat_cache(bs);
> +    qemu_co_mutex_unlock(&s->lock);
> +
> +    return ret;
> +}
> +
> +
>  static int cluster_remainder(BDRVParallelsState *s, int64_t sector_num,
>          int nb_sectors)
>  {
> @@ -387,6 +474,7 @@ exit:
>  static void parallels_close(BlockDriverState *bs)
>  {
>      BDRVParallelsState *s = bs->opaque;
> +    qemu_vfree(s->bat_cache);

Don't you need to flush the bat cache here first?

>      g_free(s->bat);
>  }
>  
> @@ -416,6 +504,7 @@ static BlockDriver bdrv_parallels = {
>      .bdrv_open		= parallels_open,
>      .bdrv_close		= parallels_close,
>      .bdrv_co_get_block_status = parallels_co_get_block_status,
> +    .bdrv_co_flush_to_os      = parallels_co_flush_to_os,
>      .bdrv_co_readv  = parallels_co_readv,
>      .bdrv_co_writev = parallels_co_writev,

Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 17/19] block/parallels: delay writing to BAT till bdrv_co_flush_to_os
  2015-01-14 13:03     ` Roman Kagan
@ 2015-01-14 13:08       ` Denis V. Lunev
  2015-01-14 13:34         ` Roman Kagan
  0 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2015-01-14 13:08 UTC (permalink / raw)
  To: Roman Kagan, Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 14/01/15 16:03, Roman Kagan wrote:
> On Tue, Dec 30, 2014 at 01:07:10PM +0300, Denis V. Lunev wrote:
>> The idea is that we do not need to immediately sync BAT to the image as
>> from the guest point of view there is a possibility that IO is lost
>> even in the physical controller until flush command was finished.
>> bdrv_co_flush_to_os is exactly the right place for this purpose.
>>
>> Technically the patch aligns writes into BAT to MAX(bdrv_align, 4096),
>> which elliminates read-modify-write transactions on BAT update and
>> cache ready-to-write content in a special buffer in BDRVParallelsState.
>>
>> This buffer possibly contains ParallelsHeader if the first page of the
>> image should be modified. The header occupies first 64 bytes of the image
>> and the BAT starts immediately after it.
>>
>> It is also possible that BAT end is not aligned to the cluster size.
>> ParallelsHeader->data_off is not specified for this case. We should write
>> only part of the cache in that case.
>>
>> This patch speed ups
>>    qemu-img create -f parallels -o cluster_size=64k ./1.hds 64G
>>    qemu-io -f parallels -c "write -P 0x11 0 1024k" 1.hds
>> writing from 50-60 Mb/sec to 80-90 Mb/sec on rotational media and
>> from 160 Mb/sec to 190 Mb/sec on SSD disk.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>   block/parallels.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 94 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/parallels.c b/block/parallels.c
>> index 46cf031..18b9267 100644
>> --- a/block/parallels.c
>> +++ b/block/parallels.c
>> @@ -62,6 +62,11 @@ typedef struct BDRVParallelsState {
>>       uint32_t *bat;
>>       unsigned int bat_size;
>>   
>> +    uint32_t *bat_cache;
>> +    unsigned bat_cache_size;
>> +    int bat_cache_off;
>> +    int data_off;
>> +
>>       unsigned int tracks;
>>   
>>       unsigned int off_multiplier;
>> @@ -148,6 +153,17 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
>>           le32_to_cpus(&s->bat[i]);
>>   
>>       qemu_co_mutex_init(&s->lock);
>> +
>> +    s->bat_cache_off = -1;
>> +    if (bs->open_flags & BDRV_O_RDWR) {
>> +        s->bat_cache_size = MAX(bdrv_opt_mem_align(bs->file), 4096);
>> +        s->bat_cache = qemu_blockalign(bs->file, s->bat_cache_size);
>> +    }
>> +    s->data_off = le32_to_cpu(s->ph.data_off) * BDRV_SECTOR_SIZE;
>> +    if (s->data_off == 0) {
>> +        s->data_off = ROUND_UP(bat_offset(s->bat_size), BDRV_SECTOR_SIZE);
>> +    }
>> +
>>       return 0;
>>   
>>   fail_format:
>> @@ -171,10 +187,71 @@ static int64_t seek_to_sector(BDRVParallelsState *s, int64_t sector_num)
>>       return (uint64_t)s->bat[index] * s->off_multiplier + offset;
>>   }
>>   
>> +static int write_bat_cache(BlockDriverState *bs)
>> +{
>> +    BDRVParallelsState *s = bs->opaque;
>> +    int size, off;
>> +
>> +    if (s->bat_cache_off == -1) {
>> +        /* no cached data */
>> +        return 0;
>> +    }
>> +
>> +    size = s->bat_cache_size;
>> +    if (size + s->bat_cache_off > s->data_off) {
>> +        /* avoid writing to the first data block */
>> +        size = s->data_off - s->bat_cache_off;
>> +    }
>> +
>> +    off = s->bat_cache_off;
>> +    s->bat_cache_off = -1;
>> +    return bdrv_pwrite(bs->file, off, s->bat_cache, size);
>> +}
>> +
>> +static int cache_bat(BlockDriverState *bs, uint32_t idx, uint32_t new_data_off)
>> +{
>> +    int ret, i, off, cache_off;
>> +    int64_t first_idx, last_idx;
>> +    BDRVParallelsState *s = bs->opaque;
>> +    uint32_t *cache = s->bat_cache;
>> +
>> +    off = bat_offset(idx);
>> +    cache_off = (off / s->bat_cache_size) * s->bat_cache_size;
>> +
>> +    if (s->bat_cache_off != -1 && s->bat_cache_off != cache_off) {
>> +        ret = write_bat_cache(bs);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    first_idx = idx - (off - cache_off) / sizeof(uint32_t);
>> +    last_idx = first_idx + s->bat_cache_size / sizeof(uint32_t);
>> +    if (first_idx < 0) {
>> +        memcpy(s->bat_cache, &s->ph, sizeof(s->ph));
>> +        first_idx = 0;
>> +        cache = s->bat_cache + sizeof(s->ph) / sizeof(uint32_t);
>> +    }
>> +
>> +    if (last_idx > s->bat_size) {
>> +        memset(cache + s->bat_size - first_idx, 0,
>> +               sizeof(uint32_t) * (last_idx - s->bat_size));
>> +    }
>> +
>> +    for (i = 0; i < last_idx - first_idx; i++) {
>> +        cache[i] = cpu_to_le32(s->bat[first_idx + i]);
>> +    }
> You're re-populating the bat_cache on every bat update.  Why?
>
> Shouldn't this be done only with empty cache, i.e. under
> if (s->bat_cache_off == -1)?
reasonable, but condition should be a bit different, namely

if (s->bat_cache_off != -1 && s->bat_cache_off != cache_off) {



>
>> +    cache[idx - first_idx] = cpu_to_le32(new_data_off);
>> +    s->bat[idx] = new_data_off;
>> +
>> +    s->bat_cache_off = cache_off;
>> +    return 0;
>> +}
>> +
>>   static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
>>   {
>>       BDRVParallelsState *s = bs->opaque;
>> -    uint32_t idx, offset, tmp;
>> +    uint32_t idx, offset;
>>       int64_t pos;
>>       int ret;
>>   
>> @@ -190,17 +267,27 @@ static int64_t allocate_sector(BlockDriverState *bs, int64_t sector_num)
>>   
>>       pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
>>       bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
>> -    s->bat[idx] = pos / s->off_multiplier;
>> -
>> -    tmp = cpu_to_le32(s->bat[idx]);
>>   
>> -    ret = bdrv_pwrite(bs->file, bat_offset(idx), &tmp, sizeof(tmp));
>> +    ret = cache_bat(bs, idx, pos / s->off_multiplier);
>>       if (ret < 0) {
>>           return ret;
>>       }
>>       return (uint64_t)s->bat[idx] * s->off_multiplier + offset;
>>   }
>>   
>> +static coroutine_fn int parallels_co_flush_to_os(BlockDriverState *bs)
>> +{
>> +    BDRVParallelsState *s = bs->opaque;
>> +    int ret;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    ret = write_bat_cache(bs);
>> +    qemu_co_mutex_unlock(&s->lock);
>> +
>> +    return ret;
>> +}
>> +
>> +
>>   static int cluster_remainder(BDRVParallelsState *s, int64_t sector_num,
>>           int nb_sectors)
>>   {
>> @@ -387,6 +474,7 @@ exit:
>>   static void parallels_close(BlockDriverState *bs)
>>   {
>>       BDRVParallelsState *s = bs->opaque;
>> +    qemu_vfree(s->bat_cache);
> Don't you need to flush the bat cache here first?

parallels_co_flush_to_os should happen beforehand


void bdrv_close(BlockDriverState *bs)
{
     ....
     bdrv_flush(bs); <-- namely inside this


>>       g_free(s->bat);
>>   }
>>   
>> @@ -416,6 +504,7 @@ static BlockDriver bdrv_parallels = {
>>       .bdrv_open		= parallels_open,
>>       .bdrv_close		= parallels_close,
>>       .bdrv_co_get_block_status = parallels_co_get_block_status,
>> +    .bdrv_co_flush_to_os      = parallels_co_flush_to_os,
>>       .bdrv_co_readv  = parallels_co_readv,
>>       .bdrv_co_writev = parallels_co_writev,
> Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 17/19] block/parallels: delay writing to BAT till bdrv_co_flush_to_os
  2015-01-14 13:08       ` Denis V. Lunev
@ 2015-01-14 13:34         ` Roman Kagan
  2015-01-14 14:29           ` Denis V. Lunev
  0 siblings, 1 reply; 38+ messages in thread
From: Roman Kagan @ 2015-01-14 13:34 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Wed, Jan 14, 2015 at 04:08:50PM +0300, Denis V. Lunev wrote:
> On 14/01/15 16:03, Roman Kagan wrote:
> >On Tue, Dec 30, 2014 at 01:07:10PM +0300, Denis V. Lunev wrote:
> >>+static int cache_bat(BlockDriverState *bs, uint32_t idx, uint32_t new_data_off)
> >>+{
> >>+    int ret, i, off, cache_off;
> >>+    int64_t first_idx, last_idx;
> >>+    BDRVParallelsState *s = bs->opaque;
> >>+    uint32_t *cache = s->bat_cache;
> >>+
> >>+    off = bat_offset(idx);
> >>+    cache_off = (off / s->bat_cache_size) * s->bat_cache_size;
> >>+
> >>+    if (s->bat_cache_off != -1 && s->bat_cache_off != cache_off) {
> >>+        ret = write_bat_cache(bs);
> >>+        if (ret < 0) {
> >>+            return ret;
> >>+        }
> >>+    }
> >>+
> >>+    first_idx = idx - (off - cache_off) / sizeof(uint32_t);
> >>+    last_idx = first_idx + s->bat_cache_size / sizeof(uint32_t);
> >>+    if (first_idx < 0) {
> >>+        memcpy(s->bat_cache, &s->ph, sizeof(s->ph));
> >>+        first_idx = 0;
> >>+        cache = s->bat_cache + sizeof(s->ph) / sizeof(uint32_t);
> >>+    }
> >>+
> >>+    if (last_idx > s->bat_size) {
> >>+        memset(cache + s->bat_size - first_idx, 0,
> >>+               sizeof(uint32_t) * (last_idx - s->bat_size));
> >>+    }
> >>+
> >>+    for (i = 0; i < last_idx - first_idx; i++) {
> >>+        cache[i] = cpu_to_le32(s->bat[first_idx + i]);
> >>+    }
> >You're re-populating the bat_cache on every bat update.  Why?
> >
> >Shouldn't this be done only with empty cache, i.e. under
> >if (s->bat_cache_off == -1)?
> reasonable, but condition should be a bit different, namely
> 
> if (s->bat_cache_off != -1 && s->bat_cache_off != cache_off) {

No, this is the condition to flush the cache which you already do.

Then, either upon flushing it or on the first entry into cache_bat(),
the cache is empty which is indicated by ->bat_cache_off == -1, at which
point you need to populate it from ->bat.

> >>  static void parallels_close(BlockDriverState *bs)
> >>  {
> >>      BDRVParallelsState *s = bs->opaque;
> >>+    qemu_vfree(s->bat_cache);
> >Don't you need to flush the bat cache here first?
> 
> parallels_co_flush_to_os should happen beforehand
> 
> 
> void bdrv_close(BlockDriverState *bs)
> {
>     ....
>     bdrv_flush(bs); <-- namely inside this

Indeed.

Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 18/19] block/parallels: add prealloc-mode and prealloc-size open paramemets
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 18/19] block/parallels: add prealloc-mode and prealloc-size open paramemets Denis V. Lunev
@ 2015-01-14 14:26     ` Roman Kagan
  2015-01-14 14:31       ` Denis V. Lunev
  0 siblings, 1 reply; 38+ messages in thread
From: Roman Kagan @ 2015-01-14 14:26 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Tue, Dec 30, 2014 at 01:07:11PM +0300, Denis V. Lunev wrote:
> This is preparational commit for tweaks in Parallels image expansion.
> The idea is that enlarge via truncate by one data block is slow. It
> would be much better to use fallocate via bdrv_write_zeroes and
> expand by some significant amount at once.
> 
> This patch just adds proper parameters into BDRVParallelsState and
> performs options parsing in parallels_open.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/parallels.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 72 insertions(+)
> 
> diff --git a/block/parallels.c b/block/parallels.c
> index 18b9267..12a9cea 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -30,6 +30,7 @@
>  #include "qemu-common.h"
>  #include "block/block_int.h"
>  #include "qemu/module.h"
> +#include "qapi/util.h"
>  
>  /**************************************************************/
>  
> @@ -54,6 +55,20 @@ typedef struct ParallelsHeader {
>      char padding[12];
>  } QEMU_PACKED ParallelsHeader;
>  
> +
> +typedef enum ParallelsPreallocMode {
> +    PRL_PREALLOC_MODE_FALLOCATE = 0,
> +    PRL_PREALLOC_MODE_TRUNCATE = 1,
> +    PRL_PREALLOC_MODE_MAX = 2,
> +} ParallelsPreallocMode;
> +
> +static const char *prealloc_mode_lookup[] = {
> +    "falloc",
> +    "truncate",
> +    NULL,

There is already generic "preallocaton" option, BLOCK_OPT_PREALLOC,
which is handled by qcow2 and raw-posix.  It means slightly different
thing: the *whole* image is preallocated using the method specified.

I think it would make sense to consolidate that option with this new
batched allocation in the generic block code.  I guess qcow2 and
raw-posix would benefit from it, too.

At any rate I think it's a matter for a separate patchset.

Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 17/19] block/parallels: delay writing to BAT till bdrv_co_flush_to_os
  2015-01-14 13:34         ` Roman Kagan
@ 2015-01-14 14:29           ` Denis V. Lunev
  2015-01-14 16:44             ` Denis V. Lunev
  0 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2015-01-14 14:29 UTC (permalink / raw)
  To: Roman Kagan, Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 14/01/15 16:34, Roman Kagan wrote:
> On Wed, Jan 14, 2015 at 04:08:50PM +0300, Denis V. Lunev wrote:
>> On 14/01/15 16:03, Roman Kagan wrote:
>>> On Tue, Dec 30, 2014 at 01:07:10PM +0300, Denis V. Lunev wrote:
>>>> +static int cache_bat(BlockDriverState *bs, uint32_t idx, uint32_t new_data_off)
>>>> +{
>>>> +    int ret, i, off, cache_off;
>>>> +    int64_t first_idx, last_idx;
>>>> +    BDRVParallelsState *s = bs->opaque;
>>>> +    uint32_t *cache = s->bat_cache;
>>>> +
>>>> +    off = bat_offset(idx);
>>>> +    cache_off = (off / s->bat_cache_size) * s->bat_cache_size;
>>>> +
>>>> +    if (s->bat_cache_off != -1 && s->bat_cache_off != cache_off) {
>>>> +        ret = write_bat_cache(bs);
>>>> +        if (ret < 0) {
>>>> +            return ret;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    first_idx = idx - (off - cache_off) / sizeof(uint32_t);
>>>> +    last_idx = first_idx + s->bat_cache_size / sizeof(uint32_t);
>>>> +    if (first_idx < 0) {
>>>> +        memcpy(s->bat_cache, &s->ph, sizeof(s->ph));
>>>> +        first_idx = 0;
>>>> +        cache = s->bat_cache + sizeof(s->ph) / sizeof(uint32_t);
>>>> +    }
>>>> +
>>>> +    if (last_idx > s->bat_size) {
>>>> +        memset(cache + s->bat_size - first_idx, 0,
>>>> +               sizeof(uint32_t) * (last_idx - s->bat_size));
>>>> +    }
>>>> +
>>>> +    for (i = 0; i < last_idx - first_idx; i++) {
>>>> +        cache[i] = cpu_to_le32(s->bat[first_idx + i]);
>>>> +    }
>>> You're re-populating the bat_cache on every bat update.  Why?
>>>
>>> Shouldn't this be done only with empty cache, i.e. under
>>> if (s->bat_cache_off == -1)?
>> reasonable, but condition should be a bit different, namely
>>
>> if (s->bat_cache_off != -1 && s->bat_cache_off != cache_off) {
> No, this is the condition to flush the cache which you already do.
>
> Then, either upon flushing it or on the first entry into cache_bat(),
> the cache is empty which is indicated by ->bat_cache_off == -1, at which
> point you need to populate it from ->bat.

you are wrong. BAT cache is a single page of the whole BAT
which can be more than several MBs. This page should be
repopulated when the off is changed.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 18/19] block/parallels: add prealloc-mode and prealloc-size open paramemets
  2015-01-14 14:26     ` Roman Kagan
@ 2015-01-14 14:31       ` Denis V. Lunev
  2015-01-14 17:22         ` Roman Kagan
  0 siblings, 1 reply; 38+ messages in thread
From: Denis V. Lunev @ 2015-01-14 14:31 UTC (permalink / raw)
  To: Roman Kagan, Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 14/01/15 17:26, Roman Kagan wrote:
> On Tue, Dec 30, 2014 at 01:07:11PM +0300, Denis V. Lunev wrote:
>> This is preparational commit for tweaks in Parallels image expansion.
>> The idea is that enlarge via truncate by one data block is slow. It
>> would be much better to use fallocate via bdrv_write_zeroes and
>> expand by some significant amount at once.
>>
>> This patch just adds proper parameters into BDRVParallelsState and
>> performs options parsing in parallels_open.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>   block/parallels.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 72 insertions(+)
>>
>> diff --git a/block/parallels.c b/block/parallels.c
>> index 18b9267..12a9cea 100644
>> --- a/block/parallels.c
>> +++ b/block/parallels.c
>> @@ -30,6 +30,7 @@
>>   #include "qemu-common.h"
>>   #include "block/block_int.h"
>>   #include "qemu/module.h"
>> +#include "qapi/util.h"
>>   
>>   /**************************************************************/
>>   
>> @@ -54,6 +55,20 @@ typedef struct ParallelsHeader {
>>       char padding[12];
>>   } QEMU_PACKED ParallelsHeader;
>>   
>> +
>> +typedef enum ParallelsPreallocMode {
>> +    PRL_PREALLOC_MODE_FALLOCATE = 0,
>> +    PRL_PREALLOC_MODE_TRUNCATE = 1,
>> +    PRL_PREALLOC_MODE_MAX = 2,
>> +} ParallelsPreallocMode;
>> +
>> +static const char *prealloc_mode_lookup[] = {
>> +    "falloc",
>> +    "truncate",
>> +    NULL,
> There is already generic "preallocaton" option, BLOCK_OPT_PREALLOC,
> which is handled by qcow2 and raw-posix.  It means slightly different
> thing: the *whole* image is preallocated using the method specified.
>
> I think it would make sense to consolidate that option with this new
> batched allocation in the generic block code.  I guess qcow2 and
> raw-posix would benefit from it, too.
>
> At any rate I think it's a matter for a separate patchset.
>
> Roman.
it is too early :) I think that I should provide the rationale for
the preallocation in general. I am working on this with CEPH :)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 17/19] block/parallels: delay writing to BAT till bdrv_co_flush_to_os
  2015-01-14 14:29           ` Denis V. Lunev
@ 2015-01-14 16:44             ` Denis V. Lunev
  0 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2015-01-14 16:44 UTC (permalink / raw)
  To: Roman Kagan, Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 14/01/15 17:29, Denis V. Lunev wrote:
> On 14/01/15 16:34, Roman Kagan wrote:
>> On Wed, Jan 14, 2015 at 04:08:50PM +0300, Denis V. Lunev wrote:
>>> On 14/01/15 16:03, Roman Kagan wrote:
>>>> On Tue, Dec 30, 2014 at 01:07:10PM +0300, Denis V. Lunev wrote:
>>>>> +static int cache_bat(BlockDriverState *bs, uint32_t idx, uint32_t 
>>>>> new_data_off)
>>>>> +{
>>>>> +    int ret, i, off, cache_off;
>>>>> +    int64_t first_idx, last_idx;
>>>>> +    BDRVParallelsState *s = bs->opaque;
>>>>> +    uint32_t *cache = s->bat_cache;
>>>>> +
>>>>> +    off = bat_offset(idx);
>>>>> +    cache_off = (off / s->bat_cache_size) * s->bat_cache_size;
>>>>> +
>>>>> +    if (s->bat_cache_off != -1 && s->bat_cache_off != cache_off) {
>>>>> +        ret = write_bat_cache(bs);
>>>>> +        if (ret < 0) {
>>>>> +            return ret;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    first_idx = idx - (off - cache_off) / sizeof(uint32_t);
>>>>> +    last_idx = first_idx + s->bat_cache_size / sizeof(uint32_t);
>>>>> +    if (first_idx < 0) {
>>>>> +        memcpy(s->bat_cache, &s->ph, sizeof(s->ph));
>>>>> +        first_idx = 0;
>>>>> +        cache = s->bat_cache + sizeof(s->ph) / sizeof(uint32_t);
>>>>> +    }
>>>>> +
>>>>> +    if (last_idx > s->bat_size) {
>>>>> +        memset(cache + s->bat_size - first_idx, 0,
>>>>> +               sizeof(uint32_t) * (last_idx - s->bat_size));
>>>>> +    }
>>>>> +
>>>>> +    for (i = 0; i < last_idx - first_idx; i++) {
>>>>> +        cache[i] = cpu_to_le32(s->bat[first_idx + i]);
>>>>> +    }
>>>> You're re-populating the bat_cache on every bat update. Why?
>>>>
>>>> Shouldn't this be done only with empty cache, i.e. under
>>>> if (s->bat_cache_off == -1)?
>>> reasonable, but condition should be a bit different, namely
>>>
>>> if (s->bat_cache_off != -1 && s->bat_cache_off != cache_off) {
>> No, this is the condition to flush the cache which you already do.
>>
>> Then, either upon flushing it or on the first entry into cache_bat(),
>> the cache is empty which is indicated by ->bat_cache_off == -1, at which
>> point you need to populate it from ->bat.
>
> you are wrong. BAT cache is a single page of the whole BAT
> which can be more than several MBs. This page should be
> repopulated when the off is changed.
ok, you are spoken about write_bat_cache which effectively marks
cache as invalid. Thus we are both spoken about the same
condition and we are on the same page.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 18/19] block/parallels: add prealloc-mode and prealloc-size open paramemets
  2015-01-14 14:31       ` Denis V. Lunev
@ 2015-01-14 17:22         ` Roman Kagan
  0 siblings, 0 replies; 38+ messages in thread
From: Roman Kagan @ 2015-01-14 17:22 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Wed, Jan 14, 2015 at 05:31:20PM +0300, Denis V. Lunev wrote:
> On 14/01/15 17:26, Roman Kagan wrote:
> >On Tue, Dec 30, 2014 at 01:07:11PM +0300, Denis V. Lunev wrote:
> >>This is preparational commit for tweaks in Parallels image expansion.
> >>The idea is that enlarge via truncate by one data block is slow. It
> >>would be much better to use fallocate via bdrv_write_zeroes and
> >>expand by some significant amount at once.
> >>
> >>This patch just adds proper parameters into BDRVParallelsState and
> >>performs options parsing in parallels_open.
> >>
> >>Signed-off-by: Denis V. Lunev <den@openvz.org>
> >>CC: Kevin Wolf <kwolf@redhat.com>
> >>CC: Stefan Hajnoczi <stefanha@redhat.com>
> >>---
> >>  block/parallels.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 72 insertions(+)
> >>
> >>diff --git a/block/parallels.c b/block/parallels.c
> >>index 18b9267..12a9cea 100644
> >>--- a/block/parallels.c
> >>+++ b/block/parallels.c
> >>@@ -30,6 +30,7 @@
> >>  #include "qemu-common.h"
> >>  #include "block/block_int.h"
> >>  #include "qemu/module.h"
> >>+#include "qapi/util.h"
> >>  /**************************************************************/
> >>@@ -54,6 +55,20 @@ typedef struct ParallelsHeader {
> >>      char padding[12];
> >>  } QEMU_PACKED ParallelsHeader;
> >>+
> >>+typedef enum ParallelsPreallocMode {
> >>+    PRL_PREALLOC_MODE_FALLOCATE = 0,
> >>+    PRL_PREALLOC_MODE_TRUNCATE = 1,
> >>+    PRL_PREALLOC_MODE_MAX = 2,
> >>+} ParallelsPreallocMode;
> >>+
> >>+static const char *prealloc_mode_lookup[] = {
> >>+    "falloc",
> >>+    "truncate",
> >>+    NULL,
> >There is already generic "preallocaton" option, BLOCK_OPT_PREALLOC,
> >which is handled by qcow2 and raw-posix.  It means slightly different
> >thing: the *whole* image is preallocated using the method specified.
> >
> >I think it would make sense to consolidate that option with this new
> >batched allocation in the generic block code.  I guess qcow2 and
> >raw-posix would benefit from it, too.
> >
> >At any rate I think it's a matter for a separate patchset.
> >
> >Roman.
> it is too early :) I think that I should provide the rationale for
> the preallocation in general. I am working on this with CEPH :)

OK then, maybe indeed it should land in parallels driver first, and move
into the generic code if found appropriate...

Then my only problem with this patch is the naming of the options: they
look too similar to the generic one while the meaning is different.
Maybe "batched_alloc" reflects the meaning better?

Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 19/19] block/parallels: optimize linear image expansion
  2014-12-30 10:07   ` [Qemu-devel] [PATCH 19/19] block/parallels: optimize linear image expansion Denis V. Lunev
@ 2015-01-14 17:56     ` Roman Kagan
  2015-01-14 17:57       ` Denis V. Lunev
  0 siblings, 1 reply; 38+ messages in thread
From: Roman Kagan @ 2015-01-14 17:56 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Tue, Dec 30, 2014 at 01:07:12PM +0300, Denis V. Lunev wrote:
> Plain image expansion spends a lot of time to update image file size.
> This seriously affects the performance. The following simple test
>   qemu_img create -f parallels -o cluster_size=64k ./1.hds 64G
>   qemu_io -n -c "write -P 0x11 0 1024M" ./1.hds
> could be improved if the format driver will pre-allocate some space
> in the image file with a reasonable chunk.
> 
> This patch preallocates 128 Mb using bdrv_write_zeroes, which should
> normally use fallocate() call inside. Fallback to older truncate()
> could be used as a fallback using image open options thanks to the
> previous patch.
> 
> The benefit is around 15%.
> 
> This patch is final in this series. Block driver has near native
> performance now.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/parallels.c | 35 +++++++++++++++++++++++++++++++++--
>  1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/block/parallels.c b/block/parallels.c
> index 12a9cea..5ec4a0d 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -82,6 +82,7 @@ typedef struct BDRVParallelsState {
>      int bat_cache_off;
>      int data_off;
>  
> +    int64_t  prealloc_off;

This field name confused me.  I think "data_end" would fit better.

Otherwise looks good to me.

Roman.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 19/19] block/parallels: optimize linear image expansion
  2015-01-14 17:56     ` Roman Kagan
@ 2015-01-14 17:57       ` Denis V. Lunev
  0 siblings, 0 replies; 38+ messages in thread
From: Denis V. Lunev @ 2015-01-14 17:57 UTC (permalink / raw)
  To: Roman Kagan, Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 14/01/15 20:56, Roman Kagan wrote:
> On Tue, Dec 30, 2014 at 01:07:12PM +0300, Denis V. Lunev wrote:
>> Plain image expansion spends a lot of time to update image file size.
>> This seriously affects the performance. The following simple test
>>    qemu_img create -f parallels -o cluster_size=64k ./1.hds 64G
>>    qemu_io -n -c "write -P 0x11 0 1024M" ./1.hds
>> could be improved if the format driver will pre-allocate some space
>> in the image file with a reasonable chunk.
>>
>> This patch preallocates 128 Mb using bdrv_write_zeroes, which should
>> normally use fallocate() call inside. Fallback to older truncate()
>> could be used as a fallback using image open options thanks to the
>> previous patch.
>>
>> The benefit is around 15%.
>>
>> This patch is final in this series. Block driver has near native
>> performance now.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>   block/parallels.c | 35 +++++++++++++++++++++++++++++++++--
>>   1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/parallels.c b/block/parallels.c
>> index 12a9cea..5ec4a0d 100644
>> --- a/block/parallels.c
>> +++ b/block/parallels.c
>> @@ -82,6 +82,7 @@ typedef struct BDRVParallelsState {
>>       int bat_cache_off;
>>       int data_off;
>>   
>> +    int64_t  prealloc_off;
> This field name confused me.  I think "data_end" would fit better.
>
> Otherwise looks good to me.
>
> Roman.
agree

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2015-01-14 17:57 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-30  9:28 [Qemu-devel] [PATCH v2 0/19] write/create for Parallels images with reasonable performance Denis V. Lunev
2014-12-30  9:28 ` [Qemu-devel] [PATCH 01/19] iotests, parallels: quote TEST_IMG in 076 test to be path-safe Denis V. Lunev
2014-12-30  9:28 ` [Qemu-devel] [PATCH 02/19] block/parallels: rename parallels_header to ParallelsHeader Denis V. Lunev
2014-12-30  9:28 ` [Qemu-devel] [PATCH 03/19] block/parallels: switch to bdrv_read Denis V. Lunev
2014-12-30  9:28 ` [Qemu-devel] [PATCH 04/19] block/parallels: read up to cluster end in one go Denis V. Lunev
2014-12-30  9:28 ` [Qemu-devel] [PATCH 05/19] block/parallels: add get_block_status Denis V. Lunev
2014-12-30  9:28 ` [Qemu-devel] [PATCH 06/19] block/parallels: provide _co_readv routine for parallels format driver Denis V. Lunev
2014-12-30  9:28 ` [Qemu-devel] [PATCH 07/19] block/parallels: replace magic constants 4, 64 with proper sizeofs Denis V. Lunev
2014-12-30  9:28 ` [Qemu-devel] [PATCH 08/19] block/parallels: _co_writev callback for Parallels format Denis V. Lunev
2014-12-30  9:28 ` [Qemu-devel] [PATCH 09/19] iotests, parallels: test for write into Parallels image Denis V. Lunev
2014-12-30 10:07 ` [Qemu-devel] [PATCH 10/19] block/parallels: support parallels image creation Denis V. Lunev
2014-12-30 10:07   ` [Qemu-devel] [PATCH 11/19] iotests, parallels: test for newly created parallels image via qemu-img Denis V. Lunev
2014-12-30 10:07   ` [Qemu-devel] [PATCH 12/19] parallels: change copyright information in the image header Denis V. Lunev
2014-12-30 10:07   ` [Qemu-devel] [PATCH 13/19] block/parallels: store ParallelsHeader to the BDRVParallelsState Denis V. Lunev
2015-01-13 14:13     ` Roman Kagan
2014-12-30 10:07   ` [Qemu-devel] [PATCH 14/19] block/parallels: create catalog_offset helper Denis V. Lunev
2015-01-13 14:14     ` Roman Kagan
2014-12-30 10:07   ` [Qemu-devel] [PATCH 15/19] block/parallels: rename catalog_ names to bat_ Denis V. Lunev
2015-01-13 14:15     ` Roman Kagan
2014-12-30 10:07   ` [Qemu-devel] [PATCH 16/19] block/parallels: no need to flush on each block allocation table update Denis V. Lunev
2015-01-13 14:50     ` Roman Kagan
2015-01-13 15:17       ` Denis V. Lunev
2015-01-13 20:16         ` Denis V. Lunev
2015-01-14 12:01           ` Roman Kagan
2015-01-14 12:30             ` Denis V. Lunev
2014-12-30 10:07   ` [Qemu-devel] [PATCH 17/19] block/parallels: delay writing to BAT till bdrv_co_flush_to_os Denis V. Lunev
2015-01-14 13:03     ` Roman Kagan
2015-01-14 13:08       ` Denis V. Lunev
2015-01-14 13:34         ` Roman Kagan
2015-01-14 14:29           ` Denis V. Lunev
2015-01-14 16:44             ` Denis V. Lunev
2014-12-30 10:07   ` [Qemu-devel] [PATCH 18/19] block/parallels: add prealloc-mode and prealloc-size open paramemets Denis V. Lunev
2015-01-14 14:26     ` Roman Kagan
2015-01-14 14:31       ` Denis V. Lunev
2015-01-14 17:22         ` Roman Kagan
2014-12-30 10:07   ` [Qemu-devel] [PATCH 19/19] block/parallels: optimize linear image expansion Denis V. Lunev
2015-01-14 17:56     ` Roman Kagan
2015-01-14 17:57       ` Denis V. Lunev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).