From mboxrd@z Thu Jan  1 00:00:00 1970
From: Max Reitz
Date: Fri, 27 Feb 2015 14:54:39 -0500
Message-Id: <1425066879-27326-1-git-send-email-mreitz@redhat.com>
Subject: [Qemu-devel] [PATCH v3] block/vdi: Add locking for parallel requests
To: qemu-block@nongnu.org
Cc: Kevin Wolf, Stefan Weil, qemu-devel@nongnu.org, Max Reitz,
    Stefan Hajnoczi, Paolo Bonzini

When allocating a new cluster, the first write to that cluster must be
the one doing the allocation, because that write pads its request to the
full cluster size; if another write to the same cluster is submitted
before it, the other write's data will be overwritten by the padding.

See https://bugs.launchpad.net/qemu/+bug/1422307 for what can go wrong
without this patch.  (A standalone sketch of the failure mode follows
after the patch.)

Cc: qemu-stable
Signed-off-by: Max Reitz
---
v3: Hopefully finally found the real issue which causes the problems
    described in the bug report; at least it sounds very reasonable and
    I can no longer reproduce any of the issues described there.
    Thank you, Paolo and Stefan!
---
 block/vdi.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/block/vdi.c b/block/vdi.c
index 74030c6..53bd02f 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -53,6 +53,7 @@
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "migration/migration.h"
+#include "block/coroutine.h"
 
 #if defined(CONFIG_UUID)
 #include <uuid/uuid.h>
@@ -196,6 +197,8 @@ typedef struct {
     /* VDI header (converted to host endianness). */
     VdiHeader header;
 
+    CoMutex write_lock;
+
     Error *migration_blocker;
 } BDRVVdiState;
 
@@ -504,6 +507,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
               "vdi", bdrv_get_device_name(bs), "live migration");
     migrate_add_blocker(s->migration_blocker);
 
+    qemu_co_mutex_init(&s->write_lock);
+
     return 0;
 
  fail_free_bmap:
@@ -639,11 +644,31 @@ static int vdi_co_write(BlockDriverState *bs,
                    buf, n_sectors * SECTOR_SIZE);
             memset(block + (sector_in_block + n_sectors) * SECTOR_SIZE, 0,
                    (s->block_sectors - n_sectors - sector_in_block) * SECTOR_SIZE);
+
+            /* Note that this coroutine does not yield anywhere from reading the
+             * bmap entry until here, so in regards to all the coroutines trying
+             * to write to this cluster, the one doing the allocation will
+             * always be the first to try to acquire the lock.
+             * Therefore, it is also the first that will actually be able to
+             * acquire the lock and thus the padded cluster is written before
+             * the other coroutines can write to the affected area. */
+            qemu_co_mutex_lock(&s->write_lock);
             ret = bdrv_write(bs->file, offset, block, s->block_sectors);
+            qemu_co_mutex_unlock(&s->write_lock);
         } else {
             uint64_t offset = s->header.offset_data / SECTOR_SIZE +
                               (uint64_t)bmap_entry * s->block_sectors +
                               sector_in_block;
+            qemu_co_mutex_lock(&s->write_lock);
+            /* This lock is only used to make sure the following write operation
+             * is executed after the write issued by the coroutine allocating
+             * this cluster, therefore we do not need to keep it locked.
+             * As stated above, the allocating coroutine will always try to lock
+             * the mutex before all the other concurrent accesses to that
+             * cluster, therefore at this point we can be absolutely certain
+             * that that write operation has returned (there may be other writes
+             * in flight, but they do not concern this very operation). */
+            qemu_co_mutex_unlock(&s->write_lock);
             ret = bdrv_write(bs->file, offset, buf, n_sectors);
         }
 
-- 
2.1.0
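
For anyone following the bug report, here is a minimal standalone C sketch of
the failure mode described in the commit message.  This is not QEMU code; the
toy sector/cluster geometry and all identifiers (allocating_write(),
plain_write(), Cluster, ...) are invented purely for illustration.  It shows
why the padded, allocating write must reach the image file before any other
write to the same cluster:

    #include <stdio.h>
    #include <string.h>

    #define SECTOR_SIZE    8    /* toy sector size in bytes */
    #define BLOCK_SECTORS  4    /* sectors per cluster */

    typedef struct {
        char data[BLOCK_SECTORS * SECTOR_SIZE];
    } Cluster;

    /* Allocating write: one sector of payload in sector 0, padded with
     * zeroes to the full cluster size and issued as a whole-cluster write. */
    static void allocating_write(Cluster *c)
    {
        char block[BLOCK_SECTORS * SECTOR_SIZE];

        memset(block, 0, sizeof(block));        /* the padding */
        memset(block, 'A', SECTOR_SIZE);        /* the payload (sector 0) */
        memcpy(c->data, block, sizeof(block));  /* whole-cluster write */
    }

    /* Plain write: one sector of payload into sector 2 of the same cluster. */
    static void plain_write(Cluster *c)
    {
        memset(c->data + 2 * SECTOR_SIZE, 'B', SECTOR_SIZE);
    }

    /* Print the first byte of each sector so the result is easy to read. */
    static void dump(const char *label, const Cluster *c)
    {
        printf("%-24s", label);
        for (int i = 0; i < BLOCK_SECTORS; i++) {
            char ch = c->data[i * SECTOR_SIZE];
            printf(" [%c]", ch ? ch : '.');
        }
        printf("\n");
    }

    int main(void)
    {
        Cluster bad  = {{0}};
        Cluster good = {{0}};

        /* Broken ordering: the plain write lands first and is then wiped
         * out by the allocating write's zero padding. */
        plain_write(&bad);
        allocating_write(&bad);
        dump("B before A (data lost):", &bad);

        /* Ordering enforced by the patch: the padded allocating write hits
         * the file first, so the plain write's data survives. */
        allocating_write(&good);
        plain_write(&good);
        dump("A before B (correct):", &good);

        return 0;
    }

In the patch itself this ordering is enforced with s->write_lock rather than
by sequencing calls: the allocating coroutine does not yield between reading
the bmap entry and taking the lock, so it is always the first to acquire it,
while a competing writer merely locks and immediately unlocks the mutex to
wait until the allocation write has returned.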