qemu-devel.nongnu.org archive mirror
From: "Richard W.M. Jones" <rjones@redhat.com>
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] [PATCH] Disk image shared and exclusive locks.
Date: Fri, 4 Dec 2009 16:53:01 +0000
Message-ID: <20091204165301.GA4167@amd.home.annexia.org>

[-- Attachment #1: Type: text/plain, Size: 1392 bytes --]

[from the commit message ...]

Allow qemu to acquire shared and exclusive locks on disk images.
This is done by extending the -drive option with an additional,
optional parameter:

  -drive [...],lock=none
  -drive [...],lock=shared
  -drive [...],lock=exclusive

lock=none is the default, and it means that we don't try to acquire
any sort of lock.

lock=shared tries to acquire a shared lock on the disk image.
Multiple instances of qemu may all hold this sort of lock.

lock=exclusive tries to acquire an exclusive lock on the disk
image.  An exclusive lock excludes all other shared and exclusive
locks.

If acquisition of a lock fails, opening the image fails.
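
As a rough illustration of these semantics, here is a minimal POSIX sketch that takes the same kind of whole-file, non-blocking fcntl() lock the raw-posix hunk uses (F_RDLCK for lock=shared, F_WRLCK for lock=exclusive). The helper names try_lock() and child_can_lock() are hypothetical and not part of qemu. The second "qemu instance" is simulated with fork(), because fcntl() locks held by a single process never conflict with each other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Whole-file, non-blocking fcntl() lock, as in the raw-posix hunk:
 * F_RDLCK for lock=shared, F_WRLCK for lock=exclusive.
 * Returns 0 on success, -1 (errno set) if the lock is refused. */
static int try_lock(int fd, short type)
{
    struct flock lk;

    assert(type == F_RDLCK || type == F_WRLCK);
    memset(&lk, 0, sizeof(lk));
    lk.l_type = type;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                  /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &lk);
}

/* Simulates another qemu instance with a forked child, since fcntl()
 * locks never conflict inside one process.  Returns 1 if the child got
 * a lock of the given type, 0 if it was refused (or the check failed). */
static int child_can_lock(const char *path, short type)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return 0;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        /* _exit() so the child never flushes the parent's stdio buffers */
        _exit(fd >= 0 && try_lock(fd, type) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

With the parent holding a shared lock on a scratch file, child_can_lock(path, F_RDLCK) returns 1 and child_can_lock(path, F_WRLCK) returns 0. Once the parent upgrades to an exclusive lock, both return 0.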

The implementation of locks only works for raw POSIX and Win32
files.  However, many of the other block types are implemented
in terms of these drivers, so they "inherit" locking too.  Other
drivers are read-only, so they don't need locking.  Below we note
only the cases where locking is *not* implemented:

  cloop - directly open()s the file, no locking implemented
  cow - same as cloop
  curl - protocol probably doesn't support locking
  nbd - same as curl
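
On the POSIX side, fcntl() checks the descriptor's access mode against the lock type: F_WRLCK needs a descriptor open for writing and F_RDLCK one open for reading. Otherwise F_SETLK fails with EBADF rather than reporting a conflict (a conflicting lock held by another process gives EACCES or EAGAIN). So lock=exclusive can only succeed on an image that was actually opened read-write. The sketch below shows this with a hypothetical helper, lock_whole_file(), that returns 0 or a negative errno value in the qemu style; it is an illustration, not code from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take the whole-file lock that lock=shared
 * (exclusive == 0) or lock=exclusive (exclusive != 0) asks for.
 * Returns 0 on success or a negative errno value. */
static int lock_whole_file(int fd, int exclusive)
{
    struct flock lk;

    assert(fd >= 0);
    memset(&lk, 0, sizeof(lk));
    lk.l_type = exclusive ? F_WRLCK : F_RDLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                  /* whole file */
    if (fcntl(fd, F_SETLK, &lk) == -1)
        return -errno;  /* EBADF: wrong access mode; EACCES/EAGAIN: conflict */
    return 0;
}
```

On a descriptor opened with O_RDONLY, lock_whole_file(fd, 1) yields -EBADF, while lock_whole_file(fd, 0) on the same descriptor succeeds.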

---

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/

[-- Attachment #2: 0001-Disk-image-shared-and-exclusive-locks.patch --]
[-- Type: text/plain, Size: 8053 bytes --]

From 9706b0e9e553cb874a9c1fd569620ee689b6efaa Mon Sep 17 00:00:00 2001
From: Richard Jones <rjones@redhat.com>
Date: Fri, 4 Dec 2009 15:07:07 +0000
Subject: [PATCH] Disk image shared and exclusive locks.

Allow qemu to acquire shared and exclusive locks on disk images.
This is done by extending the -drive option with an additional,
optional parameter:

  -drive [...],lock=none
  -drive [...],lock=shared
  -drive [...],lock=exclusive

lock=none is the default, and it means that we don't try to acquire
any sort of lock.

lock=shared tries to acquire a shared lock on the disk image.
Multiple instances of qemu may all hold this sort of lock.

lock=exclusive tries to acquire an exclusive lock on the disk
image.  An exclusive lock excludes all other shared and exclusive
locks.

If acquisition of a lock fails, opening the image fails.

The implementation of locks only works for raw POSIX and Win32
files.  However, many of the other block types are implemented
in terms of these drivers, so they "inherit" locking too.  Other
drivers are read-only, so they don't need locking.  Below we note
only the cases where locking is *not* implemented:

  cloop - directly open()s the file, no locking implemented
  cow - same as cloop
  curl - protocol probably doesn't support locking
  nbd - same as curl
---
 block.c           |    2 +-
 block.h           |    3 +++
 block/raw-posix.c |   13 +++++++++++++
 block/raw-win32.c |   17 +++++++++++++++++
 qemu-config.c     |    4 ++++
 qemu-options.hx   |    7 +++++++
 roms/seabios      |    2 +-
 vl.c              |   16 ++++++++++++++++
 8 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 853f025..653ffeb 100644
--- a/block.c
+++ b/block.c
@@ -448,7 +448,7 @@ int bdrv_open2(BlockDriverState *bs, const char *filename, int flags,
     try_rw = !bs->read_only || bs->is_temporary;
     if (!(flags & BDRV_O_FILE))
         open_flags = (try_rw ? BDRV_O_RDWR : 0) |
-            (flags & (BDRV_O_CACHE_MASK|BDRV_O_NATIVE_AIO));
+            (flags & (BDRV_O_CACHE_MASK|BDRV_O_NATIVE_AIO|BDRV_O_LOCK_MASK));
     else
         open_flags = flags & ~(BDRV_O_FILE | BDRV_O_SNAPSHOT);
     if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv))
diff --git a/block.h b/block.h
index 4a8b628..5458619 100644
--- a/block.h
+++ b/block.h
@@ -38,8 +38,11 @@ typedef struct QEMUSnapshotInfo {
 #define BDRV_O_NOCACHE     0x0020 /* do not use the host page cache */
 #define BDRV_O_CACHE_WB    0x0040 /* use write-back caching */
 #define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool */
+#define BDRV_O_LOCK_SHARED 0x0100 /* fail unless we can lock shared */
+#define BDRV_O_LOCK_EXCLUSIVE 0x0200 /* fail unless we can lock exclusive */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB)
+#define BDRV_O_LOCK_MASK   (BDRV_O_LOCK_SHARED | BDRV_O_LOCK_EXCLUSIVE)
 
 #define BDRV_SECTOR_BITS   9
 #define BDRV_SECTOR_SIZE   (1 << BDRV_SECTOR_BITS)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a6a22b..e6c141b 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -133,6 +133,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
 {
     BDRVRawState *s = bs->opaque;
     int fd, ret;
+    struct flock lk;
 
     s->lseek_err_cnt = 0;
 
@@ -163,6 +164,18 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
     s->fd = fd;
     s->aligned_buf = NULL;
 
+    if (bdrv_flags & BDRV_O_LOCK_MASK) {
+        if (bdrv_flags & BDRV_O_LOCK_SHARED)
+            lk.l_type = F_RDLCK;
+        else /* bdrv_flags & BDRV_O_LOCK_EXCLUSIVE */
+            lk.l_type = F_WRLCK;
+        lk.l_whence = SEEK_SET;
+        lk.l_start = 0;
+        lk.l_len = 0; /* means lock the whole file */
+        if (fcntl(fd, F_SETLK, &lk) == -1)
+            goto out_close;
+    }
+
     if ((bdrv_flags & BDRV_O_NOCACHE)) {
         s->aligned_buf = qemu_blockalign(bs, ALIGNED_BUFFER_SIZE);
         if (s->aligned_buf == NULL) {
diff --git a/block/raw-win32.c b/block/raw-win32.c
index 72acad5..9d0cfc7 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -78,6 +78,8 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
     BDRVRawState *s = bs->opaque;
     int access_flags, create_flags;
     DWORD overlapped;
+    DWORD lock_flags;
+    OVERLAPPED ov;
 
     s->type = FTYPE_FILE;
 
@@ -106,6 +108,21 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
             return -EACCES;
         return -1;
     }
+
+    if (flags & BDRV_O_LOCK_MASK) {
+        lock_flags = LOCKFILE_FAIL_IMMEDIATELY;
+        if (flags & BDRV_O_LOCK_EXCLUSIVE)
+            lock_flags |= LOCKFILE_EXCLUSIVE_LOCK;
+
+        memset(&ov, 0, sizeof(ov)); /* lock region starts at offset 0 */
+
+        /* Lock the first byte; every locker uses the same byte. */
+        if (!LockFileEx(s->hfile, lock_flags, 0, 1, 0, &ov)) {
+            CloseHandle(s->hfile);
+            return -EAGAIN; /* as for a POSIX lock conflict */
+        }
+    }
+
     return 0;
 }
 
diff --git a/qemu-config.c b/qemu-config.c
index 92b5363..106ebcf 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -77,6 +77,10 @@ QemuOptsList qemu_drive_opts = {
         },{
             .name = "readonly",
             .type = QEMU_OPT_BOOL,
+        },{
+            .name = "lock",
+            .type = QEMU_OPT_STRING,
+            .help = "lock disk image (exclusive, shared, none)",
         },
         { /* end if list */ }
     },
diff --git a/qemu-options.hx b/qemu-options.hx
index 1b5781a..c43eefb 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -104,6 +104,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
     "       [,cache=writethrough|writeback|none][,format=f][,serial=s]\n"
     "       [,addr=A][,id=name][,aio=threads|native]\n"
+    "       [,lock=exclusive|shared|none]\n"
     "                use 'file' as a drive image\n")
 DEF("set", HAS_ARG, QEMU_OPTION_set,
     "-set group.id.arg=value\n"
@@ -146,6 +147,12 @@ an untrusted format header.
 This option specifies the serial number to assign to the device.
 @item addr=@var{addr}
 Specify the controller's PCI address (if=virtio only).
+@item lock=@var{mode}
+Acquire a lock on the disk image (@var{file}).
+Available modes are: exclusive, shared, none.
+The default is "none", meaning we don't try to acquire a lock.  To
+avoid multiple virtual machines trying to write to a disk at the
+same time (which can cause disk corruption), use lock=exclusive.
 @end table
 
 By default, writethrough caching is used for all block device.  This means that
diff --git a/roms/seabios b/roms/seabios
index 42bc394..4d9d400 160000
--- a/roms/seabios
+++ b/roms/seabios
@@ -1 +1 @@
-Subproject commit 42bc3940d93911e382f5e72289f043d1faa9083e
+Subproject commit 4d9d400031417528d4fbe05f845f3759237e48d3
diff --git a/vl.c b/vl.c
index 96ab020..69e8cb7 100644
--- a/vl.c
+++ b/vl.c
@@ -2030,6 +2030,7 @@ DriveInfo *drive_init(QemuOpts *opts, void *opaque,
     const char *devaddr;
     DriveInfo *dinfo;
     int snapshot = 0;
+    int lockmode = 0;
 
     *fatal_error = 1;
 
@@ -2220,6 +2221,19 @@ DriveInfo *drive_init(QemuOpts *opts, void *opaque,
         }
     }
 
+    if ((buf = qemu_opt_get(opts, "lock")) != NULL) {
+        if (!strcmp(buf, "none"))
+            lockmode = 0;
+        else if (!strcmp(buf, "shared"))
+            lockmode = BDRV_O_LOCK_SHARED;
+        else if (!strcmp(buf, "exclusive"))
+            lockmode = BDRV_O_LOCK_EXCLUSIVE;
+        else {
+            fprintf(stderr, "qemu: invalid lock option\n");
+            return NULL;
+        }
+    }
+
     /* compute bus and unit according index */
 
     if (index != -1) {
@@ -2364,6 +2378,8 @@ DriveInfo *drive_init(QemuOpts *opts, void *opaque,
         (void)bdrv_set_read_only(dinfo->bdrv, 1);
     }
 
+    bdrv_flags |= lockmode;
+
     if (bdrv_open2(dinfo->bdrv, file, bdrv_flags, drv) < 0) {
         fprintf(stderr, "qemu: could not open disk image %s: %s\n",
                         file, strerror(errno));
-- 
1.6.5.2
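
One property of the POSIX record locks used in the raw-posix hunk is easy to miss: they belong to the process, not to the descriptor, and POSIX specifies that closing *any* descriptor for the file releases all of that process's locks on it. The minimal sketch below (hypothetical helpers lock_exclusive() and other_process_can_lock(), not qemu code) takes an exclusive lock through one descriptor, then opens and closes a second descriptor for the same file, after which another process can take the lock. It only demonstrates the POSIX rule; it makes no claim about whether qemu opens an image more than once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Whole-file exclusive lock, as lock=exclusive requests in the raw-posix hunk. */
static int lock_exclusive(int fd)
{
    struct flock lk;

    memset(&lk, 0, sizeof(lk));
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                  /* whole file */
    return fcntl(fd, F_SETLK, &lk);
}

/* Returns 1 if a separate (forked) process can take an exclusive
 * lock on path, 0 if it is refused or the check could not run. */
static int other_process_can_lock(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return 0;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        _exit(fd >= 0 && lock_exclusive(fd) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Holding lock_exclusive(a) makes other_process_can_lock() return 0; after an unrelated open() and close() of a second descriptor b for the same file, it returns 1 even though a is still open.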



Thread overview: 52+ messages
2009-12-04 16:53 Richard W.M. Jones [this message]
2009-12-04 17:15 ` [Qemu-devel] [PATCH] Disk image shared and exclusive locks Anthony Liguori
2009-12-04 21:57   ` Richard W.M. Jones
2009-12-04 22:29     ` Anthony Liguori
2009-12-05 17:31       ` Avi Kivity
2009-12-05 17:47         ` Anthony Liguori
2009-12-05 17:55           ` Avi Kivity
2009-12-05 17:59             ` Anthony Liguori
2009-12-07 10:31               ` Jamie Lokier
2009-12-07 10:42                 ` Kevin Wolf
2009-12-07 10:48                   ` Avi Kivity
2009-12-07 10:56                     ` Kevin Wolf
2009-12-07 11:28                   ` Jamie Lokier
2009-12-07 11:51                     ` Kevin Wolf
2009-12-07 12:06                     ` Daniel P. Berrange
2009-12-07 10:45                 ` Daniel P. Berrange
2009-12-07 11:19                   ` Jamie Lokier
2009-12-07 11:30                     ` Daniel P. Berrange
2009-12-07 11:31                       ` Richard W.M. Jones
2009-12-07 11:38                         ` Jamie Lokier
2009-12-07 11:49                         ` Daniel P. Berrange
2009-12-07 11:59                           ` Richard W.M. Jones
2009-12-07 14:35                           ` [Qemu-devel] " Paolo Bonzini
2009-12-07 13:43                       ` [Qemu-devel] " Anthony Liguori
2009-12-07 14:01                         ` Daniel P. Berrange
2009-12-07 14:15                           ` Anthony Liguori
2009-12-07 14:28                             ` Daniel P. Berrange
2009-12-07 14:53                               ` Anthony Liguori
2009-12-08  9:40                                 ` Kevin Wolf
2009-12-07 11:04                 ` Richard W.M. Jones
2009-12-07 10:58           ` Richard W.M. Jones
2009-12-07 11:35             ` Jamie Lokier
2009-12-07 13:39             ` Anthony Liguori
2009-12-07 14:08               ` Richard W.M. Jones
2009-12-07 14:22                 ` Anthony Liguori
2009-12-07 14:31                   ` Richard W.M. Jones
2009-12-07 14:55                     ` Anthony Liguori
2009-12-08  9:48                     ` Kevin Wolf
2009-12-08 10:16                       ` Richard W.M. Jones
2009-12-07 14:38               ` [Qemu-devel] " Paolo Bonzini
2009-12-07  9:38   ` [Qemu-devel] " Daniel P. Berrange
2009-12-07 10:39 ` Chris Webb
2009-12-07 13:32   ` Anthony Liguori
2009-12-07 13:38     ` Chris Webb
2009-12-07 13:47       ` Anthony Liguori
2009-12-07 14:25     ` Daniel P. Berrange
2009-12-07 14:58       ` Chris Webb
2009-12-07 14:16 ` [Qemu-devel] [PATCH VERSION 2] " Richard W.M. Jones
2009-12-07 15:06   ` Anthony Liguori
2009-12-08  8:48     ` [Qemu-devel] " Paolo Bonzini
2009-12-08 10:00   ` [Qemu-devel] " Kevin Wolf
2009-12-08 10:25     ` Richard W.M. Jones
