* [Qemu-devel] PATCH: v3 Allow control over drive file open mode
@ 2008-08-01 10:29 Daniel P. Berrange
2008-08-01 15:10 ` Anthony Liguori
0 siblings, 1 reply; 12+ messages in thread
From: Daniel P. Berrange @ 2008-08-01 10:29 UTC (permalink / raw)
To: qemu-devel
This is version 3 of the patch.
The current block driver code will attempt to open a file backing a drive
for read/write with O_RDWR first, and if that fails, fallback to opening
it readonly with O_RDONLY. So if you set file permissions to readonly on
the underlying drive backing store, QEMU will fallback to opening it read
only, and discard any writes.
Xen has a concept of a read-only disks in its configuration format, and
thus it would be desirable to have an explicit option to request that a
drive operate read-only, regardless of whether underlying file permissions
allow write access or not. We'd like to support this in libvirt too for
QEMU/KVM guests. Finally, in some cases it is desirable to see the failure
if the disk can't be opened read-write, rather than falling back to read
only mode - many guests will be more or less inoperable if their root
filesystem is read-only so there's little point booting.
The current block.h file did already have flags defined
#define BDRV_O_RDONLY 0x0000
#define BDRV_O_RDWR 0x0002
#define BDRV_O_ACCESS 0x0003
However the bdrv_open2() method was treating a 'flags' value of 0, as being
effectively RDWR, and nearly all callers pass in 0 even when they expect
to get a writable file, so the O_RDONLY flag was not usable as is.
So this patch does a couple of things:
- Change the value of the RDONLY flag to be 0x0001 so it can be
distinguished from 0, which all callers assume means read-write
will read-only fallback.
- Adds a new option to the '-drive' command line parameter to specify
how the file should be opened, and any fallback
mode=rw - try to open read-write and return error if this fails.
mode=ro - try to open read-only and return error if this fails
If no 'mode' is provided, it'll try read-write, and fallback to
read-only. This matches existing behaviour
- Extends the monitor 'change' command to allow an optional mode to
be specified, again defaulting to current behaviour. Takes the same
values as the 'mode' param
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
block-raw-posix.c | 6 +++---
block-raw-win32.c | 4 ++--
block.c | 21 ++++++++++++++-------
block.h | 5 ++---
monitor.c | 20 ++++++++++++++------
qemu-doc.texi | 4 ++++
qemu-nbd.c | 3 +++
vl.c | 15 ++++++++++++---
8 files changed, 54 insertions(+), 24 deletions(-)
NB This is against SVN revision 4975, but will likely conflict with
the other patch currently being discussed onlist wrt to qcow encryption
and passwords. I can trivially rebase it if required.
Daniel
Index: block-raw-win32.c
===================================================================
--- block-raw-win32.c (revision 4975)
+++ block-raw-win32.c (working copy)
@@ -90,7 +90,7 @@
s->type = FTYPE_FILE;
- if ((flags & BDRV_O_ACCESS) == O_RDWR) {
+ if (!(flags & BDRV_O_RDONLY)) {
access_flags = GENERIC_READ | GENERIC_WRITE;
} else {
access_flags = GENERIC_READ;
@@ -463,7 +463,7 @@
}
s->type = find_device_type(bs, filename);
- if ((flags & BDRV_O_ACCESS) == O_RDWR) {
+ if (!(flags & BDRV_O_RDONLY)) {
access_flags = GENERIC_READ | GENERIC_WRITE;
} else {
access_flags = GENERIC_READ;
Index: vl.c
===================================================================
--- vl.c (revision 4975)
+++ vl.c (working copy)
@@ -5399,11 +5399,12 @@
int max_devs;
int index;
int cache;
+ int mode = 0; /* Default to writable, with fallback to readonly for back-compat */
int bdrv_flags;
char *str = arg->opt;
char *params[] = { "bus", "unit", "if", "index", "cyls", "heads",
"secs", "trans", "media", "snapshot", "file",
- "cache", "format", NULL };
+ "cache", "format", "mode", NULL };
if (check_params(buf, sizeof(buf), params, str) < 0) {
fprintf(stderr, "qemu: unknown parameter '%s' in '%s'\n",
@@ -5585,6 +5586,14 @@
}
}
+ if (get_param_value(buf, sizeof(buf), "mode", str)) {
+ if (strcmp(buf, "ro") == 0)
+ mode = BDRV_O_RDONLY;
+ else if (strcmp(buf, "rw") == 0)
+ mode = BDRV_O_RDWR;
+ }
+
+
if (arg->file == NULL)
get_param_value(file, sizeof(file), "file", str);
else
@@ -5682,7 +5691,7 @@
}
if (!file[0])
return 0;
- bdrv_flags = 0;
+ bdrv_flags = mode;
if (snapshot)
bdrv_flags |= BDRV_O_SNAPSHOT;
if (!cache)
@@ -7605,7 +7614,7 @@
"-cdrom file use 'file' as IDE cdrom image (cdrom is ide1 master)\n"
"-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
" [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
- " [,cache=on|off][,format=f]\n"
+ " [,cache=on|off][,format=f][,mode=ro|rw]\n"
" use 'file' as a drive image\n"
"-mtdblock file use 'file' as on-board Flash memory image\n"
"-sd file use 'file' as SecureDigital card image\n"
Index: block-raw-posix.c
===================================================================
--- block-raw-posix.c (revision 4975)
+++ block-raw-posix.c (working copy)
@@ -103,7 +103,7 @@
s->lseek_err_cnt = 0;
open_flags = O_BINARY;
- if ((flags & BDRV_O_ACCESS) == O_RDWR) {
+ if (!(flags & BDRV_O_RDONLY)) {
open_flags |= O_RDWR;
} else {
open_flags |= O_RDONLY;
@@ -117,7 +117,7 @@
#endif
s->type = FTYPE_FILE;
-
+ fprintf(stderr, "Open raw %d %d %s\n", open_flags, flags, filename);
fd = open(filename, open_flags, 0644);
if (fd < 0) {
ret = -errno;
@@ -896,7 +896,7 @@
}
#endif
open_flags = O_BINARY;
- if ((flags & BDRV_O_ACCESS) == O_RDWR) {
+ if (!(flags & BDRV_O_RDONLY)) {
open_flags |= O_RDWR;
} else {
open_flags |= O_RDONLY;
Index: qemu-nbd.c
===================================================================
--- qemu-nbd.c (revision 4975)
+++ qemu-nbd.c (working copy)
@@ -332,6 +332,9 @@
if (bs == NULL)
return 1;
+ if (readonly)
+ flags |= BDRV_O_RDONLY;
+
if (bdrv_open(bs, argv[optind], flags) == -1)
return 1;
Index: qemu-doc.texi
===================================================================
--- qemu-doc.texi (revision 4975)
+++ qemu-doc.texi (working copy)
@@ -268,6 +268,10 @@
@var{snapshot} is "on" or "off" and allows to enable snapshot for given drive (see @option{-snapshot}).
@item cache=@var{cache}
@var{cache} is "on" or "off" and allows to disable host cache to access data.
+@item mode=@var{mode}
+@var{mode} is "rw" to open the file read-write, or "ro" to open the file read-only.
+If neither is specified, an attempt will be made to open read-write, falling back
+to read-only if this fails.
@item format=@var{format}
Specify which disk @var{format} will be used rather than detecting
the format. Can be used to specifiy format=raw to avoid interpreting
Index: monitor.c
===================================================================
--- monitor.c (revision 4975)
+++ monitor.c (working copy)
@@ -396,11 +396,19 @@
eject_device(bs, force);
}
-static void do_change_block(const char *device, const char *filename, const char *fmt)
+static void do_change_block(const char *device, const char *filename, const char *fmt, const char *mode)
{
BlockDriverState *bs;
BlockDriver *drv = NULL;
+ int flags = 0;
+ if (mode) {
+ if (strcmp(mode, "ro") == 0)
+ flags = BDRV_O_RDONLY;
+ else if (strcmp(mode, "rw") == 0)
+ flags = BDRV_O_RDWR;
+ }
+
bs = bdrv_find(device);
if (!bs) {
term_printf("device not found\n");
@@ -415,7 +423,7 @@
}
if (eject_device(bs, 0) < 0)
return;
- bdrv_open2(bs, filename, 0, drv);
+ bdrv_open2(bs, filename, flags, drv);
qemu_key_check(bs, filename);
}
@@ -434,12 +442,12 @@
}
}
-static void do_change(const char *device, const char *target, const char *fmt)
+static void do_change(const char *device, const char *target, const char *fmt, const char *mode)
{
if (strcmp(device, "vnc") == 0) {
do_change_vnc(target);
} else {
- do_change_block(device, target, fmt);
+ do_change_block(device, target, fmt, mode);
}
}
@@ -1374,8 +1382,8 @@
"", "quit the emulator" },
{ "eject", "-fB", do_eject,
"[-f] device", "eject a removable medium (use -f to force it)" },
- { "change", "BFs?", do_change,
- "device filename [format]", "change a removable medium, optional format" },
+ { "change", "BFs?s?", do_change,
+ "device filename [format] [mode]", "change a removable medium, optional format and mode" },
{ "screendump", "F", do_screen_dump,
"filename", "save screen into PPM image 'filename'" },
{ "logfile", "F", do_logfile,
Index: block.c
===================================================================
--- block.c (revision 4975)
+++ block.c (working copy)
@@ -381,17 +381,24 @@
bs->opaque = qemu_mallocz(drv->instance_size);
if (bs->opaque == NULL && drv->instance_size > 0)
return -1;
- /* Note: for compatibility, we open disk image files as RDWR, and
- RDONLY as fallback */
+ /* If BDRV_O_RDONLY is set open in read only mode
+ * If BDRV_O_RDWR is set open in read write mode
+ * If neither is set, try read-write and fallback to read only
+ */
if (!(flags & BDRV_O_FILE))
- open_flags = BDRV_O_RDWR | (flags & BDRV_O_DIRECT);
+ open_flags = (flags & (BDRV_O_DIRECT | BDRV_O_RDONLY | BDRV_O_RDWR));
else
open_flags = flags & ~(BDRV_O_FILE | BDRV_O_SNAPSHOT);
ret = drv->bdrv_open(bs, filename, open_flags);
- if (ret == -EACCES && !(flags & BDRV_O_FILE)) {
- ret = drv->bdrv_open(bs, filename, BDRV_O_RDONLY);
+
+ /* Trying fallback to read only mode here.. */
+ if (ret == -EACCES &&
+ !(flags & (BDRV_O_RDWR | BDRV_O_RDONLY))) {
+ open_flags |= BDRV_O_RDONLY;
+ ret = drv->bdrv_open(bs, filename, open_flags);
+ }
+ if (ret == 0 && (flags & BDRV_O_RDONLY))
bs->read_only = 1;
- }
if (ret < 0) {
qemu_free(bs->opaque);
bs->opaque = NULL;
@@ -416,7 +423,7 @@
}
path_combine(backing_filename, sizeof(backing_filename),
filename, bs->backing_file);
- if (bdrv_open(bs->backing_hd, backing_filename, 0) < 0)
+ if (bdrv_open(bs->backing_hd, backing_filename, flags & (BDRV_O_RDONLY | BDRV_O_RDWR)) < 0)
goto fail;
}
Index: block.h
===================================================================
--- block.h (revision 4975)
+++ block.h (working copy)
@@ -36,9 +36,8 @@
uint64_t vm_clock_nsec; /* VM clock relative to boot */
} QEMUSnapshotInfo;
-#define BDRV_O_RDONLY 0x0000
-#define BDRV_O_RDWR 0x0002
-#define BDRV_O_ACCESS 0x0003
+#define BDRV_O_RDONLY 0x0001 /* Open as read-only */
+#define BDRV_O_RDWR 0x0002 /* Open as read-write */
#define BDRV_O_CREAT 0x0004 /* create an empty file */
#define BDRV_O_SNAPSHOT 0x0008 /* open the file read only and save writes in a snapshot */
#define BDRV_O_FILE 0x0010 /* open as a raw file (do not try to
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-01 10:29 [Qemu-devel] PATCH: v3 Allow control over drive file open mode Daniel P. Berrange
@ 2008-08-01 15:10 ` Anthony Liguori
2008-08-01 16:15 ` Daniel P. Berrange
2008-08-01 16:57 ` Ian Jackson
0 siblings, 2 replies; 12+ messages in thread
From: Anthony Liguori @ 2008-08-01 15:10 UTC (permalink / raw)
To: Daniel P. Berrange, qemu-devel
Daniel P. Berrange wrote:
> This is version 3 of the patch.
>
Allowing the user to specify what mode we use to open a file is IMHO not
a good interface for a user. A user should only be concerned with how
we expose a disk to the guest, not the underlying implementation of how
we support this. It has subtle side-effects that a user is not going to
expect unless they are intimately familiar with how QEMU is implemented
(like snapshotting breaking).
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-01 15:10 ` Anthony Liguori
@ 2008-08-01 16:15 ` Daniel P. Berrange
2008-08-01 16:23 ` Andreas Färber
2008-08-01 17:06 ` Jamie Lokier
2008-08-01 16:57 ` Ian Jackson
1 sibling, 2 replies; 12+ messages in thread
From: Daniel P. Berrange @ 2008-08-01 16:15 UTC (permalink / raw)
To: Anthony Liguori; +Cc: qemu-devel
On Fri, Aug 01, 2008 at 10:10:09AM -0500, Anthony Liguori wrote:
> Daniel P. Berrange wrote:
> >This is version 3 of the patch.
> >
>
> Allowing the user to specify what mode we use to open a file is IMHO not
> a good interface for a user. A user should only be concerned with how
> we expose a disk to the guest, not the underlying implementation of how
> we support this. It has subtle side-effects that a user is not going to
> expect unless they are intimately familiar with how QEMU is implemented
> (like snapshotting breaking).
So if I'm understanding you correctly, you'd be OK with a mode=ro, which
would make the emulated drive itself read-only - independant of what
open() flags we use for opening the file on the host ?
So my next question would be, what IDE / SCSI / USB drives have a
concept of 'read only' ? A CDROM clearly is clearly readonly, and
floppy disks probably be to but can you mark general disks as read
only ?
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-01 16:15 ` Daniel P. Berrange
@ 2008-08-01 16:23 ` Andreas Färber
2008-08-01 17:06 ` Jamie Lokier
1 sibling, 0 replies; 12+ messages in thread
From: Andreas Färber @ 2008-08-01 16:23 UTC (permalink / raw)
To: Daniel P. Berrange, qemu-devel
Am 01.08.2008 um 18:15 schrieb Daniel P. Berrange:
> what IDE / SCSI / USB drives have a
> concept of 'read only' ? A CDROM clearly is clearly readonly, and
> floppy disks probably be to but can you mark general disks as read
> only ?
USB flash drives can sometimes be made read-only, similar to a floppy.
Andreas
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-01 15:10 ` Anthony Liguori
2008-08-01 16:15 ` Daniel P. Berrange
@ 2008-08-01 16:57 ` Ian Jackson
2008-08-01 17:14 ` Anthony Liguori
1 sibling, 1 reply; 12+ messages in thread
From: Ian Jackson @ 2008-08-01 16:57 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori writes ("Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode"):
> Allowing the user to specify what mode we use to open a file is IMHO not
> a good interface for a user. A user should only be concerned with how
> we expose a disk to the guest, not the underlying implementation of how
> we support this. It has subtle side-effects that a user is not going to
> expect unless they are intimately familiar with how QEMU is implemented
> (like snapshotting breaking).
I think I agree, but with qualifications:
The user readonly flag ought to mean
1. qemu will definitely not permit the guest to write to the object
represented (if it is a cow file then even the cow will not be
writeable)
2. If the emulated device type supports it the guest will be
told that it may not write to the device. If this is not possible
and the user has not overridden this check then the entire request
will be rejected (rather than exporting the device read/write).
3. qemu will to communicate the consequences for its future use
of the underlying host operating system object(s) appropriately
to the host system (as this might be relevant for cacheing,
concurrent access, etc.)
4. qemu will take steps to try to ensure that bugs and missing
changes in the readonly implementation don't leave a security
hole where the
5. Operations (such as cow commit) that would modify data
(either host data or data as seen by the guest or both)
are not supported.
I think 3 and 4 mean that it should pass O_RDONLY to the underlying
filesystem objects where feasible.
I'm afraid I don't understand your point about breaking snapshotting.
Perhaps you could explain the scenario ?
Ian.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-01 16:15 ` Daniel P. Berrange
2008-08-01 16:23 ` Andreas Färber
@ 2008-08-01 17:06 ` Jamie Lokier
1 sibling, 0 replies; 12+ messages in thread
From: Jamie Lokier @ 2008-08-01 17:06 UTC (permalink / raw)
To: Daniel P. Berrange, qemu-devel
Daniel P. Berrange wrote:
> On Fri, Aug 01, 2008 at 10:10:09AM -0500, Anthony Liguori wrote:
> > Daniel P. Berrange wrote:
> > >This is version 3 of the patch.
> > >
> >
> > Allowing the user to specify what mode we use to open a file is IMHO not
> > a good interface for a user. A user should only be concerned with how
> > we expose a disk to the guest, not the underlying implementation of how
> > we support this. It has subtle side-effects that a user is not going to
> > expect unless they are intimately familiar with how QEMU is implemented
> > (like snapshotting breaking).
>
> So if I'm understanding you correctly, you'd be OK with a mode=ro, which
> would make the emulated drive itself read-only - independant of what
> open() flags we use for opening the file on the host ?
>
> So my next question would be, what IDE / SCSI / USB drives have a
> concept of 'read only' ? A CDROM clearly is clearly readonly, and
> floppy disks probably be to but can you mark general disks as read
> only ?
Many USB drives do have a little switch. USB drives use the SCSI
protocol, so SCSI is sort of covered.
They aren't as common but there are read-only IDE interfaces out
there, for phorensic disk reading.
-- Jamie
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-01 16:57 ` Ian Jackson
@ 2008-08-01 17:14 ` Anthony Liguori
2008-08-01 17:27 ` Jamie Lokier
2008-08-11 16:05 ` Ian Jackson
0 siblings, 2 replies; 12+ messages in thread
From: Anthony Liguori @ 2008-08-01 17:14 UTC (permalink / raw)
To: qemu-devel
Ian Jackson wrote:
> Anthony Liguori writes ("Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode"):
>
>> Allowing the user to specify what mode we use to open a file is IMHO not
>> a good interface for a user. A user should only be concerned with how
>> we expose a disk to the guest, not the underlying implementation of how
>> we support this. It has subtle side-effects that a user is not going to
>> expect unless they are intimately familiar with how QEMU is implemented
>> (like snapshotting breaking).
>>
>
> I think I agree, but with qualifications:
>
> The user readonly flag ought to mean
> 1. qemu will definitely not permit the guest to write to the object
> represented (if it is a cow file then even the cow will not be
> writeable)
> 2. If the emulated device type supports it the guest will be
> told that it may not write to the device. If this is not possible
> and the user has not overridden this check then the entire request
> will be rejected (rather than exporting the device read/write).
>
I don't think it should be possible to override the check. If the
device doesn't support it, we should reject it. However, it sounds like
all the devices we emulate support some form of read-only so this is
perhaps not going to be a problem.
> 3. qemu will to communicate the consequences for its future use
> of the underlying host operating system object(s) appropriately
> to the host system (as this might be relevant for cacheing,
> concurrent access, etc.)
>
When possible.
> 4. qemu will take steps to try to ensure that bugs and missing
> changes in the readonly implementation don't leave a security
> hole where the
I'm not at all worried about this FWIW.
>
> 5. Operations (such as cow commit) that would modify data
> (either host data or data as seen by the guest or both)
> are not supported.
>
I believe commit should work just as it does now.
> I think 3 and 4 mean that it should pass O_RDONLY to the underlying
> filesystem objects where feasible.
>
> I'm afraid I don't understand your point about breaking snapshotting.
> Perhaps you could explain the scenario ?
>
qemu-system-x86_64 -drive file=foo.img -drive file=boo.img,read-only=on
Down the road, you do a savevm. savevm has to create a checkpoint in
all disk images. The checkpoint will be empty in boo.img but it still
needs to create one. An alternative scenario would be a read-only root fs:
qemu-system-x86_64 -drive file=rootfs.img,read-only=on
Where you wanted to be able to do a savevm.
Regards,
Anthony Liguori
> Ian.
>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-01 17:14 ` Anthony Liguori
@ 2008-08-01 17:27 ` Jamie Lokier
2008-08-11 16:05 ` Ian Jackson
1 sibling, 0 replies; 12+ messages in thread
From: Jamie Lokier @ 2008-08-01 17:27 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori wrote:
> Down the road, you do a savevm. savevm has to create a checkpoint in
> all disk images. The checkpoint will be empty in boo.img but it still
> needs to create one. An alternative scenario would be a read-only root fs:
>
> qemu-system-x86_64 -drive file=rootfs.img,read-only=on
>
> Where you wanted to be able to do a savevm.
I agree this could be useful (despite my wariness of snapshots :)
In which case, it would be all-the-more useful if it worked correctly
when multiple QEMU instances are using the image file.
"Down the road" - you might not know in advance that you're wishing to
save a snapshot, and by then you might have other running QEMU
instances using that image which you don't want to kill, and of course
if you can't snapshot one, you can't snapshot any of them to shut them
down temporarily.
It should be enough to take an advisory lock during the savevm writes,
shouldn't it?
-- Jamie
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-01 17:14 ` Anthony Liguori
2008-08-01 17:27 ` Jamie Lokier
@ 2008-08-11 16:05 ` Ian Jackson
2008-08-12 11:42 ` Kevin Wolf
2008-08-12 13:31 ` Anthony Liguori
1 sibling, 2 replies; 12+ messages in thread
From: Ian Jackson @ 2008-08-11 16:05 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori writes ("Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode"):
> Ian Jackson wrote:
> > 4. qemu will take steps to try to ensure that bugs and missing
> > changes in the readonly implementation don't leave a security
> > hole where the
>
> I'm not at all worried about this FWIW.
Well, in the Xen world we have different priorities I think. Many
people are depending very heavily on the security properties of Xen
including the hardware emulations. So for us it is important that we
take all the measures we can to stop us from accidentally committing
bugs.
If this is something that qemu upstream doesn't want then we will have
to maintain it as a permanent delta in our `Xen' tree (along with the
mapcache, etc. etc.)
It would be fine if there were in fact only one place where writes
take place. But actually there are multiple routes at pretty much all
levels: multiple entrypoints (aio and not, pwrite or not), multiple
block drivers, in cow devices multiple read/write paths depending on
whether the block is already present in the diffs and multiple writes
to the .img for each requested write, etc.
> qemu-system-x86_64 -drive file=foo.img -drive file=boo.img,read-only=on
>
> Down the road, you do a savevm. savevm has to create a checkpoint in
> all disk images. The checkpoint will be empty in boo.img but it still
> needs to create one.
Perhaps I don't understand clearly enough how you imagine this
scenario. Surely when the snapshot is resumed it is sufficient for
the file boo.img to be identical ?
What are these checkpoints ? Are they currently written by anything ?
I've tried to look for the code you would be referring to but I can't
seem to find it. qemu_savevm_state just calls all the things recorded
with register_savevm and I can't seem to find anything that would do
anything to block drivers. Of course there are many things in hw/*.c
that call register_savevm. I looked in hw/ide.c too and it just saves
the state of the driver, and seems content to assume that the contents
(as seen through qemu's block driver abstract interface) of the block
device will be identical because nothing will change it between save
and restore.
If the file is opened read-only then qemu won't change it. Are you
imagining a scenario where the cow img file would be modified by some
external process between save and restore ? Surely that would be a
broken thing to do.
Or do you mean that the intent is to cope with the backing file
changing between save and restore and that therefore the whole backing
file will be copied into the .img ?
I'm afraid as you can see I'm still confused.
Ian.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-11 16:05 ` Ian Jackson
@ 2008-08-12 11:42 ` Kevin Wolf
2008-08-12 13:31 ` Anthony Liguori
1 sibling, 0 replies; 12+ messages in thread
From: Kevin Wolf @ 2008-08-12 11:42 UTC (permalink / raw)
To: Ian Jackson; +Cc: qemu-devel
Ian Jackson schrieb:
>> qemu-system-x86_64 -drive file=foo.img -drive file=boo.img,read-only=on
>>
>> Down the road, you do a savevm. savevm has to create a checkpoint in
>> all disk images. The checkpoint will be empty in boo.img but it still
>> needs to create one.
>
> Perhaps I don't understand clearly enough how you imagine this
> scenario. Surely when the snapshot is resumed it is sufficient for
> the file boo.img to be identical ?
>
> What are these checkpoints ? Are they currently written by anything ?
>
> I've tried to look for the code you would be referring to but I can't
> seem to find it. qemu_savevm_state just calls all the things recorded
> with register_savevm and I can't seem to find anything that would do
> anything to block drivers.
This code is in do_savevm (in vl.c). Look for /* create the snapshots */
Kevin
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-11 16:05 ` Ian Jackson
2008-08-12 11:42 ` Kevin Wolf
@ 2008-08-12 13:31 ` Anthony Liguori
2008-08-12 23:56 ` Jamie Lokier
1 sibling, 1 reply; 12+ messages in thread
From: Anthony Liguori @ 2008-08-12 13:31 UTC (permalink / raw)
To: qemu-devel
Ian Jackson wrote:
> Anthony Liguori writes ("Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode"):
>
>> Ian Jackson wrote:
>>
>>> 4. qemu will take steps to try to ensure that bugs and missing
>>> changes in the readonly implementation don't leave a security
>>> hole where the
>>>
>> I'm not at all worried about this FWIW.
>>
>
> Well, in the Xen world we have different priorities I think. Many
> people are depending very heavily on the security properties of Xen
> including the hardware emulations. So for us it is important that we
> take all the measures we can to stop us from accidentally committing
> bugs.
>
Which is fine, but you're missing my fundamental argument. Having a
read-only flag exposed to the user should not translate to "open the
underlying files O_RDONLY". That's an implementation detail. If that's
what ends up happening, great. However, I'll make the argument also
that for certain circumstances that's not actually what we want.
> If this is something that qemu upstream doesn't want then we will have
> to maintain it as a permanent delta in our `Xen' tree (along with the
> mapcache, etc. etc.)
>
> It would be fine if there were in fact only one place where writes
> take place. But actually there are multiple routes at pretty much all
> levels: multiple entrypoints (aio and not, pwrite or not), multiple
> block drivers, in cow devices multiple read/write paths depending on
> whether the block is already present in the diffs and multiple writes
> to the .img for each requested write, etc.
>
All writes end up going through the block-raw-posix write functions.
>> qemu-system-x86_64 -drive file=foo.img -drive file=boo.img,read-only=on
>>
>> Down the road, you do a savevm. savevm has to create a checkpoint in
>> all disk images. The checkpoint will be empty in boo.img but it still
>> needs to create one.
>>
>
> Perhaps I don't understand clearly enough how you imagine this
> scenario. Surely when the snapshot is resumed it is sufficient for
> the file boo.img to be identical ?
>
Not really. When the snapshot is restored, what do you do with
boo.img? Do you just use the main L1 table if no properly named
snapshots are available? That seems quite error prone to me.
Another example is introducing a copy-on-read disk. This would be
useful for image stream. Think a qcow2 file that backs an http block
device. The perfect use-case for something like this is an ISO image
which a user would want to export to the guest read-only. However, we
need to modify the qcow2 image as the guest reads it (to do the
copy-on-read).
N.B. I've said before that there's no reason that a read-only disk
cannot result in the file being opened O_RDONLY (for raw in particular)
but that is a detail of each block device and I don't think it should be
the case for qcow2.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
2008-08-12 13:31 ` Anthony Liguori
@ 2008-08-12 23:56 ` Jamie Lokier
0 siblings, 0 replies; 12+ messages in thread
From: Jamie Lokier @ 2008-08-12 23:56 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori wrote:
> Which is fine, but you're missing my fundamental argument. Having a
> read-only flag exposed to the user should not translate to "open the
> underlying files O_RDONLY". That's an implementation detail. If that's
> what ends up happening, great. However, I'll make the argument also
> that for certain circumstances that's not actually what we want.
And when the file is not owned by the user running QEMU?
I.e. it's shared.
> >>qemu-system-x86_64 -drive file=foo.img -drive file=boo.img,read-only=on
> >>
> >>Down the road, you do a savevm. savevm has to create a checkpoint in
> >>all disk images. The checkpoint will be empty in boo.img but it still
> >>needs to create one.
> >
> >Perhaps I don't understand clearly enough how you imagine this
> >scenario. Surely when the snapshot is resumed it is sufficient for
> >the file boo.img to be identical ?
>
> Not really. When the snapshot is restored, what do you do with
> boo.img? Do you just use the main L1 table if no properly named
> snapshots are available? That seems quite error prone to me.
That's a fair question.
But if boo.img is used with several concurrent QEMUs - a legitimate
use of a read-only disk image - how can writing snapshot metadata to
it be safe?
TBH, the snapshot behaviour is really confusing and the snapshot
behaviour is not well documented. Let's see:
1. It will write a snapshot record to read only qcow2 images, but
not to raw images? So they *behave* differently - it's not
merely a different format, it has side effects. What if I don't
want side effects, I just want a compact format?
2. You *need* the snapshot record stored in qcow2, yet it's ok that
raw doesn't store it? Seems to me sometimes I don't need the
snapshot record, it would be nice if I could request not to have
it. I always resume from the last saved snapshot anyway - which
was always made with the CPU stopped. (Simulated suspend/resume).
3. The documentation (that I found) does not explain that snapshot
records the *disk* state as well as the machine state. This was
a big surprise to me. It does say you need at least one qcow2
file before snapshot is possible.
4. Which file is the machine state stored in? The first one on the
command line, or the first disk index?
5. As the disk state is snapshotted - how do I extract a
snapshotted disk e.g. to "qemu-img convert" it or transport it
into something else? Can I delete a snapshot without starting
qemu with the *exact same arguments* as before, except -S, and
doing it from the monitor?
6. What do "commit" or "qemu-img commit" do to snapshots? Do they
break all snapshots but the current one?
7. What happens if qemu dies / is killed / host crashes / power
fails during "commit" or "savevm"? Does it leave the files
inconsistent and the VM wrecked? Both functions can take quite
a long time.
8. Sometimes I want a (machine-state) snapshot and I *don't* want
to use qcow2 for the disk image. It seems non-orthogonal that I
can use raw images (or other formats) for all but one disk - ok
I have to be careful to only resume from that particular
snapshot, or by rebooting afresh (simulated unclean boot) - but
I can't use raw images for all disks.
9. Sometimes I want a disk-state snapshot (now that I know about
them :-) and I *don't* want a machine-state snapshot. In other
words, I may want to boot using a disk snapshotted earlier,
without initialising device state from that snapshot -
especially when using a much different version of QEMU, KVM or
Xen. There is no harm in using the disk - it just looks like a
CPU reset to the guest, which is acceptable - even clean if the
save happened with the guest in a safe state. Currently I am
using "qemu-img -b" branches to get a similar effect -
snapshotting disks seems much better, since you don't have long
commit pauses to tidy up.
> Another example is introducing a copy-on-read disk. This would be
> useful for image stream. Think a qcow2 file that backs an http block
> device. The perfect use-case for something like this is an ISO image
> which a user would want to export to the guest read-only. However, we
> need to modify the qcow2 image as the guest reads it (to do the
> copy-on-read).
That's a good example. If copy-on-read is implemented, you won't see
me or anyone objecting to it opening the file writable!
> N.B. I've said before that there's no reason that a read-only disk
> cannot result in the file being opened O_RDONLY (for raw in particular)
> but that is a detail of each block device and I don't think it should be
> the case for qcow2.
Another reason why I've begun recommending to clients to stop using
qcow2 then for important VMs (the other is possible corruption on qemu
death / power failure), unless they have really tight space issues.
-- Jamie
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2008-08-12 23:57 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-01 10:29 [Qemu-devel] PATCH: v3 Allow control over drive file open mode Daniel P. Berrange
2008-08-01 15:10 ` Anthony Liguori
2008-08-01 16:15 ` Daniel P. Berrange
2008-08-01 16:23 ` Andreas Färber
2008-08-01 17:06 ` Jamie Lokier
2008-08-01 16:57 ` Ian Jackson
2008-08-01 17:14 ` Anthony Liguori
2008-08-01 17:27 ` Jamie Lokier
2008-08-11 16:05 ` Ian Jackson
2008-08-12 11:42 ` Kevin Wolf
2008-08-12 13:31 ` Anthony Liguori
2008-08-12 23:56 ` Jamie Lokier
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).