From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:57132) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QXDsh-0000Ap-BL for qemu-devel@nongnu.org; Thu, 16 Jun 2011 10:49:22 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QXDsY-0007u4-Em for qemu-devel@nongnu.org; Thu, 16 Jun 2011 10:49:14 -0400 Received: from e7.ny.us.ibm.com ([32.97.182.137]:33321) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QXDsY-0007ry-21 for qemu-devel@nongnu.org; Thu, 16 Jun 2011 10:49:06 -0400 Received: from d01relay05.pok.ibm.com (d01relay05.pok.ibm.com [9.56.227.237]) by e7.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id p5GEOfaT015262 for ; Thu, 16 Jun 2011 10:24:41 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay05.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p5GEmrXw124592 for ; Thu, 16 Jun 2011 10:48:53 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p5GAmeds030881 for ; Thu, 16 Jun 2011 07:48:40 -0300 Message-ID: <4DFA17D3.9020405@us.ibm.com> Date: Thu, 16 Jun 2011 10:48:51 -0400 From: Corey Bryant MIME-Version: 1.0 References: <4DF762A4.2040503@us.ibm.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v2] Add support for fd: protocol List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Blue Swirl Cc: libvir-list@redhat.com, Anthony Liguori , qemu-devel@nongnu.org, Tyler C Hicks On 06/15/2011 03:12 PM, Blue Swirl wrote: > On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryant wrote: >> > sVirt provides SELinux MAC isolation for Qemu guest processes and their >> > corresponding resources (image files). sVirt provides this support >> > by labeling guests and resources with security labels that are stored >> > in file system extended attributes. Some file systems, such as NFS, do >> > not support the extended attribute security namespace, which is needed >> > for image file isolation when using the sVirt SELinux security driver >> > in libvirt. >> > >> > The proposed solution entails a combination of Qemu, libvirt, and >> > SELinux patches that work together to isolate multiple guests' images >> > when they're stored in the same NFS mount. This results in an >> > environment where sVirt isolation and NFS image file isolation can both >> > be provided. >> > >> > This patch contains the Qemu code to support this solution. I would >> > like to solicit input from the libvirt community prior to starting >> > the libvirt patch. >> > >> > Currently, Qemu opens an image file in addition to performing the >> > necessary read and write operations. The proposed solution will move >> > the open out of Qemu and into libvirt. Once libvirt opens an image >> > file for the guest, it will pass the file descriptor to Qemu via a >> > new fd: protocol. >> > >> > If the image file resides in an NFS mount, the following SELinux policy >> > changes will provide image isolation: >> > >> > - A new SELinux boolean is created (e.g. virt_read_write_nfs) to >> > allow Qemu (svirt_t) to only have SELinux read and write >> > permissions on nfs_t files >> > >> > - Qemu (svirt_t) also gets SELinux use permissions on libvirt >> > (virtd_t) file descriptors >> > >> > Following is a sample invocation of Qemu using the fd: protocol on >> > the command line: >> > >> > qemu -drive file=fd:4,format=qcow2 >> > >> > The fd: protocol is also supported by the drive_add monitor command. >> > This requires that the specified file descriptor is passed to the >> > monitor alongside a prior getfd monitor command. >> > >> > There are some additional features provided by certain image types >> > where Qemu reopens the image file. All of these scenarios will be >> > unsupported for the fd: protocol, at least for this patch: >> > >> > - The -snapshot command line option >> > - The savevm monitor command >> > - The snapshot_blkdev monitor command >> > - Starting Qemu with a backing file > There's also native CDROM device. Did you consider adding an explicit > reopen method to block layer? Thanks. Yes it looks like I overlooked CDROM reopens. I'm not sure that I'm clear on the purpose of the reopen function. Would the goal be to funnel all block layer reopens through a single function, enabling potential future support where a privileged layer of Qemu, or libvirt, performs the open? > >> > The thought is that this support can be added in the future, but is >> > not required for the initial fd: support. >> > >> > This patch was tested with the following formats: raw, cow, qcow, >> > qcow2, qed, and vmdk, using the fd: protocol from the command line >> > and the monitor. Tests were also run to verify existing file name >> > support and qemu-img were not regressed. Non-valid file descriptors, >> > fd: without format, snapshot and backing files were also tested. >> > >> > Signed-off-by: Corey Bryant >> > --- >> > block.c | 16 ++++++++++ >> > block.h | 1 + >> > block/cow.c | 5 +++ >> > block/qcow.c | 5 +++ >> > block/qcow2.c | 5 +++ >> > block/qed.c | 4 ++ >> > block/raw-posix.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++------ >> > block/vmdk.c | 5 +++ >> > block_int.h | 1 + >> > blockdev.c | 10 ++++++ >> > monitor.c | 5 +++ >> > monitor.h | 1 + >> > qemu-options.hx | 8 +++-- >> > qemu-tool.c | 5 +++ >> > 14 files changed, 140 insertions(+), 12 deletions(-) >> > >> > diff --git a/block.c b/block.c >> > index 24a25d5..500db84 100644 >> > --- a/block.c >> > +++ b/block.c >> > @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, >> > char tmp_filename[PATH_MAX]; >> > char backing_filename[PATH_MAX]; >> > >> > + if (bdrv_is_fd_protocol(bs)) { >> > + return -ENOTSUP; >> > + } >> > + >> > /* if snapshot, we create a temporary backing file and open it >> > instead of opening 'filename' directly */ >> > >> > @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, >> > >> > /* Find the right image format driver */ >> > if (!drv) { >> > + /* format must be specified for fd: protocol */ >> > + if (bdrv_is_fd_protocol(bs)) { >> > + return -ENOTSUP; >> > + } >> > ret = find_image_format(filename,&drv); >> > } >> > >> > @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs) >> > return bs->enable_write_cache; >> > } >> > >> > +int bdrv_is_fd_protocol(BlockDriverState *bs) >> > +{ >> > + return bs->fd_protocol; >> > +} >> > + >> > /* XXX: no longer used */ >> > void bdrv_set_change_cb(BlockDriverState *bs, >> > void (*change_cb)(void *opaque, int reason), >> > @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs, >> > BlockDriver *drv = bs->drv; >> > if (!drv) >> > return -ENOMEDIUM; >> > + if (bdrv_is_fd_protocol(bs)) { >> > + return -ENOTSUP; >> > + } >> > if (drv->bdrv_snapshot_create) >> > return drv->bdrv_snapshot_create(bs, sn_info); >> > if (bs->file) >> > diff --git a/block.h b/block.h >> > index da7d39c..dd46d52 100644 >> > --- a/block.h >> > +++ b/block.h >> > @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs); >> > int bdrv_is_read_only(BlockDriverState *bs); >> > int bdrv_is_sg(BlockDriverState *bs); >> > int bdrv_enable_write_cache(BlockDriverState *bs); >> > +int bdrv_is_fd_protocol(BlockDriverState *bs); >> > int bdrv_is_inserted(BlockDriverState *bs); >> > int bdrv_media_changed(BlockDriverState *bs); >> > int bdrv_is_locked(BlockDriverState *bs); >> > diff --git a/block/cow.c b/block/cow.c >> > index 4cf543c..e17f8e7 100644 >> > --- a/block/cow.c >> > +++ b/block/cow.c >> > @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags) >> > pstrcpy(bs->backing_file, sizeof(bs->backing_file), >> > cow_header.backing_file); >> > >> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) { >> > + /* backing file currently not supported by fd: protocol */ >> > + goto fail; >> > + } >> > + >> > bitmap_size = ((bs->total_sectors + 7)>> 3) + sizeof(cow_header); >> > s->cow_sectors_offset = (bitmap_size + 511)& ~511; >> > return 0; >> > diff --git a/block/qcow.c b/block/qcow.c >> > index a26c886..1e46bdb 100644 >> > --- a/block/qcow.c >> > +++ b/block/qcow.c >> > @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int flags) >> > if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len) >> > goto fail; >> > bs->backing_file[len] = '\0'; >> > + >> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) { >> > + /* backing file currently not supported by fd: protocol */ >> > + goto fail; >> > + } >> > } >> > return 0; >> > >> > diff --git a/block/qcow2.c b/block/qcow2.c >> > index 8451ded..8b9c160 100644 >> > --- a/block/qcow2.c >> > +++ b/block/qcow2.c >> > @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int flags) >> > goto fail; >> > } >> > bs->backing_file[len] = '\0'; >> > + >> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) { >> > + ret = -ENOTSUP; >> > + goto fail; >> > + } >> > } >> > if (qcow2_read_snapshots(bs)< 0) { >> > ret = -EINVAL; >> > diff --git a/block/qed.c b/block/qed.c >> > index 3970379..5028897 100644 >> > --- a/block/qed.c >> > +++ b/block/qed.c >> > @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags) >> > return ret; >> > } >> > >> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) { >> > + return -ENOTSUP; >> > + } >> > + >> > if (s->header.features& QED_F_BACKING_FORMAT_NO_PROBE) { >> > pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw"); >> > } >> > diff --git a/block/raw-posix.c b/block/raw-posix.c >> > index 4cd7d7a..c72de3d 100644 >> > --- a/block/raw-posix.c >> > +++ b/block/raw-posix.c >> > @@ -28,6 +28,7 @@ >> > #include "block_int.h" >> > #include "module.h" >> > #include "block/raw-posix-aio.h" >> > +#include "monitor.h" >> > >> > #ifdef CONFIG_COCOA >> > #include >> > @@ -183,7 +184,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename, >> > int bdrv_flags, int open_flags) >> > { >> > BDRVRawState *s = bs->opaque; >> > - int fd, ret; >> > + int fd = -1; >> > + int ret; >> > >> > ret = raw_normalize_devicepath(&filename); >> > if (ret != 0) { >> > @@ -205,15 +207,17 @@ static int raw_open_common(BlockDriverState *bs, const char *filename, >> > if (!(bdrv_flags& BDRV_O_CACHE_WB)) >> > s->open_flags |= O_DSYNC; >> > >> > - s->fd = -1; >> > - fd = qemu_open(filename, s->open_flags, 0644); >> > - if (fd< 0) { >> > - ret = -errno; >> > - if (ret == -EROFS) >> > - ret = -EACCES; >> > - return ret; >> > + if (s->fd == -1) { >> > + fd = qemu_open(filename, s->open_flags, 0644); >> > + if (fd< 0) { >> > + ret = -errno; >> > + if (ret == -EROFS) { >> > + ret = -EACCES; >> > + } >> > + return ret; >> > + } >> > + s->fd = fd; >> > } >> > - s->fd = fd; >> > s->aligned_buf = NULL; >> > >> > if ((bdrv_flags& BDRV_O_NOCACHE)) { >> > @@ -270,6 +274,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags) >> > { >> > BDRVRawState *s = bs->opaque; >> > >> > + s->fd = -1; >> > s->type = FTYPE_FILE; >> > return raw_open_common(bs, filename, flags, 0); >> > } >> > @@ -890,6 +895,60 @@ static BlockDriver bdrv_file = { >> > .create_options = raw_create_options, >> > }; >> > >> > +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags) >> > +{ >> > + BDRVRawState *s = bs->opaque; >> > + const char *fd_str; >> > + int fd; >> > + >> > + /* extract the file descriptor - fail if it's not fd: */ >> > + if (!strstart(filename, "fd:",&fd_str)) { >> > + return -EINVAL; >> > + } >> > + >> > + if (!qemu_isdigit(fd_str[0])) { >> > + /* get fd from monitor */ >> > + fd = qemu_get_fd(fd_str); >> > + if (fd == -1) { >> > + return -EBADF; >> > + } >> > + } else { >> > + char *endptr = NULL; >> > + >> > + fd = strtol(fd_str,&endptr, 10); >> > + if (*endptr || (fd == 0&& fd_str == endptr)) { >> > + return -EBADF; >> > + } >> > + } >> > + >> > + s->fd = fd; >> > + s->type = FTYPE_FILE; >> > + >> > + return raw_open_common(bs, filename, flags, 0); >> > +} >> > + >> > +static BlockDriver bdrv_file_fd = { >> > + .format_name = "file", >> > + .protocol_name = "fd", >> > + .instance_size = sizeof(BDRVRawState), >> > + .bdrv_probe = NULL, /* no probe for protocols */ >> > + .bdrv_file_open = raw_open_fd, >> > + .bdrv_read = raw_read, >> > + .bdrv_write = raw_write, >> > + .bdrv_close = raw_close, >> > + .bdrv_flush = raw_flush, >> > + .bdrv_discard = raw_discard, >> > + >> > + .bdrv_aio_readv = raw_aio_readv, >> > + .bdrv_aio_writev = raw_aio_writev, >> > + .bdrv_aio_flush = raw_aio_flush, >> > + >> > + .bdrv_truncate = raw_truncate, >> > + .bdrv_getlength = raw_getlength, >> > + >> > + .create_options = raw_create_options, >> > +}; >> > + >> > /***********************************************/ >> > /* host device */ >> > >> > @@ -998,6 +1057,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags) >> > } >> > #endif >> > >> > + s->fd = -1; >> > s->type = FTYPE_FILE; >> > #if defined(__linux__) >> > { >> > @@ -1168,6 +1228,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags) >> > BDRVRawState *s = bs->opaque; >> > int ret; >> > >> > + s->fd = -1; >> > s->type = FTYPE_FD; >> > >> > /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */ >> > @@ -1280,6 +1341,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags) >> > { >> > BDRVRawState *s = bs->opaque; >> > >> > + s->fd = -1; >> > s->type = FTYPE_CD; >> > >> > /* open will not fail even if no CD is inserted, so add O_NONBLOCK */ >> > @@ -1503,6 +1565,7 @@ static void bdrv_file_init(void) >> > * Register all the drivers. Note that order is important, the driver >> > * registered last will get probed first. >> > */ >> > + bdrv_register(&bdrv_file_fd); >> > bdrv_register(&bdrv_file); >> > bdrv_register(&bdrv_host_device); >> > #ifdef __linux__ >> > diff --git a/block/vmdk.c b/block/vmdk.c >> > index 922b23d..2ea808e 100644 >> > --- a/block/vmdk.c >> > +++ b/block/vmdk.c >> > @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs) >> > return -1; >> > >> > pstrcpy(bs->backing_file, end_name - p_name + 1, p_name); >> > + >> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) { >> > + /* backing file currently not supported by fd: protocol */ >> > + return -1; >> > + } >> > } >> > >> > return 0; >> > diff --git a/block_int.h b/block_int.h >> > index fa91337..a305ee2 100644 >> > --- a/block_int.h >> > +++ b/block_int.h >> > @@ -152,6 +152,7 @@ struct BlockDriverState { >> > int encrypted; /* if true, the media is encrypted */ >> > int valid_key; /* if true, a valid encryption key has been set */ >> > int sg; /* if true, the device is a /dev/sg* */ >> > + int fd_protocol; /* if true, the fd: protocol was specified */ > bool? > I was following suit here, but I agree that bool would be better. Or better, these could all be reduced to bit flags. My thought is that I'll stick with following suit here though. >> > /* event callback when inserting/removing */ >> > void (*change_cb)(void *opaque, int reason); >> > void *change_opaque; >> > diff --git a/blockdev.c b/blockdev.c >> > index e81e0ab..a536c20 100644 >> > --- a/blockdev.c >> > +++ b/blockdev.c >> > @@ -546,6 +546,10 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) >> > >> > bdrv_flags |= ro ? 0 : BDRV_O_RDWR; >> > >> > + if (strncmp(file, "fd:", 3) == 0) { >> > + dinfo->bdrv->fd_protocol = 1; >> > + } >> > + >> > ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv); >> > if (ret< 0) { >> > error_report("could not open disk image %s: %s", >> > @@ -606,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data) >> > goto out; >> > } >> > >> > + if (bdrv_is_fd_protocol(bs)) { >> > + qerror_report(QERR_UNSUPPORTED); >> > + ret = -1; >> > + goto out; >> > + } >> > + >> > pstrcpy(old_filename, sizeof(old_filename), bs->filename); >> > >> > old_drv = bs->drv; >> > diff --git a/monitor.c b/monitor.c >> > index 59a3e76..ea60be2 100644 >> > --- a/monitor.c >> > +++ b/monitor.c >> > @@ -2832,6 +2832,11 @@ int monitor_get_fd(Monitor *mon, const char *fdname) >> > return -1; >> > } >> > >> > +int qemu_get_fd(const char *fdname) >> > +{ >> > + return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1; >> > +} >> > + >> > static const mon_cmd_t mon_cmds[] = { >> > #include "hmp-commands.h" >> > { NULL, NULL, }, >> > diff --git a/monitor.h b/monitor.h >> > index 4f2d328..de5b987 100644 >> > --- a/monitor.h >> > +++ b/monitor.h >> > @@ -51,6 +51,7 @@ int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs, >> > void *opaque); >> > >> > int monitor_get_fd(Monitor *mon, const char *fdname); >> > +int qemu_get_fd(const char *fdname); >> > >> > void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap) >> > GCC_FMT_ATTR(2, 0); >> > diff --git a/qemu-options.hx b/qemu-options.hx >> > index 1d5ad8b..f9b66f4 100644 >> > --- a/qemu-options.hx >> > +++ b/qemu-options.hx >> > @@ -116,7 +116,7 @@ using @file{/dev/cdrom} as filename (@pxref{host_drives}). >> > ETEXI >> > >> > DEF("drive", HAS_ARG, QEMU_OPTION_drive, >> > - "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n" >> > + "-drive [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n" >> > " [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n" >> > " [,cache=writethrough|writeback|none|unsafe][,format=f]\n" >> > " [,serial=s][,addr=A][,id=name][,aio=threads|native]\n" >> > @@ -129,10 +129,12 @@ STEXI >> > Define a new drive. Valid options are: >> > >> > @table @option >> > -@item file=@var{file} >> > +@item file=[fd:]@var{file} >> > This option defines which disk image (@pxref{disk_images}) to use with >> > this drive. If the filename contains comma, you must double it >> > -(for instance, "file=my,,file" to use file "my,file"). >> > +(for instance, "file=my,,file" to use file "my,file"). @option{fd:}@var{file} >> > +specifies the file descriptor of an already open disk image. >> > +@option{format=}@var{format} is required by @option{fd:}@var{file}. >> > @item if=@var{interface} >> > This option defines on which type on interface the drive is connected. >> > Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio. >> > diff --git a/qemu-tool.c b/qemu-tool.c >> > index 41e5c41..8fe6b8c 100644 >> > --- a/qemu-tool.c >> > +++ b/qemu-tool.c >> > @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock) >> > { >> > return 0; >> > } >> > + >> > +int qemu_get_fd(const char *fdname) >> > +{ >> > + return -1; >> > +} >> > -- >> > 1.7.1 >> > >> > >> > Regards, Corey Bryant