Message-ID: <52B002F2.4060906@kamp.de>
Date: Tue, 17 Dec 2013 08:53:22 +0100
From: Peter Lieven
References: <1387208069-9302-1-git-send-email-pl@kamp.de> <52AFCDF4.9020804@redhat.com>
In-Reply-To: <52AFCDF4.9020804@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] block: add native support for NFS
To: Fam Zheng, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, pbonzini@redhat.com, stefanha@redhat.com, ronniesahlberg@gmail.com

Hi Fam,

On 17.12.2013 05:07, Fam Zheng wrote:
> On 2013-12-16 23:34, Peter Lieven wrote:
>> This patch adds native support for accessing images on NFS shares without
>> the requirement to actually mount the entire NFS share on the host.
>>
>> NFS images can simply be specified by a URL of the form:
>> nfs:////
>>
>> For example:
>> qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
>>
>> You need libnfs from Ronnie Sahlberg available at:
>> git://github.com/sahlberg/libnfs.git
>> for this to work.
>>
>> During configure, libnfs is probed for automatically and support
>> is enabled on the fly. You can forbid or enforce libnfs support
>> with --disable-libnfs or --enable-libnfs respectively.
>>
>> Due to NFS restrictions you might need to execute your binaries
>> as root, allow them to open privileged ports (<1024) or specify
>> the insecure option on the NFS server.
>>
>> Signed-off-by: Peter Lieven
>
> Looks nice! Thanks for the work!

Thank you ;-)

>
>> ---
>>  MAINTAINERS         |   5 +
>>  block/Makefile.objs |   1 +
>>  block/nfs.c         | 420 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  configure           |  38 +++++
>>  4 files changed, 464 insertions(+)
>>  create mode 100644 block/nfs.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index c19133f..f53d184 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -899,6 +899,11 @@ M: Peter Lieven
>>  S: Supported
>>  F: block/iscsi.c
>>
>> +NFS
>> +M: Peter Lieven
>> +S: Maintained
>> +F: block/nfs.c
>> +
>>  SSH
>>  M: Richard W.M. Jones
>>  S: Supported
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index f43ecbc..1bac94e 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -12,6 +12,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>>  ifeq ($(CONFIG_POSIX),y)
>>  block-obj-y += nbd.o sheepdog.o
>>  block-obj-$(CONFIG_LIBISCSI) += iscsi.o
>> +block-obj-$(CONFIG_LIBISCSI) += nfs.o
>>  block-obj-$(CONFIG_CURL) += curl.o
>>  block-obj-$(CONFIG_RBD) += rbd.o
>>  block-obj-$(CONFIG_GLUSTERFS) += gluster.o
>> diff --git a/block/nfs.c b/block/nfs.c
>> new file mode 100644
>> index 0000000..d6cb4c0
>> --- /dev/null
>> +++ b/block/nfs.c
>> @@ -0,0 +1,420 @@
>> +/*
>> + * QEMU Block driver for native access to files on NFS shares
>> + *
>> + * Copyright (c) 2013 Peter Lieven
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "config-host.h"
>> +
>> +#include
>> +#include
>> +#include "qemu-common.h"
>> +#include "qemu/config-file.h"
>> +#include "qemu/error-report.h"
>> +#include "block/block_int.h"
>> +#include "trace.h"
>> +#include "block/scsi.h"
>> +#include "qemu/iov.h"
>> +#include "sysemu/sysemu.h"
>> +#include "qmp-commands.h"
>
> Copied from block/iscsi.c? SCSI and QMP are not necessary for this file. And maybe also arpa/inet.h, I'm not sure about that though.

Yes, libiscsi and libnfs are quite similar. ;-)

>
>> +
>> +#include
>> +#include
>> +#include
>> +#include
>> +
>> +typedef struct nfsclient {
>> +    struct nfs_context *context;
>> +    struct nfsfh *fh;
>> +    int events;
>> +    bool has_zero_init;
>> +    int64_t allocated_file_size;
>> +    QEMUBH *close_bh;
>
> This is unused.

Oops, it's a leftover from a nasty segfault debugging session.

>
>> +} nfsclient;
>
> Please use CamelCase for type names...

OK.

>
>> +
>> +typedef struct NFSTask {
>> +    int status;
>> +    int complete;
>> +    QEMUIOVector *iov;
>> +    Coroutine *co;
>> +    QEMUBH *bh;
>> +} NFSTask;
>
> as you do with this.
>
>> +
>> +static void nfs_process_read(void *arg);
>> +static void nfs_process_write(void *arg);
>> +
>> +static void nfs_set_events(nfsclient *client)
>> +{
>> +    int ev;
>> +    /* We always register a read handler. */
>> +    ev = POLLIN;
>> +    ev |= nfs_which_events(client->context);
>> +    if (ev != client->events) {
>> +        qemu_aio_set_fd_handler(nfs_get_fd(client->context),
>> +                                nfs_process_read,
>> +                                (ev & POLLOUT) ? nfs_process_write : NULL,
>> +                                client);
>> +
>> +    }
>> +    client->events = ev;
>> +}
>> +
>> +static void nfs_process_read(void *arg)
>> +{
>> +    nfsclient *client = arg;
>> +    nfs_service(client->context, POLLIN);
>> +    nfs_set_events(client);
>> +}
>> +
>> +static void nfs_process_write(void *arg)
>> +{
>> +    nfsclient *client = arg;
>> +    nfs_service(client->context, POLLOUT);
>> +    nfs_set_events(client);
>> +}
>> +
>> +static void nfs_co_init_task(nfsclient *client, NFSTask *Task)
>> +{
>> +    *Task = (NFSTask) {
>
> Please use lower case for variable names.

OK.

>
>> +        .co = qemu_coroutine_self(),
>> +    };
>> +}
>> +
>> +static void nfs_co_generic_bh_cb(void *opaque)
>> +{
>> +    NFSTask *Task = opaque;
>> +    qemu_bh_delete(Task->bh);
>> +    qemu_coroutine_enter(Task->co, NULL);
>> +}
>> +
>> +static void nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data, void *private_data)
>
> This line is too long. Please use scripts/checkpatch.pl to check the coding style. (Some other lines have trailing whitespace.)

I thought I had done this. Strange...
>
>> +{
>> +    NFSTask *Task = private_data;
>> +    Task->complete = 1;
>> +    Task->status = status;
>> +    if (Task->status > 0 && Task->iov) {
>> +        if (Task->status == Task->iov->size) {
>> +            qemu_iovec_from_buf(Task->iov, 0, data, status);
>> +        } else {
>> +            Task->status = -1;
>> +        }
>> +    }
>> +    if (Task->co) {
>> +        Task->bh = qemu_bh_new(nfs_co_generic_bh_cb, Task);
>> +        qemu_bh_schedule(Task->bh);
>> +    }
>> +}
>> +
>> +static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
>> +                                     int64_t sector_num, int nb_sectors,
>> +                                     QEMUIOVector *iov)
>> +{
>> +    nfsclient *client = bs->opaque;
>> +    struct NFSTask Task;
>> +
>> +    nfs_co_init_task(client, &Task);
>> +    Task.iov = iov;
>> +
>> +    if (nfs_pread_async(client->context, client->fh,
>> +                        sector_num * BDRV_SECTOR_SIZE,
>> +                        nb_sectors * BDRV_SECTOR_SIZE,
>> +                        nfs_co_generic_cb, &Task) != 0) {
>> +        return -EIO;
>> +    }
>> +
>> +    while (!Task.complete) {
>> +        nfs_set_events(client);
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +    if (Task.status != nb_sectors * BDRV_SECTOR_SIZE) {
>> +        return -EIO;
>
> In the error case, can Task.status contain an error number other than -EIO? Would it be useful to return that value?
>

You are right. I checked the libnfs code and found the function nfsstat3_to_errno(int error), which maps NFS errors to POSIX errors. I will pass the error through in case Task.status is < 0.
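Passing the real error through, as proposed above, could look like the following small helper. This is a minimal sketch, not code from the patch or from libnfs: the name nfs_task_status_to_ret is hypothetical, and it assumes that a negative status delivered to the completion callback is already a negated POSIX errno produced via nfsstat3_to_errno().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): turn the status stored by the
 * completion callback into a block-layer return value.
 * Assumption: status < 0 is already a negated POSIX errno.
 */
static int nfs_task_status_to_ret(int status, int64_t expected_bytes)
{
    assert(expected_bytes >= 0);

    if (status < 0) {
        return status;   /* pass the mapped NFS error through unchanged */
    }
    if (status != expected_bytes) {
        return -EIO;     /* short read or write: no errno to report */
    }
    return 0;
}
```

In nfs_co_readv the final check would then become `return nfs_task_status_to_ret(Task.status, nb_sectors * BDRV_SECTOR_SIZE);`, and nfs_co_flush could pass 0 as the expected byte count. One caveat: nfs_co_generic_cb currently stores -1 on a short read, which a pass-through would report as -EPERM on Linux, so storing -EIO there instead would keep the result meaningful.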
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
>> +                                      int64_t sector_num, int nb_sectors,
>> +                                      QEMUIOVector *iov)
>> +{
>> +    nfsclient *client = bs->opaque;
>> +    struct NFSTask Task;
>> +    char *buf = NULL;
>> +
>> +    nfs_co_init_task(client, &Task);
>> +
>> +    buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
>> +    qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
>> +
>> +    if (nfs_pwrite_async(client->context, client->fh,
>> +                         sector_num * BDRV_SECTOR_SIZE,
>> +                         nb_sectors * BDRV_SECTOR_SIZE,
>> +                         buf, nfs_co_generic_cb, &Task) != 0) {
>> +        g_free(buf);
>> +        return -EIO;
>> +    }
>> +
>> +    while (!Task.complete) {
>> +        nfs_set_events(client);
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +    g_free(buf);
>> +
>> +    if (Task.status != nb_sectors * BDRV_SECTOR_SIZE) {
>> +        return -EIO;
>> +    }
>> +
>> +    bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);
>> +    client->allocated_file_size = -ENOTSUP;
>
> Why does allocated_file_size become not supported after a write?

I thought someone would ask this ;-) bdrv_allocated_file_size is only used for the image info, so I avoided implementing an async call for it here. On open I call fstat anyway and store that value. For qemu-img info this is sufficient, but the allocated size likely changes after a write, so the cached value is dropped. -ENOTSUP is the default if bdrv_allocated_file_size is not implemented.
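The caching scheme described above (remember the fstat value at open time, forget it after a write, and report -ENOTSUP while it is unknown) can be sketched in isolation as follows. The struct and function names are hypothetical stand-ins chosen for illustration; they are not QEMU or libnfs APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver state: only the cached field. */
typedef struct NFSSizeCache {
    int64_t allocated_file_size;   /* bytes, or -ENOTSUP if unknown */
} NFSSizeCache;

/* At open time: store the allocated size obtained from fstat. */
static void size_cache_on_open(NFSSizeCache *c, int64_t allocated_bytes)
{
    assert(allocated_bytes >= 0);
    c->allocated_file_size = allocated_bytes;
}

/* After a successful write the cached value may be stale: drop it. */
static void size_cache_on_write(NFSSizeCache *c)
{
    c->allocated_file_size = -ENOTSUP;
}

/* What an allocated-file-size callback would report to image info. */
static int64_t size_cache_get(const NFSSizeCache *c)
{
    return c->allocated_file_size;
}
```

The trade-off, as far as the reply describes it, is that the value is only available until the first write, in exchange for not needing an asynchronous stat call on the I/O path.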
>
>> +    return 0;
>> +}
>> +
>> +static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
>> +{
>> +    nfsclient *client = bs->opaque;
>> +    struct NFSTask Task;
>> +
>> +    nfs_co_init_task(client, &Task);
>> +
>> +    if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb, &Task) != 0) {
>> +        return -EIO;
>> +    }
>> +
>> +    while (!Task.complete) {
>> +        nfs_set_events(client);
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +    if (Task.status != 0) {
>> +        return -EIO;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static QemuOptsList runtime_opts = {
>> +    .name = "nfs",
>> +    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
>> +    .desc = {
>> +        {
>> +            .name = "filename",
>> +            .type = QEMU_OPT_STRING,
>> +            .help = "URL to the NFS file",
>> +        },
>> +        { /* end of list */ }
>> +    },
>> +};
>> +
>> +static void nfs_file_close(BlockDriverState *bs)
>> +{
>> +    nfsclient *client = bs->opaque;
>> +    if (client->context) {
>> +        if (client->fh) {
>> +            nfs_close(client->context, client->fh);
>> +        }
>> +        qemu_aio_set_fd_handler(nfs_get_fd(client->context), NULL, NULL, NULL);
>> +        nfs_destroy_context(client->context);
>> +    }
>> +    memset(client, 0, sizeof(nfsclient));
>> +}
>> +
>> +
>> +static int nfs_file_open_common(BlockDriverState *bs, QDict *options, int flags,
>> +                                int open_flags, Error **errp)
>> +{
>> +    nfsclient *client = bs->opaque;
>> +    const char *filename;
>> +    int ret = 0;
>> +    QemuOpts *opts;
>> +    Error *local_err = NULL;
>> +    char *server = NULL, *path = NULL, *file = NULL, *strp;
>> +    struct stat st;
>> +
>> +    opts = qemu_opts_create_nofail(&runtime_opts);
>> +    qemu_opts_absorb_qdict(opts, options, &local_err);
>> +    if (error_is_set(&local_err)) {
>> +        qerror_report_err(local_err);
>> +        error_free(local_err);
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    filename = qemu_opt_get(opts, "filename");
>> +
>> +    client->context = nfs_init_context();
>> +
>> +    if (client->context == NULL) {
>> +        error_setg(errp, "Failed to init NFS context");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    server = g_strdup(filename + 6);
>
> Please check that filename is longer than 6 characters before accessing filename[6].

Good point. I will add a check, although I think this cannot happen in practice, because we never end up here if the filename does not start with nfs://.

>
>> +    if (server[0] == '/' || server[0] == '\0') {
>> +        error_setg(errp, "Invalid server in URL");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +    strp = strchr(server, '/');
>> +    if (strp == NULL) {
>> +        error_setg(errp, "Invalid URL specified.\n");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +    path = g_strdup(strp);
>> +    *strp = 0;
>> +    strp = strrchr(path, '/');
>> +    if (strp == NULL) {
>> +        error_setg(errp, "Invalid URL specified.\n");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +    file = g_strdup(strp);
>> +    *strp = 0;
>> +
>> +    if (nfs_mount(client->context, server, path) != 0) {
>> +        error_setg(errp, "Failed to mount nfs share: %s",
>> +                   nfs_get_error(client->context));
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    if (open_flags & O_CREAT) {
>> +        if (nfs_creat(client->context, file, 0600, &client->fh) != 0) {
>> +            error_setg(errp, "Failed to create file: %s",
>> +                       nfs_get_error(client->context));
>> +            ret = -EINVAL;
>> +            goto fail;
>> +        }
>> +    } else {
>> +        open_flags = (flags & BDRV_O_RDWR) ? O_RDWR : O_RDONLY;
>> +        if (nfs_open(client->context, file, open_flags, &client->fh) != 0) {
>> +            error_setg(errp, "Failed to open file : %s",
>> +                       nfs_get_error(client->context));
>> +            ret = -EINVAL;
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    if (nfs_fstat(client->context, client->fh, &st) != 0) {
>> +        error_setg(errp, "Failed to fstat file: %s",
>> +                   nfs_get_error(client->context));
>> +        ret = -EIO;
>> +        goto fail;
>> +    }
>> +
>> +    bs->total_sectors = st.st_size / BDRV_SECTOR_SIZE;
>
> Please use DIV_ROUND_UP(). Otherwise the remainder in the last sector can't be read.

Will do. But couldn't we then end up reading unallocated sectors?
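Both review points on nfs_file_open_common can be illustrated together: check the nfs:// prefix (which also guarantees the string is long enough) before skipping it, and round the sector count up. This is a self-contained sketch under stated assumptions: parse_nfs_url, total_sectors_for, copy_range and the SKETCH_* macros are hypothetical names, and the macro only mirrors the usual definition of QEMU's DIV_ROUND_UP. Like the patch, it keeps the leading '/' on both the export path and the file name; unlike the patch, it also rejects URLs without an export directory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_SECTOR_SIZE 512
/* Mirrors the usual definition of QEMU's DIV_ROUND_UP(n, d). */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Copy len bytes of src into a new NUL-terminated string. */
static char *copy_range(const char *src, size_t len)
{
    char *dst = malloc(len + 1);
    if (dst != NULL) {
        memcpy(dst, src, len);
        dst[len] = '\0';
    }
    return dst;
}

/*
 * Hypothetical helper: split "nfs://server/export/file" into server,
 * export path and file name. Returns 0 on success and -1 on a malformed
 * URL or allocation failure; on success the caller frees all three.
 */
static int parse_nfs_url(const char *url, char **server,
                         char **export_path, char **file)
{
    static const char prefix[] = "nfs://";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *host, *first, *last;

    /* strncmp stops at the terminating NUL, so short strings are safe. */
    if (strncmp(url, prefix, prefix_len) != 0) {
        return -1;
    }
    host = url + prefix_len;
    first = strchr(host, '/');
    if (first == NULL || first == host) {
        return -1;                  /* no path, or empty server */
    }
    last = strrchr(first, '/');
    if (last == first || last[1] == '\0') {
        return -1;                  /* no export dir, or no file name */
    }

    *server = copy_range(host, (size_t)(first - host));
    *export_path = copy_range(first, (size_t)(last - first));
    *file = copy_range(last, strlen(last));
    if (*server == NULL || *export_path == NULL || *file == NULL) {
        free(*server);
        free(*export_path);
        free(*file);
        return -1;
    }
    return 0;
}

/* Round up so that a partial last sector is still covered. */
static int64_t total_sectors_for(int64_t file_size)
{
    assert(file_size >= 0);
    return SKETCH_DIV_ROUND_UP(file_size, SKETCH_SECTOR_SIZE);
}
```

For nfs://10.0.0.1/qemu-images/test.qcow2 the sketch yields the server 10.0.0.1, the export /qemu-images and the file /test.qcow2. For a 1000-byte file, truncating division gives one sector and hides the last 488 bytes, while rounding up gives two.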
>
>> +    client->has_zero_init = S_ISREG(st.st_mode);
>> +    client->allocated_file_size = st.st_blocks * st.st_blksize;
>> +    goto out;
>> +fail:
>> +    nfs_file_close(bs);
>> +out:
>> +    g_free(server);
>> +    g_free(path);
>> +    g_free(file);
>> +    return ret;
>> +}
>> +
>> +static int nfs_file_open(BlockDriverState *bs, QDict *options, int flags,
>> +                         Error **errp) {
>> +    return nfs_file_open_common(bs, options, flags, 0, errp);
>> +}
>> +
>> +static int nfs_file_create(const char *filename, QEMUOptionParameter *options,
>> +                           Error **errp)
>> +{
>> +    int ret = 0;
>> +    int64_t total_size = 0;
>> +    BlockDriverState *bs;
>> +    nfsclient *client = NULL;
>> +    QDict *bs_options;
>> +
>> +    bs = bdrv_new("");
>> +
>> +    /* Read out options */
>> +    while (options && options->name) {
>> +        if (!strcmp(options->name, "size")) {
>> +            total_size = options->value.n / BDRV_SECTOR_SIZE;
>
> Why divide by BDRV_SECTOR_SIZE only to ...

Copy and paste error.

>
>> +        }
>> +        options++;
>> +    }
>> +
>> +    bs->opaque = g_malloc0(sizeof(struct nfsclient));
>> +    client = bs->opaque;
>> +
>> +    bs_options = qdict_new();
>> +    qdict_put(bs_options, "filename", qstring_from_str(filename));
>> +    ret = nfs_file_open_common(bs, bs_options, 0, O_CREAT, NULL);
>> +    QDECREF(bs_options);
>> +    if (ret != 0) {
>> +        goto out;
>> +    }
>> +    ret = nfs_ftruncate(client->context, client->fh, total_size * BDRV_SECTOR_SIZE);
>
> ... multiply it back later?
>
>> +    if (ret != 0) {
>> +        ret = -ENOSPC;;
>
> There is an extra semicolon. And is it right to hard-code ENOSPC here without checking the value of ret?

I will return the real error as mapped by nfsstat3_to_errno().

>
>> +    }
>> +out:
>> +    nfs_file_close(bs);
>> +    g_free(bs->opaque);
>> +    bs->opaque = NULL;
>> +    bdrv_unref(bs);
>> +    return ret;
>> +}
>> +
>
> Fam
>

Thanks for reviewing!

One wish, as I think you are the VMDK format maintainer: could you rework vmdk_create, and possibly other functions in VMDK, to use only bdrv_* functions (as qcow2_create does)?
Currently it is not possible to create a VMDK directly on an NFS share, because the VMDK driver uses qemu_file_* calls. This also affects other drivers. QCOW2 and RAW work perfectly.

Peter
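For the two nfs_file_create comments in the review, a minimal sketch of the fix is to keep the requested size in bytes and return whatever error the truncate call reports instead of a fixed -ENOSPC. The truncate_fn hook type and the two example hooks are hypothetical stand-ins for nfs_ftruncate(), and the sketch assumes the hook returns 0 or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512

/* The patch's round trip: bytes -> sectors -> bytes drops any tail. */
static int64_t size_round_trip(int64_t bytes)
{
    return (bytes / SKETCH_SECTOR_SIZE) * SKETCH_SECTOR_SIZE;
}

/* Hypothetical truncate hook: returns 0 or a negative errno. */
typedef int (*truncate_fn)(void *opaque, uint64_t length);

/* Resize the new image to exactly size_bytes and propagate real errors. */
static int create_resize(truncate_fn do_truncate, void *opaque,
                         int64_t size_bytes)
{
    int ret;

    assert(size_bytes >= 0);
    ret = do_truncate(opaque, (uint64_t)size_bytes);
    return ret < 0 ? ret : 0;   /* no hard-coded -ENOSPC */
}

/* Example hook that records the requested length and succeeds. */
static int truncate_record(void *opaque, uint64_t length)
{
    *(uint64_t *)opaque = length;
    return 0;
}

/* Example hook that fails with a permission error. */
static int truncate_eacces(void *opaque, uint64_t length)
{
    (void)opaque;
    (void)length;
    return -EACCES;
}
```

If a size that is not a multiple of 512 reaches the driver, the round trip in the patch turns a 1000-byte request into 512 bytes; passing the byte count through unchanged avoids that, and a permission failure surfaces as -EACCES rather than a misleading -ENOSPC.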