qemu-devel.nongnu.org archive mirror
From: Peter Lieven <pl@kamp.de>
To: Fam Zheng <famz@redhat.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, pbonzini@redhat.com, stefanha@redhat.com,
	ronniesahlberg@gmail.com
Subject: Re: [Qemu-devel] [PATCH] block: add native support for NFS
Date: Tue, 17 Dec 2013 08:53:22 +0100
Message-ID: <52B002F2.4060906@kamp.de>
In-Reply-To: <52AFCDF4.9020804@redhat.com>

Hi Fam,

On 17.12.2013 05:07, Fam Zheng wrote:
> On 16 Dec 2013 23:34, Peter Lieven wrote:
>> This patch adds native support for accessing images on NFS shares without
>> the requirement to actually mount the entire NFS share on the host.
>>
>> NFS images can simply be specified by a URL of the form:
>> nfs://<host>/<export>/<filename>
>>
>> For example:
>> qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
>>
>> You need libnfs from Ronnie Sahlberg available at:
>>     git://github.com/sahlberg/libnfs.git
>> for this to work.
>>
>> configure automatically probes for libnfs and enables support
>> if it is found. You can forbid or enforce libnfs support
>> with --disable-libnfs or --enable-libnfs, respectively.
>>
>> Due to NFS restrictions you might need to execute your binaries
>> as root, allow them to open privileged ports (<1024) or specify
>> the insecure option on the NFS server.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>
> Looks nice! Thanks for the work!
Thank you ;-)
>
>> ---
>>   MAINTAINERS         |    5 +
>>   block/Makefile.objs |    1 +
>>   block/nfs.c         |  420 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   configure           |   38 +++++
>>   4 files changed, 464 insertions(+)
>>   create mode 100644 block/nfs.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index c19133f..f53d184 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -899,6 +899,11 @@ M: Peter Lieven <pl@kamp.de>
>>   S: Supported
>>   F: block/iscsi.c
>>
>> +NFS
>> +M: Peter Lieven <pl@kamp.de>
>> +S: Maintained
>> +F: block/nfs.c
>> +
>>   SSH
>>   M: Richard W.M. Jones <rjones@redhat.com>
>>   S: Supported
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index f43ecbc..1bac94e 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -12,6 +12,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>>   ifeq ($(CONFIG_POSIX),y)
>>   block-obj-y += nbd.o sheepdog.o
>>   block-obj-$(CONFIG_LIBISCSI) += iscsi.o
>> +block-obj-$(CONFIG_LIBNFS) += nfs.o
>>   block-obj-$(CONFIG_CURL) += curl.o
>>   block-obj-$(CONFIG_RBD) += rbd.o
>>   block-obj-$(CONFIG_GLUSTERFS) += gluster.o
>> diff --git a/block/nfs.c b/block/nfs.c
>> new file mode 100644
>> index 0000000..d6cb4c0
>> --- /dev/null
>> +++ b/block/nfs.c
>> @@ -0,0 +1,420 @@
>> +/*
>> + * QEMU Block driver for native access to files on NFS shares
>> + *
>> + * Copyright (c) 2013 Peter Lieven <pl@kamp.de>
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "config-host.h"
>> +
>> +#include <poll.h>
>> +#include <arpa/inet.h>
>> +#include "qemu-common.h"
>> +#include "qemu/config-file.h"
>> +#include "qemu/error-report.h"
>> +#include "block/block_int.h"
>> +#include "trace.h"
>> +#include "block/scsi.h"
>> +#include "qemu/iov.h"
>> +#include "sysemu/sysemu.h"
>> +#include "qmp-commands.h"
>
> Copied from block/iscsi.c? SCSI and QMP are not necessary for this file. And maybe also arpa/inet.h, I'm not sure about that though.
Yes, libiscsi and libnfs are quite similar. ;-)
>
>> +
>> +#include <nfsc/libnfs-zdr.h>
>> +#include <nfsc/libnfs.h>
>> +#include <nfsc/libnfs-raw.h>
>> +#include <nfsc/libnfs-raw-mount.h>
>> +
>> +typedef struct nfsclient {
>> +    struct nfs_context *context;
>> +    struct nfsfh *fh;
>> +    int events;
>> +    bool has_zero_init;
>> +    int64_t allocated_file_size;
>> +    QEMUBH *close_bh;
>
> This is unused.
Oops, it's a leftover from a nasty segfault debugging session.
>
>> +} nfsclient;
>
> Please use CamelCase for type names...
Ok
>
>> +
>> +typedef struct NFSTask {
>> +    int status;
>> +    int complete;
>> +    QEMUIOVector *iov;
>> +    Coroutine *co;
>> +    QEMUBH *bh;
>> +} NFSTask;
>
> as you do with this.
>
>> +
>> +static void nfs_process_read(void *arg);
>> +static void nfs_process_write(void *arg);
>> +
>> +static void nfs_set_events(nfsclient *client)
>> +{
>> +    int ev;
>> +    /* We always register a read handler.  */
>> +    ev = POLLIN;
>> +    ev |= nfs_which_events(client->context);
>> +    if (ev != client->events) {
>> +        qemu_aio_set_fd_handler(nfs_get_fd(client->context),
>> +                      nfs_process_read,
>> +                      (ev & POLLOUT) ? nfs_process_write : NULL,
>> +                      client);
>> +
>> +    }
>> +    client->events = ev;
>> +}
>> +
>> +static void nfs_process_read(void *arg)
>> +{
>> +    nfsclient *client = arg;
>> +    nfs_service(client->context, POLLIN);
>> +    nfs_set_events(client);
>> +}
>> +
>> +static void nfs_process_write(void *arg)
>> +{
>> +    nfsclient *client = arg;
>> +    nfs_service(client->context, POLLOUT);
>> +    nfs_set_events(client);
>> +}
>> +
>> +static void nfs_co_init_task(nfsclient *client, NFSTask *Task)
>> +{
>> +    *Task = (NFSTask) {
>
> Please use lower case for variable names.
Ok
>
>> +        .co         = qemu_coroutine_self(),
>> +    };
>> +}
>> +
>> +static void nfs_co_generic_bh_cb(void *opaque)
>> +{
>> +    NFSTask *Task = opaque;
>> +    qemu_bh_delete(Task->bh);
>> +    qemu_coroutine_enter(Task->co, NULL);
>> +}
>> +
>> +static void nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data, void *private_data)
>
> This line is too long. Please use scripts/checkpatch.pl to check the coding style. (Some other lines have trailing whitespace.)
I thought I had done this. Strange...
>
>> +{
>> +    NFSTask *Task = private_data;
>> +    Task->complete = 1;
>> +    Task->status = status;
>> +    if (Task->status > 0 && Task->iov) {
>> +        if (Task->status == Task->iov->size) {
>> +            qemu_iovec_from_buf(Task->iov, 0, data, status);
>> +        } else {
>> +            Task->status = -1;
>> +        }
>> +    }
>> +    if (Task->co) {
>> +        Task->bh = qemu_bh_new(nfs_co_generic_bh_cb, Task);
>> +        qemu_bh_schedule(Task->bh);
>> +    }
>> +}
>> +
>> +static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
>> +                                     int64_t sector_num, int nb_sectors,
>> +                                     QEMUIOVector *iov)
>> +{
>> +    nfsclient *client = bs->opaque;
>> +    struct NFSTask Task;
>> +
>> +    nfs_co_init_task(client, &Task);
>> +    Task.iov = iov;
>> +
>> +    if (nfs_pread_async(client->context, client->fh,
>> +                        sector_num * BDRV_SECTOR_SIZE,
>> +                        nb_sectors * BDRV_SECTOR_SIZE,
>> +                        nfs_co_generic_cb, &Task) != 0) {
>> +        return -EIO;
>> +    }
>> +
>> +    while (!Task.complete) {
>> +        nfs_set_events(client);
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +    if (Task.status != nb_sectors * BDRV_SECTOR_SIZE) {
>> +        return -EIO;
>
> In the error case, can Task.status contain an error number other than -EIO? Would it be useful to return that value?
>
You are right. I checked the libnfs code: the function nfsstat3_to_errno(int error) maps NFS errors to POSIX errors.
I will pass the error through when Task.status is < 0.
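A minimal sketch of that change, assuming the libnfs callback delivers a negative POSIX errno (as produced by nfsstat3_to_errno()) on failure and the number of bytes transferred on success. nfs_task_ret() is a hypothetical helper name, not code from the patch:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: map the status reported by the libnfs callback
 * to the return value of a coroutine read/write/flush.
 * Assumes short transfers are still reported as a non-negative byte
 * count, i.e. the callback does not overwrite status with -1.
 *   status < 0         -> negative errno from libnfs, passed through
 *   status != expected -> short transfer, reported as -EIO
 *   otherwise          -> success */
static int nfs_task_ret(int status, int64_t expected_bytes)
{
    if (status < 0) {
        return status;
    }
    if ((int64_t)status != expected_bytes) {
        return -EIO;
    }
    return 0;
}
```

For flush the expected byte count would simply be 0.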

>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
>> +                                        int64_t sector_num, int nb_sectors,
>> +                                        QEMUIOVector *iov)
>> +{
>> +    nfsclient *client = bs->opaque;
>> +    struct NFSTask Task;
>> +    char *buf = NULL;
>> +
>> +    nfs_co_init_task(client, &Task);
>> +
>> +    buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
>> +    qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
>> +
>> +    if (nfs_pwrite_async(client->context, client->fh,
>> +                         sector_num * BDRV_SECTOR_SIZE,
>> +                         nb_sectors * BDRV_SECTOR_SIZE,
>> +                         buf, nfs_co_generic_cb, &Task) != 0) {
>> +        g_free(buf);
>> +        return -EIO;
>> +    }
>> +
>> +    while (!Task.complete) {
>> +        nfs_set_events(client);
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +    g_free(buf);
>> +
>> +    if (Task.status != nb_sectors * BDRV_SECTOR_SIZE) {
>> +        return -EIO;
>> +    }
>> +
>> +    bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);
>> +    client->allocated_file_size = -ENOTSUP;
>
> Why does allocated_file_size become not supported after a write?
I thought someone would ask this ;-) bdrv_get_allocated_file_size is only
used for image info, so I saved myself the code for an async call here.
On open I fstat the file anyway and store the value. That is sufficient
for qemu-img info, but the allocated size is likely to change after a
write, so I reset it to -ENOTSUP, which is also the result when a driver
does not implement bdrv_get_allocated_file_size.
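The caching scheme described above can be sketched as follows. NFSClientSketch and the sketch_* functions are hypothetical simplifications of the driver state, not QEMU API:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver state. */
typedef struct {
    int64_t allocated_file_size;
} NFSClientSketch;

/* On open: remember the value obtained from the synchronous fstat. */
static void sketch_on_open(NFSClientSketch *c, int64_t fstat_allocated)
{
    c->allocated_file_size = fstat_allocated;
}

/* After a successful write the cached value may be stale, so fall back
 * to -ENOTSUP, the same result as not implementing the callback. */
static void sketch_after_write(NFSClientSketch *c)
{
    c->allocated_file_size = -ENOTSUP;
}

/* What the allocated-size callback would report. */
static int64_t sketch_get_allocated_file_size(const NFSClientSketch *c)
{
    return c->allocated_file_size;
}
```

The trade-off: qemu-img info on a freshly opened image gets an accurate value without an extra async call, while a running guest simply reports "not supported" once it has written.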
>
>> +    return 0;
>> +}
>> +
>> +static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
>> +{
>> +    nfsclient *client = bs->opaque;
>> +    struct NFSTask Task;
>> +
>> +    nfs_co_init_task(client, &Task);
>> +
>> +    if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb, &Task) != 0) {
>> +        return -EIO;
>> +    }
>> +
>> +    while (!Task.complete) {
>> +        nfs_set_events(client);
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +    if (Task.status != 0) {
>> +        return -EIO;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static QemuOptsList runtime_opts = {
>> +    .name = "nfs",
>> +    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
>> +    .desc = {
>> +        {
>> +            .name = "filename",
>> +            .type = QEMU_OPT_STRING,
>> +            .help = "URL to the NFS file",
>> +        },
>> +        { /* end of list */ }
>> +    },
>> +};
>> +
>> +static void nfs_file_close(BlockDriverState *bs)
>> +{
>> +    nfsclient *client = bs->opaque;
>> +    if (client->context) {
>> +        if (client->fh) {
>> +            nfs_close(client->context, client->fh);
>> +        }
>> +        qemu_aio_set_fd_handler(nfs_get_fd(client->context), NULL, NULL, NULL);
>> +        nfs_destroy_context(client->context);
>> +    }
>> +    memset(client, 0, sizeof(nfsclient));
>> +}
>> +
>> +
>> +static int nfs_file_open_common(BlockDriverState *bs, QDict *options, int flags,
>> +                         int open_flags, Error **errp)
>> +{
>> +    nfsclient *client = bs->opaque;
>> +    const char *filename;
>> +    int ret = 0;
>> +    QemuOpts *opts;
>> +    Error *local_err = NULL;
>> +    char *server = NULL, *path = NULL, *file = NULL, *strp;
>> +    struct stat st;
>> +
>> +    opts = qemu_opts_create_nofail(&runtime_opts);
>> +    qemu_opts_absorb_qdict(opts, options, &local_err);
>> +    if (error_is_set(&local_err)) {
>> +        qerror_report_err(local_err);
>> +        error_free(local_err);
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    filename = qemu_opt_get(opts, "filename");
>> +
>> +    client->context = nfs_init_context();
>> +
>> +    if (client->context == NULL) {
>> +        error_setg(errp, "Failed to init NFS context");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    server = g_strdup(filename + 6);
>
> Please check that filename is longer than 6 characters before accessing filename[6].
Good point. I will add a check, although I think it cannot happen: we never end up
here if the filename does not start with nfs://
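A sketch of a safer split, mirroring the strchr/strrchr logic of the patch but checking the "nfs://" prefix before skipping it. nfs_url_split() and dup_range() are hypothetical names; like the patch, the export path and file name keep their leading '/':

```c
#include <stdlib.h>
#include <string.h>

#define NFS_URL_PREFIX "nfs://"

/* Copy the half-open range [start, end) into a new NUL-terminated string. */
static char *dup_range(const char *start, const char *end)
{
    size_t n = (size_t)(end - start);
    char *s = malloc(n + 1);
    if (s != NULL) {
        memcpy(s, start, n);
        s[n] = '\0';
    }
    return s;
}

/* Split "nfs://<host>/<export>/<filename>" into host, export path and
 * file. Returns 0 on success (caller frees the three strings), -1 on a
 * malformed URL or allocation failure. */
static int nfs_url_split(const char *url, char **host, char **path,
                         char **file)
{
    size_t plen = strlen(NFS_URL_PREFIX);
    const char *h, *slash, *last;

    /* strncmp stops at the terminating NUL, so this is safe even for
     * strings shorter than the prefix, unlike using url + 6 blindly. */
    if (strncmp(url, NFS_URL_PREFIX, plen) != 0) {
        return -1;
    }
    h = url + plen;
    if (*h == '/' || *h == '\0') {
        return -1;                      /* empty host */
    }
    slash = strchr(h, '/');             /* end of the host part */
    if (slash == NULL) {
        return -1;
    }
    last = strrchr(slash, '/');         /* start of the file name */

    *host = dup_range(h, slash);
    *path = dup_range(slash, last);
    *file = dup_range(last, last + strlen(last));
    if (*host == NULL || *path == NULL || *file == NULL) {
        free(*host);
        free(*path);
        free(*file);
        return -1;
    }
    return 0;
}
```

For nfs://10.0.0.1/qemu-images/test.qcow2 this yields "10.0.0.1", "/qemu-images" and "/test.qcow2".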
>
>> +    if (server[0] == '/' || server[0] == '\0') {
>> +        error_setg(errp, "Invalid server in URL");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +    strp = strchr(server, '/');
>> +    if (strp == NULL) {
>> +        error_setg(errp, "Invalid URL specified.\n");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +    path = g_strdup(strp);
>> +    *strp = 0;
>> +    strp = strrchr(path, '/');
>> +    if (strp == NULL) {
>> +        error_setg(errp, "Invalid URL specified.\n");
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +    file = g_strdup(strp);
>> +    *strp = 0;
>> +
>> +    if (nfs_mount(client->context, server, path) != 0) {
>> +        error_setg(errp, "Failed to mount nfs share: %s",
>> +                    nfs_get_error(client->context));
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    if (open_flags & O_CREAT) {
>> +        if (nfs_creat(client->context, file, 0600, &client->fh) != 0) {
>> +            error_setg(errp, "Failed to create file: %s",
>> +                       nfs_get_error(client->context));
>> +            ret = -EINVAL;
>> +            goto fail;
>> +        }
>> +    } else {
>> +        open_flags = (flags & BDRV_O_RDWR) ? O_RDWR : O_RDONLY;
>> +        if (nfs_open(client->context, file, open_flags, &client->fh) != 0) {
>> +            error_setg(errp, "Failed to open file : %s",
>> +                       nfs_get_error(client->context));
>> +            ret = -EINVAL;
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    if (nfs_fstat(client->context, client->fh, &st) != 0) {
>> +        error_setg(errp, "Failed to fstat file: %s",
>> +                   nfs_get_error(client->context));
>> +        ret = -EIO;
>> +        goto fail;
>> +    }
>> +
>> +    bs->total_sectors = st.st_size / BDRV_SECTOR_SIZE;
>
> Please use DIV_ROUND_UP(). Otherwise the remainder in the last sector can't be read.
Will do. But couldn't we then end up reading unallocated sectors?
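A small illustration of the difference, with BDRV_SECTOR_SIZE and a round-up macro defined locally (the macro uses the common (n + d - 1) / d idiom; the two helper names are hypothetical):

```c
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512
/* Local definition of the usual round-up division idiom. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Truncating division: a partial last sector is dropped. */
static int64_t sectors_truncated(int64_t size)
{
    return size / BDRV_SECTOR_SIZE;
}

/* Rounding up: the partial last sector is counted. */
static int64_t sectors_rounded_up(int64_t size)
{
    return DIV_ROUND_UP(size, BDRV_SECTOR_SIZE);
}
```

For a 1000-byte file the first gives 1 sector (bytes 512..999 unreachable) and the second gives 2. With rounding up, a read of the last sector extends past EOF and the server returns fewer bytes than requested; the quoted nfs_co_generic_cb() treats such a short read as an error, so that case would presumably need handling as well.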
>
>> +    client->has_zero_init = S_ISREG(st.st_mode);
>> +    client->allocated_file_size = st.st_blocks * st.st_blksize;
>> +    goto out;
>> +fail:
>> +    nfs_file_close(bs);
>> +out:
>> +    g_free(server);
>> +    g_free(path);
>> +    g_free(file);
>> +    return ret;
>> +}
>> +
>> +static int nfs_file_open(BlockDriverState *bs, QDict *options, int flags,
>> +                         Error **errp) {
>> +    return nfs_file_open_common(bs, options, flags, 0, errp);
>> +}
>> +
>> +static int nfs_file_create(const char *filename, QEMUOptionParameter *options,
>> +                         Error **errp)
>> +{
>> +    int ret = 0;
>> +    int64_t total_size = 0;
>> +    BlockDriverState *bs;
>> +    nfsclient *client = NULL;
>> +    QDict *bs_options;
>> +
>> +    bs = bdrv_new("");
>> +
>> +    /* Read out options */
>> +    while (options && options->name) {
>> +        if (!strcmp(options->name, "size")) {
>> +            total_size = options->value.n / BDRV_SECTOR_SIZE;
>
> Why divide by BDRV_SECTOR_SIZE only to ...
Copy-and-paste error.
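Besides being redundant, the divide-then-multiply silently rounds a size that is not a multiple of the sector size down. A small sketch, with BDRV_SECTOR_SIZE defined locally as 512 and size_round_trip() a hypothetical name; keeping total_size in bytes throughout avoids both problems:

```c
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512

/* What the quoted nfs_file_create() effectively does with the "size"
 * option: convert to whole sectors, then multiply back to bytes before
 * passing the value to nfs_ftruncate(). */
static int64_t size_round_trip(int64_t size_bytes)
{
    int64_t total_size = size_bytes / BDRV_SECTOR_SIZE;
    return total_size * BDRV_SECTOR_SIZE;
}
```

A requested size of 1000 bytes ends up as 512, while 1048576 (a sector multiple) survives unchanged.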
>
>> +        }
>> +        options++;
>> +    }
>> +
>> +    bs->opaque = g_malloc0(sizeof(struct nfsclient));
>> +    client = bs->opaque;
>> +
>> +    bs_options = qdict_new();
>> +    qdict_put(bs_options, "filename", qstring_from_str(filename));
>> +    ret = nfs_file_open_common(bs, bs_options, 0, O_CREAT, NULL);
>> +    QDECREF(bs_options);
>> +    if (ret != 0) {
>> +        goto out;
>> +    }
>> +    ret = nfs_ftruncate(client->context, client->fh, total_size * BDRV_SECTOR_SIZE);
>
> ... multiply it back later?
>
>> +    if (ret != 0) {
>> +        ret = -ENOSPC;;
>
> There is an extra semicolon. And is it right to hard code ENOSPC here, without checking value of ret?
I will return the real error as mapped by nfsstat3_to_errno().

>
>> +    }
>> +out:
>> +    nfs_file_close(bs);
>> +    g_free(bs->opaque);
>> +    bs->opaque = NULL;
>> +    bdrv_unref(bs);
>> +    return ret;
>> +}
>> +
>
> Fam
>
Thanks for reviewing!

One wish, as I think you are the VMDK format maintainer: could you rework vmdk_create and possibly
other functions in VMDK to use only bdrv_* functions (as qcow2_create does)? Currently it is not possible
to create a VMDK directly on an NFS share because of the qemu_file_* calls.
This also affects other drivers. QCOW2 and RAW work perfectly.

Peter

Thread overview: 13+ messages
2013-12-16 15:34 [Qemu-devel] [PATCH] block: add native support for NFS Peter Lieven
2013-12-16 16:33 ` ronnie sahlberg
2013-12-16 17:01 ` ronnie sahlberg
2013-12-17  4:07 ` Fam Zheng
2013-12-17  7:53   ` Peter Lieven [this message]
2013-12-17  8:29     ` Fam Zheng
2013-12-17  8:46       ` Peter Lieven
2013-12-17  8:51         ` Fam Zheng
2013-12-17  8:55           ` Peter Lieven
2013-12-17  9:01             ` Fam Zheng
2013-12-17  9:07               ` Peter Lieven
2013-12-17  9:46                 ` Fam Zheng
2013-12-17 10:31                   ` Peter Lieven
