From: Kevin Wolf <kwolf@redhat.com>
To: Peter Lieven <pl@kamp.de>
Cc: famz@redhat.com, ronniesahlberg@gmail.com, qemu-devel@nongnu.org,
	owasserm@redhat.com, stefanha@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCHv5] block: add native support for NFS
Date: Thu, 9 Jan 2014 15:13:22 +0100
Message-ID: <20140109141322.GB2862@dhcp-200-207.str.redhat.com>
In-Reply-To: <1388062130-21126-1-git-send-email-pl@kamp.de>

On 26.12.2013 at 13:48, Peter Lieven wrote:
> This patch adds native support for accessing images on NFS shares without
> the requirement to actually mount the entire NFS share on the host.
> 
> NFS Images can simply be specified by an url of the form:
> nfs://<host>/<export>/<filename>
> 
> For example:
> qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
> 
> You need LibNFS from Ronnie Sahlberg available at:
>    git://github.com/sahlberg/libnfs.git
> for this to work.
> 
> During configure it is automatically probed for libnfs and support
> is enabled on-the-fly. You can forbid or enforce libnfs support
> with --disable-libnfs or --enable-libnfs respectively.
> 
> Due to NFS restrictions you might need to execute your binaries
> as root, allow them to open privileged ports (<1024) or specify the
> insecure option on the NFS server.
> 
> For additional information on ROOT vs. non-ROOT operation and URL
> format + parameters see:
>    https://raw.github.com/sahlberg/libnfs/master/README
> 
> LibNFS currently supports NFS version 3 only.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> v4->v5:
>  - discussed with Ronnie and decided to move URL + Parameter parsing to LibNFS.
>    This allows for URL parameter processing directly in LibNFS without altering
>    the qemu NFS block driver. This bumps the version requirement for LibNFS
>    to 1.9.0 though.

Considering that we'll likely want to add new options in the future, I'm
not sure this is a good idea. Every new option means that struct nfs_url
changes. If libnfs is updated but qemu isn't, qemu might not even notice
that an option was requested in a new field that it doesn't know about
and doesn't provide, so it will silently ignore it. Alternatively, the
change must be marked as an incompatible ABI change, in which case a
qemu built against an older libnfs can't run at all with the newer
libnfs version.
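
The silent-ignore case can be illustrated with a minimal sketch. The
struct layouts (url_v1, url_v2), the "readahead" option and the
old_app_open() helper are all hypothetical and not taken from libnfs;
they only model an application that was built against the older layout
and is handed a struct produced by a newer library:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout that an already-built application was compiled against. */
struct url_v1 {
    const char *server;
    const char *path;
    const char *file;
};

/* Hypothetical layout after the library grows a new URL parameter. */
struct url_v2 {
    const char *server;
    const char *path;
    const char *file;
    int readahead;              /* made-up option, for illustration only */
};

/* What the old application ends up using for the connection. */
struct effective_config {
    const char *server;
    int readahead;
};

/*
 * Old application code: it copies only the bytes it knows about (the
 * v1 prefix), so a value in any newer field never reaches it and the
 * built-in default is used without any warning.
 */
static struct effective_config old_app_open(const void *url_from_library)
{
    struct url_v1 seen;
    struct effective_config cfg;

    assert(url_from_library != NULL);
    memcpy(&seen, url_from_library, sizeof(seen));

    cfg.server = seen.server;
    cfg.readahead = 0;          /* default; the user's request is lost */
    return cfg;
}
```

Passing a url_v2 with readahead set to a non-zero value still yields a
configuration with readahead == 0, and nothing reports that the option
was dropped.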

>  - added a pointer to the LibNFS readme where additional information about
>    ROOT privilege requirements can be found as this raised a few concerns.
>  - removed a trailing dot in an error statement [Fam].
> 
> v3->v4:
>  - finally added full implementation of bdrv_get_allocated_file_size [Stefan]
>  - removed trailing \n from error statements [Stefan]
> 
> v2->v3:
>  - rebased the stefanha/block
>  - use pkg_config to check for libnfs (ignoring cflags which are broken in 1.8.0) [Stefan]
>  - fixed NFSClient declaration [Stefan]
>  - renamed Task variables to task [Stefan]
>  - renamed NFSTask to NFSRPC [Ronnie]
>  - do not update bs->total_sectors in nfs_co_writev [Stefan]
>  - return -ENOMEM on all async call failures [Stefan,Ronnie]
>  - fully implement ftruncate
>  - use util/uri.c for URL parsing [Stefan]
>  - reworked nfs_file_open_common to nfs_client_open which works on NFSClient [Stefan]
>  - added a comment to the connect message that libnfs supports NFSv3 only at the moment.
>  - DID NOT add full implementation of bdrv_get_allocated_file_size because
>    we are not in a coroutine context and I cannot do an async call here.
>    I could do a sync call if there would be a guarantee that no requests
>    are in flight. [Stefan]
> 
> v1->v2:
>  - fixed block/Makefile.objs [Ronnie]
>  - do not always register a read handler [Ronnie]
>  - add support for reading beyond EOF [Fam]
>  - fixed struct and parameter naming [Fam]
>  - fixed overlong lines and whitespace errors [Fam]
>  - return return status from libnfs wherever possible [Fam]
>  - added comment why we set allocated_file_size to -ENOTSUP after write [Fam]
>  - avoid segfault when parsing filename [Fam]
>  - remove unused close_bh from NFSClient [Fam]
>  - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in nfs_file_create [Fam]
> 
>  MAINTAINERS         |    5 +
>  block/Makefile.objs |    1 +
>  block/nfs.c         |  405 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  configure           |   26 ++++

qapi-schema.json is missing from the diffstat. Without a schema entry
for the driver, you can't add NFS block devices using blockdev-add.

>  4 files changed, 437 insertions(+)
>  create mode 100644 block/nfs.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a5ab8f8..09996ab 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -935,6 +935,11 @@ M: Peter Lieven <pl@kamp.de>
>  S: Supported
>  F: block/iscsi.c
>  
> +NFS
> +M: Peter Lieven <pl@kamp.de>
> +S: Maintained
> +F: block/nfs.c
> +
>  SSH
>  M: Richard W.M. Jones <rjones@redhat.com>
>  S: Supported
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 4e8c91e..e254a21 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -12,6 +12,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>  ifeq ($(CONFIG_POSIX),y)
>  block-obj-y += nbd.o nbd-client.o sheepdog.o
>  block-obj-$(CONFIG_LIBISCSI) += iscsi.o
> +block-obj-$(CONFIG_LIBNFS) += nfs.o
>  block-obj-$(CONFIG_CURL) += curl.o
>  block-obj-$(CONFIG_RBD) += rbd.o
>  block-obj-$(CONFIG_GLUSTERFS) += gluster.o
> diff --git a/block/nfs.c b/block/nfs.c
> new file mode 100644
> index 0000000..4023b71
> --- /dev/null
> +++ b/block/nfs.c
> @@ -0,0 +1,405 @@
> +/*
> + * QEMU Block driver for native access to files on NFS shares
> + *
> + * Copyright (c) 2013 Peter Lieven <pl@kamp.de>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "config-host.h"
> +
> +#include <poll.h>
> +#include "qemu-common.h"
> +#include "qemu/config-file.h"
> +#include "qemu/error-report.h"
> +#include "block/block_int.h"
> +#include "trace.h"
> +#include "qemu/iov.h"
> +#include "sysemu/sysemu.h"
> +
> +#include <nfsc/libnfs-zdr.h>
> +#include <nfsc/libnfs.h>
> +#include <nfsc/libnfs-raw.h>
> +#include <nfsc/libnfs-raw-mount.h>
> +
> +typedef struct NFSClient {
> +    struct nfs_context *context;
> +    struct nfsfh *fh;
> +    int events;
> +    bool has_zero_init;
> +} NFSClient;
> +
> +typedef struct NFSRPC {
> +    int status;
> +    int complete;
> +    QEMUIOVector *iov;
> +    struct stat *st;
> +    Coroutine *co;
> +    QEMUBH *bh;
> +} NFSRPC;
> +
> +static void nfs_process_read(void *arg);
> +static void nfs_process_write(void *arg);
> +
> +static void nfs_set_events(NFSClient *client)
> +{
> +    int ev = nfs_which_events(client->context);
> +    if (ev != client->events) {
> +        qemu_aio_set_fd_handler(nfs_get_fd(client->context),
> +                      (ev & POLLIN) ? nfs_process_read : NULL,
> +                      (ev & POLLOUT) ? nfs_process_write : NULL,
> +                      client);
> +
> +    }
> +    client->events = ev;
> +}
> +
> +static void nfs_process_read(void *arg)
> +{
> +    NFSClient *client = arg;
> +    nfs_service(client->context, POLLIN);
> +    nfs_set_events(client);
> +}
> +
> +static void nfs_process_write(void *arg)
> +{
> +    NFSClient *client = arg;
> +    nfs_service(client->context, POLLOUT);
> +    nfs_set_events(client);
> +}
> +
> +static void nfs_co_init_task(NFSClient *client, NFSRPC *task)
> +{
> +    *task = (NFSRPC) {
> +        .co         = qemu_coroutine_self(),
> +    };
> +}
> +
> +static void nfs_co_generic_bh_cb(void *opaque)
> +{
> +    NFSRPC *task = opaque;
> +    qemu_bh_delete(task->bh);
> +    qemu_coroutine_enter(task->co, NULL);
> +}
> +
> +static void
> +nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data,
> +                  void *private_data)
> +{
> +    NFSRPC *task = private_data;
> +    task->complete = 1;
> +    task->status = status;
> +    if (task->status > 0 && task->iov) {
> +        if (task->status <= task->iov->size) {
> +            qemu_iovec_from_buf(task->iov, 0, data, task->status);
> +        } else {
> +            task->status = -EIO;

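Do short reads not happen in practice with libnfs? Only the case where
more data than requested comes back is treated as an error here; a read
shorter than iov->size is copied as is and the rest of the buffer is
left untouched.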
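
If short reads can occur, one possible policy is to zero-pad the
remainder of the buffer. The following is only a sketch on a flat
buffer, assuming that a short read means EOF was reached; finish_read()
is a hypothetical helper, not part of the patch or of libnfs:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Complete a read of 'want' bytes into 'dst'.
 * 'status' is the byte count reported by the transport, or a negative
 * errno value. Returns 0 on success, a negative errno value on failure.
 */
static int finish_read(unsigned char *dst, size_t want,
                       const unsigned char *data, int status)
{
    assert(dst != NULL);

    if (status < 0) {
        return status;          /* pass the transport error through */
    }
    if ((size_t)status > want) {
        return -EIO;            /* more data than requested: bogus reply */
    }
    if (status > 0) {
        memcpy(dst, data, (size_t)status);
    }
    /* Zero-pad the tail so the caller never sees stale buffer contents. */
    memset(dst + status, 0, want - (size_t)status);
    return 0;
}
```

Whether zero-padding or returning an error is the right answer depends
on whether the caller is allowed to read beyond EOF.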

> +        }
> +    }
> +    if (task->status == 0 && task->st) {
> +        memcpy(task->st, data, sizeof(struct stat));
> +    }
> +    if (task->co) {
> +        task->bh = qemu_bh_new(nfs_co_generic_bh_cb, task);
> +        qemu_bh_schedule(task->bh);
> +    }
> +}
> +
> +static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
> +                                     int64_t sector_num, int nb_sectors,
> +                                     QEMUIOVector *iov)
> +{
> +    NFSClient *client = bs->opaque;
> +    NFSRPC task;
> +
> +    nfs_co_init_task(client, &task);
> +    task.iov = iov;
> +
> +    if (nfs_pread_async(client->context, client->fh,
> +                        sector_num * BDRV_SECTOR_SIZE,
> +                        nb_sectors * BDRV_SECTOR_SIZE,
> +                        nfs_co_generic_cb, &task) != 0) {
> +        return -ENOMEM;
> +    }
> +
> +    while (!task.complete) {
> +        nfs_set_events(client);
> +        qemu_coroutine_yield();
> +    }
> +
> +    if (task.status < 0) {
> +        return task.status;
> +    }
> +
> +    return 0;
> +}
> +
> +static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
> +                                        int64_t sector_num, int nb_sectors,
> +                                        QEMUIOVector *iov)
> +{
> +    NFSClient *client = bs->opaque;
> +    NFSRPC task;
> +    char *buf = NULL;
> +
> +    nfs_co_init_task(client, &task);
> +
> +    buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
> +    qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
> +
> +    if (nfs_pwrite_async(client->context, client->fh,
> +                         sector_num * BDRV_SECTOR_SIZE,
> +                         nb_sectors * BDRV_SECTOR_SIZE,
> +                         buf, nfs_co_generic_cb, &task) != 0) {
> +        g_free(buf);
> +        return -ENOMEM;
> +    }
> +
> +    while (!task.complete) {
> +        nfs_set_events(client);
> +        qemu_coroutine_yield();
> +    }
> +
> +    g_free(buf);
> +
> +    if (task.status != nb_sectors * BDRV_SECTOR_SIZE) {
> +        return task.status < 0 ? task.status : -EIO;

Does this duplicate the logic of nfs_co_generic_cb?

> +    }
> +
> +    return 0;
> +}
> +
> +static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
> +{
> +    NFSClient *client = bs->opaque;
> +    NFSRPC task;
> +
> +    nfs_co_init_task(client, &task);
> +
> +    if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
> +                        &task) != 0) {
> +        return -ENOMEM;
> +    }
> +
> +    while (!task.complete) {
> +        nfs_set_events(client);
> +        qemu_coroutine_yield();
> +    }
> +
> +    return task.status;
> +}
> +
> +static QemuOptsList runtime_opts = {
> +    .name = "nfs",
> +    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
> +    .desc = {
> +        {
> +            .name = "filename",
> +            .type = QEMU_OPT_STRING,
> +            .help = "URL to the NFS file",
> +        },
> +        { /* end of list */ }
> +    },
> +};

I think this is the point where I disagree. First of all, if it's a URL,
call it "url" instead of "filename". More importantly, a URL encodes the
options in a single string instead of exposing them as structured
options that can be set separately.

I would split this into options 'server', 'path', etc. and only have
bdrv_parse_filename() invoke the URL parsing to fill these options if
the user preferred the URL style.
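
A rough sketch of what that split could look like, outside of any QEMU
infrastructure: the struct nfs_opts_sketch and parse_nfs_url() names are
hypothetical, and the parser only handles the plain
nfs://<host>/<export>/<filename> form without URL parameters. In the
driver itself the result would go into the options dictionary rather
than into a fixed-size struct:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structured options, one field per setting. */
struct nfs_opts_sketch {
    char server[256];
    char path[1024];
};

/*
 * Fill 'opts' from a URL of the form nfs://<host>/<path>.
 * Returns 0 on success, -1 if the URL is malformed or too long.
 */
static int parse_nfs_url(const char *url, struct nfs_opts_sketch *opts)
{
    static const char prefix[] = "nfs://";
    const char *host, *slash;
    size_t host_len, path_len;

    assert(url != NULL && opts != NULL);

    if (strncmp(url, prefix, sizeof(prefix) - 1) != 0) {
        return -1;              /* not an nfs:// URL */
    }
    host = url + sizeof(prefix) - 1;
    slash = strchr(host, '/');
    if (slash == NULL || slash == host) {
        return -1;              /* missing path or empty host */
    }
    host_len = (size_t)(slash - host);
    path_len = strlen(slash);
    if (host_len >= sizeof(opts->server) || path_len >= sizeof(opts->path)) {
        return -1;              /* would not fit into the fixed buffers */
    }
    memcpy(opts->server, host, host_len);
    opts->server[host_len] = '\0';
    memcpy(opts->path, slash, path_len + 1);   /* includes the NUL */
    return 0;
}
```

With this shape, the open function only ever looks at 'server' and
'path', and a user who sets them directly never goes through the URL
parser at all.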

Kevin
