[Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
@ 2012-07-21  8:29 Bharata B Rao
  2012-07-21  8:30 ` [Qemu-devel] [RFC PATCH 1/2] qemu: Add a config option for GlusterFS as block backend Bharata B Rao
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Bharata B Rao @ 2012-07-21  8:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Anand Avati, Amar Tumballi, Vijay Bellur

Hi,

Here is the v2 patchset for supporting GlusterFS protocol from QEMU.

This set of patches enables QEMU to boot VM images from gluster volumes.
This is achieved by adding gluster as a new block backend driver in QEMU.
Its already possible to boot from VM images on gluster volumes, but this
patchset provides the ability to boot VM images from gluster volumes by
by-passing the FUSE layer in gluster. In case the image is present on the
local system, it is possible to even bypass client and server translator and
hence the RPC overhead.

The major change in this version is to not implement libglusterfs based
gluster backend within QEMU but instead use libgfapi. libgfapi library
from GlusterFS project provides APIs to access gluster volumes directly.
With the use of libgfapi, the specification of gluster backend from QEMU
matches more closely with the GlusterFS's way of specifying volumes. We now
specify the gluster backed image like this:

-drive file=gluster:server@port:volname:image

- Here 'gluster' is the protocol.
- 'server@port' specifies the server where the volume file specification for
  the given volume resides. 'port' is the port number on which gluster
  management daemon (glusterd) is listening. This is optional and if not
  specified, QEMU will send 0 which will make libgfapi to use the default
  port.
- 'volname' is the name of the gluster volume which contains the VM image.
- 'image' is the path to the actual VM image in the gluster volume.

Note that we are no longer using volfiles directly and use volume names
instead. For this to work, gluster management daemon (glusterd) needs to
be running on the QEMU node. This limits the QEMU user to access the volumes by
the default volfiles that are generated by gluster CLI. This should be
fine as long as gluster CLI provides the capability to generate or regenerate
volume files for a given volume with the xlator set that QEMU user is
interested in. GlusterFS developers tell me that this can be provided with
some enhancements to Gluster CLI/glusterd. Note that the custom volume files
is typically needed when GlusterFS server is co-located with QEMU in
which case it would  be beneficial to get rid of client-server overhead and
RPC communication overhead.

Using the patches
=================
- GlusterFS backend is enabled by specifying --enable-glusterfs with the
  configure script.
- You need to have installed GlusterFS from latest gluster git which provides
  the required libgfapi library. (git://git.gluster.com/glusterfs.git)
- As mentioned above, the VM image on gluster volume can be specified like
  this:
	-drive file=gluster:localhost:testvol:/F17,format=gluster

  Note that format=gluster is not needed ideally and its a work around I have
  until libgfapi provides a working connection cleanup routine (glfs_fini()).
  When the format isn't specified, QEMU figures out the format by doing
  find_image_format that results in one open and close before opening the
  image file long term for standard read and write. Gluster connection
  initialization is done from open and connection termination is done from
  close. But since glfs_fini() isn't working yet, I am bypassing
  find_image_format by specifying format=gluster directly which results in
  just one open and hence I am not limited by glfs_fini().

Changes for v2
==============
- Removed the libglusterfs based gluster backend implementation within
  QEMU and instead using the APIs from libgfapi.
- Change in the specification from file=gluster:volfile:image to
  file=gluster:server@port:volname:image. format=gluster is no longer
  specified.
- Passing iovectors obtained from generic block layer of QEMU directly to
  libgfapi instead of converting them to linear buffers.
- Processing the aio call back directly from the read thread of event handler
  pipe instead of scheduling a BH and delegating the processing to that BH.
- Added async flush (fsync) support.
- Refusal to handle partial read/writes in gluster block driver.
- Other minor cleanups based on review comments from v1 post. There is one
  comment that is yet to be addressed (using a local argument vs using a
  field of bs - Stefan), I will address them in the next iteration.

v1
==
lists.nongnu.org/archive/html/qemu-devel/2012-06/msg01745.html

fio benchmark numbers
=====================
Environment
-----------
Dual core x86_64 laptop
QEMU (c0958559b1a58)
GlusterFS (35810fb2a7a12)
Guest: Fedora 16 (kernel 3.1.0-7.fc16.x86_64)
Host: Fedora 16 (kernel 3.4)
fio-HEAD-47ea504

fio jobfile
-----------
# cat aio-read-direct-seq 
; Read 4 files with aio at different depths
[global]
ioengine=libaio
direct=1
rw=read
bs=128k
size=512m
directory=/data1

[file1]
iodepth=4

[file2]
iodepth=32

[file3]
iodepth=8

[file4]
iodepth=16

Scenarios
---------
Base: QEMU boots from directly from image on gluster brick.
Fuse mount: QEMU boots from VM image on gluster FUSE mount.
Fuse bypass: QEMU uses gluster protocol and uses standard client volfile.
Fuse bypass custom: QEMU uses gluster protocol and uses a minimal client
	volfile that just has client xlator.
RPC bypass: QEMU uses just posix xlator and doesn't depend on gluster server.

Numbers (aggrb, minb and maxb in kB/s. mint and maxt in msec)
-------
			aggrb	minb	maxb	mint	maxt
Base			63076	15769	17488	29979	33248
Fuse mount		29392	7348	9266	56581	71350
Fuse bypass		53609	13402	14909	35164	39119
Fuse bypass custom	62968	15742	17962	29188	33305
RPC bypass		63505	15876	18534	28287	33023

All the scenarios used if=virtio and cache=none options.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC PATCH 1/2] qemu: Add a config option for GlusterFS as block backend
  2012-07-21  8:29 [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Bharata B Rao
@ 2012-07-21  8:30 ` Bharata B Rao
  2012-07-21  8:31 ` [Qemu-devel] [RFC PATCH 2/2] block: gluster " Bharata B Rao
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Bharata B Rao @ 2012-07-21  8:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Anand Avati, Amar Tumballi, Vijay Bellur

qemu: Add a config option for GlusterFS as block backend

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---

 configure |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)


diff --git a/configure b/configure
index 500fe24..03547b9 100755
--- a/configure
+++ b/configure
@@ -824,6 +824,10 @@ for opt do
   ;;
   --disable-guest-agent) guest_agent="no"
   ;;
+  --disable-glusterfs) glusterfs="no"
+  ;;
+  --enable-glusterfs) glusterfs="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1110,6 +1114,8 @@ echo "  --disable-guest-agent    disable building of the QEMU Guest Agent"
 echo "  --enable-guest-agent     enable building of the QEMU Guest Agent"
 echo "  --with-coroutine=BACKEND coroutine backend. Supported options:"
 echo "                           gthread, ucontext, sigaltstack, windows"
+echo "  --enable-glusterfs       enable GlusterFS backend"
+echo "  --disable-glusterfs      disable GlusterFS backend"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2259,6 +2265,29 @@ EOF
   fi
 fi
 
+##########################################
+# glusterfs probe
+if test "$glusterfs" != "no" ; then
+  cat > $TMPC <<EOF
+#include <glusterfs/api/glfs.h>
+int main(void) {
+    (void) glfs_new("volume");
+    return 0;
+}
+EOF
+  glusterfs_libs="-lgfapi -lgfrpc -lgfxdr"
+  if compile_prog "" "$glusterfs_libs" ; then
+    glusterfs=yes
+    libs_tools="$glusterfs_libs $libs_tools"
+    libs_softmmu="$glusterfs_libs $libs_softmmu"
+  else
+    if test "$glusterfs" = "yes" ; then
+      feature_not_found "GlusterFS backend support"
+    fi
+    glusterfs=no
+  fi
+fi
+
 #
 # Check for xxxat() functions when we are building linux-user
 # emulator.  This is done because older glibc versions don't
@@ -3055,6 +3084,7 @@ echo "OpenGL support    $opengl"
 echo "libiscsi support  $libiscsi"
 echo "build guest agent $guest_agent"
 echo "coroutine backend $coroutine_backend"
+echo "GlusterFS support $glusterfs"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3384,6 +3414,10 @@ if test "$has_environ" = "yes" ; then
   echo "CONFIG_HAS_ENVIRON=y" >> $config_host_mak
 fi
 
+if test "$glusterfs" = "yes" ; then
+  echo "CONFIG_GLUSTERFS=y" >> $config_host_mak
+fi
+
 # USB host support
 case "$usb" in
 linux)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC PATCH 2/2] block: gluster as block backend
  2012-07-21  8:29 [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Bharata B Rao
  2012-07-21  8:30 ` [Qemu-devel] [RFC PATCH 1/2] qemu: Add a config option for GlusterFS as block backend Bharata B Rao
@ 2012-07-21  8:31 ` Bharata B Rao
  2012-07-22 15:38   ` Stefan Hajnoczi
  2012-07-21 12:22 ` [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Vijay Bellur
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Bharata B Rao @ 2012-07-21  8:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: Anand Avati, Amar Tumballi, Vijay Bellur

block: gluster as block backend

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

This patch adds gluster as the new block backend in QEMU. This gives QEMU
the ability to boot VM images from gluster volumes.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---

 block/Makefile.objs |    1 
 block/gluster.c     |  483 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 484 insertions(+), 0 deletions(-)
 create mode 100644 block/gluster.c


diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..a1ae67f 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -9,3 +9,4 @@ block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
+block-obj-$(CONFIG_GLUSTERFS) += gluster.o
diff --git a/block/gluster.c b/block/gluster.c
new file mode 100644
index 0000000..c33a006
--- /dev/null
+++ b/block/gluster.c
@@ -0,0 +1,483 @@
+/*
+ * GlusterFS backend for QEMU
+ *
+ * (AIO implementation is derived from block/rbd.c)
+ *
+ * Copyright (C) 2012 Bharata B Rao <bharata@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#include "block_int.h"
+#include <glusterfs/api/glfs.h>
+
+typedef struct GlusterConf {
+    char server[HOST_NAME_MAX];
+    int port;
+    char volname[128]; /* TODO: use GLUSTERD_MAX_VOLUME_NAME */
+    char image[PATH_MAX];
+} GlusterConf;
+
+typedef struct GlusterAIOCB {
+    BlockDriverAIOCB common;
+    QEMUIOVector *qiov;
+    char *bounce;
+    struct BDRVGlusterState *s;
+    int cancelled;
+} GlusterAIOCB;
+
+typedef struct GlusterCBKData {
+    GlusterAIOCB *acb;
+    struct BDRVGlusterState *s;
+    int64_t size;
+    int ret;
+} GlusterCBKData;
+
+typedef struct BDRVGlusterState {
+    struct glfs *glfs;
+    int fds[2];
+    int open_flags;
+    struct glfs_fd *fd;
+    int qemu_aio_count;
+    int event_reader_pos;
+    GlusterCBKData *event_gcbk;
+} BDRVGlusterState;
+
+#define GLUSTER_FD_READ 0
+#define GLUSTER_FD_WRITE 1
+
+static void qemu_gluster_complete_aio(GlusterCBKData *gcbk)
+{
+    GlusterAIOCB *acb = gcbk->acb;
+    int ret;
+
+    if (acb->cancelled) {
+        qemu_aio_release(acb);
+        goto done;
+    }
+
+    if (gcbk->ret == gcbk->size) {
+        ret = 0; /* Success */
+    } else if (gcbk->ret < 0) {
+        ret = gcbk->ret; /* Read/Write failed */
+    } else {
+        ret = -EINVAL; /* Partial read/write - fail it */
+    }
+    acb->common.cb(acb->common.opaque, ret);
+    qemu_aio_release(acb);
+
+done:
+    g_free(gcbk);
+}
+
+static void qemu_gluster_aio_event_reader(void *opaque)
+{
+    BDRVGlusterState *s = opaque;
+    ssize_t ret;
+
+    do {
+        char *p = (char *)&s->event_gcbk;
+
+        ret = read(s->fds[GLUSTER_FD_READ], p + s->event_reader_pos,
+                   sizeof(s->event_gcbk) - s->event_reader_pos);
+        if (ret > 0) {
+            s->event_reader_pos += ret;
+            if (s->event_reader_pos == sizeof(s->event_gcbk)) {
+                s->event_reader_pos = 0;
+                qemu_gluster_complete_aio(s->event_gcbk);
+                s->qemu_aio_count--;
+            }
+        }
+    } while (ret < 0 && errno == EINTR);
+}
+
+static int qemu_gluster_aio_flush_cb(void *opaque)
+{
+    BDRVGlusterState *s = opaque;
+
+    return (s->qemu_aio_count > 0);
+}
+
+/*
+ * file=protocol:server@port:volname:image
+ */
+static int qemu_gluster_parsename(GlusterConf *c, const char *filename)
+{
+    char *file = g_strdup(filename);
+    char *token, *next_token, *saveptr;
+    char *token_s, *next_token_s, *saveptr_s;
+    int ret = -EINVAL;
+
+    /* Discard the protocol */
+    token = strtok_r(file, ":", &saveptr);
+    if (!token) {
+        goto out;
+    }
+
+    /* server@port */
+    next_token = strtok_r(NULL, ":", &saveptr);
+    if (!next_token) {
+        goto out;
+    }
+    if (strchr(next_token, '@')) {
+        token_s = strtok_r(next_token, "@", &saveptr_s);
+        if (!token_s) {
+            goto out;
+        }
+        strncpy(c->server, token_s, HOST_NAME_MAX);
+        next_token_s = strtok_r(NULL, "@", &saveptr_s);
+        if (!next_token_s) {
+            goto out;
+        }
+        c->port = atoi(next_token_s);
+    } else {
+        strncpy(c->server, next_token, HOST_NAME_MAX);
+        c->port = 0;
+    }
+
+    /* volname */
+    next_token = strtok_r(NULL, ":", &saveptr);
+    if (!next_token) {
+        goto out;
+    }
+    strncpy(c->volname, next_token, 128);
+
+    /* image */
+    next_token = strtok_r(NULL, ":", &saveptr);
+    if (!next_token) {
+        goto out;
+    }
+    strncpy(c->image, next_token, PATH_MAX);
+    ret = 0;
+out:
+    g_free(file);
+    return ret;
+}
+
+static struct glfs *qemu_gluster_init(GlusterConf *c, const char *filename)
+{
+    struct glfs *glfs = NULL;
+    int ret;
+
+    ret = qemu_gluster_parsename(c, filename);
+    if (ret < 0) {
+        errno = -ret;
+        goto out;
+    }
+
+    glfs = glfs_new(c->volname);
+    if (!glfs) {
+        goto out;
+    }
+
+    ret = glfs_set_volfile_server(glfs, "socket", c->server, c->port);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /*
+     * TODO: Logging is not necessary but instead nice to have.
+     * Can QEMU optionally log into a standard place ?
+     * Need to use defines like gf_loglevel_t:GF_LOG_INFO instead of
+     * hard coded values like 7 here.
+     */
+    ret = glfs_set_logging(glfs, "/tmp/qemu-gluster.log", 7);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = glfs_init(glfs);
+    if (ret < 0) {
+        goto out;
+    }
+    return glfs;
+
+out:
+    if (glfs) {
+        (void)glfs_fini(glfs);
+    }
+    return NULL;
+}
+
+static int qemu_gluster_open(BlockDriverState *bs, const char *filename,
+    int bdrv_flags)
+{
+    BDRVGlusterState *s = bs->opaque;
+    GlusterConf *c = g_malloc(sizeof(GlusterConf));
+    int ret;
+
+    s->glfs = qemu_gluster_init(c, filename);
+    if (!s->glfs) {
+        ret = -errno;
+        goto out;
+    }
+
+    s->open_flags |=  O_BINARY;
+    s->open_flags &= ~O_ACCMODE;
+    if (bdrv_flags & BDRV_O_RDWR) {
+        s->open_flags |= O_RDWR;
+    } else {
+        s->open_flags |= O_RDONLY;
+    }
+
+    if ((bdrv_flags & BDRV_O_NOCACHE)) {
+        s->open_flags |= O_DIRECT;
+    }
+
+    s->fd = glfs_open(s->glfs, c->image, s->open_flags);
+    if (!s->fd) {
+        ret = -errno;
+        goto out;
+    }
+
+    ret = qemu_pipe(s->fds);
+    if (ret < 0) {
+        goto out;
+    }
+    fcntl(s->fds[0], F_SETFL, O_NONBLOCK);
+    fcntl(s->fds[1], F_SETFL, O_NONBLOCK);
+    qemu_aio_set_fd_handler(s->fds[GLUSTER_FD_READ],
+        qemu_gluster_aio_event_reader, NULL, qemu_gluster_aio_flush_cb, s);
+    g_free(c);
+    return ret;
+
+out:
+    g_free(c);
+    if (s->fd) {
+        glfs_close(s->fd);
+    }
+    if (s->glfs) {
+        (void) glfs_fini(s->glfs);
+    }
+    return ret;
+}
+
+static int qemu_gluster_create(const char *filename,
+        QEMUOptionParameter *options)
+{
+    struct glfs *glfs;
+    struct glfs_fd *fd;
+    GlusterConf *c = g_malloc(sizeof(GlusterConf));
+    int ret = 0;
+    int64_t total_size = 0;
+
+    glfs = qemu_gluster_init(c, filename);
+    if (!glfs) {
+        ret = -errno;
+        goto out;
+    }
+
+    /* Read out options */
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            total_size = options->value.n / BDRV_SECTOR_SIZE;
+        }
+        options++;
+    }
+
+    fd = glfs_creat(glfs, c->image, O_WRONLY|O_CREAT|O_TRUNC|O_BINARY, S_IRWXU);
+    if (!fd) {
+        ret = -errno;
+    } else {
+        if (glfs_ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) {
+            ret = -errno;
+        }
+        if (glfs_close(fd) != 0) {
+            ret = -errno;
+        }
+    }
+out:
+    g_free(c);
+    if (glfs) {
+        (void) glfs_fini(glfs);
+    }
+    return ret;
+}
+
+static AIOPool gluster_aio_pool = {
+    .aiocb_size = sizeof(GlusterAIOCB),
+};
+
+static int qemu_gluster_send_pipe(BDRVGlusterState *s, GlusterCBKData *gcbk)
+{
+    int ret = 0;
+    while (1) {
+        fd_set wfd;
+        int fd = s->fds[GLUSTER_FD_WRITE];
+
+        ret = write(fd, (void *)&gcbk, sizeof(gcbk));
+        if (ret >= 0) {
+            break;
+        }
+        if (errno == EINTR) {
+            continue;
+        }
+        if (errno != EAGAIN) {
+            break;
+        }
+
+        FD_ZERO(&wfd);
+        FD_SET(fd, &wfd);
+        do {
+            ret = select(fd + 1, NULL, &wfd, NULL, NULL);
+        } while (ret < 0 && errno == EINTR);
+    }
+    return ret;
+}
+
+static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
+{
+    GlusterCBKData *gcbk = (GlusterCBKData *)arg;
+    BDRVGlusterState *s = gcbk->s;
+
+    gcbk->ret = ret;
+    if (qemu_gluster_send_pipe(s, gcbk) < 0) {
+        error_report("Could not complete read/write/flush from gluster");
+        abort();
+    }
+}
+
+static BlockDriverAIOCB *qemu_gluster_aio_rw(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque, int write)
+{
+    int ret;
+    GlusterAIOCB *acb;
+    GlusterCBKData *gcbk;
+    BDRVGlusterState *s = bs->opaque;
+    size_t size;
+    off_t offset;
+
+    acb = qemu_aio_get(&gluster_aio_pool, bs, cb, opaque);
+    acb->qiov = qiov;
+    acb->s = s;
+
+    offset = sector_num * BDRV_SECTOR_SIZE;
+    size = nb_sectors * BDRV_SECTOR_SIZE;
+    s->qemu_aio_count++;
+
+    gcbk = g_malloc(sizeof(GlusterCBKData));
+    gcbk->acb = acb;
+    gcbk->s = s;
+    gcbk->size = size;
+
+    if (write) {
+        ret = glfs_pwritev_async(s->fd, qiov->iov, qiov->niov, offset, 0,
+            &gluster_finish_aiocb, gcbk);
+    } else {
+        ret = glfs_preadv_async(s->fd, qiov->iov, qiov->niov, offset, 0,
+            &gluster_finish_aiocb, gcbk);
+    }
+
+    if (ret < 0) {
+        goto out;
+    }
+    return &acb->common;
+
+out:
+    g_free(gcbk);
+    s->qemu_aio_count--;
+    qemu_aio_release(acb);
+    return NULL;
+}
+
+static BlockDriverAIOCB *qemu_gluster_aio_readv(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    return qemu_gluster_aio_rw(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
+}
+
+static BlockDriverAIOCB *qemu_gluster_aio_writev(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    return qemu_gluster_aio_rw(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
+}
+
+static BlockDriverAIOCB *qemu_gluster_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    int ret;
+    GlusterAIOCB *acb;
+    GlusterCBKData *gcbk;
+    BDRVGlusterState *s = bs->opaque;
+
+    acb = qemu_aio_get(&gluster_aio_pool, bs, cb, opaque);
+    acb->s = s;
+    s->qemu_aio_count++;
+
+    gcbk = g_malloc(sizeof(GlusterCBKData));
+    gcbk->acb = acb;
+    gcbk->s = s;
+    gcbk->size = 0;
+
+    ret = glfs_fsync_async(s->fd, &gluster_finish_aiocb, gcbk);
+    if (ret < 0) {
+        goto out;
+    }
+    return &acb->common;
+
+out:
+    g_free(gcbk);
+    s->qemu_aio_count--;
+    qemu_aio_release(acb);
+    return NULL;
+}
+
+static int64_t qemu_gluster_getlength(BlockDriverState *bs)
+{
+    BDRVGlusterState *s = bs->opaque;
+    struct stat st;
+    int ret;
+
+    ret = glfs_fstat(s->fd, &st);
+    if (ret < 0) {
+        return -errno;
+    } else {
+        return st.st_size;
+    }
+}
+
+static void qemu_gluster_close(BlockDriverState *bs)
+{
+    BDRVGlusterState *s = bs->opaque;
+
+    if (s->fd) {
+        glfs_close(s->fd);
+        s->fd = NULL;
+    }
+}
+
+static QEMUOptionParameter qemu_gluster_create_options[] = {
+    {
+        .name = BLOCK_OPT_SIZE,
+        .type = OPT_SIZE,
+        .help = "Virtual disk size"
+    },
+    { NULL }
+};
+
+static BlockDriver bdrv_gluster = {
+    .format_name = "gluster",
+    .protocol_name = "gluster",
+    .instance_size = sizeof(BDRVGlusterState),
+    .bdrv_file_open = qemu_gluster_open,
+    .bdrv_close = qemu_gluster_close,
+    .bdrv_create = qemu_gluster_create,
+    .bdrv_getlength = qemu_gluster_getlength,
+
+    .bdrv_aio_readv = qemu_gluster_aio_readv,
+    .bdrv_aio_writev = qemu_gluster_aio_writev,
+    .bdrv_aio_flush = qemu_gluster_aio_flush,
+
+    .create_options = qemu_gluster_create_options,
+};
+
+static void bdrv_gluster_init(void)
+{
+    bdrv_register(&bdrv_gluster);
+}
+
+block_init(bdrv_gluster_init);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/2] block: gluster as block backend
  2012-07-21  8:31 ` [Qemu-devel] [RFC PATCH 2/2] block: gluster " Bharata B Rao
@ 2012-07-22 15:38   ` Stefan Hajnoczi
  2012-07-23  8:32     ` Bharata B Rao
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2012-07-22 15:38 UTC (permalink / raw)
  To: bharata; +Cc: Amar Tumballi, Anand Avati, qemu-devel, Vijay Bellur

On Sat, Jul 21, 2012 at 9:31 AM, Bharata B Rao
<bharata@linux.vnet.ibm.com> wrote:
> +typedef struct GlusterAIOCB {
> +    BlockDriverAIOCB common;
> +    QEMUIOVector *qiov;

The qiov field is unused.

> +    char *bounce;

Unused.

> +    struct BDRVGlusterState *s;

You can get this through common.bs->opaque, but if you like having a
shortcut, that's fine.

> +    int cancelled;

bool

> +} GlusterAIOCB;
> +
> +typedef struct GlusterCBKData {
> +    GlusterAIOCB *acb;
> +    struct BDRVGlusterState *s;
> +    int64_t size;
> +    int ret;
> +} GlusterCBKData;

I think GlusterCBKData could just be part of GlusterAIOCB.  That would
simplify the code a little and avoid some malloc/free.

> +
> +typedef struct BDRVGlusterState {
> +    struct glfs *glfs;
> +    int fds[2];
> +    int open_flags;
> +    struct glfs_fd *fd;
> +    int qemu_aio_count;
> +    int event_reader_pos;
> +    GlusterCBKData *event_gcbk;
> +} BDRVGlusterState;
> +
> +#define GLUSTER_FD_READ 0
> +#define GLUSTER_FD_WRITE 1
> +
> +static void qemu_gluster_complete_aio(GlusterCBKData *gcbk)
> +{
> +    GlusterAIOCB *acb = gcbk->acb;
> +    int ret;
> +
> +    if (acb->cancelled) {

Where does cancelled get set?

> +        qemu_aio_release(acb);
> +        goto done;
> +    }
> +
> +    if (gcbk->ret == gcbk->size) {
> +        ret = 0; /* Success */
> +    } else if (gcbk->ret < 0) {
> +        ret = gcbk->ret; /* Read/Write failed */
> +    } else {
> +        ret = -EINVAL; /* Partial read/write - fail it */

EINVAL is for invalid arguments.  EIO would be better.

> +/*
> + * file=protocol:server@port:volname:image
> + */
> +static int qemu_gluster_parsename(GlusterConf *c, const char *filename)
> +{
> +    char *file = g_strdup(filename);
> +    char *token, *next_token, *saveptr;
> +    char *token_s, *next_token_s, *saveptr_s;
> +    int ret = -EINVAL;
> +
> +    /* Discard the protocol */
> +    token = strtok_r(file, ":", &saveptr);
> +    if (!token) {
> +        goto out;
> +    }
> +
> +    /* server@port */
> +    next_token = strtok_r(NULL, ":", &saveptr);
> +    if (!next_token) {
> +        goto out;
> +    }
> +    if (strchr(next_token, '@')) {
> +        token_s = strtok_r(next_token, "@", &saveptr_s);
> +        if (!token_s) {
> +            goto out;
> +        }
> +        strncpy(c->server, token_s, HOST_NAME_MAX);

strncpy(3) will not NUL-terminate when token_s is HOST_NAME_MAX
characters long.  QEMU has cutils.c:pstrcpy().

When the argument is too long we should probably report an error
instead of truncating.

Same below.

> +        next_token_s = strtok_r(NULL, "@", &saveptr_s);
> +        if (!next_token_s) {
> +            goto out;
> +        }
> +        c->port = atoi(next_token_s);

No error checking.  If the input is invalid an error message would
help the user here.

> +static struct glfs *qemu_gluster_init(GlusterConf *c, const char *filename)
> +{
> +    struct glfs *glfs = NULL;
> +    int ret;
> +
> +    ret = qemu_gluster_parsename(c, filename);
> +    if (ret < 0) {
> +        errno = -ret;
> +        goto out;
> +    }
> +
> +    glfs = glfs_new(c->volname);
> +    if (!glfs) {
> +        goto out;
> +    }
> +
> +    ret = glfs_set_volfile_server(glfs, "socket", c->server, c->port);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /*
> +     * TODO: Logging is not necessary but instead nice to have.
> +     * Can QEMU optionally log into a standard place ?

QEMU prints to stderr, can you do that here too?  The global log file
is not okay, especially when multiple QEMU instances are running.

> +     * Need to use defines like gf_loglevel_t:GF_LOG_INFO instead of
> +     * hard coded values like 7 here.
> +     */
> +    ret = glfs_set_logging(glfs, "/tmp/qemu-gluster.log", 7);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = glfs_init(glfs);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +    return glfs;
> +
> +out:
> +    if (glfs) {
> +        (void)glfs_fini(glfs);
> +    }
> +    return NULL;
> +}
> +
> +static int qemu_gluster_open(BlockDriverState *bs, const char *filename,
> +    int bdrv_flags)
> +{
> +    BDRVGlusterState *s = bs->opaque;
> +    GlusterConf *c = g_malloc(sizeof(GlusterConf));

Can this be allocated on the stack?

> +    int ret;
> +
> +    s->glfs = qemu_gluster_init(c, filename);
> +    if (!s->glfs) {
> +        ret = -errno;
> +        goto out;
> +    }
> +
> +    s->open_flags |=  O_BINARY;

Can open_flags be a local variable?

> +static int qemu_gluster_create(const char *filename,
> +        QEMUOptionParameter *options)
> +{
> +    struct glfs *glfs;
> +    struct glfs_fd *fd;
> +    GlusterConf *c = g_malloc(sizeof(GlusterConf));
> +    int ret = 0;
> +    int64_t total_size = 0;
> +
> +    glfs = qemu_gluster_init(c, filename);
> +    if (!glfs) {
> +        ret = -errno;
> +        goto out;
> +    }
> +
> +    /* Read out options */
> +    while (options && options->name) {
> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
> +            total_size = options->value.n / BDRV_SECTOR_SIZE;
> +        }
> +        options++;
> +    }
> +
> +    fd = glfs_creat(glfs, c->image, O_WRONLY|O_CREAT|O_TRUNC|O_BINARY, S_IRWXU);

Why set the execute permission bit?

> +static void qemu_gluster_close(BlockDriverState *bs)
> +{
> +    BDRVGlusterState *s = bs->opaque;
> +
> +    if (s->fd) {
> +        glfs_close(s->fd);
> +        s->fd = NULL;
> +    }

Why not call glfs_fini() here?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/2] block: gluster as block backend
  2012-07-22 15:38   ` Stefan Hajnoczi
@ 2012-07-23  8:32     ` Bharata B Rao
  2012-07-23  9:06       ` Stefan Hajnoczi
  0 siblings, 1 reply; 20+ messages in thread
From: Bharata B Rao @ 2012-07-23  8:32 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Amar Tumballi, Anand Avati, qemu-devel, Vijay Bellur

On Sun, Jul 22, 2012 at 04:38:00PM +0100, Stefan Hajnoczi wrote:
> On Sat, Jul 21, 2012 at 9:31 AM, Bharata B Rao
> <bharata@linux.vnet.ibm.com> wrote:
> > +typedef struct GlusterAIOCB {
> > +    BlockDriverAIOCB common;
> > +    QEMUIOVector *qiov;
> 
> The qiov field is unused.
> 
> > +    char *bounce;
> 
> Unused.

Yes, removed these two.

> 
> > +    struct BDRVGlusterState *s;
> 
> You can get this through common.bs->opaque, but if you like having a
> shortcut, that's fine.
> 
> > +    int cancelled;
> 
> bool

Ok.

> 
> > +} GlusterAIOCB;
> > +
> > +typedef struct GlusterCBKData {
> > +    GlusterAIOCB *acb;
> > +    struct BDRVGlusterState *s;
> > +    int64_t size;
> > +    int ret;
> > +} GlusterCBKData;
> 
> I think GlusterCBKData could just be part of GlusterAIOCB.  That would
> simplify the code a little and avoid some malloc/free.

Are you suggesting to put a field

GlusterCBKData gcbk;

inside GlusterAIOCB and use gcbk from there or

Are you suggesting that I make the fields of GlusterCBKData part of
GlusterAIOCB and get rid of GlusterCBKData altogether ? This means I would
have to pass the GlusterAIOCB to gluster async calls and update its fields from
gluster callback routine. I can do this, but I am not sure if you can touch
the fields of GlusterAIOCB in non-QEMU threads (gluster callback thread).

> 
> > +
> > +typedef struct BDRVGlusterState {
> > +    struct glfs *glfs;
> > +    int fds[2];
> > +    int open_flags;
> > +    struct glfs_fd *fd;
> > +    int qemu_aio_count;
> > +    int event_reader_pos;
> > +    GlusterCBKData *event_gcbk;
> > +} BDRVGlusterState;
> > +
> > +#define GLUSTER_FD_READ 0
> > +#define GLUSTER_FD_WRITE 1
> > +
> > +static void qemu_gluster_complete_aio(GlusterCBKData *gcbk)
> > +{
> > +    GlusterAIOCB *acb = gcbk->acb;
> > +    int ret;
> > +
> > +    if (acb->cancelled) {
> 
> Where does cancelled get set?

I realised that I am not supporting bdrv_aio_cancel(). I guess I will have
to add support for this in next version.

> 
> > +        qemu_aio_release(acb);
> > +        goto done;
> > +    }
> > +
> > +    if (gcbk->ret == gcbk->size) {
> > +        ret = 0; /* Success */
> > +    } else if (gcbk->ret < 0) {
> > +        ret = gcbk->ret; /* Read/Write failed */
> > +    } else {
> > +        ret = -EINVAL; /* Partial read/write - fail it */
> 
> EINVAL is for invalid arguments.  EIO would be better.

Ok.

> 
> > +/*
> > + * file=protocol:server@port:volname:image
> > + */
> > +static int qemu_gluster_parsename(GlusterConf *c, const char *filename)
> > +{
> > +    char *file = g_strdup(filename);
> > +    char *token, *next_token, *saveptr;
> > +    char *token_s, *next_token_s, *saveptr_s;
> > +    int ret = -EINVAL;
> > +
> > +    /* Discard the protocol */
> > +    token = strtok_r(file, ":", &saveptr);
> > +    if (!token) {
> > +        goto out;
> > +    }
> > +
> > +    /* server@port */
> > +    next_token = strtok_r(NULL, ":", &saveptr);
> > +    if (!next_token) {
> > +        goto out;
> > +    }
> > +    if (strchr(next_token, '@')) {
> > +        token_s = strtok_r(next_token, "@", &saveptr_s);
> > +        if (!token_s) {
> > +            goto out;
> > +        }
> > +        strncpy(c->server, token_s, HOST_NAME_MAX);
> 
> strncpy(3) will not NUL-terminate when token_s is HOST_NAME_MAX
> characters long.  QEMU has cutils.c:pstrcpy().

Will use pstrcpy.

> 
> When the argument is too long we should probably report an error
> instead of truncating.

Or should we let gluster APIs to flag an error with truncated
server and volume names ?

> 
> Same below.
> 
> > +        next_token_s = strtok_r(NULL, "@", &saveptr_s);
> > +        if (!next_token_s) {
> > +            goto out;
> > +        }
> > +        c->port = atoi(next_token_s);
> 
> No error checking.  If the input is invalid an error message would
> help the user here.

Fixed.

> 
> > +static struct glfs *qemu_gluster_init(GlusterConf *c, const char *filename)
> > +{
> > +    struct glfs *glfs = NULL;
> > +    int ret;
> > +
> > +    ret = qemu_gluster_parsename(c, filename);
> > +    if (ret < 0) {
> > +        errno = -ret;
> > +        goto out;
> > +    }
> > +
> > +    glfs = glfs_new(c->volname);
> > +    if (!glfs) {
> > +        goto out;
> > +    }
> > +
> > +    ret = glfs_set_volfile_server(glfs, "socket", c->server, c->port);
> > +    if (ret < 0) {
> > +        goto out;
> > +    }
> > +
> > +    /*
> > +     * TODO: Logging is not necessary but instead nice to have.
> > +     * Can QEMU optionally log into a standard place ?
> 
> QEMU prints to stderr, can you do that here too?  The global log file
> is not okay, especially when multiple QEMU instances are running.

Ok, I can do glfs_set_logging(glfs, "/dev/stderr", loglevel);

> 
> > +     * Need to use defines like gf_loglevel_t:GF_LOG_INFO instead of
> > +     * hard coded values like 7 here.
> > +     */
> > +    ret = glfs_set_logging(glfs, "/tmp/qemu-gluster.log", 7);
> > +    if (ret < 0) {
> > +        goto out;
> > +    }
> > +
> > +    ret = glfs_init(glfs);
> > +    if (ret < 0) {
> > +        goto out;
> > +    }
> > +    return glfs;
> > +
> > +out:
> > +    if (glfs) {
> > +        (void)glfs_fini(glfs);
> > +    }
> > +    return NULL;
> > +}
> > +
> > +static int qemu_gluster_open(BlockDriverState *bs, const char *filename,
> > +    int bdrv_flags)
> > +{
> > +    BDRVGlusterState *s = bs->opaque;
> > +    GlusterConf *c = g_malloc(sizeof(GlusterConf));
> 
> Can this be allocated on the stack?

It consists of PATH_MAX(4096), HOST_NAME_MAX(255) and GLUSTERD_MAX_VOLUME_NAME
(1000). A bit heavy to be on stack ?

> 
> > +    int ret;
> > +
> > +    s->glfs = qemu_gluster_init(c, filename);
> > +    if (!s->glfs) {
> > +        ret = -errno;
> > +        goto out;
> > +    }
> > +
> > +    s->open_flags |=  O_BINARY;
> 
> Can open_flags be a local variable?

Yes, fixed.

> 
> > +static int qemu_gluster_create(const char *filename,
> > +        QEMUOptionParameter *options)
> > +{
> > +    struct glfs *glfs;
> > +    struct glfs_fd *fd;
> > +    GlusterConf *c = g_malloc(sizeof(GlusterConf));
> > +    int ret = 0;
> > +    int64_t total_size = 0;
> > +
> > +    glfs = qemu_gluster_init(c, filename);
> > +    if (!glfs) {
> > +        ret = -errno;
> > +        goto out;
> > +    }
> > +
> > +    /* Read out options */
> > +    while (options && options->name) {
> > +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
> > +            total_size = options->value.n / BDRV_SECTOR_SIZE;
> > +        }
> > +        options++;
> > +    }
> > +
> > +    fd = glfs_creat(glfs, c->image, O_WRONLY|O_CREAT|O_TRUNC|O_BINARY, S_IRWXU);
> 
> Why set the execute permission bit?

Changed to read and write only.

> 
> > +static void qemu_gluster_close(BlockDriverState *bs)
> > +{
> > +    BDRVGlusterState *s = bs->opaque;
> > +
> > +    if (s->fd) {
> > +        glfs_close(s->fd);
> > +        s->fd = NULL;
> > +    }
> 
> Why not call glfs_fini() here?

Missed that, fixed now.

Thanks for your comments.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/2] block: gluster as block backend
  2012-07-23  8:32     ` Bharata B Rao
@ 2012-07-23  9:06       ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2012-07-23  9:06 UTC (permalink / raw)
  To: bharata; +Cc: Amar Tumballi, Anand Avati, qemu-devel, Vijay Bellur

On Mon, Jul 23, 2012 at 9:32 AM, Bharata B Rao
<bharata@linux.vnet.ibm.com> wrote:
> On Sun, Jul 22, 2012 at 04:38:00PM +0100, Stefan Hajnoczi wrote:
>> On Sat, Jul 21, 2012 at 9:31 AM, Bharata B Rao
>> <bharata@linux.vnet.ibm.com> wrote:
>> > +} GlusterAIOCB;
>> > +
>> > +typedef struct GlusterCBKData {
>> > +    GlusterAIOCB *acb;
>> > +    struct BDRVGlusterState *s;
>> > +    int64_t size;
>> > +    int ret;
>> > +} GlusterCBKData;
>>
>> I think GlusterCBKData could just be part of GlusterAIOCB.  That would
>> simplify the code a little and avoid some malloc/free.
>
> Are you suggesting to put a field
>
> GlusterCBKData gcbk;
>
> inside GlusterAIOCB and use gcbk from there or
>
> Are you suggesting that I make the fields of GlusterCBKData part of
> GlusterAIOCB and get rid of GlusterCBKData altogether ? This means I would
> have to pass the GlusterAIOCB to gluster async calls and update its fields from
> gluster callback routine. I can do this, but I am not sure if you can touch
> the fields of GlusterAIOCB in non-QEMU threads (gluster callback thread).

The fields in GlusterCBKData could become part of GlusterAIOCB.
Different threads can access fields in a struct, they just need to
ensure access is synchronized if they touch the same fields.  In the
case of this code I think there is nothing that requires
synchronization beyond the pipe mechanism that you already use to
complete processing in a QEMU thread.

>> When the argument is too long we should probably report an error
>> instead of truncating.
>
> Or should we let gluster APIs to flag an error with truncated
> server and volume names ?

What if the truncated name is a valid but different object?  For example:
Max chars = 5
Objects:
"helloworld"
"hello"

If "helloworld" is truncated to "hello" we get no error back because
it's a valid object!

We need to either check sizes explicitly without truncating or use a
g_strdup() approach without any size limits and let the gfapi
functions error out if the input string is too long.

>> > +static struct glfs *qemu_gluster_init(GlusterConf *c, const char *filename)
>> > +{
>> > +    struct glfs *glfs = NULL;
>> > +    int ret;
>> > +
>> > +    ret = qemu_gluster_parsename(c, filename);
>> > +    if (ret < 0) {
>> > +        errno = -ret;
>> > +        goto out;
>> > +    }
>> > +
>> > +    glfs = glfs_new(c->volname);
>> > +    if (!glfs) {
>> > +        goto out;
>> > +    }
>> > +
>> > +    ret = glfs_set_volfile_server(glfs, "socket", c->server, c->port);
>> > +    if (ret < 0) {
>> > +        goto out;
>> > +    }
>> > +
>> > +    /*
>> > +     * TODO: Logging is not necessary but instead nice to have.
>> > +     * Can QEMU optionally log into a standard place ?
>>
>> QEMU prints to stderr, can you do that here too?  The global log file
>> is not okay, especially when multiple QEMU instances are running.
>
> Ok, I can do glfs_set_logging(glfs, "/dev/stderr", loglevel);

Yes.  I think "-" is best since it is supported by gfapi
(libglusterfs/src/logging.c:gf_log_init).  /dev/stderr is not POSIX.

>> > +     * Need to use defines like gf_loglevel_t:GF_LOG_INFO instead of
>> > +     * hard coded values like 7 here.
>> > +     */
>> > +    ret = glfs_set_logging(glfs, "/tmp/qemu-gluster.log", 7);
>> > +    if (ret < 0) {
>> > +        goto out;
>> > +    }
>> > +
>> > +    ret = glfs_init(glfs);
>> > +    if (ret < 0) {
>> > +        goto out;
>> > +    }
>> > +    return glfs;
>> > +
>> > +out:
>> > +    if (glfs) {
>> > +        (void)glfs_fini(glfs);
>> > +    }
>> > +    return NULL;
>> > +}
>> > +
>> > +static int qemu_gluster_open(BlockDriverState *bs, const char *filename,
>> > +    int bdrv_flags)
>> > +{
>> > +    BDRVGlusterState *s = bs->opaque;
>> > +    GlusterConf *c = g_malloc(sizeof(GlusterConf));
>>
>> Can this be allocated on the stack?
>
> It consists of PATH_MAX(4096), HOST_NAME_MAX(255) and GLUSTERD_MAX_VOLUME_NAME
> (1000). A bit heavy to be on stack ?

This is userspace, stacks are big but it's up to you.

Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-21  8:29 [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Bharata B Rao
  2012-07-21  8:30 ` [Qemu-devel] [RFC PATCH 1/2] qemu: Add a config option for GlusterFS as block backend Bharata B Rao
  2012-07-21  8:31 ` [Qemu-devel] [RFC PATCH 2/2] block: gluster " Bharata B Rao
@ 2012-07-21 12:22 ` Vijay Bellur
  2012-07-21 13:04   ` Bharata B Rao
  2012-07-22 14:42 ` Stefan Hajnoczi
  2012-07-23  9:16 ` Daniel P. Berrange
  4 siblings, 1 reply; 20+ messages in thread
From: Vijay Bellur @ 2012-07-21 12:22 UTC (permalink / raw)
  To: bharata; +Cc: Anand Avati, Amar Tumballi, qemu-devel

On 07/21/2012 01:59 PM, Bharata B Rao wrote:
> We now
> specify the gluster backed image like this:
>
> -drive file=gluster:server@port:volname:image
>
> - Here 'gluster' is the protocol.
> - 'server@port' specifies the server where the volume file specification for
>    the given volume resides. 'port' is the port number on which gluster
>    management daemon (glusterd) is listening. This is optional and if not
>    specified, QEMU will send 0 which will make libgfapi to use the default
>    port.
> - 'volname' is the name of the gluster volume which contains the VM image.
> - 'image' is the path to the actual VM image in the gluster volume.
>
> Note that we are no longer using volfiles directly and use volume names
> instead. For this to work, gluster management daemon (glusterd) needs to
> be running on the QEMU node.


glusterd needs to be running on the server that is specified in 
'server@port' option.



> Scenarios
> ---------
> Base: QEMU boots from directly from image on gluster brick.
> Fuse mount: QEMU boots from VM image on gluster FUSE mount.
> Fuse bypass: QEMU uses gluster protocol and uses standard client volfile.
> Fuse bypass custom: QEMU uses gluster protocol and uses a minimal client
> 	volfile that just has client xlator.
> RPC bypass: QEMU uses just posix xlator and doesn't depend on gluster server.
>

For the Fuse mount, Fuse bypass and Fuse bypass custom scenarios, I 
assume a single local gluster brick is being used. Is this correct?

Thanks,
Vijay

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-21 12:22 ` [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Vijay Bellur
@ 2012-07-21 13:04   ` Bharata B Rao
  0 siblings, 0 replies; 20+ messages in thread
From: Bharata B Rao @ 2012-07-21 13:04 UTC (permalink / raw)
  To: Vijay Bellur; +Cc: Anand Avati, Amar Tumballi, qemu-devel

On Sat, Jul 21, 2012 at 05:52:59PM +0530, Vijay Bellur wrote:
> On 07/21/2012 01:59 PM, Bharata B Rao wrote:
> >We now
> >specify the gluster backed image like this:
> >
> >-drive file=gluster:server@port:volname:image
> >
> >- Here 'gluster' is the protocol.
> >- 'server@port' specifies the server where the volume file specification for
> >   the given volume resides. 'port' is the port number on which gluster
> >   management daemon (glusterd) is listening. This is optional and if not
> >   specified, QEMU will send 0 which will make libgfapi to use the default
> >   port.
> >- 'volname' is the name of the gluster volume which contains the VM image.
> >- 'image' is the path to the actual VM image in the gluster volume.
> >
> >Note that we are no longer using volfiles directly and use volume names
> >instead. For this to work, gluster management daemon (glusterd) needs to
> >be running on the QEMU node.
> 
> 
> glusterd needs to be running on the server that is specified in
> 'server@port' option.

oh yes, thanks.

> 
> 
> 
> >Scenarios
> >---------
> >Base: QEMU boots from directly from image on gluster brick.
> >Fuse mount: QEMU boots from VM image on gluster FUSE mount.
> >Fuse bypass: QEMU uses gluster protocol and uses standard client volfile.
> >Fuse bypass custom: QEMU uses gluster protocol and uses a minimal client
> >	volfile that just has client xlator.
> >RPC bypass: QEMU uses just posix xlator and doesn't depend on gluster server.
> >
> 
> For the Fuse mount, Fuse bypass and Fuse bypass custom scenarios, I
> assume a single local gluster brick is being used. Is this correct?

Right, fio numbers are for single local gluster brick scenario.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-21  8:29 [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Bharata B Rao
                   ` (2 preceding siblings ...)
  2012-07-21 12:22 ` [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Vijay Bellur
@ 2012-07-22 14:42 ` Stefan Hajnoczi
  2012-07-23  8:50   ` Bharata B Rao
  2012-07-23  9:16 ` Daniel P. Berrange
  4 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2012-07-22 14:42 UTC (permalink / raw)
  To: bharata; +Cc: Amar Tumballi, Anand Avati, qemu-devel, Vijay Bellur

On Sat, Jul 21, 2012 at 9:29 AM, Bharata B Rao
<bharata@linux.vnet.ibm.com> wrote:
> -drive file=gluster:server@port:volname:image
>
> - Here 'gluster' is the protocol.
> - 'server@port' specifies the server where the volume file specification for
>   the given volume resides. 'port' is the port number on which gluster
>   management daemon (glusterd) is listening. This is optional and if not
>   specified, QEMU will send 0 which will make libgfapi to use the default
>   port.

'server@port' is weird notation.  Normally it is 'server:port' (e.g.
URLs).  Can you change it?

What about the other transports supported by libgfapi: UNIX domain
sockets and RDMA?  My reading of glfs.h is that there are 3 connection
options:
1. 'transport': 'socket' (default), 'unix', 'rdma'
2. 'host': server hostname for 'socket', path to UNIX domain socket
for 'unix', or something else for 'rdma'
3. 'port': TCP port when 'socket' is used.  Ignored otherwise.

Unfortunately QEMU block drivers cannot take custom options yet.  That
would make it possible to cleanly map these connection options and
save you from inventing syntax which doesn't expose all options.

In the meantime it would be nice if the syntax exposed all options.

> Note that we are no longer using volfiles directly and use volume names
> instead. For this to work, gluster management daemon (glusterd) needs to
> be running on the QEMU node. This limits the QEMU user to access the volumes by
> the default volfiles that are generated by gluster CLI. This should be
> fine as long as gluster CLI provides the capability to generate or regenerate
> volume files for a given volume with the xlator set that QEMU user is
> interested in. GlusterFS developers tell me that this can be provided with
> some enhancements to Gluster CLI/glusterd. Note that the custom volume files
> is typically needed when GlusterFS server is co-located with QEMU in
> which case it would  be beneficial to get rid of client-server overhead and
> RPC communication overhead.

My knowledge of GlusterFS is limited.  Here is what I am thinking:

1. The user cannot specify a local configuration file, you require
that there is a glusterd running which provides configuration
information.
2. It is currently not possible to bypass RPC because the glusterd
managed configuration file doesn't support that.

I'm not sure if these statements are true?

Would you support local volfiles in the future again?  Why force users
to run glusterd?

> - As mentioned above, the VM image on gluster volume can be specified like
>   this:
>         -drive file=gluster:localhost:testvol:/F17,format=gluster
>
>   Note that format=gluster is not needed ideally and its a work around I have
>   until libgfapi provides a working connection cleanup routine (glfs_fini()).
>   When the format isn't specified, QEMU figures out the format by doing
>   find_image_format that results in one open and close before opening the
>   image file long term for standard read and write. Gluster connection
>   initialization is done from open and connection termination is done from
>   close. But since glfs_fini() isn't working yet, I am bypassing
>   find_image_format by specifying format=gluster directly which results in
>   just one open and hence I am not limited by glfs_fini().

Has libgfapi been released yet?  Does it have versioning which will
allow the QEMU GlusterFS block driver to build against different
versions?  I'm just wondering how the pieces will fit together once
distros start shipping them.

Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-22 14:42 ` Stefan Hajnoczi
@ 2012-07-23  8:50   ` Bharata B Rao
  2012-07-23  9:20     ` Stefan Hajnoczi
  2012-07-23  9:36     ` Vijay Bellur
  0 siblings, 2 replies; 20+ messages in thread
From: Bharata B Rao @ 2012-07-23  8:50 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Amar Tumballi, Anand Avati, qemu-devel, Vijay Bellur

On Sun, Jul 22, 2012 at 03:42:28PM +0100, Stefan Hajnoczi wrote:
> On Sat, Jul 21, 2012 at 9:29 AM, Bharata B Rao
> <bharata@linux.vnet.ibm.com> wrote:
> > -drive file=gluster:server@port:volname:image
> >
> > - Here 'gluster' is the protocol.
> > - 'server@port' specifies the server where the volume file specification for
> >   the given volume resides. 'port' is the port number on which gluster
> >   management daemon (glusterd) is listening. This is optional and if not
> >   specified, QEMU will send 0 which will make libgfapi to use the default
> >   port.
> 
> 'server@port' is weird notation.  Normally it is 'server:port' (e.g.
> URLs).  Can you change it?

I don't like but, but settled for it since port was optional and : was
being used as separator here.

> 
> What about the other transports supported by libgfapi: UNIX domain
> sockets and RDMA?  My reading of glfs.h is that there are 3 connection
> options:
> 1. 'transport': 'socket' (default), 'unix', 'rdma'
> 2. 'host': server hostname for 'socket', path to UNIX domain socket
> for 'unix', or something else for 'rdma'
> 3. 'port': TCP port when 'socket' is used.  Ignored otherwise.
> 
> Unfortunately QEMU block drivers cannot take custom options yet.  That
> would make it possible to cleanly map these connection options and
> save you from inventing syntax which doesn't expose all options.
> 
> In the meantime it would be nice if the syntax exposed all options.

So without the capability to pass custom options to block drivers, am I forced
to keep extending the file= with more and more options ?

file=gluster:transport:server:port:volname:image ?

Looks ugly and not easy to make any particular option optional. If needed I can
support this from GlusterFS backend.

> 
> > Note that we are no longer using volfiles directly and use volume names
> > instead. For this to work, gluster management daemon (glusterd) needs to
> > be running on the QEMU node. This limits the QEMU user to access the volumes by
> > the default volfiles that are generated by gluster CLI. This should be
> > fine as long as gluster CLI provides the capability to generate or regenerate
> > volume files for a given volume with the xlator set that QEMU user is
> > interested in. GlusterFS developers tell me that this can be provided with
> > some enhancements to Gluster CLI/glusterd. Note that the custom volume files
> > is typically needed when GlusterFS server is co-located with QEMU in
> > which case it would  be beneficial to get rid of client-server overhead and
> > RPC communication overhead.
> 
> My knowledge of GlusterFS is limited.  Here is what I am thinking:
> 
> 1. The user cannot specify a local configuration file, you require
> that there is a glusterd running which provides configuration
> information.

Yes. User only specifies a volume name and glusterd is used to fetch
the right volume file for that volume name.

> 2. It is currently not possible to bypass RPC because the glusterd
> managed configuration file doesn't support that.

It is possible. Gluster already supports custom extensions
to volume names and it is possible to use the required volfile by specifying
this custom volname extension.

For eg, if I have a volume named test, by default the volfile used for
it will be test-fuse.vol. Currently I can put my own custom volfile into
the standard location and get glusterd pick that up. I can specify
test.rpcbypass as volname and glusterd will pick test.rpcbypass.vol.

What is currently not supported is the ability to create test.rpcbypass.vol
from gluster CLI. I believe that gluster developers are ok with enhancing
gluster CLI to support generating/regenerating volfiles for a given volume
with custom translator set.

> 
> I'm not sure if these statements are true?
> 
> Would you support local volfiles in the future again?  Why force users
> to run glusterd?

I will let gluster folks on CC to answer this and let us know the benefits
of always depending on glusterd.

I guess running glusterd would be beneficial when supporting migration. QEMU
working from a local volume (with volname=test.rpcbypass) can be easily
restarted on a different node by just changing volname to test. glusterd will
take care of fetching the right volfile automatically for us.

> 
> > - As mentioned above, the VM image on gluster volume can be specified like
> >   this:
> >         -drive file=gluster:localhost:testvol:/F17,format=gluster
> >
> >   Note that format=gluster is not needed ideally and its a work around I have
> >   until libgfapi provides a working connection cleanup routine (glfs_fini()).
> >   When the format isn't specified, QEMU figures out the format by doing
> >   find_image_format that results in one open and close before opening the
> >   image file long term for standard read and write. Gluster connection
> >   initialization is done from open and connection termination is done from
> >   close. But since glfs_fini() isn't working yet, I am bypassing
> >   find_image_format by specifying format=gluster directly which results in
> >   just one open and hence I am not limited by glfs_fini().
> 
> Has libgfapi been released yet?

Its part of gluster mainline now.

> Does it have versioning which will
> allow the QEMU GlusterFS block driver to build against different
> versions?  I'm just wondering how the pieces will fit together once
> distros start shipping them.

I request gluster folks on CC to comment about version and shipping
information.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-23  8:50   ` Bharata B Rao
@ 2012-07-23  9:20     ` Stefan Hajnoczi
  2012-07-23  9:34       ` ronnie sahlberg
                         ` (3 more replies)
  2012-07-23  9:36     ` Vijay Bellur
  1 sibling, 4 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2012-07-23  9:20 UTC (permalink / raw)
  To: Kevin Wolf, Markus Armbruster
  Cc: Amar Tumballi, bharata, Anand Avati, qemu-devel, Vijay Bellur

On Mon, Jul 23, 2012 at 9:50 AM, Bharata B Rao
<bharata@linux.vnet.ibm.com> wrote:
> On Sun, Jul 22, 2012 at 03:42:28PM +0100, Stefan Hajnoczi wrote:
>> On Sat, Jul 21, 2012 at 9:29 AM, Bharata B Rao
>> <bharata@linux.vnet.ibm.com> wrote:
>> > -drive file=gluster:server@port:volname:image
>> >
>> > - Here 'gluster' is the protocol.
>> > - 'server@port' specifies the server where the volume file specification for
>> >   the given volume resides. 'port' is the port number on which gluster
>> >   management daemon (glusterd) is listening. This is optional and if not
>> >   specified, QEMU will send 0 which will make libgfapi to use the default
>> >   port.
>>
>> 'server@port' is weird notation.  Normally it is 'server:port' (e.g.
>> URLs).  Can you change it?
>
> I don't like but, but settled for it since port was optional and : was
> being used as separator here.
>
>>
>> What about the other transports supported by libgfapi: UNIX domain
>> sockets and RDMA?  My reading of glfs.h is that there are 3 connection
>> options:
>> 1. 'transport': 'socket' (default), 'unix', 'rdma'
>> 2. 'host': server hostname for 'socket', path to UNIX domain socket
>> for 'unix', or something else for 'rdma'
>> 3. 'port': TCP port when 'socket' is used.  Ignored otherwise.
>>
>> Unfortunately QEMU block drivers cannot take custom options yet.  That
>> would make it possible to cleanly map these connection options and
>> save you from inventing syntax which doesn't expose all options.
>>
>> In the meantime it would be nice if the syntax exposed all options.
>
> So without the capability to pass custom options to block drivers, am I forced
> to keep extending the file= with more and more options ?
>
> file=gluster:transport:server:port:volname:image ?
>
> Looks ugly and not easy to make any particular option optional. If needed I can
> support this from GlusterFS backend.

Kevin, Markus: Any thoughts on passing options to block drivers?
Encoding GlusterFS options into a "filename" string is pretty
cumbersome.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-23  9:20     ` Stefan Hajnoczi
@ 2012-07-23  9:34       ` ronnie sahlberg
  2012-07-23  9:35         ` Stefan Hajnoczi
  2012-07-23 14:34       ` Eric Blake
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: ronnie sahlberg @ 2012-07-23  9:34 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Anand Avati, Vijay Bellur, Amar Tumballi, qemu-devel,
	Markus Armbruster, bharata

Stefan,

in iscsi,  i just specify those extra arguments that are required that
are not part of the url itself as just command line options :


qemu-system-i386 -iscsi initiator-name=iqn.qemu.test:my-initiator \
    -boot d -drive file=iscsi://127.0.0.1/iqn.qemu.test/1 \
    -cdrom iscsi://127.0.0.1/iqn.qemu.test/2

Here  initiator-name is a custom option to the iscsi layer to tell it
which name to use when identifying/logging in to the target.

Similar concept could be used by glusterfs ?

regards
ronnie sahlberg



On Mon, Jul 23, 2012 at 7:20 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Jul 23, 2012 at 9:50 AM, Bharata B Rao
> <bharata@linux.vnet.ibm.com> wrote:
>> On Sun, Jul 22, 2012 at 03:42:28PM +0100, Stefan Hajnoczi wrote:
>>> On Sat, Jul 21, 2012 at 9:29 AM, Bharata B Rao
>>> <bharata@linux.vnet.ibm.com> wrote:
>>> > -drive file=gluster:server@port:volname:image
>>> >
>>> > - Here 'gluster' is the protocol.
>>> > - 'server@port' specifies the server where the volume file specification for
>>> >   the given volume resides. 'port' is the port number on which gluster
>>> >   management daemon (glusterd) is listening. This is optional and if not
>>> >   specified, QEMU will send 0 which will make libgfapi to use the default
>>> >   port.
>>>
>>> 'server@port' is weird notation.  Normally it is 'server:port' (e.g.
>>> URLs).  Can you change it?
>>
>> I don't like but, but settled for it since port was optional and : was
>> being used as separator here.
>>
>>>
>>> What about the other transports supported by libgfapi: UNIX domain
>>> sockets and RDMA?  My reading of glfs.h is that there are 3 connection
>>> options:
>>> 1. 'transport': 'socket' (default), 'unix', 'rdma'
>>> 2. 'host': server hostname for 'socket', path to UNIX domain socket
>>> for 'unix', or something else for 'rdma'
>>> 3. 'port': TCP port when 'socket' is used.  Ignored otherwise.
>>>
>>> Unfortunately QEMU block drivers cannot take custom options yet.  That
>>> would make it possible to cleanly map these connection options and
>>> save you from inventing syntax which doesn't expose all options.
>>>
>>> In the meantime it would be nice if the syntax exposed all options.
>>
>> So without the capability to pass custom options to block drivers, am I forced
>> to keep extending the file= with more and more options ?
>>
>> file=gluster:transport:server:port:volname:image ?
>>
>> Looks ugly and not easy to make any particular option optional. If needed I can
>> support this from GlusterFS backend.
>
> Kevin, Markus: Any thoughts on passing options to block drivers?
> Encoding GlusterFS options into a "filename" string is pretty
> cumbersome.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-23  9:34       ` ronnie sahlberg
@ 2012-07-23  9:35         ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2012-07-23  9:35 UTC (permalink / raw)
  To: ronnie sahlberg
  Cc: Kevin Wolf, Anand Avati, Vijay Bellur, Amar Tumballi, qemu-devel,
	Markus Armbruster, bharata

On Mon, Jul 23, 2012 at 10:34 AM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> in iscsi,  i just specify those extra arguments that are required that
> are not part of the url itself as just command line options :
>
>
> qemu-system-i386 -iscsi initiator-name=iqn.qemu.test:my-initiator \
>     -boot d -drive file=iscsi://127.0.0.1/iqn.qemu.test/1 \
>     -cdrom iscsi://127.0.0.1/iqn.qemu.test/2
>
> Here  initiator-name is a custom option to the iscsi layer to tell it
> which name to use when identifying/logging in to the target.
>
> Similar concept could be used by glusterfs ?

That works for global options only.  If it's per-drive then this
approach can't be used because different drives might need different
option values.

Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-23  9:20     ` Stefan Hajnoczi
  2012-07-23  9:34       ` ronnie sahlberg
@ 2012-07-23 14:34       ` Eric Blake
  2012-07-24  3:34         ` Bharata B Rao
  2012-07-24 10:24       ` Kevin Wolf
  2012-07-24 11:30       ` Markus Armbruster
  3 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2012-07-23 14:34 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Anand Avati, Vijay Bellur, Amar Tumballi, qemu-devel,
	Markus Armbruster, bharata

[-- Attachment #1: Type: text/plain, Size: 982 bytes --]

On 07/23/2012 03:20 AM, Stefan Hajnoczi wrote:
>>
>> So without the capability to pass custom options to block drivers, am I forced
>> to keep extending the file= with more and more options ?
>>
>> file=gluster:transport:server:port:volname:image ?
>>
>> Looks ugly and not easy to make any particular option optional. If needed I can
>> support this from GlusterFS backend.
> 
> Kevin, Markus: Any thoughts on passing options to block drivers?
> Encoding GlusterFS options into a "filename" string is pretty
> cumbersome.

On 07/23/2012 03:28 AM, ronnie sahlberg wrote:> Why not use
>
> -drive file=gluster://server[:port]/volname/image

At which point, options can fit into this URI scheme:

-drive file=gluster://server:port/volname/image?option1=foo&option2=bar

where anything after the ? of the URI can introduce whichever options
you need.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-23 14:34       ` Eric Blake
@ 2012-07-24  3:34         ` Bharata B Rao
  0 siblings, 0 replies; 20+ messages in thread
From: Bharata B Rao @ 2012-07-24  3:34 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Anand Avati, Vijay Bellur, Stefan Hajnoczi,
	Amar Tumballi, Markus Armbruster, qemu-devel

On Mon, Jul 23, 2012 at 08:34:30AM -0600, Eric Blake wrote:
> On 07/23/2012 03:20 AM, Stefan Hajnoczi wrote:
> >>
> >> So without the capability to pass custom options to block drivers, am I forced
> >> to keep extending the file= with more and more options ?
> >>
> >> file=gluster:transport:server:port:volname:image ?
> >>
> >> Looks ugly and not easy to make any particular option optional. If needed I can
> >> support this from GlusterFS backend.
> > 
> > Kevin, Markus: Any thoughts on passing options to block drivers?
> > Encoding GlusterFS options into a "filename" string is pretty
> > cumbersome.
> 
> On 07/23/2012 03:28 AM, ronnie sahlberg wrote:> Why not use
> >
> > -drive file=gluster://server[:port]/volname/image
> 
> At which point, options can fit into this URI scheme:
> 
> -drive file=gluster://server:port/volname/image?option1=foo&option2=bar
> 
> where anything after the ? of the URI can introduce whichever options
> you need.

The URI covered everything and left only transport as the option, which could
be made part of the URI itself ?

So looks like we have two options:

gluster://server[:port]/[transport]/volname/image

vs

gluster:server:[port]:[transport]:volname:image

Unless there is a strong preference on one over the other, I am inclined
to go with the latter (colon based) approach and expect user to provide
double colons (::) wherever any default value needs to be specified.

Eg 1. gluster:localhost:::test:/a.img
Eg 2. gluster:localhost:0:socket:test:/a.img

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-23  9:20     ` Stefan Hajnoczi
  2012-07-23  9:34       ` ronnie sahlberg
  2012-07-23 14:34       ` Eric Blake
@ 2012-07-24 10:24       ` Kevin Wolf
  2012-07-24 11:30       ` Markus Armbruster
  3 siblings, 0 replies; 20+ messages in thread
From: Kevin Wolf @ 2012-07-24 10:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Anand Avati, Vijay Bellur, Amar Tumballi, Markus Armbruster,
	qemu-devel, bharata

Am 23.07.2012 11:20, schrieb Stefan Hajnoczi:
> On Mon, Jul 23, 2012 at 9:50 AM, Bharata B Rao
> <bharata@linux.vnet.ibm.com> wrote:
>> On Sun, Jul 22, 2012 at 03:42:28PM +0100, Stefan Hajnoczi wrote:
>>> What about the other transports supported by libgfapi: UNIX domain
>>> sockets and RDMA?  My reading of glfs.h is that there are 3 connection
>>> options:
>>> 1. 'transport': 'socket' (default), 'unix', 'rdma'
>>> 2. 'host': server hostname for 'socket', path to UNIX domain socket
>>> for 'unix', or something else for 'rdma'
>>> 3. 'port': TCP port when 'socket' is used.  Ignored otherwise.
>>>
>>> Unfortunately QEMU block drivers cannot take custom options yet.  That
>>> would make it possible to cleanly map these connection options and
>>> save you from inventing syntax which doesn't expose all options.
>>>
>>> In the meantime it would be nice if the syntax exposed all options.
>>
>> So without the capability to pass custom options to block drivers, am I forced
>> to keep extending the file= with more and more options ?
>>
>> file=gluster:transport:server:port:volname:image ?
>>
>> Looks ugly and not easy to make any particular option optional. If needed I can
>> support this from GlusterFS backend.
> 
> Kevin, Markus: Any thoughts on passing options to block drivers?
> Encoding GlusterFS options into a "filename" string is pretty
> cumbersome.

This is the way it is without -blockdev. *shrug*

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-23  9:20     ` Stefan Hajnoczi
                         ` (2 preceding siblings ...)
  2012-07-24 10:24       ` Kevin Wolf
@ 2012-07-24 11:30       ` Markus Armbruster
  3 siblings, 0 replies; 20+ messages in thread
From: Markus Armbruster @ 2012-07-24 11:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Anand Avati, Vijay Bellur, Amar Tumballi, qemu-devel,
	bharata

Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Mon, Jul 23, 2012 at 9:50 AM, Bharata B Rao
> <bharata@linux.vnet.ibm.com> wrote:
>> On Sun, Jul 22, 2012 at 03:42:28PM +0100, Stefan Hajnoczi wrote:
>>> On Sat, Jul 21, 2012 at 9:29 AM, Bharata B Rao
>>> <bharata@linux.vnet.ibm.com> wrote:
>>> > -drive file=gluster:server@port:volname:image
>>> >
>>> > - Here 'gluster' is the protocol.
>>> > - 'server@port' specifies the server where the volume file
>>> > specification for
>>> >   the given volume resides. 'port' is the port number on which gluster
>>> >   management daemon (glusterd) is listening. This is optional and if not
>>> >   specified, QEMU will send 0 which will make libgfapi to use the default
>>> >   port.
>>>
>>> 'server@port' is weird notation.  Normally it is 'server:port' (e.g.
>>> URLs).  Can you change it?
>>
>> I don't like but, but settled for it since port was optional and : was
>> being used as separator here.

For what it's worth, icsi.c uses

iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>

>>> What about the other transports supported by libgfapi: UNIX domain
>>> sockets and RDMA?  My reading of glfs.h is that there are 3 connection
>>> options:
>>> 1. 'transport': 'socket' (default), 'unix', 'rdma'
>>> 2. 'host': server hostname for 'socket', path to UNIX domain socket
>>> for 'unix', or something else for 'rdma'
>>> 3. 'port': TCP port when 'socket' is used.  Ignored otherwise.
>>>
>>> Unfortunately QEMU block drivers cannot take custom options yet.  That
>>> would make it possible to cleanly map these connection options and
>>> save you from inventing syntax which doesn't expose all options.
>>>
>>> In the meantime it would be nice if the syntax exposed all options.
>>
>> So without the capability to pass custom options to block drivers, am I forced
>> to keep extending the file= with more and more options ?
>>
>> file=gluster:transport:server:port:volname:image ?
>>
>> Looks ugly and not easy to make any particular option optional. If
>> needed I can
>> support this from GlusterFS backend.
>
> Kevin, Markus: Any thoughts on passing options to block drivers?
> Encoding GlusterFS options into a "filename" string is pretty
> cumbersome.

Yes, it is.  Every block driver with special parameters has its own ad
hoc parser, and some of them are badly designed.

I'm working on a sane way to configure block drivers.  Until we got
that, encoding parameters into the "filename" is what you have to do.

Once we got it, block drivers shouldn't be required to expose *all*
their parameters via -drive's encoded "filename".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-23  8:50   ` Bharata B Rao
  2012-07-23  9:20     ` Stefan Hajnoczi
@ 2012-07-23  9:36     ` Vijay Bellur
  1 sibling, 0 replies; 20+ messages in thread
From: Vijay Bellur @ 2012-07-23  9:36 UTC (permalink / raw)
  To: bharata; +Cc: Stefan Hajnoczi, Anand Avati, qemu-devel, Amar Tumballi

On 07/23/2012 02:20 PM, Bharata B Rao wrote:

>
>> 2. It is currently not possible to bypass RPC because the glusterd
>> managed configuration file doesn't support that.
>
> It is possible. Gluster already supports custom extensions
> to volume names and it is possible to use the required volfile by specifying
> this custom volname extension.
>
> For eg, if I have a volume named test, by default the volfile used for
> it will be test-fuse.vol. Currently I can put my own custom volfile into
> the standard location and get glusterd pick that up. I can specify
> test.rpcbypass as volname and glusterd will pick test.rpcbypass.vol.
>
> What is currently not supported is the ability to create test.rpcbypass.vol
> from gluster CLI. I believe that gluster developers are ok with enhancing
> gluster CLI to support generating/regenerating volfiles for a given volume
> with custom translator set.


Yes, this would be the preferred approach. We can tune the volume file 
generation to evolve the desired configuration file.

>
>>
>> I'm not sure if these statements are true?
>>
>> Would you support local volfiles in the future again?  Why force users
>> to run glusterd?
>
> I will let gluster folks on CC to answer this and let us know the benefits
> of always depending on glusterd.
>
> I guess running glusterd would be beneficial when supporting migration. QEMU
> working from a local volume (with volname=test.rpcbypass) can be easily
> restarted on a different node by just changing volname to test. glusterd will
> take care of fetching the right volfile automatically for us.

Yes, running glusterd would be beneficial in migration. Without deriving 
the file from glusterd features like volume tuning, client monitoring 
etc. would not be available to to clients that talk to a gluster volume. 
Additionally, driving configuration generation and management through 
glusterd helps in standardizing and stabilizing gluster configurations.

>

>>
>> Has libgfapi been released yet?
>
> Its part of gluster mainline now.
>
>> Does it have versioning which will
>> allow the QEMU GlusterFS block driver to build against different
>> versions?  I'm just wondering how the pieces will fit together once
>> distros start shipping them.
>
> I request gluster folks on CC to comment about version and shipping
> information.
>

There is no release that contains libgfapi as yet. Once that is done, we 
can probably have the dependency specified in QEMU as well.

Regards,
Vijay

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-21  8:29 [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Bharata B Rao
                   ` (3 preceding siblings ...)
  2012-07-22 14:42 ` Stefan Hajnoczi
@ 2012-07-23  9:16 ` Daniel P. Berrange
  2012-07-23  9:28   ` ronnie sahlberg
  4 siblings, 1 reply; 20+ messages in thread
From: Daniel P. Berrange @ 2012-07-23  9:16 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: Amar Tumballi, Anand Avati, qemu-devel, Vijay Bellur

On Sat, Jul 21, 2012 at 01:59:17PM +0530, Bharata B Rao wrote:
> Hi,
> 
> Here is the v2 patchset for supporting GlusterFS protocol from QEMU.
> 
> This set of patches enables QEMU to boot VM images from gluster volumes.
> This is achieved by adding gluster as a new block backend driver in QEMU.
> Its already possible to boot from VM images on gluster volumes, but this
> patchset provides the ability to boot VM images from gluster volumes by
> by-passing the FUSE layer in gluster. In case the image is present on the
> local system, it is possible to even bypass client and server translator and
> hence the RPC overhead.
> 
> The major change in this version is to not implement libglusterfs based
> gluster backend within QEMU but instead use libgfapi. libgfapi library
> from GlusterFS project provides APIs to access gluster volumes directly.
> With the use of libgfapi, the specification of gluster backend from QEMU
> matches more closely with the GlusterFS's way of specifying volumes. We now
> specify the gluster backed image like this:
> 
> -drive file=gluster:server@port:volname:image
> 
> - Here 'gluster' is the protocol.
> - 'server@port' specifies the server where the volume file specification for
>   the given volume resides. 'port' is the port number on which gluster
>   management daemon (glusterd) is listening. This is optional and if not
>   specified, QEMU will send 0 which will make libgfapi to use the default
>   port.
> - 'volname' is the name of the gluster volume which contains the VM image.
> - 'image' is the path to the actual VM image in the gluster volume.

I don't think we should be using '@' as a field separator here, when ":"
can do that job just fine. In addition we already have a precendent set
with the sheepdog driver for using ':' for separating all fields:

  -drive file=sheepdog:example.org:6000:imagename

If you want to allow for port number to be omitted, this can be handled
thus:

  -drive file=sheepdog:example.org::imagename

which is how -chardev deals with omitted port numbers

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
  2012-07-23  9:16 ` Daniel P. Berrange
@ 2012-07-23  9:28   ` ronnie sahlberg
  0 siblings, 0 replies; 20+ messages in thread
From: ronnie sahlberg @ 2012-07-23  9:28 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Anand Avati, Vijay Bellur, Amar Tumballi, qemu-devel,
	Bharata B Rao

Why not use

-drive file=gluster://server[:port]/volname/image

A great many protocols today use the form

<protocol>://<server>:<port>]/<path>

so this would make it consistent with a lot of other naming schemes
out there, and imho make
the url more intuitive.


FTP looks like this :   ftp://user:password@host:port/path
NFS looks like this :   nfs://<host>:<port><url-path>
CIFS looks like this :
smb://[[[authdomain;]user@]host[:port][/share[/path][/name]]][?context]

For iSCSI we use  :   iscsi://<server>[:<port>]/<target>/<lun>


(The iscsi syntax was picked explicitely to be consistent with the
de-facto url naming scheme.)


I would argue that this is the de-facto way to create a url for
different protocols,   so it would imho be natural to specify a
glusterfs url in a similar format.


ronnie sahlberg


On Mon, Jul 23, 2012 at 7:16 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
> On Sat, Jul 21, 2012 at 01:59:17PM +0530, Bharata B Rao wrote:
>> Hi,
>>
>> Here is the v2 patchset for supporting GlusterFS protocol from QEMU.
>>
>> This set of patches enables QEMU to boot VM images from gluster volumes.
>> This is achieved by adding gluster as a new block backend driver in QEMU.
>> Its already possible to boot from VM images on gluster volumes, but this
>> patchset provides the ability to boot VM images from gluster volumes by
>> by-passing the FUSE layer in gluster. In case the image is present on the
>> local system, it is possible to even bypass client and server translator and
>> hence the RPC overhead.
>>
>> The major change in this version is to not implement libglusterfs based
>> gluster backend within QEMU but instead use libgfapi. libgfapi library
>> from GlusterFS project provides APIs to access gluster volumes directly.
>> With the use of libgfapi, the specification of gluster backend from QEMU
>> matches more closely with the GlusterFS's way of specifying volumes. We now
>> specify the gluster backed image like this:
>>
>> -drive file=gluster:server@port:volname:image
>>
>> - Here 'gluster' is the protocol.
>> - 'server@port' specifies the server where the volume file specification for
>>   the given volume resides. 'port' is the port number on which gluster
>>   management daemon (glusterd) is listening. This is optional and if not
>>   specified, QEMU will send 0 which will make libgfapi to use the default
>>   port.
>> - 'volname' is the name of the gluster volume which contains the VM image.
>> - 'image' is the path to the actual VM image in the gluster volume.
>
> I don't think we should be using '@' as a field separator here, when ":"
> can do that job just fine. In addition we already have a precendent set
> with the sheepdog driver for using ':' for separating all fields:
>
>   -drive file=sheepdog:example.org:6000:imagename
>
> If you want to allow for port number to be omitted, this can be handled
> thus:
>
>   -drive file=sheepdog:example.org::imagename
>
> which is how -chardev deals with omitted port numbers
>
> Regards,
> Daniel
> --
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-07-24 11:30 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-21  8:29 [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Bharata B Rao
2012-07-21  8:30 ` [Qemu-devel] [RFC PATCH 1/2] qemu: Add a config option for GlusterFS as block backend Bharata B Rao
2012-07-21  8:31 ` [Qemu-devel] [RFC PATCH 2/2] block: gluster " Bharata B Rao
2012-07-22 15:38   ` Stefan Hajnoczi
2012-07-23  8:32     ` Bharata B Rao
2012-07-23  9:06       ` Stefan Hajnoczi
2012-07-21 12:22 ` [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Vijay Bellur
2012-07-21 13:04   ` Bharata B Rao
2012-07-22 14:42 ` Stefan Hajnoczi
2012-07-23  8:50   ` Bharata B Rao
2012-07-23  9:20     ` Stefan Hajnoczi
2012-07-23  9:34       ` ronnie sahlberg
2012-07-23  9:35         ` Stefan Hajnoczi
2012-07-23 14:34       ` Eric Blake
2012-07-24  3:34         ` Bharata B Rao
2012-07-24 10:24       ` Kevin Wolf
2012-07-24 11:30       ` Markus Armbruster
2012-07-23  9:36     ` Vijay Bellur
2012-07-23  9:16 ` Daniel P. Berrange
2012-07-23  9:28   ` ronnie sahlberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).