qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU
@ 2012-06-11 14:18 Bharata B Rao
  2012-06-11 14:19 ` [Qemu-devel] [RFC PATCH 1/3] qemu: Add a config option for GlusterFS as block backend Bharata B Rao
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Bharata B Rao @ 2012-06-11 14:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: Amar Tumballi, Vijay Bellur

Hi,

This set of patches enables QEMU to boot VM images from gluster volumes.
This is achieved by adding gluster as a new block backend driver in QEMU.
It's already possible to boot from VM images on gluster volumes using a
FUSE mount, but this patchset provides the ability to boot VM images from
gluster volumes by bypassing the FUSE layer in gluster. In case the image
is present on the local system, it is even possible to bypass the client
and server translators and hence avoid the RPC overhead.

QEMU with gluster backend support takes the volume file on the command line
and then links with the libglusterfs library to perform IO on the image on
the gluster volume. block/gluster-helpers.c has the bare minimum gluster code
that is necessary for QEMU to boot and work with an image on a gluster
volume. I have implemented routines like gluster_create, gluster_open,
gluster_aio_readv etc. which will eventually become unnecessary once we have
the equivalent routines in libglusterfsclient working. While I have this
implementation here, we are also working actively on resurrecting
libglusterfsclient and using QEMU with it. In addition to the POSIX routines,
block/gluster-helpers.c has some elaborate lookup code which will also become
redundant with libglusterfsclient.
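
As a rough sketch of how these helpers fit together (interfaces as in
patch 2 of this series; error handling omitted):

    glusterfs_ctx_t *ctx;
    gluster_file_t fd;
    struct stat st;

    ctx = gluster_init("c-qemu.vol");      /* load volfile, activate graph */
    fd = gluster_open("/F16", O_RDWR, 0);  /* resolve path and open image */
    gluster_fstat(fd, &st);                /* e.g. to get the image length */
    gluster_close(fd);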

The patches are experimental in nature and I have only verified that I can
boot an image from a gluster volume using these patches in fuse-bypass and
rpc-bypass modes. I haven't tested with a full-blown version of the volume
file (as generated by the gluster CLI), but have always used only
hand-crafted volume files with just the posix translator in them.

How to use this patchset
========================
1. Compiling GlusterFS

- Get GlusterFS source from git://git.gluster.com/glusterfs.git
- Compile and install
  # ./autogen.sh; ./configure; make; make install
- Copy a few required header files and libraries
  # mkdir /usr/local/include/glusterfs/
  # cp glusterfs/libglusterfs/src/*.h /usr/local/include/glusterfs/
  # cp glusterfs/config.h /usr/local/include/glusterfs/
  # cp glusterfs/contrib/uuid/uuid.h /usr/local/include/glusterfs/

2. Compiling QEMU

- Get QEMU sources
- Apply the patches from this patchset.
- Configure
  # ./configure --disable-werror --target-list=x86_64-softmmu --enable-glusterfs --enable-uuid
- make; make install

Note: I had to resort to --disable-werror mainly to tide over the warnings
in block/gluster-helpers.c. I didn't spend too much effort on cleaning this
up since this code will be gone once we have a working libglusterfsclient.

3. Starting GlusterFS server

# glusterfsd -f s-qemu.vol

# cat s-qemu.vol
volume vm
  type storage/posix
  option directory /vm
end-volume

volume server
  type protocol/server
  subvolumes vm
  option transport-type tcp
  option auth.addr.vm.allow *
end-volume

Here /vm is the directory exported by the server. Ensure that this
directory is present before the GlusterFS server is started.

4. Creating VM image

# qemu-img create -f gluster gluster:c-qemu.vol:/F16 5G

# cat c-qemu.vol
volume client
  type protocol/client
  option transport-type tcp
  option remote-host bharata
  option remote-subvolume vm
end-volume

5. Install a distro (say Fedora 16) on the VM image

# qemu-system-x86_64 --enable-kvm -smp 4 -m 1024 -drive file=gluster:c-qemu.vol:/F16,format=gluster -cdrom Fedora-16-x86_64-DVD.iso

After this, follow the normal F16 installation.

Thereafter, the following QEMU command can be used to start
the VM directly.

6. Start the VM (Fuse-bypass)

# qemu-system-x86_64 --enable-kvm --nographic -smp 4 -m 1024 -drive file=gluster:c-qemu.vol:/F16,format=gluster

6a. Booting VM in RPC-bypass mode.

# cat c-qemu-rpcbypass.vol 
volume vm
  type storage/posix
  option directory /vm
end-volume

# qemu-system-x86_64 --enable-kvm --nographic -smp 4 -m 1024 -drive file=gluster:c-qemu-rpcbypass.vol:/F16,format=gluster

Note that in this case, it's not necessary to run a gluster server.

Tests
=====
I have done some initial tests using fio. Here are the details:

Environment
-----------
Dual core x86_64 laptop
QEMU (f8687bab919672ccd)
GlusterFS (c40b73fc453caf12) 
Guest: Fedora 16 (kernel 3.1.0-7.fc16.x86_64)
Host: Fedora 16 (kernel 3.4)
fio-HEAD-47ea504

fio jobfile
-----------
# cat aio-read-direct-seq 
; Read 4 files with aio at different depths
[global]
ioengine=libaio
direct=1
rw=read
bs=128k
size=512m
directory=/data1

[file1]
iodepth=4

[file2]
iodepth=32

[file3]
iodepth=8

[file4]
iodepth=16

Base
----
QEMU: qemu-system-x86_64 --enable-kvm --nographic -m 1024 -smp 4 -drive file=/vm/dir1/F16,cache=none

Fuse mount
----------
Server: glusterfsd -f s-qemu.vol
Client: glusterfs -f c-qemu.vol /mnt
QEMU: qemu-system-x86_64 --enable-kvm --nographic -m 1024 -smp 4 -drive file=/mnt/dir1/F16,cache=none

Fuse bypass
-----------
Server: glusterfsd -f s-qemu.vol
QEMU: qemu-system-x86_64 --enable-kvm --nographic -m 1024 -smp 4 -drive file=gluster:/c-qemu.vol:/dir1/F16,format=gluster,cache=none

RPC bypass
----------
QEMU: qemu-system-x86_64 --enable-kvm --nographic -m 1024 -smp 4 -drive file=gluster:/c-qemu-rpcbypass.vol:/dir1/F16,format=gluster,cache=none

Numbers (aggrb, minb and maxb in kB/s. mint and maxt in msec)
-------
		aggrb	minb	maxb	mint	maxt
Base		72916	18229	18945	27673	28761
Fuse mount	8211	2052	3094	169433	255396
Fuse bypass	66591	16647	17806	29444	31493
RPC bypass	70940	17735	18782	27914	29562

Note that these are just indicative numbers and I haven't really tuned
QEMU, GlusterFS or fio to achieve the best results. However, this test
shows that the Fuse mount case is not ideal, and that Fuse bypass and
RPC bypass help.

Regards,
Bharata.


* [Qemu-devel] [RFC PATCH 1/3] qemu: Add a config option for GlusterFS as block backend
  2012-06-11 14:18 [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU Bharata B Rao
@ 2012-06-11 14:19 ` Bharata B Rao
  2012-06-11 14:20 ` [Qemu-devel] [RFC PATCH 2/3] block: GlusterFS helpers to interface with libglusterfs Bharata B Rao
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Bharata B Rao @ 2012-06-11 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Amar Tumballi, Vijay Bellur

qemu: Add a config option for GlusterFS as block backend

From: Bharata B Rao <bharata@linux.vnet.ibm.com>
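
With this option, GlusterFS support is enabled at build time as shown in
the cover letter:

  ./configure --enable-glusterfs --enable-uuid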

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---

 configure |   36 ++++++++++++++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)


diff --git a/configure b/configure
index b55a792..dc89592 100755
--- a/configure
+++ b/configure
@@ -824,6 +824,10 @@ for opt do
   ;;
   --disable-guest-agent) guest_agent="no"
   ;;
+  --disable-glusterfs) glusterfs="no"
+  ;;
+  --enable-glusterfs) glusterfs="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1110,6 +1114,8 @@ echo "  --disable-guest-agent    disable building of the QEMU Guest Agent"
 echo "  --enable-guest-agent     enable building of the QEMU Guest Agent"
 echo "  --with-coroutine=BACKEND coroutine backend. Supported options:"
 echo "                           gthread, ucontext, sigaltstack, windows"
+echo "  --enable-glusterfs       enable GlusterFS backend"
+echo "  --disable-glusterfs      disable GlusterFS backend"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2240,6 +2246,31 @@ EOF
   fi
 fi
 
+##########################################
+# glusterfs probe
+if test "$glusterfs" != "no" ; then
+  cat > $TMPC <<EOF
+#include <stdio.h>
+#include <glusterfs/xlator.h>
+int main(void) {
+    xlator_t xlator;
+    xlator_init(&xlator);
+    return 0;
+}
+EOF
+  glusterfs_libs="-lglusterfs"
+  if compile_prog "" "$glusterfs_libs" ; then
+    glusterfs=yes
+    libs_tools="$glusterfs_libs $libs_tools"
+    libs_softmmu="$glusterfs_libs $libs_softmmu"
+  else
+    if test "$glusterfs" = "yes" ; then
+      feature_not_found "GlusterFS backend support"
+    fi
+    glusterfs=no
+  fi
+fi
+
 #
 # Check for xxxat() functions when we are building linux-user
 # emulator.  This is done because older glibc versions don't
@@ -3014,6 +3045,7 @@ echo "OpenGL support    $opengl"
 echo "libiscsi support  $libiscsi"
 echo "build guest agent $guest_agent"
 echo "coroutine backend $coroutine_backend"
+echo "GlusterFS support $glusterfs"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3342,6 +3374,10 @@ if test "$linux_magic_h" = "yes" ; then
   echo "CONFIG_LINUX_MAGIC_H=y" >> $config_host_mak
 fi
 
+if test "$glusterfs" = "yes" ; then
+  echo "CONFIG_GLUSTERFS=y" >> $config_host_mak
+fi
+
 # USB host support
 case "$usb" in
 linux)


* [Qemu-devel] [RFC PATCH 2/3] block: GlusterFS helpers to interface with libglusterfs
  2012-06-11 14:18 [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU Bharata B Rao
  2012-06-11 14:19 ` [Qemu-devel] [RFC PATCH 1/3] qemu: Add a config option for GlusterFS as block backend Bharata B Rao
@ 2012-06-11 14:20 ` Bharata B Rao
  2012-06-18 17:35   ` Stefan Hajnoczi
  2012-06-11 14:21 ` [Qemu-devel] [RFC PATCH 3/3] block: gluster as block backend Bharata B Rao
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Bharata B Rao @ 2012-06-11 14:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: Amar Tumballi, Vijay Bellur

block: GlusterFS helpers to interface with libglusterfs

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

This patch does two things:

- Adds GlusterFS-specific init routines that enable QEMU to load the
  volume file and the necessary translators.
- Implements routines like gluster_open(), gluster_read(), gluster_write()
  and gluster_close() that will be used by the block layer of QEMU to do
  IO on VM images exported by a GlusterFS server.

When libglusterfsclient is resurrected, this entire patch becomes redundant.
Gluster init routines and other POSIX calls are present in libglusterfsclient.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---

 Makefile.objs           |    1 
 block/gluster-helpers.c | 1018 +++++++++++++++++++++++++++++++++++++++++++++++
 block/gluster-helpers.h |   40 ++
 3 files changed, 1059 insertions(+), 0 deletions(-)
 create mode 100644 block/gluster-helpers.c
 create mode 100644 block/gluster-helpers.h


diff --git a/Makefile.objs b/Makefile.objs
index 70c5c79..25190ba 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -59,6 +59,7 @@ block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_LIBISCSI) += iscsi.o
 block-nested-$(CONFIG_CURL) += curl.o
 block-nested-$(CONFIG_RBD) += rbd.o
+block-nested-$(CONFIG_GLUSTERFS) += gluster-helpers.o
 
 block-obj-y +=  $(addprefix block/, $(block-nested-y))
 
diff --git a/block/gluster-helpers.c b/block/gluster-helpers.c
new file mode 100644
index 0000000..cae3fdf
--- /dev/null
+++ b/block/gluster-helpers.c
@@ -0,0 +1,1018 @@
+/*
+ * Helper routines for GlusterFS backend
+ * (Based on libglusterfsclient)
+ *
+ * Copyright (C) 2012 Bharata B Rao <bharata@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#include <stdio.h>
+#include "gluster-helpers.h"
+static glusterfs_ctx_t *gctx;
+
+/********** Translator helpers ***********/
+typedef struct {
+    pthread_mutex_t lock;
+    pthread_cond_t reply_cond;
+    call_stub_t *reply_stub;
+    char complete;
+    loc_t *loc;
+    fd_t *fd;
+    gluster_aiocb_t *gaiocb;
+    struct iobref *iobref;
+    struct iobuf *iob;
+} gluster_local_t;
+
+static call_frame_t *get_call_frame_for_req(glusterfs_ctx_t *ctx)
+{
+    call_pool_t *pool = ctx->pool;
+    xlator_t *this = (xlator_t *)ctx->active->top;
+    call_frame_t *frame;
+
+    frame = create_frame(this, pool);
+    if (!frame) {
+        return NULL;
+    }
+
+    frame->root->uid = geteuid();
+    frame->root->gid = getegid();
+    return frame;
+}
+
+#define GLUSTER_FOP(ctx, stub, op, local, args ...)			\
+    do {								\
+        call_frame_t *frame = get_call_frame_for_req(ctx);		\
+        xlator_t *xl = (xlator_t *)ctx->active->top;			\
+        frame->local = local;						\
+        frame->root->state = gctx;					\
+        pthread_cond_init(&local->reply_cond, NULL);			\
+        pthread_mutex_init(&local->lock, NULL);				\
+        STACK_WIND(frame, gluster_##op##_cbk, xl, xl->fops->op, args);	\
+        pthread_mutex_lock(&local->lock);				\
+        {								\
+       	    while (!local->complete) {					\
+       	        pthread_cond_wait(&local->reply_cond, &local->lock);	\
+       	    }								\
+        }								\
+        pthread_mutex_unlock(&local->lock);				\
+        stub = local->reply_stub;					\
+        FREE(frame->local);						\
+        frame->local = NULL;						\
+        STACK_DESTROY(frame->root);					\
+    } while (0)								\
+
+#define GLUSTER_FOP_NO_WAIT(ctx, stub, op, local, args ...)		\
+    do {								\
+        call_frame_t *frame = get_call_frame_for_req(ctx);		\
+        xlator_t *xl = (xlator_t *)ctx->active->top;			\
+        frame->local = local;						\
+        frame->root->state = gctx;					\
+        STACK_WIND(frame, gluster_##op##_cbk, xl, xl->fops->op, args);	\
+        stub = local->reply_stub;					\
+    } while (0)								\
+
+#define GLUSTER_REPLY_NOTIFY(local)					\
+    do {								\
+        pthread_mutex_lock(&local->lock);				\
+        local->complete = 1;						\
+        pthread_cond_broadcast(&local->reply_cond);			\
+        pthread_mutex_unlock(&local->lock);				\
+    } while (0)								\
+
+/********** Name resolution ***********/
+static int32_t gluster_lookup_cbk(call_frame_t *frame, void *cookie,
+        xlator_t *this, int op_ret, int op_errno, inode_t *inode,
+        struct iatt *buf, dict_t *dict, struct iatt *postparent)
+{
+    gluster_local_t *local = frame->local;
+    inode_t *link_inode;
+
+    if (op_ret) {
+        goto out;
+    }
+
+    /* TODO: Do this only for non-root inodes */
+    link_inode = inode_link(inode, local->loc->parent, local->loc->name, buf);
+    local->loc->inode = link_inode;
+out:
+    local->reply_stub = fop_lookup_cbk_stub(frame, NULL, op_ret, op_errno,
+            inode, buf, dict, postparent);
+
+    GLUSTER_REPLY_NOTIFY(local);
+    return 0;
+}
+
+/*
+ * Lookup @path under @parent. Return with inode reference held.
+ */
+static inode_t *gluster_lookup(inode_t *parent, inode_table_t *itable,
+        char *path)
+{
+    inode_t *inode = NULL;
+    loc_t loc;
+    gluster_local_t *local;
+    call_stub_t *stub;
+    int op_ret;
+
+    local = CALLOC(1, sizeof(*local));
+    if (!local) {
+        return NULL;
+    }
+
+    loc.path = gf_strdup(path);
+    loc.name = basename(path);
+    loc.inode = inode_new(itable);
+    loc.parent = parent; /* inode_ref ? */
+    uuid_copy(loc.pargfid, parent->gfid);
+    local->loc = &loc;
+    GLUSTER_FOP(gctx, stub, lookup, local, &loc, NULL);
+
+    op_ret = stub->args.lookup_cbk.op_ret;
+    errno = stub->args.lookup_cbk.op_errno;
+
+    if (op_ret == -1) {
+        goto out;
+    }
+
+    inode = stub->args.lookup_cbk.inode;
+    if (inode != loc.inode) {
+        inode_unref(loc.inode);
+        (void)inode_ref(inode);
+    } else {
+        inode = loc.inode;
+    }
+out:
+    /* inode_unref(parent); */
+    call_stub_destroy(stub);
+    GF_FREE((void *)loc.path);
+    return inode;
+}
+
+static int gluster_path_resolve_hard(loc_t *loc, int lookup_basename)
+{
+    inode_t *parent, *inode, *curr;
+    inode_table_t *itable;
+    char *pathname, *basename, *token, *next_token, *strtokptr;
+    int ret = 0;
+
+    basename = gf_strdup(loc->path);
+    if (!basename) {
+        ret = -1;
+        goto out;
+    }
+
+    if (!lookup_basename) {
+        pathname = dirname(basename);
+    } else {
+        pathname = basename;
+    }
+
+    itable = ((xlator_t *)gctx->active->top)->itable;
+    parent = inode_from_path(itable, "/");
+    if (!parent) {
+        parent = gluster_lookup(parent, itable, "/");
+        if (!parent) {
+            ret = -1;
+            goto out;
+        }
+    }
+
+    token = strtok_r(pathname, "/", &strtokptr);
+    if (!token) {
+        /* root inode */
+        loc->inode = parent;
+        loc->parent = NULL;
+        ret = 0;
+        goto out;
+    }
+
+    while (token) {
+        curr = inode_grep(itable, parent, token);
+        if (!curr) {
+            loc->parent = parent;
+            inode = gluster_lookup(parent, itable, token);
+            if (!inode) {
+                ret = -1;
+                goto out;
+            }
+            loc->inode = inode;
+            curr = inode;
+        }
+        next_token = strtok_r(NULL, "/", &strtokptr);
+        if (next_token) {
+            inode_unref(parent);
+            parent = curr;
+            curr = NULL;
+        } else {
+            inode = curr;
+            loc->parent = parent;
+            loc->inode = inode;
+        }
+        token = next_token;
+    }
+out:
+    if (!ret && !lookup_basename) {
+        if (loc->parent) {
+            inode_unref(loc->parent);
+        }
+        if (loc->inode) {
+            loc->parent = loc->inode;
+            loc->inode = NULL;
+        }
+    }
+    if (basename) {
+        GF_FREE(basename);
+    }
+    return ret;
+}
+
+/*
+ * Resolve loc.path to loc->parent and loc->inode.
+ */
+static int gluster_path_resolve(loc_t *loc, int lookup_basename)
+{
+    inode_t *parent, *inode = NULL;
+    inode_table_t *itable;
+    char *pathname, *dir;
+    int ret = 0;
+
+    pathname = gf_strdup(loc->path);
+    if (!pathname) {
+        ret = -1;
+        goto out;
+    }
+
+    dir = dirname(pathname);
+    itable = ((xlator_t *)gctx->active->top)->itable;
+
+    /* Check if the inode exists */
+    parent = inode_from_path(itable, dir);
+    if (parent) {
+        loc->parent = parent;
+        if (!lookup_basename) {
+            /* Got parent */
+            ret = 0;
+            goto out;
+        } else {
+            inode = inode_from_path(itable, loc->path);
+            if (inode) {
+                /* Got the complete path */
+                loc->inode = inode;
+                ret = 0;
+                goto out;
+            }
+        }
+    }
+    
+    if (loc->parent) {
+        inode_unref(loc->parent);
+        loc->parent = NULL;
+    } else if (inode) {
+        /* shouldn't happen */
+        inode_unref(inode);
+    }
+    ret = gluster_path_resolve_hard(loc, lookup_basename);
+out:
+    if (pathname) {
+        GF_FREE(pathname);
+    }
+    return ret;
+}
+
+/********** POSIX interfaces ***********/
+static int gluster_readv_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
+        int32_t op_ret, int32_t op_errno, struct iovec *vector,
+        int32_t count, struct iatt *stbuf, struct iobref *iobref)
+{
+    gluster_local_t *local = frame->local;
+    gluster_aiocb_t *gaiocb = local->gaiocb;
+    call_stub_t *stub = NULL;
+
+    local->reply_stub = fop_readv_cbk_stub(frame, NULL, op_ret,
+            op_errno, vector, count, stbuf, iobref, NULL);
+
+    stub = local->reply_stub;
+    if (!local->gaiocb) {
+        GLUSTER_REPLY_NOTIFY(local);
+        goto out;
+    } else {
+        op_ret = stub->args.readv_cbk.op_ret;
+        errno = stub->args.readv_cbk.op_errno;
+        count = stub->args.readv_cbk.count;
+        vector = stub->args.readv_cbk.vector;
+
+        if (op_ret > 0) {
+            int i = 0, op_ret = 0;
+            size_t size = gaiocb->size;
+            char *buf = gaiocb->buf;
+
+            while (size && (i < count)) {
+                int len = (size < vector[i].iov_len) ?  size: vector[i].iov_len;
+                memcpy(buf, vector[i++].iov_base, len);
+                buf += len;
+                size -= len;
+                op_ret += len;
+            }
+        }
+
+        gaiocb->ret = op_ret;
+        if (gaiocb->completion_fn) {
+	    gaiocb->completion_fn(gaiocb);
+        }
+
+        FREE(local);
+        frame->local = NULL;
+        STACK_DESTROY(frame->root);
+        call_stub_destroy(stub);
+    }
+out:
+    return op_ret;
+}
+
+int gluster_aio_readv(fd_t *fd, gluster_aiocb_t *gaiocb)
+{
+    int op_ret = 0;
+    call_stub_t *stub;
+    gluster_local_t *local;
+
+    local = CALLOC(1, sizeof(*local));
+    if (!local) {
+        return -1;
+    }
+
+    local->gaiocb = gaiocb;
+    GLUSTER_FOP_NO_WAIT(gctx, stub, readv, local, fd, gaiocb->size,
+        gaiocb->offset, 0, NULL);
+
+    return op_ret;
+}
+
+static int gluster_writev_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
+        int32_t op_ret, int32_t op_errno,
+        struct iatt *prebuf, struct iatt *postbuf)
+{
+    gluster_local_t *local = frame->local;
+    gluster_aiocb_t *gaiocb = local->gaiocb;
+    call_stub_t *stub = NULL;
+
+    local->reply_stub = fop_writev_cbk_stub(frame, NULL, op_ret, op_errno,
+        prebuf, postbuf, NULL);
+    stub = local->reply_stub;
+
+    if (!local->gaiocb) {
+        GLUSTER_REPLY_NOTIFY(local);
+        goto out;
+    } else {
+        gaiocb->ret = op_ret = stub->args.writev_cbk.op_ret;
+        errno = stub->args.writev_cbk.op_errno;
+
+        if (gaiocb->completion_fn) {
+            gaiocb->completion_fn(gaiocb);
+        }
+        if (local->iob) {
+            iobuf_unref(local->iob);
+        }
+        if (local->iobref) {
+            iobref_unref(local->iobref);
+        }
+
+        FREE(local);
+        frame->local = NULL;
+        STACK_DESTROY(frame->root);
+        call_stub_destroy(stub);
+    }
+out:
+        return op_ret;
+}
+
+int gluster_aio_writev(fd_t *fd, gluster_aiocb_t *gaiocb)
+{
+    int op_ret = 0;
+    struct iobref *iobref = NULL;
+    struct iobuf *iob = NULL;
+    struct iovec iov;
+    gluster_local_t *local;
+    call_stub_t *stub = NULL;
+
+    local = CALLOC(1, sizeof(*local));
+    if (!local) {
+        goto out;
+    }
+
+    local->gaiocb = gaiocb;
+    iobref = local->iobref = iobref_new();
+    if (!iobref) {
+        goto out;
+    }
+
+    iob = local->iob = iobuf_get(gctx->iobuf_pool);
+    if (!iob) {
+        goto out;
+    }
+
+    memcpy(iob->ptr, gaiocb->buf, gaiocb->size);
+    iobref_add(iobref, iob);
+    iov.iov_base = local->iob->ptr;
+    iov.iov_len = gaiocb->size;
+
+    GLUSTER_FOP_NO_WAIT(gctx, stub, writev, local, fd, &iov, 1,
+        gaiocb->offset, 0, iobref, NULL);
+out:
+    return op_ret;
+}
+
+static void gluster_loc_wipe(loc_t *loc)
+{
+    if (loc->path) {
+        GF_FREE((void *)loc->path);
+    }
+    if (loc->parent) {
+        inode_unref(loc->parent);
+    }
+    if (loc->inode) {
+        inode_unref(loc->inode);
+    }
+    loc->path = loc->name = NULL;
+}
+
+static int gluster_create_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
+        int32_t op_ret, int32_t op_errno, fd_t *fd, inode_t *inode,
+        struct iatt *buf, struct iatt *preparent, struct iatt *postparent)
+{
+    gluster_local_t *local = frame->local;
+
+    local->reply_stub = fop_create_cbk_stub(frame, NULL, op_ret, op_errno,
+            fd, inode, buf, preparent, postparent, NULL);
+    GLUSTER_REPLY_NOTIFY(local);
+    return 0;
+}
+
+static int __gluster_do_create(loc_t *loc, int flags, int mode, fd_t *fd)
+{
+    call_stub_t *stub = NULL;
+    gluster_local_t *local;
+    dict_t *dict = NULL;
+    int ret = -1;
+    inode_t *inode;
+    uuid_t gfid;
+
+    local = CALLOC(1, sizeof(*local));
+    if (!local) {
+        goto out;
+    }
+
+    dict = dict_new();
+    if (!dict) {
+        goto out;
+    }
+
+    uuid_generate(gfid);
+    ret = dict_set_static_bin(dict, "gfid-req", gfid, 16);
+    if (ret < 0) {
+        goto out;
+    }
+
+    GLUSTER_FOP(gctx, stub, create, local, loc, flags, mode, 0, fd, dict);
+    ret = stub->args.create_cbk.op_ret;
+    errno = stub->args.create_cbk.op_errno;
+    if (ret == -1) {
+        goto out;
+    }
+
+    inode = stub->args.create_cbk.inode;
+    inode_link(inode, loc->parent, loc->name, &stub->args.create_cbk.buf);
+    inode_lookup(inode);
+out:
+    call_stub_destroy(stub);
+    if (dict) {
+        dict_unref(dict);
+    }
+    return ret;
+}
+
+static int gluster_open_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
+        int32_t op_ret, int32_t op_errno, fd_t *fd)
+{
+    gluster_local_t *local = frame->local;
+
+    local->reply_stub = fop_open_cbk_stub(frame, NULL, op_ret, op_errno, fd,
+        NULL);
+    GLUSTER_REPLY_NOTIFY(local);
+    return 0;
+}
+
+static int __gluster_do_open(loc_t *loc, int flags, fd_t *fd)
+{
+    call_stub_t *stub = NULL;
+    gluster_local_t *local;
+    int ret = -1;
+
+    local = CALLOC(1, sizeof(*local));
+    if (!local) {
+        goto out;
+    }
+
+    GLUSTER_FOP(gctx, stub, open, local, loc, flags, fd, 0);
+    ret = stub->args.open_cbk.op_ret;
+    errno = stub->args.open_cbk.op_errno;
+out:
+    call_stub_destroy(stub);
+    return ret;
+}
+
+static gluster_file_t gluster_do_open(const char *path, int flags, int mode)
+{
+    int ret = 0;
+    loc_t loc = {0, };
+    fd_t *fd = NULL;
+    int create = ((flags & O_CREAT) == O_CREAT) ? 1: 0;
+    char *pathname = NULL;
+
+    loc.path = gf_strdup(path);
+    ret = gluster_path_resolve(&loc, 1);
+    if (ret == -1 && !create) {
+        goto out;
+    }
+
+    if (!ret && create && ((flags & O_EXCL) == O_EXCL)) {
+        /* EEXIST */
+        ret = -1;
+        goto out;
+    }
+
+    if (ret == -1 && create) {
+        inode_table_t *itable = ((xlator_t *)gctx->active->top)->itable;
+
+        gluster_loc_wipe(&loc);
+
+        /* lookup parent */
+        loc.path = gf_strdup(path);
+        ret = gluster_path_resolve(&loc, 0);
+        if (ret == -1) {
+            goto out;
+        }
+
+        /* alloc new inode for child */
+        loc.inode = inode_new(itable);
+    }
+		
+    pathname = gf_strdup(path);
+    loc.name = basename(pathname);
+
+    fd = fd_create(loc.inode, getpid());
+    if (!fd) {
+        goto out;
+    }
+    fd->flags = flags;
+
+    if (!create && loc.inode && !uuid_is_null(loc.inode->gfid)) {
+        uuid_copy(loc.gfid, loc.inode->gfid);
+    }
+    if (loc.parent && !uuid_is_null(loc.parent->gfid)) {
+        uuid_copy(loc.pargfid, loc.parent->gfid);
+    }
+
+    if (create) {
+        ret = __gluster_do_create(&loc, flags, mode, fd);
+    } else {
+        ret = __gluster_do_open(&loc, flags, fd);
+    }
+
+    if (ret == -1) {
+        fd_unref(fd);
+        fd = NULL;
+    }
+out:
+    gluster_loc_wipe(&loc);
+    if (pathname)
+        GF_FREE(pathname);
+    return (gluster_file_t)fd;
+}
+
+gluster_file_t gluster_open(const char *path, int flags, int mode)
+{
+    return gluster_do_open(path, flags, mode);
+}
+
+gluster_file_t gluster_creat(const char *path, int mode)
+{
+    return gluster_do_open(path, (O_CREAT | O_WRONLY | O_TRUNC), mode);
+}
+
+static int gluster_fstat_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
+        int32_t op_ret, int32_t op_errno, struct iatt *buf)
+{
+    gluster_local_t *local = frame->local;
+
+    local->reply_stub = fop_fstat_cbk_stub(frame, NULL, op_ret, op_errno, buf,
+        NULL);
+    GLUSTER_REPLY_NOTIFY(local);
+    return 0;
+}
+
+int gluster_fstat(gluster_file_t fd, struct stat *buf)
+{
+    int op_ret = -1;
+    gluster_local_t *local;
+    call_stub_t *stub = NULL;
+
+    local = CALLOC(1, sizeof(*local));
+    if (!local) {
+        goto out;
+    }
+
+    GLUSTER_FOP(gctx, stub, fstat, local, fd, NULL);
+    op_ret = stub->args.fstat_cbk.op_ret;
+    errno = stub->args.fstat_cbk.op_errno;
+
+    if (!op_ret) {
+        iatt_to_stat(&stub->args.fstat_cbk.buf, buf);
+    }
+out:
+    call_stub_destroy(stub);
+    return op_ret;
+}
+
+static int gluster_flush_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
+        int32_t op_ret, int32_t op_errno)
+{
+    gluster_local_t *local = frame->local;
+
+    local->reply_stub = fop_flush_cbk_stub(frame, NULL, op_ret, op_errno, NULL);
+    GLUSTER_REPLY_NOTIFY(local);
+    return 0;
+}
+
+int gluster_close(gluster_file_t fd)
+{
+    int op_ret = -1;
+    gluster_local_t *local;
+    call_stub_t *stub = NULL;
+
+    if (!fd) {
+        goto out;
+    }
+
+    local = CALLOC(1, sizeof(*local));
+    if (!local) {
+        goto out;
+    }
+
+    GLUSTER_FOP(gctx, stub, flush, local, fd, NULL);
+    op_ret = stub->args.flush_cbk.op_ret;
+    errno = stub->args.flush_cbk.op_errno;
+
+out:
+    call_stub_destroy(stub);
+    return op_ret;
+}
+
+static int gluster_ftruncate_cbk(call_frame_t *frame, void *cookie,
+        xlator_t *xlator, int32_t op_ret, int32_t op_errno,
+        struct iatt *prebuf, struct iatt *postbuf)
+{
+    gluster_local_t *local = frame->local;
+
+    local->reply_stub = fop_ftruncate_cbk_stub(frame, NULL, op_ret, op_errno,
+        prebuf, postbuf, NULL);
+    GLUSTER_REPLY_NOTIFY(local);
+    return 0;
+}
+
+int gluster_ftruncate(gluster_file_t fd, off_t length)
+{
+    int op_ret = -1;
+    gluster_local_t *local;
+    call_stub_t *stub;
+
+    if (!fd) {
+        goto out;
+    }
+
+    local = CALLOC(1, sizeof(*local));
+    if (!local) {
+        goto out;
+    }
+
+    GLUSTER_FOP(gctx, stub, ftruncate, local, fd, length, NULL);
+    op_ret = stub->args.ftruncate_cbk.op_ret;
+    errno = stub->args.ftruncate_cbk.op_errno;
+
+out:
+    call_stub_destroy(stub);
+    return op_ret;
+}
+
+/********** Glusterfs initialization ******/
+struct xlator_fops gluster_master_fops = { };
+struct xlator_cbks gluster_master_cbks = { };
+
+static int gluster_master_init(xlator_t *this)
+{
+    return 0;
+}
+
+static void gluster_master_fini(xlator_t *this)
+{
+    return;
+}
+
+static int gluster_master_notify(xlator_t *this, int32_t event, void *data, ...)
+{
+    glusterfs_graph_t *graph = data;
+    inode_table_t *itable;
+
+    switch (event) {
+    case GF_EVENT_GRAPH_NEW:
+        /* This should ideally be under GF_EVENT_CHILD_UP */
+        if (!graph) {
+            break;
+        }
+        graph->used = 1;
+        itable = inode_table_new(0, graph->top);
+        if (!itable) {
+            return -1;
+        }
+        ((xlator_t *)graph->top)->itable = itable;
+        break;
+    case GF_EVENT_CHILD_UP:
+    case GF_EVENT_CHILD_DOWN:
+    case GF_EVENT_CHILD_CONNECTING:
+        break;
+    case GF_EVENT_AUTH_FAILED:
+        gluster_master_fini(this);
+        break;
+    default:
+        break;
+    }
+    return 0;
+}
+
+/*
+ * Master translator that's equivalent of FUSE
+ * Can't do xlator_dyn_load since we don't have .so for this - Need
+ * to populate the master xlator manually.
+ */
+static int gluster_master_setup(glusterfs_ctx_t *ctx)
+{
+    xlator_t *master;
+
+    master = CALLOC(1, sizeof(*master));
+    if (!master) {
+        goto out;
+    }
+
+    master->type = gf_strdup("gluster/master");
+    master->name = gf_strdup("gluster-master");
+    master->init = gluster_master_init;
+    master->fops = &gluster_master_fops;
+    master->cbks = &gluster_master_cbks;
+    master->notify = gluster_master_notify;
+    master->fini = gluster_master_fini;
+    INIT_LIST_HEAD(&master->volume_options);
+
+    master->ctx = ctx;
+    if (xlator_init(master)) {
+        goto out;
+    }
+    ctx->master = master;
+    return 0;
+out:
+    if (master) {
+        FREE(master);
+    }
+    return -1;
+}
+
+static void gluster_ctx_destroy(void)
+{
+    call_pool_t *pool;
+
+    if (!gctx) {
+        return;
+    }
+    if (gctx->iobuf_pool) {
+        iobuf_pool_destroy(gctx->iobuf_pool);
+    }
+    if (gctx->dict_pool) {
+        mem_pool_destroy(gctx->dict_pool);
+    }
+    if (gctx->dict_pair_pool) {
+        mem_pool_destroy(gctx->dict_pair_pool);
+    }
+    if (gctx->dict_data_pool) {
+        mem_pool_destroy(gctx->dict_data_pool);
+    }
+    if (gctx->stub_mem_pool) {
+        mem_pool_destroy(gctx->stub_mem_pool);
+    }
+
+    pool = gctx->pool;
+    if (pool) {
+        if (pool->frame_mem_pool) {
+            mem_pool_destroy(pool->frame_mem_pool);
+        }
+        if (pool->stack_mem_pool) {
+            mem_pool_destroy(pool->stack_mem_pool);
+        }
+        GF_FREE(pool);
+    }
+}
+
+static int gluster_ctx_init(void)
+{
+    int ret = 0;
+    call_pool_t *pool = NULL;
+    cmd_args_t *cmd_args = NULL;
+
+    gctx = CALLOC(1, sizeof(*gctx));
+    if (!gctx) {
+        ret = -1;
+        goto out;
+    }
+
+    INIT_LIST_HEAD(&gctx->graphs);
+    INIT_LIST_HEAD(&gctx->mempool_list);
+    ret = pthread_mutex_init(&gctx->lock, NULL);
+    if (ret) {
+        goto out;
+    }
+
+    THIS->ctx = gctx;
+
+    ret = -1;
+    gctx->page_size = 128 * GF_UNIT_KB;
+    gctx->iobuf_pool = iobuf_pool_new();
+    if (!gctx->iobuf_pool) {
+        goto out;
+    }
+
+    gctx->event_pool = event_pool_new(16384);
+    if (!gctx->event_pool) {
+        goto out;
+    }
+
+    gctx->dict_pool = mem_pool_new(dict_t, 1024);
+    if (!gctx->dict_pool) {
+        goto out;
+    }
+
+    gctx->dict_pair_pool = mem_pool_new(data_pair_t, 16*GF_UNIT_KB);
+    if (!gctx->dict_pair_pool) {
+        goto out;
+    }
+
+    gctx->dict_data_pool = mem_pool_new(data_t, 8*GF_UNIT_KB);
+    if (!gctx->dict_data_pool) {
+        goto out;
+    }
+
+    pool = CALLOC(1, sizeof(call_pool_t));
+    if (!pool) {
+        goto out;
+    }
+
+    INIT_LIST_HEAD(&pool->all_frames);
+    LOCK_INIT(&pool->lock);
+    gctx->pool = pool;
+
+    pool->frame_mem_pool = mem_pool_new(call_frame_t, 4096);
+    if (!pool->frame_mem_pool) {
+        goto out;
+    }
+
+    pool->stack_mem_pool = mem_pool_new(call_stack_t, 1024);
+    if (!pool->stack_mem_pool) {
+        goto out;
+    }
+
+    gctx->stub_mem_pool = mem_pool_new(call_stub_t, 1024);
+    if (!gctx->stub_mem_pool) {
+        goto out;
+    }
+
+    cmd_args = &gctx->cmd_args;
+    INIT_LIST_HEAD(&cmd_args->xlator_options);
+    return 0;
+
+out:
+    gluster_ctx_destroy();
+    return ret;
+}
+
+static int gluster_graph_init(char *volfile)
+{
+    FILE *fp = NULL;
+    glusterfs_graph_t *graph = NULL;
+    int ret = -1;
+
+    fp = fopen(volfile, "r");
+    if (!fp) {
+        ret = -errno;
+        goto out;
+    }
+
+    graph = glusterfs_graph_construct(fp);
+    if (!graph) {
+        goto out;
+    }
+
+    ret = glusterfs_graph_prepare(graph, gctx);
+    if (ret) {
+        goto out;
+    }
+
+    ret = glusterfs_graph_activate(graph, gctx);
+    if (ret) {
+        goto out;
+    }
+out:
+    if (fp) {
+        fclose(fp);
+    }
+    if (graph && ret) {
+        glusterfs_graph_destroy(graph);
+        gctx->active = NULL;
+    }
+    return ret;
+}
+	
+static int gluster_logging_init(char *logfile)
+{
+    if (gf_log_init(logfile) == -1) {
+        return -1;
+    }
+    gf_log_set_loglevel(GF_LOG_INFO);
+    return 0;
+}
+
+static void *gluster_handle_poll(void *arg)
+{
+    glusterfs_ctx_t *ctx = arg;
+
+    event_dispatch(ctx->event_pool);
+    return NULL;
+}
+
+glusterfs_ctx_t *gluster_init(char *volfile)
+{
+    int ret = 0;
+    pthread_t thread;
+
+    ret = glusterfs_this_init();
+    if (ret) {
+        goto out;
+    }
+
+    ret = gluster_ctx_init();
+    if (ret) {
+        goto out;
+    }
+
+    ret = glusterfs_uuid_buf_init();
+    if (ret) {
+        goto out;
+    }
+
+    ret = glusterfs_lkowner_buf_init();
+    if (ret) {
+        goto out;
+    }
+
+    /* FIX: Without an explicit log file, the log is put to console! */
+    ret = gluster_logging_init("/tmp/qemu-gluster.log");
+    if (ret) {
+        goto out;
+    }
+
+    ret = gluster_master_setup(gctx);
+    if (ret) {
+        goto out;
+    }
+
+    ret = gluster_graph_init(volfile);
+    if (ret) {
+        goto out;
+    }
+
+    ret = pthread_create(&thread, NULL, gluster_handle_poll,
+    (void *)gctx);
+    if (ret) {
+        goto out;
+    }
+    return gctx;
+
+out:
+    if (gctx) {
+        if (gctx->master) {
+            xlator_destroy(gctx->master);
+        }
+        if (gctx->active) {
+            glusterfs_graph_destroy(gctx->active);
+        }
+    }
+    gluster_ctx_destroy();
+    return NULL;
+}
diff --git a/block/gluster-helpers.h b/block/gluster-helpers.h
new file mode 100644
index 0000000..612f0f2
--- /dev/null
+++ b/block/gluster-helpers.h
@@ -0,0 +1,40 @@
+/*
+ * Helper routines for GlusterFS backend
+ * (Based on libglusterfsclient)
+ *
+ * Copyright (C) 2012 Bharata B Rao <bharata@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+#include <glusterfs/glusterfs.h>
+#include <glusterfs/common-utils.h>
+#include <glusterfs/iobuf.h>
+#include <glusterfs/stack.h>
+#include <glusterfs/event.h>
+#include <glusterfs/call-stub.h>
+#include <glusterfs/mem-pool.h>
+#include <glusterfs/globals.h>
+#include <glusterfs/fd.h>
+
+#include <libgen.h>
+
+typedef void * gluster_file_t;
+
+typedef struct {
+    char *buf;
+    size_t size;
+    off_t offset;
+    int ret;
+    void (*completion_fn)(void *arg);
+    void *opaque;
+} gluster_aiocb_t;
+	
+extern glusterfs_ctx_t *gluster_init(char *volfile);
+extern int gluster_aio_readv(fd_t *fd, gluster_aiocb_t *gaiocb);
+extern int gluster_aio_writev(fd_t *fd, gluster_aiocb_t *gaiocb);
+extern gluster_file_t gluster_open(const char *path, int flags, int mode);
+extern int gluster_fstat(gluster_file_t fd, struct stat *buf);
+extern int gluster_close(gluster_file_t fd);
+extern gluster_file_t gluster_creat(const char *path, int mode);
+extern int gluster_ftruncate(gluster_file_t fd, off_t length);


* [Qemu-devel] [RFC PATCH 3/3] block: gluster as block backend
  2012-06-11 14:18 [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU Bharata B Rao
  2012-06-11 14:19 ` [Qemu-devel] [RFC PATCH 1/3] qemu: Add a config option for GlusterFS as block backend Bharata B Rao
  2012-06-11 14:20 ` [Qemu-devel] [RFC PATCH 2/3] block: GlusterFS helpers to interface with libglusterfs Bharata B Rao
@ 2012-06-11 14:21 ` Bharata B Rao
  2012-06-18 17:35   ` Stefan Hajnoczi
  2012-06-18 15:36 ` [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU Stefan Hajnoczi
  2012-07-06  5:35 ` Bharata B Rao
  4 siblings, 1 reply; 17+ messages in thread
From: Bharata B Rao @ 2012-06-11 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Amar Tumballi, Vijay Bellur

block: gluster as block backend

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

This patch adds gluster as a new block backend in QEMU. This gives QEMU
the ability to boot VM images from gluster volumes.
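
For example, a VM image on a gluster volume is specified as in the cover
letter:

  qemu-system-x86_64 --enable-kvm -smp 4 -m 1024 -drive file=gluster:c-qemu.vol:/F16,format=gluster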

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---

 Makefile.objs   |    2 
 block/gluster.c |  435 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 436 insertions(+), 1 deletions(-)
 create mode 100644 block/gluster.c


diff --git a/Makefile.objs b/Makefile.objs
index 25190ba..859b88a 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -59,7 +59,7 @@ block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_LIBISCSI) += iscsi.o
 block-nested-$(CONFIG_CURL) += curl.o
 block-nested-$(CONFIG_RBD) += rbd.o
-block-nested-$(CONFIG_GLUSTERFS) += gluster-helpers.o
+block-nested-$(CONFIG_GLUSTERFS) += gluster-helpers.o gluster.o
 
 block-obj-y +=  $(addprefix block/, $(block-nested-y))
 
diff --git a/block/gluster.c b/block/gluster.c
new file mode 100644
index 0000000..1566cb7
--- /dev/null
+++ b/block/gluster.c
@@ -0,0 +1,435 @@
+/*
+ * GlusterFS backend for QEMU
+ *
+ * (AIO implementation is derived from block/rbd.c)
+ *
+ * Copyright (C) 2012 Bharata B Rao <bharata@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#include "block_int.h"
+#include "gluster-helpers.h"
+
+typedef void *gluster_file_t;
+
+typedef struct glusterConf {
+    char volfile[PATH_MAX];
+    char image[PATH_MAX];
+} glusterConf;
+
+typedef struct BDRVGlusterState {
+    int fds[2];
+    int open_flags;
+    gluster_file_t fd;
+    glusterfs_ctx_t *ctx;
+    int qemu_aio_count;
+    int event_reader_pos;
+    gluster_aiocb_t *event_gaiocb;
+} BDRVGlusterState;
+
+typedef struct glusterAIOCB {
+    BlockDriverAIOCB common;
+    QEMUBH *bh;
+    QEMUIOVector *qiov;
+    int ret;
+    int write;
+    char *bounce;
+    BDRVGlusterState *s;
+    int cancelled;
+    int error;
+} glusterAIOCB;
+
+#define GLUSTER_FD_READ 0
+#define GLUSTER_FD_WRITE 1
+
+/*
+ * file=protocol:volfile:image
+ */
+static int qemu_gluster_parsename(glusterConf *c, const char *filename)
+{
+    char *file = g_strdup(filename);
+    char *token, *next_token, *saveptr;
+    int ret = 0;
+
+    /* Discard the protocol */
+    token = strtok_r(file, ":", &saveptr);
+    if (!token) {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* volfile */
+    next_token = strtok_r(NULL, ":", &saveptr);
+    if (!next_token) {
+        ret = -EINVAL;
+        goto out;
+    }
+    strncpy(c->volfile, next_token, PATH_MAX);
+
+    /* image */
+    next_token = strtok_r(NULL, ":", &saveptr);
+    if (!next_token) {
+        ret = -EINVAL;
+        goto out;
+    }
+    strncpy(c->image, next_token, PATH_MAX);
+out:
+    g_free(file);
+    return ret;
+}
+
+static void gluster_aio_bh_cb(void *opaque)
+{
+    glusterAIOCB *acb = opaque;
+
+    if (!acb->write) {
+        qemu_iovec_from_buffer(acb->qiov, acb->bounce, acb->qiov->size);
+    }
+    qemu_vfree(acb->bounce);
+    acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
+    qemu_bh_delete(acb->bh);
+    acb->bh = NULL;
+
+    qemu_aio_release(acb);
+}
+
+static void qemu_gluster_complete_aio(gluster_aiocb_t *gaiocb)
+{
+    glusterAIOCB *acb = (glusterAIOCB *)gaiocb->opaque;
+    int64_t r;
+
+    if (acb->cancelled) {
+        qemu_vfree(acb->bounce);
+        qemu_aio_release(acb);
+        goto done;
+    }
+
+    r = gaiocb->ret;
+
+    if (acb->write) {
+        if (r < 0) {
+            acb->ret = r;
+            acb->error = 1;
+        } else if (!acb->error) {
+            acb->ret = gaiocb->size;
+        }
+    } else {
+        if (r < 0) {
+            memset(gaiocb->buf, 0, gaiocb->size);
+            acb->ret = r;
+            acb->error = 1;
+        } else if (r < gaiocb->size) {
+            memset(gaiocb->buf + r, 0, gaiocb->size - r);
+            if (!acb->error) {
+                acb->ret = gaiocb->size;
+            }
+        } else if (!acb->error) {
+            acb->ret = r;
+        }
+    }
+    acb->bh = qemu_bh_new(gluster_aio_bh_cb, acb);
+    qemu_bh_schedule(acb->bh);
+done:
+    g_free(gaiocb);
+}
+
+static void qemu_gluster_aio_event_reader(void *opaque)
+{
+    BDRVGlusterState *s = opaque;
+    ssize_t ret;
+
+    do {
+        char *p = (char *)&s->event_gaiocb;
+
+        ret = read(s->fds[GLUSTER_FD_READ], p + s->event_reader_pos,
+                   sizeof(s->event_gaiocb) - s->event_reader_pos);
+        if (ret > 0) {
+            s->event_reader_pos += ret;
+            if (s->event_reader_pos == sizeof(s->event_gaiocb)) {
+                s->event_reader_pos = 0;
+                qemu_gluster_complete_aio(s->event_gaiocb);
+                s->qemu_aio_count--;
+            }
+        }
+    } while (ret < 0 && errno == EINTR);
+}
+
+static int qemu_gluster_aio_flush_cb(void *opaque)
+{
+    BDRVGlusterState *s = opaque;
+
+    return (s->qemu_aio_count > 0);
+}
+
+static int qemu_gluster_open(BlockDriverState *bs, const char *filename,
+    int bdrv_flags)
+{
+    BDRVGlusterState *s = bs->opaque;
+    glusterConf *c = g_malloc(sizeof(glusterConf));
+    int ret = -1;
+
+    if (qemu_gluster_parsename(c, filename)) {
+        goto out;
+    }
+
+    s->ctx = gluster_init(c->volfile);
+    if (!s->ctx) {
+        goto out;
+    }
+
+    /* FIX: Server client handshake takes time */
+    sleep(1);
+
+    s->open_flags |=  O_BINARY;
+    s->open_flags &= ~O_ACCMODE;
+    if (bdrv_flags & BDRV_O_RDWR) {
+        s->open_flags |= O_RDWR;
+    } else {
+        s->open_flags |= O_RDONLY;
+    }
+
+    /* Use O_DSYNC for write-through caching, no flags for write-back caching,
+     * and O_DIRECT for no caching. */
+    if ((bdrv_flags & BDRV_O_NOCACHE))
+        s->open_flags |= O_DIRECT;
+    if (!(bdrv_flags & BDRV_O_CACHE_WB))
+        s->open_flags |= O_DSYNC;
+
+    s->fd = gluster_open(c->image, s->open_flags, 0);
+    if (!s->fd) {
+        goto out;
+    }
+
+    ret = qemu_pipe(s->fds);
+    if (ret < 0) {
+        goto out;
+    }
+    fcntl(s->fds[0], F_SETFL, O_NONBLOCK);
+    fcntl(s->fds[1], F_SETFL, O_NONBLOCK);
+    qemu_aio_set_fd_handler(s->fds[GLUSTER_FD_READ],
+        qemu_gluster_aio_event_reader, NULL, qemu_gluster_aio_flush_cb, s);
+out:
+    if (c) {
+        g_free(c);
+    }
+    if (ret < 0) {
+        gluster_close(s->fd);
+    }
+    return ret;
+}
+
+static int qemu_gluster_create(const char *filename,
+        QEMUOptionParameter *options)
+{
+    glusterConf *c = g_malloc(sizeof(glusterConf));
+    int ret = 0;
+    gluster_file_t fd;
+    int64_t total_size = 0;
+
+    ret = qemu_gluster_parsename(c, filename);
+    if (ret) {
+        goto out;
+    }
+
+    if (!gluster_init(c->volfile)) {
+        ret = -1;
+        goto out;
+    }
+
+    /* FIX: Server client handshake takes time */
+    sleep(1);
+
+    /* Read out options */
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            total_size = options->value.n / BDRV_SECTOR_SIZE;
+        }
+        options++;
+    }
+
+    fd = gluster_creat(c->image, 0644);
+    if (!fd) {
+        ret = -errno;
+    } else {
+        if (gluster_ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) {
+            ret = -errno;
+        }
+        if (gluster_close(fd) != 0) {
+            ret = -errno;
+        }
+    }
+out:
+    if (c) {
+        g_free(c);
+    }
+    return ret;
+}
+
+static AIOPool gluster_aio_pool = {
+    .aiocb_size = sizeof(glusterAIOCB),
+};
+
+static int qemu_gluster_send_pipe(BDRVGlusterState *s, gluster_aiocb_t *gaiocb)
+{
+    int ret = 0;
+    while (1) {
+        fd_set wfd;
+        int fd = s->fds[GLUSTER_FD_WRITE];
+
+        ret = write(fd, (void *)&gaiocb, sizeof(gaiocb));
+        if (ret >= 0) {
+            break;
+        }
+        if (errno == EINTR) {
+            continue;
+        }
+        if (errno != EAGAIN) {
+            break;
+        }
+
+        FD_ZERO(&wfd);
+        FD_SET(fd, &wfd);
+        do {
+            ret = select(fd + 1, NULL, &wfd, NULL, NULL);
+        } while (ret < 0 && errno == EINTR);
+    }
+    return ret;
+}
+
+static void gluster_finish_aiocb(void *arg)
+{
+    int ret;
+    gluster_aiocb_t *gaiocb = (gluster_aiocb_t *)arg;
+    BDRVGlusterState *s = ((glusterAIOCB *)gaiocb->opaque)->s;
+
+    ret = qemu_gluster_send_pipe(s, gaiocb);
+    if (ret < 0) {
+        g_free(gaiocb);
+    }
+}
+
+static BlockDriverAIOCB *qemu_gluster_aio_rw(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque, int write)
+{
+    int ret;
+    glusterAIOCB *acb;
+    gluster_aiocb_t *gaiocb;
+    BDRVGlusterState *s = bs->opaque;
+    char *buf;
+    size_t size;
+    off_t offset;
+
+    acb = qemu_aio_get(&gluster_aio_pool, bs, cb, opaque);
+    acb->write = write;
+    acb->qiov = qiov;
+    acb->bounce = qemu_blockalign(bs, qiov->size);
+    acb->ret = 0;
+    acb->bh = NULL;
+    acb->s = s;
+
+    if (write) {
+        qemu_iovec_to_buffer(acb->qiov, acb->bounce);
+    }
+
+    buf = acb->bounce;
+    offset = sector_num * BDRV_SECTOR_SIZE;
+    size = nb_sectors * BDRV_SECTOR_SIZE;
+    s->qemu_aio_count++;
+
+    gaiocb = g_malloc(sizeof(gluster_aiocb_t));
+    gaiocb->opaque = acb;
+    gaiocb->buf = buf;
+    gaiocb->offset = offset;
+    gaiocb->size = size;
+    gaiocb->completion_fn = &gluster_finish_aiocb;
+
+    if (write) {
+        ret = gluster_aio_writev(s->fd, gaiocb);
+    } else {
+        ret = gluster_aio_readv(s->fd, gaiocb);
+    }
+
+    if (ret < 0) {
+        goto out;
+    }
+    return &acb->common;
+
+out:
+    g_free(gaiocb);
+    s->qemu_aio_count--;
+    qemu_aio_release(acb);
+    return NULL;
+}
+
+static BlockDriverAIOCB *qemu_gluster_aio_readv(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    return qemu_gluster_aio_rw(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
+}
+
+static BlockDriverAIOCB *qemu_gluster_aio_writev(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    return qemu_gluster_aio_rw(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
+}
+
+static int64_t qemu_gluster_getlength(BlockDriverState *bs)
+{
+    BDRVGlusterState *s = bs->opaque;
+    gluster_file_t fd = s->fd;
+    struct stat st;
+    int ret;
+
+    ret = gluster_fstat(fd, &st);
+    if (ret < 0) {
+        return -1;
+    } else {
+        return st.st_size;
+    }
+}
+
+static void qemu_gluster_close(BlockDriverState *bs)
+{
+    BDRVGlusterState *s = bs->opaque;
+
+    if (s->fd) {
+        gluster_close(s->fd);
+        s->fd = NULL;
+    }
+}
+
+static QEMUOptionParameter qemu_gluster_create_options[] = {
+    {
+        .name = BLOCK_OPT_SIZE,
+        .type = OPT_SIZE,
+        .help = "Virtual disk size"
+    },
+    { NULL }
+};
+
+static BlockDriver bdrv_gluster = {
+    .format_name = "gluster",
+    .protocol_name = "gluster",
+    .instance_size = sizeof(BDRVGlusterState),
+    .bdrv_file_open = qemu_gluster_open,
+    .bdrv_close = qemu_gluster_close,
+    .bdrv_create = qemu_gluster_create,
+    .bdrv_getlength = qemu_gluster_getlength,
+
+    .bdrv_aio_readv = qemu_gluster_aio_readv,
+    .bdrv_aio_writev = qemu_gluster_aio_writev,
+
+    .create_options = qemu_gluster_create_options,
+};
+
+static void bdrv_gluster_init(void)
+{
+    bdrv_register(&bdrv_gluster);
+}
+
+block_init(bdrv_gluster_init);


* Re: [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU
  2012-06-11 14:18 [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU Bharata B Rao
                   ` (2 preceding siblings ...)
  2012-06-11 14:21 ` [Qemu-devel] [RFC PATCH 3/3] block: gluster as block backend Bharata B Rao
@ 2012-06-18 15:36 ` Stefan Hajnoczi
  2012-06-19  9:10   ` Bharata B Rao
  2012-07-06  5:35 ` Bharata B Rao
  4 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2012-06-18 15:36 UTC (permalink / raw)
  To: bharata; +Cc: Amar Tumballi, qemu-devel, Vijay Bellur

On Mon, Jun 11, 2012 at 3:18 PM, Bharata B Rao
<bharata@linux.vnet.ibm.com> wrote:
> 4. Creating VM image
>
> # qemu-img create -f gluster gluster:c-qemu.vol:/F16 5G

Do you really need "-f gluster"?  The format should be "raw" (default)
and the protocol should be "gluster".  Specifying "gluster:..." as the
filename takes care of hooking up the GlusterFS protocol.
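
Something like the following should then suffice (the protocol is taken
from the filename prefix and the format defaults to raw):

# qemu-img create gluster:c-qemu.vol:/F16 5G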

Stefan


* Re: [Qemu-devel] [RFC PATCH 3/3] block: gluster as block backend
  2012-06-11 14:21 ` [Qemu-devel] [RFC PATCH 3/3] block: gluster as block backend Bharata B Rao
@ 2012-06-18 17:35   ` Stefan Hajnoczi
  2012-06-19  9:27     ` Avi Kivity
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2012-06-18 17:35 UTC (permalink / raw)
  To: bharata; +Cc: Kevin Wolf, Paolo Bonzini, Amar Tumballi, qemu-devel,
	Vijay Bellur

On Mon, Jun 11, 2012 at 3:21 PM, Bharata B Rao
<bharata@linux.vnet.ibm.com> wrote:
> +#include "block_int.h"
> +#include "gluster-helpers.h"
> +
> +typedef void *gluster_file_t;

This typedef is already in gluster-helpers.h.  It's ugly BTW, "typedef
struct gluster_file gluster_file_t" is nicer since it won't cast to
other pointer types automatically.
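
To illustrate (two alternatives, not meant to coexist):

    typedef void *gluster_file_t;                /* converts to any pointer */
    typedef struct gluster_file gluster_file_t;  /* opaque; bad casts are errors */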

> +
> +typedef struct glusterConf {
> +    char volfile[PATH_MAX];
> +    char image[PATH_MAX];
> +} glusterConf;

QEMU coding style always uses UpperCase for struct names.
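
E.g.:

    typedef struct GlusterConf {
        char volfile[PATH_MAX];
        char image[PATH_MAX];
    } GlusterConf;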

> +static void qemu_gluster_aio_event_reader(void *opaque)
> +{
> +    BDRVGlusterState *s = opaque;
> +    ssize_t ret;
> +
> +    do {
> +        char *p = (char *)&s->event_gaiocb;

Why make this a BDRVGlusterState field?  It could be a local, I think.

> +    /* Use O_DSYNC for write-through caching, no flags for write-back caching,
> +     * and O_DIRECT for no caching. */
> +    if ((bdrv_flags & BDRV_O_NOCACHE))
> +        s->open_flags |= O_DIRECT;
> +    if (!(bdrv_flags & BDRV_O_CACHE_WB))
> +        s->open_flags |= O_DSYNC;

Paolo has changed this recently, you might need to use
bs->enable_write_cache instead.

> +out:
> +    if (c) {
> +        g_free(c);
> +    }

g_free(NULL) is a nop, you never need to test that the pointer is non-NULL.
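
So the cleanup can simply be:

out:
    g_free(c);    /* g_free(NULL) is a no-op */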

> +static void gluster_finish_aiocb(void *arg)
> +{
> +    int ret;
> +    gluster_aiocb_t *gaiocb = (gluster_aiocb_t *)arg;
> +    BDRVGlusterState *s = ((glusterAIOCB *)gaiocb->opaque)->s;
> +
> +    ret = qemu_gluster_send_pipe(s, gaiocb);
> +    if (ret < 0) {
> +        g_free(gaiocb);

What about the glusterAIOCB?  You need to invoke the callback with an
error value.

What about decrementing the in-flight I/O request count?

> +static BlockDriverAIOCB *qemu_gluster_aio_rw(BlockDriverState *bs,
> +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> +        BlockDriverCompletionFunc *cb, void *opaque, int write)
> +{
> +    int ret;
> +    glusterAIOCB *acb;
> +    gluster_aiocb_t *gaiocb;
> +    BDRVGlusterState *s = bs->opaque;
> +    char *buf;
> +    size_t size;
> +    off_t offset;
> +
> +    acb = qemu_aio_get(&gluster_aio_pool, bs, cb, opaque);
> +    acb->write = write;
> +    acb->qiov = qiov;
> +    acb->bounce = qemu_blockalign(bs, qiov->size);
> +    acb->ret = 0;
> +    acb->bh = NULL;
> +    acb->s = s;
> +
> +    if (write) {
> +        qemu_iovec_to_buffer(acb->qiov, acb->bounce);
> +    }
> +
> +    buf = acb->bounce;
> +    offset = sector_num * BDRV_SECTOR_SIZE;
> +    size = nb_sectors * BDRV_SECTOR_SIZE;
> +    s->qemu_aio_count++;
> +
> +    gaiocb = g_malloc(sizeof(gluster_aiocb_t));

Can you make this a field of glusterAIOCB?  Then you don't need to
worry about freeing gaiocb later.
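
Something along these lines (sketch):

    typedef struct glusterAIOCB {
        BlockDriverAIOCB common;
        /* ... */
        gluster_aiocb_t gaiocb;   /* embedded; released along with the AIOCB */
    } glusterAIOCB;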

> +static int64_t qemu_gluster_getlength(BlockDriverState *bs)
> +{
> +    BDRVGlusterState *s = bs->opaque;
> +    gluster_file_t fd = s->fd;
> +    struct stat st;
> +    int ret;
> +
> +    ret = gluster_fstat(fd, &st);
> +    if (ret < 0) {
> +        return -1;

Please return a negative errno instead of -1.
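
E.g.:

    ret = gluster_fstat(fd, &st);
    if (ret < 0) {
        return -errno;    /* propagate the real error */
    }
    return st.st_size;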

Stefan


* Re: [Qemu-devel] [RFC PATCH 2/3] block: GlusterFS helpers to interface with libglusterfs
  2012-06-11 14:20 ` [Qemu-devel] [RFC PATCH 2/3] block: GlusterFS helpers to interface with libglusterfs Bharata B Rao
@ 2012-06-18 17:35   ` Stefan Hajnoczi
  2012-06-19  9:31     ` Bharata B Rao
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2012-06-18 17:35 UTC (permalink / raw)
  To: bharata; +Cc: Amar Tumballi, qemu-devel, Vijay Bellur

On Mon, Jun 11, 2012 at 3:20 PM, Bharata B Rao
<bharata@linux.vnet.ibm.com> wrote:
> +    ret = pthread_create(&thread, NULL, gluster_handle_poll,
> +    (void *)gctx);

Please use qemu-thread.h.  QEMU uses signals so you almost certainly
want to mask signals for this thread (qemu_thread_create() does that).
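
Roughly (a sketch, assuming the qemu-thread.h API of this period):

    QemuThread thread;
    qemu_thread_create(&thread, gluster_handle_poll, gctx,
                       QEMU_THREAD_JOINABLE);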

Stefan


* Re: [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU
  2012-06-18 15:36 ` [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU Stefan Hajnoczi
@ 2012-06-19  9:10   ` Bharata B Rao
  0 siblings, 0 replies; 17+ messages in thread
From: Bharata B Rao @ 2012-06-19  9:10 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Amar Tumballi, qemu-devel, Vijay Bellur

On Mon, Jun 18, 2012 at 04:36:04PM +0100, Stefan Hajnoczi wrote:
> On Mon, Jun 11, 2012 at 3:18 PM, Bharata B Rao
> <bharata@linux.vnet.ibm.com> wrote:
> > 4. Creating VM image
> >
> > # qemu-img create -f gluster gluster:c-qemu.vol:/F16 5G
> 
> Do you really need "-f gluster"?

I realized that we don't. I picked it up from the semantics of rbd.

> The format should be "raw" (default)
> and the protocol should be "gluster".  Specifying "gluster:..." as the
> filename takes care of hooking up the GlusterFS protocol.

You are right. The current patches work w/o explicit specification of gluster
as format, but I will update the documentation in the next version.
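
So image creation would simply become:

# qemu-img create gluster:c-qemu.vol:/F16 5G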

Regards,
Bharata.

* Re: [Qemu-devel] [RFC PATCH 3/3] block: gluster as block backend
  2012-06-18 17:35   ` Stefan Hajnoczi
@ 2012-06-19  9:27     ` Avi Kivity
  2012-06-19  9:30     ` Bharata B Rao
  2012-07-01 14:49     ` Paolo Bonzini
  2 siblings, 0 replies; 17+ messages in thread
From: Avi Kivity @ 2012-06-19  9:27 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Vijay Bellur, Amar Tumballi, qemu-devel, bharata,
	Paolo Bonzini

On 06/18/2012 08:35 PM, Stefan Hajnoczi wrote:
> On Mon, Jun 11, 2012 at 3:21 PM, Bharata B Rao
> <bharata@linux.vnet.ibm.com> wrote:
>> +#include "block_int.h"
>> +#include "gluster-helpers.h"
>> +
>> +typedef void *gluster_file_t;
> 
> This typedef is already in gluster-helpers.h.  It's ugly BTW, "typedef
> struct gluster_file gluster_file_t" is nicer since it won't cast to
> other pointer types automatically.

gluster_file_t can only be cast to a NACK since names ending with _t are
reserved by the C runtime.


-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] [RFC PATCH 3/3] block: gluster as block backend
  2012-06-18 17:35   ` Stefan Hajnoczi
  2012-06-19  9:27     ` Avi Kivity
@ 2012-06-19  9:30     ` Bharata B Rao
  2012-06-19 11:05       ` Stefan Hajnoczi
  2012-07-01 14:49     ` Paolo Bonzini
  2 siblings, 1 reply; 17+ messages in thread
From: Bharata B Rao @ 2012-06-19  9:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Amar Tumballi, qemu-devel,
	Vijay Bellur

On Mon, Jun 18, 2012 at 06:35:28PM +0100, Stefan Hajnoczi wrote:
> On Mon, Jun 11, 2012 at 3:21 PM, Bharata B Rao
> <bharata@linux.vnet.ibm.com> wrote:
> > +#include "block_int.h"
> > +#include "gluster-helpers.h"
> > +
> > +typedef void *gluster_file_t;
> 
> This typedef is already in gluster-helpers.h.

Yes, will fix that.

> It's ugly BTW, "typedef
> struct gluster_file gluster_file_t" is nicer since it won't cast to
> other pointer types automatically.

Gluster routines in libglusterfsclient operate on a gluster-specific
descriptor called fd_t.

glusterfs_open returns a pointer to fd_t and the rest of the read/write
routines take that pointer as input. libglusterfsclient hides this pointer by
doing

typedef void *glusterfs_file_t;

I wanted to return an integer fd from open and then use it with read and
write. But that would need some code in the gluster backend to convert the
integer fd to fd_t and vice versa. Since libglusterfsclient doesn't deal with
integer fds, I retained this ugly typedef.
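
For comparison, the opaque variant you suggest would look something like
this (a sketch; the gluster_open() signature shown is hypothetical):

typedef struct gluster_file gluster_file_t;  /* incomplete type: no implicit
                                                pointer conversions */

gluster_file_t *gluster_open(const char *path, int flags);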

> 
> > +
> > +typedef struct glusterConf {
> > +    char volfile[PATH_MAX];
> > +    char image[PATH_MAX];
> > +} glusterConf;
> 
> QEMU coding style always uses UpperCase for struct names.

Ok, will fix.

> 
> > +static void qemu_gluster_aio_event_reader(void *opaque)
> > +{
> > +    BDRVGlusterState *s = opaque;
> > +    ssize_t ret;
> > +
> > +    do {
> > +        char *p = (char *)&s->event_gaiocb;
> 
> Why make this a BDRVGlusterState field?  It could be a local, I think.

I could, I guess; I was just following what rbd does.

> 
> > +    /* Use O_DSYNC for write-through caching, no flags for write-back caching,
> > +     * and O_DIRECT for no caching. */
> > +    if ((bdrv_flags & BDRV_O_NOCACHE))
> > +        s->open_flags |= O_DIRECT;
> > +    if (!(bdrv_flags & BDRV_O_CACHE_WB))
> > +        s->open_flags |= O_DSYNC;
> 
> Paolo has changed this recently, you might need to use
> bs->enable_write_cache instead.

I picked up this logic from block/raw-posix.c:raw_open_common(). Don't see
anything related to bs->enable_write_cache there. Will find out more about
bs->enable_write_cache.

> 
> > +out:
> > +    if (c) {
> > +        g_free(c);
> > +    }
> 
> g_free(NULL) is a nop, you never need to test that the pointer is non-NULL.

Ok.

> 
> > +static void gluster_finish_aiocb(void *arg)
> > +{
> > +    int ret;
> > +    gluster_aiocb_t *gaiocb = (gluster_aiocb_t *)arg;
> > +    BDRVGlusterState *s = ((glusterAIOCB *)gaiocb->opaque)->s;
> > +
> > +    ret = qemu_gluster_send_pipe(s, gaiocb);
> > +    if (ret < 0) {
> > +        g_free(gaiocb);
> 
> What about the glusterAIOCB?  You need to invoke the callback with an
> error value.
> 
> What about decrementing the in-flight I/O request count?

Again, this comes from rbd. gluster_finish_aiocb() is the callback
that we have registered with gluster. I am not doing any error handling when
we fail even to write to the pipe. An event reader would be waiting to read
from the other end of the pipe. Typically, error handling and decrementing
the in-flight IO request count are done by that event reader. But in this
case, we failed even to kick (via pipe write) the event reader.

> 
> > +static BlockDriverAIOCB *qemu_gluster_aio_rw(BlockDriverState *bs,
> > +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> > +        BlockDriverCompletionFunc *cb, void *opaque, int write)
> > +{
> > +    int ret;
> > +    glusterAIOCB *acb;
> > +    gluster_aiocb_t *gaiocb;
> > +    BDRVGlusterState *s = bs->opaque;
> > +    char *buf;
> > +    size_t size;
> > +    off_t offset;
> > +
> > +    acb = qemu_aio_get(&gluster_aio_pool, bs, cb, opaque);
> > +    acb->write = write;
> > +    acb->qiov = qiov;
> > +    acb->bounce = qemu_blockalign(bs, qiov->size);
> > +    acb->ret = 0;
> > +    acb->bh = NULL;
> > +    acb->s = s;
> > +
> > +    if (write) {
> > +        qemu_iovec_to_buffer(acb->qiov, acb->bounce);
> > +    }
> > +
> > +    buf = acb->bounce;
> > +    offset = sector_num * BDRV_SECTOR_SIZE;
> > +    size = nb_sectors * BDRV_SECTOR_SIZE;
> > +    s->qemu_aio_count++;
> > +
> > +    gaiocb = g_malloc(sizeof(gluster_aiocb_t));
> 
> Can you make this a field of glusterAIOCB?  Then you don't need to
> worry about freeing gaiocb later.

Hmm, I already have glusterAIOCB as part of gaiocb.

> 
> > +static int64_t qemu_gluster_getlength(BlockDriverState *bs)
> > +{
> > +    BDRVGlusterState *s = bs->opaque;
> > +    gluster_file_t fd = s->fd;
> > +    struct stat st;
> > +    int ret;
> > +
> > +    ret = gluster_fstat(fd, &st);
> > +    if (ret < 0) {
> > +        return -1;
> 
> Please return a negative errno instead of -1.

Ok. Maybe I could just return the value from gluster_fstat().

Thanks for your review.

Regards,
Bharata.

* Re: [Qemu-devel] [RFC PATCH 2/3] block: GlusterFS helpers to interface with libglusterfs
  2012-06-18 17:35   ` Stefan Hajnoczi
@ 2012-06-19  9:31     ` Bharata B Rao
  2012-07-02  9:52       ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Bharata B Rao @ 2012-06-19  9:31 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Amar Tumballi, qemu-devel, Vijay Bellur

On Mon, Jun 18, 2012 at 06:35:52PM +0100, Stefan Hajnoczi wrote:
> On Mon, Jun 11, 2012 at 3:20 PM, Bharata B Rao
> <bharata@linux.vnet.ibm.com> wrote:
> > +    ret = pthread_create(&thread, NULL, gluster_handle_poll,
> > +    (void *)gctx);
> 
> Please use qemu-thread.h.  QEMU uses signals so you almost certainly
> want to mask signals for this thread (qemu_thread_create() does that).

Ok. This is temporary since this entire patch (2/3) would be redundant
when we have libglusterfsclient working.

Regards,
Bharata.

* Re: [Qemu-devel] [RFC PATCH 3/3] block: gluster as block backend
  2012-06-19  9:30     ` Bharata B Rao
@ 2012-06-19 11:05       ` Stefan Hajnoczi
  2012-07-01 14:50         ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2012-06-19 11:05 UTC (permalink / raw)
  To: bharata; +Cc: Kevin Wolf, Paolo Bonzini, Amar Tumballi, qemu-devel,
	Vijay Bellur

On Tue, Jun 19, 2012 at 10:30 AM, Bharata B Rao
<bharata@linux.vnet.ibm.com> wrote:
> On Mon, Jun 18, 2012 at 06:35:28PM +0100, Stefan Hajnoczi wrote:
>> On Mon, Jun 11, 2012 at 3:21 PM, Bharata B Rao
>> <bharata@linux.vnet.ibm.com> wrote:
>> > +    /* Use O_DSYNC for write-through caching, no flags for write-back caching,
>> > +     * and O_DIRECT for no caching. */
>> > +    if ((bdrv_flags & BDRV_O_NOCACHE))
>> > +        s->open_flags |= O_DIRECT;
>> > +    if (!(bdrv_flags & BDRV_O_CACHE_WB))
>> > +        s->open_flags |= O_DSYNC;
>>
>> Paolo has changed this recently, you might need to use
>> bs->enable_write_cache instead.
>
> I picked up this logic from block/raw-posix.c:raw_open_common(). Don't see
> anything related to bs->enable_write_cache there. Will find out more about
> bs->enable_write_cache.

If you fetch the latest qemu.git and check bdrv_open_common() there is
new code that stashes BDRV_O_CACHE_WB in bs->enable_write_cache and
then opens the actual block driver with BDRV_O_CACHE_WB set.  You can
use bdrv_enable_write_cache() to test the original flag.
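
With that, the open path reduces to something like (a sketch):

    /* Only O_DIRECT is decided at open time now; write-through vs
     * write-back is handled by the generic block layer. */
    if (bdrv_flags & BDRV_O_NOCACHE) {
        s->open_flags |= O_DIRECT;
    }

and bdrv_enable_write_cache(bs) can be queried wherever the original
flag is needed.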

>> > +static void gluster_finish_aiocb(void *arg)
>> > +{
>> > +    int ret;
>> > +    gluster_aiocb_t *gaiocb = (gluster_aiocb_t *)arg;
>> > +    BDRVGlusterState *s = ((glusterAIOCB *)gaiocb->opaque)->s;
>> > +
>> > +    ret = qemu_gluster_send_pipe(s, gaiocb);
>> > +    if (ret < 0) {
>> > +        g_free(gaiocb);
>>
>> What about the glusterAIOCB?  You need to invoke the callback with an
>> error value.
>>
>> What about decrementing the in-flight I/O request count?
>
> Again, this comes from rbd. gluster_finish_aiocb() is the callback
> that we have registered with gluster. I am not doing any error handling when
> we fail even to write to the pipe. An event reader would be waiting to read
> from the other end of the pipe. Typically, error handling and decrementing
> the in-flight IO request count are done by that event reader. But in this
> case, we failed even to kick (via pipe write) the event reader.

It sounds like you're saying the request is not properly cleaned up
and completed on failure.  Please fix :).

>> > +static int64_t qemu_gluster_getlength(BlockDriverState *bs)
>> > +{
>> > +    BDRVGlusterState *s = bs->opaque;
>> > +    gluster_file_t fd = s->fd;
>> > +    struct stat st;
>> > +    int ret;
>> > +
>> > +    ret = gluster_fstat(fd, &st);
>> > +    if (ret < 0) {
>> > +        return -1;
>>
>> Please return a negative errno instead of -1.
>
> Ok. May be I could just return value from gluster_fstat().

The gluster_fstat() code also does not return negative errnos (at
least in the first case I checked, when CALLOC() fails).

Stefan

* Re: [Qemu-devel] [RFC PATCH 3/3] block: gluster as block backend
  2012-06-18 17:35   ` Stefan Hajnoczi
  2012-06-19  9:27     ` Avi Kivity
  2012-06-19  9:30     ` Bharata B Rao
@ 2012-07-01 14:49     ` Paolo Bonzini
  2 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2012-07-01 14:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Vijay Bellur, Amar Tumballi, qemu-devel, bharata

On 18/06/2012 19:35, Stefan Hajnoczi wrote:
>> > +    /* Use O_DSYNC for write-through caching, no flags for write-back caching,
>> > +     * and O_DIRECT for no caching. */
>> > +    if ((bdrv_flags & BDRV_O_NOCACHE))
>> > +        s->open_flags |= O_DIRECT;
>> > +    if (!(bdrv_flags & BDRV_O_CACHE_WB))
>> > +        s->open_flags |= O_DSYNC;
> Paolo has changed this recently, you might need to use
> bs->enable_write_cache instead.

At the protocol (i.e. low-level backend) level you don't need to do
anything really, if you implement bdrv_flush_to_disk correctly.

Looking at BDRV_O_CACHE_WB will do no harm, it's just dead code.
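
I.e. something along these lines should be enough (a synchronous sketch;
gluster_fsync() is a hypothetical helper in the spirit of the other
gluster_* wrappers):

static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
{
    BDRVGlusterState *s = bs->opaque;

    if (gluster_fsync(s->fd) < 0) {
        return -errno;
    }
    return 0;
}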

Paolo

* Re: [Qemu-devel] [RFC PATCH 3/3] block: gluster as block backend
  2012-06-19 11:05       ` Stefan Hajnoczi
@ 2012-07-01 14:50         ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2012-07-01 14:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Vijay Bellur, Amar Tumballi, qemu-devel, bharata

On 19/06/2012 13:05, Stefan Hajnoczi wrote:
>> > I picked up this logic from block/raw-posix.c:raw_open_common(). Don't see
>> > anything related to bs->enable_write_cache there. Will find out more about
>> > bs->enable_write_cache.
> If you fetch the latest qemu.git and check bdrv_open_common() there is
> new code that stashes BDRV_O_CACHE_WB in bs->enable_write_cache and
> then opens the actual block driver with BDRV_O_CACHE_WB set.  You can
> use bdrv_enable_write_cache() to test the original flag.

Yes, but you shouldn't do this when opening.  You should always open for
writeback.

Paolo

* Re: [Qemu-devel] [RFC PATCH 2/3] block: GlusterFS helpers to interface with libglusterfs
  2012-06-19  9:31     ` Bharata B Rao
@ 2012-07-02  9:52       ` Paolo Bonzini
  2012-07-02 10:05         ` Bharata B Rao
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2012-07-02  9:52 UTC (permalink / raw)
  To: bharata; +Cc: Stefan Hajnoczi, Amar Tumballi, qemu-devel, Vijay Bellur

On 19/06/2012 11:31, Bharata B Rao wrote:
>>> > > +    ret = pthread_create(&thread, NULL, gluster_handle_poll,
>>> > > +    (void *)gctx);
>> > 
>> > Please use qemu-thread.h.  QEMU uses signals so you almost certainly
>> > want to mask signals for this thread (qemu_thread_create() does that).
> Ok. This is temporary since this entire patch (2/3) would be redundant
> when we have libglusterfsclient working.

Please make sure that libglusterfsclient also masks signals for its own
threads.  It's a pretty common source of bugs with threaded libraries,
and masking the signals is harmless for programs that do not care.
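
The usual pattern inside a library that spawns its own threads is
(a sketch, plain POSIX; this is essentially what qemu_thread_create()
does too):

    sigset_t all, old;

    sigfillset(&all);
    pthread_sigmask(SIG_SETMASK, &all, &old);  /* child inherits a full mask */
    ret = pthread_create(&thread, NULL, gluster_handle_poll, (void *)gctx);
    pthread_sigmask(SIG_SETMASK, &old, NULL);  /* restore the caller's mask */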

Paolo

* Re: [Qemu-devel] [RFC PATCH 2/3] block: GlusterFS helpers to interface with libglusterfs
  2012-07-02  9:52       ` Paolo Bonzini
@ 2012-07-02 10:05         ` Bharata B Rao
  0 siblings, 0 replies; 17+ messages in thread
From: Bharata B Rao @ 2012-07-02 10:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stefan Hajnoczi, Amar Tumballi, qemu-devel, Vijay Bellur

On Mon, Jul 02, 2012 at 11:52:04AM +0200, Paolo Bonzini wrote:
> On 19/06/2012 11:31, Bharata B Rao wrote:
> >>> > > +    ret = pthread_create(&thread, NULL, gluster_handle_poll,
> >>> > > +    (void *)gctx);
> >> > 
> >> > Please use qemu-thread.h.  QEMU uses signals so you almost certainly
> >> > want to mask signals for this thread (qemu_thread_create() does that).
> > Ok. This is temporary since this entire patch (2/3) would be redundant
> > when we have libglusterfsclient working.
> 
> Please make sure that libglusterfsclient also masks signals for its own
> threads.  It's a pretty common source of bugs with threaded libraries,
> and masking the signals is harmless for programs that do not care.

Will take care of this. Thanks for your review.

Regards,
Bharata.

* Re: [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU
  2012-06-11 14:18 [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU Bharata B Rao
                   ` (3 preceding siblings ...)
  2012-06-18 15:36 ` [Qemu-devel] [RFC PATCH 0/3] GlusterFS support in QEMU Stefan Hajnoczi
@ 2012-07-06  5:35 ` Bharata B Rao
  4 siblings, 0 replies; 17+ messages in thread
From: Bharata B Rao @ 2012-07-06  5:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Anand Avati, Amar Tumballi, Vijay Bellur

On Mon, Jun 11, 2012 at 07:48:07PM +0530, Bharata B Rao wrote:
> 
> # qemu-system-x86_64 --enable-kvm --nographic -smp 4 -m 1024 -drive file=gluster:c-qemu.vol:/F16,format=gluster
> 

If you notice above, I am directly feeding the gluster volume file to QEMU.
During my discussions with GlusterFS developers, I realized that it is more
appropriate to use the volume name than the volume file directly. Hence I am
proposing that QEMU take the gluster specification like this:

-drive file=gluster:volfileserver:volumename:imagename

Note that the volfile server and volume name combination is the standard way
of specifying a volume in the gluster world.
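
For example (with a hypothetical server name "server1" and volume name
"myvol"):

-drive file=gluster:server1:myvol:/F16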

QEMU would pass volfileserver:volumename to libglusterfsclient which will
contact the server and fetch the right client volume file to use. This
specification change has two side effects:

1. It limits the QEMU user to volfiles that are generated only by the
   gluster CLI. This should be fine as long as the gluster CLI provides
   the capability to generate or regenerate volume files for a given volume
   with the xlator set that the QEMU user is interested in. GlusterFS
   developers tell me that this can be provided with some enhancements to
   gluster CLI/glusterd.

   Note that custom volume files are typically needed when the GlusterFS
   server is co-located with QEMU, in which case you would like to eliminate
   the client-server and RPC communication overhead.

2. As of now, the specification of volfileserver:volumename picks the
   default client volume file for the given volume name. Since we want to
   have the flexibility to pick the volfile of our choice, we could slightly
   extend the specification like this:

   volfileserver:volumename.custom

   that would instruct gluster to pick a custom volume file.

   Eg. volfileserver:myvol.rpcbypass would pick myvol.rpcbypass.vol volume
   file for the volume myvol.

   Here also, the creation of the custom volume file is done by the gluster
   CLI, so that gluster is in a position to pick the right volume file.

   As per my current understanding, such custom volfile extensions work with
   gluster.

I request QEMU developers to comment on the new specification and point out
any use cases that wouldn't be possible or would be hindered by the
volfileserver:volumename specification.

I also request the GlusterFS developers (on CC) to validate my understanding
of gluster here and let me know if the proposed enhancements to
gluster CLI/glusterd are indeed feasible.

Regards,
Bharata.
