From: Stefan Hajnoczi <stefanha@gmail.com>
To: harryxiyou@gmail.com
Cc: kwolf@redhat.com, cloudxy@googlegroups.com, kanghua151@gmail.com,
	qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH] HLFS driver for QEMU
Date: Mon, 18 Mar 2013 12:10:28 +0100
Message-ID: <20130318111028.GC11058@stefanha-thinkpad.redhat.com>
In-Reply-To: <1363589264-31606-1-git-send-email-harryxiyou@gmail.com>

On Mon, Mar 18, 2013 at 02:47:44PM +0800, harryxiyou@gmail.com wrote:
> From: Harry Wei <harryxiyou@gmail.com>
> 
> HLFS is an HDFS-based (Hadoop Distributed File System) Log-Structured File
> System. Strictly speaking, HLFS is currently not a file system but a
> block-storage system: we simplified LFS to fit block-level storage, so you
> could also call it HLBS (HDFS-based Log-Structured Block-storage System).
> HLFS has two modes, a local mode and an HDFS mode. HDFS is write-once,
> read-many, so HLFS implements LBS (Log-Structured Block Storage) to support
> random reads and writes. LBS is based on LFS's basic ideas but differs from
> it in that LBS fits block storage better. See
> http://code.google.com/p/cloudxy/wiki/WHAT_IS_CLOUDXY
> for details about HLFS.
> 
> Currently, HLFS supports the following features:
> 
> 1. Portions of POSIX --- only the interfaces a VM image needs.
> 2. Random read/write.
> 3. Large file storage (TB scale).
> 4. Snapshots (linear and tree snapshots), clones, block compression,
>    caching, etc.
> 5. Multiple copies of the data (replication).
> 6. Dynamic cluster expansion.
> ...
> 
> Signed-off-by: Harry Wei <harryxiyou@gmail.com>

Is HLFS making releases that distros can package?  I don't see packages
in Debian or Fedora.

Block drivers in qemu.git should have active and sustainable communities
behind them.  For example, if this is a research project where the team
will move on within a year, then it may not be appropriate to merge the
code into QEMU.  Can you share some background on the HLFS community and
where the project is heading?

> ---
>  block/Makefile.objs |    2 +-
>  block/hlfs.c        |  515 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  configure           |   51 +++++
>  3 files changed, 567 insertions(+), 1 deletion(-)
>  create mode 100644 block/hlfs.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index c067f38..723c7a5 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -8,7 +8,7 @@ block-obj-$(CONFIG_POSIX) += raw-posix.o
>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>  
>  ifeq ($(CONFIG_POSIX),y)
> -block-obj-y += nbd.o sheepdog.o
> +block-obj-y += nbd.o sheepdog.o hlfs.o

Missing CONFIG_HLFS to compile out hlfs.o.
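For example, other optional drivers in this file are gated by CONFIG_*
variables; hlfs.o could follow the same pattern (assuming configure gains an
hlfs probe that sets CONFIG_HLFS):

    block-obj-$(CONFIG_HLFS) += hlfs.o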

> +static int hlbs_write(BlockDriverState *bs, int64_t sector_num, const uint8_t
> +        *buf, int nb_sectors)
> +{
> +    int ret;
> +    BDRVHLBSState *s = bs->opaque;
> +    uint32_t write_size = nb_sectors * 512;
> +    uint64_t offset = sector_num * 512;
> +    ret = hlfs_write(s->hctrl, (char *)buf, write_size, offset);
> +    if (ret != write_size) {
> +        return -EIO;
> +    }
> +    return 0;
> +}
> +
> +static int hlbs_read(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
> +        int nb_sectors)
> +{
> +    int ret;
> +    BDRVHLBSState *s = bs->opaque;
> +    uint32_t read_size = nb_sectors * 512;
> +    uint64_t offset = sector_num * 512;
> +    ret = hlfs_read(s->hctrl, (char *)buf, read_size, offset);
> +    if (ret != read_size) {
> +        return -EIO;
> +    }
> +    return 0;
> +}
> +
> +static int hlbs_flush(BlockDriverState *bs)
> +{
> +    int ret;
> +    BDRVHLBSState *s = bs->opaque;
> +    ret = hlfs_flush(s->hctrl);
> +    return ret;
> +}

read/write/flush should be either .bdrv_co_* or .bdrv_aio_*.

The current code pauses the guest while I/O is in progress!  Try running
disk I/O benchmarks inside the guest and you'll see that performance and
interactivity are poor.

QEMU has two ways of implementing efficient block drivers:

1. Coroutines - .bdrv_co_*

Good for image formats or block drivers that have internal logic.

Each request runs inside a coroutine - see include/block/coroutine.h.
In order to wait for I/O, submit an asynchronous request (or hand the work
to a worker thread) and yield the coroutine.  When the request completes,
re-enter the coroutine and continue (a sketch of this pattern follows below).

Examples: block/sheepdog.c or block/qcow2.c.

2. Asynchronous I/O - .bdrv_aio_*

Good for low-level block drivers that have little logic.

The request processing code is split into callbacks.  I/O requests are
submitted and the code then returns to QEMU's main loop.  When an I/O
request completes, a callback is invoked.

Examples: block/rbd.c or block/qed.c.
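
To make option 1 concrete, here is a rough sketch of what a coroutine-based
read function could look like.  Note that hlfs_aio_read() is made up for
illustration -- libhlfs would need an asynchronous entry point (or a
worker-thread wrapper) of roughly this shape -- and the completion callback
is assumed to run from the main loop after the coroutine has yielded:

typedef struct {
    Coroutine *co;
    int ret;
} HLBSRequest;

/* Completion callback: record the result and wake the waiting coroutine */
static void hlbs_rw_cb(void *opaque, int ret)
{
    HLBSRequest *req = opaque;

    req->ret = ret;
    qemu_coroutine_enter(req->co, NULL);
}

static coroutine_fn int hlbs_co_readv(BlockDriverState *bs, int64_t sector_num,
                                      int nb_sectors, QEMUIOVector *qiov)
{
    BDRVHLBSState *s = bs->opaque;
    HLBSRequest req = { .co = qemu_coroutine_self() };

    /* Hypothetical async call: submit the request and return immediately */
    hlfs_aio_read(s->hctrl, qiov, nb_sectors * 512, sector_num * 512,
                  hlbs_rw_cb, &req);

    qemu_coroutine_yield();    /* let the main loop run while I/O is pending */
    return req.ret;            /* resumed by hlbs_rw_cb() */
}

The BlockDriver would then register .bdrv_co_readv/.bdrv_co_writev instead
of .bdrv_read/.bdrv_write, and the guest keeps running while requests are
in flight.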

> +static int hlbs_snapshot_delete(BlockDriverState *bs, const char *snapshot)
> +{
> +    /* FIXME: Delete specified snapshot id.  */

Outdated comment?

> +    BDRVHLBSState *s = bs->opaque;
> +    int ret = 0;
> +    ret = hlfs_rm_snapshot(s->hctrl->storage->uri, snapshot);
> +    return ret;
> +}
> +
> +static int hlbs_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
> +{
> +    BDRVHLBSState *s = bs->opaque;
> +    /*Fixed: num_entries must be inited 0*/

Comment can be dropped?

> +    int num_entries = 0;
> +    struct snapshot *snapshots = hlfs_get_all_snapshots(s->hctrl->storage->uri,

I suggest using "hlfs_" as a prefix for libhlfs structs.  Otherwise you
risk name collisions.

> +        &num_entries);
> +    dprintf("snapshot count:%d\n", num_entries);
> +    /*Fixed: snapshots is NULL condition*/

Comment can be dropped?

> +    if (NULL == snapshots) {
> +        dprintf("snapshots is NULL, may be no snapshots, check it please!\n");
> +        return num_entries;
> +    }
> +    QEMUSnapshotInfo *sn_tab = NULL;
> +    struct snapshot *snapshot = snapshots;
> +    sn_tab = g_malloc0(num_entries * sizeof(*sn_tab));
> +    int i;
> +    for (i = 0; i < num_entries; i++) {
> +        printf("---snapshot:%s----", snapshot->sname);

Debugging code?

> +        sn_tab[i].date_sec = snapshot->timestamp * 1000;
> +        sn_tab[i].date_nsec = 0;
> +        sn_tab[i].vm_state_size = 0;
> +        sn_tab[i].vm_clock_nsec = 0;
> +        snprintf(sn_tab[i].id_str, sizeof(sn_tab[i].id_str), "%u", 0);
> +        strncpy(sn_tab[i].name, snapshot->sname, MIN(sizeof(sn_tab[i].name),
> +                    strlen(snapshot->sname)));

Please use snprintf() instead.  strncpy() does not NUL-terminate when
strlen(input) >= bufsize.
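For example:

    snprintf(sn_tab[i].name, sizeof(sn_tab[i].name), "%s", snapshot->sname);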

> +        snapshot++;
> +    }
> +    *psn_tab = sn_tab;
> +    if (snapshots != NULL) {
> +        g_free(snapshots);
> +    }

g_free(NULL) is a no-op, so the if is not needed.
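The tail of the function can then simply be:

    *psn_tab = sn_tab;
    g_free(snapshots);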

> +    return num_entries;
> +}
> +
> +static QEMUOptionParameter hlbs_create_options[] = {
> +    {
> +        .name = BLOCK_OPT_SIZE,
> +        .type = OPT_SIZE,
> +        .help = "Virtual disk size"
> +    }, {
> +        .name = BLOCK_OPT_BACKING_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a base image"
> +    }, {
> +        .name = BLOCK_OPT_PREALLOC,
> +        .type = OPT_STRING,
> +        .help = "Preallocation mode (allowed values: off, full)"
> +    }, { NULL }
> +};
> +
> +static BlockDriver bdrv_hlbs = {
> +    .format_name    = "hlfs",
> +    .protocol_name  = "hlfs",
> +    .instance_size  = sizeof(BDRVHLBSState),
> +    .bdrv_file_open = hlbs_open,
> +    .bdrv_close     = hlbs_close,
> +    .bdrv_create    = hlbs_create,
> +    .bdrv_getlength = hlbs_getlength,
> +    .bdrv_read  = hlbs_read,
> +    .bdrv_write = hlbs_write,

.bdrv_flush is missing.

> +    .bdrv_get_allocated_file_size  = hlbs_get_allocated_file_size,
> +    .bdrv_snapshot_create   = hlbs_snapshot_create,
> +    .bdrv_snapshot_goto     = hlbs_snapshot_goto,
> +    .bdrv_snapshot_delete   = hlbs_snapshot_delete,
> +    .bdrv_snapshot_list     = hlbs_snapshot_list,
> +    .create_options = hlbs_create_options,
> +};
> +
> +static void bdrv_hlbs_init(void)
> +{
> +    if (log4c_init()) {
> +        g_message("log4c_init error!");
> +    }

This seems to be for libhlfs since the QEMU code doesn't use log4c.  If
it's safe to call this function more than once, please call it inside
libhlfs instead of requiring every program that links against your
library to call it.

> @@ -2813,6 +2820,45 @@ if compile_prog "" "" ; then
>  fi
>  
>  ##########################################
> +# hlfs probe
> +if test "$hlfs" != "no" ; then
> +    cat > $TMPC <<EOF
> +#include <stdio.h>
> +#include <api/hlfs.h>
> +    int main(void) {
> +            return 0;
> +    }
> +EOF
> +    INC_GLIB_DIR=`pkg-config --cflags glib-2.0`

QEMU uses gthread-2.0 instead of the single-threaded glib-2.0.

You can reuse glib_cflags instead of probing again here.
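For example:

    CFLAGS="$glib_cflags $CFLAGS"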

> +    HLFS_DIR=`echo $HLFS_INSTALL_PATH`
> +    JVM_DIR=`echo $JAVA_HOME`
> +
> +    if [ `getconf LONG_BIT` -eq "64" ];then
> +    CLIBS="-L$JVM_DIR/jre/lib/amd64/server/ $CLIBS"
> +    fi
> +    if [ `getconf LONG_BIT` -eq "32" ];then
> +    CLIBS="-L$JVM_DIR/jre/lib/i386/server/ $CLIBS"
> +    fi

Hopefully there is a cleaner, more portable way to do this (e.g.
pkg-config).

> +    CLIBS="-L$HLFS_DIR/lib/ $CLIBS"
> +    CFLAGS="-I$HLFS_DIR/include $CFLAGS"
> +    CFLAGS="$INC_GLIB_DIR $CFLAGS"
> +
> +    hlfs_libs="$CLIBS -lhlfs -llog4c -lglib-2.0 -lgthread-2.0 -lrt -lhdfs -ljvm"

libhlfs should support pkg-config so that QEMU does not need to know
about -llog4c -lhdfs -ljvm.
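If libhlfs shipped an hlfs.pc file (hypothetical), the whole probe could
boil down to something like:

    hlfs_cflags=`pkg-config --cflags hlfs`
    hlfs_libs=`pkg-config --libs hlfs`

and the JVM/HDFS/log4c dependencies would be pulled in automatically.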

> +    echo "999 CLIBS is $CLIBS\n"
> +    echo "999 CFLAGS is $CFLAGS\n"

Debugging?
