Date: Mon, 18 Mar 2013 12:10:28 +0100
From: Stefan Hajnoczi
Message-ID: <20130318111028.GC11058@stefanha-thinkpad.redhat.com>
References: <1363589264-31606-1-git-send-email-harryxiyou@gmail.com>
In-Reply-To: <1363589264-31606-1-git-send-email-harryxiyou@gmail.com>
Subject: Re: [Qemu-devel] [PATCH] HLFS driver for QEMU
To: harryxiyou@gmail.com
Cc: kwolf@redhat.com, cloudxy@googlegroups.com, kanghua151@gmail.com, qemu-devel@nongnu.org, stefanha@redhat.com

On Mon, Mar 18, 2013 at 02:47:44PM +0800, harryxiyou@gmail.com wrote:
> From: Harry Wei
>
> HLFS is an HDFS-based (Hadoop Distributed File System) Log-Structured File
> System.  Strictly speaking, HLFS is currently not a file system but a
> block-storage system: we simplified LFS to fit block-level storage, so you
> could also call it HLBS (HDFS-based Log-Structured Block-storage System).
> HLFS has two modes, local mode and HDFS mode.  HDFS is write-once/read-many,
> so HLFS implements LBS (Log-Structured Block Storage) on top of it to
> support random reads and writes.  LBS builds on LFS's basic ideas but
> differs from LFS in that it fits block storage better.  See
> http://code.google.com/p/cloudxy/wiki/WHAT_IS_CLOUDXY for details about
> HLFS.
>
> Currently, HLFS supports the following features:
>
> 1. Portions of POSIX --- only the interfaces a VM image needs.
> 2. Random reads and writes.
> 3. Large file storage (TB scale).
> 4. Snapshots (linear and tree snapshots), clones, block compression,
>    caching, etc.
> 5. Multiple copies (replicas) of the data.
> 6. Dynamic cluster expansion.
> ...
>
> Signed-off-by: Harry Wei

Is HLFS making releases that distros can package?  I don't see packages in
Debian or Fedora.

Block drivers in qemu.git should have active and sustainable communities
behind them.  For example, if this is a research project where the team
will move on within a year, then it may not be appropriate to merge the
code into QEMU.

Can you share some background on the HLFS community and where the project
is heading?

> ---
>  block/Makefile.objs |    2 +-
>  block/hlfs.c        |  515 ++++++++++++++++++++++++++++++++++++++++++++++++
>  configure           |   51 +++++
>  3 files changed, 567 insertions(+), 1 deletion(-)
>  create mode 100644 block/hlfs.c
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index c067f38..723c7a5 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -8,7 +8,7 @@ block-obj-$(CONFIG_POSIX) += raw-posix.o
>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>  
>  ifeq ($(CONFIG_POSIX),y)
> -block-obj-y += nbd.o sheepdog.o
> +block-obj-y += nbd.o sheepdog.o hlfs.o

This is missing a CONFIG_HLFS variable so that hlfs.o can be compiled out
when HLFS support is disabled.
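Just to illustrate the usual pattern (this line is a suggestion, not taken
from the patch):

    block-obj-$(CONFIG_HLFS) += hlfs.o

with configure emitting CONFIG_HLFS=y into config-host.mak only when the
HLFS probe succeeds and HLFS support is actually enabled.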
> +static int hlbs_write(BlockDriverState *bs, int64_t sector_num, const uint8_t
> +                      *buf, int nb_sectors)
> +{
> +    int ret;
> +    BDRVHLBSState *s = bs->opaque;
> +    uint32_t write_size = nb_sectors * 512;
> +    uint64_t offset = sector_num * 512;
> +    ret = hlfs_write(s->hctrl, (char *)buf, write_size, offset);
> +    if (ret != write_size) {
> +        return -EIO;
> +    }
> +    return 0;
> +}
> +
> +static int hlbs_read(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
> +                     int nb_sectors)
> +{
> +    int ret;
> +    BDRVHLBSState *s = bs->opaque;
> +    uint32_t read_size = nb_sectors * 512;
> +    uint64_t offset = sector_num * 512;
> +    ret = hlfs_read(s->hctrl, (char *)buf, read_size, offset);
> +    if (ret != read_size) {
> +        return -EIO;
> +    }
> +    return 0;
> +}
> +
> +static int hlbs_flush(BlockDriverState *bs)
> +{
> +    int ret;
> +    BDRVHLBSState *s = bs->opaque;
> +    ret = hlfs_flush(s->hctrl);
> +    return ret;
> +}

read/write/flush should be either .bdrv_co_* or .bdrv_aio_*.  The current
code pauses the guest while I/O is in progress!  Try running disk I/O
benchmarks inside the guest and you'll see that performance and
interactivity are poor.

QEMU has two ways of implementing efficient block drivers:

1. Coroutines - .bdrv_co_*

   Good for image formats or block drivers that have internal logic.  Each
   request runs inside a coroutine - see include/block/coroutine.h.  In
   order to wait for I/O, submit an asynchronous request (or hand the work
   to a worker thread) and yield the coroutine.  When the request
   completes, re-enter the coroutine and continue.

   Examples: block/sheepdog.c or block/qcow2.c.

2. Asynchronous I/O - .bdrv_aio_*

   Good for low-level block drivers that have little logic.  The request
   processing code is split into callbacks.  I/O requests are submitted and
   then the code returns back to QEMU's main loop.  When the I/O request
   completes, a callback is invoked.

   Examples: block/rbd.c or block/qed.c.

There is a rough sketch of what coroutine-style entry points for this
driver could look like at the end of this mail.

> +static int hlbs_snapshot_delete(BlockDriverState *bs, const char *snapshot)
> +{
> +    /* FIXME: Delete specified snapshot id. */

Outdated comment?

> +    BDRVHLBSState *s = bs->opaque;
> +    int ret = 0;
> +    ret = hlfs_rm_snapshot(s->hctrl->storage->uri, snapshot);
> +    return ret;
> +}
> +
> +static int hlbs_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
> +{
> +    BDRVHLBSState *s = bs->opaque;
> +    /*Fixed: num_entries must be inited 0*/

Comment can be dropped?

> +    int num_entries = 0;
> +    struct snapshot *snapshots = hlfs_get_all_snapshots(s->hctrl->storage->uri,

I suggest using "hlfs_" as a prefix for libhlfs structs.  Otherwise you
risk name collisions.

> +                                                        &num_entries);
> +    dprintf("snapshot count:%d\n", num_entries);
> +    /*Fixed: snapshots is NULL condition*/

Comment can be dropped?

> +    if (NULL == snapshots) {
> +        dprintf("snapshots is NULL, may be no snapshots, check it please!\n");
> +        return num_entries;
> +    }
> +    QEMUSnapshotInfo *sn_tab = NULL;
> +    struct snapshot *snapshot = snapshots;
> +    sn_tab = g_malloc0(num_entries * sizeof(*sn_tab));
> +    int i;
> +    for (i = 0; i < num_entries; i++) {
> +        printf("---snapshot:%s----", snapshot->sname);

Debugging code?

> +        sn_tab[i].date_sec = snapshot->timestamp * 1000;
> +        sn_tab[i].date_nsec = 0;
> +        sn_tab[i].vm_state_size = 0;
> +        sn_tab[i].vm_clock_nsec = 0;
> +        snprintf(sn_tab[i].id_str, sizeof(sn_tab[i].id_str), "%u", 0);
> +        strncpy(sn_tab[i].name, snapshot->sname, MIN(sizeof(sn_tab[i].name),
> +                strlen(snapshot->sname)));

Please use snprintf() instead.  strncpy() does not NUL-terminate the
destination when strlen(input) >= bufsize.
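For illustration only (not taken from the patch), something like

    snprintf(sn_tab[i].name, sizeof(sn_tab[i].name), "%s", snapshot->sname);

always NUL-terminates and truncates the name safely.  QEMU's pstrcpy()
helper would also work here.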
> +        snapshot++;
> +    }
> +    *psn_tab = sn_tab;
> +    if (snapshots != NULL) {
> +        g_free(snapshots);
> +    }

g_free(NULL) is a no-op, so the if is not needed.

> +    return num_entries;
> +}
> +
> +static QEMUOptionParameter hlbs_create_options[] = {
> +    {
> +        .name = BLOCK_OPT_SIZE,
> +        .type = OPT_SIZE,
> +        .help = "Virtual disk size"
> +    }, {
> +        .name = BLOCK_OPT_BACKING_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a base image"
> +    }, {
> +        .name = BLOCK_OPT_PREALLOC,
> +        .type = OPT_STRING,
> +        .help = "Preallocation mode (allowed values: off, full)"
> +    }, { NULL }
> +};
> +
> +static BlockDriver bdrv_hlbs = {
> +    .format_name = "hlfs",
> +    .protocol_name = "hlfs",
> +    .instance_size = sizeof(BDRVHLBSState),
> +    .bdrv_file_open = hlbs_open,
> +    .bdrv_close = hlbs_close,
> +    .bdrv_create = hlbs_create,
> +    .bdrv_getlength = hlbs_getlength,
> +    .bdrv_read = hlbs_read,
> +    .bdrv_write = hlbs_write,

.bdrv_flush is missing.

> +    .bdrv_get_allocated_file_size = hlbs_get_allocated_file_size,
> +    .bdrv_snapshot_create = hlbs_snapshot_create,
> +    .bdrv_snapshot_goto = hlbs_snapshot_goto,
> +    .bdrv_snapshot_delete = hlbs_snapshot_delete,
> +    .bdrv_snapshot_list = hlbs_snapshot_list,
> +    .create_options = hlbs_create_options,
> +};
> +
> +static void bdrv_hlbs_init(void)
> +{
> +    if (log4c_init()) {
> +        g_message("log4c_init error!");
> +    }

This seems to be for libhlfs since the QEMU code doesn't use log4c.  If it
is safe to call this function more than once, please do it inside libhlfs
instead of requiring every program that links against your library to call
it.

> @@ -2813,6 +2820,45 @@ if compile_prog "" "" ; then
>  fi
>  
>  ##########################################
> +# hlfs probe
> +if test "$hlfs" != "no" ; then
> +  cat > $TMPC <<EOF
> +#include
> +#include
> +  int main(void) {
> +      return 0;
> +  }
> +EOF

> +  INC_GLIB_DIR=`pkg-config --cflags glib-2.0`

QEMU uses gthread-2.0 instead of the single-threaded glib-2.0.  You can
reuse glib_cflags instead of probing again here.

> +  HLFS_DIR=`echo $HLFS_INSTALL_PATH`
> +  JVM_DIR=`echo $JAVA_HOME`
> +
> +  if [ `getconf LONG_BIT` -eq "64" ];then
> +      CLIBS="-L$JVM_DIR/jre/lib/amd64/server/ $CLIBS"
> +  fi
> +  if [ `getconf LONG_BIT` -eq "32" ];then
> +      CLIBS="-L$JVM_DIR/jre/lib/i386/server/ $CLIBS"
> +  fi

Hopefully there is a cleaner, more portable way to do this (e.g. pkg-config).

> +  CLIBS="-L$HLFS_DIR/lib/ $CLIBS"
> +  CFLAGS="-I$HLFS_DIR/include $CFLAGS"
> +  CFLAGS="$INC_GLIB_DIR $CFLAGS"
> +
> +  hlfs_libs="$CLIBS -lhlfs -llog4c -lglib-2.0 -lgthread-2.0 -lrt -lhdfs -ljvm"

libhlfs should support pkg-config so that QEMU does not need to know about
-llog4c, -lhdfs and -ljvm.

> +  echo "999 CLIBS is $CLIBS\n"
> +  echo "999 CFLAGS is $CFLAGS\n"

Leftover debugging output?
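As promised above, here is a rough sketch of what coroutine-style
read/write entry points for this driver could look like.  It is only an
illustration, not code from the patch, and the hlbs_co_readv()/
hlbs_co_writev() names are made up: hlfs_read()/hlfs_write() still block
here, so a real conversion would hand the call off to a worker thread or
asynchronous request and yield the coroutine until it completes.

    static coroutine_fn int hlbs_co_readv(BlockDriverState *bs,
                                          int64_t sector_num, int nb_sectors,
                                          QEMUIOVector *qiov)
    {
        BDRVHLBSState *s = bs->opaque;
        int len = nb_sectors * 512;
        uint64_t offset = sector_num * 512;
        char *buf = g_malloc(len);
        int ret;

        /* XXX still synchronous - push this into a worker thread and
         * qemu_coroutine_yield() until it completes */
        ret = hlfs_read(s->hctrl, buf, len, offset);
        if (ret != len) {
            g_free(buf);
            return -EIO;
        }
        qemu_iovec_from_buf(qiov, 0, buf, len);
        g_free(buf);
        return 0;
    }

    static coroutine_fn int hlbs_co_writev(BlockDriverState *bs,
                                           int64_t sector_num, int nb_sectors,
                                           QEMUIOVector *qiov)
    {
        BDRVHLBSState *s = bs->opaque;
        int len = nb_sectors * 512;
        uint64_t offset = sector_num * 512;
        char *buf = g_malloc(len);
        int ret;

        qemu_iovec_to_buf(qiov, 0, buf, len);
        /* XXX still synchronous - same comment as above */
        ret = hlfs_write(s->hctrl, buf, len, offset);
        g_free(buf);
        return ret == len ? 0 : -EIO;
    }

The BlockDriver definition would then use

    .bdrv_co_readv  = hlbs_co_readv,
    .bdrv_co_writev = hlbs_co_writev,

instead of .bdrv_read/.bdrv_write, plus .bdrv_co_flush_to_disk instead of
the synchronous flush.

On the configure side, if libhlfs shipped a pkg-config file (say hlfs.pc -
a hypothetical name), the whole probe could shrink to something along the
lines of:

    if $pkg_config --exists hlfs; then
        hlfs_cflags=$($pkg_config --cflags hlfs)
        hlfs_libs=$($pkg_config --libs hlfs)
        hlfs=yes
    fi

and QEMU would not need to know about -llog4c, -lhdfs, -ljvm or the JVM
library paths at all.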