From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:56044) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bzVpM-0005pF-7L for qemu-devel@nongnu.org; Wed, 26 Oct 2016 17:33:43 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bzVpI-0005E1-RL for qemu-devel@nongnu.org; Wed, 26 Oct 2016 17:33:40 -0400 Received: from mail1.bemta12.messagelabs.com ([216.82.251.8]:49365) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1bzVpI-0005D3-Gs for qemu-devel@nongnu.org; Wed, 26 Oct 2016 17:33:36 -0400 From: Buddhi Madhav Date: Wed, 26 Oct 2016 21:33:30 +0000 Message-ID: References: <1477432927-5451-1-git-send-email-ashish.mittal@veritas.com> <20161026044128.GA2677@localhost.localdomain> In-Reply-To: <20161026044128.GA2677@localhost.localdomain> Content-Language: en-US Content-Type: text/plain; charset="iso-8859-1" Content-ID: <6DAC40738E296D4A9A868AD3493A9513@veritas.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [Qemu-devel] [PATCH v1] block/vxhs: Add Veritas HyperScale VxHS block device support List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jeff Cody , Ashish Mittal Cc: "qemu-devel@nongnu.org" , "pbonzini@redhat.com" , "kwolf@redhat.com" , "armbru@redhat.com" , "berrange@redhat.com" , "famz@redhat.com" , Ashish Mittal , "stefanha@gmail.com" , Rakesh Ranjan , Ketan Nilangekar , Abhijit Dey On 10/25/16, 9:41 PM, "Jeff Cody" wrote: >On Tue, Oct 25, 2016 at 03:02:07PM -0700, Ashish Mittal wrote: >> This patch adds support for a new block device type called "vxhs". >> Source code for the library that this code loads can be downloaded from= : >> https://github.com/MittalAshish/libqnio.git >> > >I grabbed the latest of libqnio, compiled it (had to disable -Werror), an= d >tried it out. I was able to do a qemu-img info on a raw file, but it >would >just hang when trying a format such as qcow2. I am assuming >this is a limitation of test_server, and not libqnio. On my build I did not get any build errors. The qcow2 issue is to do with the limitation in the test server, which we will fix in a seperate patch. > >This will make qemu-iotests more difficult however. > >I haven't looked at the latest qnio code yet (other than compiling the >test-server to test), so the rest of this review is on the qemu driver. > >> Sample command line using JSON syntax: >> ./qemu-system-x86_64 -name=20instance-00000008 -S -vnc 0.0.0.0:0 -k en-= us >>-vga cirrus -device virtio-balloon-pci,id=3Dballoon0,bus=3Dpci.0,addr=3D= 0x5 >>-msg timestamp=3Don >>'json:{"driver":"vxhs","vdisk_id":"{c3e9095a-a5ee-4dce-afeb-2a59fb387410= } >>","server":[{"host":"172.172.17.4","port":"9999"}]}' >>=20 >> Sample command line using URI syntax: >> qemu-img convert -f raw -O raw -n >>/var/lib/nova/instances/_base/0c5eacd5ebea5ed914b6a3e7b18f1ce734c386ad >>vxhs://192.168.0.1:9999/%7Bc6718f6b-0401-441d-a8c3-1f0064d75ee0%7D >>=20 >> Signed-off-by: Ashish Mittal >> --- >> block/Makefile.objs | 2 + >> block/trace-events | 22 ++ >> block/vxhs.c | 736 >>++++++++++++++++++++++++++++++++++++++++++++++++++++ >> configure | 41 +++jj >> 4 files changed, 801 insertions(+) >> create mode 100644 block/vxhs.c > >I think this version still does not address Daniel's concerns regarding a= >QAPI schema for vxhs. We are working on QAPI schema changes and will submit them in separate patch. > >We are also still needing qemu-iotests, and a test-server suitable to run= >the tests. > >>=20 >> diff --git a/block/Makefile.objs b/block/Makefile.objs >> index 67a036a..58313a2 100644 >> --- a/block/Makefile.objs >> +++ b/block/Makefile.objs >> @@ -18,6 +18,7 @@ block-obj-$(CONFIG_LIBNFS) +=3D nfs.o >> block-obj-$(CONFIG_CURL) +=3D curl.o >> block-obj-$(CONFIG_RBD) +=3D rbd.o >> block-obj-$(CONFIG_GLUSTERFS) +=3D gluster.o >> +block-obj-$(CONFIG_VXHS) +=3D vxhs.o >> block-obj-$(CONFIG_ARCHIPELAGO) +=3D archipelago.o >> block-obj-$(CONFIG_LIBSSH2) +=3D ssh.o >> block-obj-y +=3D accounting.o dirty-bitmap.o >> @@ -38,6 +39,7 @@ rbd.o-cflags :=3D $(RBD_CFLAGS) >> rbd.o-libs :=3D $(RBD_LIBS) >> gluster.o-cflags :=3D $(GLUSTERFS_CFLAGS) >> gluster.o-libs :=3D $(GLUSTERFS_LIBS) >> +vxhs.o-libs :=3D $(VXHS_LIBS) >> ssh.o-cflags :=3D $(LIBSSH2_CFLAGS) >> ssh.o-libs :=3D $(LIBSSH2_LIBS) >> archipelago.o-libs :=3D $(ARCHIPELAGO_LIBS) >> diff --git a/block/trace-events b/block/trace-events >> index 05fa13c..aea97cb=20100644 >> --- a/block/trace-events >> +++ b/block/trace-events >> @@ -114,3 +114,25 @@ qed_aio_write_data(void *s, void *acb, int ret, >>uint64_t offset, size_t len) "s >> qed_aio_write_prefill(void *s, void *acb, uint64_t start, size_t len, >>uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64 >> qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len,= >>uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64 >> qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, >>size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu" >> + >> +# block/vxhs.c >> +vxhs_iio_callback(int error, int reason) "ctx is NULL: error %d, >>reason %d" >> +vxhs_setup_qnio(void *s) "Context to HyperScale IO manager =3D %p" >> +vxhs_iio_callback_chnfail(int err, int error) "QNIO channel failed, no= >>i/o %d, %d" >> +vxhs_iio_callback_unknwn(int opcode, int err) "unexpected opcode %d, >>errno %d" >> +vxhs_open_fail(int ret) "Could not open the device. Error =3D %d" >> +vxhs_open_epipe(int ret) "Could not create a pipe for device. Bailing >>out. Error=3D%d" >> +vxhs_aio_rw_invalid(int req) "Invalid I/O request iodir %d" >> +vxhs_aio_rw_ioerr(char *guid, int iodir, uint64_t size, uint64_t off, >>void *acb, int ret, int err) "IO ERROR (vDisk %s) FOR : Read/Write =3D %= d >>size =3D %lu offset =3D %lu ACB =3D %p. Error =3D %d, errno =3D %d" >> +vxhs_get_vdisk_stat_err(char *guid, int ret, int err) "vDisk (%s) stat= >>ioctl failed, ret =3D %d, errno =3D %d" >> +vxhs_get_vdisk_stat(char *vdisk_guid, uint64_t vdisk_size) "vDisk %s >>stat ioctl returned size %lu" >> +vxhs_qnio_iio_open(const char *ip) "Failed to connect to storage agent= >>on host-ip %s" >> +vxhs_qnio_iio_devopen(const char *fname) "Failed to open vdisk device:= >>%s" >> +vxhs_complete_aio(void *acb, uint64_t ret) "aio failed acb %p ret %ld"= >> +vxhs_parse_uri_filename(const char *filename) "URI passed via >>bdrv_parse_filename %s" >> +vxhs_qemu_init_vdisk(const char *vdisk_id) "vdisk_id from json %s" >> +vxhs_qemu_init_numservers(int num_servers) "Number of servers passed =3D= >>%d" >> +vxhs_parse_uri_hostinfo(int num, char *host, int port) "Host %d: IP >>%s, Port %d" >> +vxhs_qemu_init(char *of_vsa_addr, int port) "Adding host %s:%d to >>BDRVVXHSState" >> +vxhs_qemu_init_filename(const char *filename) "Filename passed as %s" >> +vxhs_close(char *vdisk_guid) "Closing vdisk %s" >> diff --git a/block/vxhs.c b/block/vxhs.c >> new file mode 100644 >> index 0000000..97fb804 >> --- /dev/null >> +++ b/block/vxhs.c >> @@ -0,0 +1,736 @@ >> +/* >> + * QEMU Block driver for Veritas HyperScale (VxHS) >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2 or >>later. >> + * See the COPYING file in the top-level directory. >> + * >> + */ >> + >> +#include "qemu/osdep.h" >> +#include "block/block_int.h" >> +#include >> +#include "qapi/qmp/qerror.h" >> +#include "qapi/qmp/qdict.h" >> +#include "qapi/qmp/qstring.h" >> +#include "trace.h" >> +#include "qemu/uri.h" >> +#include "qapi/error.h" >> +#include "qemu/error-report.h" >>=20+ >> +#define VDISK_FD_READ 0 >> +#define VDISK_FD_WRITE 1 >> + >> +#define VXHS_OPT_FILENAME "filename" >> +#define VXHS_OPT_VDISK_ID "vdisk_id" >> +#define VXHS_OPT_SERVER "server." >> +#define VXHS_OPT_HOST "host" >> +#define VXHS_OPT_PORT "port" >> + >> +/* qnio client ioapi_ctx */ >> +static void *global_qnio_ctx; >> + > > >Why create a qnio ctx as a static global, rather than confining it to >BDRVVXHSState, so that it is unique to each disk instance? > >Is the following usage supported (i.e. two different vxhs remote servers)= : > >qemu-system-x86_64 -drive file=3Dvxhs://server1:9999/test1 \ > -drive file=3Dvxhs://server2:9999/test2 =B3global_qnio_ctx=B2 is a global context, not per-disk context. Removed t= he qnio_ctx field >>From =B3BDRVVXHSState=B2 structure. The above command should work. > > >Why store the qnio ctx into this global briefly, and just to move it to >where it belongs in the BDRVVXHSState struct?=20 It makes more sense to b= e >in >the state struct the whole time, unless I am missing something fundamenta= l >about the protocol. > > >> +/* vdisk prefix to pass to qnio */ >> +static const char vdisk_prefix[] =3D "/dev/of/vdisk"; >> + >> +typedef enum { >> + VXHS_IO_INPROGRESS, >> + VXHS_IO_COMPLETED, >> + VXHS_IO_ERROR >> +} VXHSIOState; >> + >> +typedef enum { >> + VDISK_AIO_READ, >> + VDISK_AIO_WRITE, >> + VDISK_STAT >> +} VDISKAIOCmd; >> + >> +/* >> + * HyperScale AIO callbacks structure >> + */ >> +typedef struct VXHSAIOCB { >> + BlockAIOCB common; >> + int err; >> + int state; >> + int direction; /* IO direction (r/w) */ >> + size_t io_offset; >> + size_t size; >> + QEMUBH *bh; > >*bh is unused. Removed the unused variable. > >> + QEMUIOVector *qiov; >> + QSIMPLEQ_ENTRY(VXHSAIOCB) retry_entry; > >retry_entry is unused. Removed the unused variable. > >> +} VXHSAIOCB; >> + >> +typedef struct VXHSvDiskHostsInfo { >> + int qnio_cfd; /* Channel FD */ >> + int vdisk_rfd; /* vDisk remote FD */ >> + char *hostip; /* Host's IP addresses */ >> + int port; /* Host's port number */ >> +} VXHSvDiskHostsInfo; >> + >> +/* >> + * Structure per vDisk maintained for state >> + */ >> +typedef struct BDRVVXHSState { >> + int fds[2]; >> + int64_t vdisk_size; >> + int64_t vdisk_blocks; > >vdisk_blocks is unused. Removed the unused variable. > >> + int event_reader_pos; >> + VXHSAIOCB *qnio_event_acb; >> + void *qnio_ctx; >> + VXHSvDiskHostsInfo vdisk_hostinfo; /* Per host info */ >> + char *vdisk_guid; >> +} BDRVVXHSState; >> + >> +static void vxhs_qnio_iio_close(BDRVVXHSState *s) >> +{ >> + /* >> + * Close vDisk device >> + */ >> + if (s->vdisk_hostinfo.vdisk_rfd >=3D 0) { >> + iio_devclose(s->qnio_ctx, 0, s->vdisk_hostinfo.vdisk_rfd); >> + s->vdisk_hostinfo.vdisk_rfd =3D -1; >> + } >> + >> + /* >> + * Close QNIO channel against cached channel-fd >> + */ >> + if (s->vdisk_hostinfo.qnio_cfd >=3D 0) { >> + =20 iio_close(s->qnio_ctx, s->vdisk_hostinfo.qnio_cfd); >> + s->vdisk_hostinfo.qnio_cfd =3D -1; >> + } >> +} >> + >> +static int vxhs_qnio_iio_open(int *cfd, const char *of_vsa_addr, >> + int *rfd, const char *file_name) >> +{ >> + /* >> + * Open qnio channel to storage agent if not opened before. >> + */ >> + if (*cfd < 0) { >> + *cfd =3D iio_open(global_qnio_ctx, of_vsa_addr, 0); >> + if (*cfd < 0) { >> + trace_vxhs_qnio_iio_open(of_vsa_addr); >> + return -ENODEV; >> + } >> + } >> + >> + /* >> + * Open vdisk device >> + */ >> + *rfd =3D iio_devopen(global_qnio_ctx, *cfd, file_name, 0); >> + if (*rfd < 0) { >> + if (*cfd >=3D 0) { >> + iio_close(global_qnio_ctx, *cfd); >> + *cfd =3D -1; >> + *rfd =3D -1; >> + } >> + >> + trace_vxhs_qnio_iio_devopen(file_name); >> + return -ENODEV; >> + } >> + >> + return 0; >> +} >> + >> +static void vxhs_iio_callback(int32_t rfd, uint32_t reason, void *ctx,= >> + uint32_t error, uint32_t opcode) >> +{ >> + VXHSAIOCB *acb =3D NULL; >> + BDRVVXHSState *s =3D NULL; >> + ssize_t ret; >> + >> + switch (opcode) { >> + case IRP_READ_REQUEST: >> + case IRP_WRITE_REQUEST: >> + >> + /* >> + * ctx is VXHSAIOCB* >> + * ctx is NULL if error is QNIOERROR_CHANNEL_HUP or >> + * reason is IIO_REASON_HUP >> + */ >> + if (ctx) { >> + acb =3D ctx; >> + s =3D acb->common.bs->opaque; >> + } else { >> + trace_vxhs_iio_callback(error, reason); >> + goto out; >> + } >> + >> + if (error) { >> + if (!acb->err) { >> + acb->err =3D error; >> + } >> + trace_vxhs_iio_callback(error, reason); >> + } >> + >> + ret =3D qemu_write_full(s->fds[VDISK_FD_WRITE], &acb, >>sizeof(acb)); >> + g_assert(ret =3D=3D sizeof(acb)); >> + break; >> + >> + default: >> + if (error =3D=3D QNIOERROR_CHANNEL_HUP) { >> + /* >> + * Channel failed, spontaneous notification, >> + * not in response to I/O >> + */ >> + trace_vxhs_iio_callback_chnfail(error, errno); >> + } else { >> + trace_vxhs_iio_callback_unknwn(opcode, error); >> + } >> + break; >> + } >> +out: >> + return; >> +} >> + >> +static void vxhs_complete_aio(VXHSAIOCB *acb, BDRVVXHSState *s) >> +{ >> + BlockCompletionFunc *cb =3D acb->common.cb; >> + void *opaque =3D acb->common.opaque; >> + int ret =3D 0; >> + >> + if (acb->err !=3D 0) { >> + trace_vxhs_complete_aio(acb, acb->err); >> + /* >> + * We mask all the IO errors generically as EIO for upper >>layers >> + * Right now our IO Manager uses non standard error codes. >>Instead >> + * of confusing upper layers with incorrect interpretation we >>are >> + * doing this workaround. >> + */ >> + ret =3D (-EIO); >> + } >> + >> + acb->state =3D VXHS_IO_COMPLETED; > >What is this state used for? I see it also set to VXHS_IO_INPROGRESS at >the >beginning of the request. This was useful when failover code was implemented. It is no longer used. So removed the =B3state=B2 filed. > >> + qemu_aio_unref(acb); >> + cb(opaque, ret); >> +} >> + >> +/* >> + * This is the HyperScale event handler registered to QEMU. >> + * It is invoked when any IO gets completed and written on pipe >> + * by callback called from QNIO thread context. Then it marks >> + * the AIO as completed, and releases HyperScale AIO callbacks. >> + */ >> +static void vxhs_aio_event_reader(void *opaque) >> +{ >> + BDRVVXHSState *s =3D opaque; >> + char *p; >> + ssize_t ret; >> + >> + do { >> + p =3D (char *)&s->qnio_event_acb; >> + ret =3D read(s->fds[VDISK_FD_READ], p + s->event_reader_pos, >> + sizeof(s->qnio_event_acb) - s->event_reader_pos); >> + if (ret > 0) { >> + =20 s->event_reader_pos +=3D ret; >> + if (s->event_reader_pos =3D=3D sizeof(s->qnio_event_acb)) = { >> + s->event_reader_pos =3D 0; >> + vxhs_complete_aio(s->qnio_event_acb, s); >> + } >> + } >> + } while (ret < 0 && errno =3D=3D EINTR); > >I'm bit confused on this event reader loop. If ret is > 0, but >s->event_reader_pos !=3D sizeof(s->qnio_event_acb), we'll exit the event >reader leaving event_read_pos non-zero. And since we are reading in a >pointer for the acb, what does that mean for the next time we come we hav= e >an aio reader event? On the next reader event, it will read the rest of =B3acb=B2 structure(fro= m s->event_reader_pos), and calls vxhs_complete_aio(). > >> +} >> + >> +/* >> + * Call QNIO operation to create channels to do IO on vDisk. >> + */ >> + >> +static void *vxhs_setup_qnio(void) >> +{ >> + void *qnio_ctx =3D NULL; >> + >> + qnio_ctx =3D iio_init(vxhs_iio_callback); >> + if (qnio_ctx !=3D NULL) { >> + trace_vxhs_setup_qnio(qnio_ctx); >> + } >> + return qnio_ctx; >> +} >> + >> +static QemuOptsList runtime_opts =3D { >> + .name =3D "vxhs", >> + .head =3D QTAILQ_HEAD_INITIALIZER(runtime_opts.head), >> + .desc =3D { >> + { >> + .name =3D VXHS_OPT_FILENAME, >> + .type =3D QEMU_OPT_STRING, >> + .help =3D "URI to the Veritas HyperScale image", >> + }, >> + { >> + .name =3D VXHS_OPT_VDISK_ID, >> + .type =3D QEMU_OPT_STRING, >> + .help =3D "UUID of the VxHS vdisk", >> + }, >> + { /* end of list */ } >> + }, >> +}; >> + >> +static QemuOptsList runtime_tcp_opts =3D { >> + .name =3D "vxhs_tcp", >> + .head =3D QTAILQ_HEAD_INITIALIZER(runtime_tcp_opts.head), >> + .desc =3D { >> + { >> + .name =3D VXHS_OPT_HOST, >> + .type =3D QEMU_OPT_STRING, >> + .help =3D "host address (ipv4 addresses)", >> + }, >> + { >> + .name =3D VXHS_OPT_PORT, >> + .type =3D QEMU_OPT_NUMBER, >> + .help =3D "port number on which VxHSD is listening (defaul= t >>9999)", >> + .def_value_str =3D "9999" >> + }, >> + { /* end of list */ } >> + }, >> +}; >> + >> +/* >> + * Parse the incoming URI and populate *options with the host >>information. >> + * URI syntax has the limitation of supporting only one host info. >> + * To pass multiple host information, use the JSON syntax. >> + */ >> +static int vxhs_parse_uri(const char *filename, QDict *options) >> +{ >> + URI *uri =3D NULL; >> + char *hoststr, *portstr; >> + char *port; >> + int ret =3D 0; >> + >> + trace_vxhs_parse_uri_filename(filename); >> + uri =3D uri_parse(filename); >> + if (!uri || !uri->server || !uri->path) { >> + uri_free(uri); >> + return -EINVAL; >> + } >> + >> + hoststr =3D g_strdup(VXHS_OPT_SERVER"0.host"); >> + qdict_put(options, hoststr, qstring_from_str(uri->server)); >> + g_free(hoststr); >> + >> + portstr =3D g_strdup(VXHS_OPT_SERVER"0.port"); >> + if (uri->port) { >> + =20 port =3D g_strdup_printf("%d", uri->port); >> + qdict_put(options, portstr, qstring_from_str(port)); >> + g_free(port); >> + } >> + g_free(portstr); >> + >> + if (strstr(uri->path, "vxhs") =3D=3D NULL) { >> + qdict_put(options, "vdisk_id", qstring_from_str(uri->path)); >> + } >> + >> + trace_vxhs_parse_uri_hostinfo(1, uri->server, uri->port); >> + uri_free(uri); >> + >> + return ret; >> +} >> + >> +static void vxhs_parse_filename(const char *filename, QDict *options, >> + Error **errp) >> +{ >> + if (qdict_haskey(options, "vdisk_id") || qdict_haskey(options, >>"server")) { >> + error_setg(errp, "vdisk_id/server and a file name may not be >>specified " >> + "at the same time"); >> + return; >> + } >> + >> + if (strstr(filename, "://")) { >> + int ret =3D vxhs_parse_uri(filename, options); >> + if (ret < 0) { >> + error_setg(errp, "Invalid URI. URI should be of the form=20= " >> + " vxhs://:/{}"); >> + } >> + } >> +} >> + >> +static int vxhs_qemu_init(QDict *options, BDRVVXHSState *s, >> + int *cfd, int *rfd, Error **errp) >> +{ >> + QDict *backing_options =3D NULL; >> + QemuOpts *opts, *tcp_opts; >> + const char *vxhs_filename; >> + char *of_vsa_addr =3D NULL; >> + Error *local_err =3D NULL; >> + const char *vdisk_id_opt; >> + char *file_name =3D NULL; >> + size_t num_servers =3D 0; >> + char *str =3D NULL; >> + int ret =3D 0; >> + >> + opts =3D qemu_opts_create(&runtime_opts, NULL, 0, &error_abort); >> + qemu_opts_absorb_qdict(opts, options, &local_err); >> + if (local_err) { >> + error_propagate(errp, local_err); > >You are already propagating this error in 'out'. Removed =B3error_propagate()=B2 call. > >> + ret =3D -EINVAL; >> + goto out; >> + } >> + >> + vxhs_filename =3D qemu_opt_get(opts, VXHS_OPT_FILENAME); >> + if (vxhs_filename) { >> + =20 trace_vxhs_qemu_init_filename(vxhs_filename); >> + } >> + >> + vdisk_id_opt =3D qemu_opt_get(opts, VXHS_OPT_VDISK_ID); >> + if (!vdisk_id_opt) { >> + error_setg(&local_err, QERR_MISSING_PARAMETER, >>VXHS_OPT_VDISK_ID); >> + ret =3D -EINVAL; >> + goto out; >> + } >> + s->vdisk_guid =3D g_strdup(vdisk_id_opt); >> + trace_vxhs_qemu_init_vdisk(vdisk_id_opt); >> + >> + num_servers =3D qdict_array_entries(options, VXHS_OPT_SERVER); >> + if (num_servers < 1) { >> + error_setg(&local_err, QERR_MISSING_PARAMETER, "server"); >> + ret =3D -EINVAL; >> + goto out; >> + } else if (num_servers > 1) { >> + error_setg(&local_err, QERR_INVALID_PARAMETER, "server"); >> + error_append_hint(errp, "Only one server allowed.\n"); > >s/errp/&local_err Replaced as suggested. > >> + ret =3D -EINVAL; >> + goto out; >> + } >> + trace_vxhs_qemu_init_numservers(num_servers); >> + >> + str =3D g_strdup_printf(VXHS_OPT_SERVER"0."); >> + qdict_extract_subqdict(options, &backing_options, str); >> + >> + /* Create opts info from runtime_tcp_opts list */ >> + tcp_opts =3D qemu_opts_create(&runtime_tcp_opts, NULL, 0, >>&error_abort); >> + qemu_opts_absorb_qdict(tcp_opts, backing_options, &local_err); >> + if (local_err) { >> + qdict_del(backing_options, str); >> + qemu_opts_del(tcp_opts); >> + g_free(str); > >Rather than free str here, you can just do it in the 'out'. (g_free is >NULL-safe) Moved g_free() call to =B3out=B2 label. > >> + ret =3D -EINVAL; >> + goto out; >> + } >> + >> + s->vdisk_hostinfo.hostip =3D g_strdup(qemu_opt_get(tcp_opts, >> + VXHS_OPT_HOST)); >> + s->vdisk_hostinfo.port =3D g_ascii_strtoll(qemu_opt_get(tcp_opts, >> + =20 >>VXHS_OPT_PORT), >> + NULL, 0); >> + >> + s->vdisk_hostinfo.qnio_cfd =3D -1; >> + s->vdisk_hostinfo.vdisk_rfd =3D -1; >> + trace_vxhs_qemu_init(s->vdisk_hostinfo.hostip, >> + s->vdisk_hostinfo.port); >> + >> + qdict_del(backing_options, str); >> + qemu_opts_del(tcp_opts); >> + g_free(str); > >Same here. Moved g_free() call to =B3out=B2 label. > >> + >> + file_name =3D g_strdup_printf("%s%s", vdisk_prefix, s->vdisk_guid)= ; >> + of_vsa_addr =3D g_strdup_printf("of://%s:%d", >> + s->vdisk_hostinfo.hostip, >> + s->vdisk_hostinfo.port); >> + >> + /* >> + * .bdrv_open() and .bdrv_create() run under the QEMU global mutex= . >> + */ >> + if (global_qnio_ctx =3D=3D NULL) { >> + global_qnio_ctx =3D vxhs_setup_qnio(); > >I assume this would prevent the scenario I described in the beginning >(two different >drives using two different vxhs servers) from working properly? Is that >by >design? > >If so, it needs to fail gracefully; I don't think the current >implementation >will. > > >But it seems like you should just be able to do >s->qnio_ctx =3D vxhs_setup_qnio() here, right? As mentioned before =B3qnio_ctx=B2 is global context, and gets initialized= on first disk open.=20 > > >> + if (global_qnio_ctx =3D=3D NULL) { >> + error_setg(&local_err, "Failed vxhs_setup_qnio"); >> + ret =3D -EINVAL; >> + goto out; >> + } >> + } >> + >> + ret =3D vxhs_qnio_iio_open(cfd, of_vsa_addr, rfd, file_name); > >This overwrites the host / port, and I presume other info, of other drive= s >in the case of another vxhs drive being opened. This seems wrong. > >> + if (ret) { >> + error_setg(&local_err, "Failed qnio_iio_open"); >> + ret =3D -EIO; >> + } >> + >> +out: >> + g_free(file_name); >> + g_free(of_vsa_addr); >> + qemu_opts_del(opts); >> + >> + if (ret < 0) { >> + g_free(s->vdisk_hostinfo.hostip); >> + g_free(s->vdisk_guid); >> + s->vdisk_guid =3D NULL; >> + errno =3D -ret; >> + } >> + error_propagate(errp, local_err); > >This can be moved up into the 'if' block above it. Moved the code as suggested. > >> + >> + return ret; >> +} >> + >> +static int vxhs_open(BlockDriverState *bs, QDict *options, >> + int bdrv_flags, Error **errp) >> +{ >> + BDRVVXHSState *s =3D bs->opaque; >> + AioContext *aio_context; >> + int qemu_qnio_cfd =3D -1; >> + int device_opened =3D 0; > >A bool makes more sense here. Modified the type to =B3bool=B2 as suggested. > >> + int qemu_rfd =3D -1; >> + int ret =3D 0; >> + >> + ret =3D vxhs_qemu_init(options, s, &qemu_qnio_cfd, &qemu_rfd, errp= ); >> + if (ret < 0) { >> + trace_vxhs_open_fail(ret); >> + return ret; >> + } >> + >> + device_opened =3D 1; >> + s->qnio_ctx =3D global_qnio_ctx; >> + s->vdisk_hostinfo.qnio_cfd =3D qemu_qnio_cfd; >> + s->vdisk_hostinfo.vdisk_rfd =3D qemu_rfd; >> + s->vdisk_size =3D 0; >> + >> + /* >> + * Create a pipe for communicating between two threads in differen= t >> + * context. Set handler for read event, which gets triggered when >> + * IO completion is done by non-QEMU context. >> + */ >> + ret =3D qemu_pipe(s->fds); >> + if (ret < 0) { >> + trace_vxhs_open_epipe(ret); >> + ret =3D -errno; >> + goto errout; >> + } >> + fcntl(s->fds[VDISK_FD_READ], F_SETFL, O_NONBLOCK); >> + >> + aio_context =3D bdrv_get_aio_context(bs); >> + aio_set_fd_handler(aio_context, s->fds[VDISK_FD_READ], >> + false, vxhs_aio_event_reader, NULL, s); >> + return 0; >> + >> +errout: >> + /* >> + * Close remote vDisk device if it was opened earlier >> + */ >> + if (device_opened) { >> + vxhs_qnio_iio_close(s); >> + } >> + trace_vxhs_open_fail(ret); >> + return ret; >> +} >> + >> +static const AIOCBInfo vxhs_aiocb_info =3D { >> + .aiocb_size =3D sizeof(VXHSAIOCB) >> +}; >> + >> +/* >> + * This allocates QEMU-VXHS callback for each IO >> + * and is passed to QNIO. When QNIO completes the work, >> + * it will be passed back through the callback. >> + */ >> +static BlockAIOCB *vxhs_aio_rw(BlockDriverState *bs, int64_t >>sector_num, >> + QEMUIOVector *qiov, int nb_sectors, >> + BlockCompletionFunc *cb, void *opaque, >>int iodir) >> +{ >> + VXHSAIOCB *acb =3D NULL; >> + BDRVVXHSState *s =3D bs->opaque; >> + size_t size; >> + uint64_t offset; >> + int iio_flags =3D 0; >> + int ret =3D 0; >> + void *qnio_ctx =3D s->qnio_ctx; >> + uint32_t rfd =3D s->vdisk_hostinfo.vdisk_rfd; >> + >> + offset =3D sector_num * BDRV_SECTOR_SIZE; >> + size =3D nb_sectors * BDRV_SECTOR_SIZE; >> + acb =3D qemu_aio_get(&vxhs_aiocb_info, bs, cb, opaque); >> + /* >> + * Setup or initialize VXHSAIOCB. >> + * Every single field should be initialized since >> + * acb will be picked up from the slab without >> + * initializing with zero. >> + */ >> + acb->io_offset =3D offset; >> + acb->size =3D size; >> + acb->err =3D 0; >> + acb->state =3D VXHS_IO_INPROGRESS; >> + acb->qiov =3D qiov; >> + acb->direction =3D iodir; >> + >> + iio_flags =3D (IIO_FLAG_DONE | IIO_FLAG_ASYNC); >> + >> + switch (iodir) { >> + case VDISK_AIO_WRITE: >> + ret =3D iio_writev(qnio_ctx, rfd, qiov->iov, qiov->niov, >> + offset, (uint64_t)size, (void *)acb, >>iio_flags); >> + break; >> + case VDISK_AIO_READ: >> + ret =3D iio_readv(qnio_ctx, rfd, qiov->iov, qiov->niov, >> + offset, (uint64_t)size, (void *)acb, >>iio_flags); >> + break; >> + default: >> + trace_vxhs_aio_rw_invalid(iodir); >> + goto errout; >> + } >> + >> + if (ret !=3D 0) { >> + trace_vxhs_aio_rw_ioerr(s->vdisk_guid, iodir, size, offset, >> + acb, ret, errno); >> + goto errout; >> + } >> + return &acb->common; >> + >> +errout: >> + qemu_aio_unref(acb); >> + return NULL; >> +} >> + >> +static BlockAIOCB *vxhs_aio_readv(BlockDriverState *bs, >> + int64_t sector_num, QEMUIOVector >>*qiov, >> + int nb_sectors, >> + BlockCompletionFunc *cb, void >>*opaque) >> +{ >> + return vxhs_aio_rw(bs, sector_num, qiov, nb_sectors, cb, >> + opaque, VDISK_AIO_READ); >> +} >> + >> +static BlockAIOCB *vxhs_aio_writev(BlockDriverState *bs, >> + int64_t sector_num, QEMUIOVector >>*qiov, >> + int nb_sectors, >> + BlockCompletionFunc *cb, void >>*opaque) >> +{ >> + return vxhs_aio_rw(bs, sector_num, qiov, nb_sectors, >> + cb, opaque, VDISK_AIO_WRITE); >> +} >> + >> +static void vxhs_close(BlockDriverState *bs) >> +{ >> + BDRVVXHSState *s =3D bs->opaque; >> + >> + trace_vxhs_close(s->vdisk_guid); >> + close(s->fds[VDISK_FD_READ]); >> + close(s->fds[VDISK_FD_WRITE]); >> + >> + /* >> + * Clearing all the event handlers for oflame registered to QEMU >> + */ >> + aio_set_fd_handler(bdrv_get_aio_context(bs), s->fds[VDISK_FD_READ]= , >> + false, NULL, NULL, NULL); >> + g_free(s->vdisk_guid); >> + s->vdisk_guid =3D NULL; >> + vxhs_qnio_iio_close(s); >> + >> + /* >> + * Free the dynamically allocated hostip string >> + */ >> + g_free(s->vdisk_hostinfo.hostip); >> + s->vdisk_hostinfo.hostip =3D NULL; >> + s->vdisk_hostinfo.port =3D 0; >> +} >> + >> +static unsigned long vxhs_get_vdisk_stat(BDRVVXHSState *s) > >The functions that call this use its value as an int64_t, although this i= s >returning an unsigned long. It should return int64_t. Modified the return value as suggested. > >> +{ >> + int64_t vdisk_size =3D 0; >> + int ret =3D 0; >> + uint32_t rfd =3D s->vdisk_hostinfo.vdisk_rfd; >> + >> + ret =3D iio_ioctl(s->qnio_ctx, rfd, IOR_VDISK_STAT, &vdisk_size, >>NULL, 0); >> + if (ret < 0) { >> + trace_vxhs_get_vdisk_stat_err(s->vdisk_guid, ret, errno); >> + return 0; > >Repeat comment from RFC v7: > >Why is is this returning 0 here, rather then 'ret'? Does IOR_VDIST_STAT >return sane error codes? If not, instead of returning zero, map it to >-EIO >here. Changed the return value to =B3-EIO=B2 incase of error. > >> + } >> + >> + trace_vxhs_get_vdisk_stat(s->vdisk_guid, vdisk_size); >> + return vdisk_size; >> +} >> + >> +/* >> + * Returns the size of vDisk in bytes. This is required >> + * by QEMU block upper block layer so that it is visible >> + * to guest. >> + */ >> +static int64_t vxhs_getlength(BlockDriverState *bs) >> +{ >> + BDRVVXHSState *s =3D bs->opaque; >> + int64_t vdisk_size =3D 0; >> + >> + if (s->vdisk_size > 0) { >> + vdisk_size =3D s->vdisk_size; >> + } else { >> + /* >> + * Fetch the vDisk size using stat ioctl >> + */ >> + vdisk_size =3D vxhs_get_vdisk_stat(s); >> + if (vdisk_size > 0) { >> + s->vdisk_size =3D vdisk_size; >> + } >> + } >> + >> + if (vdisk_size > 0) { >> + return vdisk_size; /* return size in bytes */ >> + =20 } >> + > >Repeat comment from RFC v7: > >If vxhs_get_vdisk_stat() returned error rather than 0, then this check >is unnecessary. You can just return vdisk_size. > >(Also: I don't think you need to return -EIO for a zero-sized image, do >you?) Modified the code as suggested. Zero size disk will return zero. > >> + return -EIO; >> +} >> + >> +/* >> + * Returns actual blocks allocated for the vDisk. > >s/blocks/bytes > >> + * This is required by qemu-img utility. >> + */ >> +static int64_t vxhs_get_allocated_blocks(BlockDriverState *bs) > >s/blocks/bytes > >> +{ >> + BDRVVXHSState *s =3D bs->opaque; >> + int64_t vdisk_size =3D 0; >> + >> + if (s->vdisk_size > 0) { >> + vdisk_size =3D s->vdisk_size; >> + } else { >> + /* >> + * TODO: >> + * Once HyperScale storage-virtualizer provides >> + * actual physical allocation of blocks then >> + * fetch that information and return back to the >> + * caller but for now just get the full size. >> + =20 */ >> + vdisk_size =3D vxhs_get_vdisk_stat(s); >> + if (vdisk_size > 0) { >> + s->vdisk_size =3D vdisk_size; >> + } >> + } >> + >> + if (vdisk_size > 0) { >> + return vdisk_size; /* return size in bytes */ >> + } >> + >> + return -EIO; > >Repeat comment from RFC v7: > >Since this is just identical to vxhs_getlength(), why not just call it >here, >and reduce code duplication? > >Or even better: do not implement this function. It isn't required. If i= t >is not implemented, the block layer above this will just return -ENOTSUP,= >which in qapi will be interpreted as "unavailable". This function should= >be >dropped, until it is something supported by VXHS. Removed vxhs_get_allocated_blocks() function. > >> +} >> + >> +static void vxhs_detach_aio_context(BlockDriverState *bs) >> +{ >> + BDRVVXHSState *s =3D bs->opaque; >> + >> + aio_set_fd_handler(bdrv_get_aio_context(bs), s->fds[VDISK_FD_READ]= , >> + false, NULL, NULL, NULL); >> + >> +} >> + >> +static void vxhs_attach_aio_context(BlockDriverState *bs, >> + AioContext *new_context) >> +{ >> + BDRVVXHSState *s =3D bs->opaque; >> + >> + aio_set_fd_handler(new_context, s->fds[VDISK_FD_READ], >> + false, vxhs_aio_event_reader, NULL, s); >> +} >> + >> +static BlockDriver bdrv_vxhs =3D { >> + .format_name =3D "vxhs", >> + .protocol_name =3D "vxhs", >> + .instance_size =3D sizeof(BDRVVXHSState), >> + .bdrv_file_open =3D vxhs_open, >> + .bdrv_parse_filename =3D vxhs_parse_filename, >> + .bdrv_close =3D vxhs_close, >> + .bdrv_getlength =3D vxhs_getlength, >> + .bdrv_get_allocated_file_size =3D vxhs_get_allocated_blocks, >> + .bdrv_aio_readv =3D vxhs_aio_readv, >> + .bdrv_aio_writev =3D vxhs_aio_writev, >> + .bdrv_detach_aio_context =3D vxhs_detach_aio_context, >> + .bdrv_attach_aio_context =3D vxhs_attach_aio_context, >> +}; >> + >> +static void bdrv_vxhs_init(void) >> +{ >> + bdrv_register(&bdrv_vxhs); >> +} >> + >> +block_init(bdrv_vxhs_init); >> diff --git a/configure b/configure >> index d3dafcb..4413e88 100755 >> --- a/configure >> +++ b/configure >> @@ -321,6 +321,7 @@ numa=3D"" >> tcmalloc=3D"no" >> jemalloc=3D"no" >> replication=3D"yes" >> +vxhs=3D"" >> =20 >> # parse CC options first >> for opt do >> @@ -1162,6 +1163,11 @@ for opt do >> ;; >> --enable-replication) replication=3D"yes" >> ;; >> + --disable-vxhs) vxhs=3D"no" >> + ;; >> + --enable-vxhs) vxhs=3D"yes" >> + ;; >> + >> *) >> echo "ERROR: unknown option $opt" >> echo "Try '$0 --help' for more information" >> @@ -1391,6 +1397,7 @@ disabled with --disable-FEATURE, default is=20 >>enabled if available: >> tcmalloc tcmalloc support >> jemalloc jemalloc support >> replication replication support >> + vxhs Veritas HyperScale vDisk backend support >> =20 >> NOTE: The object files are=20built at the place where configure is=20 >>launched >> EOF >> @@ -4625,6 +4632,33 @@ if do_cc -nostdlib -Wl,-r -Wl,--no-relax -o=20 >>$TMPMO $TMPO; then >> fi >> =20 >> ########################################## >> +# Veritas HyperScale block driver VxHS >> +# Check if libqnio is installed >> + >> +if test "$vxhs" !=3D "no" ; then >> + cat > $TMPC <> +#include >> +#include >> + >> +void *vxhs_callback; >> + >> +int main(void) { >> + iio_init(vxhs_callback); >> + return 0; >> +} >> +EOF >> + vxhs_libs=3D"-lqnio" >> + if compile_prog "" "$vxhs_libs" ; then >> + vxhs=3Dyes >> + else >> + if test "$vxhs" =3D "yes" ; then >> + feature_not_found "vxhs block device" "Install libqnio. See=20 >>github" >> + fi >> + vxhs=3Dno >> + fi >> +fi > >How is libqnio going to be distributed normally? Should we be checking=20= >for >a minimum version here to compile against? Yes, version check is required. We will add the changes in separate patch.= > >> + >> +########################################## >> # End of CC checks >> # After here, no more $cc or $ld runs >> =20 >> @@ -4990,6 +5024,7 @@ echo "tcmalloc support $tcmalloc" >> echo "jemalloc support $jemalloc" >> echo "avx2 optimization $avx2_opt" >> echo "replication support $replication" >> +echo "VxHS block device $vxhs" >> =20 >> if test "$sdl_too_old" =3D "yes"; then >> echo "-> Your SDL version is too old - please upgrade to have SDL=20 >>support" >> @@ -5590,6 +5625,12 @@ if test "$pthread_setname_np" =3D "yes" ; then >> echo "CONFIG_PTHREAD_SETNAME_NP=3Dy" >> $config_host_mak >> fi >> =20 >> +if test "$vxhs" =3D "yes" ; then >> + echo "CONFIG_VXHS=3Dy" >> $config_host_mak >> + echo "VXHS_CFLAGS=3D$vxhs_cflags" >> $config_host_mak >> + echo "VXHS_LIBS=3D$vxhs_libs" >> $config_host_mak >> +fi >> + >> if test "$tcg_interpreter" =3D "yes"; then >> QEMU_INCLUDES=3D"-I\$(SRC_PATH)/tcg/tci $QEMU_INCLUDES" >> elif test "$ARCH" =3D "sparc64" ; then >> --=20 >> 2.5.5 >>=20 >