From: Yoshiaki Tamura
Date: Wed, 19 Jan 2011 22:04:39 +0900
Subject: Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.
In-Reply-To: <4D36B130.4010608@redhat.com>
To: Kevin Wolf
Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com,
 kvm@vger.kernel.org, mst@redhat.com, mtosatti@redhat.com,
 qemu-devel@nongnu.org, vatsa@linux.vnet.ibm.com, blauwirbel@gmail.com,
 ohmura.kei@lab.ntt.co.jp, avi@redhat.com, psuriset@linux.vnet.ibm.com,
 stefanha@linux.vnet.ibm.com

2011/1/19 Kevin Wolf:
> On 19.01.2011 06:44, Yoshiaki Tamura wrote:
>> event-tap controls when to start an FT transaction, and provides proxy
>> functions to be called from net/block devices.  During an FT
>> transaction, it queues up net/block requests and flushes them when the
>> transaction completes.
>>
>> Signed-off-by: Yoshiaki Tamura
>> Signed-off-by: OHMURA Kei
>
> One general comment: at first glance this seems to mix block and net
> (and some other things) arbitrarily instead of having a section for
> handling all block stuff, then network, etc.
>
> Is there a specific reason for the order in which you put the functions?
> If not, maybe reordering them might improve readability.

Thanks. I'll rework that.

>
>> ---
>>  Makefile.target |    1 +
>>  event-tap.c     |  847 ++++++++++++++++++++++++++++++++++++++++++++++
>>  event-tap.h     |   42 +++
>>  qemu-tool.c     |   24 ++
>>  trace-events    |    9 +
>>  5 files changed, 923 insertions(+), 0 deletions(-)
>>  create mode 100644 event-tap.c
>>  create mode 100644 event-tap.h
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index e15b1c4..f36cd75 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -199,6 +199,7 @@ obj-y += rwhandler.o
>>  obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>>  obj-$(CONFIG_NO_KVM) += kvm-stub.o
>>  LIBS+=-lz
>> +obj-y += event-tap.o
>>
>>  QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
>>  QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
>> diff --git a/event-tap.c b/event-tap.c
>> new file mode 100644
>> index 0000000..f492708
>> --- /dev/null
>> +++ b/event-tap.c
>
>> @@ -0,0 +1,847 @@
>> +/*
>> + * Event Tap functions for QEMU
>> + *
>> + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "qemu-error.h"
>> +#include "block.h"
>> +#include "block_int.h"
>> +#include "ioport.h"
>> +#include "osdep.h"
>> +#include "sysemu.h"
>> +#include "hw/hw.h"
>> +#include "net.h"
>> +#include "event-tap.h"
>> +#include "trace.h"
>> +
>> +enum EVENT_TAP_STATE {
>> +    EVENT_TAP_OFF,
>> +    EVENT_TAP_ON,
>> +    EVENT_TAP_FLUSH,
>> +    EVENT_TAP_LOAD,
>> +    EVENT_TAP_REPLAY,
>> +};
>> +
>> +static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
>> +static BlockDriverAIOCB dummy_acb; /* we may need a pool for dummies */
>
> Indeed, bdrv_aio_cancel will segfault this way.
>
> If you use dummies instead of real ACBs the only way to correctly
> implement bdrv_aio_cancel is waiting for all in-flight AIOs
> (qemu_aio_flush).

So I need to insert a new event-tap function into bdrv_aio_cancel to do
that.
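Maybe something like the following (an untested sketch; the hook name
event_tap_bdrv_aio_cancel and the way it is wired into block.c are just
my assumptions for illustration, keeping the existing cancel path for
real ACBs):

/* event-tap.c: returns 1 if the ACB was one of our dummies */
int event_tap_bdrv_aio_cancel(BlockDriverAIOCB *acb)
{
    if (acb != &dummy_acb) {
        return 0; /* a real ACB; let the normal cancel path handle it */
    }

    /* A dummy ACB carries no request state, so the only safe thing
     * to do is wait for all in-flight AIO, as you suggested. */
    qemu_aio_flush();
    return 1;
}

/* block.c: */
void bdrv_aio_cancel(BlockDriverAIOCB *acb)
{
    if (event_tap_is_on() && event_tap_bdrv_aio_cancel(acb)) {
        return;
    }
    acb->pool->cancel(acb);
}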
>
>> +typedef struct EventTapIOport {
>> +    uint32_t address;
>> +    uint32_t data;
>> +    int      index;
>> +} EventTapIOport;
>> +
>> +#define MMIO_BUF_SIZE 8
>> +
>> +typedef struct EventTapMMIO {
>> +    uint64_t address;
>> +    uint8_t  buf[MMIO_BUF_SIZE];
>> +    int      len;
>> +} EventTapMMIO;
>> +
>> +typedef struct EventTapNetReq {
>> +    char *device_name;
>> +    int iovcnt;
>> +    struct iovec *iov;
>> +    int vlan_id;
>> +    bool vlan_needed;
>> +    bool async;
>> +    NetPacketSent *sent_cb;
>> +} EventTapNetReq;
>> +
>> +#define MAX_BLOCK_REQUEST 32
>> +
>> +typedef struct EventTapBlkReq {
>> +    char *device_name;
>> +    int num_reqs;
>> +    int num_cbs;
>> +    bool is_flush;
>> +    BlockRequest reqs[MAX_BLOCK_REQUEST];
>> +    BlockDriverCompletionFunc *cb[MAX_BLOCK_REQUEST];
>> +    void *opaque[MAX_BLOCK_REQUEST];
>> +} EventTapBlkReq;
>> +
>> +#define EVENT_TAP_IOPORT (1 << 0)
>> +#define EVENT_TAP_MMIO   (1 << 1)
>> +#define EVENT_TAP_NET    (1 << 2)
>> +#define EVENT_TAP_BLK    (1 << 3)
>> +
>> +#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
>> +
>> +typedef struct EventTapLog {
>> +    int mode;
>> +    union {
>> +        EventTapIOport ioport;
>> +        EventTapMMIO mmio;
>> +    };
>> +    union {
>> +        EventTapNetReq net_req;
>> +        EventTapBlkReq blk_req;
>> +    };
>> +    QTAILQ_ENTRY(EventTapLog) node;
>> +} EventTapLog;
>> +
>> +static EventTapLog *last_event_tap;
>> +
>> +static QTAILQ_HEAD(, EventTapLog) event_list;
>> +static QTAILQ_HEAD(, EventTapLog) event_pool;
>> +
>> +static int (*event_tap_cb)(void);
>> +static QEMUBH *event_tap_bh;
>> +static VMChangeStateEntry *vmstate;
>> +
>> +static void event_tap_bh_cb(void *p)
>> +{
>> +    if (event_tap_cb) {
>> +        event_tap_cb();
>> +    }
>> +
>> +    qemu_bh_delete(event_tap_bh);
>> +    event_tap_bh = NULL;
>> +}
>> +
>> +static void event_tap_schedule_bh(void)
>> +{
>> +    trace_event_tap_ignore_bh(!!event_tap_bh);
>> +
>> +    /* if bh is already set, we ignore it for now */
>> +    if (event_tap_bh) {
>> +        return;
>> +    }
>> +
>> +    event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
>> +    qemu_bh_schedule(event_tap_bh);
>> +
>> +    return;
>> +}
>> +
>> +static void event_tap_alloc_net_req(EventTapNetReq *net_req,
>> +                                    VLANClientState *vc,
>> +                                    const struct iovec *iov, int iovcnt,
>> +                                    NetPacketSent *sent_cb, bool async)
>> +{
>> +    int i;
>> +
>> +    net_req->iovcnt = iovcnt;
>> +    net_req->async = async;
>> +    net_req->device_name = qemu_strdup(vc->name);
>> +    net_req->sent_cb = sent_cb;
>> +
>> +    if (vc->vlan) {
>> +        net_req->vlan_needed = 1;
>> +        net_req->vlan_id = vc->vlan->id;
>> +    } else {
>> +        net_req->vlan_needed = 0;
>> +    }
>> +
>> +    if (async) {
>> +        net_req->iov = (struct iovec *)iov;
>> +    } else {
>> +        net_req->iov = qemu_malloc(sizeof(struct iovec) * iovcnt);
>> +        for (i = 0; i < iovcnt; i++) {
>> +            net_req->iov[i].iov_base = qemu_malloc(iov[i].iov_len);
>> +            memcpy(net_req->iov[i].iov_base, iov[i].iov_base, iov[i].iov_len);
>> +            net_req->iov[i].iov_len = iov[i].iov_len;
>> +        }
>> +    }
>> +}
>> +
>> +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req,
>> +                                    BlockDriverState *bs, BlockRequest *reqs,
>> +                                    int num_reqs, BlockDriverCompletionFunc *cb,
>> +                                    void *opaque, bool is_flush)
>> +{
>> +    int i;
>> +
>> +    blk_req->num_reqs = num_reqs;
>> +    blk_req->num_cbs = num_reqs;
>> +    blk_req->device_name = qemu_strdup(bs->device_name);
>> +    blk_req->is_flush = is_flush;
>> +
>> +    for (i = 0; i < num_reqs; i++) {
>> +        blk_req->reqs[i].sector = reqs[i].sector;
>> +        blk_req->reqs[i].nb_sectors = reqs[i].nb_sectors;
>> +        blk_req->reqs[i].qiov = reqs[i].qiov;
>> +        blk_req->reqs[i].cb = cb;
>> +        blk_req->reqs[i].opaque = opaque;
>> +        blk_req->cb[i] = reqs[i].cb;
>> +        blk_req->opaque[i] = reqs[i].opaque;
>> +    }
>> +}
>> +
>> +static void *event_tap_alloc_log(void)
>> +{
>> +    EventTapLog *log;
>> +
>> +    if (QTAILQ_EMPTY(&event_pool)) {
>> +        log = qemu_mallocz(sizeof(EventTapLog));
>> +    } else {
>> +        log = QTAILQ_FIRST(&event_pool);
>> +        QTAILQ_REMOVE(&event_pool, log, node);
>> +    }
>> +
>> +    return log;
>> +}
>> +
>> +static void event_tap_free_log(EventTapLog *log)
>> +{
>> +    int i, mode = log->mode & ~EVENT_TAP_TYPE_MASK;
>> +
>> +    if (mode == EVENT_TAP_NET) {
>> +        EventTapNetReq *net_req = &log->net_req;
>> +
>> +        if (!net_req->async) {
>> +            for (i = 0; i < net_req->iovcnt; i++) {
>> +                qemu_free(net_req->iov[i].iov_base);
>> +            }
>> +            qemu_free(net_req->iov);
>> +        } else if (event_tap_state >= EVENT_TAP_LOAD) {
>> +            qemu_free(net_req->iov);
>> +        }
>> +
>> +        qemu_free(net_req->device_name);
>> +    } else if (mode == EVENT_TAP_BLK) {
>> +        EventTapBlkReq *blk_req = &log->blk_req;
>> +
>> +        if (event_tap_state >= EVENT_TAP_LOAD && !blk_req->is_flush) {
>> +            for (i = 0; i < blk_req->num_reqs; i++) {
>> +                qemu_iovec_destroy(blk_req->reqs[i].qiov);
>> +                qemu_free(blk_req->reqs[i].qiov);
>> +            }
>> +        }
>> +
>> +        qemu_free(blk_req->device_name);
>> +    }
>> +
>> +    log->mode = 0;
>> +
>> +    /* return the log to event_pool */
>> +    QTAILQ_INSERT_HEAD(&event_pool, log, node);
>> +}
>> +
>> +static void event_tap_free_pool(void)
>> +{
>> +    EventTapLog *log, *next;
>> +
>> +    QTAILQ_FOREACH_SAFE(log, &event_pool, node, next) {
>> +        QTAILQ_REMOVE(&event_pool, log, node);
>> +        qemu_free(log);
>> +    }
>> +}
>> +
>> +static void event_tap_blk_cb(void *opaque, int ret)
>> +{
>> +    EventTapLog *log = container_of(opaque, EventTapLog, blk_req);
>> +    EventTapBlkReq *blk_req = opaque;
>> +    int i;
>> +
>> +    blk_req->num_cbs--;
>> +
>> +    /* all outstanding requests are flushed */
>> +    if (blk_req->num_cbs == 0) {
>> +        for (i = 0; i < blk_req->num_reqs; i++) {
>> +            blk_req->cb[i](blk_req->opaque[i], ret);
>> +        }
>> +
>> +        event_tap_free_log(log);
>> +    }
>> +}
>> +
>> +static void event_tap_packet(VLANClientState *vc, const struct iovec *iov,
>> +                             int iovcnt, NetPacketSent *sent_cb, bool async)
>> +{
>> +    int empty;
>> +    EventTapLog *log = last_event_tap;
>> +
>> +    if (!log) {
>> +        trace_event_tap_no_event();
>> +        log = event_tap_alloc_log();
>> +    }
>> +
>> +    if (log->mode & ~EVENT_TAP_TYPE_MASK) {
>> +        trace_event_tap_already_used(log->mode & ~EVENT_TAP_TYPE_MASK);
>> +        return;
>> +    }
>> +
>> +    log->mode |= EVENT_TAP_NET;
>> +    event_tap_alloc_net_req(&log->net_req, vc, iov, iovcnt, sent_cb, async);
>> +
>> +    empty = QTAILQ_EMPTY(&event_list);
>> +    QTAILQ_INSERT_TAIL(&event_list, log, node);
>> +    last_event_tap = NULL;
>> +
>> +    if (empty) {
>> +        event_tap_schedule_bh();
>> +    }
>> +}
>> +
>> +static void event_tap_bdrv(BlockDriverState *bs, BlockRequest *reqs,
>> +                           int num_reqs, bool is_flush)
>> +{
>> +    EventTapLog *log = last_event_tap;
>> +    int empty;
>> +
>> +    if (!log) {
>> +        trace_event_tap_no_event();
>> +        log = event_tap_alloc_log();
>> +    }
>> +
>> +    if (log->mode & ~EVENT_TAP_TYPE_MASK) {
>> +        trace_event_tap_already_used(log->mode & ~EVENT_TAP_TYPE_MASK);
>> +        return;
>> +    }
>> +
>> +    log->mode |= EVENT_TAP_BLK;
>> +    event_tap_alloc_blk_req(&log->blk_req, bs, reqs, num_reqs,
>> +                            event_tap_blk_cb, &log->blk_req, is_flush);
>> +
>> +    empty = QTAILQ_EMPTY(&event_list);
>> +    QTAILQ_INSERT_TAIL(&event_list, log, node);
>> +    last_event_tap = NULL;
>> +
>> +    if (empty) {
>> +        event_tap_schedule_bh();
>> +    }
>> +}
>> +
>> +BlockDriverAIOCB *event_tap_bdrv_aio_writev(BlockDriverState *bs,
>> +                                            int64_t sector_num,
>> +                                            QEMUIOVector *iov,
>> +                                            int nb_sectors,
>> +                                            BlockDriverCompletionFunc *cb,
>> +                                            void *opaque)
>> +{
>> +    BlockRequest req;
>> +
>> +    assert(event_tap_state == EVENT_TAP_ON);
>> +
>> +    req.sector = sector_num;
>> +    req.nb_sectors = nb_sectors;
>> +    req.qiov = iov;
>> +    req.cb = cb;
>> +    req.opaque = opaque;
>> +    event_tap_bdrv(bs, &req, 1, 0);
>> +
>> +    /* return a dummy_acb pointer to prevent from failing */
>> +    return &dummy_acb;
>> +}
>> +
>> +BlockDriverAIOCB *event_tap_bdrv_aio_flush(BlockDriverState *bs,
>> +                                           BlockDriverCompletionFunc *cb,
>> +                                           void *opaque)
>> +{
>> +    BlockRequest req;
>> +
>> +    assert(event_tap_state == EVENT_TAP_ON);
>> +
>> +    memset(&req, 0, sizeof(req));
>> +    req.cb = cb;
>> +    req.opaque = opaque;
>> +    event_tap_bdrv(bs, &req, 1, 1);
>> +
>> +    return &dummy_acb;
>> +}
>> +
>> +void event_tap_send_packet(VLANClientState *vc, const uint8_t *buf, int size)
>> +{
>> +    struct iovec iov;
>> +
>> +    assert(event_tap_state == EVENT_TAP_ON);
>> +
>> +    iov.iov_base = (uint8_t *)buf;
>> +    iov.iov_len = size;
>> +    event_tap_packet(vc, &iov, 1, NULL, 0);
>> +
>> +    return;
>> +}
>> +
>> +ssize_t event_tap_sendv_packet_async(VLANClientState *vc,
>> +                                     const struct iovec *iov,
>> +                                     int iovcnt, NetPacketSent *sent_cb)
>> +{
>> +    assert(event_tap_state == EVENT_TAP_ON);
>> +    event_tap_packet(vc, iov, iovcnt, sent_cb, 1);
>> +    return 0;
>> +}
>> +
>> +int event_tap_register(int (*cb)(void))
>> +{
>> +    if (event_tap_state != EVENT_TAP_OFF) {
>> +        error_report("event-tap is already on");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (!cb || event_tap_cb) {
>> +        error_report("can't set event_tap_cb");
>> +        return -EINVAL;
>> +    }
>> +
>> +    event_tap_cb = cb;
>> +    event_tap_state = EVENT_TAP_ON;
>> +
>> +    return 0;
>> +}
>> +
>> +void event_tap_unregister(void)
>> +{
>> +    if (event_tap_state == EVENT_TAP_OFF) {
>> +        error_report("event-tap is already off");
>> +        return;
>> +    }
>> +
>> +    event_tap_state = EVENT_TAP_OFF;
>> +    event_tap_cb = NULL;
>> +
>> +    event_tap_flush();
>> +    event_tap_free_pool();
>> +}
>> +
>> +int event_tap_is_on(void)
>> +{
>> +    return (event_tap_state == EVENT_TAP_ON);
>> +}
>> +
>> +void event_tap_ioport(int index, uint32_t address, uint32_t data)
>> +{
>> +    if (event_tap_state != EVENT_TAP_ON) {
>> +        return;
>> +    }
>> +
>> +    if (!last_event_tap) {
>> +        last_event_tap = event_tap_alloc_log();
>> +    }
>> +
>> +    last_event_tap->mode = EVENT_TAP_IOPORT;
>> +    last_event_tap->ioport.index = index;
>> +    last_event_tap->ioport.address = address;
>> +    last_event_tap->ioport.data = data;
>> +}
>> +
>> +void event_tap_mmio(uint64_t address, uint8_t *buf, int len)
>> +{
>> +    if (event_tap_state != EVENT_TAP_ON || len > MMIO_BUF_SIZE) {
>> +        return;
>> +    }
>> +
>> +    if (!last_event_tap) {
>> +        last_event_tap = event_tap_alloc_log();
>> +    }
>> +
>> +    last_event_tap->mode = EVENT_TAP_MMIO;
>> +    last_event_tap->mmio.address = address;
>> +    last_event_tap->mmio.len = len;
>> +    memcpy(last_event_tap->mmio.buf, buf, len);
>> +}
>> +
>> +static void event_tap_net_flush(EventTapNetReq *net_req)
>> +{
>> +    VLANClientState *vc;
>> +    ssize_t len;
>> +
>> +    if (net_req->vlan_needed) {
>> +        vc = qemu_find_vlan_client_by_name(NULL, net_req->vlan_id,
>> +                                           net_req->device_name);
>> +    } else {
>> +        vc = qemu_find_netdev(net_req->device_name);
>> +    }
>> +
>> +    if (net_req->async) {
>> +        len = qemu_sendv_packet_async(vc, net_req->iov, net_req->iovcnt,
>> +                                      net_req->sent_cb);
>> +        if (len) {
>> +            net_req->sent_cb(vc, len);
>> +        } else {
>> +            /* packets are queued in the net layer */
>> +            trace_event_tap_append_packet();
>> +        }
>> +    } else {
>> +        qemu_send_packet(vc, net_req->iov[0].iov_base,
>> +                         net_req->iov[0].iov_len);
>> +    }
>> +}
>> +
>> +static void event_tap_blk_flush(EventTapBlkReq *blk_req)
>> +{
>> +    BlockDriverState *bs;
>> +
>> +    bs = bdrv_find(blk_req->device_name);
>
> Please store the BlockDriverState in blk_req. This code loops over all
> block devices and does a string comparison - and that for each request.
> You can also save the qemu_strdup() when creating the request.
>
> In the few places where you really need the device name (might be the
> case for load/save, I'm not sure), you can still get it from the
> BlockDriverState.

I would do so for the primary side. Although we haven't implemented it
yet, we want to replay block requests from the block layer on the
secondary side, and we need the device name there to restore the
BlockDriverState.
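For the primary side it could look something like this (a rough,
untested sketch; the new bs field and the NULL fallback are my
assumptions about how to keep the secondary working):

typedef struct EventTapBlkReq {
    char *device_name;    /* still saved for replay on the secondary */
    BlockDriverState *bs; /* cached on the primary to avoid bdrv_find() */
    int num_reqs;
    int num_cbs;
    bool is_flush;
    BlockRequest reqs[MAX_BLOCK_REQUEST];
    BlockDriverCompletionFunc *cb[MAX_BLOCK_REQUEST];
    void *opaque[MAX_BLOCK_REQUEST];
} EventTapBlkReq;

static void event_tap_blk_flush(EventTapBlkReq *blk_req)
{
    /* bs is set by event_tap_alloc_blk_req() on the primary; after
     * load on the secondary it is NULL, so fall back to the name. */
    BlockDriverState *bs = blk_req->bs;

    if (!bs) {
        bs = bdrv_find(blk_req->device_name);
    }

    /* ... submit blk_req->reqs[] against bs as before ... */
}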
>
>> +
>> +    if (blk_req->is_flush) {
>> +        bdrv_aio_flush(bs, blk_req->reqs[0].cb, blk_req->reqs[0].opaque);
>
> You need to handle errors. If bdrv_aio_flush returns NULL, call the
> callback with -EIO.

I'll do so.
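Probably along these lines (an untested sketch):

    if (blk_req->is_flush) {
        BlockDriverAIOCB *acb;

        acb = bdrv_aio_flush(bs, blk_req->reqs[0].cb,
                             blk_req->reqs[0].opaque);
        if (!acb) {
            /* submission failed: report -EIO to the original caller */
            blk_req->reqs[0].cb(blk_req->reqs[0].opaque, -EIO);
        }
        return;
    }

The same pattern should work for the bdrv_aio_writev call below.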
>
>> +        return;
>> +    }
>> +
>> +    bdrv_aio_writev(bs, blk_req->reqs[0].sector, blk_req->reqs[0].qiov,
>> +                    blk_req->reqs[0].nb_sectors, blk_req->reqs[0].cb,
>> +                    blk_req->reqs[0].opaque);
>
> Same here.
>
>> +    bdrv_flush(bs);
>
> This looks really strange. What is this supposed to do?
>
> One point is that you write it immediately after bdrv_aio_writev, so you
> get an fsync for which you don't know if it includes the current write
> request or if it doesn't. Which data do you want to get flushed to the
> disk?

I was expecting to flush the aio request that was just initiated. Am I
misunderstanding the function?

> The other thing is that you introduce a bdrv_flush for each request,
> basically forcing everyone into something very similar to writethrough
> mode. I'm sure this will have a big impact on performance.

The reason is to avoid inversion of queued requests. Although
processing them one by one is heavy, wouldn't having requests flushed
to disk out of order break the disk image?

> Additionally, error handling is missing.

I looked at the code using bdrv_flush and realized that some callers
don't handle errors, but scsi-disk.c does. Should every caller handle
errors, or does it depend on the usage?

>
> Kevin