public inbox for kvm@vger.kernel.org
From: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
To: ya su <suya94335@gmail.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, avi@redhat.com,
	anthony@codemonkey.ws, aliguori@us.ibm.com, mtosatti@redhat.com,
	dlaor@redhat.com, mst@redhat.com, kwolf@redhat.com,
	pbonzini@redhat.com, quintela@redhat.com, ananth@in.ibm.com,
	psuriset@linux.vnet.ibm.com, vatsa@linux.vnet.ibm.com,
	stefanha@linux.vnet.ibm.com, blauwirbel@gmail.com,
	ohmura.kei@lab.ntt.co.jp
Subject: Re: [PATCH 09/18] Introduce event-tap.
Date: Wed, 09 Mar 2011 17:51:04 +0900	[thread overview]
Message-ID: <4D773F78.5010407@lab.ntt.co.jp> (raw)
In-Reply-To: <AANLkTikxHw+PbQGcqiHJKsZW7+rirkhver-pUQK8wfjp@mail.gmail.com>

ya su wrote:
> Yoshi:
>
>      I meet one problem: if I kill an ft source VM, the dest ft VM
> returns errors like the following:
>
> qemu-system-x86_64: fill buffer failed, Resource temporarily unavailable
> qemu-system-x86_64: recv header failed
>
>      The problem is that the dest VM cannot continue to run, as it was
> interrupted in the middle of a transaction: some of the RAM pages have
> been updated, but the others have not. Do you have any plan for rolling
> back to cancel the interrupted transaction? Thanks.

No, it's not a problem.  This is one of the FAQs I get: just press cont (or c) 
in the secondary qemu's monitor, and it should run.
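[For readers unfamiliar with the monitor, a minimal example of the above,
assuming the secondary qemu was started with a monitor attached (e.g.
-monitor stdio); the prompt and command are the standard QEMU HMP:

    (qemu) cont

The same can be done over QMP with the "cont" command.]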

Thanks,

Yoshi

>
>
> Green.
>
>
>
> 2011/3/9 Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>:
>> ya su wrote:
>>>
>>> Yoshi:
>>>
>>>      I think event-tap is a great idea; it removes the reads from disk,
>>> which will improve ft efficiency much more, as you plan in the later
>>> series.
>>>
>>>      one question: IO read/write may dirty rams, but it is difficult to
>>> distinguish them from other dirty pages, like those caused by running
>>> software.  Does that mean you need to change all of the emulated device
>>> implementations?  Actually I think IO read/write will not cause too many
>>> rams to be sent in ram_save_live, but if event-tap can record IO
>>> read/write and replay it on the other side, does that mean we don't need
>>> to call qemu_savevm_state_full in ft transactions?
>>
>> I'm not expecting to remove qemu_savevm_state_full from the transaction.  Just
>> to reduce the number of pages to be transferred as a result.
>>
>> Thanks,
>>
>> Yoshi
>>
>>>
>>> Green.
>>>
>>>
>>> 2011/3/9 Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>:
>>>>
>>>> ya su wrote:
>>>>>
>>>>> 2011/3/8 Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>:
>>>>>>
>>>>>> ya su wrote:
>>>>>>>
>>>>>>> Yoshiaki:
>>>>>>>
>>>>>>>      event-tap records block and io write events, and replays these on
>>>>>>> the other side, so block_save_live is useless during the latter ft
>>>>>>> phase, right? if so, I think it needs to process the following code in
>>>>>>> the block_save_live function:
>>>>>>
>>>>>> Actually no.  It just replays the last events only.  We do have patches
>>>>>> that enable block replication without using block live migration, like
>>>>>> the way you described above.  In that case, we disable block live
>>>>>> migration when we go into ft mode.  We're thinking to propose it after
>>>>>> this series gets settled.
>>>>>
>>>>> so event-tap's objective is to initiate an ft transaction, to start the
>>>>> sync of ram/block/device states? if so, it need not change the normal
>>>>> bdrv_aio_writev/bdrv_aio_flush process, and on the other side it need
>>>>> not invoke bdrv_aio_writev either, right?
>>>>
>>>> Mostly yes, but because event-tap is queuing requests from block/net, it
>>>> needs to flush the queued requests after the transaction on the primary
>>>> side.  On the secondary, it currently doesn't have to invoke
>>>> bdrv_aio_writev, as you mentioned.  But that will change soon to enable
>>>> block replication with event-tap.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>      if (stage == 1) {
>>>>>>>          init_blk_migration(mon, f);
>>>>>>>
>>>>>>>          /* start track dirty blocks */
>>>>>>>          set_dirty_tracking(1);
>>>>>>>      }
>>>>>>> --------------------------------------
>>>>>>> the following code will send blocks to the other side, which will also
>>>>>>> be done by event-tap replay. I think it should be placed in stage 3,
>>>>>>> before the assert line. (this may affect the stage 2 rate-limit then,
>>>>>>> so it could be placed in stage 2, though it looks ugly); another choice
>>>>>>> is to avoid the invocation of block_save_live entirely, right?
>>>>>>> ---------------------------------------
>>>>>>>      flush_blks(f);
>>>>>>>
>>>>>>>      if (qemu_file_has_error(f)) {
>>>>>>>          blk_mig_cleanup(mon);
>>>>>>>          return 0;
>>>>>>>      }
>>>>>>>
>>>>>>>      blk_mig_reset_dirty_cursor();
>>>>>>> ----------------------------------------
>>>>>>>      if (stage == 2) {
>>>>>>>
>>>>>>>
>>>>>>>      another question: since you event-tap io writes (I think IO reads
>>>>>>> should also be event-tapped, as a read may cause io chip state to
>>>>>>> change), you then need not invoke qemu_savevm_state_full in
>>>>>>> qemu_savevm_trans_complete, right? thanks.
>>>>>>
>>>>>> It's not necessary to tap IO READ, but you can if you like.  We also
>>>>>> have experimental patches for this to reduce the rams to be
>>>>>> transferred.  But I don't understand why we wouldn't have to invoke
>>>>>> qemu_savevm_state_full, although I think we may reduce the number of
>>>>>> rams by replaying IO READ on the secondary.
>>>>>>
>>>>>
>>>>> I first thought the objective of the io-write event-tap was to reproduce
>>>>> the same device state on the other side (though I doubted this), so I
>>>>> thought IO reads should also be recorded and replayed. Since event-tap
>>>>> only initiates an ft transaction, and the sync of states still depends
>>>>> on qemu_save_vm_live/full, I understand the design now, thanks.
>>>>>
>>>>> But I don't understand why the io-write event-tap can reduce transferred
>>>>> rams as you mentioned; the amount of rams depends only on dirty pages,
>>>>> and IO writes don't change the normal process, unlike block writes, right?
>>>>
>>>> The point is: if we can assure that an IO read retrieves the same data on
>>>> both sides, then instead of dirtying the ram by the read, which would
>>>> mean we have to transfer those pages in the transaction, we just replay
>>>> the operation and get the same data on the other side.  Anyway, that's
>>>> just a plan :)
>>>>
>>>> Thanks,
>>>>
>>>> Yoshi
>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yoshi
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Green.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2011/2/24 Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>:
>>>>>>>>
>>>>>>>> event-tap controls when to start an FT transaction, and provides proxy
>>>>>>>> functions to be called from net/block devices.  During an FT
>>>>>>>> transaction, it queues up net/block requests, and flushes them when
>>>>>>>> the transaction completes.
>>>>>>>>
>>>>>>>> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
>>>>>>>> Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
>>>>>>>> ---
>>>>>>>>   Makefile.target |    1 +
>>>>>>>>   event-tap.c     |  940 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>   event-tap.h     |   44 +++
>>>>>>>>   qemu-tool.c     |   28 ++
>>>>>>>>   trace-events    |   10 +
>>>>>>>>   5 files changed, 1023 insertions(+), 0 deletions(-)
>>>>>>>>   create mode 100644 event-tap.c
>>>>>>>>   create mode 100644 event-tap.h
>>>>>>>>
>>>>>>>> diff --git a/Makefile.target b/Makefile.target
>>>>>>>> index 220589e..da57efe 100644
>>>>>>>> --- a/Makefile.target
>>>>>>>> +++ b/Makefile.target
>>>>>>>> @@ -199,6 +199,7 @@ obj-y += rwhandler.o
>>>>>>>>   obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>>>>>>>>   obj-$(CONFIG_NO_KVM) += kvm-stub.o
>>>>>>>>   LIBS+=-lz
>>>>>>>> +obj-y += event-tap.o
>>>>>>>>
>>>>>>>>   QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
>>>>>>>>   QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
>>>>>>>> diff --git a/event-tap.c b/event-tap.c
>>>>>>>> new file mode 100644
>>>>>>>> index 0000000..95c147a
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/event-tap.c
>>>>>>>> @@ -0,0 +1,940 @@
>>>>>>>> +/*
>>>>>>>> + * Event Tap functions for QEMU
>>>>>>>> + *
>>>>>>>> + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
>>>>>>>> + *
>>>>>>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>>>>>>> + * the COPYING file in the top-level directory.
>>>>>>>> + */
>>>>>>>> +
>>>>>>>> +#include "qemu-common.h"
>>>>>>>> +#include "qemu-error.h"
>>>>>>>> +#include "block.h"
>>>>>>>> +#include "block_int.h"
>>>>>>>> +#include "ioport.h"
>>>>>>>> +#include "osdep.h"
>>>>>>>> +#include "sysemu.h"
>>>>>>>> +#include "hw/hw.h"
>>>>>>>> +#include "net.h"
>>>>>>>> +#include "event-tap.h"
>>>>>>>> +#include "trace.h"
>>>>>>>> +
>>>>>>>> +enum EVENT_TAP_STATE {
>>>>>>>> +    EVENT_TAP_OFF,
>>>>>>>> +    EVENT_TAP_ON,
>>>>>>>> +    EVENT_TAP_SUSPEND,
>>>>>>>> +    EVENT_TAP_FLUSH,
>>>>>>>> +    EVENT_TAP_LOAD,
>>>>>>>> +    EVENT_TAP_REPLAY,
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
>>>>>>>> +
>>>>>>>> +typedef struct EventTapIOport {
>>>>>>>> +    uint32_t address;
>>>>>>>> +    uint32_t data;
>>>>>>>> +    int      index;
>>>>>>>> +} EventTapIOport;
>>>>>>>> +
>>>>>>>> +#define MMIO_BUF_SIZE 8
>>>>>>>> +
>>>>>>>> +typedef struct EventTapMMIO {
>>>>>>>> +    uint64_t address;
>>>>>>>> +    uint8_t  buf[MMIO_BUF_SIZE];
>>>>>>>> +    int      len;
>>>>>>>> +} EventTapMMIO;
>>>>>>>> +
>>>>>>>> +typedef struct EventTapNetReq {
>>>>>>>> +    char *device_name;
>>>>>>>> +    int iovcnt;
>>>>>>>> +    int vlan_id;
>>>>>>>> +    bool vlan_needed;
>>>>>>>> +    bool async;
>>>>>>>> +    struct iovec *iov;
>>>>>>>> +    NetPacketSent *sent_cb;
>>>>>>>> +} EventTapNetReq;
>>>>>>>> +
>>>>>>>> +#define MAX_BLOCK_REQUEST 32
>>>>>>>> +
>>>>>>>> +typedef struct EventTapAIOCB EventTapAIOCB;
>>>>>>>> +
>>>>>>>> +typedef struct EventTapBlkReq {
>>>>>>>> +    char *device_name;
>>>>>>>> +    int num_reqs;
>>>>>>>> +    int num_cbs;
>>>>>>>> +    bool is_flush;
>>>>>>>> +    BlockRequest reqs[MAX_BLOCK_REQUEST];
>>>>>>>> +    EventTapAIOCB *acb[MAX_BLOCK_REQUEST];
>>>>>>>> +} EventTapBlkReq;
>>>>>>>> +
>>>>>>>> +#define EVENT_TAP_IOPORT (1 << 0)
>>>>>>>> +#define EVENT_TAP_MMIO   (1 << 1)
>>>>>>>> +#define EVENT_TAP_NET    (1 << 2)
>>>>>>>> +#define EVENT_TAP_BLK    (1 << 3)
>>>>>>>> +
>>>>>>>> +#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
>>>>>>>> +
>>>>>>>> +typedef struct EventTapLog {
>>>>>>>> +    int mode;
>>>>>>>> +    union {
>>>>>>>> +        EventTapIOport ioport;
>>>>>>>> +        EventTapMMIO mmio;
>>>>>>>> +    };
>>>>>>>> +    union {
>>>>>>>> +        EventTapNetReq net_req;
>>>>>>>> +        EventTapBlkReq blk_req;
>>>>>>>> +    };
>>>>>>>> +    QTAILQ_ENTRY(EventTapLog) node;
>>>>>>>> +} EventTapLog;
>>>>>>>> +
>>>>>>>> +struct EventTapAIOCB {
>>>>>>>> +    BlockDriverAIOCB common;
>>>>>>>> +    BlockDriverAIOCB *acb;
>>>>>>>> +    bool is_canceled;
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +static EventTapLog *last_event_tap;
>>>>>>>> +
>>>>>>>> +static QTAILQ_HEAD(, EventTapLog) event_list;
>>>>>>>> +static QTAILQ_HEAD(, EventTapLog) event_pool;
>>>>>>>> +
>>>>>>>> +static int (*event_tap_cb)(void);
>>>>>>>> +static QEMUBH *event_tap_bh;
>>>>>>>> +static VMChangeStateEntry *vmstate;
>>>>>>>> +
>>>>>>>> +static void event_tap_bh_cb(void *p)
>>>>>>>> +{
>>>>>>>> +    if (event_tap_cb) {
>>>>>>>> +        event_tap_cb();
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    qemu_bh_delete(event_tap_bh);
>>>>>>>> +    event_tap_bh = NULL;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_schedule_bh(void)
>>>>>>>> +{
>>>>>>>> +    trace_event_tap_ignore_bh(!!event_tap_bh);
>>>>>>>> +
>>>>>>>> +    /* if bh is already set, we ignore it for now */
>>>>>>>> +    if (event_tap_bh) {
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
>>>>>>>> +    qemu_bh_schedule(event_tap_bh);
>>>>>>>> +
>>>>>>>> +    return;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void *event_tap_alloc_log(void)
>>>>>>>> +{
>>>>>>>> +    EventTapLog *log;
>>>>>>>> +
>>>>>>>> +    if (QTAILQ_EMPTY(&event_pool)) {
>>>>>>>> +        log = qemu_mallocz(sizeof(EventTapLog));
>>>>>>>> +    } else {
>>>>>>>> +        log = QTAILQ_FIRST(&event_pool);
>>>>>>>> +        QTAILQ_REMOVE(&event_pool, log, node);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return log;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_free_net_req(EventTapNetReq *net_req);
>>>>>>>> +static void event_tap_free_blk_req(EventTapBlkReq *blk_req);
>>>>>>>> +
>>>>>>>> +static void event_tap_free_log(EventTapLog *log)
>>>>>>>> +{
>>>>>>>> +    int mode = log->mode & ~EVENT_TAP_TYPE_MASK;
>>>>>>>> +
>>>>>>>> +    if (mode == EVENT_TAP_NET) {
>>>>>>>> +        event_tap_free_net_req(&log->net_req);
>>>>>>>> +    } else if (mode == EVENT_TAP_BLK) {
>>>>>>>> +        event_tap_free_blk_req(&log->blk_req);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    log->mode = 0;
>>>>>>>> +
>>>>>>>> +    /* return the log to event_pool */
>>>>>>>> +    QTAILQ_INSERT_HEAD(&event_pool, log, node);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_free_pool(void)
>>>>>>>> +{
>>>>>>>> +    EventTapLog *log, *next;
>>>>>>>> +
>>>>>>>> +    QTAILQ_FOREACH_SAFE(log, &event_pool, node, next) {
>>>>>>>> +        QTAILQ_REMOVE(&event_pool, log, node);
>>>>>>>> +        qemu_free(log);
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_free_net_req(EventTapNetReq *net_req)
>>>>>>>> +{
>>>>>>>> +    int i;
>>>>>>>> +
>>>>>>>> +    if (!net_req->async) {
>>>>>>>> +        for (i = 0; i < net_req->iovcnt; i++) {
>>>>>>>> +            qemu_free(net_req->iov[i].iov_base);
>>>>>>>> +        }
>>>>>>>> +        qemu_free(net_req->iov);
>>>>>>>> +    } else if (event_tap_state >= EVENT_TAP_LOAD) {
>>>>>>>> +        qemu_free(net_req->iov);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    qemu_free(net_req->device_name);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_alloc_net_req(EventTapNetReq *net_req,
>>>>>>>> +                                   VLANClientState *vc,
>>>>>>>> +                                   const struct iovec *iov, int iovcnt,
>>>>>>>> +                                   NetPacketSent *sent_cb, bool async)
>>>>>>>> +{
>>>>>>>> +    int i;
>>>>>>>> +
>>>>>>>> +    net_req->iovcnt = iovcnt;
>>>>>>>> +    net_req->async = async;
>>>>>>>> +    net_req->device_name = qemu_strdup(vc->name);
>>>>>>>> +    net_req->sent_cb = sent_cb;
>>>>>>>> +
>>>>>>>> +    if (vc->vlan) {
>>>>>>>> +        net_req->vlan_needed = 1;
>>>>>>>> +        net_req->vlan_id = vc->vlan->id;
>>>>>>>> +    } else {
>>>>>>>> +        net_req->vlan_needed = 0;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (async) {
>>>>>>>> +        net_req->iov = (struct iovec *)iov;
>>>>>>>> +    } else {
>>>>>>>> +        net_req->iov = qemu_malloc(sizeof(struct iovec) * iovcnt);
>>>>>>>> +        for (i = 0; i < iovcnt; i++) {
>>>>>>>> +            net_req->iov[i].iov_base = qemu_malloc(iov[i].iov_len);
>>>>>>>> +            memcpy(net_req->iov[i].iov_base, iov[i].iov_base,
>>>>>>>> +                   iov[i].iov_len);
>>>>>>>> +            net_req->iov[i].iov_len = iov[i].iov_len;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_packet(VLANClientState *vc, const struct iovec *iov,
>>>>>>>> +                            int iovcnt, NetPacketSent *sent_cb, bool async)
>>>>>>>> +{
>>>>>>>> +    int empty;
>>>>>>>> +    EventTapLog *log = last_event_tap;
>>>>>>>> +
>>>>>>>> +    if (!log) {
>>>>>>>> +        trace_event_tap_no_event();
>>>>>>>> +        log = event_tap_alloc_log();
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (log->mode & ~EVENT_TAP_TYPE_MASK) {
>>>>>>>> +        trace_event_tap_already_used(log->mode & ~EVENT_TAP_TYPE_MASK);
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    log->mode |= EVENT_TAP_NET;
>>>>>>>> +    event_tap_alloc_net_req(&log->net_req, vc, iov, iovcnt, sent_cb, async);
>>>>>>>> +
>>>>>>>> +    empty = QTAILQ_EMPTY(&event_list);
>>>>>>>> +    QTAILQ_INSERT_TAIL(&event_list, log, node);
>>>>>>>> +    last_event_tap = NULL;
>>>>>>>> +
>>>>>>>> +    if (empty) {
>>>>>>>> +        event_tap_schedule_bh();
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void event_tap_send_packet(VLANClientState *vc, const uint8_t *buf, int size)
>>>>>>>> +{
>>>>>>>> +    struct iovec iov;
>>>>>>>> +
>>>>>>>> +    assert(event_tap_state == EVENT_TAP_ON);
>>>>>>>> +
>>>>>>>> +    iov.iov_base = (uint8_t *)buf;
>>>>>>>> +    iov.iov_len = size;
>>>>>>>> +    event_tap_packet(vc, &iov, 1, NULL, 0);
>>>>>>>> +
>>>>>>>> +    return;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +ssize_t event_tap_sendv_packet_async(VLANClientState *vc,
>>>>>>>> +                                     const struct iovec *iov,
>>>>>>>> +                                     int iovcnt, NetPacketSent *sent_cb)
>>>>>>>> +{
>>>>>>>> +    assert(event_tap_state == EVENT_TAP_ON);
>>>>>>>> +    event_tap_packet(vc, iov, iovcnt, sent_cb, 1);
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_net_flush(EventTapNetReq *net_req)
>>>>>>>> +{
>>>>>>>> +    VLANClientState *vc;
>>>>>>>> +    ssize_t len;
>>>>>>>> +
>>>>>>>> +    if (net_req->vlan_needed) {
>>>>>>>> +        vc = qemu_find_vlan_client_by_name(NULL, net_req->vlan_id,
>>>>>>>> +                                           net_req->device_name);
>>>>>>>> +    } else {
>>>>>>>> +        vc = qemu_find_netdev(net_req->device_name);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (net_req->async) {
>>>>>>>> +        len = qemu_sendv_packet_async(vc, net_req->iov, net_req->iovcnt,
>>>>>>>> +                                      net_req->sent_cb);
>>>>>>>> +        if (len) {
>>>>>>>> +            net_req->sent_cb(vc, len);
>>>>>>>> +        } else {
>>>>>>>> +            /* packets are queued in the net layer */
>>>>>>>> +            trace_event_tap_append_packet();
>>>>>>>> +        }
>>>>>>>> +    } else {
>>>>>>>> +        qemu_send_packet(vc, net_req->iov[0].iov_base,
>>>>>>>> +                         net_req->iov[0].iov_len);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    /* force flush to avoid request inversion */
>>>>>>>> +    qemu_aio_flush();
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_net_save(QEMUFile *f, EventTapNetReq *net_req)
>>>>>>>> +{
>>>>>>>> +    ram_addr_t page_addr;
>>>>>>>> +    int i, len;
>>>>>>>> +
>>>>>>>> +    len = strlen(net_req->device_name);
>>>>>>>> +    qemu_put_byte(f, len);
>>>>>>>> +    qemu_put_buffer(f, (uint8_t *)net_req->device_name, len);
>>>>>>>> +    qemu_put_byte(f, net_req->vlan_id);
>>>>>>>> +    qemu_put_byte(f, net_req->vlan_needed);
>>>>>>>> +    qemu_put_byte(f, net_req->async);
>>>>>>>> +    qemu_put_be32(f, net_req->iovcnt);
>>>>>>>> +
>>>>>>>> +    for (i = 0; i < net_req->iovcnt; i++) {
>>>>>>>> +        qemu_put_be64(f, net_req->iov[i].iov_len);
>>>>>>>> +        if (net_req->async) {
>>>>>>>> +            page_addr =
>>>>>>>> +                qemu_ram_addr_from_host_nofail(net_req->iov[i].iov_base);
>>>>>>>> +            qemu_put_be64(f, page_addr);
>>>>>>>> +        } else {
>>>>>>>> +            qemu_put_buffer(f, (uint8_t *)net_req->iov[i].iov_base,
>>>>>>>> +                            net_req->iov[i].iov_len);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_net_load(QEMUFile *f, EventTapNetReq *net_req)
>>>>>>>> +{
>>>>>>>> +    ram_addr_t page_addr;
>>>>>>>> +    int i, len;
>>>>>>>> +
>>>>>>>> +    len = qemu_get_byte(f);
>>>>>>>> +    net_req->device_name = qemu_malloc(len + 1);
>>>>>>>> +    qemu_get_buffer(f, (uint8_t *)net_req->device_name, len);
>>>>>>>> +    net_req->device_name[len] = '\0';
>>>>>>>> +    net_req->vlan_id = qemu_get_byte(f);
>>>>>>>> +    net_req->vlan_needed = qemu_get_byte(f);
>>>>>>>> +    net_req->async = qemu_get_byte(f);
>>>>>>>> +    net_req->iovcnt = qemu_get_be32(f);
>>>>>>>> +    net_req->iov = qemu_malloc(sizeof(struct iovec) * net_req->iovcnt);
>>>>>>>> +
>>>>>>>> +    for (i = 0; i < net_req->iovcnt; i++) {
>>>>>>>> +        net_req->iov[i].iov_len = qemu_get_be64(f);
>>>>>>>> +        if (net_req->async) {
>>>>>>>> +            page_addr = qemu_get_be64(f);
>>>>>>>> +            net_req->iov[i].iov_base = qemu_get_ram_ptr(page_addr);
>>>>>>>> +        } else {
>>>>>>>> +            net_req->iov[i].iov_base = qemu_malloc(net_req->iov[i].iov_len);
>>>>>>>> +            qemu_get_buffer(f, (uint8_t *)net_req->iov[i].iov_base,
>>>>>>>> +                            net_req->iov[i].iov_len);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_free_blk_req(EventTapBlkReq *blk_req)
>>>>>>>> +{
>>>>>>>> +    int i;
>>>>>>>> +
>>>>>>>> +    if (event_tap_state >= EVENT_TAP_LOAD && !blk_req->is_flush) {
>>>>>>>> +        for (i = 0; i < blk_req->num_reqs; i++) {
>>>>>>>> +            qemu_iovec_destroy(blk_req->reqs[i].qiov);
>>>>>>>> +            qemu_free(blk_req->reqs[i].qiov);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    qemu_free(blk_req->device_name);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_blk_cb(void *opaque, int ret)
>>>>>>>> +{
>>>>>>>> +    EventTapLog *log = container_of(opaque, EventTapLog, blk_req);
>>>>>>>> +    EventTapBlkReq *blk_req = opaque;
>>>>>>>> +    int i;
>>>>>>>> +
>>>>>>>> +    blk_req->num_cbs--;
>>>>>>>> +
>>>>>>>> +    /* all outstanding requests are flushed */
>>>>>>>> +    if (blk_req->num_cbs == 0) {
>>>>>>>> +        for (i = 0; i < blk_req->num_reqs; i++) {
>>>>>>>> +            EventTapAIOCB *eacb = blk_req->acb[i];
>>>>>>>> +            eacb->common.cb(eacb->common.opaque, ret);
>>>>>>>> +            qemu_aio_release(eacb);
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        event_tap_free_log(log);
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_bdrv_aio_cancel(BlockDriverAIOCB *acb)
>>>>>>>> +{
>>>>>>>> +    EventTapAIOCB *eacb = container_of(acb, EventTapAIOCB, common);
>>>>>>>> +
>>>>>>>> +    /* check if already passed to block layer */
>>>>>>>> +    if (eacb->acb) {
>>>>>>>> +        bdrv_aio_cancel(eacb->acb);
>>>>>>>> +    } else {
>>>>>>>> +        eacb->is_canceled = 1;
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static AIOPool event_tap_aio_pool = {
>>>>>>>> +    .aiocb_size = sizeof(EventTapAIOCB),
>>>>>>>> +    .cancel     = event_tap_bdrv_aio_cancel,
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req,
>>>>>>>> +                                    BlockDriverState *bs, BlockRequest *reqs,
>>>>>>>> +                                    int num_reqs, void *opaque, bool is_flush)
>>>>>>>> +{
>>>>>>>> +    int i;
>>>>>>>> +
>>>>>>>> +    blk_req->num_reqs = num_reqs;
>>>>>>>> +    blk_req->num_cbs = num_reqs;
>>>>>>>> +    blk_req->device_name = qemu_strdup(bs->device_name);
>>>>>>>> +    blk_req->is_flush = is_flush;
>>>>>>>> +
>>>>>>>> +    for (i = 0; i < num_reqs; i++) {
>>>>>>>> +        blk_req->reqs[i].sector = reqs[i].sector;
>>>>>>>> +        blk_req->reqs[i].nb_sectors = reqs[i].nb_sectors;
>>>>>>>> +        blk_req->reqs[i].qiov = reqs[i].qiov;
>>>>>>>> +        blk_req->reqs[i].cb = event_tap_blk_cb;
>>>>>>>> +        blk_req->reqs[i].opaque = opaque;
>>>>>>>> +
>>>>>>>> +        blk_req->acb[i] = qemu_aio_get(&event_tap_aio_pool, bs,
>>>>>>>> +                                       reqs[i].cb, reqs[i].opaque);
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static EventTapBlkReq *event_tap_bdrv(BlockDriverState *bs, BlockRequest *reqs,
>>>>>>>> +                                      int num_reqs, bool is_flush)
>>>>>>>> +{
>>>>>>>> +    EventTapLog *log = last_event_tap;
>>>>>>>> +    int empty;
>>>>>>>> +
>>>>>>>> +    if (!log) {
>>>>>>>> +        trace_event_tap_no_event();
>>>>>>>> +        log = event_tap_alloc_log();
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (log->mode & ~EVENT_TAP_TYPE_MASK) {
>>>>>>>> +        trace_event_tap_already_used(log->mode & ~EVENT_TAP_TYPE_MASK);
>>>>>>>> +        return NULL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    log->mode |= EVENT_TAP_BLK;
>>>>>>>> +    event_tap_alloc_blk_req(&log->blk_req, bs, reqs,
>>>>>>>> +                            num_reqs, &log->blk_req, is_flush);
>>>>>>>> +
>>>>>>>> +    empty = QTAILQ_EMPTY(&event_list);
>>>>>>>> +    QTAILQ_INSERT_TAIL(&event_list, log, node);
>>>>>>>> +    last_event_tap = NULL;
>>>>>>>> +
>>>>>>>> +    if (empty) {
>>>>>>>> +        event_tap_schedule_bh();
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return &log->blk_req;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +BlockDriverAIOCB *event_tap_bdrv_aio_writev(BlockDriverState *bs,
>>>>>>>> +                                            int64_t sector_num,
>>>>>>>> +                                            QEMUIOVector *iov,
>>>>>>>> +                                            int nb_sectors,
>>>>>>>> +                                            BlockDriverCompletionFunc *cb,
>>>>>>>> +                                            void *opaque)
>>>>>>>> +{
>>>>>>>> +    BlockRequest req;
>>>>>>>> +    EventTapBlkReq *ereq;
>>>>>>>> +
>>>>>>>> +    assert(event_tap_state == EVENT_TAP_ON);
>>>>>>>> +
>>>>>>>> +    req.sector = sector_num;
>>>>>>>> +    req.nb_sectors = nb_sectors;
>>>>>>>> +    req.qiov = iov;
>>>>>>>> +    req.cb = cb;
>>>>>>>> +    req.opaque = opaque;
>>>>>>>> +    ereq = event_tap_bdrv(bs, &req, 1, 0);
>>>>>>>> +
>>>>>>>> +    return &ereq->acb[0]->common;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +BlockDriverAIOCB *event_tap_bdrv_aio_flush(BlockDriverState *bs,
>>>>>>>> +                                           BlockDriverCompletionFunc *cb,
>>>>>>>> +                                           void *opaque)
>>>>>>>> +{
>>>>>>>> +    BlockRequest req;
>>>>>>>> +    EventTapBlkReq *ereq;
>>>>>>>> +
>>>>>>>> +    assert(event_tap_state == EVENT_TAP_ON);
>>>>>>>> +
>>>>>>>> +    memset(&req, 0, sizeof(req));
>>>>>>>> +    req.cb = cb;
>>>>>>>> +    req.opaque = opaque;
>>>>>>>> +    ereq = event_tap_bdrv(bs, &req, 1, 1);
>>>>>>>> +
>>>>>>>> +    return &ereq->acb[0]->common;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void event_tap_bdrv_flush(void)
>>>>>>>> +{
>>>>>>>> +    qemu_bh_cancel(event_tap_bh);
>>>>>>>> +
>>>>>>>> +    while (!QTAILQ_EMPTY(&event_list)) {
>>>>>>>> +        event_tap_cb();
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_blk_flush(EventTapBlkReq *blk_req)
>>>>>>>> +{
>>>>>>>> +    int i, ret;
>>>>>>>> +
>>>>>>>> +    for (i = 0; i < blk_req->num_reqs; i++) {
>>>>>>>> +        BlockRequest *req = &blk_req->reqs[i];
>>>>>>>> +        EventTapAIOCB *eacb = blk_req->acb[i];
>>>>>>>> +        BlockDriverAIOCB *acb = &eacb->common;
>>>>>>>> +
>>>>>>>> +        /* don't flush if canceled */
>>>>>>>> +        if (eacb->is_canceled) {
>>>>>>>> +            continue;
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        /* receiver needs to restore bs from device name */
>>>>>>>> +        if (!acb->bs) {
>>>>>>>> +            acb->bs = bdrv_find(blk_req->device_name);
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        if (blk_req->is_flush) {
>>>>>>>> +            eacb->acb = bdrv_aio_flush(acb->bs, req->cb, req->opaque);
>>>>>>>> +            if (!eacb->acb) {
>>>>>>>> +                req->cb(req->opaque, -EIO);
>>>>>>>> +            }
>>>>>>>> +            return;
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        eacb->acb = bdrv_aio_writev(acb->bs, req->sector, req->qiov,
>>>>>>>> +                                    req->nb_sectors, req->cb, req->opaque);
>>>>>>>> +        if (!eacb->acb) {
>>>>>>>> +            req->cb(req->opaque, -EIO);
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        /* force flush to avoid request inversion */
>>>>>>>> +        qemu_aio_flush();
>>>>>>>> +        ret = bdrv_flush(acb->bs);
>>>>>>>> +        if (ret < 0) {
>>>>>>>> +            error_report("flushing blk_req to %s failed",
>>>>>>>> +                         blk_req->device_name);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_blk_save(QEMUFile *f, EventTapBlkReq *blk_req)
>>>>>>>> +{
>>>>>>>> +    ram_addr_t page_addr;
>>>>>>>> +    int i, j, len;
>>>>>>>> +
>>>>>>>> +    len = strlen(blk_req->device_name);
>>>>>>>> +    qemu_put_byte(f, len);
>>>>>>>> +    qemu_put_buffer(f, (uint8_t *)blk_req->device_name, len);
>>>>>>>> +    qemu_put_byte(f, blk_req->num_reqs);
>>>>>>>> +    qemu_put_byte(f, blk_req->is_flush);
>>>>>>>> +
>>>>>>>> +    if (blk_req->is_flush) {
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    for (i = 0; i < blk_req->num_reqs; i++) {
>>>>>>>> +        BlockRequest *req = &blk_req->reqs[i];
>>>>>>>> +        EventTapAIOCB *eacb = blk_req->acb[i];
>>>>>>>> +        /* don't save canceled requests */
>>>>>>>> +        if (eacb->is_canceled) {
>>>>>>>> +            continue;
>>>>>>>> +        }
>>>>>>>> +        qemu_put_be64(f, req->sector);
>>>>>>>> +        qemu_put_be32(f, req->nb_sectors);
>>>>>>>> +        qemu_put_be32(f, req->qiov->niov);
>>>>>>>> +
>>>>>>>> +        for (j = 0; j < req->qiov->niov; j++) {
>>>>>>>> +            page_addr =
>>>>>>>> +                qemu_ram_addr_from_host_nofail(req->qiov->iov[j].iov_base);
>>>>>>>> +            qemu_put_be64(f, page_addr);
>>>>>>>> +            qemu_put_be64(f, req->qiov->iov[j].iov_len);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_blk_load(QEMUFile *f, EventTapBlkReq *blk_req)
>>>>>>>> +{
>>>>>>>> +    BlockRequest *req;
>>>>>>>> +    ram_addr_t page_addr;
>>>>>>>> +    int i, j, len, niov;
>>>>>>>> +
>>>>>>>> +    len = qemu_get_byte(f);
>>>>>>>> +    blk_req->device_name = qemu_malloc(len + 1);
>>>>>>>> +    qemu_get_buffer(f, (uint8_t *)blk_req->device_name, len);
>>>>>>>> +    blk_req->device_name[len] = '\0';
>>>>>>>> +    blk_req->num_reqs = qemu_get_byte(f);
>>>>>>>> +    blk_req->is_flush = qemu_get_byte(f);
>>>>>>>> +
>>>>>>>> +    if (blk_req->is_flush) {
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    for (i = 0; i < blk_req->num_reqs; i++) {
>>>>>>>> +        req = &blk_req->reqs[i];
>>>>>>>> +        req->sector = qemu_get_be64(f);
>>>>>>>> +        req->nb_sectors = qemu_get_be32(f);
>>>>>>>> +        req->qiov = qemu_mallocz(sizeof(QEMUIOVector));
>>>>>>>> +        niov = qemu_get_be32(f);
>>>>>>>> +        qemu_iovec_init(req->qiov, niov);
>>>>>>>> +
>>>>>>>> +        for (j = 0; j < niov; j++) {
>>>>>>>> +            void *iov_base;
>>>>>>>> +            size_t iov_len;
>>>>>>>> +            page_addr = qemu_get_be64(f);
>>>>>>>> +            iov_base = qemu_get_ram_ptr(page_addr);
>>>>>>>> +            iov_len = qemu_get_be64(f);
>>>>>>>> +            qemu_iovec_add(req->qiov, iov_base, iov_len);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void event_tap_ioport(int index, uint32_t address, uint32_t data)
>>>>>>>> +{
>>>>>>>> +    if (event_tap_state != EVENT_TAP_ON) {
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (!last_event_tap) {
>>>>>>>> +        last_event_tap = event_tap_alloc_log();
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    last_event_tap->mode = EVENT_TAP_IOPORT;
>>>>>>>> +    last_event_tap->ioport.index = index;
>>>>>>>> +    last_event_tap->ioport.address = address;
>>>>>>>> +    last_event_tap->ioport.data = data;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static inline void event_tap_ioport_save(QEMUFile *f, EventTapIOport *ioport)
>>>>>>>> +{
>>>>>>>> +    qemu_put_be32(f, ioport->index);
>>>>>>>> +    qemu_put_be32(f, ioport->address);
>>>>>>>> +    qemu_put_byte(f, ioport->data);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static inline void event_tap_ioport_load(QEMUFile *f,
>>>>>>>> +                                         EventTapIOport *ioport)
>>>>>>>> +{
>>>>>>>> +    ioport->index = qemu_get_be32(f);
>>>>>>>> +    ioport->address = qemu_get_be32(f);
>>>>>>>> +    ioport->data = qemu_get_byte(f);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void event_tap_mmio(uint64_t address, uint8_t *buf, int len)
>>>>>>>> +{
>>>>>>>> +    if (event_tap_state != EVENT_TAP_ON || len > MMIO_BUF_SIZE) {
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (!last_event_tap) {
>>>>>>>> +        last_event_tap = event_tap_alloc_log();
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    last_event_tap->mode = EVENT_TAP_MMIO;
>>>>>>>> +    last_event_tap->mmio.address = address;
>>>>>>>> +    last_event_tap->mmio.len = len;
>>>>>>>> +    memcpy(last_event_tap->mmio.buf, buf, len);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static inline void event_tap_mmio_save(QEMUFile *f, EventTapMMIO *mmio)
>>>>>>>> +{
>>>>>>>> +    qemu_put_be64(f, mmio->address);
>>>>>>>> +    qemu_put_byte(f, mmio->len);
>>>>>>>> +    qemu_put_buffer(f, mmio->buf, mmio->len);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static inline void event_tap_mmio_load(QEMUFile *f, EventTapMMIO *mmio)
>>>>>>>> +{
>>>>>>>> +    mmio->address = qemu_get_be64(f);
>>>>>>>> +    mmio->len = qemu_get_byte(f);
>>>>>>>> +    qemu_get_buffer(f, mmio->buf, mmio->len);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +int event_tap_register(int (*cb)(void))
>>>>>>>> +{
>>>>>>>> +    if (event_tap_state != EVENT_TAP_OFF) {
>>>>>>>> +        error_report("event-tap is already on");
>>>>>>>> +        return -EINVAL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (!cb || event_tap_cb) {
>>>>>>>> +        error_report("can't set event_tap_cb");
>>>>>>>> +        return -EINVAL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    event_tap_cb = cb;
>>>>>>>> +    event_tap_state = EVENT_TAP_ON;
>>>>>>>> +
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void event_tap_unregister(void)
>>>>>>>> +{
>>>>>>>> +    if (event_tap_state == EVENT_TAP_OFF) {
>>>>>>>> +        error_report("event-tap is already off");
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    qemu_del_vm_change_state_handler(vmstate);
>>>>>>>> +
>>>>>>>> +    event_tap_flush();
>>>>>>>> +    event_tap_free_pool();
>>>>>>>> +
>>>>>>>> +    event_tap_state = EVENT_TAP_OFF;
>>>>>>>> +    event_tap_cb = NULL;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +int event_tap_is_on(void)
>>>>>>>> +{
>>>>>>>> +    return (event_tap_state == EVENT_TAP_ON);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_suspend(void *opaque, int running, int reason)
>>>>>>>> +{
>>>>>>>> +    event_tap_state = running ? EVENT_TAP_ON : EVENT_TAP_SUSPEND;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* returns 1 if the queue becomes empty */
>>>>>>>> +int event_tap_flush_one(void)
>>>>>>>> +{
>>>>>>>> +    EventTapLog *log;
>>>>>>>> +    int ret;
>>>>>>>> +
>>>>>>>> +    if (QTAILQ_EMPTY(&event_list)) {
>>>>>>>> +        return 1;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    event_tap_state = EVENT_TAP_FLUSH;
>>>>>>>> +
>>>>>>>> +    log = QTAILQ_FIRST(&event_list);
>>>>>>>> +    QTAILQ_REMOVE(&event_list, log, node);
>>>>>>>> +    switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
>>>>>>>> +    case EVENT_TAP_NET:
>>>>>>>> +        event_tap_net_flush(&log->net_req);
>>>>>>>> +        event_tap_free_log(log);
>>>>>>>> +        break;
>>>>>>>> +    case EVENT_TAP_BLK:
>>>>>>>> +        event_tap_blk_flush(&log->blk_req);
>>>>>>>> +        break;
>>>>>>>> +    default:
>>>>>>>> +        error_report("Unknown state %d", log->mode);
>>>>>>>> +        event_tap_free_log(log);
>>>>>>>> +        return -EINVAL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    ret = QTAILQ_EMPTY(&event_list);
>>>>>>>> +    event_tap_state = ret ? EVENT_TAP_ON : EVENT_TAP_FLUSH;
>>>>>>>> +
>>>>>>>> +    return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void event_tap_flush(void)
>>>>>>>> +{
>>>>>>>> +    int ret;
>>>>>>>> +
>>>>>>>> +    do {
>>>>>>>> +        ret = event_tap_flush_one();
>>>>>>>> +    } while (ret == 0);
>>>>>>>> +
>>>>>>>> +    if (ret < 0) {
>>>>>>>> +        error_report("error flushing event-tap requests");
>>>>>>>> +        abort();
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_replay(void *opaque, int running, int reason)
>>>>>>>> +{
>>>>>>>> +    EventTapLog *log, *next;
>>>>>>>> +
>>>>>>>> +    if (!running) {
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    assert(event_tap_state == EVENT_TAP_LOAD);
>>>>>>>> +
>>>>>>>> +    event_tap_state = EVENT_TAP_REPLAY;
>>>>>>>> +
>>>>>>>> +    QTAILQ_FOREACH(log, &event_list, node) {
>>>>>>>> +        if ((log->mode & ~EVENT_TAP_TYPE_MASK) == EVENT_TAP_NET) {
>>>>>>>> +            EventTapNetReq *net_req = &log->net_req;
>>>>>>>> +            if (!net_req->async) {
>>>>>>>> +                event_tap_net_flush(net_req);
>>>>>>>> +                continue;
>>>>>>>> +            }
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        switch (log->mode & EVENT_TAP_TYPE_MASK) {
>>>>>>>> +        case EVENT_TAP_IOPORT:
>>>>>>>> +            switch (log->ioport.index) {
>>>>>>>> +            case 0:
>>>>>>>> +                cpu_outb(log->ioport.address, log->ioport.data);
>>>>>>>> +                break;
>>>>>>>> +            case 1:
>>>>>>>> +                cpu_outw(log->ioport.address, log->ioport.data);
>>>>>>>> +                break;
>>>>>>>> +            case 2:
>>>>>>>> +                cpu_outl(log->ioport.address, log->ioport.data);
>>>>>>>> +                break;
>>>>>>>> +            }
>>>>>>>> +            break;
>>>>>>>> +        case EVENT_TAP_MMIO:
>>>>>>>> +            cpu_physical_memory_rw(log->mmio.address,
>>>>>>>> +                                   log->mmio.buf,
>>>>>>>> +                                   log->mmio.len, 1);
>>>>>>>> +            break;
>>>>>>>> +        case 0:
>>>>>>>> +            trace_event_tap_replay_no_event();
>>>>>>>> +            break;
>>>>>>>> +        default:
>>>>>>>> +            error_report("Unknown state %d", log->mode);
>>>>>>>> +            QTAILQ_REMOVE(&event_list, log, node);
>>>>>>>> +            event_tap_free_log(log);
>>>>>>>> +            return;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    /* remove event logs from queue */
>>>>>>>> +    QTAILQ_FOREACH_SAFE(log, &event_list, node, next) {
>>>>>>>> +        QTAILQ_REMOVE(&event_list, log, node);
>>>>>>>> +        event_tap_free_log(log);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    event_tap_state = EVENT_TAP_OFF;
>>>>>>>> +    qemu_del_vm_change_state_handler(vmstate);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void event_tap_save(QEMUFile *f, void *opaque)
>>>>>>>> +{
>>>>>>>> +    EventTapLog *log;
>>>>>>>> +
>>>>>>>> +    QTAILQ_FOREACH(log, &event_list, node) {
>>>>>>>> +        qemu_put_byte(f, log->mode);
>>>>>>>> +
>>>>>>>> +        switch (log->mode & EVENT_TAP_TYPE_MASK) {
>>>>>>>> +        case EVENT_TAP_IOPORT:
>>>>>>>> +            event_tap_ioport_save(f, &log->ioport);
>>>>>>>> +            break;
>>>>>>>> +        case EVENT_TAP_MMIO:
>>>>>>>> +            event_tap_mmio_save(f, &log->mmio);
>>>>>>>> +            break;
>>>>>>>> +        case 0:
>>>>>>>> +            trace_event_tap_save_no_event();
>>>>>>>> +            break;
>>>>>>>> +        default:
>>>>>>>> +            error_report("Unknown state %d", log->mode);
>>>>>>>> +            return;
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
>>>>>>>> +        case EVENT_TAP_NET:
>>>>>>>> +            event_tap_net_save(f, &log->net_req);
>>>>>>>> +            break;
>>>>>>>> +        case EVENT_TAP_BLK:
>>>>>>>> +            event_tap_blk_save(f, &log->blk_req);
>>>>>>>> +            break;
>>>>>>>> +        default:
>>>>>>>> +            error_report("Unknown state %d", log->mode);
>>>>>>>> +            return;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    qemu_put_byte(f, 0); /* EOF */
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int event_tap_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>> +{
>>>>>>>> +    EventTapLog *log, *next;
>>>>>>>> +    int mode;
>>>>>>>> +
>>>>>>>> +    event_tap_state = EVENT_TAP_LOAD;
>>>>>>>> +
>>>>>>>> +    QTAILQ_FOREACH_SAFE(log, &event_list, node, next) {
>>>>>>>> +        QTAILQ_REMOVE(&event_list, log, node);
>>>>>>>> +        event_tap_free_log(log);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    /* loop until EOF */
>>>>>>>> +    while ((mode = qemu_get_byte(f)) != 0) {
>>>>>>>> +        EventTapLog *log = event_tap_alloc_log();
>>>>>>>> +
>>>>>>>> +        log->mode = mode;
>>>>>>>> +        switch (log->mode & EVENT_TAP_TYPE_MASK) {
>>>>>>>> +        case EVENT_TAP_IOPORT:
>>>>>>>> +            event_tap_ioport_load(f, &log->ioport);
>>>>>>>> +            break;
>>>>>>>> +        case EVENT_TAP_MMIO:
>>>>>>>> +            event_tap_mmio_load(f, &log->mmio);
>>>>>>>> +            break;
>>>>>>>> +        case 0:
>>>>>>>> +            trace_event_tap_load_no_event();
>>>>>>>> +            break;
>>>>>>>> +        default:
>>>>>>>> +            error_report("Unknown state %d", log->mode);
>>>>>>>> +            event_tap_free_log(log);
>>>>>>>> +            return -EINVAL;
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
>>>>>>>> +        case EVENT_TAP_NET:
>>>>>>>> +            event_tap_net_load(f, &log->net_req);
>>>>>>>> +            break;
>>>>>>>> +        case EVENT_TAP_BLK:
>>>>>>>> +            event_tap_blk_load(f, &log->blk_req);
>>>>>>>> +            break;
>>>>>>>> +        default:
>>>>>>>> +            error_report("Unknown state %d", log->mode);
>>>>>>>> +            event_tap_free_log(log);
>>>>>>>> +            return -EINVAL;
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        QTAILQ_INSERT_TAIL(&event_list, log, node);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void event_tap_schedule_replay(void)
>>>>>>>> +{
>>>>>>>> +    vmstate = qemu_add_vm_change_state_handler(event_tap_replay, NULL);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void event_tap_schedule_suspend(void)
>>>>>>>> +{
>>>>>>>> +    vmstate = qemu_add_vm_change_state_handler(event_tap_suspend, NULL);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void event_tap_init(void)
>>>>>>>> +{
>>>>>>>> +    QTAILQ_INIT(&event_list);
>>>>>>>> +    QTAILQ_INIT(&event_pool);
>>>>>>>> +    register_savevm(NULL, "event-tap", 0, 1,
>>>>>>>> +                    event_tap_save, event_tap_load, &last_event_tap);
>>>>>>>> +}
>>>>>>>> diff --git a/event-tap.h b/event-tap.h
>>>>>>>> new file mode 100644
>>>>>>>> index 0000000..ab677f8
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/event-tap.h
>>>>>>>> @@ -0,0 +1,44 @@
>>>>>>>> +/*
>>>>>>>> + * Event Tap functions for QEMU
>>>>>>>> + *
>>>>>>>> + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
>>>>>>>> + *
>>>>>>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>>>>>>> + * the COPYING file in the top-level directory.
>>>>>>>> + */
>>>>>>>> +
>>>>>>>> +#ifndef EVENT_TAP_H
>>>>>>>> +#define EVENT_TAP_H
>>>>>>>> +
>>>>>>>> +#include "qemu-common.h"
>>>>>>>> +#include "net.h"
>>>>>>>> +#include "block.h"
>>>>>>>> +
>>>>>>>> +int event_tap_register(int (*cb)(void));
>>>>>>>> +void event_tap_unregister(void);
>>>>>>>> +int event_tap_is_on(void);
>>>>>>>> +void event_tap_schedule_suspend(void);
>>>>>>>> +void event_tap_ioport(int index, uint32_t address, uint32_t data);
>>>>>>>> +void event_tap_mmio(uint64_t address, uint8_t *buf, int len);
>>>>>>>> +void event_tap_init(void);
>>>>>>>> +void event_tap_flush(void);
>>>>>>>> +int event_tap_flush_one(void);
>>>>>>>> +void event_tap_schedule_replay(void);
>>>>>>>> +
>>>>>>>> +void event_tap_send_packet(VLANClientState *vc, const uint8_t *buf, int size);
>>>>>>>> +ssize_t event_tap_sendv_packet_async(VLANClientState *vc,
>>>>>>>> +                                     const struct iovec *iov,
>>>>>>>> +                                     int iovcnt, NetPacketSent *sent_cb);
>>>>>>>> +
>>>>>>>> +BlockDriverAIOCB *event_tap_bdrv_aio_writev(BlockDriverState *bs,
>>>>>>>> +                                            int64_t sector_num,
>>>>>>>> +                                            QEMUIOVector *iov,
>>>>>>>> +                                            int nb_sectors,
>>>>>>>> +                                            BlockDriverCompletionFunc *cb,
>>>>>>>> +                                            void *opaque);
>>>>>>>> +BlockDriverAIOCB *event_tap_bdrv_aio_flush(BlockDriverState *bs,
>>>>>>>> +                                           BlockDriverCompletionFunc *cb,
>>>>>>>> +                                           void *opaque);
>>>>>>>> +void event_tap_bdrv_flush(void);
>>>>>>>> +
>>>>>>>> +#endif
>>>>>>>> diff --git a/qemu-tool.c b/qemu-tool.c
>>>>>>>> index 392e1c9..3f71215 100644
>>>>>>>> --- a/qemu-tool.c
>>>>>>>> +++ b/qemu-tool.c
>>>>>>>> @@ -16,6 +16,7 @@
>>>>>>>>   #include "qemu-timer.h"
>>>>>>>>   #include "qemu-log.h"
>>>>>>>>   #include "sysemu.h"
>>>>>>>> +#include "event-tap.h"
>>>>>>>>
>>>>>>>>   #include <sys/time.h>
>>>>>>>>
>>>>>>>> @@ -111,3 +112,30 @@ int qemu_set_fd_handler2(int fd,
>>>>>>>>   {
>>>>>>>>      return 0;
>>>>>>>>   }
>>>>>>>> +
>>>>>>>> +BlockDriverAIOCB *event_tap_bdrv_aio_writev(BlockDriverState *bs,
>>>>>>>> +                                            int64_t sector_num,
>>>>>>>> +                                            QEMUIOVector *iov,
>>>>>>>> +                                            int nb_sectors,
>>>>>>>> +                                            BlockDriverCompletionFunc *cb,
>>>>>>>> +                                            void *opaque)
>>>>>>>> +{
>>>>>>>> +    return NULL;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +BlockDriverAIOCB *event_tap_bdrv_aio_flush(BlockDriverState *bs,
>>>>>>>> +                                           BlockDriverCompletionFunc *cb,
>>>>>>>> +                                           void *opaque)
>>>>>>>> +{
>>>>>>>> +    return NULL;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void event_tap_bdrv_flush(void)
>>>>>>>> +{
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +int event_tap_is_on(void)
>>>>>>>> +{
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> diff --git a/trace-events b/trace-events
>>>>>>>> index 50ac840..1af3895 100644
>>>>>>>> --- a/trace-events
>>>>>>>> +++ b/trace-events
>>>>>>>> @@ -269,3 +269,13 @@ disable ft_trans_freeze_input(void) "backend not ready, freezing input"
>>>>>>>>   disable ft_trans_put_ready(void) "file is ready to put"
>>>>>>>>   disable ft_trans_get_ready(void) "file is ready to get"
>>>>>>>>   disable ft_trans_cb(void *cb) "callback %p"
>>>>>>>> +
>>>>>>>> +# event-tap.c
>>>>>>>> +disable event_tap_ignore_bh(int bh) "event_tap_bh is already scheduled %d"
>>>>>>>> +disable event_tap_net_cb(char *s, ssize_t len) "%s: %zd byte packet was sent"
>>>>>>>> +disable event_tap_no_event(void) "no last_event_tap"
>>>>>>>> +disable event_tap_already_used(int mode) "last_event_tap already used %d"
>>>>>>>> +disable event_tap_append_packet(void) "This packet is appended"
>>>>>>>> +disable event_tap_replay_no_event(void) "No event to replay"
>>>>>>>> +disable event_tap_save_no_event(void) "No event to save"
>>>>>>>> +disable event_tap_load_no_event(void) "No event to load"
>>>>>>>> --
>>>>>>>> 1.7.1.2
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>


  reply	other threads:[~2011-03-09  8:51 UTC|newest]

Thread overview: 30+ messages
2011-02-24  7:28 [PATCH 00/18] Kemari for KVM v0.2.12 Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 01/18] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer() Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 02/18] Introduce read() to FdMigrationState Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 03/18] Introduce qemu_loadvm_state_no_header() and make qemu_loadvm_state() a wrapper Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 04/18] qemu-char: export socket_set_nodelay() Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 05/18] vl.c: add deleted flag for deleting the handler Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 06/18] virtio: decrement last_avail_idx with inuse before saving Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 07/18] Introduce fault tolerant VM transaction QEMUFile and ft_mode Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 08/18] savevm: introduce util functions to control ft_trans_file from savevm layer Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 09/18] Introduce event-tap Yoshiaki Tamura
2011-03-04  3:31   ` ya su
2011-03-08  8:22     ` Yoshiaki Tamura
2011-03-09  2:56       ` ya su
     [not found]         ` <4D76FAC2.3000502@lab.ntt.co.jp>
2011-03-09  4:58           ` ya su
2011-03-09  6:26             ` Yoshiaki Tamura
2011-03-09  8:36               ` ya su
2011-03-09  8:51                 ` Yoshiaki Tamura [this message]
2011-02-24  7:28 ` [PATCH 10/18] Call init handler of event-tap at main() in vl.c Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 11/18] ioport: insert event_tap_ioport() to ioport_write() Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 12/18] Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 13/18] net: insert event-tap to qemu_send_packet() and qemu_sendv_packet_async() Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 14/18] block: insert event-tap to bdrv_aio_writev(), bdrv_aio_flush() and bdrv_flush() Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 15/18] savevm: introduce qemu_savevm_trans_{begin,commit} Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 16/18] migration: introduce migrate_ft_trans_{put,get}_ready(), and modify migrate_fd_put_ready() when ft_mode is on Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 17/18] migration-tcp: modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled Yoshiaki Tamura
2011-02-24  7:28 ` [PATCH 18/18] Introduce "kemari:" to enable FT migration mode (Kemari) Yoshiaki Tamura
  -- strict thread matches above, loose matches on Subject: below --
2011-04-25 11:00 [PATCH 00/18] Kemari for KVM v0.2.14 OHMURA Kei
2011-04-25 11:00 ` [PATCH 09/18] Introduce event-tap OHMURA Kei
2011-03-23  4:10 [PATCH 00/18] [PATCH 00/18] Kemari for KVM v0.2.13 Yoshiaki Tamura
2011-03-23  4:10 ` [PATCH 09/18] Introduce event-tap Yoshiaki Tamura
2011-02-23 13:48 [PATCH 00/18] Kemari for KVM v0.2.11 Yoshiaki Tamura
2011-02-23 13:48 ` [PATCH 09/18] Introduce event-tap Yoshiaki Tamura
2011-02-10  9:30 [PATCH 00/18] Kemari for KVM v0.2.10 Yoshiaki Tamura
2011-02-10  9:30 ` [PATCH 09/18] Introduce event-tap Yoshiaki Tamura
