From: "Michael S. Tsirkin" <mst@redhat.com>
To: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com,
kvm@vger.kernel.org, Stefan Hajnoczi <stefanha@gmail.com>,
mtosatti@redhat.com, ohmura.kei@lab.ntt.co.jp,
qemu-devel@nongnu.org, avi@redhat.com, vatsa@linux.vnet.ibm.com,
psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
Date: Thu, 6 Jan 2011 11:36:12 +0200
Message-ID: <20110106093612.GB12142@redhat.com>
In-Reply-To: <AANLkTimpbPx1vD8dLBz2GCDCEqR94VEGHC9Fnf3n8UeL@mail.gmail.com>
On Thu, Jan 06, 2011 at 05:47:27PM +0900, Yoshiaki Tamura wrote:
> 2011/1/4 Michael S. Tsirkin <mst@redhat.com>:
> > On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote:
> >> 2011/1/4 Michael S. Tsirkin <mst@redhat.com>:
> >> > On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
> >> >> 2011/1/4 Michael S. Tsirkin <mst@redhat.com>:
> >> >> > On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
> >> >> >> 2010/11/29 Stefan Hajnoczi <stefanha@gmail.com>:
> >> >> >> > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
> >> >> >> > <tamura.yoshiaki@lab.ntt.co.jp> wrote:
> >> >> >> >> event-tap controls when to start an FT transaction, and provides proxy
> >> >> >> >> functions to be called from net/block devices. During an FT transaction,
> >> >> >> >> it queues up net/block requests, and flushes them when the transaction
> >> >> >> >> is completed.
> >> >> >> >>
> >> >> >> >> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
> >> >> >> >> Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
> >> >> >> >> ---
> >> >> >> >> Makefile.target | 1 +
> >> >> >> >> block.h | 9 +
> >> >> >> >> event-tap.c | 794 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >> >> >> event-tap.h | 34 +++
> >> >> >> >> net.h | 4 +
> >> >> >> >> net/queue.c | 1 +
> >> >> >> >> 6 files changed, 843 insertions(+), 0 deletions(-)
> >> >> >> >> create mode 100644 event-tap.c
> >> >> >> >> create mode 100644 event-tap.h
> >> >> >> >
> >> >> >> > event_tap_state is checked at the beginning of several functions. If
> >> >> >> > there is an unexpected state the function silently returns. Should
> >> >> >> > these checks really be assert() so there is an abort and backtrace if
> >> >> >> > the program ever reaches this state?
> >> >> >> >
> >> >> >> >> +typedef struct EventTapBlkReq {
> >> >> >> >> + char *device_name;
> >> >> >> >> + int num_reqs;
> >> >> >> >> + int num_cbs;
> >> >> >> >> + bool is_multiwrite;
> >> >> >> >
> >> >> >> > Is multiwrite logging necessary? If event tap is called from within
> >> >> >> > the block layer then multiwrite is turned into one or more
> >> >> >> > bdrv_aio_writev() calls.
> >> >> >> >
> >> >> >> >> +static void event_tap_replay(void *opaque, int running, int reason)
> >> >> >> >> +{
> >> >> >> >> + EventTapLog *log, *next;
> >> >> >> >> +
> >> >> >> >> + if (!running) {
> >> >> >> >> + return;
> >> >> >> >> + }
> >> >> >> >> +
> >> >> >> >> + if (event_tap_state != EVENT_TAP_LOAD) {
> >> >> >> >> + return;
> >> >> >> >> + }
> >> >> >> >> +
> >> >> >> >> + event_tap_state = EVENT_TAP_REPLAY;
> >> >> >> >> +
> >> >> >> >> + QTAILQ_FOREACH(log, &event_list, node) {
> >> >> >> >> + EventTapBlkReq *blk_req;
> >> >> >> >> +
> >> >> >> >> + /* event resume */
> >> >> >> >> + switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
> >> >> >> >> + case EVENT_TAP_NET:
> >> >> >> >> + event_tap_net_flush(&log->net_req);
> >> >> >> >> + break;
> >> >> >> >> + case EVENT_TAP_BLK:
> >> >> >> >> + blk_req = &log->blk_req;
> >> >> >> >> + if ((log->mode & EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
> >> >> >> >> + switch (log->ioport.index) {
> >> >> >> >> + case 0:
> >> >> >> >> + cpu_outb(log->ioport.address, log->ioport.data);
> >> >> >> >> + break;
> >> >> >> >> + case 1:
> >> >> >> >> + cpu_outw(log->ioport.address, log->ioport.data);
> >> >> >> >> + break;
> >> >> >> >> + case 2:
> >> >> >> >> + cpu_outl(log->ioport.address, log->ioport.data);
> >> >> >> >> + break;
> >> >> >> >> + }
> >> >> >> >> + } else {
> >> >> >> >> + /* EVENT_TAP_MMIO */
> >> >> >> >> + cpu_physical_memory_rw(log->mmio.address,
> >> >> >> >> + log->mmio.buf,
> >> >> >> >> + log->mmio.len, 1);
> >> >> >> >> + }
> >> >> >> >> + break;
> >> >> >> >
> >> >> >> > Why are net tx packets replayed at the net level but blk requests are
> >> >> >> > replayed at the pio/mmio level?
> >> >> >> >
> >> >> >> > I expected everything to replay either as pio/mmio or as net/block.
> >> >> >>
> >> >> >> Stefan,
> >> >> >>
> >> >> >> After doing some heavy load tests, I realized that we have to
> >> >> >> take a hybrid approach to replay for now. This is because the point
> >> >> >> at which a device moves to the next state (e.g. virtio decreases
> >> >> >> inuse) differs between net and block. For example, virtio-net
> >> >> >> decreases inuse upon returning from the net layer, but virtio-blk
> >> >> >> does that inside of the callback.
> >> >> >
> >> >> > For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
> >> >> > For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
> >> >> > Both are invoked from a callback.
> >> >> >
> >> >> >> If we only use pio/mmio
> >> >> >> replay, even though event-tap tries to replay net requests, some
> >> >> >> get lost because the state has proceeded already.
> >> >> >
> >> >> > It seems that all you need to do to avoid this is to
> >> >> > delay the callback?
> >> >>
> >> >> Yeah, if it's possible. But if you take a look at virtio-net,
> >> >> you'll see that virtqueue_push is called immediately after calling
> >> >> qemu_sendv_packet, while virtio-blk does that in the callback.
> >> >
> >> > This is only if the packet was sent immediately.
> >> > I was referring to the case where the packet is queued.
> >>
> >> I see. I usually don't see packets get queued in the net layer.
> >> What would be the effect on devices? Would they be restrained from
> >> sending packets?
> >
> > Yes.
> >
> >> >
> >> >> >
> >> >> >> This doesn't
> >> >> >> happen with block, because the state is still old enough to
> >> >> >> replay. Note that using hybrid approach won't cause duplicated
> >> >> >> requests on the secondary.
> >> >> >
> >> >> > An assumption devices make is that a buffer is unused once
> >> >> > completion callback was invoked. Does this violate that assumption?
> >> >>
> >> >> No, it shouldn't. In the case of net with net layer replay, we copy
> >> >> the content of the requests, and in the case of block, because we
> >> >> haven't called the callback yet, the requests remain fresh.
> >> >>
> >> >> Yoshi
> >> >>
> >> >
> >> > Yes, as long as you copy it should be fine. Maybe it's a good idea for
> >> > event-tap to queue all packets to avoid the copy and avoid the need to
> >> > replay at the net level.
> >>
> >> If queuing works fine for the devices, it seems to be a good
> >> idea. I think the ordering issue still doesn't happen.
> >>
> >> Yoshi
> >
> > If you replay at both net and pio level, it becomes complex.
> > Maybe it's ok, but certainly harder to reason about.
>
> Michael,
>
> It seems queuing at event-tap like in net layer works for devices
> that use qemu_send_packet_async as you suggested. But for those
> that use qemu_send_packet, we still need to copy the contents
> just like net layer queuing does, and net level replay should be
> kept to handle it.
>
> Thanks,
>
> Yoshi
Right. And I think it's fine. What I found confusing was the case
where both virtio (because avail idx is moved back) and
the net layer replay the packet.
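
To make the distinction discussed above concrete, here is a minimal sketch in C
of a tap queue that treats the two kinds of senders differently. It is a
simplified model under stated assumptions, not code from the patch: Tap, Pkt,
tap_send_async(), tap_send_sync(), tap_flush() and count_done() are all
hypothetical names, and "sending" is modelled as appending bytes to an output
buffer. The async path only stores the caller's pointer and delays the
completion callback. The sync path has to copy, because the caller may reuse
its buffer as soon as the call returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model. None of these names are QEMU APIs. */
typedef void (*SentCb)(void *opaque);

typedef struct Pkt {
    const unsigned char *data; /* payload: borrowed (async) or copied (sync) */
    unsigned char *copy;       /* non-NULL only when the tap owns a copy */
    size_t len;
    SentCb cb;                 /* completion callback, NULL for sync senders */
    void *opaque;
    struct Pkt *next;
} Pkt;

typedef struct Tap {
    Pkt *head;
    Pkt **tail;
} Tap;

static void tap_init(Tap *t)
{
    t->head = NULL;
    t->tail = &t->head;
}

static void tap_enqueue(Tap *t, Pkt *p)
{
    p->next = NULL;
    *t->tail = p;
    t->tail = &p->next;
}

/* Async-style sender: the device must not reuse buf until cb runs,
 * so the tap can hold the pointer and simply delay the callback. */
static void tap_send_async(Tap *t, const unsigned char *buf, size_t len,
                           SentCb cb, void *opaque)
{
    Pkt *p = calloc(1, sizeof(*p));
    assert(p);
    p->data = buf;
    p->len = len;
    p->cb = cb;
    p->opaque = opaque;
    tap_enqueue(t, p);
}

/* Sync-style sender: the device may reuse buf as soon as this returns,
 * so the tap has to copy the payload. */
static void tap_send_sync(Tap *t, const unsigned char *buf, size_t len)
{
    Pkt *p = calloc(1, sizeof(*p));
    assert(p);
    p->copy = malloc(len ? len : 1);
    assert(p->copy);
    memcpy(p->copy, buf, len);
    p->data = p->copy;
    p->len = len;
    tap_enqueue(t, p);
}

/* Transaction complete: emit packets in order into out, then run the
 * delayed completion callbacks. Returns the number of bytes written. */
static size_t tap_flush(Tap *t, unsigned char *out, size_t cap)
{
    size_t used = 0;
    Pkt *p = t->head;
    while (p) {
        Pkt *next = p->next;
        assert(used + p->len <= cap);
        memcpy(out + used, p->data, p->len);
        used += p->len;
        if (p->cb) {
            p->cb(p->opaque); /* buffer is released to the device only now */
        }
        free(p->copy);
        free(p);
        p = next;
    }
    tap_init(t);
    return used;
}

/* Example completion callback: counts how many completions were delivered. */
static void count_done(void *opaque)
{
    ++*(int *)opaque;
}
```

A caller would queue packets with tap_send_async() or tap_send_sync() while a
transaction is open, then call tap_flush() once it completes. In this model
there is a single queue and a single replay point, and only the sync path
pays for a copy.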
> >
> >> >
> >> >> >
> >> >> > --
> >> >> > MST
> >> >> >
> >> >> >
> >> >
> >> >
> >
> >