qemu-devel.nongnu.org archive mirror
From: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: aliguori@us.ibm.com, mtosatti@redhat.com, ananth@in.ibm.com,
	kvm@vger.kernel.org, Stefan Hajnoczi <stefanha@gmail.com>,
	dlaor@redhat.com, ohmura.kei@lab.ntt.co.jp,
	qemu-devel@nongnu.org, avi@redhat.com, vatsa@linux.vnet.ibm.com,
	psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
Date: Thu, 6 Jan 2011 18:41:05 +0900
Message-ID: <AANLkTikVaAxTwUgrUkyDAP3BwHLncKbaxS4cVXFo2Qv0@mail.gmail.com>
In-Reply-To: <20110106093612.GB12142@redhat.com>

2011/1/6 Michael S. Tsirkin <mst@redhat.com>:
> On Thu, Jan 06, 2011 at 05:47:27PM +0900, Yoshiaki Tamura wrote:
>> 2011/1/4 Michael S. Tsirkin <mst@redhat.com>:
>> > On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote:
>> >> 2011/1/4 Michael S. Tsirkin <mst@redhat.com>:
>> >> > On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
>> >> >> 2011/1/4 Michael S. Tsirkin <mst@redhat.com>:
>> >> >> > On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
>> >> >> >> 2010/11/29 Stefan Hajnoczi <stefanha@gmail.com>:
>> >> >> >> > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
>> >> >> >> > <tamura.yoshiaki@lab.ntt.co.jp> wrote:
>> >> >> >> >> event-tap controls when to start an FT transaction, and provides proxy
>> >> >> >> >> functions to be called from net/block devices.  During an FT
>> >> >> >> >> transaction, it queues up net/block requests, and flushes them when
>> >> >> >> >> the transaction completes.
>> >> >> >> >>
>> >> >> >> >> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
>> >> >> >> >> Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
>> >> >> >> >> ---
>> >> >> >> >>  Makefile.target |    1 +
>> >> >> >> >>  block.h         |    9 +
>> >> >> >> >>  event-tap.c     |  794 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >> >> >>  event-tap.h     |   34 +++
>> >> >> >> >>  net.h           |    4 +
>> >> >> >> >>  net/queue.c     |    1 +
>> >> >> >> >>  6 files changed, 843 insertions(+), 0 deletions(-)
>> >> >> >> >>  create mode 100644 event-tap.c
>> >> >> >> >>  create mode 100644 event-tap.h
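The commit message describes event-tap's core job: hold back net/block requests while a fault-tolerance (FT) transaction is in progress, then release them once it completes. The following is a minimal sketch of that queue-and-flush pattern, assuming a fixed-size queue and a single sink function that stands in for the net/block layer. `TapReq`, `tap_submit()` and `tap_commit()` are hypothetical names, not the patch's API. The sketch also leaves out how event-tap decides when a transaction starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TAP_QUEUE_MAX 16

/* Hypothetical request record; the patch's EventTapLog carries far more. */
typedef struct {
    int id;
} TapReq;

typedef struct {
    bool in_transaction;          /* true while an FT transaction is open */
    TapReq queue[TAP_QUEUE_MAX];  /* requests held back until commit */
    size_t len;
} TapState;

/* Stand-in for the real net/block layer that finally receives a request. */
typedef void (*TapSink)(const TapReq *req, void *opaque);

/* Proxy called by a device instead of the net/block layer. Outside a
 * transaction the request passes straight through; inside one it is
 * queued. Returns false if the queue is full. */
static bool tap_submit(TapState *s, TapReq req, TapSink sink, void *opaque)
{
    if (!s->in_transaction) {
        sink(&req, opaque);
        return true;
    }
    if (s->len == TAP_QUEUE_MAX) {
        return false;
    }
    s->queue[s->len++] = req;
    return true;
}

/* Called once the transaction has completed: release the held requests
 * in the order they were submitted. */
static void tap_commit(TapState *s, TapSink sink, void *opaque)
{
    assert(s->in_transaction);
    for (size_t i = 0; i < s->len; i++) {
        sink(&s->queue[i], opaque);
    }
    s->len = 0;
    s->in_transaction = false;
}

/* Demo sink that records the order in which requests arrive. */
static struct {
    int seen[TAP_QUEUE_MAX];
    size_t n;
} demo_log;

static void record_sink(const TapReq *req, void *opaque)
{
    (void)opaque;
    demo_log.seen[demo_log.n++] = req->id;
}

/* Submits two requests mid-transaction, then commits. Returns how many
 * requests reached the sink before the commit. */
static size_t tap_demo(void)
{
    TapState s = { .in_transaction = true };
    memset(&demo_log, 0, sizeof demo_log);
    tap_submit(&s, (TapReq){ .id = 1 }, record_sink, NULL);
    tap_submit(&s, (TapReq){ .id = 2 }, record_sink, NULL);
    size_t before = demo_log.n;
    tap_commit(&s, record_sink, NULL);
    return before;
}
```

In the demo, nothing reaches the sink until `tap_commit()` runs. After that, both requests arrive in submission order.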
>> >> >> >> >
>> >> >> >> > event_tap_state is checked at the beginning of several functions.  If
>> >> >> >> > there is an unexpected state the function silently returns.  Should
>> >> >> >> > these checks really be assert() so there is an abort and backtrace if
>> >> >> >> > the program ever reaches this state?
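The following minimal sketch contrasts the two styles Stefan describes. The `TapMode` enum and both functions are hypothetical stand-ins, not code from the patch.

There is one caveat. `event_tap_replay()`, quoted in this message, has the `(void *opaque, int running, int reason)` signature of a VM state change handler. If it is registered as one, it runs on every VM start and can legitimately see states other than `EVENT_TAP_LOAD`. An `assert()` therefore only fits checks for states that should be unreachable.

```c
#include <assert.h>

/* Hypothetical subset of event-tap states; the names are illustrative. */
typedef enum {
    TAP_OFF,
    TAP_NORMAL,
    TAP_LOAD,
    TAP_REPLAY,
} TapMode;

/* Silent style: an unexpected state is ignored and the old mode is
 * returned, so a logic error elsewhere goes unnoticed. */
static TapMode tap_replay_silent(TapMode mode)
{
    if (mode != TAP_LOAD) {
        return mode;
    }
    return TAP_REPLAY;
}

/* Assert style: any state other than TAP_LOAD is treated as a
 * programming error, so the program aborts (leaving a core dump with a
 * backtrace, if enabled) instead of carrying on. assert() is compiled
 * out when NDEBUG is defined. */
static TapMode tap_replay_checked(TapMode mode)
{
    assert(mode == TAP_LOAD);
    return TAP_REPLAY;
}
```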
>> >> >> >> >
>> >> >> >> >> +typedef struct EventTapBlkReq {
>> >> >> >> >> +    char *device_name;
>> >> >> >> >> +    int num_reqs;
>> >> >> >> >> +    int num_cbs;
>> >> >> >> >> +    bool is_multiwrite;
>> >> >> >> >
>> >> >> >> > Is multiwrite logging necessary?  If event tap is called from within
>> >> >> >> > the block layer then multiwrite is turned into one or more
>> >> >> >> > bdrv_aio_writev() calls.
>> >> >> >> >
>> >> >> >> >> +static void event_tap_replay(void *opaque, int running, int reason)
>> >> >> >> >> +{
>> >> >> >> >> +    EventTapLog *log, *next;
>> >> >> >> >> +
>> >> >> >> >> +    if (!running) {
>> >> >> >> >> +        return;
>> >> >> >> >> +    }
>> >> >> >> >> +
>> >> >> >> >> +    if (event_tap_state != EVENT_TAP_LOAD) {
>> >> >> >> >> +        return;
>> >> >> >> >> +    }
>> >> >> >> >> +
>> >> >> >> >> +    event_tap_state = EVENT_TAP_REPLAY;
>> >> >> >> >> +
>> >> >> >> >> +    QTAILQ_FOREACH(log, &event_list, node) {
>> >> >> >> >> +        EventTapBlkReq *blk_req;
>> >> >> >> >> +
>> >> >> >> >> +        /* event resume */
>> >> >> >> >> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
>> >> >> >> >> +        case EVENT_TAP_NET:
>> >> >> >> >> +            event_tap_net_flush(&log->net_req);
>> >> >> >> >> +            break;
>> >> >> >> >> +        case EVENT_TAP_BLK:
>> >> >> >> >> +            blk_req = &log->blk_req;
>> >> >> >> >> +            if ((log->mode & EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
>> >> >> >> >> +                switch (log->ioport.index) {
>> >> >> >> >> +                case 0:
>> >> >> >> >> +                    cpu_outb(log->ioport.address, log->ioport.data);
>> >> >> >> >> +                    break;
>> >> >> >> >> +                case 1:
>> >> >> >> >> +                    cpu_outw(log->ioport.address, log->ioport.data);
>> >> >> >> >> +                    break;
>> >> >> >> >> +                case 2:
>> >> >> >> >> +                    cpu_outl(log->ioport.address, log->ioport.data);
>> >> >> >> >> +                    break;
>> >> >> >> >> +                }
>> >> >> >> >> +            } else {
>> >> >> >> >> +                /* EVENT_TAP_MMIO */
>> >> >> >> >> +                cpu_physical_memory_rw(log->mmio.address,
>> >> >> >> >> +                                       log->mmio.buf,
>> >> >> >> >> +                                       log->mmio.len, 1);
>> >> >> >> >> +            }
>> >> >> >> >> +            break;
>> >> >> >> >
>> >> >> >> > Why are net tx packets replayed at the net level but blk requests are
>> >> >> >> > replayed at the pio/mmio level?
>> >> >> >> >
>> >> >> >> > I expected everything to replay either as pio/mmio or as net/block.
>> >> >> >>
>> >> >> >> Stefan,
>> >> >> >>
>> >> >> >> After doing some heavy load tests, I realized that we have to
>> >> >> >> take a hybrid approach to replay for now.  This is because the
>> >> >> >> point at which a device moves to the next state (e.g. when virtio
>> >> >> >> decreases inuse) differs between net and block.  For example,
>> >> >> >> virtio-net decreases inuse upon returning from the net layer, but
>> >> >> >> virtio-blk does that inside the callback.
>> >> >> >
>> >> >> > For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
>> >> >> > For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
>> >> >> > Both are invoked from a callback.
>> >> >> >
>> >> >> >> If we only use pio/mmio replay, some net requests get lost even
>> >> >> >> though event-tap tries to replay them, because the device state
>> >> >> >> has already moved on.
>> >> >> >
>> >> >> > It seems that all you need to do to avoid this is to
>> >> >> > delay the callback?
>> >> >>
>> >> >> Yeah, if it's possible.  But if you take a look at virtio-net,
>> >> >> you'll see that virtqueue_push is called immediately after calling
>> >> >> qemu_sendv_packet, while virtio-blk does that in the callback.
>> >> >
>> >> > This is only if the packet was sent immediately.
>> >> > I was referring to the case where the packet is queued.
>> >>
>> >> I see.  I usually don't see packets get queued in the net layer.
>> >> What would be the effect on devices?  Would they hold off sending packets?
>> >
>> > Yes.
>> >
>> >> >
>> >> >> >
>> >> >> >> This doesn't happen with block, because the state is still old
>> >> >> >> enough to replay.  Note that using the hybrid approach won't cause
>> >> >> >> duplicated requests on the secondary.
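The hybrid scheme can be sketched as a replay loop that sends each logged event to the layer that can still reproduce it. This is a simplified, hypothetical model: the `LogKind` values and trace characters are made up for illustration, and payloads are omitted. The real loop is the quoted `event_tap_replay()`. It calls `event_tap_net_flush()` for net events and `cpu_outb()`/`cpu_outw()`/`cpu_outl()` or `cpu_physical_memory_rw()` for block events.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of logged events. */
typedef enum {
    LOG_NET,      /* outgoing packet, copied when it was logged */
    LOG_BLK_PIO,  /* guest port write that kicked off a block request */
    LOG_BLK_MMIO, /* guest MMIO write that kicked off a block request */
} LogKind;

typedef struct {
    LogKind kind;
    /* payload omitted: packet bytes for LOG_NET, address/value otherwise */
} ReplayLog;

/* Replays events in logged order and records which layer handled each:
 * 'N' = resent at the net layer, 'P' = port write re-executed,
 * 'M' = MMIO write re-executed. trace must hold n + 1 chars. */
static void replay_all(const ReplayLog *logs, size_t n, char *trace)
{
    for (size_t i = 0; i < n; i++) {
        switch (logs[i].kind) {
        case LOG_NET:
            /* virtio-net already advanced when the send call returned,
             * so re-executing the guest's kick would not regenerate the
             * packet: resend the saved copy at the net layer instead. */
            trace[i] = 'N';
            break;
        case LOG_BLK_PIO:
            /* virtio-blk only advances in its completion callback, so the
             * guest's register write can simply be re-executed. */
            trace[i] = 'P';
            break;
        case LOG_BLK_MMIO:
            trace[i] = 'M';
            break;
        default:
            trace[i] = '?';
            assert(!"unknown log kind");
        }
    }
    trace[n] = '\0';
}

/* Example: a block kick, a packet, then an MMIO-triggered block request. */
static const char *replay_demo(void)
{
    static char trace[4];
    const ReplayLog logs[] = { { LOG_BLK_PIO }, { LOG_NET }, { LOG_BLK_MMIO } };
    replay_all(logs, 3, trace);
    return trace;
}
```

Each event is handled by exactly one layer, which is what keeps the hybrid approach from issuing duplicated requests.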
>> >> >> >
>> >> >> > An assumption devices make is that a buffer is unused once the
>> >> >> > completion callback has been invoked. Does this violate that assumption?
>> >> >>
>> >> >> No, it shouldn't.  In the case of net, with net layer replay, we
>> >> >> copy the content of the requests, and in the case of block, because
>> >> >> we haven't called the callback yet, the requests remain fresh.
>> >> >>
>> >> >> Yoshi
>> >> >>
>> >> >
>> >> > Yes, as long as you copy it should be fine.  Maybe it's a good idea for
>> >> > event-tap to queue all packets to avoid the copy and avoid the need to
>> >> > replay at the net level.
>> >>
>> >> If queuing works fine for the devices, it seems to be a good
>> >> idea.  I think the ordering issue still won't occur.
>> >>
>> >> Yoshi
>> >
>> > If you replay at both the net and pio level, it becomes complex.
>> > Maybe it's ok, but certainly harder to reason about.
>>
>> Michael,
>>
>> It seems that queuing in event-tap, as the net layer does, works for
>> devices that use qemu_send_packet_async, as you suggested.  But for
>> those that use qemu_send_packet, we still need to copy the contents
>> just like net layer queuing does, and net level replay should be
>> kept to handle them.
>> Thanks,
>>
>> Yoshi
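The split described in this exchange can be sketched as two proxy functions sharing one hold queue: queue without copying for `qemu_send_packet_async` users, and copy for `qemu_send_packet` users. Everything here is hypothetical: `proxy_send_async()`, `proxy_send_sync()`, `proxy_flush()` and the fixed-size queue are illustrative only. The sketch loosely mirrors the convention that an async send returning 0 means "queued, the sent callback will run later". It also relies on the assumption stated in the thread that a device treats its buffer as in use until that callback fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "packet sent" callback supplied by async senders. */
typedef void (*SentCb)(void *dev);

typedef struct {
    const unsigned char *buf; /* caller's buffer (async) or our copy (sync) */
    size_t len;
    bool owned;               /* true if buf was allocated by the proxy */
    SentCb cb;                /* NULL for synchronous senders */
    void *dev;
} HeldPkt;

#define HELD_MAX 8
static HeldPkt held[HELD_MAX];
static size_t held_len;

/* Async path: keep a pointer to the caller's buffer and return 0
 * ("queued"). The device leaves the buffer alone until cb runs, so no
 * copy is needed. Returns -1 if the hold queue is full. */
static int proxy_send_async(void *dev, const unsigned char *buf, size_t len,
                            SentCb cb)
{
    assert(cb != NULL);
    if (held_len == HELD_MAX) {
        return -1;
    }
    held[held_len++] = (HeldPkt){ buf, len, false, cb, dev };
    return 0;
}

/* Sync path: the caller considers its buffer free as soon as this
 * returns, so the payload is copied before being queued. */
static int proxy_send_sync(void *dev, const unsigned char *buf, size_t len)
{
    if (held_len == HELD_MAX) {
        return -1;
    }
    unsigned char *copy = malloc(len ? len : 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, buf, len);
    held[held_len++] = (HeldPkt){ copy, len, true, NULL, dev };
    return (int)len;
}

/* On transaction completion: transmit what was held (here only counted),
 * run deferred callbacks, free copies. Returns bytes released. */
static size_t proxy_flush(void)
{
    size_t bytes = 0;
    for (size_t i = 0; i < held_len; i++) {
        bytes += held[i].len; /* stand-in for the real transmit */
        if (held[i].cb != NULL) {
            held[i].cb(held[i].dev);
        }
        if (held[i].owned) {
            free((void *)held[i].buf);
        }
    }
    held_len = 0;
    return bytes;
}

/* Demo device that counts completion callbacks. */
typedef struct {
    int completions;
} DemoDev;

static void demo_sent(void *dev)
{
    ((DemoDev *)dev)->completions++;
}

/* Returns 1 if both paths behave as described, 0 otherwise. */
static int proxy_demo(void)
{
    DemoDev dev = { 0 };
    unsigned char async_buf[4] = { 1, 2, 3, 4 };
    unsigned char sync_buf[2] = { 9, 9 };

    if (proxy_send_async(&dev, async_buf, sizeof async_buf, demo_sent) != 0 ||
        proxy_send_sync(&dev, sync_buf, sizeof sync_buf) != 2) {
        return 0;
    }
    sync_buf[0] = 0; /* sync caller reuses its buffer at once */
    if (held[1].buf[0] != 9 || dev.completions != 0) {
        return 0;    /* copy must be intact, async callback deferred */
    }
    return proxy_flush() == 6 && dev.completions == 1;
}
```

The design choice is that the async path needs no copy because completion is deferred. The sync path has no completion to defer, so the copy is the only way to keep the queued payload valid.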
>
> Right. And I think it's fine. What I found confusing was the case
> where both virtio (because avail idx is moved back) and
> the net layer replay the packet.

I agree, and that part is fixed.  There won't be double layer
replay for the same device.

Yoshi

>
>
>> >
>> >> >
>> >> >> >
>> >> >> > --
>> >> >> > MST
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >


Thread overview: 112+ messages
2010-11-25  6:06 [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2 Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 01/21] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer() Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 02/21] Introduce read() to FdMigrationState Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 03/21] Introduce skip_header parameter to qemu_loadvm_state() Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 04/21] qemu-char: export socket_set_nodelay() Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble Yoshiaki Tamura
2010-11-28  9:28   ` [Qemu-devel] " Michael S. Tsirkin
2010-11-28 11:27     ` Yoshiaki Tamura
2010-11-28 11:46       ` Michael S. Tsirkin
2010-12-01  8:03         ` Yoshiaki Tamura
2010-12-02 12:02           ` Michael S. Tsirkin
2010-12-03  6:28             ` Yoshiaki Tamura
2010-12-16  7:36               ` Yoshiaki Tamura
2010-12-16  9:51                 ` Michael S. Tsirkin
2010-12-16 14:28                   ` Yoshiaki Tamura
2010-12-16 14:40                     ` Michael S. Tsirkin
2010-12-16 15:59                       ` Yoshiaki Tamura
2010-12-17 16:22                         ` Yoshiaki Tamura
2010-12-24  9:27                         ` Michael S. Tsirkin
2010-12-24 11:42                           ` Yoshiaki Tamura
2010-12-24 13:21                             ` Michael S. Tsirkin
2010-12-26  9:05                             ` Michael S. Tsirkin
2010-12-26 10:14                               ` Yoshiaki Tamura
2010-12-26 10:46                                 ` Michael S. Tsirkin
2010-12-26 10:50                                   ` Yoshiaki Tamura
2010-12-26 10:49                             ` Michael S. Tsirkin
2010-12-26 10:57                               ` Yoshiaki Tamura
2010-12-26 12:01                                 ` Michael S. Tsirkin
2010-12-26 12:16                                   ` Yoshiaki Tamura
2010-12-26 12:17                                     ` Michael S. Tsirkin
2010-11-25  6:06 ` [Qemu-devel] [PATCH 06/21] vl: add a tmp pointer so that a handler can delete the entry to which it belongs Yoshiaki Tamura
2010-12-08  7:03   ` Isaku Yamahata
2010-12-08  8:11     ` Yoshiaki Tamura
2010-12-08 14:22       ` Anthony Liguori
2010-11-25  6:06 ` [Qemu-devel] [PATCH 07/21] Introduce fault tolerant VM transaction QEMUFile and ft_mode Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 08/21] savevm: introduce util functions to control ft_trans_file from savevm layer Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 09/21] Introduce event-tap Yoshiaki Tamura
2010-11-29 11:00   ` [Qemu-devel] " Stefan Hajnoczi
2010-11-30  9:50     ` Yoshiaki Tamura
2010-11-30 10:04       ` Stefan Hajnoczi
2010-11-30 10:20         ` Yoshiaki Tamura
2011-01-04 11:02     ` Yoshiaki Tamura
2011-01-04 11:14       ` Stefan Hajnoczi
2011-01-04 11:19       ` Michael S. Tsirkin
2011-01-04 12:20         ` Yoshiaki Tamura
2011-01-04 13:10           ` Michael S. Tsirkin
2011-01-04 13:45             ` Yoshiaki Tamura
2011-01-04 14:42               ` Michael S. Tsirkin
2011-01-06  8:47                 ` Yoshiaki Tamura
2011-01-06  9:36                   ` Michael S. Tsirkin
2011-01-06  9:41                     ` Yoshiaki Tamura [this message]
     [not found]   ` <20101130011914.GA9015@amt.cnet>
2010-11-30  9:28     ` Yoshiaki Tamura
2010-11-30 10:25       ` Marcelo Tosatti
2010-11-30 10:35         ` Yoshiaki Tamura
2010-11-30 13:11           ` Marcelo Tosatti
2010-11-25  6:06 ` [Qemu-devel] [PATCH 10/21] Call init handler of event-tap at main() in vl.c Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 11/21] ioport: insert event_tap_ioport() to ioport_write() Yoshiaki Tamura
2010-11-28  9:40   ` [Qemu-devel] " Michael S. Tsirkin
2010-11-28 12:00     ` Yoshiaki Tamura
2010-12-16  7:37       ` Yoshiaki Tamura
2010-12-16  9:22         ` Michael S. Tsirkin
2010-12-16  9:50           ` Yoshiaki Tamura
2010-12-16  9:54             ` Michael S. Tsirkin
2010-12-16 16:27             ` Stefan Hajnoczi
2010-12-17 16:19               ` Yoshiaki Tamura
2010-12-18  8:36                 ` Stefan Hajnoczi
2010-11-25  6:06 ` [Qemu-devel] [PATCH 12/21] Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 13/21] dma-helpers: replace bdrv_aio_writev() with bdrv_aio_writev_proxy() Yoshiaki Tamura
2010-11-28  9:33   ` [Qemu-devel] " Michael S. Tsirkin
2010-11-28 11:55     ` Yoshiaki Tamura
2010-11-28 12:28       ` Michael S. Tsirkin
2010-11-29  9:52       ` Kevin Wolf
2010-11-29 12:56         ` Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 14/21] virtio-blk: replace bdrv_aio_multiwrite() with bdrv_aio_multiwrite_proxy() Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 15/21] virtio-net: replace qemu_sendv_packet_async() with qemu_sendv_packet_async_proxy() Yoshiaki Tamura
2010-11-28  9:31   ` [Qemu-devel] " Michael S. Tsirkin
2010-11-28 11:43     ` Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 16/21] e1000: replace qemu_send_packet() with qemu_send_packet_proxy() Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 17/21] savevm: introduce qemu_savevm_trans_{begin, commit} Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 18/21] migration: introduce migrate_ft_trans_{put, get}_ready(), and modify migrate_fd_put_ready() when ft_mode is on Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 19/21] migration-tcp: modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled Yoshiaki Tamura
2010-11-25  6:06 ` [Qemu-devel] [PATCH 20/21] Introduce -k option to enable FT migration mode (Kemari) Yoshiaki Tamura
2010-11-25  6:07 ` [Qemu-devel] [PATCH 21/21] migration: add a parser to accept FT migration incoming mode Yoshiaki Tamura
2010-11-26 18:39 ` [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2 Blue Swirl
2010-11-27  4:29   ` Yoshiaki Tamura
2010-11-27  7:23     ` Stefan Hajnoczi
2010-11-27  8:53       ` Yoshiaki Tamura
2010-11-27 11:03         ` Blue Swirl
2010-11-27 12:21           ` Yoshiaki Tamura
2010-11-27 11:54         ` Stefan Hajnoczi
2010-11-27 13:11           ` Yoshiaki Tamura
2010-11-29 10:17             ` Stefan Hajnoczi
2010-11-29 13:00               ` Paul Brook
2010-11-29 13:13                 ` Yoshiaki Tamura
2010-11-29 13:19                   ` Paul Brook
2010-11-29 13:41                     ` Yoshiaki Tamura
2010-11-29 14:12                       ` Paul Brook
2010-11-29 14:37                         ` Yoshiaki Tamura
2010-11-29 14:56                           ` Paul Brook
2010-11-29 15:00                             ` Yoshiaki Tamura
2010-11-29 15:56                               ` Paul Brook
2010-11-29 16:23                               ` Stefan Hajnoczi
2010-11-29 16:41                                 ` Dor Laor
2010-11-29 16:53                                   ` Paul Brook
2010-11-29 17:05                                     ` Anthony Liguori
2010-11-29 17:18                                       ` Paul Brook
2010-11-29 17:33                                         ` Anthony Liguori
2010-11-30  7:13                                       ` Yoshiaki Tamura
2010-11-30  6:43                                   ` Yoshiaki Tamura
2010-11-30  9:13                                   ` Takuya Yoshikawa
2010-11-27 11:20       ` Paul Brook
2010-11-27 12:35         ` Yoshiaki Tamura
