From: "Michael S. Tsirkin" <mst@redhat.com>
To: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com,
	kvm@vger.kernel.org, Marcelo Tosatti <mtosatti@redhat.com>,
	ohmura.kei@lab.ntt.co.jp, qemu-devel@nongnu.org, avi@redhat.com,
	vatsa@linux.vnet.ibm.com, psuriset@linux.vnet.ibm.com,
	stefanha@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble.
Date: Sun, 26 Dec 2010 12:46:07 +0200
Message-ID: <20101226104607.GA32000@redhat.com>
In-Reply-To: <AANLkTimrkwruPQkSNLbgT-yGfo9sHL7RAci4maB3n+un@mail.gmail.com>

On Sun, Dec 26, 2010 at 07:14:44PM +0900, Yoshiaki Tamura wrote:
> 2010/12/26 Michael S. Tsirkin <mst@redhat.com>:
> > On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote:
> >> >> If qemu_aio_flush() is responsible for flushing the outstanding
> >> >> virtio-net requests, I'm wondering why it's a problem for Kemari.
> >> >> As I described in the previous message, Kemari queues the
> >> >> requests first.  So in your example above, it should start with
> >> >>
> >> >> virtio-net: last_avail_idx 0 inuse 2
> >> >> event-tap: {A,B}
> >> >>
> >> >> As you know, the requests are still in order because the net
> >> >> layer initiates them in order.  This is not about completion order.
> >> >>
> >> >> In the first synchronization, the status above is transferred.  In
> >> >> the next synchronization, the status will be as follows.
> >> >>
> >> >> virtio-net: last_avail_idx 1 inuse 1
> >> >> event-tap: {B}
> >> >
> >> > OK, this answers the ordering question.
> >>
> >> Glad to hear that!
> >>
> >> > Another question: at this point we transfer this status: both
> >> > event-tap and virtio ring have the command B,
> >> > so the remote will have:
> >> >
> >> > virtio-net: inuse 0
> >> > event-tap: {B}
> >> >
> >> > Is this right? This already seems to be a problem as when B completes
> >> > inuse will go negative?
> >>
> >> I think the state above is wrong.  inuse 0 means there shouldn't be
> >> any requests in event-tap.  Note that the callback is called only
> >> when event-tap flushes the requests.
> >>
> >> > Next it seems that the remote virtio will resubmit B to event-tap. The
> >> > remote will then have:
> >> >
> >> > virtio-net: inuse 1
> >> > event-tap: {B, B}
> >> >
> >> > This looks kind of wrong ... will two packets go out?
> >>
> >> No.  Currently, we're just replaying the requests with pio/mmio.
> >> In the situation above, it should be,
> >>
> >> virtio-net: inuse 1
> >> event-tap: {B}
> >>
> >> >> Why? Because Kemari flushes the first virtio-net request using
> >> >> qemu_aio_flush() before each synchronization.  If
> >> >> qemu_aio_flush() doesn't guarantee the order, what you pointed
> >> >> should be problematic.  So in the final synchronization, the
> >> >> state should be,
> >> >>
> >> >> virtio-net: last_avail_idx 2 inuse 0
> >> >> event-tap: {}
> >> >>
> >> >> where A,B were completed in order.
> >> >>
> >> >> Yoshi
> >> >
> >> >
> >> > It might be better to discuss block because that's where
> >> > requests can complete out of order.
> >>
> >> It's the same as net.  We queue requests and call bdrv_flush each
> >> time we send requests to the block layer.  So there shouldn't be
> >> any inversion.
> >>
> >> > So let me see if I understand:
> >> > - each command passed to event tap is queued by it,
> >> >  it is not passed directly to the backend
> >> > - later requests are passed to the backend,
> >> >  always in the same order that they were submitted
> >> > - each synchronization point flushes all requests
> >> >  passed to the backend so far
> >> > - each synchronization transfers all requests not passed to the backend,
> >> >  to the remote, and they are replayed there
> >>
> >> Correct.
> >>
> >> > Now to analyse this for correctness I am looking at the original patch
> >> > because it is smaller so easier to analyse and I think it is
> >> > functionally equivalent, correct me if I am wrong in this.
> >>
> >> So you think decreasing last_avail_idx upon save is better than
> >> updating it in the callback?
> >>
> >> > So the reason there's no out of order issue is this
> >> > (and might be a good thing to put in commit log
> >> > or a comment somewhere):
> >>
> >> I've done some in the latest patch.  Please point it out if it
> >> wasn't enough.
> >>
> >> > At point of save callback event tap has flushed commands
> >> > passed to the backend already. Thus at the point of
> >> > the save callback if a command has completed
> >> > all previous commands have been flushed and completed.
> >> >
> >> >
> >> > Therefore inuse is
> >> > in fact the # of requests passed to event tap but not yet
> >> > passed to the backend (for non-event tap case all commands are
> >> > passed to the backend immediately and because of this
> >> > inuse is 0) and these are the last inuse commands submitted.
> >> >
> >> >
> >> > Right?
> >>
> >> Yep.
> >>
> >> > Now a question:
> >> >
> >> > When we pass last_used_index - inuse to the remote,
> >> > the remote virtio will resubmit the request.
> >> > Since the request is also passed by event tap, we get
> >> > the request twice. Why is this not a problem?
> >>
> >> It's not a problem because event-tap currently replays with
> >> pio/mmio only, as I mentioned above.  Although event-tap receives
> >> information about the queued requests, it won't pass it to the
> >> backend.  The reason is the difficulty of setting up the
> >> device-specific callbacks on the secondary.  These are
> >> pointers, and even worse, are usually static functions, which
> >> event-tap has no way to restore upon failover.  I do want to
> >> change event-tap replay to work this way in the future, but only
> >> pio/mmio replay is implemented for now.
> >>
> >> Thanks,
> >>
> >> Yoshi
> >>
> >
> > Then I am still confused, sorry.  inuse != 0 means that some requests
> > were passed to the backend but did not complete.  I think that if you do
> > a flush, this waits until all requests passed to the backend have
> > completed.  Why does this not guarantee inuse = 0 on the origin at the
> > synchronization point?
> 
> The synchronization is done before event-tap releases requests to
> the backend, so there are two types of flush: event-tap and
> backend block/net.  I assume you're confused by the fact that
> flushing the backend with qemu_aio_flush/bdrv_flush doesn't necessarily
> decrease inuse if event-tap has queued requests, because those
> requests have not been passed to the backend.  Let me do a case study again.
> 
> virtio: inuse 4
> event-tap: {A,B,C}
> backend: {D}
> 


There are two event-tap devices, right?
The PIO one is above virtio, and the AIO one is between virtio and the
backend (e.g. bdrv)? Which one is meant here?


> synchronization starts.  backend gets flushed.
> 
> virtio: inuse 3
> event-tap: {A,B,C}
> backend: {}
> 
> synchronization gets done.
> # secondary is virtio: inuse 3
> 
> event-tap flushes one request.
> 
> virtio: inuse 2
> event-tap: {B,C}
> backend: {}
> 
> repeats above and finally it should be,
> 
> virtio: inuse 0
> event-tap: {}
> 
> Hope this helps.
> 
> Yoshi
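
The case study above can be traced with a toy model. This is only a
sketch, not QEMU or Kemari code: the Primary class and its method names
are hypothetical, and it assumes (as the trace suggests) that a request
released by event-tap has completed by the time the next state is shown.

```python
from collections import deque


class Primary:
    """Toy model of the primary in the case study; not QEMU code."""

    def __init__(self, queued, in_backend):
        self.event_tap = deque(queued)     # queued, not yet passed to the backend
        self.backend = deque(in_backend)   # passed to the backend, not yet completed
        self.completed = []                # completion order

    @property
    def inuse(self):
        # A request counts as in use until it completes, wherever it is waiting.
        return len(self.event_tap) + len(self.backend)

    def flush_backend(self):
        # Stands in for qemu_aio_flush/bdrv_flush: drains the backend only.
        while self.backend:
            self.completed.append(self.backend.popleft())

    def synchronize(self):
        # The backend is flushed first, then the remaining state is transferred.
        self.flush_backend()
        return {"inuse": self.inuse, "event_tap": list(self.event_tap)}

    def event_tap_flush_one(self):
        # event-tap releases its oldest request; assume it then completes.
        if self.event_tap:
            self.backend.append(self.event_tap.popleft())
            self.flush_backend()


def run_case_study():
    p = Primary(queued="ABC", in_backend="D")
    trace = [p.inuse]                        # before synchronization
    trace.append(p.synchronize()["inuse"])   # backend flushed, D completes
    while p.event_tap:
        p.event_tap_flush_one()
        p.synchronize()
        trace.append(p.inuse)
    return trace, p.completed
```

In this model inuse stays non-zero across a synchronization as long as
event-tap still holds requests, which is the point of the case study:
flushing the backend does not touch what event-tap has queued.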
> 
> >
> > --
> > MST
> >
> >
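
Earlier in this thread the alternative of decreasing last_avail_idx by
inuse upon save was raised. A minimal sketch of that arithmetic follows,
assuming the ring index is a 16-bit counter that wraps around; the
function names are hypothetical and nothing here is taken from the patch
itself.

```python
VRING_IDX_MASK = 0xFFFF  # assumption: ring indices are 16-bit and wrap


def index_to_save(last_avail_idx, inuse):
    """Index to transfer so the remote sees the in-use requests as unprocessed."""
    return (last_avail_idx - inuse) & VRING_IDX_MASK


def resubmitted_indices(last_avail_idx, inuse):
    """Ring positions the remote would pick up again after loading that index."""
    start = index_to_save(last_avail_idx, inuse)
    return [(start + i) & VRING_IDX_MASK for i in range(inuse)]
```

With inuse 0 the saved index equals last_avail_idx, which matches the
non-event-tap case described in the thread. Whether the resubmitted
requests are then handled twice is exactly the open question here.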
