From: "Michael S. Tsirkin" <mst@redhat.com>
To: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com,
	kvm@vger.kernel.org, ohmura.kei@lab.ntt.co.jp,
	Marcelo Tosatti <mtosatti@redhat.com>,
	qemu-devel@nongnu.org, vatsa@linux.vnet.ibm.com, avi@redhat.com,
	psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble.
Date: Sun, 26 Dec 2010 12:46:07 +0200
Message-ID: <20101226104607.GA32000@redhat.com>
In-Reply-To: <AANLkTimrkwruPQkSNLbgT-yGfo9sHL7RAci4maB3n+un@mail.gmail.com>

On Sun, Dec 26, 2010 at 07:14:44PM +0900, Yoshiaki Tamura wrote:
> 2010/12/26 Michael S. Tsirkin <mst@redhat.com>:
> > On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote:
> >> >> If qemu_aio_flush() is responsible for flushing the outstanding
> >> >> virtio-net requests, I'm wondering why it's a problem for Kemari.
> >> >> As I described in the previous message, Kemari queues the
> >> >> requests first.  So in your example above, it should start with
> >> >>
> >> >> virtio-net: last_avail_idx 0 inuse 2
> >> >> event-tap: {A,B}
> >> >>
> >> >> As you know, the requests are still in order because the net
> >> >> layer initiates them in order.  This is not about completion order.
> >> >>
> >> >> In the first synchronization, the status above is transferred.  In
> >> >> the next synchronization, the status will be as following.
> >> >>
> >> >> virtio-net: last_avail_idx 1 inuse 1
> >> >> event-tap: {B}
> >> >
> >> > OK, this answers the ordering question.
> >>
> >> Glad to hear that!
> >>
> >> > Another question: at this point we transfer this status: both
> >> > event-tap and virtio ring have the command B,
> >> > so the remote will have:
> >> >
> >> > virtio-net: inuse 0
> >> > event-tap: {B}
> >> >
> >> > Is this right? This already seems to be a problem as when B completes
> >> > inuse will go negative?
> >>
> >> I think the state above is wrong.  inuse 0 means there shouldn't be
> >> any requests in event-tap.  Note that the callback is called only
> >> when event-tap flushes the requests.
> >>
> >> > Next it seems that the remote virtio will resubmit B to event-tap. The
> >> > remote will then have:
> >> >
> >> > virtio-net: inuse 1
> >> > event-tap: {B, B}
> >> >
> >> > This looks kind of wrong ... will two packets go out?
> >>
> >> No.  Currently, we're just replaying the requests with pio/mmio.
> >> In the situation above, it should be,
> >>
> >> virtio-net: inuse 1
> >> event-tap: {B}
> >> >> Why? Because Kemari flushes the first virtio-net request using
> >> >> qemu_aio_flush() before each synchronization.  If
> >> >> qemu_aio_flush() doesn't guarantee the order, what you pointed
> >> >> should be problematic.  So in the final synchronization, the
> >> >> state should be,
> >> >>
> >> >> virtio-net: last_avail_idx 2 inuse 0
> >> >> event-tap: {}
> >> >>
> >> >> where A,B were completed in order.
> >> >>
> >> >> Yoshi
> >> >
> >> >
> >> > It might be better to discuss block because that's where
> >> > requests can complete out of order.
> >>
> >> It's the same as net.  We queue requests and call bdrv_flush each
> >> time we send requests to the block layer.  So there shouldn't be
> >> any inversion.
> >>
> >> > So let me see if I understand:
> >> > - each command passed to event tap is queued by it,
> >> >  it is not passed directly to the backend
> >> > - later requests are passed to the backend,
> >> >  always in the same order that they were submitted
> >> > - each synchronization point flushes all requests
> >> >  passed to the backend so far
> >> > - each synchronization transfers all requests not passed to the backend,
> >> >  to the remote, and they are replayed there
> >>
> >> Correct.
> >>
> >> > Now to analyse this for correctness I am looking at the original patch
> >> > because it is smaller so easier to analyse and I think it is
> >> > functionally equivalent, correct me if I am wrong in this.
> >>
> >> So you think decreasing last_avail_idx upon save is better than
> >> updating it in the callback?
> >>
> >> > So the reason there's no out of order issue is this
> >> > (and might be a good thing to put in commit log
> >> > or a comment somewhere):
> >>
> >> I've done some in the latest patch.  Please point it out if it
> >> wasn't enough.
> >>
> >> > At point of save callback event tap has flushed commands
> >> > passed to the backend already. Thus at the point of
> >> > the save callback if a command has completed
> >> > all previous commands have been flushed and completed.
> >> >
> >> >
> >> > Therefore inuse is
> >> > in fact the # of requests passed to event tap but not yet
> >> > passed to the backend (for non-event tap case all commands are
> >> > passed to the backend immediately and because of this
> >> > inuse is 0) and these are the last inuse commands submitted.
> >> >
> >> >
> >> > Right?
> >>
> >> Yep.
> >>
> >> > Now a question:
> >> >
> >> > When we pass last_used_index - inuse to the remote,
> >> > the remote virtio will resubmit the request.
> >> > Since request is also passed by event tap, we get
> >> > the request twice, why is this not a problem?
> >>
> >> It's not a problem because event-tap currently replays with
> >> pio/mmio only, as I mentioned above.  Although event-tap receives
> >> information about the queued requests, it won't pass it to the
> >> backend.  The reason is the difficulty of setting up the callbacks,
> >> which are specific to devices, on the secondary.  These are
> >> pointers, and even worse, usually point to static functions, so
> >> event-tap has no way to restore them upon failover.  I do want to
> >> change event-tap replay to work this way in the future, but only
> >> pio/mmio replay is implemented for now.
> >>
> >> Thanks,
> >>
> >> Yoshi
> >>
> >
> > Then I am still confused, sorry.  inuse != 0 means that some requests
> > were passed to the backend but did not complete.  I think that if you do
> > a flush, this waits until all requests passed to the backend have
> > completed.  Why doesn't this guarantee inuse = 0 on the origin at the
> > synchronization point?
> 
> The synchronization is done before event-tap releases requests to
> the backend, so there are two types of flush: event-tap and
> backend block/net.  I assume you're confused by the fact that
> flushing the backend with qemu_aio_flush/bdrv_flush doesn't necessarily
> decrease inuse if event-tap has queued requests, because those
> requests have not been passed to the backend.  Let me do a case study again.
> 
> virtio: inuse 4
> event-tap: {A,B,C}
> backend: {D}
> 


There are two event-tap devices, right?
PIO one is above virtio, AIO one is between virtio and backend
(e.g. bdrv)? Which one is meant here?


> synchronization starts.  backend gets flushed.
> 
> virtio: inuse 3
> event-tap: {A,B,C}
> backend: {}
> synchronization gets done.
> # secondary is virtio: inuse 3
> 
> event-tap flushes one request.
> 
> virtio: inuse 2
> event-tap: {B,C}
> backend: {}
> This repeats, and finally it should be:
> 
> virtio: inuse 0
> event-tap: {}
> 
> Hope this helps.
> 
> Yoshi
> 
> >
> > --
> > MST
> >
> >
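
The case study quoted above can be sketched as a small simulation. This is only an illustrative model under stated assumptions, not Kemari or QEMU code: the class `InuseModel` and its methods are hypothetical names, `flush_backend()` merely stands in for qemu_aio_flush()/bdrv_flush(), and inuse is assumed to count every request that is either queued in event-tap or in flight in the backend.

```python
from collections import deque


class InuseModel:
    """Toy model of the primary side (hypothetical, for illustration only)."""

    def __init__(self, queued, in_backend):
        # Requests held back by event-tap, in submission order.
        self.event_tap = deque(queued)
        # Requests already released to the backend but not yet completed.
        self.backend = list(in_backend)
        # Assumption: inuse covers both queued and in-flight requests.
        self.inuse = len(self.event_tap) + len(self.backend)

    def flush_backend(self):
        # Stand-in for qemu_aio_flush()/bdrv_flush(): everything in the
        # backend completes, and only those requests decrement inuse.
        self.inuse -= len(self.backend)
        self.backend.clear()

    def synchronize(self):
        # The backend is flushed first, then the state that would be
        # transferred to the secondary is captured.
        self.flush_backend()
        return {"inuse": self.inuse, "event_tap": list(self.event_tap)}

    def release_one(self):
        # event-tap releases its oldest queued request to the backend.
        if self.event_tap:
            self.backend.append(self.event_tap.popleft())


if __name__ == "__main__":
    m = InuseModel(queued="ABC", in_backend="D")
    print("before sync:", m.inuse, list(m.event_tap), m.backend)
    print("sync:", m.synchronize())
    while m.event_tap:
        m.release_one()
        print("sync:", m.synchronize())
```

In this model a backend flush never touches requests still queued in event-tap, which is why inuse can stay non-zero at a synchronization point.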

