Date: Sun, 26 Dec 2010 11:05:17 +0200
From: "Michael S. Tsirkin"
Message-ID: <20101226090517.GA3738@redhat.com>
References: <20101202120213.GA2454@redhat.com> <20101216095140.GB19495@redhat.com> <20101216144010.GA25333@redhat.com> <20101224092710.GA23271@redhat.com>
Subject: [Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse variable.
List-Id: qemu-devel.nongnu.org
To: Yoshiaki Tamura
Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com, kvm@vger.kernel.org, ohmura.kei@lab.ntt.co.jp, Marcelo Tosatti, qemu-devel@nongnu.org, vatsa@linux.vnet.ibm.com, avi@redhat.com, psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com

On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote:
> >> If qemu_aio_flush() is responsible for flushing the outstanding
> >> virtio-net requests, I'm wondering why it's a problem for Kemari.
> >> As I described in the previous message, Kemari queues the
> >> requests first.  So in your example above, it should start with
> >>
> >> virtio-net: last_avail_idx 0 inuse 2
> >> event-tap: {A,B}
> >>
> >> As you know, the requests are still in order, because the net
> >> layer initiates them in order.
> >> Ordering is about initiation, not completion.
> >>
> >> In the first synchronization, the status above is transferred.  In
> >> the next synchronization, the status will be as follows:
> >>
> >> virtio-net: last_avail_idx 1 inuse 1
> >> event-tap: {B}
> >
> > OK, this answers the ordering question.
>
> Glad to hear that!
>
> > Another question: at this point we transfer this status: both
> > event-tap and the virtio ring have command B,
> > so the remote will have:
> >
> > virtio-net: inuse 0
> > event-tap: {B}
> >
> > Is this right? This already seems to be a problem, as when B
> > completes, inuse will go negative?
>
> I think the state above is wrong. inuse 0 means there shouldn't be
> any requests in event-tap. Note that the callback is called only
> when event-tap flushes the requests.
>
> > Next it seems that the remote virtio will resubmit B to event-tap. The
> > remote will then have:
> >
> > virtio-net: inuse 1
> > event-tap: {B, B}
> >
> > This looks kind of wrong ... will two packets go out?
>
> No. Currently, we're just replaying the requests with pio/mmio.
> In the situation above, it should be:
>
> virtio-net: inuse 1
> event-tap: {B}
>
> >> Why? Because Kemari flushes the first virtio-net request using
> >> qemu_aio_flush() before each synchronization.  If
> >> qemu_aio_flush() didn't guarantee the order, what you pointed
> >> out would be problematic.  So in the final synchronization, the
> >> state should be:
> >>
> >> virtio-net: last_avail_idx 2 inuse 0
> >> event-tap: {}
> >>
> >> where A and B were completed in order.
> >>
> >> Yoshi
> >
> > It might be better to discuss block, because that's where
> > requests can complete out of order.
>
> It's the same as net. We queue requests and call bdrv_flush when
> sending requests to the block layer. So there shouldn't be any
> inversion.
>
> > So let me see if I understand:
> > - each command passed to event-tap is queued by it;
> >   it is not passed directly to the backend
> > - later, requests are passed to the backend,
> >   always in the same order in which they were submitted
> > - each synchronization point flushes all requests
> >   passed to the backend so far
> > - each synchronization transfers all requests not yet passed to the
> >   backend to the remote, and they are replayed there
>
> Correct.
>
> > Now, to analyse this for correctness, I am looking at the original
> > patch, because it is smaller and so easier to analyse, and I think
> > it is functionally equivalent; correct me if I am wrong about this.
>
> So you think decreasing last_avail_idx upon save is better than
> updating it in the callback?
>
> > So the reason there's no out-of-order issue is this
> > (and it might be a good thing to put in the commit log
> > or a comment somewhere):
>
> I've done some of this in the latest patch. Please point it out if it
> isn't enough.
>
> > At the point of the save callback, event-tap has already flushed the
> > commands passed to the backend. Thus, at the point of the save
> > callback, if a command has completed, all previous commands have
> > been flushed and completed.
> >
> > Therefore inuse is in fact the number of requests passed to
> > event-tap but not yet passed to the backend (in the non-event-tap
> > case, all commands are passed to the backend immediately, and
> > because of this inuse is 0), and these are the last inuse commands
> > submitted.
> >
> > Right?
>
> Yep.
>
> > Now a question:
> >
> > When we pass last_used_index - inuse to the remote,
> > the remote virtio will resubmit the request.
> > Since the request is also passed by event-tap, we get
> > the request twice; why is this not a problem?
>
> It's not a problem because event-tap currently replays with
> pio/mmio only, as I mentioned above.
> Although event-tap receives
> information about the queued requests, it won't pass them to the
> backend. The reason is the problem of setting the callbacks, which
> are specific to the devices, on the secondary. These are pointers,
> and even worse, usually static functions, which event-tap has no way
> to restore upon failover. I do want to change event-tap replay to
> work this way in the future, but pio/mmio replay is what is
> implemented for now.
>
> Thanks,
>
> Yoshi

Then I am still confused, sorry.
inuse != 0 means that some requests were passed to the backend but did
not complete. I think that if you do a flush, this waits until all
requests passed to the backend complete. Why does this not guarantee
inuse = 0 on the origin at the synchronization point?

-- 
MST