Date: Sun, 26 Dec 2010 11:05:17 +0200
From: "Michael S. Tsirkin"
Message-ID: <20101226090517.GA3738@redhat.com>
References: <20101202120213.GA2454@redhat.com> <20101216095140.GB19495@redhat.com> <20101216144010.GA25333@redhat.com> <20101224092710.GA23271@redhat.com>
Subject: [Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse variable.
List-Id: qemu-devel.nongnu.org
To: Yoshiaki Tamura
Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com, kvm@vger.kernel.org, ohmura.kei@lab.ntt.co.jp, Marcelo Tosatti, qemu-devel@nongnu.org, vatsa@linux.vnet.ibm.com, avi@redhat.com, psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com

On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote:
> >> If qemu_aio_flush() is responsible for flushing the outstanding
> >> virtio-net requests, I'm wondering why it's a problem for Kemari.
> >> As I described in the previous message, Kemari queues the
> >> requests first.  So in your example above, it should start with
> >>
> >> virtio-net: last_avail_idx 0 inuse 2
> >> event-tap: {A,B}
> >>
> >> As you know, the requests are still in order, because the net
> >> layer initiates them in order.
> >> Ordering is about initiation, not completion.
> >>
> >> In the first synchronization, the status above is transferred.  In
> >> the next synchronization, the status will be as follows:
> >>
> >> virtio-net: last_avail_idx 1 inuse 1
> >> event-tap: {B}
> >
> > OK, this answers the ordering question.
>
> Glad to hear that!
>
> > Another question: at this point we transfer this status: both
> > event-tap and the virtio ring have command B,
> > so the remote will have:
> >
> > virtio-net: inuse 0
> > event-tap: {B}
> >
> > Is this right? This already seems to be a problem, as when B
> > completes, inuse will go negative?
>
> I think the state above is wrong. inuse 0 means there shouldn't be
> any requests in event-tap. Note that the callback is called only
> when event-tap flushes the requests.
>
> > Next it seems that the remote virtio will resubmit B to event-tap. The
> > remote will then have:
> >
> > virtio-net: inuse 1
> > event-tap: {B, B}
> >
> > This looks kind of wrong ... will two packets go out?
>
> No. Currently, we're just replaying the requests with pio/mmio.
> In the situation above, it should be:
>
> virtio-net: inuse 1
> event-tap: {B}
>
> >> Why? Because Kemari flushes the first virtio-net request using
> >> qemu_aio_flush() before each synchronization.  If
> >> qemu_aio_flush() didn't guarantee the order, what you pointed
> >> out would be problematic.  So in the final synchronization, the
> >> state should be:
> >>
> >> virtio-net: last_avail_idx 2 inuse 0
> >> event-tap: {}
> >>
> >> where A and B were completed in order.
> >>
> >> Yoshi
> >
> > It might be better to discuss block, because that's where
> > requests can complete out of order.
>
> It's the same as net. We queue requests and call bdrv_flush when
> sending requests to the block layer. So there shouldn't be any
> inversion.
>
> > So let me see if I understand:
> > - each command passed to event-tap is queued by it;
> >   it is not passed directly to the backend
> > - later, requests are passed to the backend,
> >   always in the same order in which they were submitted
> > - each synchronization point flushes all requests
> >   passed to the backend so far
> > - each synchronization transfers all requests not yet passed to the
> >   backend to the remote, and they are replayed there
>
> Correct.
>
> > Now, to analyse this for correctness, I am looking at the original
> > patch, because it is smaller and so easier to analyse, and I think
> > it is functionally equivalent; correct me if I am wrong about this.
>
> So you think decreasing last_avail_idx upon save is better than
> updating it in the callback?
>
> > So the reason there's no out-of-order issue is this
> > (and it might be a good thing to put in the commit log
> > or a comment somewhere):
>
> I've done some of this in the latest patch. Please point it out if it
> isn't enough.
>
> > At the point of the save callback, event-tap has already flushed the
> > commands passed to the backend. Thus, at the point of the save
> > callback, if a command has completed, all previous commands have
> > been flushed and completed.
> >
> > Therefore inuse is in fact the number of requests passed to
> > event-tap but not yet passed to the backend (in the non-event-tap
> > case, all commands are passed to the backend immediately, and
> > because of this inuse is 0), and these are the last inuse commands
> > submitted.
> >
> > Right?
>
> Yep.
>
> > Now a question:
> >
> > When we pass last_used_index - inuse to the remote,
> > the remote virtio will resubmit the request.
> > Since the request is also passed by event-tap, we get
> > the request twice; why is this not a problem?
>
> It's not a problem because event-tap currently replays with
> pio/mmio only, as I mentioned above.
> Although event-tap receives
> information about the queued requests, it won't pass them to the
> backend. The reason is the problem of setting the callbacks, which
> are specific to the devices, on the secondary. These are pointers,
> and even worse, usually static functions, which event-tap has no way
> to restore upon failover. I do want to change event-tap replay to
> work this way in the future, but pio/mmio replay is what is
> implemented for now.
>
> Thanks,
>
> Yoshi

Then I am still confused, sorry.
inuse != 0 means that some requests were passed to the backend but did
not complete. I think that if you do a flush, this waits until all
requests passed to the backend complete. Why does this not guarantee
inuse = 0 on the origin at the synchronization point?

-- 
MST