From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=56621 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PWo7a-0005Zs-Th for qemu-devel@nongnu.org; Sun, 26 Dec 2010 05:46:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PWo7Z-0003gQ-Gy for qemu-devel@nongnu.org; Sun, 26 Dec 2010 05:46:38 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45468) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PWo7Z-0003gJ-3d for qemu-devel@nongnu.org; Sun, 26 Dec 2010 05:46:37 -0500 Date: Sun, 26 Dec 2010 12:46:07 +0200 From: "Michael S. Tsirkin" Subject: Re: [Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble. Message-ID: <20101226104607.GA32000@redhat.com> References: <20101216095140.GB19495@redhat.com> <20101216144010.GA25333@redhat.com> <20101224092710.GA23271@redhat.com> <20101226090517.GA3738@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: Content-Transfer-Encoding: quoted-printable List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Yoshiaki Tamura Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com, kvm@vger.kernel.org, Marcelo Tosatti , ohmura.kei@lab.ntt.co.jp, qemu-devel@nongnu.org, avi@redhat.com, vatsa@linux.vnet.ibm.com, psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com On Sun, Dec 26, 2010 at 07:14:44PM +0900, Yoshiaki Tamura wrote: > 2010/12/26 Michael S. Tsirkin : > > On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote: > >> >> If qemu_aio_flush() is responsible for flushing the outstanding > >> >> virtio-net requests, I'm wondering why it's a problem for Kemari. > >> >> As I described in the previous message, Kemari queues the > >> >> requests first. =A0So in you example above, it should start with > >> >> > >> >> virtio-net: last_avai_idx 0 inuse 2 > >> >> event-tap: {A,B} > >> >> > >> >> As you know, the requests are still in order still because net > >> >> layer initiates in order. =A0Not about completing. > >> >> > >> >> In the first synchronization, the status above is transferred. =A0= In > >> >> the next synchronization, the status will be as following. > >> >> > >> >> virtio-net: last_avai_idx 1 inuse 1 > >> >> event-tap: {B} > >> > > >> > OK, this answers the ordering question. > >> > >> Glad to hear that! > >> > >> > Another question: at this point we transfer this status: both > >> > event-tap and virtio ring have the command B, > >> > so the remote will have: > >> > > >> > virtio-net: inuse 0 > >> > event-tap: {B} > >> > > >> > Is this right? This already seems to be a problem as when B comple= tes > >> > inuse will go negative? > >> > >> I think state above is wrong. =A0inuse 0 means there shouldn't be > >> any requests in event-tap. =A0Note that the callback is called only > >> when event-tap flushes the requests. > >> > >> > Next it seems that the remote virtio will resubmit B to event-tap.= The > >> > remote will then have: > >> > > >> > virtio-net: inuse 1 > >> > event-tap: {B, B} > >> > > >> > This looks kind of wrong ... will two packets go out? > >> > >> No. =A0Currently, we're just replaying the requests with pio/mmio. > >> In the situation above, it should be, > >> > >> virtio-net: inuse 1 > >> event-tap: {B} > >> >> Why? Because Kemari flushes the first virtio-net request using > >> >> qemu_aio_flush() before each synchronization. =A0If > >> >> qemu_aio_flush() doesn't guarantee the order, what you pointed > >> >> should be problematic. =A0So in the final synchronization, the > >> >> state should be, > >> >> > >> >> virtio-net: last_avai_idx 2 inuse 0 > >> >> event-tap: {} > >> >> > >> >> where A,B were completed in order. > >> >> > >> >> Yoshi > >> > > >> > > >> > It might be better to discuss block because that's where > >> > requests can complete out of order. > >> > >> It's same as net. =A0We queue requests and call bdrv_flush per > >> sending requests to the block. =A0So there shouldn't be any > >> inversion. > >> > >> > So let me see if I understand: > >> > - each command passed to event tap is queued by it, > >> > =A0it is not passed directly to the backend > >> > - later requests are passed to the backend, > >> > =A0always in the same order that they were submitted > >> > - each synchronization point flushes all requests > >> > =A0passed to the backend so far > >> > - each synchronization transfers all requests not passed to the ba= ckend, > >> > =A0to the remote, and they are replayed there > >> > >> Correct. > >> > >> > Now to analyse this for correctness I am looking at the original p= atch > >> > because it is smaller so easier to analyse and I think it is > >> > functionally equivalent, correct me if I am wrong in this. > >> > >> So you think decreasing last_avail_idx upon save is better than > >> updating it in the callback? > >> > >> > So the reason there's no out of order issue is this > >> > (and might be a good thing to put in commit log > >> > or a comment somewhere): > >> > >> I've done some in the latest patch. =A0Please point it out if it > >> wasn't enough. > >> > >> > At point of save callback event tap has flushed commands > >> > passed to the backend already. Thus at the point of > >> > the save callback if a command has completed > >> > all previous commands have been flushed and completed. > >> > > >> > > >> > Therefore inuse is > >> > in fact the # of requests passed to event tap but not yet > >> > passed to the backend (for non-event tap case all commands are > >> > passed to the backend immediately and because of this > >> > inuse is 0) and these are the last inuse commands submitted. > >> > > >> > > >> > Right? > >> > >> Yep. > >> > >> > Now a question: > >> > > >> > When we pass last_used_index - inuse to the remote, > >> > the remote virtio will resubmit the request. > >> > Since request is also passed by event tap, we get > >> > the request twice, why is this not a problem? > >> > >> It's not a problem because event-tap currently replays with > >> pio/mmio only, as I mentioned above. =A0Although event-tap receives > >> information about the queued requests, it won't pass it to the > >> backend. =A0The reason is the problem in setting the callbacks > >> which are specific to devices on the secondary. =A0These are > >> pointers, and even worse, are usually static functions, which > >> event-tap has no way to restore it upon failover. =A0I do want to > >> change event-tap replay to be this way in the future, pio/mmio > >> replay is implemented for now. > >> > >> Thanks, > >> > >> Yoshi > >> > > > > Then I am still confused, sorry. =A0inuse !=3D 0 means that some requ= ests > > were passed to the backend but did not complete. =A0I think that if y= ou do > > a flush, this waits until all requests passed to the backend will > > complete. =A0Why does not this guarantee inuse =3D 0 on the origin at= the > > synchronization point? >=20 > The synchronization is done before event-tap releases requests to > the backend, so there are two types of flush: event-tap and > backend block/net. I assume you're confused with the fact that > flushing backend with qemu_aio_flush/bdrv_flush doesn't necessary > decrease inuse if event-tap has queued requests because there are > no requests passed to the backend. Let me do a case study again. >=20 > virtio: inuse 4 > event-tap: {A,B,C} > backend: {D} >=20 There are two event-tap devices, right? PIO one is above virtio, AIO one is between virtio and backend (e.g. bdrv)? Which one is meant here? > synchronization starts. backend gets flushed. >=20 > virtio: inuse 3 > event-tap: {A,B,C} > backend: {} > synchronization gets done. > # secondary is virtio: inuse 3 >=20 > event-tap flushes one request. >=20 > virtio: inuse 2 > event-tap: {B,C} > backend: {} > repeats above and finally it should be, >=20 > virtio: inuse 0 > event-tap: {} >=20 > Hope this helps. >=20 > Yoshi >=20 > > > > -- > > MST > > > >