From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=55767 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PWncm-0006R8-MT for qemu-devel@nongnu.org; Sun, 26 Dec 2010 05:14:49 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PWncl-000729-6d for qemu-devel@nongnu.org; Sun, 26 Dec 2010 05:14:48 -0500 Received: from mail-wy0-f173.google.com ([74.125.82.173]:47994) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PWnck-00071w-Qm for qemu-devel@nongnu.org; Sun, 26 Dec 2010 05:14:47 -0500 Received: by wyg36 with SMTP id 36so8599951wyg.4 for ; Sun, 26 Dec 2010 02:14:45 -0800 (PST) MIME-Version: 1.0 Sender: tamura.yoshiaki@gmail.com In-Reply-To: <20101226090517.GA3738@redhat.com> References: <20101202120213.GA2454@redhat.com> <20101216095140.GB19495@redhat.com> <20101216144010.GA25333@redhat.com> <20101224092710.GA23271@redhat.com> <20101226090517.GA3738@redhat.com> Date: Sun, 26 Dec 2010 19:14:44 +0900 Message-ID: Subject: Re: [Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble. From: Yoshiaki Tamura Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Michael S. Tsirkin" Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com, kvm@vger.kernel.org, Marcelo Tosatti , ohmura.kei@lab.ntt.co.jp, qemu-devel@nongnu.org, avi@redhat.com, vatsa@linux.vnet.ibm.com, psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com 2010/12/26 Michael S. Tsirkin : > On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote: >> >> If qemu_aio_flush() is responsible for flushing the outstanding >> >> virtio-net requests, I'm wondering why it's a problem for Kemari. >> >> As I described in the previous message, Kemari queues the >> >> requests first. =A0So in you example above, it should start with >> >> >> >> virtio-net: last_avai_idx 0 inuse 2 >> >> event-tap: {A,B} >> >> >> >> As you know, the requests are still in order still because net >> >> layer initiates in order. =A0Not about completing. >> >> >> >> In the first synchronization, the status above is transferred. =A0In >> >> the next synchronization, the status will be as following. >> >> >> >> virtio-net: last_avai_idx 1 inuse 1 >> >> event-tap: {B} >> > >> > OK, this answers the ordering question. >> >> Glad to hear that! >> >> > Another question: at this point we transfer this status: both >> > event-tap and virtio ring have the command B, >> > so the remote will have: >> > >> > virtio-net: inuse 0 >> > event-tap: {B} >> > >> > Is this right? This already seems to be a problem as when B completes >> > inuse will go negative? >> >> I think state above is wrong. =A0inuse 0 means there shouldn't be >> any requests in event-tap. =A0Note that the callback is called only >> when event-tap flushes the requests. >> >> > Next it seems that the remote virtio will resubmit B to event-tap. The >> > remote will then have: >> > >> > virtio-net: inuse 1 >> > event-tap: {B, B} >> > >> > This looks kind of wrong ... will two packets go out? >> >> No. =A0Currently, we're just replaying the requests with pio/mmio. >> In the situation above, it should be, >> >> virtio-net: inuse 1 >> event-tap: {B} >> >> Why? Because Kemari flushes the first virtio-net request using >> >> qemu_aio_flush() before each synchronization. =A0If >> >> qemu_aio_flush() doesn't guarantee the order, what you pointed >> >> should be problematic. =A0So in the final synchronization, the >> >> state should be, >> >> >> >> virtio-net: last_avai_idx 2 inuse 0 >> >> event-tap: {} >> >> >> >> where A,B were completed in order. >> >> >> >> Yoshi >> > >> > >> > It might be better to discuss block because that's where >> > requests can complete out of order. >> >> It's same as net. =A0We queue requests and call bdrv_flush per >> sending requests to the block. =A0So there shouldn't be any >> inversion. >> >> > So let me see if I understand: >> > - each command passed to event tap is queued by it, >> > =A0it is not passed directly to the backend >> > - later requests are passed to the backend, >> > =A0always in the same order that they were submitted >> > - each synchronization point flushes all requests >> > =A0passed to the backend so far >> > - each synchronization transfers all requests not passed to the backen= d, >> > =A0to the remote, and they are replayed there >> >> Correct. >> >> > Now to analyse this for correctness I am looking at the original patch >> > because it is smaller so easier to analyse and I think it is >> > functionally equivalent, correct me if I am wrong in this. >> >> So you think decreasing last_avail_idx upon save is better than >> updating it in the callback? >> >> > So the reason there's no out of order issue is this >> > (and might be a good thing to put in commit log >> > or a comment somewhere): >> >> I've done some in the latest patch. =A0Please point it out if it >> wasn't enough. >> >> > At point of save callback event tap has flushed commands >> > passed to the backend already. Thus at the point of >> > the save callback if a command has completed >> > all previous commands have been flushed and completed. >> > >> > >> > Therefore inuse is >> > in fact the # of requests passed to event tap but not yet >> > passed to the backend (for non-event tap case all commands are >> > passed to the backend immediately and because of this >> > inuse is 0) and these are the last inuse commands submitted. >> > >> > >> > Right? >> >> Yep. >> >> > Now a question: >> > >> > When we pass last_used_index - inuse to the remote, >> > the remote virtio will resubmit the request. >> > Since request is also passed by event tap, we get >> > the request twice, why is this not a problem? >> >> It's not a problem because event-tap currently replays with >> pio/mmio only, as I mentioned above. =A0Although event-tap receives >> information about the queued requests, it won't pass it to the >> backend. =A0The reason is the problem in setting the callbacks >> which are specific to devices on the secondary. =A0These are >> pointers, and even worse, are usually static functions, which >> event-tap has no way to restore it upon failover. =A0I do want to >> change event-tap replay to be this way in the future, pio/mmio >> replay is implemented for now. >> >> Thanks, >> >> Yoshi >> > > Then I am still confused, sorry. =A0inuse !=3D 0 means that some requests > were passed to the backend but did not complete. =A0I think that if you d= o > a flush, this waits until all requests passed to the backend will > complete. =A0Why does not this guarantee inuse =3D 0 on the origin at the > synchronization point? The synchronization is done before event-tap releases requests to the backend, so there are two types of flush: event-tap and backend block/net. I assume you're confused with the fact that flushing backend with qemu_aio_flush/bdrv_flush doesn't necessary decrease inuse if event-tap has queued requests because there are no requests passed to the backend. Let me do a case study again. virtio: inuse 4 event-tap: {A,B,C} backend: {D} synchronization starts. backend gets flushed. virtio: inuse 3 event-tap: {A,B,C} backend: {} synchronization gets done. # secondary is virtio: inuse 3 event-tap flushes one request. virtio: inuse 2 event-tap: {B,C} backend: {} repeats above and finally it should be, virtio: inuse 0 event-tap: {} Hope this helps. Yoshi > > -- > MST > >