Date: Thu, 6 Jan 2011 17:47:27 +0900
Subject: Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
From: Yoshiaki Tamura
In-Reply-To: <20110104144214.GC8734@redhat.com>
To: "Michael S. Tsirkin"
Cc: aliguori@us.ibm.com, mtosatti@redhat.com, ananth@in.ibm.com,
    kvm@vger.kernel.org, Stefan Hajnoczi, dlaor@redhat.com,
    ohmura.kei@lab.ntt.co.jp, qemu-devel@nongnu.org, avi@redhat.com,
    vatsa@linux.vnet.ibm.com, psuriset@linux.vnet.ibm.com,
    stefanha@linux.vnet.ibm.com
List-Id: qemu-devel.nongnu.org

2011/1/4 Michael S. Tsirkin :
> On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote:
>> 2011/1/4 Michael S. Tsirkin :
>> > On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
>> >> 2011/1/4 Michael S. Tsirkin :
>> >> > On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
>> >> >> 2010/11/29 Stefan Hajnoczi :
>> >> >> > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura wrote:
>> >> >> >> event-tap controls when to start an FT transaction, and provides
>> >> >> >> proxy functions to be called from net/block devices.  During an FT
>> >> >> >> transaction, it queues up net/block requests and flushes them when
>> >> >> >> the transaction completes.
>> >> >> >>
>> >> >> >> Signed-off-by: Yoshiaki Tamura
>> >> >> >> Signed-off-by: OHMURA Kei
>> >> >> >> ---
>> >> >> >>  Makefile.target |    1 +
>> >> >> >>  block.h         |    9 +
>> >> >> >>  event-tap.c     |  794 ++++++++++++++++++++++++++++++++++++++++
>> >> >> >>  event-tap.h     |   34 +++
>> >> >> >>  net.h           |    4 +
>> >> >> >>  net/queue.c     |    1 +
>> >> >> >>  6 files changed, 843 insertions(+), 0 deletions(-)
>> >> >> >>  create mode 100644 event-tap.c
>> >> >> >>  create mode 100644 event-tap.h
>> >> >> >
>> >> >> > event_tap_state is checked at the beginning of several functions.  If
>> >> >> > there is an unexpected state the function silently returns.  Should
>> >> >> > these checks really be assert() so there is an abort and backtrace if
>> >> >> > the program ever reaches this state?
>> >> >> >
>> >> >> >> +typedef struct EventTapBlkReq {
>> >> >> >> +    char *device_name;
>> >> >> >> +    int num_reqs;
>> >> >> >> +    int num_cbs;
>> >> >> >> +    bool is_multiwrite;
>> >> >> >
>> >> >> > Is multiwrite logging necessary?  If event-tap is called from within
>> >> >> > the block layer then multiwrite is turned into one or more
>> >> >> > bdrv_aio_writev() calls.
>> >> >> >
>> >> >> >> +static void event_tap_replay(void *opaque, int running, int reason)
>> >> >> >> +{
>> >> >> >> +    EventTapLog *log, *next;
>> >> >> >> +
>> >> >> >> +    if (!running) {
>> >> >> >> +        return;
>> >> >> >> +    }
>> >> >> >> +
>> >> >> >> +    if (event_tap_state != EVENT_TAP_LOAD) {
>> >> >> >> +        return;
>> >> >> >> +    }
>> >> >> >> +
>> >> >> >> +    event_tap_state = EVENT_TAP_REPLAY;
>> >> >> >> +
>> >> >> >> +    QTAILQ_FOREACH(log, &event_list, node) {
>> >> >> >> +        EventTapBlkReq *blk_req;
>> >> >> >> +
>> >> >> >> +        /* event resume */
>> >> >> >> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
>> >> >> >> +        case EVENT_TAP_NET:
>> >> >> >> +            event_tap_net_flush(&log->net_req);
>> >> >> >> +            break;
>> >> >> >> +        case EVENT_TAP_BLK:
>> >> >> >> +            blk_req = &log->blk_req;
>> >> >> >> +            if ((log->mode & EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
>> >> >> >> +                switch (log->ioport.index) {
>> >> >> >> +                case 0:
>> >> >> >> +                    cpu_outb(log->ioport.address, log->ioport.data);
>> >> >> >> +                    break;
>> >> >> >> +                case 1:
>> >> >> >> +                    cpu_outw(log->ioport.address, log->ioport.data);
>> >> >> >> +                    break;
>> >> >> >> +                case 2:
>> >> >> >> +                    cpu_outl(log->ioport.address, log->ioport.data);
>> >> >> >> +                    break;
>> >> >> >> +                }
>> >> >> >> +            } else {
>> >> >> >> +                /* EVENT_TAP_MMIO */
>> >> >> >> +                cpu_physical_memory_rw(log->mmio.address,
>> >> >> >> +                                       log->mmio.buf,
>> >> >> >> +                                       log->mmio.len, 1);
>> >> >> >> +            }
>> >> >> >> +            break;
>> >> >> >
>> >> >> > Why are net tx packets replayed at the net level but blk requests are
>> >> >> > replayed at the pio/mmio level?
>> >> >> >
>> >> >> > I expected everything to replay either as pio/mmio or as net/block.
>> >> >>
>> >> >> Stefan,
>> >> >>
>> >> >> After doing some heavy load tests, I realized that we have to
>> >> >> take a hybrid approach to replay for now.  This is because the point
>> >> >> at which a device moves to the next state (e.g. when virtio decreases
>> >> >> inuse) differs between net and block.  For example, virtio-net
>> >> >> decreases inuse upon returning from the net layer, but virtio-blk
>> >> >> does that inside the callback.
>> >> >
>> >> > For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
>> >> > For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
>> >> > Both are invoked from a callback.
>> >> >
>> >> >> If we only use pio/mmio replay, even though event-tap tries to
>> >> >> replay net requests, some get lost because the state has already
>> >> >> proceeded.
>> >> >
>> >> > It seems that all you need to do to avoid this is to
>> >> > delay the callback?
>> >>
>> >> Yeah, if it's possible.  But if you take a look at virtio-net,
>> >> you'll see that virtqueue_push is called immediately after calling
>> >> qemu_sendv_packet, while virtio-blk does that in the callback.
>> >
>> > This is only if the packet was sent immediately.
>> > I was referring to the case where the packet is queued.
>>
>> I see.  I usually don't see packets get queued in the net layer.
>> What would be the effect on devices?  Would it restrain them from
>> sending packets?
>
> Yes.
>
>> >
>> >> >
>> >> >> This doesn't happen with block, because the state is still old
>> >> >> enough to replay.  Note that using a hybrid approach won't cause
>> >> >> duplicated requests on the secondary.
>> >> >
>> >> > An assumption devices make is that a buffer is unused once the
>> >> > completion callback has been invoked.  Does this violate that
>> >> > assumption?
>> >>
>> >> No, it shouldn't.  In the case of net with net-layer replay, we copy
>> >> the contents of the requests, and in the case of block, because we
>> >> haven't called the callback yet, the requests remain fresh.
>> >>
>> >> Yoshi
>> >>
>> >
>> > Yes, as long as you copy it should be fine.  Maybe it's a good idea for
>> > event-tap to queue all packets to avoid the copy and avoid the need to
>> > replay at the net level.
>>
>> If queuing works fine for the devices, it seems to be a good
>> idea.  I think the ordering issue still doesn't happen.
>>
>> Yoshi
>
> If you replay at both the net and pio levels, it becomes complex.
> Maybe it's OK, but certainly harder to reason about.

Michael,

It seems that queuing at event-tap, as the net layer does, works for
devices that use qemu_send_packet_async, as you suggested.  But for
those that use qemu_send_packet, we still need to copy the contents
just as net-layer queuing does, and net-level replay should be kept
to handle them.

Thanks,

Yoshi
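
As a minimal, self-contained illustration of the point above (this is not
event-tap or QEMU code; every name below is made up for the example, and
only the qemu_send_packet / qemu_send_packet_async semantics are taken from
the discussion): the difference between the two device styles comes down to
buffer ownership.  A qemu_send_packet-style caller may reuse its buffer as
soon as the call returns, so a logger has to copy the payload; a
qemu_send_packet_async-style caller keeps the buffer valid until the sent
callback runs, so queuing a reference and deferring that callback is enough.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void SentCallback(void *opaque);

typedef struct LoggedPacket {
    uint8_t *data;          /* owned copy, or borrowed pointer */
    size_t len;
    bool owns_copy;         /* true when we had to memcpy the payload */
    SentCallback *sent_cb;  /* deferred completion (async style only) */
    void *cb_opaque;
} LoggedPacket;

/* Synchronous style: the caller may clobber buf right after we return,
 * so the payload must be copied before it is queued. */
static LoggedPacket *log_packet_sync(const uint8_t *buf, size_t len)
{
    LoggedPacket *pkt = calloc(1, sizeof(*pkt));
    pkt->data = malloc(len);
    memcpy(pkt->data, buf, len);
    pkt->len = len;
    pkt->owns_copy = true;
    return pkt;
}

/* Async style: the buffer stays valid until sent_cb runs, so queuing the
 * pointer and deferring the callback avoids the copy. */
static LoggedPacket *log_packet_async(uint8_t *buf, size_t len,
                                      SentCallback *sent_cb, void *opaque)
{
    LoggedPacket *pkt = calloc(1, sizeof(*pkt));
    pkt->data = buf;
    pkt->len = len;
    pkt->owns_copy = false;
    pkt->sent_cb = sent_cb;
    pkt->cb_opaque = opaque;
    return pkt;
}

/* Flush/replay: emit the logged data, then release the copy or, in the
 * async case, finally tell the device its buffer is free again. */
static void flush_packet(LoggedPacket *pkt)
{
    printf("flushing %zu bytes (%s)\n", pkt->len,
           pkt->owns_copy ? "copied" : "queued by reference");
    if (pkt->owns_copy) {
        free(pkt->data);
    } else if (pkt->sent_cb) {
        pkt->sent_cb(pkt->cb_opaque);
    }
    free(pkt);
}

static void dummy_sent(void *opaque)
{
    printf("device notified: buffer %p may be reused\n", opaque);
}

int main(void)
{
    uint8_t frame[64] = { 0 };

    /* qemu_send_packet-style device: the log must take a copy. */
    flush_packet(log_packet_sync(frame, sizeof(frame)));

    /* qemu_send_packet_async-style device: log by reference and defer
     * the completion callback until the flush. */
    flush_packet(log_packet_async(frame, sizeof(frame), dummy_sent, frame));
    return 0;
}

Either way, the assumption raised above, that a device's buffer is only
reused after its completion callback has run, is preserved: the sync path
never touches the device's buffer after returning, and the async path holds
the callback back until the flush.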