From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=34694 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Pa4vT-0002lA-Aw for qemu-devel@nongnu.org; Tue, 04 Jan 2011 06:19:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Pa4vS-0002y2-0X for qemu-devel@nongnu.org; Tue, 04 Jan 2011 06:19:39 -0500 Received: from mx1.redhat.com ([209.132.183.28]:33067) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Pa4vR-0002xr-LD for qemu-devel@nongnu.org; Tue, 04 Jan 2011 06:19:37 -0500 Date: Tue, 4 Jan 2011 13:19:08 +0200 From: "Michael S. Tsirkin" Message-ID: <20110104111908.GA5694@redhat.com> References: <1290665220-26478-1-git-send-email-tamura.yoshiaki@lab.ntt.co.jp> <1290665220-26478-10-git-send-email-tamura.yoshiaki@lab.ntt.co.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: Content-Transfer-Encoding: quoted-printable Subject: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap. List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Yoshiaki Tamura Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com, kvm@vger.kernel.org, ohmura.kei@lab.ntt.co.jp, Stefan Hajnoczi , mtosatti@redhat.com, qemu-devel@nongnu.org, vatsa@linux.vnet.ibm.com, avi@redhat.com, psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote: > 2010/11/29 Stefan Hajnoczi : > > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura > > wrote: > >> event-tap controls when to start FT transaction, and provides proxy > >> functions to called from net/block devices. =A0While FT transaction,= it > >> queues up net/block requests, and flush them when the transaction ge= ts > >> completed. > >> > >> Signed-off-by: Yoshiaki Tamura > >> Signed-off-by: OHMURA Kei > >> --- > >> =A0Makefile.target | =A0 =A01 + > >> =A0block.h =A0 =A0 =A0 =A0 | =A0 =A09 + > >> =A0event-tap.c =A0 =A0 | =A0794 ++++++++++++++++++++++++++++++++++++= +++++++++++++++++++ > >> =A0event-tap.h =A0 =A0 | =A0 34 +++ > >> =A0net.h =A0 =A0 =A0 =A0 =A0 | =A0 =A04 + > >> =A0net/queue.c =A0 =A0 | =A0 =A01 + > >> =A06 files changed, 843 insertions(+), 0 deletions(-) > >> =A0create mode 100644 event-tap.c > >> =A0create mode 100644 event-tap.h > > > > event_tap_state is checked at the beginning of several functions. =A0= If > > there is an unexpected state the function silently returns. =A0Should > > these checks really be assert() so there is an abort and backtrace if > > the program ever reaches this state? > > > >> +typedef struct EventTapBlkReq { > >> + =A0 =A0char *device_name; > >> + =A0 =A0int num_reqs; > >> + =A0 =A0int num_cbs; > >> + =A0 =A0bool is_multiwrite; > > > > Is multiwrite logging necessary? =A0If event tap is called from withi= n > > the block layer then multiwrite is turned into one or more > > bdrv_aio_writev() calls. > > > >> +static void event_tap_replay(void *opaque, int running, int reason) > >> +{ > >> + =A0 =A0EventTapLog *log, *next; > >> + > >> + =A0 =A0if (!running) { > >> + =A0 =A0 =A0 =A0return; > >> + =A0 =A0} > >> + > >> + =A0 =A0if (event_tap_state !=3D EVENT_TAP_LOAD) { > >> + =A0 =A0 =A0 =A0return; > >> + =A0 =A0} > >> + > >> + =A0 =A0event_tap_state =3D EVENT_TAP_REPLAY; > >> + > >> + =A0 =A0QTAILQ_FOREACH(log, &event_list, node) { > >> + =A0 =A0 =A0 =A0EventTapBlkReq *blk_req; > >> + > >> + =A0 =A0 =A0 =A0/* event resume */ > >> + =A0 =A0 =A0 =A0switch (log->mode & ~EVENT_TAP_TYPE_MASK) { > >> + =A0 =A0 =A0 =A0case EVENT_TAP_NET: > >> + =A0 =A0 =A0 =A0 =A0 =A0event_tap_net_flush(&log->net_req); > >> + =A0 =A0 =A0 =A0 =A0 =A0break; > >> + =A0 =A0 =A0 =A0case EVENT_TAP_BLK: > >> + =A0 =A0 =A0 =A0 =A0 =A0blk_req =3D &log->blk_req; > >> + =A0 =A0 =A0 =A0 =A0 =A0if ((log->mode & EVENT_TAP_TYPE_MASK) =3D=3D= EVENT_TAP_IOPORT) { > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0switch (log->ioport.index) { > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0case 0: > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cpu_outb(log->ioport.addres= s, log->ioport.data); > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break; > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0case 1: > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cpu_outw(log->ioport.addres= s, log->ioport.data); > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break; > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0case 2: > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cpu_outl(log->ioport.addres= s, log->ioport.data); > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break; > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > >> + =A0 =A0 =A0 =A0 =A0 =A0} else { > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/* EVENT_TAP_MMIO */ > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cpu_physical_memory_rw(log->mmio.ad= dress, > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 log->mmio.buf, > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 log->mmio.len, 1); > >> + =A0 =A0 =A0 =A0 =A0 =A0} > >> + =A0 =A0 =A0 =A0 =A0 =A0break; > > > > Why are net tx packets replayed at the net level but blk requests are > > replayed at the pio/mmio level? > > > > I expected everything to replay either as pio/mmio or as net/block. >=20 > Stefan, >=20 > After doing some heavy load tests, I realized that we have to > take a hybrid approach to replay for now. This is because when a > device moves to the next state (e.g. virtio decreases inuse) is > different between net and block. For example, virtio-net > decreases inuse upon returning from the net layer, > but virtio-blk > does that inside of the callback. For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete. For RX, virtio-net calls virtqueue_flush from virtio_net_receive. Both are invoked from a callback. > If we only use pio/mmio > replay, even though event-tap tries to replay net requests, some > get lost because the state has proceeded already. It seems that all you need to do to avoid this is to delay the callback? > This doesn't > happen with block, because the state is still old enough to > replay. Note that using hybrid approach won't cause duplicated > requests on the secondary. An assumption devices make is that a buffer is unused once completion callback was invoked. Does this violate that assumption? --=20 MST