From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=34694 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Pa4vT-0002lA-Aw
	for qemu-devel@nongnu.org; Tue, 04 Jan 2011 06:19:40 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1Pa4vS-0002y2-0X
	for qemu-devel@nongnu.org; Tue, 04 Jan 2011 06:19:39 -0500
Received: from mx1.redhat.com ([209.132.183.28]:33067)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1Pa4vR-0002xr-LD
	for qemu-devel@nongnu.org; Tue, 04 Jan 2011 06:19:37 -0500
Date: Tue, 4 Jan 2011 13:19:08 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>
Message-ID: <20110104111908.GA5694@redhat.com>
References: <1290665220-26478-1-git-send-email-tamura.yoshiaki@lab.ntt.co.jp>
	<1290665220-26478-10-git-send-email-tamura.yoshiaki@lab.ntt.co.jp>
	<AANLkTimGqh7v5gHrax_Yjt8vJwJPda6rrFhs+rY4UDik@mail.gmail.com>
	<AANLkTin=0msj=vAyWskTJBTWOJC9xYH6fOzVT3m=WUJH@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <AANLkTin=0msj=vAyWskTJBTWOJC9xYH6fOzVT3m=WUJH@mail.gmail.com>
Content-Transfer-Encoding: quoted-printable
Subject: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com, kvm@vger.kernel.org, ohmura.kei@lab.ntt.co.jp, Stefan Hajnoczi <stefanha@gmail.com>, mtosatti@redhat.com, qemu-devel@nongnu.org, vatsa@linux.vnet.ibm.com, avi@redhat.com, psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com

On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
> 2010/11/29 Stefan Hajnoczi <stefanha@gmail.com>:
> > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
> > <tamura.yoshiaki@lab.ntt.co.jp> wrote:
> >> event-tap controls when to start FT transaction, and provides proxy
> >> functions to called from net/block devices.  While FT transaction, it
> >> queues up net/block requests, and flush them when the transaction gets
> >> completed.
> >>
> >> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
> >> Signed-off-by: OHMURA Kei <ohmura.kei@lab.ntt.co.jp>
> >> ---
> >>  Makefile.target |    1 +
> >>  block.h         |    9 +
> >>  event-tap.c     |  794 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  event-tap.h     |   34 +++
> >>  net.h           |    4 +
> >>  net/queue.c     |    1 +
> >>  6 files changed, 843 insertions(+), 0 deletions(-)
> >>  create mode 100644 event-tap.c
> >>  create mode 100644 event-tap.h
> >
> > event_tap_state is checked at the beginning of several functions.  If
> > there is an unexpected state the function silently returns.  Should
> > these checks really be assert() so there is an abort and backtrace if
> > the program ever reaches this state?
> >
> >> +typedef struct EventTapBlkReq {
> >> +    char *device_name;
> >> +    int num_reqs;
> >> +    int num_cbs;
> >> +    bool is_multiwrite;
> >
> > Is multiwrite logging necessary?  If event tap is called from within
> > the block layer then multiwrite is turned into one or more
> > bdrv_aio_writev() calls.
> >
> >> +static void event_tap_replay(void *opaque, int running, int reason)
> >> +{
> >> +    EventTapLog *log, *next;
> >> +
> >> +    if (!running) {
> >> +        return;
> >> +    }
> >> +
> >> +    if (event_tap_state != EVENT_TAP_LOAD) {
> >> +        return;
> >> +    }
> >> +
> >> +    event_tap_state = EVENT_TAP_REPLAY;
> >> +
> >> +    QTAILQ_FOREACH(log, &event_list, node) {
> >> +        EventTapBlkReq *blk_req;
> >> +
> >> +        /* event resume */
> >> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
> >> +        case EVENT_TAP_NET:
> >> +            event_tap_net_flush(&log->net_req);
> >> +            break;
> >> +        case EVENT_TAP_BLK:
> >> +            blk_req = &log->blk_req;
> >> +            if ((log->mode & EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
> >> +                switch (log->ioport.index) {
> >> +                case 0:
> >> +                    cpu_outb(log->ioport.address, log->ioport.data);
> >> +                    break;
> >> +                case 1:
> >> +                    cpu_outw(log->ioport.address, log->ioport.data);
> >> +                    break;
> >> +                case 2:
> >> +                    cpu_outl(log->ioport.address, log->ioport.data);
> >> +                    break;
> >> +                }
> >> +            } else {
> >> +                /* EVENT_TAP_MMIO */
> >> +                cpu_physical_memory_rw(log->mmio.address,
> >> +                                       log->mmio.buf,
> >> +                                       log->mmio.len, 1);
> >> +            }
> >> +            break;
> >
> > Why are net tx packets replayed at the net level but blk requests are
> > replayed at the pio/mmio level?
> >
> > I expected everything to replay either as pio/mmio or as net/block.
>
> Stefan,
>
> After doing some heavy load tests, I realized that we have to
> take a hybrid approach to replay for now.  This is because when a
> device moves to the next state (e.g. virtio decreases inuse) is
> different between net and block.  For example, virtio-net
> decreases inuse upon returning from the net layer, but virtio-blk
> does that inside of the callback.

For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
Both are invoked from a callback.

> If we only use pio/mmio
> replay, even though event-tap tries to replay net requests, some
> get lost because the state has proceeded already.

It seems that all you need to do to avoid this is delay the
callback?
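
To make the suggestion concrete, here is a minimal standalone sketch
of holding a completion callback back until the transaction has been
committed. It is not QEMU or event-tap code: DeferQueue,
defer_complete(), defer_commit() and FakeDevice are names made up for
illustration, and the fixed-size array is only there to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not QEMU code: a completion that can be held back. */
typedef void CompletionFunc(void *opaque);

typedef struct DeferredCompletion {
    CompletionFunc *cb;
    void *opaque;
} DeferredCompletion;

#define MAX_DEFERRED 16

typedef struct DeferQueue {
    DeferredCompletion pending[MAX_DEFERRED];
    size_t count;
    int in_transaction;    /* non-zero while the FT transaction is open */
} DeferQueue;

/* Called at the point where the callback would normally be invoked. */
static void defer_complete(DeferQueue *q, CompletionFunc *cb, void *opaque)
{
    if (!q->in_transaction) {
        cb(opaque);        /* no open transaction: complete immediately */
        return;
    }
    assert(q->count < MAX_DEFERRED);
    q->pending[q->count].cb = cb;
    q->pending[q->count].opaque = opaque;
    q->count++;
}

/* Called once the transaction has been committed: run held callbacks
 * in the order they were queued. */
static void defer_commit(DeferQueue *q)
{
    size_t i;

    q->in_transaction = 0;
    for (i = 0; i < q->count; i++) {
        q->pending[i].cb(q->pending[i].opaque);
    }
    q->count = 0;
}

/* Stand-in for a device whose state (inuse) only moves forward when
 * its completion callback runs. */
typedef struct FakeDevice {
    int inuse;
} FakeDevice;

static void fake_tx_complete(void *opaque)
{
    FakeDevice *dev = opaque;

    dev->inuse--;
}
```

With in_transaction set, defer_complete() leaves inuse untouched until
defer_commit() runs, so in this model the device state would still be
old enough to replay.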

> This doesn't
> happen with block, because the state is still old enough to
> replay.  Note that using hybrid approach won't cause duplicated
> requests on the secondary.

Devices assume that a buffer is no longer in use once the
completion callback has been invoked. Does this violate that assumption?
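
For illustration only, one way to stay within that assumption is for
the log to keep a private copy of the payload, so that a later replay
never reads the device's buffer after completion. The sketch below is
standalone and hypothetical (LoggedReq, logged_req_init() and
device_on_complete() are invented names). I have not checked whether
event-tap does something like this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical log record, not QEMU's: owns a copy of the payload. */
typedef struct LoggedReq {
    unsigned char *copy;
    size_t len;
} LoggedReq;

/* Copy the payload at logging time. Returns 0 on success, -1 on
 * allocation failure. */
static int logged_req_init(LoggedReq *req, const void *buf, size_t len)
{
    assert(buf != NULL);
    req->copy = malloc(len ? len : 1);
    if (!req->copy) {
        return -1;
    }
    memcpy(req->copy, buf, len);
    req->len = len;
    return 0;
}

static void logged_req_free(LoggedReq *req)
{
    free(req->copy);
    req->copy = NULL;
    req->len = 0;
}

/* Stand-in for a device that recycles its buffer as soon as the
 * completion callback has been invoked. */
static void device_on_complete(unsigned char *buf, size_t len)
{
    memset(buf, 0, len);
}
```

If the log kept only a pointer to the device's buffer, a replay after
device_on_complete() would see the recycled contents. With the copy,
the logged data stays intact.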

-- 
MST