From: Marcelo Tosatti
Subject: Re: [PATCH] kvm: First step to push iothread lock out of inner run loop
Date: Wed, 27 Jun 2012 08:09:31 -0300
Message-ID: <20120627110930.GA3590@amt.cnet>
To: Stefan Hajnoczi
Cc: Liu Ping Fan, kvm, qemu-devel, Alexander Graf, Jan Kiszka, Avi Kivity, Anthony Liguori

On Wed, Jun 27, 2012 at 08:41:49AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jun 27, 2012 at 8:39 AM, Stefan Hajnoczi wrote:
> > On Tue, Jun 26, 2012 at 8:34 PM, Marcelo Tosatti wrote:
> >> On Sat, Jun 23, 2012 at 12:55:49AM +0200, Jan Kiszka wrote:
> >> net.txt
> >> --------
> >>
> >>
> >> iothread flow
> >> =============
> >>
> >> 1) Skip-work-if-device-locked
> >>
> >> select(tap fd ready)
> >> tap_send
> >>     if (trylock(TAPState->NetClientState->dev))
> >>         proceed;
> >>     else
> >>         continue; (data remains in queue)
> >>     tap_read_packet
> >>     qemu_send_packet_async
> >>
> >> DRAWBACK: lock intensive.
> >> DRAWBACK: VCPU threads can starve the iothread (can be fixed with
> >> a scheme where a failed trylock() sets a flag saying "iothread
> >> wants lock").
> >
> > One alternative is to say the lock cannot be held across system calls
> > or other long-running operations (similar to interrupt handler
> > spinlocks in the kernel). But QEMU threads are still at the mercy of
> > the scheduler or page faults, so perhaps this is not a good idea
> > because waiting on the lock could tie up the iothread.
> >
> >> 2) Queue-work-to-vcpu-context
> >>
> >> select(tap fd ready)
> >> tap_send
> >>     if (trylock(TAPState->NetClientState->dev)) {
> >>         proceed;
> >>     } else {
> >>         AddToDeviceWorkQueue(work);
> >>     }
> >>     tap_read_packet
> >>     qemu_send_packet_async
> >>
> >> DRAWBACK: lock intensive
> >> DRAWBACK: cache effects
> >
> > This almost suggests we should invert packet reception for NICs. tap
> > fd ready should add a work item, and the guest device calls into
> > net/tap.c to pull packets out from the work function:
> >
> > tap_send
> >     dev_add_work(work);
> >
> > virtio_net_rx_work_fn
> >     while ((req = virtqueue_pop(rx_queue))) {
> >         if (!net_receive_packet(netclient, req->buf)) {
> >             break; /* no more packets available */
> >         }
> >         /* ...mark req complete and push onto virtqueue... */
> >     }
> >     virtqueue_notify(rx_queue);
> >
> > The idea is to avoid copies on the rx path and make the locking simple
> > by always deferring to a work function (which can be invoked
> > immediately if the device isn't locked).
>
> I just realized this approach bypasses net/queue.c. I think it works
> really well for the peer model (no "VLANs"). Basically flow control
> is dictated by the peer (receiver), and we don't need to use NetQueue
> anymore. Whether this works elegantly for slirp and other backends,
> I'm not sure yet.

Right.
The advantage is that only the backends where performance matters need
to be converted (and, on the device side, only the NIC models that
matter performance-wise).
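
For illustration, the queue-work-to-vcpu-context dispatch could look
roughly like the sketch below. This is untested, and all of the names
(LockedDevice, DeviceWork, dev_run_or_queue(), dev_unlock()) are made
up for the sketch rather than existing QEMU interfaces:

/* Untested sketch of scheme 2 (queue-work-to-vcpu-context).  Names
 * are invented for illustration; they are not existing QEMU APIs. */

#include <pthread.h>
#include <stdlib.h>

typedef struct DeviceWork {
    void (*fn)(void *opaque);       /* runs with the device lock held */
    void *opaque;
    struct DeviceWork *next;
} DeviceWork;

typedef struct LockedDevice {
    pthread_mutex_t lock;           /* per-device lock */
    pthread_mutex_t queue_lock;     /* protects work_head/work_tail */
    DeviceWork *work_head;
    DeviceWork **work_tail;         /* initialized to &work_head */
} LockedDevice;

/* iothread side: run the work now if the device is free, otherwise
 * queue it for the thread that currently holds the device lock. */
static void dev_run_or_queue(LockedDevice *dev,
                             void (*fn)(void *), void *opaque)
{
    DeviceWork *w;

    if (pthread_mutex_trylock(&dev->lock) == 0) {
        fn(opaque);
        pthread_mutex_unlock(&dev->lock);
        return;
    }
    w = malloc(sizeof(*w));
    w->fn = fn;
    w->opaque = opaque;
    w->next = NULL;
    pthread_mutex_lock(&dev->queue_lock);
    *dev->work_tail = w;
    dev->work_tail = &w->next;
    pthread_mutex_unlock(&dev->queue_lock);
}

/* vcpu side: drain deferred work before giving up the device lock.
 * Note the race left open here: work queued between the last drain
 * and the unlock sits until the next lock/unlock cycle, so a real
 * implementation would also have to kick the iothread (or re-check
 * after unlocking). */
static void dev_unlock(LockedDevice *dev)
{
    for (;;) {
        DeviceWork *w;

        pthread_mutex_lock(&dev->queue_lock);
        w = dev->work_head;
        if (w) {
            dev->work_head = w->next;
            if (!dev->work_head) {
                dev->work_tail = &dev->work_head;
            }
        }
        pthread_mutex_unlock(&dev->queue_lock);
        if (!w) {
            break;
        }
        w->fn(w->opaque);           /* device lock still held */
        free(w);
    }
    pthread_mutex_unlock(&dev->lock);
}

The rx work function you sketched above would then simply be the fn
that tap_send passes to dev_run_or_queue().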