From: Marcelo Tosatti
Subject: Re: [Qemu-devel] [PATCH] kvm: First step to push iothread lock out of inner run loop
Date: Wed, 27 Jun 2012 08:19:03 -0300
Message-ID: <20120627111903.GA3994@amt.cnet>
References: <4FE4F56D.1020201@web.de> <4FE4F7F5.7030400@web.de> <20120626193420.GA19852@amt.cnet>
To: Stefan Hajnoczi
Cc: Jan Kiszka, Liu Ping Fan, kvm, qemu-devel, Alexander Graf, Avi Kivity, Anthony Liguori

On Wed, Jun 27, 2012 at 08:41:49AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jun 27, 2012 at 8:39 AM, Stefan Hajnoczi wrote:
> > On Tue, Jun 26, 2012 at 8:34 PM, Marcelo Tosatti wrote:
> >> On Sat, Jun 23, 2012 at 12:55:49AM +0200, Jan Kiszka wrote:
> >> net.txt
> >> --------
> >>
> >>
> >> iothread flow
> >> =============
> >>
> >> 1) Skip-work-if-device-locked
> >>
> >> select(tap fd ready)
> >> tap_send
> >>     if (trylock(TAPState->NetClientState->dev))
> >>         proceed;
> >>     else
> >>         continue; (data remains in queue)
> >>     tap_read_packet
> >>     qemu_send_packet_async
> >>
> >> DRAWBACK: lock intensive.
> >> DRAWBACK: VCPU threads can starve IOTHREAD (can be fixed with
> >> a scheme such as having trylock() mark a flag saying "iothread
> >> wants lock").
> >
> > One alternative is to say the lock cannot be held across system calls
> > or other long-running operations (similar to interrupt handler
> > spinlocks in the kernel). But QEMU threads are still at the mercy of
> > the scheduler or page faults, so perhaps this is not a good idea
> > because waiting on the lock could tie up the iothread.
> >
> >> 2) Queue-work-to-vcpu-context
> >>
> >> select(tap fd ready)
> >> tap_send
> >>     if (trylock(TAPState->NetClientState->dev)) {
> >>         proceed;
> >>     } else {
> >>         AddToDeviceWorkQueue(work);
> >>     }
> >>     tap_read_packet
> >>     qemu_send_packet_async
> >>
> >> DRAWBACK: lock intensive
> >> DRAWBACK: cache effects
> >
> > This almost suggests we should invert packet reception for NICs. tap
> > fd ready should add a work item and the guest device calls into
> > net/tap.c to pull out any packets from the work function:
> >
> > tap_send
> >     dev_add_work(work);
> >
> > virtio_net_rx_work_fn
> >     while ((req = virtqueue_pop(rx_queue))) {
> >         if (!net_receive_packet(netclient, req->buf)) {
> >             break; /* no more packets available */
> >         }
> >         /* ...mark req complete and push onto virtqueue... */
> >     }
> >     virtqueue_notify(rx_queue);
> >
> > The idea is to avoid copies on the rx path and make the locking simple
> > by always deferring to a work function (which can be invoked
> > immediately if the device isn't locked).
>
> I just realized this approach bypasses net/queue.c. I think it works
> really well for the peer model (no "VLANs"). Basically flow control
> is dictated by the peer (receiver) and we don't need to use NetQueue
> anymore. Whether this works elegantly for slirp and other backends,
> I'm not sure yet.
>
> Stefan

Right. The advantage is that only backends where performance matters
need to be converted (and likewise only the performance-critical net
hw drivers).

Apparently you guys have very different ideas on the higher-level model
here.
It would be good to agree on one structure (including modifications to the one above) to follow, and to share the work in that direction. Having competing implementations in this case is a waste of time.