From: Marcelo Tosatti
Subject: Re: [PATCH] kvm: First step to push iothread lock out of inner run loop
Date: Wed, 27 Jun 2012 08:09:31 -0300
Message-ID: <20120627110930.GA3590@amt.cnet>
To: Stefan Hajnoczi
Cc: Liu Ping Fan, kvm, qemu-devel, Alexander Graf, Jan Kiszka, Avi Kivity, Anthony Liguori

On Wed, Jun 27, 2012 at 08:41:49AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jun 27, 2012 at 8:39 AM, Stefan Hajnoczi wrote:
> > On Tue, Jun 26, 2012 at 8:34 PM, Marcelo Tosatti wrote:
> >> On Sat, Jun 23, 2012 at 12:55:49AM +0200, Jan Kiszka wrote:
> >> net.txt
> >> --------
> >>
> >>
> >> iothread flow
> >> =============
> >>
> >> 1) Skip-work-if-device-locked
> >>
> >> select(tap fd ready)
> >> tap_send
> >>     if (trylock(TAPState->NetClientState->dev))
> >>         proceed;
> >>     else
> >>         continue; (data remains in queue)
> >>     tap_read_packet
> >>     qemu_send_packet_async
> >>
> >> DRAWBACK: lock intensive.
> >> DRAWBACK: VCPU threads can starve the iothread (can be fixed with
> >> a scheme where a failed trylock() sets a flag saying "iothread
> >> wants lock").
> >
> > One alternative is to say the lock cannot be held across system calls
> > or other long-running operations (similar to interrupt handler
> > spinlocks in the kernel). But QEMU threads are still at the mercy of
> > the scheduler or page faults, so perhaps this is not a good idea
> > because waiting on the lock could tie up the iothread.
> >
> >> 2) Queue-work-to-vcpu-context
> >>
> >> select(tap fd ready)
> >> tap_send
> >>     if (trylock(TAPState->NetClientState->dev)) {
> >>         proceed;
> >>     } else {
> >>         AddToDeviceWorkQueue(work);
> >>     }
> >>     tap_read_packet
> >>     qemu_send_packet_async
> >>
> >> DRAWBACK: lock intensive
> >> DRAWBACK: cache effects
> >
> > This almost suggests we should invert packet reception for NICs. tap
> > fd ready should add a work item, and the guest device calls into
> > net/tap.c to pull packets out from the work function:
> >
> > tap_send
> >     dev_add_work(work);
> >
> > virtio_net_rx_work_fn
> >     while ((req = virtqueue_pop(rx_queue))) {
> >         if (!net_receive_packet(netclient, req->buf)) {
> >             break; /* no more packets available */
> >         }
> >         /* ...mark req complete and push onto virtqueue... */
> >     }
> >     virtqueue_notify(rx_queue);
> >
> > The idea is to avoid copies on the rx path and make the locking simple
> > by always deferring to a work function (which can be invoked
> > immediately if the device isn't locked).
>
> I just realized this approach bypasses net/queue.c. I think it works
> really well for the peer model (no "VLANs"). Basically flow control
> is dictated by the peer (receiver), and we don't need to use NetQueue
> anymore. Whether this works elegantly for slirp and other backends,
> I'm not sure yet.

Right.
The advantage is that only the backends where performance matters need
to be converted (and, on the device side, only the NIC models that
matter performance-wise).
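
For illustration, the queue-work-to-vcpu-context dispatch could look
roughly like the sketch below. This is untested, and all of the names
(LockedDevice, DeviceWork, dev_run_or_queue(), dev_unlock()) are made
up for the sketch rather than existing QEMU interfaces:

/* Untested sketch of scheme 2 (queue-work-to-vcpu-context).  Names
 * are invented for illustration; they are not existing QEMU APIs. */

#include <pthread.h>
#include <stdlib.h>

typedef struct DeviceWork {
    void (*fn)(void *opaque);       /* runs with the device lock held */
    void *opaque;
    struct DeviceWork *next;
} DeviceWork;

typedef struct LockedDevice {
    pthread_mutex_t lock;           /* per-device lock */
    pthread_mutex_t queue_lock;     /* protects work_head/work_tail */
    DeviceWork *work_head;
    DeviceWork **work_tail;         /* initialized to &work_head */
} LockedDevice;

/* iothread side: run the work now if the device is free, otherwise
 * queue it for the thread that currently holds the device lock. */
static void dev_run_or_queue(LockedDevice *dev,
                             void (*fn)(void *), void *opaque)
{
    DeviceWork *w;

    if (pthread_mutex_trylock(&dev->lock) == 0) {
        fn(opaque);
        pthread_mutex_unlock(&dev->lock);
        return;
    }
    w = malloc(sizeof(*w));
    w->fn = fn;
    w->opaque = opaque;
    w->next = NULL;
    pthread_mutex_lock(&dev->queue_lock);
    *dev->work_tail = w;
    dev->work_tail = &w->next;
    pthread_mutex_unlock(&dev->queue_lock);
}

/* vcpu side: drain deferred work before giving up the device lock.
 * Note the race left open here: work queued between the last drain
 * and the unlock sits until the next lock/unlock cycle, so a real
 * implementation would also have to kick the iothread (or re-check
 * after unlocking). */
static void dev_unlock(LockedDevice *dev)
{
    for (;;) {
        DeviceWork *w;

        pthread_mutex_lock(&dev->queue_lock);
        w = dev->work_head;
        if (w) {
            dev->work_head = w->next;
            if (!dev->work_head) {
                dev->work_tail = &dev->work_head;
            }
        }
        pthread_mutex_unlock(&dev->queue_lock);
        if (!w) {
            break;
        }
        w->fn(w->opaque);           /* device lock still held */
        free(w);
    }
    pthread_mutex_unlock(&dev->lock);
}

The rx work function you sketched above would then simply be the fn
that tap_send passes to dev_run_or_queue().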