From: Hans de Goede
Date: Mon, 08 Oct 2012 15:51:08 +0200
To: Johannes Stezenbach
Cc: Shawn Starr, qemu-devel@nongnu.org, gerd@kraxel.org
Subject: Re: [Qemu-devel] EHCI USB regression in 1.2.0 - ehci_state_fetchqtd() asserting
Message-ID: <5072DA4C.8050708@redhat.com>
In-Reply-To: <20121008130125.GA3622@sig21.net>

Hi,

On 10/08/2012 03:01 PM, Johannes Stezenbach wrote:
> Hi Hans,
>
> On Mon, Oct 08, 2012 at 01:27:28PM +0200, Hans de Goede wrote:
>> On 10/02/2012 05:26 PM, Shawn Starr wrote:
>>>
>>> Reopening this issue with usb-host stalling now
>>>
>>> ehci warning: guest updated active QH
>>> USBDEVFS_DISCARDURB: Invalid argument
>>> USBDEVFS_DISCARDURB: Invalid argument
>>> husb: leaking iso urbs because of discard failure
>>>
>>>
>>> Now with qemu-XXX-1.2.0-12.fc18.x86_64
>>>
>>> if I have webcam open, it will stall and not resume. This is with usb-host
>>> directly.
>>>
>>> Shall I enable debugging again?
>>
>> Hmm, this likely is caused by too high latencies in your system,
>> which are caused in turn I believe by you running an F-18 kernel which
>> has various debugging options enabled inside the kernel which can
>> cause significant latencies. I've spend 1.5 days tracing this very
>> same issue down in the past. So please first of all make sure that you're
>> running a kernel without debugging options enabled, either the latest
>> F-18 build from koji:
>> http://koji.fedoraproject.org/koji/buildinfo?buildID=358570
>>
>> or an F-17 kernel, almost all the F-18 "rc" kernels have debugging enabled
>> and thus cause significant latency issues.
>>
>> If you can reproduce this with a kernel without the debugging options,
>> then we can investigate this further.
>
> By changing the kernel, don't you just make the issue harder to reproduce?
> I mean Linux isn't real-time so any kernel can show latency spikes
> and it's a show-stopper if iso transfers stall instead of just
> dropping some packets.
>
> There will always be a race between the call to USBDEVFS_DISCARDURB
> and the URB completing. IMHO the handling in usb_host_stop_n_free_iso()
> is buggy. How about dropping the "killed" and "free" variables and
> calling async_complete() and g_free() unconditionally?

This race is well known and is already handled correctly. The real
problem is the "ehci warning: guest updated active QH" message. It most
likely indicates that the guest rang the doorbell (IAAD) in the EHCI
controller and then did not get an IAA interrupt within a certain amount
of time, which triggered the guest's IAAD watchdog (some real EHCI
hardware is broken with respect to delivering the IAA interrupt). That
causes us not to see an unlinked QH as unlinked, which later on triggers
the "guest updated active QH" warning.

This is unavoidable when we get too-large latencies; the EHCI hardware
simply was not designed to be virtualized, quite the opposite, actually.

Regards,

Hans
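A toy model may make the timing argument above more concrete. This is a hypothetical simplification, not QEMU or Linux code: `QHModel`, `emulator_run()` and `unlink_scenario()` are invented names, and the latency and watchdog values are arbitrary. It captures only the ordering: if the emulated controller runs before the guest's IAAD watchdog expires, the doorbell is honoured first; if host latency exceeds the watchdog, the guest rewrites a QH that the emulator still treats as active.

```c
#include <stdbool.h>

/* Hypothetical toy model of the EHCI async-advance handshake.
 * None of these names come from QEMU or Linux; times are in ms. */
typedef struct {
    unsigned guest_token;   /* QH contents as currently stored in guest memory */
    unsigned cached_token;  /* copy the emulator took while the QH was in use */
    bool emulator_active;   /* emulator still treats the QH as active */
} QHModel;

/* One pass of the emulated controller, run whenever the host schedules it. */
static bool emulator_run(QHModel *qh, bool *doorbell)
{
    /* An active QH whose contents changed underneath the emulator is the
     * situation the "guest updated active QH" warning reports. */
    bool warned = qh->emulator_active && qh->guest_token != qh->cached_token;

    if (*doorbell) {
        /* Honour the doorbell (IAAD): drop the QH and, conceptually,
         * raise the IAA interrupt towards the guest. */
        qh->emulator_active = false;
        *doorbell = false;
    }
    return warned;
}

/* Returns true if the toy model ends up in the warning state. */
bool unlink_scenario(unsigned emulator_latency_ms, unsigned guest_watchdog_ms)
{
    QHModel qh = { .guest_token = 1, .cached_token = 1, .emulator_active = true };
    bool doorbell = true;  /* t = 0: guest unlinks the QH and rings the doorbell */

    if (emulator_latency_ms <= guest_watchdog_ms) {
        /* Emulator runs before the watchdog expires: the IAA interrupt
         * arrives, and only then does the guest reuse the QH. */
        bool warned = emulator_run(&qh, &doorbell);
        qh.guest_token = 2;
        return warned;
    }

    /* Watchdog expires first: the guest assumes the QH is unlinked and
     * rewrites it while the emulator still considers it active. */
    qh.guest_token = 2;
    return emulator_run(&qh, &doorbell);
}
```

With these invented numbers, `unlink_scenario(5, 100)` returns false and `unlink_scenario(150, 100)` returns true: the outcome depends only on whether host latency beats the guest's watchdog, which is why a lower-latency host kernel makes the warning go away without any change to the discard path.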