From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:44087) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TZNUk-0004t9-2z for qemu-devel@nongnu.org; Fri, 16 Nov 2012 10:06:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TZNUh-00017e-0R for qemu-devel@nongnu.org; Fri, 16 Nov 2012 10:06:14 -0500 Received: from thoth.sbs.de ([192.35.17.2]:16900) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TZNUg-000178-Mz for qemu-devel@nongnu.org; Fri, 16 Nov 2012 10:06:10 -0500 Message-ID: <50A6565F.1070708@siemens.com> Date: Fri, 16 Nov 2012 16:06:07 +0100 From: Jan Kiszka MIME-Version: 1.0 References: <50A50806.5020309@redhat.com> <50A50CD7.4070104@siemens.com> <50A63F9F.7000309@redhat.com> In-Reply-To: <50A63F9F.7000309@redhat.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] "usb: uhci: Look up queue by address, not token" List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Hans de Goede Cc: Gerd Hoffmann , "qemu-devel@nongnu.org" On 2012-11-16 14:29, Hans de Goede wrote: > Hi, > > On 11/15/2012 04:40 PM, Jan Kiszka wrote: >> Hi Hans, >> >> On 2012-11-15 16:19, Hans de Goede wrote: >>> Hi Jan, >>> >>> I just saw your $subject patch in Gerd's usb-next tree, and I've a question >>> about it. The token should be enough to uniquely identify a device + ep, >>> and unless a guest uses multiple qhs for a singe ep, that _should_ be enough. >> >> But what disallows that the guest issues multiple requests (QH + series >> of TDs) for a single endpoint? I'm not finding any trace in the spec >> that disallows this. > > It may not be explicitly forbidden in the spec, but the whole text of the > spec implies a 1 on 1 relationship between qhs and eps, and this is how > all major guests do it. Was afraid of this... > >> And my special guest is stumbling over that limitation in QEMU. > > I guess you're not at liberty to disclose what guest this is ? > Hairy legacy code with additions I'm not yet able to share. > > >>> The reason why I want to know, is that identifying the queue by qh has a >>> disadvantage, to be precise I believe the following can then happen: >>> >>> 1) The guest queues up multiple requests for a queue >>> 2) We execute one, go async >>> 3) The guest changes it mind an unlinks the qh >>> 4) The guest will think the queue is cancelled after frindex has >>> changed by 1, but we keep it around for 64 frames (which sucks, >>> and I want to improve on this, but we need to for various reasons) >>> 5) The guests submits some new requests to the queue, using a >>> new qh (which possibly has a different address). >>> 6) We see the new qh, and execute a request from there >>> 7) The 1st request on the old qh completes, and we execute the next >>> 8) Now things are a mess as we're executing requests from the old >>> (cancelled from the guest pov) and new queue intermixed... >>> >>> Using the token to identify the queue fixes this, cause we will >>> find the same queue for the old and new qh, and uhci_queue_verify() >>> will then fail because of the qh addr change, and then we cancel >>> the old queue. IOW then we do the right thing. >> >> I was afraid of triggering some conflict but, as pointed out above, the >> current code also does something wrong. It associates two different >> active QH with the same queue and then triggers a failure for the first >> QH as it wrongly thinks the guest has unlinked it. >> >>> >>> So I'm wondering if there is another, potentially better fix for >>> what you are seeing? >> >> /me too. I'm not fully understanding your scenario above yet, >> specifically when and how the guest can correctly remove an active QH, > > A guest can unlink an active qh when it wants to cancel the requests > in there (ie a driver level timeout kicks in). This is a quite normal > thing to do (unlike using multiple qhs for a single ep). > > For ehci we can reliable detect this happening (except for interrupt > endpoints, sigh) thanks to the "doorbell" mechanism. But for uhci it > is harder, and things are further complicated by the fact that > Linux' full speed recovery code for bulk endpoints temporarily unlinks > and then relinks a pending requests to make some changes to the qh > while requests are in flight ... (bad Linux, bad!). > > So we rely on a combination of measures to detect cancels in the > qemu uhci code, of which the qh validation is one, and this gets > broken by your patch. How should unlink be defined here? A qh is no longer present in any chain of any frame? For how long? > >> so I cannot provide good suggestions at this point. > > I see 2 solutions at this point: > 1) You modify your special guest if you can > 2) We add a property to the uhci emulation to search for queues > by address rather then by token, which would of course default > to false. Well, this particular thing worked on real hw, so I thought it would be good to adjust QEMU. But as I'm struggling with my guest anyway, it may turn out that we will no longer need this in the end - at least in this case. Still, I'm with a bad feeling regarding the current heuristics for identifying invalidated queues. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux