From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:44087)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jan.kiszka@siemens.com>) id 1TZNUk-0004t9-2z
	for qemu-devel@nongnu.org; Fri, 16 Nov 2012 10:06:17 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jan.kiszka@siemens.com>) id 1TZNUh-00017e-0R
	for qemu-devel@nongnu.org; Fri, 16 Nov 2012 10:06:14 -0500
Received: from thoth.sbs.de ([192.35.17.2]:16900)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jan.kiszka@siemens.com>) id 1TZNUg-000178-Mz
	for qemu-devel@nongnu.org; Fri, 16 Nov 2012 10:06:10 -0500
Message-ID: <50A6565F.1070708@siemens.com>
Date: Fri, 16 Nov 2012 16:06:07 +0100
From: Jan Kiszka <jan.kiszka@siemens.com>
MIME-Version: 1.0
References: <50A50806.5020309@redhat.com> <50A50CD7.4070104@siemens.com>
	<50A63F9F.7000309@redhat.com>
In-Reply-To: <50A63F9F.7000309@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] "usb: uhci: Look up queue by address, not token"
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Hans de Goede <hdegoede@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>

On 2012-11-16 14:29, Hans de Goede wrote:
> Hi,
> 
> On 11/15/2012 04:40 PM, Jan Kiszka wrote:
>> Hi Hans,
>>
>> On 2012-11-15 16:19, Hans de Goede wrote:
>>> Hi Jan,
>>>
>>> I just saw your $subject patch in Gerd's usb-next tree, and I've a question
>>> about it. The token should be enough to uniquely identify a device + ep,
>>> and unless a guest uses multiple qhs for a singe ep, that _should_ be enough.
>>
>> But what disallows that the guest issues multiple requests (QH + series
>> of TDs) for a single endpoint? I'm not finding any trace in the spec
>> that disallows this.
> 
> It may not be explicitly forbidden in the spec, but the whole text of the
> spec implies a 1 on 1 relationship between qhs and eps, and this is how
> all major guests do it.

Was afraid of this...

> 
>> And my special guest is stumbling over that limitation in QEMU.
> 
> I guess you're not at liberty to disclose what guest this is ?
> 

Hairy legacy code with additions I'm not yet able to share.

> <snip>
> 
>>> The reason why I want to know, is that identifying the queue by qh has a
>>> disadvantage, to be precise I believe the following can then happen:
>>>
>>> 1) The guest queues up multiple requests for a queue
>>> 2) We execute one, go async
>>> 3) The guest changes it mind an unlinks the qh
>>> 4) The guest will think the queue is cancelled after frindex has
>>> changed by 1, but we keep it around for 64 frames (which sucks,
>>> and I want to improve on this, but we need to for various reasons)
>>> 5) The guests submits some new requests to the queue, using a
>>> new qh (which possibly has a different address).
>>> 6) We see the new qh, and execute a request from there
>>> 7) The 1st request on the old qh completes, and we execute the next
>>> 8) Now things are a mess as we're executing requests from the old
>>> (cancelled from the guest pov) and new queue intermixed...
>>>
>>> Using the token to identify the queue fixes this, cause we will
>>> find the same queue for the old and new qh, and uhci_queue_verify()
>>> will then fail because of the qh addr change, and then we cancel
>>> the old queue. IOW then we do the right thing.
>>
>> I was afraid of triggering some conflict but, as pointed out above, the
>> current code also does something wrong. It associates two different
>> active QH with the same queue and then triggers a failure for the first
>> QH as it wrongly thinks the guest has unlinked it.
>>
>>>
>>> So I'm wondering if there is another, potentially better fix for
>>> what you are seeing?
>>
>> /me too. I'm not fully understanding your scenario above yet,
>> specifically when and how the guest can correctly remove an active QH,
> 
> A guest can unlink an active qh when it wants to cancel the requests
> in there (ie a driver level timeout kicks in). This is a quite normal
> thing to do (unlike using multiple qhs for a single ep).
> 
> For ehci we can reliable detect this happening (except for interrupt
> endpoints, sigh) thanks to the "doorbell" mechanism. But for uhci it
> is harder, and things are further complicated by the fact that
> Linux' full speed recovery code for bulk endpoints temporarily unlinks
> and then relinks a pending requests to make some changes to the qh
> while requests are in flight ... (bad Linux, bad!).
> 
> So we rely on a combination of measures to detect cancels in the
> qemu uhci code, of which the qh validation is one, and this gets
> broken by your patch.

How should unlink be defined here? A qh is no longer present in any
chain of any frame? For how long?

> 
>> so I cannot provide good suggestions at this point.
> 
> I see 2 solutions at this point:
> 1) You modify your special guest if you can
> 2) We add a property to the uhci emulation to search for queues
> by address rather then by token, which would of course default
> to false.

Well, this particular thing worked on real hw, so I thought it would be
good to adjust QEMU. But as I'm struggling with my guest anyway, it may
turn out that we will no longer need this in the end - at least in this
case. Still, I'm with a bad feeling regarding the current heuristics for
identifying invalidated queues.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux