From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 18 Apr 2026 11:21:46 +0200
From: Michal Pecio
To: Alan Stern
Cc: Mathias Nyman, Thinh Nguyen, linux-usb@vger.kernel.org,
 oneukum@suse.com, niklas.neronin@linux.intel.com
Subject: Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
Message-ID: <20260418112146.3ae60b58.michal.pecio@gmail.com>
In-Reply-To:
References: <20260404204133.3mcizeeokw3ln5r4@synopsys.com>
 <243af5f2-3925-4960-be7b-8d0c273ae629@rowland.harvard.edu>
 <20260404221533.woepax7jxwefy3fq@synopsys.com>
 <20260404222818.t5y52gnd2gvalvp5@synopsys.com>
 <20260405030954.32jbg3fphi5xdla3@synopsys.com>
 <74ac9ea2-34d1-4999-9048-c03a0f978b5d@rowland.harvard.edu>
 <65682e07-e18c-4674-bfa7-2cc27abb5ede@linux.intel.com>
 <4a484a89-f52a-48c2-af43-c0029878ddaf@rowland.harvard.edu>
 <20260417234846.41a24089.michal.pecio@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Fri, 17 Apr 2026 22:34:58 -0400, Alan Stern wrote:

> Okay, good, we'll require all HCDs to reset control endpoints
> automatically after every error and stall.

Are they not doing it? Say something like lsusb encounters a protocol
stall while another URB from the class driver is pending: will that
other URB time out just because the host endpoint halted on the
earlier one?
> > Currently, by the time the URB is given back, its endpoint is
> > already in a "stopped but runnable" state and its sequence state is
> > zeroed. And it may have already been restarted if there are more
> > pending URBs.
>
> Ah, I was going to ask about that. This will be different from the
> way bulk and interrupt endpoints will behave, but I think it is
> acceptable. Control endpoints aren't used for anything that requires
> high throughput; if a driver wants an error to prevent later
> transfers from starting right away then it can simply avoid
> submitting those later transfers until the earlier ones have
> completed.

Or it could unlink, if the async giveback race were fixed with a new
callback separate from endpoint_reset(), but I don't know if any
demand exists.

Same goes for "chain unlinking": unlink one URB and expect the others
not to execute, so that the unlink completion handler can unlink them
later. Looks odd, but it's guaranteed by the kerneldoc. And currently
broken.

> > > Recovery from a transaction error on a bulk or interrupt endpoint
> > > involves sending a Clear-Halt request to the device. But if the
> > > error was caused by some sort of transient interference that
> > > hasn't ended yet, the Clear-Halt might itself fail with the same
> > > error. To handle this, we should retry the Clear-Halt at
> > > increasing time intervals. At what point should the core give up?
> >
> > Good question, I don't know. One thing I noticed is that Windows
> > does tend to lose patience with completely unresponsive devices and
> > kicks them out, but I don't know the exact criteria.
>
> Two reasonable possibilities are 250 ms (because that's about how
> long an intermediate hub might take to notify the core about a
> disconnect) or 5 seconds (the normal timeout for control transfers).
> Of course, 5 seconds is an awfully long time to wait for a mouse or
> keyboard to recover, so maybe something in between would be best.

What happens after giving up?
If control requests don't work, most likely nothing else works anyway.
A reset may help, or not if it's a bad cable.

Retrying for too long may cause class drivers to time out on their
pending URBs; not sure if that matters. Drivers may have no way to
distinguish this from any other timeout; not sure if that matters
either.

> I will set things up so that an extraneous clear-halt (such as one
> submitted by the driver) will prevent the core from doing its own.
> This leaves the possibility of the core clearing the halt and
> restarting the endpoint and then the driver doing it again, while the
> endpoint is running and the queue is nonempty. Hopefully drivers
> avoid doing this.

Yes, that's just dodgy; what would such a driver even expect to
happen? An URB may be in progress, and then what? On xHCI we would
need to throw out that URB, so it simply isn't supported.

> But if it helps, I could print a warning if usb_clear_halt() is
> called for an endpoint that isn't stopped and has a nonempty queue.

Not sure what the core considers a "stopped" endpoint. FYI, xhci-hcd
logs a dev_err() when a reset is attempted while URBs are running.

> > A related issue is clearing TT buffers. AFAIK this has no retries,
> > it fails silently and leaves the endpoint potentially broken, and
> > it is waited for to complete in case of usb_set_interface().
>
> Is there anything we can do besides calling usb_clear_halt() and
> usb_reset_endpoint()? If not, and data loss is unavoidable, then so
> be it.

If this "clear-halt by usbcore" materializes and survives confrontation
with the real world, it could make sense to look into combining TT
clearing with it. It's a similar thing, but tracked separately now.

One thing that could reduce data loss is never giving up on those
control requests, or resetting/disconnecting the device upon giving up.

It's a general problem that control requests can fail and nobody has
much idea what to do then. Some drivers ignore errors.
If the device returns to operation, it may end up running in an
unknown state. This is apparently rare enough that nobody complains,
though at low and full speed it's relatively easy to produce
artificially with a particularly defective cable.

Regards,
Michal