From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752379AbcBJRI1 (ORCPT ); Wed, 10 Feb 2016 12:08:27 -0500 Received: from mga03.intel.com ([134.134.136.65]:33089 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751536AbcBJRIZ (ORCPT ); Wed, 10 Feb 2016 12:08:25 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,426,1449561600"; d="scan'208";a="881367098" Message-ID: <56BB6FC2.8090804@linux.intel.com> Date: Wed, 10 Feb 2016 19:13:38 +0200 From: Mathias Nyman User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.8.0 MIME-Version: 1.0 To: Chris Bainbridge , gregkh@linuxfoundation.org CC: johan@kernel.org, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, stern@rowland.harvard.edu Subject: Re: [PATCH] usb: core: hub: hub_port_init lock controller instead of bus References: <1454939360-7947-1-git-send-email-chris.bainbridge@gmail.com> In-Reply-To: <1454939360-7947-1-git-send-email-chris.bainbridge@gmail.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 08.02.2016 15:49, Chris Bainbridge wrote: > The XHCI controller presents two USB buses to the system - one for USB 2 > and one for USB 3. When only one bus is locked there is a race condition > with two threads in hub_port_init: > > [ 8.984500] Call Trace: > [ 8.985698] [] schedule+0x37/0x90 > [ 8.986918] [] usb_kill_urb+0x8d/0xd0 > [ 8.988130] [] ? wake_up_atomic_t+0x30/0x30 > [ 8.989343] [] usb_start_wait_urb+0xbe/0x150 > [ 8.990561] [] usb_control_msg+0xbc/0xf0 > [ 8.991766] [] hub_port_init+0x51e/0xb70 > [ 8.992964] [] hub_event+0x817/0x1570 > [ 8.994156] [] process_one_work+0x1ff/0x620 > [ 8.995342] [] ? process_one_work+0x15f/0x620 > [ 8.996528] [] worker_thread+0x64/0x4b0 > [ 8.997707] [] ? rescuer_thread+0x390/0x390 > [ 8.998883] [] kthread+0x105/0x120 > [ 9.000056] [] ? kthread_create_on_node+0x200/0x200 > [ 9.001241] [] ret_from_fork+0x3f/0x70 > [ 9.002420] [] ? kthread_create_on_node+0x200/0x200 > > [ 9.870837] Call Trace: > [ 9.875664] [] xhci_setup_device+0x53d/0xa40 > [ 9.876871] [] xhci_address_device+0xe/0x10 > [ 9.878068] [] hub_port_init+0x1bf/0xb70 > [ 9.879262] [] ? trace_hardirqs_on+0xd/0x10 > [ 9.880465] [] hub_event+0x817/0x1570 > [ 9.881668] [] process_one_work+0x1ff/0x620 > [ 9.882869] [] ? process_one_work+0x15f/0x620 > [ 9.884074] [] worker_thread+0x64/0x4b0 > [ 9.885268] [] ? rescuer_thread+0x390/0x390 > [ 9.886457] [] kthread+0x105/0x120 > [ 9.887634] [] ? kthread_create_on_node+0x200/0x200 > [ 9.888817] [] ret_from_fork+0x3f/0x70 > [ 9.889995] [] ? kthread_create_on_node+0x200/0x200 > > Which results from the two call chains: > > hub_port_init > usb_get_device_descriptor > usb_get_descriptor > usb_control_msg > usb_internal_control_msg > usb_start_wait_urb > usb_submit_urb / wait_for_completion_timeout / usb_kill_urb > > hub_port_init > hub_set_address > xhci_address_device > xhci_setup_device > > The logged kernel errors are: > > [ 8.034843] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command > [ 13.183701] usb 3-3: device descriptor read/all, error -110 > > On a test system this failure occurred on 6% of all boots. > > Hypothetically, it should perhaps be possible to set the address of the > hub on one bus while the hub on the other bus already has an outstanding > descriptor read request. But as this is not working reliably, fix it by > locking the controller in hub_port_init to prevent parallel > initialisation of both buses. > Most likely xhci is messed up after two device slots are in default state at the same time. This happens when both threads are in hub_port_init() have called hub_port_reset() The issue becomes visible when the the descriptor read and set address both fail after the port resets. xhci specs 4.5.3 has one tiny note about this: "Note: Software shall not transition more than one Device Slot to the Default State at a time" So to me, and from xhci pov this patch looks like the correct solution, but I might be missing some usb core side details. -Mathias