From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751496AbdAPLOu (ORCPT ); Mon, 16 Jan 2017 06:14:50 -0500 Received: from mga04.intel.com ([192.55.52.120]:54729 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751074AbdAPLOr (ORCPT ); Mon, 16 Jan 2017 06:14:47 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,238,1477983600"; d="asc'?scan'208";a="31107237" From: Felipe Balbi To: Alan Stern Cc: Peter Zijlstra , "Paul E. McKenney" , Ingo Molnar , USB list , Kernel development list , Will Deacon Subject: Re: Memory barrier needed with wake_up_process()? In-Reply-To: References: Date: Mon, 16 Jan 2017 13:12:46 +0200 Message-ID: <87eg03mf69.fsf@linux.intel.com> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Hi, Alan Stern writes: > On Tue, 20 Sep 2016, Felipe Balbi wrote: > >> And here's trace output (complete, scroll to bottom). It seems to me >> like the thread didn't wake up at all. It didn't even try to >> execute. I'll add some more traces and try to get better information >> about what's going on. >>=20 >>=20 >>=20 >> # tracer: nop >> # >> # entries-in-buffer/entries-written: 2865/2865 #P:4 >> # >> # _-----=3D> irqs-off >> # / _----=3D> need-resched >> # | / _---=3D> hardirq/softirq >> # || / _--=3D> preempt-depth >> # ||| / delay >> # TASK-PID CPU# |||| TIMESTAMP FUNCTION >> # | | | |||| | | > > Skipping to the end... > >> irq/17-dwc3-2522 [002] d... 43.504199: bulk_out_complete: 0, 31= /31 >> file-storage-2521 [001] .... 43.504202: fsg_main_thread: get_nex= t_command -> 0 >> file-storage-2521 [001] .... 43.504288: fsg_main_thread: do_scsi= _command -> 0 >> file-storage-2521 [001] .... 43.504295: fsg_main_thread: finish_= reply -> 0 >> file-storage-2521 [001] .... 43.504298: fsg_main_thread: send_st= atus -> 0 >> irq/17-dwc3-2522 [002] d... 43.504347: bulk_in_complete: 0, 163= 84/16384 >> irq/17-dwc3-2522 [002] d... 43.504351: bulk_in_complete: 0, 163= 84/16384 >> irq/17-dwc3-2522 [002] d... 43.504434: bulk_in_complete: 0, 163= 84/16384 >> irq/17-dwc3-2522 [002] d... 43.504438: bulk_in_complete: 0, 163= 84/16384 >> irq/17-dwc3-2522 [002] d... 43.504535: bulk_in_complete: 0, 163= 84/16384 >> irq/17-dwc3-2522 [002] d... 43.504539: bulk_in_complete: 0, 163= 84/16384 >> irq/17-dwc3-2522 [002] d... 43.504618: bulk_in_complete: 0, 163= 84/16384 >> irq/17-dwc3-2522 [002] d... 43.504703: bulk_in_complete: 0, 163= 84/16384 >> irq/17-dwc3-2522 [002] d... 43.504794: bulk_in_complete: 0, 13/= 13 >> irq/17-dwc3-2522 [002] d... 43.504797: bulk_out_complete: 0, 31= /31 > > Like you say, it appears that the thread didn't get woken up at all. > But this is inconsistent with your earlier results. On Sep. 9, you=20 > posted a message that ended with these lines: > >> irq/17-dwc3-3579 [003] d..1 21167.729666: bulk_out_complete: compl= : bh ffff880111e6aac0 state 1 >> file-storage-3578 [002] .... 21167.729670: fsg_main_thread: next: b= h ffff880111e6aac0 state 1 > > This indicates that in the earlier test, the thread did start running=20 > and get_next_command should have returned. > > The trace you posted after this one doesn't seem to show anything new,=20 > as far as I can tell. > > So I still can't tell what's happening. Maybe the patch below will=20 > help. It concentrates on the critical area. Sorry for the long delay, I finally have more information on this. All this time I was doing something that I never considered to matter: I've been running host and peripheral on the same machine. Now that I have tracepoints on xHCI as well, I could see that these 30 seconds of "nothing" is actuall full of xHCI activity and I can see that for the duration of these 30 seconds preempt depth on the CPU that (eventually) queues a request on dwc3, is always > 1 (sometimes 2, most of the time 1). My conclusion from that is that xHCI (or usbcore ?!?) locks the CPU and g_mass_storage is spinning for over 30 seconds at which point storage.ko (host side class driver) dequeues the request. I'll see if I can capture a fresh trace with both xHCI and dwc3 with this happening, but probably not today (testing stuff for -rc). =2D-=20 balbi --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEElLzh7wn96CXwjh2IzL64meEamQYFAlh8qq4ACgkQzL64meEa mQbcjg//diWcSuGKUjkt8pb6luxUylwNjhbIJB/3PktNqAFuyR7nM7yWNOp0I64r oucO6gRVPs9oOQYLkEDuCCrDPiS/IJ1S73rdnSmDjXNK0ghz9/E36mEfkdz7O4Ux z221lLk0bDYZtV/DA2IO7jrD5n4Fe0qbpmRQBepX9lhVhn/Oox91PB91o3XD/wTx 6xlgXFPy9egRVLvrxB7IWn9LQidP/qEsUIIoDo2D/7E1rlJT3V84Bh2OJBZtJqIO 4roFzO4Tmztt/7qUtTn+S51mHWpobrSSkfUaQbsUst8ZRhkSkYmKbzfXTm6WrZQj zoYPGVCj47DgaJ9eph2wPUOAQdRVSw2E9aHI1L5d/Kxcg45csfM1RikuvarDa0ai 3kjaLke+maEeDhtqUHaRCoW1sX14S6PkceIJYGjOsTQ81beS9clgBeklGisRNE1I +dFRiOKIiEGz5rkL/4q7Q1jp3Z10E480yLMplkso9FlhojkDEHb9v7U1pcaNlUUA p++DqUQG4kxwIKCtMh0LbnMshDHTeogeilig6XUqKJaYaA20v+nLTYPpxcEuehqj KwtIfI+urQD3UW3PY9viZAc27+FBSTGFBKTm85ZYiiyJ74ReRan/kdql584pkZV+ iyhd/bIzHDHHpeAhV7+GxdjIya87FB67ZTdeBuJX3Kg3POzZRIs= =YZeB -----END PGP SIGNATURE----- --=-=-=--