From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 Jan 2018 10:09:19 +0000
From: Lorenzo Pieralisi
To: Marek Behun
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Thomas Petazzoni, marc.zyngier@arm.com
Subject: Re: pci-aardvark ath9k arm64 issues
Message-ID: <20180116100919.GA12032@red-moon>
References: <20180113002234.0ae646d6@nic.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20180113002234.0ae646d6@nic.cz>

[+cc Marc, for his information]

On Sat, Jan 13, 2018 at 12:22:34AM +0100, Marek Behun wrote:
> Hello,
>
> we are having a CPU stall issue with ath9k driver on pci-aardvark
> (Marvell Armada 3720 (arm64)).
>
> The ath9k driver loads correctly and the interface connects and for
> some time it works correctly, but then CPU stalls and kernel dumps
> self-detected stall on CPU.
>
> I don't know if this is issue with aardvark or whole arm64 (someone had
> a similar problem, see
> https://www.spinics.net/lists/linux-wireless/msg157038.html ), but
> ath10k doesn't have this problem.
>
> I am attaching the rcu_sched stall detection backtrace (although I
> don't know if contains needed information).
>
> Can you please point me how to debug/solve this issue?

I do not think this is arm64 related - IMO it is host bridge related
and I do not have HW to test this (and Marvell Armada 3720 datasheets).

I think pci-aardvark is another host bridge that needs IRQ handling
updates according to:

https://marc.info/?l=linux-pci&m=151517416712010&w=2

The sooner we convert the host bridges to use the right API the better,
at least we will be able to fix these bugs more quickly.

I hope Thomas can help you have a look into this.

Thanks,
Lorenzo

> Thank you.
>
> Marek Behun

> [ 265.283187] INFO: rcu_sched self-detected stall on CPU
> [ 265.288346] 0-...: (2100 ticks this GP) idle=c1e/2/0 softirq=3408/3408 fqs=0
> [ 265.295896] (t=2100 jiffies g=846 c=845 q=8)
> [ 265.300492] rcu_sched kthread starved for 2100 jiffies! g846 c845 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
> [ 265.310932] rcu_sched I 0 8 2 0x00000000
> [ 265.316333] Call trace:
> [ 265.319134] [] __switch_to+0x84/0x98
> [ 265.324168] [] __schedule+0x1f0/0x4c0
> [ 265.329565] [] schedule+0x2c/0x88
> [ 265.334694] [] schedule_timeout+0x134/0x288
> [ 265.340454] [] rcu_gp_kthread+0x434/0x728
> [ 265.346303] [] kthread+0xfc/0x128
> [ 265.351434] [] ret_from_fork+0x10/0x18
> [ 265.356927] Task dump for CPU 0:
> [ 265.360340] swapper/0 R running task 0 0 0 0x00000002
> [ 265.367450] Call trace:
> [ 265.369885] [] dump_backtrace+0x0/0x380
> [ 265.375820] [] show_stack+0x14/0x20
> [ 265.380954] [] sched_show_task+0x13c/0x160
> [ 265.386892] [] dump_cpu_task+0x40/0x50
> [ 265.392024] [] rcu_dump_cpu_stacks+0x98/0xd8
> [ 265.398500] [] rcu_check_callbacks+0x61c/0x7e0
> [ 265.404628] [] update_process_times+0x2c/0x58
> [ 265.410481] [] tick_sched_handle.isra.5+0x34/0x50
> [ 265.417316] [] tick_sched_timer+0x40/0x90
> [ 265.422723] [] __hrtimer_run_queues+0xe8/0x160
> [ 265.428934] [] hrtimer_interrupt+0xa0/0x220
> [ 265.435142] [] arch_timer_handler_phys+0x30/0x40
> [ 265.441173] [] handle_percpu_devid_irq+0x78/0x128
> [ 265.447922] [] generic_handle_irq+0x24/0x38
> [ 265.453863] [] __handle_domain_irq+0x5c/0xb8
> [ 265.460074] [] gic_handle_irq+0xfc/0x1c4
> [ 265.465660] Exception stack(0xffffff8008003d80 to 0xffffff8008003ec0)
> [ 265.472324] 3d80: 0000000000000000 ffffff80089b7c00 00000000ffffea38 000000401776b000
> [ 265.480066] 3da0: 000000000000001f ffffff8008a10000 000000002974debf 0000000000000000
> [ 265.488342] 3dc0: 0000000000000040 ffffff8008943e80 0000000000000880 0000000000000000
> [ 265.496529] 3de0: 0000000000000001 0000000000000000 0000000000000000 0000000000000010
> [ 265.504718] 3e00: ffffff80081a5708 0000007f9431e328 000000000000001c ffffff8008945000
> [ 265.512728] 3e20: fffffffffffffff8 ffffff80088584b0 0000000000000001 ffffff80089b7c00
> [ 265.520649] 3e40: 000000000000003d ffffff8008857000 ffffff8008857000 0000000000000040
> [ 265.529015] 3e60: ffffff8008950800 ffffff8008003ec0 ffffff80080a73f8 ffffff8008003ec0
> [ 265.537111] 3e80: ffffff8008080f74 0000000040000145 0000000000000001 ffffffc01e9e8600
> [ 265.545120] 3ea0: 0000008000000000 0000000000000001 ffffff8008003ec0 ffffff8008080f74
> [ 265.553310] [] el1_irq+0xb0/0x140
> [ 265.558264] [] __do_softirq+0xac/0x208
> [ 265.563665] [] irq_exit+0xc8/0x100
> [ 265.568704] [] __handle_domain_irq+0x60/0xb8
> [ 265.574911] [] gic_handle_irq+0xfc/0x1c4
> [ 265.580759] Exception stack(0xffffff8008943dd0 to 0xffffff8008943f10)
> [ 265.587153] 3dc0: 0000000000000000 0000000000000000
> [ 265.595431] 3de0: 0000000000000001 0000000000000000 ffffff8008943f10 000000401776b000
> [ 265.603083] 3e00: 0000000000000001 ffffff80089484e0 0000000000000000 ffffff8008943e80
> [ 265.611270] 3e20: 0000000000000880 0000000000000000 0000000000000001 0000000000000000
> [ 265.619815] 3e40: 0000000000000000 0000000000000010 ffffff80081a5708 0000007f9431e328
> [ 265.627651] 3e60: 000000000000001c ffffff8008857000 ffffff8008949930 ffffff8008949000
> [ 265.636017] 3e80: ffffff800885ea88 0000000000000000 0000000000000000 ffffff8008950800
> [ 265.644030] 3ea0: 0000000000000000 000000001ff26364 0000000000820018 ffffff8008943f10
> [ 265.651953] 3ec0: ffffff8008084a64 ffffff8008943f10 ffffff8008084a68 0000000060000145
> [ 265.660054] 3ee0: ffffffc01ffffb00 ffffff800884f028 ffffffffffffffff 0000000000000000
> [ 265.668420] 3f00: ffffff8008943f10 ffffff8008084a68
> [ 265.673020] [] el1_irq+0xb0/0x140
> [ 265.678238] [] arch_cpu_idle+0x10/0x18
> [ 265.683640] [] do_idle+0x10c/0x1a0
> [ 265.688678] [] cpu_startup_entry+0x24/0x28
> [ 265.694887] [] rest_init+0xac/0xb8
> [ 265.700107] [] start_kernel+0x390/0x3a4