From mboxrd@z Thu Jan 1 00:00:00 1970 Return-path: Received: from mail2.candelatech.com ([208.74.158.173]:44366 "EHLO mail2.candelatech.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752680AbcLBB4G (ORCPT ); Thu, 1 Dec 2016 20:56:06 -0500 To: "linux-wireless@vger.kernel.org" , ath10k From: Ben Greear Subject: ath10k crash due to funky-ness in mac80211 restart logic. Message-ID: <39f7970a-de68-7ffe-e6a0-53f044ad4488@candelatech.com> (sfid-20161202_025609_367828_0B4CF0A1) Date: Thu, 1 Dec 2016 17:56:04 -0800 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Sender: linux-wireless-owner@vger.kernel.org List-ID: I spent the day looking at strange issues related to ath10k firmware crashing in my hacked 3.7 kernel. Here is one of the things I noticed. When firmware crashes, soon after mac80211 tries to restart things, and after a bit of work, it starts adding interfaces again. In the ath10k_add_interface method, we first memzero the arvif, and then a bit later assign arvif->ar Between that memzero and assigning arvif->ar, I was seeing ath10k_wmi_tx_beacon_nowait called, with the half-built arvif. This promptly crashed since it needs arvif->ar to be properly assigned. Maybe we need to remove the SDATA_IN_DRIVER flag very early in the mac80211 code that deals with the driver restarting it's firmware so that the active iterator does not use those devices? I added some debugging code: ath10k_pci 0000:05:00.0: Initializing arvif: ffff8801ce97e320 on vif: ffff8801ce97e1d8 [the print that happens after arvif->ar is assigned is not shown, so code did not make it that far before the tx-beacon-nowait method was called] tx-beacon-nowait: arvif: ffff8801ce97e320 ar: (null) arvif->magic: 0x87560001 ------------[ cut here ]------------ kernel BUG at /home/greearb/git/linux-4.7.dev.y/drivers/net/wireless/ath/ath10k/wmi.c:1781! invalid opcode: 0000 [#1] PREEMPT SMP KASAN Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 bridge carl9170 mac80211_hwsim ath10k_pci ath10k_core ath5k ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 8021q garp mrp stp llc bnep bluetooth fuse macvlan pktgen rpcsec_gss_krb5 nfsv4 nfs fscache snd_hda_codec_hdmi coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic kvm iTCO_wdt irqbypass iTCO_vendor_support joydev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device pcspkr snd_pcm snd_timer shpchp snd i2c_i801 lpc_ich soundcore tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 serio_raw i2c_algo_bit drm_kms_helper ata_generic e1000e pata_acpi drm ptp pps_core i2c_core fjes video ipv6 [last unloaded: nf_conntrack] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.10+ #15 Hardware name: To be filled by O.E.M. To be filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013 task: ffff8801d4f20000 ti: ffff8801d4f28000 task.ti: ffff8801d4f28000 RIP: 0010:[] [] ath10k_wmi_tx_beacons_iter+0x28b/0x290 [ath10k_core] RSP: 0018:ffff8801d6447a98 EFLAGS: 00010293 RAX: 0000000000000018 RBX: ffff8801ce97e1d8 RCX: 0000000000000000 RDX: 0000000000000018 RSI: 0000000000000003 RDI: ffffed003ac88f49 RBP: ffff8801d6447af0 R08: 0000000000000003 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: ffff8801ce97e320 R14: ffff8801ce97e378 R15: ffff8801ce97ca40 FS: 0000000000000000(0000) GS:ffff8801d6440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007eff191ef1ab CR3: 000000000260a000 CR4: 00000000001406e0 Stack: 1ffff1003ac88f59 0000000041b58ab3 ffffffffa0f4d52a ffff8801d4f20000 0000000000000246 0000000000000002 ffff8801ce97e1d8 ffff8801bd5d39b8 0000000000000002 0000000000000001 ffff8801ce97ca40 ffff8801d6447b48 Call Trace: [] __iterate_interfaces+0xfc/0x1d0 [mac80211] [] ? ath10k_wmi_cmd_send_nowait+0x260/0x260 [ath10k_core] [] ? ath10k_wmi_cmd_send_nowait+0x260/0x260 [ath10k_core] [] ieee80211_iterate_active_interfaces_atomic+0x67/0x100 [mac80211] [] ? ieee80211_handle_reconfig_failure+0x140/0x140 [mac80211] [] ? ath10k_tpc_config_disp_tables+0x620/0x620 [ath10k_core] [] ath10k_wmi_op_ep_tx_credits+0x2b/0x50 [ath10k_core] [] ath10k_htc_rx_completion_handler+0x422/0x5c0 [ath10k_core] [] ath10k_pci_process_rx_cb+0x37e/0x430 [ath10k_pci] [] ? ath10k_htc_build_tx_ctrl_skb+0xc0/0xc0 [ath10k_core] [] ? ath10k_pci_rx_post_pipe+0x550/0x550 [ath10k_pci] [] ? debug_lockdep_rcu_enabled+0x35/0x40 [] ? mark_held_locks+0x23/0xc0 [] ? __local_bh_enable_ip+0x6a/0xd0 [] ? trace_hardirqs_on_caller+0x18b/0x290 [] ? trace_hardirqs_on+0xd/0x10 [] ? __local_bh_enable_ip+0x6a/0xd0 [] ? _raw_spin_unlock_bh+0x30/0x40 [] ? ath10k_ce_per_engine_service+0xee/0x100 [ath10k_pci] [] ath10k_pci_htt_htc_rx_cb+0x29/0x30 [ath10k_pci] [] ath10k_ce_per_engine_service+0xa6/0x100 [ath10k_pci] [] ath10k_ce_per_engine_service_any+0xd6/0xf0 [ath10k_pci] [] ? ath10k_pci_enable_legacy_irq+0xe0/0xe0 [ath10k_pci] [] ath10k_pci_tasklet+0x5f/0xb0 [ath10k_pci] [] tasklet_action+0x245/0x2b0 [] __do_softirq+0x181/0x595 [] irq_exit+0xbc/0xc0 [] do_IRQ+0x7c/0x150 [] common_interrupt+0x8c/0x8c [] ? trace_hardirqs_on_caller+0x18b/0x290 [] ? cpuidle_enter_state+0x1ae/0x4b0 [] ? cpuidle_enter_state+0x1a7/0x4b0 [] cpuidle_enter+0x12/0x20 [] call_cpuidle+0x4e/0x90 [] cpu_startup_entry+0x3f7/0x540 [] ? default_idle_call+0x50/0x50 [] ? clockevents_config_and_register+0x5f/0x70 [] ? setup_APIC_timer+0xfa/0x110 [] start_secondary+0x253/0x2b0 [] ? set_cpu_sibling_map+0x920/0x920 Code: 4d 49 e0 8b b3 48 01 00 00 48 c7 c7 a0 ee f3 a0 e8 d9 c2 3f e0 49 81 fd 3f 1f 00 00 76 0f 49 81 fc 3f 1f 00 00 0f 87 c0 fd ff ff <0f> 0b 0f 0b 90 55 48 89 e5 41 57 41 56 48 8d 85 58 ff ff ff 41 RIP [] ath10k_wmi_tx_beacons_iter+0x28b/0x290 [ath10k_core] RSP ---[ end trace 6588464714e5163a ]--- Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com