From: Felix Fietkau <nbd@openwrt.org>
To: Ben Greear <greearb@candelatech.com>
Cc: "Luis R. Rodriguez" <lrodriguez@Atheros.com>,
	"ath9k-devel@lists.ath9k.org" <ath9k-devel@venema.h4ckr.net>,
	"linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>
Subject: Re: [ath9k-devel] Script to crash ath9k with DMA errors.
Date: Sun, 05 Dec 2010 03:41:37 +0100
Message-ID: <4CFAFBE1.3080505@openwrt.org>
In-Reply-To: <4CF8A6DE.4020804@candelatech.com>

On 2010-12-03 9:14 AM, Ben Greear wrote:
> On 12/01/2010 03:22 PM, Ben Greear wrote:
>> On 11/29/2010 04:44 PM, Luis R. Rodriguez wrote:
>>> On Mon, Nov 29, 2010 at 04:28:51PM -0800, Ben Greear wrote:
>>
>>>> BUG: unable to handle kernel NULL pointer dereference at 00000040
>>>> IP: [<f933470a>] ath_tx_start+0x461/0x5ef [ath9k]
>>>> *pde = 00000000
>>>> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
>>>> last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:08:01.0/irq
>>>> Modules linked in: aes_i586 aes_generic fuse nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 uinput arc4 ecb ath9k mac80211 ath9k_common ath9k_hw mi]
>>>>
>>>> Pid: 38, comm: kworker/u:1 Tainted: G W 2.6.37-rc3-wl+ #53 PDSBM/PDSBM
>>>> EIP: 0060:[<f933470a>] EFLAGS: 00010246 CPU: 1
>>>> EIP is at ath_tx_start+0x461/0x5ef [ath9k]
>>>
>>> Please use
>>>
>>> gdb drivers/net/wireless/ath/ath9k/
>>> l *(ath_tx_start+0x461)
>>>
>>> Luis
>>
>> I managed to hit that ath_tx_start crash again, and this time there were no obvious
>> DMA or irq errors immediately preceding it. So, it might be a real bug
>> after all. I'll add some extra checks to see if tid->ac is NULL.
>
> I've made some small progress on this general issue.
>
> First, I added all sorts of debugging to try to figure out ath_tx_start crash.
> As best as I can tell, 'tid' is not NULL, but also is not a valid pointer,
> and probably something close to 0x0. I've added yet more debugging, but haven't
> hit the problem again.
>
> I also tried stopping DMA in a loop up to 5 times if it failed to stop
> previously in the loop. This did not appear to help at all.
>
> I also managed to make both the ath_tx_start crash and the DMA errors very hard to reproduce
> (I dare not say fixed, yet).
>
> It appears that this small patch (and possibly, the fact that I set debugging to 0x600
> instead of 0x400) makes the problems go away. This makes me wonder if a root cause is
> something to do with repeatedly resetting the hardware too fast, as setting channels rapidly
> would tend to do that, and channels are set on association by supplicant, it appears.

Please try this patch while leaving the unnecessary resets in place.
I found that when ath_drain_all_txq finds tx dma not stopped, it will
issue a reset at a point in time where it is both useless (since it's
right before a reset anyway) and dangerous (since the rx dma engine
isn't even disabled yet), so IMHO the right thing to do is to drop
this extra reset.

--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -1194,18 +1194,8 @@ void ath_drain_all_txq(struct ath_softc
 		}
 	}
 
-	if (npend) {
-		int r;
-
-		ath_print(common, ATH_DBG_FATAL,
-			  "Failed to stop TX DMA. Resetting hardware!\n");
-
-		r = ath9k_hw_reset(ah, sc->sc_ah->curchan, ah->caldata, false);
-		if (r)
-			ath_print(common, ATH_DBG_FATAL,
-				  "Unable to reset hardware; reset status %d\n",
-				  r);
-	}
+	if (npend)
+		ath_print(common, ATH_DBG_FATAL, "Failed to stop TX DMA!\n");
 
 	for (i = 0; i < ATH9K_NUM_TX_QUEUES; i++) {
 		if (ATH_TXQ_SETUP(sc, i))