From mboxrd@z Thu Jan 1 00:00:00 1970
From: Felix Fietkau
Date: Thu, 01 Mar 2012 20:18:57 +0100
Subject: [ath9k-devel] "Failed to stop Tx DMA" and "Could not stop RX" with AR9485
In-Reply-To: <20120301184243.32168.qmail@stuge.se>
References: <4F4E9A36.2080809@lacto.se> <20120301144642.13047.qmail@stuge.se> <4F4FB06F.6030201@openwrt.org> <20120301175339.28512.qmail@stuge.se> <4F4FBC4E.6050109@openwrt.org> <20120301184243.32168.qmail@stuge.se>
Message-ID: <4F4FCBA1.4080601@openwrt.org>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ath9k-devel@lists.ath9k.org

On 2012-03-01 7:42 PM, Peter Stuge wrote:
> Felix Fietkau wrote:
>> Saying it's a single class of issues does not help in any way, because
>> with the issues I've fixed so far, the cause has been wildly different.
>> Sometimes software race conditions, sometimes wrong register settings,
>> or sometimes actual issues in setting up DMA descriptors.
>
> All affecting DMA.

You're missing the point *again*. Many of the things I found were
unrelated to DMA, but messed up the MAC state enough to trigger this
warning. Of course you could say that since the MAC also does DMA, the
problem must thereby somehow be DMA related, but using that to justify
your whining would be kind of stupid. Let's just drop the "it says DMA,
so it must be DMA related" nonsense.

>> Instrumenting at a lower level does not necessarily help. Even if you
>> manage to capture all PCI bus traffic, there's still a good chance that
>> this won't help, because the problem could very well be in a completely
>> different layer.
>
> Yes, it depends on what the error is of course. But making
> assumptions is not the way to do debugging.

Right, that's why I'm trying to bust your assumption that starting with
the lowest layer is a good idea.

>> > Still, DMA is not exotic, and here are DMA problems again.
>>
>> That last sentence makes no sense at all.
>
> My point is that DMA by peripheral devices and the drivers to manage
> it are established technologies in computer busses across the world,
> so it keeps surprising me that drivers in 2012 get it wrong.

Which proves to me yet again that you're completely missing the point of
what I'm saying about the likelihood of this being *NOT* DMA related.

>> > I think it's clear that the community will never have the opportunity
>> > to work closely with Atheros on the lowest levels,
>>
>> How do you define 'lowest level'?
>
> An example would be to monitor state machines inside the device using
> side channel debugging, while in parallel monitoring state machines
> inside the driver. Then comparing them after the fact and seeing
> where one goes wrong. Find out why, and audit the complete driver for
> similar types of errors.
>
> Of course the same issue could (theoretically) also be found purely
> by static analysis, but that's not very efficient.

In my experience, this generates *way* too much data to be of any use to
narrow down the source of the problem.

>> If you think that just because the message says something about
>> DMA, it must be an issue in DMA transfers, you're completely
>> missing the point.
>
> Not saying must be the cause, but it must be eliminated before
> walking up the layers.