From mboxrd@z Thu Jan 1 00:00:00 1970
From: Felix Fietkau
Date: Thu, 01 Mar 2012 19:13:34 +0100
Subject: [ath9k-devel] "Failed to stop Tx DMA" and "Could not stop RX" with AR9485
In-Reply-To: <20120301175339.28512.qmail@stuge.se>
References: <4F4E9A36.2080809@lacto.se> <20120301144642.13047.qmail@stuge.se> <4F4FB06F.6030201@openwrt.org> <20120301175339.28512.qmail@stuge.se>
Message-ID: <4F4FBC4E.6050109@openwrt.org>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ath9k-devel@lists.ath9k.org

On 2012-03-01 6:53 PM, Peter Stuge wrote:
> Felix Fietkau wrote:
>> just because the error messages are similar does not mean that it's
>> just one bug that has been lurking in the driver for years.
>
> I didn't say that it's a single issue, but it's clearly a single
> class of issues, and I'm surprised that there is no one who can take a
> hard look at DMA (with all required information at hand) and make it
> reliable, even after so long time.
Saying it's a single class of issues does not help in any way, because
with the issues I've fixed so far, the cause has been wildly different.
Sometimes software race conditions, sometimes wrong register settings,
or sometimes actual issues in setting up DMA descriptors.

>> Just because one or more symptoms are the same does not mean that the
>> issue has just one cause, or always leads to the same connectivity
>> issues.
>
> Absolutely right. And from the error messages alone "Failed to stop
> DMA" it's clear that connectivity issues are really not so relevant
> for debugging the issue, they are merely the symptom and thus not
> really useful. The problem must be instrumented at a much lower
> level.
Instrumenting at a lower level does not necessarily help. Even if you
manage to capture all PCI bus traffic, there's still a good chance that
this won't help, because the problem could very well be in a completely
different layer.
My hunch on this particular issue is that it's AR9485 specific, and
there's simply a chipset difference compared to AR93xx that may not have
been accounted for in the driver yet.

>> I remember at least 5 different issues that I fixed myself, which had
>> all led to these exact symptoms, but were otherwise unrelated to each
>> other and had different side effects.
>
> You and Adrian have done and continue to do amazing work. But I don't
> think that it's supposed to be you who solve these issues.
>
>> On most chips with recent kernels, these issues tend to not be fatal
>> anymore, and at least on embedded hardware it's getting much harder
>> to find people that can still produce connection stability issues.
>
> Yes, IIRC the last (known?) memory corruption was resolved fall 2011.
>
> Still, DMA is not exotic, and here are DMA problems again.
That last sentence makes no sense at all.

>> Sometimes I wonder why you keep wasting your energy on these rants,
>> which do *nothing* to fix the underlying issues.
>
> I think it's clear that the community will never have the opportunity
> to work closely with Atheros on the lowest levels, so the only thing
> left for anyone in the community to do is to complain, so that
> perhaps some day Atheros will fix the issue(s).
>
> Ideally of course the complaints become too tiresome, and suddenly
> everyone can work together instead. But I know how foreign this is
> under many circumstances, so I don't expect it to happen anytime
> soon.
How do you define 'lowest level'? If you think that just because the
message says something about DMA, it must be an issue in DMA transfers,
you're completely missing the point.

>> In fact, while I'm writing this I wonder why I bother to respond,
>> but I guess my answer is simply to provide some perspective for
>> people that might stumble upon one of your mindless rants for the
>> first time.
>
> Mindless as they may seem, I complain because I notice something that
> could be better than it is.
> I appreciate that Atheros cares about
> Linux (although by now so do all their competitors) and the Atheros
> hardware keeps looking good as always, but hardware development
> information in the community is not really available.
I found that people that have shown enough interest in improving ath9k
and have proven to be experienced in working on such drivers tend to get
access to documentation.

>> If your intention really is to improve the situation, then please
>> use your time to do something useful instead of writing such garbage.
>
> I tried several times, and was repeatedly shot down or simply ignored
> because I didn't have the latest hardware, or because I was asking
> too low-level questions that were not permitted to be discussed, or
> for some other reason. I can only guess. So I will not waste more of
> my time until I believe that I will get hard support in order to work
> efficiently. Being told to try to reverse engineer the hardware using
> the driver source code or any other source code does not qualify.
Your definition of 'reverse engineering' is quite funny.

>> From what I can see, the recommendation to try disabling powersave
>> is actually useful for narrowing down the source of this issue...
>
> No. Disabling powersave is a workaround, but it doesn't help find the
> problem. Debugging the issue requires looking inside the hardware,
> and by now it's obvious at least to me that if that ever happens then
> it will be in Atheros' lab and nowhere else.
To me it looks somewhat ridiculous that you claim to know better what's
required to debug this issue than people working with the hardware on a
daily basis (Adrian, Mohammed, me).

- Felix