From mboxrd@z Thu Jan 1 00:00:00 1970
From: Felix Fietkau
Date: Wed, 12 Oct 2011 20:36:16 +0200
Subject: [ath9k-devel] need help debugging/pinning down obscure freezes/hangs using AR9287
In-Reply-To: <20111012172309.19779.qmail@stuge.se>
References: <8E4915FC-90CA-46C6-B15D-BFFC3037BA22@gmail.com>
 <20111012135355.22032.qmail@stuge.se>
 <20111012145612.31165.qmail@stuge.se>
 <20111012153552.4580.qmail@stuge.se>
 <4E95C57B.1040201@openwrt.org>
 <20111012172309.19779.qmail@stuge.se>
Message-ID: <4E95DE20.1040203@openwrt.org>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ath9k-devel@lists.ath9k.org

On 2011-10-12 7:23 PM, Peter Stuge wrote:
>> Having detailed information about the inner workings of the
>> hardware is useless if you don't have a basic understanding of how
>> the driver works.
>
> The two obviously go very tightly together. The more information
> about both you have, the easier it is to work on either side. The
> more information that flows about either, the easier it is to start
> working on any side. Since no information flows, it takes lots of
> effort to start working anywhere. Negative spiral.
>
>
>> For many of the bugs that were fixed, you don't actually need any
>> kind of detailed hardware info to fix them.
>
> ath9k is by now working really well for very many users, so any bugs
> that do remain are particularly likely corner cases which do involve
> hardware, firmware, driver, with two of the above, or all three,
> being in some condition not yet been encountered by developers.
I disagree, many such issues are simple software bugs, often race
conditions. This is especially true on older hardware.

>> What ath9k needs most is people that are willing to spend enough
>> time reading the driver and playing with the code until they
>> understand the software side well enough
>
> AKA reverse engineering, which is IMO not the best open source can
> do.
> Yes the code is there, no loading the entire driver into your
> head is not useful for fixing bugs. And IMO fixing bugs is always
> useful, even if they are corner cases.
Actually, loading the entire driver code into your head sometimes is the
*only* way to find and fix weird corner cases. There are so many
interdependencies between different parts of the driver that you have to
be able to get a feel for the side effects of various parts to be able
to even begin to systematically understand and address various bugs.

From my own personal experience, a large portion of the bugs that I
fixed over the last couple of months were found by simple code review
that did not require specific knowledge about any hardware internals.
The level of hardware understanding that was necessary for these bugs
was basic enough that it's something that you automatically get when
you've read and understood a fair share of the code.
This has nothing to do with reverse engineering. It's a driver
programmer's common sense!

In my opinion the lack of availability of hardware information has too
often simply been kicked in as an excuse for not even beginning to try
to understand the driver code.

I've been doing work on wifi drivers for years now, and even when I
started doing this - without access to *any* forms of documentation -
I've been able to find and fix lots of bugs. This was madwifi back then,
and the code quality and stability was so much worse than what we have
today.

Considering that I've been able to easily get started with such work
under circumstances that were a *lot* worse than what we have today, I'd
expect a bit more from people showing up here seriously wanting to work
on this stuff. Wifi driver work isn't exactly trivial, so people should
expect to have to invest a bit more initial effort to get involved. If
they can't handle that, then so be it.

>> And even with competent users knowing Linux well, it can be quite
>> hard to walk them through the right steps to track down bugs,
>> because often it's very hard even for people with access to all
>> hardware docs to even identify a likely source or general area of
>> the bug.
>
> For a competent developer with knowledge of how hardware, firmware
> and driver interacts this is not very hard. But it is time consuming!
>
> What I sometimes do to help with debugging is the skill test that I
> mentioned. If debug info is missing, then ask for that info in a way
> that will show the skill level of the other person. This quickly
> provides an estimate of how much effort has to put into this
> debugging session, ie. what kind of instructions can be given for
> getting closer to the problem.
With many bugs, I often don't even know where to begin to ask for proper
debug information because the problem (or sometimes the problem
description) is too vague to narrow it down to a particular subsystem.
Once I've reproduced a bug, things become much easier, since I can
'randomly' try stuff at a rapid rate until I find an essential clue that
(together with my mental model of how the driver works) shows me where
to look. It's hard to get to that point remotely, even with cluey users.

By the way, you keep mentioning the word 'firmware' - with ath9k there
is no firmware involved.

- Felix