* Can someone please try...
From: Michael Buesch @ 2007-01-16 17:06 UTC
To: bcm43xx-dev; +Cc: netdev

...the bcm43xx driver in my tree with a 4318 chip?

The code there works excellently with my 4306 now, but I can't get it to
work with my 4318. It's strange: it doesn't seem to work at all. I can't
TX or RX a single packet, and I'm not sure why.

To get it, please avoid cloning the whole tree from my repository, so as
not to waste bandwidth. If you have a linville-wireless-dev tree, you can
do the following:

    cd wireless-dev
    git branch mb
    git checkout mb
    git pull http://bu3sch.de/git/wireless-dev.git master

I think this should also work if you have a linus-2.6 tree checked out
somewhere.

-- 
Greetings Michael.
* Re: Can someone please try... 2007-01-16 17:06 Can someone please try Michael Buesch @ 2007-01-16 18:29 ` Pavel Roskin 2007-01-16 19:23 ` Michael Buesch 2007-01-16 19:00 ` Andreas Schwab 1 sibling, 1 reply; 19+ messages in thread From: Pavel Roskin @ 2007-01-16 18:29 UTC (permalink / raw) To: Michael Buesch; +Cc: bcm43xx-dev, netdev On Tue, 2007-01-16 at 18:06 +0100, Michael Buesch wrote: > ...the bcm43xx driver in my tree with a 4318 chip? Things are progressing for me a bit because I observed an association to an AP with no security. I still had to use wpa_supplicant. Unfortunately, there is a bigger issue with the new code. When I interrupt wpa_supplicant, the kernel reports several oopses and then panics, so I have to reboot. I had to use serial console just to capture the messages. I assume the first message is most relevant. Here it is: kernel BUG at /home/proski/src/linux-2.6/mm/slab.c:597! invalid opcode: 0000 [1] CPU 0 Modules linked in: bcm43xx_d80211 ssb Pid: 2984, comm: wpa_supplicant Not tainted 2.6.20-rc3 #2 RIP: 0010:[<ffffffff8020aa5a>] [<ffffffff8020aa5a>] kfree+0x5c/0x97 RSP: 0018:ffff81000727fd08 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff81001e53a3c0 RCX: 0000000000000001 RDX: ffff810001689c40 RSI: 000000000727c010 RDI: ffff81001de38000 RBP: ffff81001de38000 R08: ffffffff8052c2e0 R09: ffff81001eac80c0 R10: ffff8100066153c0 R11: ffff8100066157c0 R12: 0000000000000286 R13: ffff810006dfb988 R14: ffff81001e23c000 R15: 0000000000000000 FS: 00002b75242c6cd0(0000) GS:ffffffff8056c000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000003a8ab12e60 CR3: 000000000727a000 CR4: 00000000000006e0 Process wpa_supplicant (pid: 2984, threadinfo ffff81000727e000, task ffff810011e3d0c0) Stack: ffff81001e53a3c0 0000000000000013 ffff81001dddc000 ffffffff802237be ffff81001eac80c0 ffffffff8801b2f6 ffffffff8056c980 ffffffff8028cb21 ffff81000707c7b8 ffff81001e8921c0 ffff81001e892000 ffffffff8801b63a Call Trace: 
[<ffffffff802237be>] kfree_skbmem+0x9/0x73 [<ffffffff8801b2f6>] :bcm43xx_d80211:bcm43xx_destroy_dmaring+0x1d1/0x205 [<ffffffff8028cb21>] free_irq+0xd8/0x120 [<ffffffff8801b63a>] :bcm43xx_d80211:bcm43xx_dma_free+0x89/0xad [<ffffffff88008c7e>] :bcm43xx_d80211:bcm43xx_wireless_core_exit+0x29/0x76 [<ffffffff88008dcc>] :bcm43xx_d80211:bcm43xx_remove_interface+0x101/0x135 [<ffffffff804422d3>] ieee80211_stop+0xdd/0xf7 [<ffffffff80407cac>] dev_close+0x52/0x71 [<ffffffff8040750f>] dev_change_flags+0x5a/0x119 [<ffffffff8042e57d>] devinet_ioctl+0x235/0x59b [<ffffffff804004a6>] sock_ioctl+0x1c8/0x1e5 [<ffffffff80238f2a>] do_ioctl+0x1b/0x50 [<ffffffff8022a82a>] vfs_ioctl+0x215/0x227 [<ffffffff80242166>] sys_ioctl+0x3c/0x5c [<ffffffff80250ede>] system_call+0x7e/0x83 Code: 0f 0b eb fe 48 8b 7a 28 48 8b 1f 8b 13 3b 53 04 73 0c 89 d0 RIP [<ffffffff8020aa5a>] kfree+0x5c/0x97 RSP <ffff81000727fd08> That's still the same Dell Latitude D520 with Core 2 Duo and Fedora Core 6, internal PCIe card 14e4:4312. I'm using your current tree ending with "bcm43xx-d80211: Various cleanups all over the code" SMP is disabled this time, just to make things simpler. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Can someone please try...
From: Michael Buesch @ 2007-01-16 19:23 UTC
To: Pavel Roskin; +Cc: bcm43xx-dev, netdev

On Tuesday 16 January 2007 19:29, Pavel Roskin wrote:
> On Tue, 2007-01-16 at 18:06 +0100, Michael Buesch wrote:
> > ...the bcm43xx driver in my tree with a 4318 chip?
>
> Things are progressing for me a bit because I observed an association to
> an AP with no security. I still had to use wpa_supplicant.
>
> Unfortunately, there is a bigger issue with the new code. When I
> interrupt wpa_supplicant, the kernel reports several oopses and then
> panics, so I have to reboot. I had to use serial console just to
> capture the messages.
>
> I assume the first message is most relevant. Here it is:

A patch for that is already upstream.
It's surprising that it doesn't happen for me, though.
Neither on PPC, nor on i386.

Patch was
[PATCH] bcm43xx-d80211: Fix DMA TX skb doublefree

-- 
Greetings Michael.
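The patch itself is not quoted in this thread, so the following is only a user-space sketch of the general bug class the subject line names: a buffer that is freed once on the TX completion path and then freed again when the ring is torn down. All names here (`struct slot`, `slot_release`, `ring_destroy`, `demo_teardown`) are hypothetical and are not taken from the bcm43xx driver. The sketch assumes the usual remedy of clearing the pointer after the free, so a second pass finds nothing left to release.

```c
#include <assert.h>
#include <stdlib.h>

#define RING_SLOTS 4

/* Hypothetical stand-in for a DMA ring slot that owns one packet buffer. */
struct slot {
    void *buf;
};

static int free_calls;  /* how many buffers were actually released */

/* Release a slot's buffer at most once: the pointer is cleared after
 * the free, so a later teardown pass sees NULL and skips the slot. */
static void slot_release(struct slot *s)
{
    assert(s != NULL);
    if (s->buf == NULL)
        return;
    free(s->buf);
    s->buf = NULL;
    free_calls++;
}

/* Teardown walks every slot, including ones already completed. */
static void ring_destroy(struct slot *ring, int n)
{
    for (int i = 0; i < n; i++)
        slot_release(&ring[i]);
}

/* Fill a ring, complete one slot early, then tear the ring down.
 * Returns the number of frees that were performed. */
int demo_teardown(void)
{
    struct slot ring[RING_SLOTS];

    free_calls = 0;
    for (int i = 0; i < RING_SLOTS; i++)
        ring[i].buf = malloc(64);

    slot_release(&ring[1]);          /* completion path frees one buffer */
    ring_destroy(ring, RING_SLOTS);  /* teardown must not free it again */
    return free_calls;
}
```

If every allocation succeeds, `demo_teardown()` reports one free per slot even though slot 1 is visited twice. Without the `s->buf = NULL` line, the second visit would hand an already-freed pointer to `free()`, which is the user-space analogue of the `kfree` BUG in the oops above.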
* Re: Can someone please try...
From: Pavel Roskin @ 2007-01-16 21:50 UTC
To: Michael Buesch; +Cc: bcm43xx-dev, netdev

On Tue, 2007-01-16 at 20:23 +0100, Michael Buesch wrote:

> A patch for that is already upstream.

I don't see it. It's not in your tree yet.

> It's surprising that it doesn't happen for me, though.
> Neither on PPC, nor on i386.

It did happen for me on i386, as well as on x86_64. The dump was for
x86_64, as evidenced by the register size. Maybe you have fewer
debugging options enabled?

> Patch was
> [PATCH] bcm43xx-d80211: Fix DMA TX skb doublefree

Even with this hint, I cannot spot the bug immediately, so it would be
great if you sync the public repository soon.

-- 
Regards,
Pavel Roskin
* Re: Can someone please try...
From: Michael Buesch @ 2007-01-16 22:07 UTC
To: Pavel Roskin; +Cc: netdev, bcm43xx-dev

On Tuesday 16 January 2007 22:50, Pavel Roskin wrote:
> On Tue, 2007-01-16 at 20:23 +0100, Michael Buesch wrote:
>
> > A patch for that is already upstream.
>
> I don't see it. It's not in your tree yet.

It is on its way upstream to linville.

> > It's surprising that it doesn't happen for me, though.
> > Neither on PPC, nor on i386.
>
> It did happen for me on i386, as well as on x86_64. The dump was for
> x86_64, as evidenced by the register size. Maybe you have fewer
> debugging options enabled?

All.

> > Patch was
> > [PATCH] bcm43xx-d80211: Fix DMA TX skb doublefree
>
> Even with this hint, I cannot spot the bug immediately, so it would be
> great if you sync the public repository soon.

Linville has to put the patch into his tree first, so I can pull it.
You can find the patch easily by searching bcm43xx-dev or netdev archives.

-- 
Greetings Michael.
* Re: Can someone please try...
From: Pavel Roskin @ 2007-01-16 23:51 UTC
To: Michael Buesch; +Cc: bcm43xx-dev, netdev

On Tue, 2007-01-16 at 23:07 +0100, Michael Buesch wrote:
> On Tuesday 16 January 2007 22:50, Pavel Roskin wrote:
> > On Tue, 2007-01-16 at 20:23 +0100, Michael Buesch wrote:
> >
> > > A patch for that is already upstream.
> >
> > I don't see it. It's not in your tree yet.
>
> It is on its way upstream to linville.

Well, it's pretty cruel to ask others to test code with known fatal
bugs, IMHO. Even if git were extremely poor at handling a patch applied
in two branches. In fact, git is not so bad at all at handling such
situations.

> > > It's surprising that it doesn't happen for me, though.
> > > Neither on PPC, nor on i386.
> >
> > It did happen for me on i386, as well as on x86_64. The dump was for
> > x86_64, as evidenced by the register size. Maybe you have fewer
> > debugging options enabled?
>
> All.

That's commendable. I tried the 32-bit kernel without SMP and with
almost all debugging. One thing I noticed is that scanning ignores the
pure 802.11b AP running HostAP that I was going to use for testing.
Other APs are detected. The association didn't work, probably for that
reason.
Scanning may trigger many assertion failures: bcm43xx_d80211: ASSERTION FAILED ((lna & ~0x7) == 0) at: /home/proski/src/linux-2.6/drivers/net/ wireless/d80211/bcm43xx/bcm43xx_lo.c:235:lo_measure_feedthrough() Finally, interrupting wpa_supplicant hits another bug: BUG: unable to handle kernel paging request at virtual address c3e2cbf8 printing eip: e03835e1 *pde = 0000f067 *pte = 03e2c000 Oops: 0002 [#1] DEBUG_PAGEALLOC Modules linked in: bcm43xx_d80211 ssb CPU: 0 EIP: 0060:[<e03835e1>] Not tainted VLI EFLAGS: 00210282 (2.6.20-rc3 #3) EIP is at bcm43xx_wireless_core_init+0x5a/0x98e [bcm43xx_d80211] eax: 00000000 ebx: c3dab740 ecx: 000000e1 edx: c3493808 esi: c34937f8 edi: c3e2cbf8 ebp: c3e60e38 esp: c3e60db8 ds: 007b es: 007b ss: 0068 Process wpa_supplicant (pid: 2942, ti=c3e60000 task=c3f92590 task.ti=c3e60000) Stack: c0339db6 c0339db6 00000000 c3f92590 c0339db6 00200246 c3e60df0 c3f30000 c3dab740 c0339de4 c3493808 c3e60e0c c3e60e0c 00200246 c3e60e2c c0339dc0 00000000 00000002 c0339de4 c3e60e50 c3f92590 22222222 22222222 22222222 Call Trace: [<c010335d>] show_trace_log_lvl+0x1a/0x2f [<c010340d>] show_stack_log_lvl+0x9b/0xa3 [<c01035a6>] show_registers+0x191/0x267 [<c010378f>] die+0x113/0x212 [<c011010a>] do_page_fault+0x43a/0x50c [<c033b47c>] error_code+0x74/0x7c [<e03850bc>] bcm43xx_add_interface+0x4f/0xb7 [bcm43xx_d80211] [<c032022f>] ieee80211_open+0x19d/0x27e [<c02dbb77>] dev_open+0x2d/0x64 [<c02da71f>] dev_change_flags+0x51/0xf1 [<c030b67a>] devinet_ioctl+0x235/0x53a [<c030bc38>] inet_ioctl+0x73/0x91 [<c02d1db8>] sock_ioctl+0x1ac/0x1c9 [<c015dd64>] do_ioctl+0x1c/0x51 [<c015df94>] vfs_ioctl+0x1fb/0x212 [<c015dfdc>] sys_ioctl+0x31/0x49 [<c0102cba>] sysenter_past_esp+0x5f/0x99 ======================= Code: 00 80 66 0d ef 8d be 9c 01 00 00 f3 ab 8b 7a 5c 80 62 49 c5 c7 42 4c ff ff ff ff 85 ff c7 42 50 00 00 00 00 74 13 b9 e1 00 00 00 <f3> ab 8b 42 5c 66 c7 80 76 03 00 00 ff ff 8b 4d a8 89 f 0 c7 41 EIP: [<e03835e1>] bcm43xx_wireless_core_init+0x5a/0x98e 
[bcm43xx_d80211] SS:ESP 0068:c3e60db8 Then I used MadWifi on the AP side, and "iwpriv scan" picked it. Moreover, wpa_supplicant reported connection! I interrupted wpa_supplicant and started it again, and then the kernel oopsed again. Strangely, the driver is not even mentioned in the backtrace. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004 printing eip: c02d8863 *pde = 00000000 Oops: 0002 [#1] DEBUG_PAGEALLOC Modules linked in: bcm43xx_d80211 ssb CPU: 0 EIP: 0060:[<c02d8863>] Not tainted VLI EFLAGS: 00210246 (2.6.20-rc3 #3) EIP is at datagram_poll+0xba/0xc5 eax: 00000000 ebx: cc252bf8 ecx: 00000049 edx: 00000000 esi: 00000002 edi: 00000004 ebp: cb940b70 esp: cb940b68 ds: 007b es: 007b ss: 0068 Process wpa_supplicant (pid: 4344, ti=cb940000 task=c2be0590 task.ti=cb940000) Stack: c0353220 c9fedf2c cb940b7c c02d1643 00000000 cb940e30 c015ebae c033b3bd cb940e54 cb940e50 cb940f9c cb940f50 cb940be0 00000000 00000000 cb940e5c cb940e60 cb940e64 cb940e50 cb940e54 cb940e58 00000070 00000000 00000000 Call Trace: [<c010335d>] show_trace_log_lvl+0x1a/0x2f [<c010340d>] show_stack_log_lvl+0x9b/0xa3 [<c01035a6>] show_registers+0x191/0x267 [<c010378f>] die+0x113/0x212 [<c011010a>] do_page_fault+0x43a/0x50c [<c033b47c>] error_code+0x74/0x7c [<c02d1643>] sock_poll+0x12/0x15 [<c015ebae>] do_select+0x2b4/0x4cc [<c015f076>] core_sys_select+0x2b0/0x2d5 [<c015f631>] sys_select+0x99/0x170 [<c0102cba>] sysenter_past_esp+0x5f/0x99 ======================= Code: ca 3c 02 74 2b 8b 83 7c 01 00 00 ba 02 00 00 00 89 d6 99 f7 fe 39 83 cc 00 00 00 7d 08 81 c9 04 03 00 00 eb 0b 8b 83 44 02 00 00 <0f> ba 68 04 00 5b 89 c8 5e 5d c3 55 89 e5 57 56 89 c6 5 3 83 ec EIP: [<c02d8863>] datagram_poll+0xba/0xc5 SS:ESP 0068:cb940b68 -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Can someone please try...
From: Michael Buesch @ 2007-01-17 9:52 UTC
To: Pavel Roskin; +Cc: netdev, bcm43xx-dev

On Wednesday 17 January 2007 00:51, Pavel Roskin wrote:
> On Tue, 2007-01-16 at 23:07 +0100, Michael Buesch wrote:
> > On Tuesday 16 January 2007 22:50, Pavel Roskin wrote:
> > > On Tue, 2007-01-16 at 20:23 +0100, Michael Buesch wrote:
> > >
> > > > A patch for that is already upstream.
> > >
> > > I don't see it. It's not in your tree yet.
> >
> > It is on its way upstream to linville.
>
> Well, it's pretty cruel to ask others to test code with known fatal
> bugs, IMHO.

I forgot that the bug was there, because it doesn't trigger
on my machines. I already explained that...

> Even if git were extremely poor at handling a patch applied
> in two branches. In fact, git is not so bad at all at handling such
> situations.

I have to wait until linville pulls it. Full stop.

> > > > It's surprising that it doesn't happen for me, though.
> > > > Neither on PPC, nor on i386.
> > >
> > > It did happen for me on i386, as well as on x86_64. The dump was for
> > > x86_64, as evidenced by the register size. Maybe you have fewer
> > > debugging options enabled?
> >
> > All.
>
> That's commendable. I tried the 32-bit kernel without SMP and with
> almost all debugging. One thing I noticed is that scanning ignores the
> pure 802.11b AP running HostAP that I was going to use for testing.
> Other APs are detected. The association didn't work, probably for that
> reason.

Probably some d80211 bug. Dunno.
> Scanning may trigger many assertion failures: > > bcm43xx_d80211: ASSERTION FAILED ((lna & ~0x7) == 0) > at: /home/proski/src/linux-2.6/drivers/net/ > wireless/d80211/bcm43xx/bcm43xx_lo.c:235:lo_measure_feedthrough() It's not triggered by scanning, it's known and it's nonfatal. > Finally, interrupting wpa_supplicant hits another bug: > > BUG: unable to handle kernel paging request at virtual address c3e2cbf8 > printing eip: > e03835e1 > *pde = 0000f067 > *pte = 03e2c000 > Oops: 0002 [#1] > DEBUG_PAGEALLOC > Modules linked in: bcm43xx_d80211 ssb > CPU: 0 > EIP: 0060:[<e03835e1>] Not tainted VLI > EFLAGS: 00210282 (2.6.20-rc3 #3) > EIP is at bcm43xx_wireless_core_init+0x5a/0x98e [bcm43xx_d80211] > eax: 00000000 ebx: c3dab740 ecx: 000000e1 edx: c3493808 > esi: c34937f8 edi: c3e2cbf8 ebp: c3e60e38 esp: c3e60db8 > ds: 007b es: 007b ss: 0068 > Process wpa_supplicant (pid: 2942, ti=c3e60000 task=c3f92590 task.ti=c3e60000) > Stack: c0339db6 c0339db6 00000000 c3f92590 c0339db6 00200246 c3e60df0 c3f30000 > c3dab740 c0339de4 c3493808 c3e60e0c c3e60e0c 00200246 c3e60e2c c0339dc0 > 00000000 00000002 c0339de4 c3e60e50 c3f92590 22222222 22222222 22222222 > Call Trace: > [<c010335d>] show_trace_log_lvl+0x1a/0x2f > [<c010340d>] show_stack_log_lvl+0x9b/0xa3 > [<c01035a6>] show_registers+0x191/0x267 > [<c010378f>] die+0x113/0x212 > [<c011010a>] do_page_fault+0x43a/0x50c > [<c033b47c>] error_code+0x74/0x7c > [<e03850bc>] bcm43xx_add_interface+0x4f/0xb7 [bcm43xx_d80211] > [<c032022f>] ieee80211_open+0x19d/0x27e > [<c02dbb77>] dev_open+0x2d/0x64 > [<c02da71f>] dev_change_flags+0x51/0xf1 > [<c030b67a>] devinet_ioctl+0x235/0x53a > [<c030bc38>] inet_ioctl+0x73/0x91 > [<c02d1db8>] sock_ioctl+0x1ac/0x1c9 > [<c015dd64>] do_ioctl+0x1c/0x51 > [<c015df94>] vfs_ioctl+0x1fb/0x212 > [<c015dfdc>] sys_ioctl+0x31/0x49 > [<c0102cba>] sysenter_past_esp+0x5f/0x99 > ======================= > Code: 00 80 66 0d ef 8d be 9c 01 00 00 f3 ab 8b 7a 5c 80 62 49 c5 c7 42 4c ff ff ff ff 85 ff c7 > 42 50 00 00 00 
00 74 13 b9 e1 00 00 00 <f3> ab 8b 42 5c 66 c7 80 76 03 00 00 ff ff 8b 4d a8 89 f > 0 c7 41 > EIP: [<e03835e1>] bcm43xx_wireless_core_init+0x5a/0x98e [bcm43xx_d80211] SS:ESP 0068:c3e60db8 Doesn't happen for me. I have no idea what's happening. Care to debug it? But it's weird that _killing_ the supplicant calls add_interface. I'd expect it to call remove_interface. > Then I used MadWifi on the AP side, and "iwpriv scan" picked it. > Moreover, wpa_supplicant reported connection! I interrupted > wpa_supplicant and started it again, and then the kernel oopsed again. > Strangely, the driver is not even mentioned in the backtrace. > > BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004 > printing eip: > c02d8863 > *pde = 00000000 > Oops: 0002 [#1] > DEBUG_PAGEALLOC > Modules linked in: bcm43xx_d80211 ssb > CPU: 0 > EIP: 0060:[<c02d8863>] Not tainted VLI > EFLAGS: 00210246 (2.6.20-rc3 #3) > EIP is at datagram_poll+0xba/0xc5 > eax: 00000000 ebx: cc252bf8 ecx: 00000049 edx: 00000000 > esi: 00000002 edi: 00000004 ebp: cb940b70 esp: cb940b68 > ds: 007b es: 007b ss: 0068 > Process wpa_supplicant (pid: 4344, ti=cb940000 task=c2be0590 task.ti=cb940000) > Stack: c0353220 c9fedf2c cb940b7c c02d1643 00000000 cb940e30 c015ebae c033b3bd > cb940e54 cb940e50 cb940f9c cb940f50 cb940be0 00000000 00000000 cb940e5c > cb940e60 cb940e64 cb940e50 cb940e54 cb940e58 00000070 00000000 00000000 > Call Trace: > [<c010335d>] show_trace_log_lvl+0x1a/0x2f > [<c010340d>] show_stack_log_lvl+0x9b/0xa3 > [<c01035a6>] show_registers+0x191/0x267 > [<c010378f>] die+0x113/0x212 > [<c011010a>] do_page_fault+0x43a/0x50c > [<c033b47c>] error_code+0x74/0x7c > [<c02d1643>] sock_poll+0x12/0x15 > [<c015ebae>] do_select+0x2b4/0x4cc > [<c015f076>] core_sys_select+0x2b0/0x2d5 > [<c015f631>] sys_select+0x99/0x170 > [<c0102cba>] sysenter_past_esp+0x5f/0x99 > ======================= > Code: ca 3c 02 74 2b 8b 83 7c 01 00 00 ba 02 00 00 00 89 d6 99 f7 fe 39 83 cc 00 00 00 7d 08 81 > c9 04 03 
00 00 eb 0b 8b 83 44 02 00 00 <0f> ba 68 04 00 5b 89 c8 5e 5d c3 55 89 e5 57 56 89 c6 5 > 3 83 ec > EIP: [<c02d8863>] datagram_poll+0xba/0xc5 SS:ESP 0068:cb940b68 I have absolutely no idea. Did not happen a single time for me. In fact. It's all pretty stable on my machines. -- Greetings Michael. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Can someone please try...
From: Pavel Roskin @ 2007-01-18 9:41 UTC
To: Michael Buesch; +Cc: bcm43xx-dev, netdev

On Wed, 2007-01-17 at 10:52 +0100, Michael Buesch wrote:

> Doesn't happen for me. I have no idea what's happening.
> Care to debug it?
> But it's weird that _killing_ the supplicant calls add_interface.
> I'd expect it to call remove_interface.

I'm sorry, I was actually running wpa_supplicant again at the time of
the crash.

What I have now is very different behavior in two configurations on the
same machine.

The i386 kernel without SMP, with most debug options enabled and a
serial console: wpa_supplicant times out. If I restart it, the kernel
oopses, every time in a different place.

The x86_64 kernel with SMP and with very few debug options:
wpa_supplicant connects. Killing and restarting wpa_supplicant doesn't
cause any problems. In fact, wpa_supplicant reconnects quickly. I can
even ping the station from the AP, but the packet loss is horrible. It
appears that most of the loss is on the receiving side.

I'll try to debug the problem when I have time. At least I'll try to
find out if it's specific to the architecture or to another kernel
option.

Anyway, it's exciting that I could send the first packets today!

-- 
Regards,
Pavel Roskin
* Re: Can someone please try...
From: Pavel Roskin @ 2007-01-19 7:54 UTC
To: Michael Buesch; +Cc: bcm43xx-dev, netdev

Hello, Michael!

I did more testing, and the results are as follows. It looks like the
oopses and panics on i386 were triggered by 4k stacks. x86_64 doesn't
have this option.

Now that I have enabled the other debug options on both platforms, but
not 4k stacks, I'm seeing exactly the same problem on each platform.
When run initially, wpa_supplicant connects with no problems (except
very poor reception of the data packets, but that's another story). If
interrupted and restarted, wpa_supplicant reconnects, but I'm getting
messages like this (i386):

Slab corruption: start=cfdaece0, len=1024
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02d70c2>](skb_release_data+0x7b/0x7f)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=cfdae8d4, len=1024
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c026ea5a>](device_create+0x2c/0x98)
000: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: ad 4e ad de ff ff ff ff ff ff ff ff 10 3a 6d c0
Next obj: start=cfdaf0ec, len=1024
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c0165730>](expand_files+0x95/0x2c2)
000: 78 55 39 c7 78 55 39 c7 78 55 39 c7 88 da 52 df
010: d8 18 3b c7 00 00 00 00 00 00 00 00 00 00 00 00

and this (x86_64):

Slab corruption: start=ffff81000ec8a198, len=1024
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<ffffffff8042e916>](skb_release_data+0x94/0x99) 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Next obj: start=ffff81000ec8a5b0, len=1024 Redzone: 0x170fc2a5/0x170fc2a5. Last user: [<ffffffff803be6e9>](device_create+0x5f/0x110) 000: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 I can restart wpa_supplicant again, and it would show similar messages. The first "Last user" is inevitably skb_release_data. I have no idea how to deal with it. I think I need a stack trace at the time when skb_release_data is called. This is a stack trace at the time when slab corruption is detected. It's actually incorrect closer to the top, perhaps from gcc optimizations for static functions. 
Slab corruption: start=ffff8100066f81d8, len=1024 Call Trace: [<ffffffff80218636>] vsnprintf+0x338/0x5a8 [<ffffffff8020713d>] check_poison_obj+0x69/0x1ae [<ffffffff803c3ff2>] _request_firmware+0x8f/0x326 [<ffffffff803c3ff2>] _request_firmware+0x8f/0x326 [<ffffffff8020c09a>] cache_alloc_debugcheck_after+0x32/0x1a2 [<ffffffff803c3ff2>] _request_firmware+0x8f/0x326 [<ffffffff802aaae2>] kmem_cache_zalloc+0xaf/0xd8 [<ffffffff803c3ff2>] _request_firmware+0x8f/0x326 [<ffffffff880111ea>] :bcm43xx_d80211:bcm43xx_phy_init_tssi2dbm_table +0xf0/0x2ca [<ffffffff803c432a>] request_firmware+0xe/0x10 [<ffffffff88007d75>] :bcm43xx_d80211:bcm43xx_chip_init+0x96/0xaba [<ffffffff8020a03d>] kmem_cache_alloc+0xaf/0xbe [<ffffffff88009c97>] :bcm43xx_d80211:bcm43xx_wireless_core_init +0x4de/0xa3d [<ffffffff8800b4e8>] :bcm43xx_d80211:bcm43xx_add_interface+0x64/0xde [<ffffffff8046eaa0>] ieee80211_open+0x1c7/0x2cc [<ffffffff804330da>] dev_open+0x36/0x76 [<ffffffff8043185b>] dev_change_flags+0x5d/0x122 [<ffffffff8045a1a3>] devinet_ioctl+0x259/0x5e8 [<ffffffff8045a7f2>] inet_ioctl+0x71/0x8f [<ffffffff8042a395>] sock_ioctl+0x1db/0x1fd [<ffffffff8023bfa7>] do_ioctl+0x1b/0x50 [<ffffffff8022c9b2>] vfs_ioctl+0x22a/0x23c [<ffffffff80289975>] trace_hardirqs_on+0x124/0x14e [<ffffffff802459a2>] sys_ioctl+0x42/0x65 [<ffffffff8025531e>] system_call+0x7e/0x83 Anyway, I could narrow down this message to the first kzalloc() call in fw_register_device(), file drivers/base/firmware_class.c. This only seems to confirm my suspicion that the actual corruption happened before this point. We are just hitting it when trying to allocate more memory. Help with debugging this problem will be appreciated. I've never hunted down such problems, especially in kernel space. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 19+ messages in thread
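For readers unfamiliar with "Slab corruption" reports like the ones in the message above, here is a minimal user-space sketch of the idea behind them. With slab debugging enabled, a freed object is filled with a known poison pattern, and the pattern is verified before the object is handed out again; any byte that changed in between was written through a stale pointer. The constants `POISON_FREE` (0x6b) and `POISON_END` (0xa5) match the kernel's `include/linux/poison.h` as far as I know. The helper names and the 64-byte object size are assumptions made for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE    64    /* illustrative size, not the 1024 from the dump */
#define POISON_FREE 0x6b  /* byte written over a freed object */
#define POISON_END  0xa5  /* marker stored in the final byte */

/* Fill a "freed" object with the poison pattern. */
void poison_obj(unsigned char *obj, size_t size)
{
    assert(size > 0);
    memset(obj, POISON_FREE, size - 1);
    obj[size - 1] = POISON_END;
}

/* Return the offset of the first byte that no longer matches the
 * pattern, or -1 if the object is still intact. */
long check_poison(const unsigned char *obj, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        unsigned char want = (i == size - 1) ? POISON_END : POISON_FREE;
        if (obj[i] != want)
            return (long)i;
    }
    return -1;
}

/* Simulate a write through a stale pointer after the object was freed
 * and report where the damage starts. */
long demo_stale_write(void)
{
    unsigned char obj[OBJ_SIZE];
    unsigned char *stale = obj;   /* pointer kept around after the "free" */

    poison_obj(obj, sizeof(obj));
    memset(stale + 16, 0, 8);     /* use-after-free: zeroes 8 bytes */
    return check_poison(obj, sizeof(obj));
}
```

In the dumps above, the damaged object shows runs of 00 bytes where the poison pattern would be expected, which is consistent with something zeroing memory after it had been freed. The check only fires when the allocator next touches the object, which is why the reported call trace points at an unrelated allocation rather than at the code that did the write.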
* Re: Can someone please try...
From: Michael Buesch @ 2007-01-22 20:06 UTC
To: Pavel Roskin; +Cc: bcm43xx-dev, netdev

On Friday 19 January 2007 08:54, Pavel Roskin wrote:
> Hello, Michael!
>
> I did more testing, and the results are as follows. It looks like the
> oopses and panics on i386 were triggered by 4k stacks. x86_64 doesn't
> have this option.
>
> Now that I have enabled the other debug options on both platforms, but
> not 4k stacks, I'm seeing exactly the same problem on each platform.
> When run initially, wpa_supplicant connects with no problems (except
> very poor reception of the data packets, but that's another story). If
> interrupted and restarted, wpa_supplicant reconnects, but I'm getting
> messages like this (i386):

That's a very interesting discovery. Partly because I don't see
this on my i386 machine. ;)

It's obviously some stack/memory corruption. But I'm not
sure if this is a stack overflow. I'd rather say no, it isn't.

Could probably be triggered by something like kfree()ing
a dangling pointer or something...

> Slab corruption: start=cfdaece0, len=1024
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c02d70c2>](skb_release_data+0x7b/0x7f)
> 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Prev obj: start=cfdae8d4, len=1024
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [<c026ea5a>](device_create+0x2c/0x98)
> 000: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: ad 4e ad de ff ff ff ff ff ff ff ff 10 3a 6d c0
> Next obj: start=cfdaf0ec, len=1024
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [<c0165730>](expand_files+0x95/0x2c2) > 000: 78 55 39 c7 78 55 39 c7 78 55 39 c7 88 da 52 df > 010: d8 18 3b c7 00 00 00 00 00 00 00 00 00 00 00 00 > > and this (x86_64): > > Slab corruption: start=ffff81000ec8a198, len=1024 > Redzone: 0x5a2cf071/0x5a2cf071. > Last user: [<ffffffff8042e916>](skb_release_data+0x94/0x99) > 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > Next obj: start=ffff81000ec8a5b0, len=1024 > Redzone: 0x170fc2a5/0x170fc2a5. > Last user: [<ffffffff803be6e9>](device_create+0x5f/0x110) > 000: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > I can restart wpa_supplicant again, and it would show similar messages. > The first "Last user" is inevitably skb_release_data. > > I have no idea how to deal with it. I think I need a stack trace at the > time when skb_release_data is called. > > This is a stack trace at the time when slab corruption is detected. > It's actually incorrect closer to the top, perhaps from gcc > optimizations for static functions. 
> > Slab corruption: start=ffff8100066f81d8, len=1024 > > Call Trace: > [<ffffffff80218636>] vsnprintf+0x338/0x5a8 > [<ffffffff8020713d>] check_poison_obj+0x69/0x1ae > [<ffffffff803c3ff2>] _request_firmware+0x8f/0x326 > [<ffffffff803c3ff2>] _request_firmware+0x8f/0x326 > > > [<ffffffff8020c09a>] cache_alloc_debugcheck_after+0x32/0x1a2 > [<ffffffff803c3ff2>] _request_firmware+0x8f/0x326 > [<ffffffff802aaae2>] kmem_cache_zalloc+0xaf/0xd8 > [<ffffffff803c3ff2>] _request_firmware+0x8f/0x326 > [<ffffffff880111ea>] :bcm43xx_d80211:bcm43xx_phy_init_tssi2dbm_table > +0xf0/0x2ca > [<ffffffff803c432a>] request_firmware+0xe/0x10 > [<ffffffff88007d75>] :bcm43xx_d80211:bcm43xx_chip_init+0x96/0xaba > [<ffffffff8020a03d>] kmem_cache_alloc+0xaf/0xbe > [<ffffffff88009c97>] :bcm43xx_d80211:bcm43xx_wireless_core_init > +0x4de/0xa3d > [<ffffffff8800b4e8>] :bcm43xx_d80211:bcm43xx_add_interface+0x64/0xde > [<ffffffff8046eaa0>] ieee80211_open+0x1c7/0x2cc > [<ffffffff804330da>] dev_open+0x36/0x76 > [<ffffffff8043185b>] dev_change_flags+0x5d/0x122 > [<ffffffff8045a1a3>] devinet_ioctl+0x259/0x5e8 > [<ffffffff8045a7f2>] inet_ioctl+0x71/0x8f > [<ffffffff8042a395>] sock_ioctl+0x1db/0x1fd > [<ffffffff8023bfa7>] do_ioctl+0x1b/0x50 > [<ffffffff8022c9b2>] vfs_ioctl+0x22a/0x23c > [<ffffffff80289975>] trace_hardirqs_on+0x124/0x14e > [<ffffffff802459a2>] sys_ioctl+0x42/0x65 > [<ffffffff8025531e>] system_call+0x7e/0x83 > > Anyway, I could narrow down this message to the first kzalloc() call in > fw_register_device(), file drivers/base/firmware_class.c. This only > seems to confirm my suspicion that the actual corruption happened before > this point. We are just hitting it when trying to allocate more memory. > > Help with debugging this problem will be appreciated. I've never hunted > down such problems, especially in kernel space. > -- Greetings Michael. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Can someone please try... 2007-01-22 20:06 ` Michael Buesch @ 2007-01-22 20:44 ` Pavel Roskin 2007-01-22 21:00 ` Michael Buesch 0 siblings, 1 reply; 19+ messages in thread From: Pavel Roskin @ 2007-01-22 20:44 UTC (permalink / raw) To: Michael Buesch; +Cc: bcm43xx-dev, netdev Hello, Michael! On Mon, 2007-01-22 at 21:06 +0100, Michael Buesch wrote: > It's obviously some stack/memory corruption. But I'm not > sure if this is a stackoverflow. I'd rather say no, it isn't. > > Could probably be triggered by something like kfree()ing > a dangling pointer or something... Yes. That's what my patch was for ("Fix major memory corruption bug"). It was pretty hard to catch because I would find the consequences rather than to the offending code. I got lucky after I enabled some weird options, such as 64Gb support and highmem debugging. Whether it played any role or not, the oops finally happened where the driver tried to erase memory pointed to by the stale phy->lo_control pointer. Now the situation is following. No more random crashes. There is still a crash if I rmmod the driver while wlan0 is up, but it's a separate issue, and it's easy to avoid (unlike the interface going down). I hope to look at it soon. The driver connects to a 802.11b Linksys router just fine. I can send and receive data. The driver is fully functional. 128-bit WEP is supported. There are periodic bursts of assertion failures. Looking at the driver, I see three places where lna a.k.a. phy->lo_gain[0] is assigned the value of 32 (written as 0x20 in one place). It's not surprising that it exceeds 7 in lo_measure_feedthrough(). I think the assert() should be replaced with a FIXME, which would not annoy end users so much. And while at that, it would be great to replace phy->lo_gain with four fields with descriptive names. phy->lo_gain is never used as an array. Alternatively, you could make it a structure within bcm43xx_phy. The problems with a MadWifi based AP turn out to be related to 802.11g. 
If the AP is configured for 802.11b only, everything is working. If 802.11g is enabled, strange things are happening. Judging by what's on the air, it looks like the driver loses the data frames it receives. wpa_supplicant connects instantly, but ARP and ping packets from AP to STA are lost. The frames are even acknowledged, but not seen on the station side. It takes from one to ten minutes until ping suddenly starts working. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 19+ messages in thread
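[Editorial sketch] The suggestion above to replace the phy->lo_gain array with named fields could look roughly like the following. Only the name lna (for index 0) and the value 32 come from the thread. The struct name, the three placeholder field names, the uint16_t element type, and the helper function are all assumptions made for illustration; this is not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical replacement for a four-element lo_gain array.
 * Only "lna" (formerly index 0) is named in the thread; the other
 * field names are placeholders because their meaning is not given. */
struct lo_gain_sketch {
	uint16_t lna;     /* was lo_gain[0] */
	uint16_t field1;  /* was lo_gain[1], meaning not stated */
	uint16_t field2;  /* was lo_gain[2], meaning not stated */
	uint16_t field3;  /* was lo_gain[3], meaning not stated */
};

/* Copy an old-style array into the named fields. */
static struct lo_gain_sketch lo_gain_from_array(const uint16_t old[4])
{
	assert(old != NULL);

	struct lo_gain_sketch g = {
		.lna = old[0],
		.field1 = old[1],
		.field2 = old[2],
		.field3 = old[3],
	};
	return g;
}
```

With named fields, an assignment such as g.lna = 0x20 says what is being set, which an index into an array that is never iterated does not.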
* Re: Can someone please try... 2007-01-22 20:44 ` Pavel Roskin @ 2007-01-22 21:00 ` Michael Buesch 2007-01-22 22:04 ` Larry Finger 2007-01-23 6:14 ` Pavel Roskin 0 siblings, 2 replies; 19+ messages in thread From: Michael Buesch @ 2007-01-22 21:00 UTC (permalink / raw) To: Pavel Roskin; +Cc: bcm43xx-dev, netdev On Monday 22 January 2007 21:44, Pavel Roskin wrote: > Hello, Michael! > > On Mon, 2007-01-22 at 21:06 +0100, Michael Buesch wrote: > > It's obviously some stack/memory corruption. But I'm not > > sure if this is a stack overflow. I'd rather say no, it isn't. > > > > Could probably be triggered by something like kfree()ing > > a dangling pointer or something... > > Yes. That's what my patch was for ("Fix major memory corruption bug"). > It was pretty hard to catch because I would find the consequences rather > than the offending code. I got lucky after I enabled some weird > options, such as 64Gb support and highmem debugging. Whether it played > any role or not, the oops finally happened where the driver tried to > erase memory pointed to by the stale phy->lo_control pointer. > > The situation now is as follows. > > No more random crashes. There is still a crash if I rmmod the driver > while wlan0 is up, but it's a separate issue, and it's easy to avoid > (unlike the interface going down). I hope to look at it soon. Did you apply that d80211 rmmod crash fix that Michael Wu posted recently? I bet it will fix your issue. > The driver connects to an 802.11b Linksys router just fine. I can send > and receive data. The driver is fully functional. 128-bit WEP is > supported. Nice. > There are periodic bursts of assertion failures. Looking at the driver, > I see three places where lna a.k.a. phy->lo_gain[0] is assigned the > value of 32 (written as 0x20 in one place). It's not surprising that it > exceeds 7 in lo_measure_feedthrough(). I know about these and I am going to fix them soon. Ignore them for the time being, please.
> I think the assert() should be replaced with a FIXME, which would not > annoy end users so much. Well, no. It's kind of: Michael, go ahead and fix that crap! So I'd like to keep it to get me to fix it. :D > And while at that, it would be great to > replace phy->lo_gain with four fields with descriptive names. > phy->lo_gain is never used as an array. Alternatively, you could make > it a structure within bcm43xx_phy. Yeah, one step after the other. ;) We didn't know the meanings of the values until recently. Of course I am going to rename them. > The problems with a MadWifi based AP turn out to be related to 802.11g. > If the AP is configured for 802.11b only, everything is working. If > 802.11g is enabled, strange things are happening. Judging by what's on > the air, it looks like the driver loses the data frames it receives. > wpa_supplicant connects instantly, but ARP and ping packets from AP to > STA are lost. The frames are even acknowledged, but not seen on the > station side. It takes from one to ten minutes until ping suddenly > starts working. Hm, is this 4318? It is known to lose lots of packets. -- Greetings Michael. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Can someone please try... 2007-01-22 21:00 ` Michael Buesch @ 2007-01-22 22:04 ` Larry Finger 2007-01-23 6:14 ` Pavel Roskin 1 sibling, 0 replies; 19+ messages in thread From: Larry Finger @ 2007-01-22 22:04 UTC (permalink / raw) To: Michael Buesch; +Cc: Pavel Roskin, netdev, bcm43xx-dev Michael Buesch wrote: > On Monday 22 January 2007 21:44, Pavel Roskin wrote: >> The problems with a MadWifi based AP turn out to be related to 802.11g. >> If the AP is configured for 802.11b only, everything is working. If >> 802.11g is enabled, strange things are happening. Judging by what's on >> the air, it looks like the driver loses the data frames it receives. >> wpa_supplicant connects instantly, but ARP and ping packets from AP to >> STA are lost. The frames are even acknowledged, but not seen on the >> station side. It takes from one to ten minutes until ping suddenly >> starts working. > > Hm, is this 4318? It is known to lose lots of packets. On my 4311 with softmac, the throughput is increased by a factor of 6 by reducing the rate from the default 11M to 1M. Obviously the success rate is greatly improved. Perhaps the same effect will happen for 4318s. Does the d80211 version let you change the rate? Larry ^ permalink raw reply [flat|nested] 19+ messages in thread
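[Editorial sketch] A back-of-the-envelope reading of the throughput observation above, under a deliberately crude assumption that is not from the thread: goodput G at nominal rate R (in Mbit/s) is roughly R times the fraction p(R) of frames delivered successfully, ignoring per-frame overhead, retries, and rate adaptation. Under that model, a sixfold gain from dropping 11M to 1M implies the delivery fraction improved by about a factor of 66:

```latex
G(R) \approx R \, p(R), \qquad
\frac{G(1)}{G(11)} = \frac{1 \cdot p(1)}{11 \cdot p(11)} = 6
\;\Longrightarrow\;
\frac{p(1)}{p(11)} = 6 \cdot 11 = 66
```

Since p(1) cannot exceed 1, this simple model would put p(11) at no more than about 1/66, i.e. nearly all frames sent at 11M were being lost. Treat this as an order-of-magnitude illustration only.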
* Re: Can someone please try... 2007-01-22 21:00 ` Michael Buesch 2007-01-22 22:04 ` Larry Finger @ 2007-01-23 6:14 ` Pavel Roskin 2007-01-23 9:21 ` Michael Buesch 1 sibling, 1 reply; 19+ messages in thread From: Pavel Roskin @ 2007-01-23 6:14 UTC (permalink / raw) To: Michael Buesch; +Cc: bcm43xx-dev, netdev On Mon, 2007-01-22 at 22:00 +0100, Michael Buesch wrote: > > No more random crashes. There is still a crash if I rmmod the driver > > while wlan0 is up, but it's a separate issue, and it's easy to avoid > > (unlike the interface going down). I hope to look at it soon. > > Did you apply that d80211 rmmod crash fix that Michael Wu posted > recently. I bet it will fix your issue. I have tried the patch, and it doesn't fix the problem. It's a separate problem. It happens when bcm43xx_interrupt_handler() is called on a device that has already been removed. It looks like bcm43xx_wireless_core_stop() should be called from bcm43xx_one_core_detach(). Unfortunately, I cannot come to a satisfactory solution yet. If I call bcm43xx_wireless_core_stop() with the mutex held, the driver won't unload if the interface is down. If I don't hold the mutex, it would happen when the interface is up. By the way, I think it's a bad idea to unlock any mutexes or other locks set outside the function. The caller assumes that the lock is held until it (the caller) unlocks it. Unlocking locks from other functions breaks this convention. > > I think the assert() should be replaced with a FIXME, which would not > > annoy end users so much. > > Well, no. It's kind of: Michael, go ahead and fix that crap! > So I'd like to keep it to get me to fix it. :D I, for one, prefer to keep my to-do items in my to-do list, but I don't want to distract you with petty arguments from fixing the real problem. > > And while at that, it would be great to > > replace phy->lo_gain with four fields with descriptive names. > > phy->lo_gain is never used as an array. 
Alternatively, you could make > > it a structure within bcm43xx_phy. > > Yeah, one step after the other. ;) > We didn't know the meanings of the values until recently. Of course > I am going to rename them. Great! > > The problems with a MadWifi based AP turn out to be related to 802.11g. > > If the AP is configured for 802.11b only, everything is working. If > > 802.11g is enabled, strange things are happening. Judging by what's on > > the air, it looks like the driver loses the data frames it receives. > > wpa_supplicant connects instantly, but ARP and ping packets from AP to > > STA are lost. The frames are even acknowledged, but not seen on the > > station side. It takes from one to ten minutes until ping suddenly > > starts working. > > Hm, is this 4318? It is known to lose lots of packets. No, it's 4312. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Can someone please try... 2007-01-23 6:14 ` Pavel Roskin @ 2007-01-23 9:21 ` Michael Buesch [not found] ` <200701231021.34995.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Michael Buesch @ 2007-01-23 9:21 UTC (permalink / raw) To: Pavel Roskin; +Cc: bcm43xx-dev, netdev On Tuesday 23 January 2007 07:14, Pavel Roskin wrote: > On Mon, 2007-01-22 at 22:00 +0100, Michael Buesch wrote: > > > No more random crashes. There is still a crash if I rmmod the driver > > > while wlan0 is up, but it's a separate issue, and it's easy to avoid > > > (unlike the interface going down). I hope to look at it soon. > > > > Did you apply that d80211 rmmod crash fix that Michael Wu posted > > recently. I bet it will fix your issue. > > I have tried the patch, and it doesn't fix the problem. It's a separate > problem. It happens when bcm43xx_interrupt_handler() is called on a > device that has already been removed. That shouldn't happen and doesn't for me. > It looks like > bcm43xx_wireless_core_stop() should be called from > bcm43xx_one_core_detach(). No, well... . remove_interface should have been called by the stack, no? > Unfortunately, I cannot come to a satisfactory solution yet. If I call > bcm43xx_wireless_core_stop() with the mutex held, the driver won't > unload if the interface is down. If I don't hold the mutex, it would > happen when the interface is up. > > By the way, I think it's a bad idea to unlock any mutexes or other locks > set outside the function. The caller assumes that the lock is held > until it (the caller) unlocks it. Unlocking locks from other functions > breaks this convention. It would result in a deadlock, if we don't unlock it there. That's perfectly fine. > > > I think the assert() should be replaced with a FIXME, which would not > > > annoy end users so much. > > > > Well, no. It's kind of: Michael, go ahead and fix that crap! > > So I'd like to keep it to get me to fix it. 
:D > > I, for one, prefer to keep my to-do items in my to-do list, but I don't > want to distract you with petty arguments from fixing the real problem. Well, assert() statements are there to find bugs. And if there is a bug, they trigger. That's pretty much the semantics of an assert() statement. I'm not sure why you want to hide a bug. Either way, in this case it seems like the code is right and just the assert() mask is wrong. But that's only this way by luck. Could easily have been the other way around. ;) Specs were slightly wrong at this point. But as I said, I will commit a fix today. > > Hm, is this 4318? It is known to lose lots of packets. > > No, it's 4312. That has got the same problems. -- Greetings Michael. ^ permalink raw reply [flat|nested] 19+ messages in thread
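[Editorial sketch] The "assert() mask is wrong" situation can be illustrated with a small range check. The values 0x20 (32) and 7 come from the thread. The helper name and the wider mask 0x3F used in the usage note are hypothetical; the thread does not say what the corrected mask is, nor exactly how the driver's assert() is written.

```c
#include <assert.h>   /* for callers that want to assert() on the result */
#include <stdbool.h>
#include <stdint.h>

/* Returns true when value has no bits set outside mask.
 * Hypothetical helper, not a function from the bcm43xx driver. */
static bool fits_mask(uint16_t value, uint16_t mask)
{
	return (value & (uint16_t)~mask) == 0;
}
```

With this helper, fits_mask(0x20, 0x7) is false, which matches the bursts of assertion failures reported for lna, while a wider mask such as the made-up 0x3F would accept the same value. Which side is wrong (the assigned value or the mask) is exactly what the assertion cannot tell you by itself.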
[parent not found: <200701231021.34995.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>]
* Re: Can someone please try... [not found] ` <200701231021.34995.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org> @ 2007-01-24 5:43 ` Pavel Roskin 2007-01-24 8:43 ` Michael Buesch 0 siblings, 1 reply; 19+ messages in thread From: Pavel Roskin @ 2007-01-24 5:43 UTC (permalink / raw) To: Michael Buesch Cc: netdev-u79uwXL29TY76Z2rM5mHXA, bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w On Tue, 2007-01-23 at 10:21 +0100, Michael Buesch wrote: > On Tuesday 23 January 2007 07:14, Pavel Roskin wrote: > > I have tried the patch, and it doesn't fix the problem. It's a separate > > problem. It happens when bcm43xx_interrupt_handler() is called on a > > device that has already been removed. > > That shouldn't happen and doesn't for me. > > > It looks like > > bcm43xx_wireless_core_stop() should be called from > > bcm43xx_one_core_detach(). > > No, well... . remove_interface should have been called by the stack, no? It is not. It's called if I bring the interface down with ifconfig. If I remove live interface with "rmmod bcm43xx_d80211", bcm43xx_one_core_detach() is called first, followed by kernel panic in bcm43xx_interrupt_handler(). And that's what I see in the code. Module removal calls bcm43xx_exit(). It unregisters the ssb driver first. The ssb layer calls bcm43xx_remove(), which calls bcm43xx_one_core_detach() before doing anything with the wireless stack or with interrupts. I tried to put bcm43xx_one_core_detach() to the end of bcm43xx_remove(), but the result was the same. Still, I think the solution lies in that direction. We should stop the hardware before dismantling any data structures. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 19+ messages in thread
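[Editorial sketch] The ordering argument above ("stop the hardware before dismantling any data structures") can be modelled in plain userspace C. Everything here is a hypothetical stand-in: the sketch_* names, the irq_enabled flag playing the role of the hardware interrupt mask, and the stats counter playing the role of driver state. It is not the bcm43xx code and does not use real kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy device: a flag stands in for the hardware interrupt mask and a
 * heap-allocated counter stands in for state the handler touches. */
struct sketch_dev {
	bool irq_enabled;
	int *stats;
};

static int sketch_init(struct sketch_dev *dev)
{
	dev->stats = calloc(1, sizeof(*dev->stats));
	if (!dev->stats)
		return -1;
	dev->irq_enabled = true;
	return 0;
}

/* Returns 0 if the interrupt was handled, -1 if it was ignored because
 * the device has been stopped. */
static int sketch_irq(struct sketch_dev *dev)
{
	assert(dev != NULL);
	if (!dev->irq_enabled)
		return -1;
	(*dev->stats)++;	/* would be a use-after-free if stats were gone */
	return 0;
}

/* Quiesce the "hardware": no handler may touch driver state after this. */
static void sketch_stop(struct sketch_dev *dev)
{
	dev->irq_enabled = false;
}

/* Teardown in the order argued for in the thread: stop first, then free. */
static void sketch_remove(struct sketch_dev *dev)
{
	sketch_stop(dev);
	free(dev->stats);
	dev->stats = NULL;
}
```

If sketch_remove() freed stats before calling sketch_stop(), a late call to sketch_irq() would dereference freed memory, which is the toy analogue of the panic in bcm43xx_interrupt_handler() described above. A real driver would also have to wait for in-flight handlers to finish, which this single-threaded model does not show.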
* Re: Can someone please try... 2007-01-24 5:43 ` Pavel Roskin @ 2007-01-24 8:43 ` Michael Buesch 0 siblings, 0 replies; 19+ messages in thread From: Michael Buesch @ 2007-01-24 8:43 UTC (permalink / raw) To: Pavel Roskin; +Cc: bcm43xx-dev, netdev On Wednesday 24 January 2007 06:43, Pavel Roskin wrote: > On Tue, 2007-01-23 at 10:21 +0100, Michael Buesch wrote: > > On Tuesday 23 January 2007 07:14, Pavel Roskin wrote: > > > I have tried the patch, and it doesn't fix the problem. It's a separate > > > problem. It happens when bcm43xx_interrupt_handler() is called on a > > > device that has already been removed. > > > > That shouldn't happen and doesn't for me. > > > > > It looks like > > > bcm43xx_wireless_core_stop() should be called from > > > bcm43xx_one_core_detach(). > > > > No, well... . remove_interface should have been called by the stack, no? > > It is not. It's called if I bring the interface down with ifconfig. If > I remove live interface with "rmmod bcm43xx_d80211", > bcm43xx_one_core_detach() is called first, followed by kernel panic in > bcm43xx_interrupt_handler(). > > And that's what I see in the code. Module removal calls bcm43xx_exit(). > It unregisters the ssb driver first. The ssb layer calls > bcm43xx_remove(), which calls bcm43xx_one_core_detach() before doing > anything with the wireless stack or with interrupts. > > I tried to put bcm43xx_one_core_detach() to the end of bcm43xx_remove(), > but the result was the same. Still, I think the solution lies in that > direction. We should stop the hardware before dismantling any data > structures. Ok, I see. I will try to debug this. -- Greetings Michael. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Can someone please try... 2007-01-16 17:06 Can someone please try Michael Buesch 2007-01-16 18:29 ` Pavel Roskin @ 2007-01-16 19:00 ` Andreas Schwab 2007-01-16 19:24 ` Michael Buesch 1 sibling, 1 reply; 19+ messages in thread From: Andreas Schwab @ 2007-01-16 19:00 UTC (permalink / raw) To: Michael Buesch; +Cc: bcm43xx-dev, netdev Michael Buesch <mb@bu3sch.de> writes: > ...the bcm43xx driver in my tree with a 4318 chip? > The code there works excellent with my 4306 now, but I can't > get it to work with my 4318. Doesn't work for me either. I cannot get it to associate to the AP. Andreas. -- Andreas Schwab, SuSE Labs, schwab@suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Can someone please try... 2007-01-16 19:00 ` Andreas Schwab @ 2007-01-16 19:24 ` Michael Buesch 0 siblings, 0 replies; 19+ messages in thread From: Michael Buesch @ 2007-01-16 19:24 UTC (permalink / raw) To: Andreas Schwab; +Cc: bcm43xx-dev, netdev On Tuesday 16 January 2007 20:00, Andreas Schwab wrote: > Michael Buesch <mb@bu3sch.de> writes: > > > ...the bcm43xx driver in my tree with a 4318 chip? > > The code there works excellent with my 4306 now, but I can't > > get it to work with my 4318. > > Doesn't work for me either. I cannot get it to associate to the AP. Ok, let's see. I found a few other bugs. But I can't make any promises when I'll find all of them. ;) -- Greetings Michael. ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2007-01-24 8:44 UTC | newest]
Thread overview: 19+ messages
2007-01-16 17:06 Can someone please try Michael Buesch
2007-01-16 18:29 ` Pavel Roskin
2007-01-16 19:23 ` Michael Buesch
2007-01-16 21:50 ` Pavel Roskin
2007-01-16 22:07 ` Michael Buesch
2007-01-16 23:51 ` Pavel Roskin
2007-01-17 9:52 ` Michael Buesch
2007-01-18 9:41 ` Pavel Roskin
2007-01-19 7:54 ` Pavel Roskin
2007-01-22 20:06 ` Michael Buesch
2007-01-22 20:44 ` Pavel Roskin
2007-01-22 21:00 ` Michael Buesch
2007-01-22 22:04 ` Larry Finger
2007-01-23 6:14 ` Pavel Roskin
2007-01-23 9:21 ` Michael Buesch
[not found] ` <200701231021.34995.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>
2007-01-24 5:43 ` Pavel Roskin
2007-01-24 8:43 ` Michael Buesch
2007-01-16 19:00 ` Andreas Schwab
2007-01-16 19:24 ` Michael Buesch