From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Buesch
Subject: Re: Can someone please try...
Date: Mon, 22 Jan 2007 21:06:24 +0100
Message-ID: <200701222106.24329.mb@bu3sch.de>
References: <200701161806.02780.mb@bu3sch.de> <1169113274.10770.18.camel@dv> <1169193247.9908.34.camel@dv>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Cc: bcm43xx-dev@lists.berlios.de, netdev@vger.kernel.org
Return-path:
Received: from static-ip-62-75-166-246.inaddr.intergenia.de ([62.75.166.246]:54398 "EHLO vs166246.vserver.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751453AbXAVUHl (ORCPT ); Mon, 22 Jan 2007 15:07:41 -0500
To: Pavel Roskin
In-Reply-To: <1169193247.9908.34.camel@dv>
Content-Disposition: inline
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

On Friday 19 January 2007 08:54, Pavel Roskin wrote:
> Hello, Michael!
>
> I did more testing, and the results are following. It looks like the
> oopses and panics on i386 were triggered by 4k stacks. x86_64 doesn't
> have this option.
>
> Now that I enabled other debug options on both platforms, but not 4k
> stacks, I'm seeing exactly the same problem on each platform. When run
> initially, wpa_supplicant connects with no problems (except very poor
> reception of the data packets, but it's another story). If interrupted
> and restarted, wpa_supplicant reconnects, but I'm getting messages like
> this (i386):

That's a very interesting discovery, partly because I don't see this
on my i386 machine. ;)

It's obviously some stack/memory corruption, but I'm not sure this
is a stack overflow. I'd rather say no, it isn't.
It could be triggered by something like kfree()ing a dangling
pointer...

> Slab corruption: start=cfdaece0, len=1024
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [](skb_release_data+0x7b/0x7f)
> 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Prev obj: start=cfdae8d4, len=1024
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [](device_create+0x2c/0x98)
> 000: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: ad 4e ad de ff ff ff ff ff ff ff ff 10 3a 6d c0
> Next obj: start=cfdaf0ec, len=1024
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [](expand_files+0x95/0x2c2)
> 000: 78 55 39 c7 78 55 39 c7 78 55 39 c7 88 da 52 df
> 010: d8 18 3b c7 00 00 00 00 00 00 00 00 00 00 00 00
>
> and this (x86_64):
>
> Slab corruption: start=ffff81000ec8a198, len=1024
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [](skb_release_data+0x94/0x99)
> 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Next obj: start=ffff81000ec8a5b0, len=1024
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [](device_create+0x5f/0x110)
> 000: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> I can restart wpa_supplicant again, and it would show similar messages.
> The first "Last user" is inevitably skb_release_data.
>
> I have no idea how to deal with it. I think I need a stack trace at the
> time when skb_release_data is called.
>
> This is a stack trace at the time when slab corruption is detected.
> It's actually incorrect closer to the top, perhaps from gcc
> optimizations for static functions.
>
> Slab corruption: start=ffff8100066f81d8, len=1024
>
> Call Trace:
> [] vsnprintf+0x338/0x5a8
> [] check_poison_obj+0x69/0x1ae
> [] _request_firmware+0x8f/0x326
> [] _request_firmware+0x8f/0x326
>
>
> [] cache_alloc_debugcheck_after+0x32/0x1a2
> [] _request_firmware+0x8f/0x326
> [] kmem_cache_zalloc+0xaf/0xd8
> [] _request_firmware+0x8f/0x326
> [] :bcm43xx_d80211:bcm43xx_phy_init_tssi2dbm_table+0xf0/0x2ca
> [] request_firmware+0xe/0x10
> [] :bcm43xx_d80211:bcm43xx_chip_init+0x96/0xaba
> [] kmem_cache_alloc+0xaf/0xbe
> [] :bcm43xx_d80211:bcm43xx_wireless_core_init+0x4de/0xa3d
> [] :bcm43xx_d80211:bcm43xx_add_interface+0x64/0xde
> [] ieee80211_open+0x1c7/0x2cc
> [] dev_open+0x36/0x76
> [] dev_change_flags+0x5d/0x122
> [] devinet_ioctl+0x259/0x5e8
> [] inet_ioctl+0x71/0x8f
> [] sock_ioctl+0x1db/0x1fd
> [] do_ioctl+0x1b/0x50
> [] vfs_ioctl+0x22a/0x23c
> [] trace_hardirqs_on+0x124/0x14e
> [] sys_ioctl+0x42/0x65
> [] system_call+0x7e/0x83
>
> Anyway, I could narrow down this message to the first kzalloc() call in
> fw_register_device(), file drivers/base/firmware_class.c. This only
> seems to confirm my suspicion that the actual corruption happened before
> this point. We are just hitting it when trying to allocate more memory.
>
> Help with debugging this problem will be appreciated. I've never hunted
> down such problems, especially in kernel space.
>

-- 
Greetings Michael.