* RFC: Reproducible oops with lockdep on count_matching_names()
@ 2007-11-01 19:17 Luis R. Rodriguez
  2007-11-01 19:49 ` John W. Linville
  2007-11-01 23:26 ` Michael Wu
  0 siblings, 2 replies; 13+ messages in thread
From: Luis R. Rodriguez @ 2007-11-01 19:17 UTC
  To: linux-wireless
  Cc: John W. Linville, Ingo Molnar, Peter Zijlstra, Johannes Berg

mcgrof@pogo:~/devel/wireless-2.6$ git-describe
v2.6.24-rc1-146-g2280253

So I hit segfault with lockdep on count_matching_names() on the
strcmp() multiple times now. This is reproducible and with different
wireless drivers.

Essentially I have an ipw2200 built-in to my laptop so the driver always
loads on bootup. Then I have a few cardbus cards. I've tested this with
ath5k and with b43. If I do the following after bootup I always get
a segfault:

(ipw2200 loaded as I have the card built-in)

--> Insert my ath5k card ---- OR ---- Insert b43 card
mcgrof@pogo:~$ sudo rmmod ipw2200
mcgrof@pogo:~$ sudo rmmod ath5k ---- OR ---- sudo rmmod b43
mcgrof@pogo:~$ sudo modprobe ipw2200
Segmentation fault

Below you'll find a few captured oops:

ath5k + ipw2200 combo:

****************************************************************
Nov 1 13:15:17 pogo kernel: pccard: CardBus card inserted into slot 0
Nov 1 13:15:17 pogo kernel: PCI: Enabling device 0000:15:00.0 (0000 -> 0002)
Nov 1 13:15:17 pogo kernel: ACPI: PCI Interrupt 0000:15:00.0[A] -> GSI 16 (level, low) -> IRQ 16
Nov 1 13:15:17 pogo kernel: phy0: Selected rate control algorithm 'simple'
Nov 1 13:15:17 pogo kernel: ath5k_pci 0000:15:00.0: Atheros AR5213A chip found: MAC 0x59, PHY: 0x43
Nov 1 13:15:17 pogo kernel: ath5k_pci 0000:15:00.0: RF5112A radio found (0x36)
Nov 1 13:15:34 pogo kernel: ACPI: PCI interrupt for device 0000:14:02.0 disabled
Nov 1 13:15:39 pogo kernel: ACPI: PCI interrupt for device 0000:15:00.0 disabled
Nov 1 13:15:43 pogo kernel: ipw2200: Intel(R) PRO/Wireless 2200/2915 Network Driver, 1.2.2kmpr
Nov 1 13:15:43 pogo kernel: ipw2200: Copyright(c) 2003-2006 Intel Corporation
Nov 1 13:15:43 pogo kernel: ACPI: PCI Interrupt 0000:14:02.0[A] -> GSI 21 (level, low) -> IRQ 18
Nov 1 13:15:43 pogo kernel: ipw2200: Detected Intel PRO/Wireless 2915ABG Network Connection
Nov 1 13:15:43 pogo kernel: BUG: unable to handle kernel paging request at virtual address f89ba359
Nov 1 13:15:43 pogo kernel: printing eip: c01be6e4 *pde = 02000067 *pte = 00000000
Nov 1 13:15:43 pogo kernel: Oops: 0000 [#1]
Nov 1 13:15:43 pogo kernel: Modules linked in: ipw2200 arc4 ecb blkcipher cryptomgr crypto_algapi rc80211_simple mac80211 cfg80211 uinput thinkpad_acpi hwmon backlight nvram ipv6 acpi_cpufreq cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative dock snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_hwdep snd_seq_oss snd_seq_midi_event pcmcia crc32 snd_seq ieee80211 ieee80211_crypt snd_timer snd_seq_device firmware_class sg ehci_hcd uhci_hcd yenta_socket rsrc_nonstatic pcmcia_core sr_mod cdrom tg3 snd evdev usbcore rng_core rtc soundcore
Nov 1 13:15:43 pogo kernel:
Nov 1 13:15:43 pogo kernel: Pid: 2950, comm: modprobe Not tainted (2.6.24-rc1 #6)
Nov 1 13:15:43 pogo kernel: EIP: 0060:[strcmp+9/29] EFLAGS: 00010086 CPU: 0
Nov 1 13:15:43 pogo kernel: EIP is at strcmp+0x9/0x1d
Nov 1 13:15:43 pogo kernel: EAX: f89ba359 EBX: c044ce00 ECX: 00000000 EDX: f8941e70
Nov 1 13:15:43 pogo kernel: ESI: f89ba359 EDI: f8941e70 EBP: c2b3bce4 ESP: c2b3bcdc
Nov 1 13:15:43 pogo kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Nov 1 13:15:43 pogo kernel: Process modprobe (pid: 2950, ti=c2b3a000 task=c3403010 task.ti=c2b3a000)
Nov 1 13:15:43 pogo kernel: Stack: c044cfb8 00000000 c2b3bcf8 c013034d 000303b8 c044cfb8 00000002 c2b3bd58
Nov 1 13:15:43 pogo kernel: c01329b7 00000000 00000000 00000000 00000000 00000000 00000000 00000002
Nov 1 13:15:43 pogo kernel: 00000000 00000000 c352acdc 00000000 c3403010 00000000 c04d4580 00000000
Nov 1 13:15:43 pogo kernel: Call Trace:
Nov 1 13:15:43 pogo kernel: [show_trace_log_lvl+26/47] show_trace_log_lvl+0x1a/0x2f
Nov 1 13:15:43 pogo kernel: [show_stack_log_lvl+157/165] show_stack_log_lvl+0x9d/0xa5
Nov 1 13:15:43 pogo kernel: [show_registers+173/380] show_registers+0xad/0x17c
Nov 1 13:15:43 pogo kernel: [die+245/454] die+0xf5/0x1c6
Nov 1 13:15:43 pogo kernel: [do_page_fault+1104/1335] do_page_fault+0x450/0x537
Nov 1 13:15:43 pogo kernel: [error_code+106/112] error_code+0x6a/0x70
Nov 1 13:15:43 pogo kernel: [count_matching_names+74/118] count_matching_names+0x4a/0x76
Nov 1 13:15:43 pogo kernel: [__lock_acquire+609/3102] __lock_acquire+0x261/0xc1e
Nov 1 13:15:43 pogo kernel: [lock_acquire+120/145] lock_acquire+0x78/0x91
Nov 1 13:15:43 pogo kernel: [mutex_lock_nested+244/628] mutex_lock_nested+0xf4/0x274
Nov 1 13:15:43 pogo kernel: [<f8938e9d>] ipw_pci_probe+0x8aa/0xac6 [ipw2200]
Nov 1 13:15:43 pogo kernel: [pci_device_probe+57/91] pci_device_probe+0x39/0x5b
Nov 1 13:15:43 pogo kernel: [driver_probe_device+232/360] driver_probe_device+0xe8/0x168
Nov 1 13:15:43 pogo kernel: [__driver_attach+106/161] __driver_attach+0x6a/0xa1
Nov 1 13:15:43 pogo kernel: [bus_for_each_dev+54/91] bus_for_each_dev+0x36/0x5b
Nov 1 13:15:43 pogo kernel: [driver_attach+25/27] driver_attach+0x19/0x1b
Nov 1 13:15:43 pogo kernel: [bus_add_driver+115/426] bus_add_driver+0x73/0x1aa
Nov 1 13:15:43 pogo kernel: [driver_register+103/108] driver_register+0x67/0x6c
Nov 1 13:15:43 pogo kernel: [__pci_register_driver+86/131] __pci_register_driver+0x56/0x83
Nov 1 13:15:43 pogo kernel: [<f885a033>] ipw_init+0x33/0x78 [ipw2200]
Nov 1 13:15:43 pogo kernel: [sys_init_module+4418/4706] sys_init_module+0x1142/0x1262
Nov 1 13:15:43 pogo kernel: [sysenter_past_esp+95/165] sysenter_past_esp+0x5f/0xa5
Nov 1 13:15:43 pogo kernel: =======================
Nov 1 13:15:43 pogo kernel: Code: ec 89 d0 83 c9 ff f2 ae 4f 8b 4d ec 49 78 06 ac aa 84 c0 75 f7 31 c0 aa 83 c4 0c 89 d8 5b 5e 5f 5d c3 55 89 e5 57 89 d7 56 89 c6 <ac> ae 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 5e 5f 5d c3 55
Nov 1 13:15:43 pogo kernel: EIP: [strcmp+9/29] strcmp+0x9/0x1d SS:ESP 0068:c2b3bcdc
****************************************************************

b43 + ipw2200 combo:

****************************************************************
Nov 1 13:52:34 pogo kernel: pccard: CardBus card inserted into slot 0
Nov 1 13:52:34 pogo kernel: PCI: Enabling device 0000:15:00.0 (0000 -> 0002)
Nov 1 13:52:34 pogo kernel: ACPI: PCI Interrupt 0000:15:00.0[A] -> GSI 16 (level, low) -> IRQ 16
Nov 1 13:52:34 pogo kernel: PCI: Setting latency timer of device 0000:15:00.0 to 64
Nov 1 13:52:34 pogo kernel: ssb: Sonics Silicon Backplane found on PCI device 0000:15:00.0
Nov 1 13:52:35 pogo kernel: bcm43xx driver
Nov 1 13:52:35 pogo kernel: b43-phy0: Broadcom 4318 WLAN found
Nov 1 13:52:35 pogo kernel: phy0: Selected rate control algorithm 'simple'
Nov 1 13:52:52 pogo kernel: ACPI: PCI interrupt for device 0000:14:02.0 disabled
Nov 1 13:53:12 pogo kernel: ipw2200: Intel(R) PRO/Wireless 2200/2915 Network Driver, 1.2.2kmpr
Nov 1 13:53:12 pogo kernel: ipw2200: Copyright(c) 2003-2006 Intel Corporation
Nov 1 13:53:12 pogo kernel: ACPI: PCI Interrupt 0000:14:02.0[A] -> GSI 21 (level, low) -> IRQ 18
Nov 1 13:53:12 pogo kernel: ipw2200: Detected Intel PRO/Wireless 2915ABG Network Connection
Nov 1 13:53:12 pogo kernel: BUG: unable to handle kernel paging request at virtual address f8bbda82
Nov 1 13:53:12 pogo kernel: printing eip: c01be6e4 *pde = 02000067 *pte = 00000000
Nov 1 13:53:12 pogo kernel: Oops: 0000 [#1]
Nov 1 13:53:12 pogo kernel: Modules linked in: ipw2200 arc4 ecb blkcipher cryptomgr crypto_algapi rc80211_simple mac80211 cfg80211 bcm43xx ieee80211softmac ssb uinput thinkpad_acpi hwmon backlight nvram ipv6 acpi_cpufreq cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative dock snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_hwdep snd_seq_oss snd_seq_midi_event pcmcia crc32 snd_seq snd_timer snd_seq_device ehci_hcd uhci_hcd ieee80211 ieee80211_crypt sg firmware_class yenta_socket rsrc_nonstatic pcmcia_core sr_mod cdrom tg3 snd usbcore rng_core evdev rtc soundcore
Nov 1 13:53:12 pogo kernel:
Nov 1 13:53:12 pogo kernel: Pid: 2970, comm: modprobe Not tainted (2.6.24-rc1 #7)
Nov 1 13:53:12 pogo kernel: EIP: 0060:[strcmp+9/29] EFLAGS: 00010086 CPU: 0
Nov 1 13:53:12 pogo kernel: EIP is at strcmp+0x9/0x1d
Nov 1 13:53:12 pogo kernel: EAX: f8bbda82 EBX: c044d094 ECX: 00000000 EDX: f88e5e70
Nov 1 13:53:12 pogo kernel: ESI: f8bbda82 EDI: f88e5e70 EBP: c3483ce4 ESP: c3483cdc
Nov 1 13:53:12 pogo kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Nov 1 13:53:12 pogo kernel: Process modprobe (pid: 2970, ti=c3482000 task=c341e5f0 task.ti=c3482000)
Nov 1 13:53:12 pogo kernel: Stack: c044d328 00000000 c3483cf8 c013034d 00030728 c044d328 00000002 c3483d58
Nov 1 13:53:12 pogo kernel: c01329b7 00000000 00000000 00000000 00000000 00000000 00000000 00000002
Nov 1 13:53:12 pogo kernel: 00000000 00000000 c2a92cdc 00000000 c341e5f0 00000000 c04d3d80 00000000
Nov 1 13:53:12 pogo kernel: Call Trace:
Nov 1 13:53:12 pogo kernel: [show_trace_log_lvl+26/47] show_trace_log_lvl+0x1a/0x2f
Nov 1 13:53:12 pogo kernel: [show_stack_log_lvl+157/165] show_stack_log_lvl+0x9d/0xa5
Nov 1 13:53:12 pogo kernel: [show_registers+173/380] show_registers+0xad/0x17c
Nov 1 13:53:12 pogo kernel: [die+245/454] die+0xf5/0x1c6
Nov 1 13:53:12 pogo kernel: [do_page_fault+1104/1335] do_page_fault+0x450/0x537
Nov 1 13:53:12 pogo kernel: [error_code+106/112] error_code+0x6a/0x70
Nov 1 13:53:12 pogo kernel: [count_matching_names+74/118] count_matching_names+0x4a/0x76
Nov 1 13:53:12 pogo kernel: [__lock_acquire+609/3102] __lock_acquire+0x261/0xc1e
Nov 1 13:53:12 pogo kernel: [lock_acquire+120/145] lock_acquire+0x78/0x91
Nov 1 13:53:12 pogo kernel: [mutex_lock_nested+244/628] mutex_lock_nested+0xf4/0x274
Nov 1 13:53:12 pogo kernel: [<f88dce9d>] ipw_pci_probe+0x8aa/0xac6 [ipw2200]
Nov 1 13:53:12 pogo kernel: [pci_device_probe+57/91] pci_device_probe+0x39/0x5b
Nov 1 13:53:12 pogo kernel: [driver_probe_device+232/360] driver_probe_device+0xe8/0x168
Nov 1 13:53:12 pogo kernel: [__driver_attach+106/161] __driver_attach+0x6a/0xa1
Nov 1 13:53:12 pogo kernel: [bus_for_each_dev+54/91] bus_for_each_dev+0x36/0x5b
Nov 1 13:53:12 pogo kernel: [driver_attach+25/27] driver_attach+0x19/0x1b
Nov 1 13:53:12 pogo kernel: [bus_add_driver+115/426] bus_add_driver+0x73/0x1aa
Nov 1 13:53:12 pogo kernel: [driver_register+103/108] driver_register+0x67/0x6c
Nov 1 13:53:12 pogo kernel: [__pci_register_driver+86/131] __pci_register_driver+0x56/0x83
Nov 1 13:53:12 pogo kernel: [<f8834033>] ipw_init+0x33/0x78 [ipw2200]
Nov 1 13:53:12 pogo kernel: [sys_init_module+4418/4706] sys_init_module+0x1142/0x1262
Nov 1 13:53:12 pogo kernel: [sysenter_past_esp+95/165] sysenter_past_esp+0x5f/0xa5
Nov 1 13:53:12 pogo kernel: =======================
Nov 1 13:53:12 pogo kernel: Code: ec 89 d0 83 c9 ff f2 ae 4f 8b 4d ec 49 78 06 ac aa 84 c0 75 f7 31 c0 aa 83 c4 0c 89 d8 5b 5e 5f 5d c3 55 89 e5 57 89 d7 56 89 c6 <ac> ae 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 5e 5f 5d c3 55
Nov 1 13:53:12 pogo kernel: EIP: [strcmp+9/29] strcmp+0x9/0x1d SS:ESP 0068:c3483cdc
****************************************************************

So I started reviewing the probes on each driver and came up with this
patch because Documentation/pci.txt has:

"The device driver needs to call pci_request_region() to verify
no other device is already using the same address resource.
Conversely, drivers should call pci_release_region() AFTER
calling pci_disable_device(). The idea is to prevent two devices
colliding on the same address range"

Most wireless drivers do this backwards, we tend to call
pci_release_region() BEFORE pci_disable_device() as when you probe you
first pci_enable_device() and then pci_request_region().

Anyway so I tried the following patch, but no I still get the same oops. I'll
have to review more the probe/remove paths. Any ideas?

Changes to base.c Changes-licensed-under: 3-clause-BSD

Signed-off-by: Luis R. Rodriguez <mcgrof@gmail.com>
---

diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
index 15ae868..d4fff45 100644
--- a/drivers/net/wireless/ath5k/base.c
+++ b/drivers/net/wireless/ath5k/base.c
@@ -602,10 +602,10 @@ err_free:
 	ieee80211_free_hw(hw);
 err_map:
 	pci_iounmap(pdev, mem);
-err_reg:
-	pci_release_region(pdev, 0);
 err_dis:
 	pci_disable_device(pdev);
+err_reg:
+	pci_release_region(pdev, 0);
 err:
 	return ret;
 }
@@ -621,8 +621,8 @@ ath5k_pci_remove(struct pci_dev *pdev)
 	free_irq(pdev->irq, sc);
 	pci_disable_msi(pdev);
 	pci_iounmap(pdev, sc->iobase);
-	pci_release_region(pdev, 0);
 	pci_disable_device(pdev);
+	pci_release_region(pdev, 0);
 	ieee80211_free_hw(hw);
 }
diff --git a/drivers/net/wireless/ipw2200.c b/drivers/net/wireless/ipw2200.c
index 54f44e5..47af1f2 100644
--- a/drivers/net/wireless/ipw2200.c
+++ b/drivers/net/wireless/ipw2200.c
@@ -11756,10 +11756,10 @@ static int ipw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	priv->workqueue = NULL;
       out_iounmap:
 	iounmap(priv->hw_base);
-      out_pci_release_regions:
-	pci_release_regions(pdev);
       out_pci_disable_device:
 	pci_disable_device(pdev);
+      out_pci_release_regions:
+	pci_release_regions(pdev);
 	pci_set_drvdata(pdev, NULL);
       out_free_ieee80211:
 	free_ieee80211(priv->net_dev);
@@ -11824,8 +11824,8 @@ static void ipw_pci_remove(struct pci_dev *pdev)
 	free_irq(pdev->irq, priv);
 	iounmap(priv->hw_base);
-	pci_release_regions(pdev);
 	pci_disable_device(pdev);
+	pci_release_regions(pdev);
 	pci_set_drvdata(pdev, NULL);
 	free_ieee80211(priv->net_dev);
 	free_firmware();

* Re: RFC: Reproducible oops with lockdep on count_matching_names()
  2007-11-01 19:17 RFC: Reproducible oops with lockdep on count_matching_names() Luis R. Rodriguez
@ 2007-11-01 19:49 ` John W. Linville
  2007-11-01 21:29   ` Luis R. Rodriguez
  2007-11-01 23:26 ` Michael Wu
  1 sibling, 1 reply; 13+ messages in thread
From: John W. Linville @ 2007-11-01 19:49 UTC
  To: Luis R. Rodriguez
  Cc: linux-wireless, Ingo Molnar, Peter Zijlstra, Johannes Berg

On Thu, Nov 01, 2007 at 03:17:16PM -0400, Luis R. Rodriguez wrote:

> So I started reviewing the probes on each driver and came up with this
> patch because Documentation/pci.txt has:
>
> "The device driver needs to call pci_request_region() to verify
> no other device is already using the same address resource.
> Conversely, drivers should call pci_release_region() AFTER
> calling pci_disable_device(). The idea is to prevent two devices
> colliding on the same address range"

No idea off the top of my head if this relates to the problem or not...

> --- a/drivers/net/wireless/ath5k/base.c
> +++ b/drivers/net/wireless/ath5k/base.c
> @@ -602,10 +602,10 @@ err_free:
>  	ieee80211_free_hw(hw);
>  err_map:
>  	pci_iounmap(pdev, mem);
> -err_reg:
> -	pci_release_region(pdev, 0);
>  err_dis:
>  	pci_disable_device(pdev);
> +err_reg:
> +	pci_release_region(pdev, 0);
>  err:
>  	return ret;
>  }

If you do this, don't you need to change any "goto err_reg" to "goto
err_dis" as well?

> --- a/drivers/net/wireless/ipw2200.c
> +++ b/drivers/net/wireless/ipw2200.c
> @@ -11756,10 +11756,10 @@ static int ipw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	priv->workqueue = NULL;
>        out_iounmap:
>  	iounmap(priv->hw_base);
> -      out_pci_release_regions:
> -	pci_release_regions(pdev);
>        out_pci_disable_device:
>  	pci_disable_device(pdev);
> +      out_pci_release_regions:
> +	pci_release_regions(pdev);
>  	pci_set_drvdata(pdev, NULL);
>        out_free_ieee80211:
>  	free_ieee80211(priv->net_dev);

Same as last comment, but for out_pci_release_regions and
out_pci_disable_device.

John
--
John W. Linville
linville@tuxdriver.com

* Re: RFC: Reproducible oops with lockdep on count_matching_names()
  2007-11-01 19:49 ` John W. Linville
@ 2007-11-01 21:29   ` Luis R. Rodriguez
  0 siblings, 0 replies; 13+ messages in thread
From: Luis R. Rodriguez @ 2007-11-01 21:29 UTC
  To: John W. Linville
  Cc: linux-wireless, Ingo Molnar, Peter Zijlstra, Johannes Berg

On Thu, Nov 01, 2007 at 03:49:09PM -0400, John W. Linville wrote:
> On Thu, Nov 01, 2007 at 03:17:16PM -0400, Luis R. Rodriguez wrote:
>
> > So I started reviewing the probes on each driver and came up with this
> > patch because Documentation/pci.txt has:
> >
> > "The device driver needs to call pci_request_region() to verify
> > no other device is already using the same address resource.
> > Conversely, drivers should call pci_release_region() AFTER
> > calling pci_disable_device(). The idea is to prevent two devices
> > colliding on the same address range"
>
> No idea off the top of my head if this relates to the problem or not...
>
> > --- a/drivers/net/wireless/ath5k/base.c
> > +++ b/drivers/net/wireless/ath5k/base.c
> > @@ -602,10 +602,10 @@ err_free:
> >  	ieee80211_free_hw(hw);
> >  err_map:
> >  	pci_iounmap(pdev, mem);
> > -err_reg:
> > -	pci_release_region(pdev, 0);
> >  err_dis:
> >  	pci_disable_device(pdev);
> > +err_reg:
> > +	pci_release_region(pdev, 0);
> >  err:
> >  	return ret;
> >  }
>
> If you do this, don't you need to change any "goto err_reg" to "goto
> err_dis" as well?
>
> > --- a/drivers/net/wireless/ipw2200.c
> > +++ b/drivers/net/wireless/ipw2200.c
> > @@ -11756,10 +11756,10 @@ static int ipw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> >  	priv->workqueue = NULL;
> >        out_iounmap:
> >  	iounmap(priv->hw_base);
> > -      out_pci_release_regions:
> > -	pci_release_regions(pdev);
> >        out_pci_disable_device:
> >  	pci_disable_device(pdev);
> > +      out_pci_release_regions:
> > +	pci_release_regions(pdev);
> >  	pci_set_drvdata(pdev, NULL);
> >        out_free_ieee80211:
> >  	free_ieee80211(priv->net_dev);
>
> Same as last comment, but for out_pci_release_regions and
> out_pci_disable_device.

Yeah, you're right, duh, here it is with some more changes. Anyway this
still doesn't fix it but it does address the documentation. Any more ideas?

Changes to base.c Changes-licensed-under: 3-clause-BSD

Signed-off-by: Luis R. Rodriguez <mcgrof@gmail.com>
---

diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
index 15ae868..cea17ce 100644
--- a/drivers/net/wireless/ath5k/base.c
+++ b/drivers/net/wireless/ath5k/base.c
@@ -453,14 +453,15 @@ ath5k_pci_probe(struct pci_dev *pdev,
 	ret = pci_enable_device(pdev);
 	if (ret) {
 		dev_err(&pdev->dev, "can't enable device\n");
-		goto err;
+		return ret;
 	}
 	/* XXX 32-bit addressing only */
 	ret = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
 	if (ret) {
 		dev_err(&pdev->dev, "32-bit DMA not available\n");
-		goto err_dis;
+		pci_disable_device(pdev);
+		return ret;
 	}
 	/*
@@ -498,14 +499,15 @@ ath5k_pci_probe(struct pci_dev *pdev,
 	ret = pci_request_region(pdev, 0, "ath5k");
 	if (ret) {
 		dev_err(&pdev->dev, "cannot reserve PCI memory region\n");
-		goto err_dis;
+		pci_disable_device(pdev);
+		return ret;
 	}
 	mem = pci_iomap(pdev, 0, 0);
 	if (!mem) {
 		dev_err(&pdev->dev, "cannot remap PCI memory region\n") ;
 		ret = -EIO;
-		goto err_reg;
+		goto err_dis;
 	}
 	/*
@@ -602,11 +604,9 @@ err_free:
 	ieee80211_free_hw(hw);
 err_map:
 	pci_iounmap(pdev, mem);
-err_reg:
-	pci_release_region(pdev, 0);
 err_dis:
 	pci_disable_device(pdev);
-err:
+	pci_release_region(pdev, 0);
 	return ret;
 }
@@ -621,8 +621,8 @@ ath5k_pci_remove(struct pci_dev *pdev)
 	free_irq(pdev->irq, sc);
 	pci_disable_msi(pdev);
 	pci_iounmap(pdev, sc->iobase);
-	pci_release_region(pdev, 0);
 	pci_disable_device(pdev);
+	pci_release_region(pdev, 0);
 	ieee80211_free_hw(hw);
 }
diff --git a/drivers/net/wireless/ipw2200.c b/drivers/net/wireless/ipw2200.c
index 54f44e5..7f2ea6d 100644
--- a/drivers/net/wireless/ipw2200.c
+++ b/drivers/net/wireless/ipw2200.c
@@ -11611,8 +11611,7 @@ static int ipw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	net_dev = alloc_ieee80211(sizeof(struct ipw_priv));
 	if (net_dev == NULL) {
-		err = -ENOMEM;
-		goto out;
+		return -ENOMEM;
 	}
 	priv = ieee80211_priv(net_dev);
@@ -11628,8 +11627,8 @@ static int ipw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	mutex_init(&priv->mutex);
 	if (pci_enable_device(pdev)) {
-		err = -ENODEV;
-		goto out_free_ieee80211;
+		free_ieee80211(priv->net_dev);
+		return -ENODEV;
 	}
 	pci_set_master(pdev);
@@ -11639,14 +11638,14 @@ static int ipw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		err = pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK);
 	if (err) {
 		printk(KERN_WARNING DRV_NAME ": No suitable DMA available.\n");
-		goto out_pci_disable_device;
+		goto out_pci_disable_device_end;
 	}
 	pci_set_drvdata(pdev, priv);
 	err = pci_request_regions(pdev, DRV_NAME);
 	if (err)
-		goto out_pci_disable_device;
+		goto out_pci_disable_device_end;
 	/* We disable the RETRY_TIMEOUT register (0x41) to keep
 	 * PCI Tx retries from interfering with C3 CPU state */
@@ -11660,7 +11659,7 @@ static int ipw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	base = ioremap_nocache(pci_resource_start(pdev, 0), length);
 	if (!base) {
 		err = -ENODEV;
-		goto out_pci_release_regions;
+		goto out_pci_disable_device;
 	}
 	priv->hw_base = base;
@@ -11756,14 +11755,15 @@ static int ipw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	priv->workqueue = NULL;
       out_iounmap:
 	iounmap(priv->hw_base);
-      out_pci_release_regions:
-	pci_release_regions(pdev);
       out_pci_disable_device:
 	pci_disable_device(pdev);
+	pci_release_regions(pdev);
 	pci_set_drvdata(pdev, NULL);
-      out_free_ieee80211:
 	free_ieee80211(priv->net_dev);
-      out:
+	return err;
+      out_pci_disable_device_end: /* We don't release regions here */
+	pci_disable_device(pdev);
+	free_ieee80211(priv->net_dev);
 	return err;
 }
@@ -11824,8 +11824,8 @@ static void ipw_pci_remove(struct pci_dev *pdev)
 	free_irq(pdev->irq, priv);
 	iounmap(priv->hw_base);
-	pci_release_regions(pdev);
 	pci_disable_device(pdev);
+	pci_release_regions(pdev);
 	pci_set_drvdata(pdev, NULL);
 	free_ieee80211(priv->net_dev);
 	free_firmware();

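[Editorial sketch, not part of the thread.] John's question about retargeting the gotos comes down to one rule of goto-based unwinding: each label may only undo what had actually been acquired when the jump was taken. The userspace sketch below illustrates that rule. The mock_* functions are hypothetical stand-ins that only record their names; they are not the kernel PCI API, and the label names are chosen for illustration.

```c
#include <string.h>

/* Records the order in which the mock steps ran, e.g. "enable;request;". */
static char trace[128];

static void log_step(const char *step)
{
	strcat(trace, step);
	strcat(trace, ";");
}

/* Hypothetical stand-ins for the PCI helpers; "fail" forces an error. */
static int mock_enable(void)
{
	log_step("enable");
	return 0;
}

static int mock_request_region(int fail)
{
	if (fail)
		return -1;
	log_step("request");
	return 0;
}

static int mock_iomap(int fail)
{
	if (fail)
		return -1;
	log_step("map");
	return 0;
}

static void mock_disable(void)
{
	log_step("disable");
}

static void mock_release_region(void)
{
	log_step("release");
}

/* fail_at: 0 = no failure, 1 = region request fails, 2 = mapping fails. */
int mock_probe(int fail_at)
{
	int ret;

	trace[0] = '\0';

	ret = mock_enable();
	if (ret)
		return ret;

	ret = mock_request_region(fail_at == 1);
	if (ret)
		goto err_dis_only;	/* region never taken: must not release it */

	ret = mock_iomap(fail_at == 2);
	if (ret)
		goto err_dis;

	return 0;

err_dis:
	mock_disable();
	mock_release_region();	/* release AFTER disable, as the quoted pci.txt text asks */
	return ret;
err_dis_only:
	mock_disable();
	return ret;
}

const char *mock_trace(void)
{
	return trace;
}
```

Calling mock_probe(2) and then reading mock_trace() shows the disable step followed by the release step, while mock_probe(1) shows that the region is left alone when it was never requested. This mirrors the separate out_pci_disable_device_end label in the patch above.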
* Re: RFC: Reproducible oops with lockdep on count_matching_names()
  2007-11-01 19:17 RFC: Reproducible oops with lockdep on count_matching_names() Luis R. Rodriguez
  2007-11-01 19:49 ` John W. Linville
@ 2007-11-01 23:26 ` Michael Wu
  2007-11-02 10:58   ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Michael Wu @ 2007-11-01 23:26 UTC
  To: Luis R. Rodriguez
  Cc: linux-wireless, John W. Linville, Ingo Molnar, Peter Zijlstra, Johannes Berg, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1631 bytes --]

On Thursday 01 November 2007 15:17:16 Luis R. Rodriguez wrote:
> mcgrof@pogo:~/devel/wireless-2.6$ git-describe
> v2.6.24-rc1-146-g2280253
>
> So I hit segfault with lockdep on count_matching_names() on the
> strcmp() multiple times now. This is reproducible and with different
> wireless drivers.
>
I've found the problem. It appears to be in lockdep. struct lock_class has a
const char *name field which points to a statically allocated string that
comes from the code which uses the lock. If that code/string is in a module
and gets unloaded, the pointer in |name| is no longer valid. Next time this
field is dereferenced (count_matching_names, in this case), we crash.

The following patch fixes the issue but there's probably a better way.

-Michael Wu
---

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 4c4d236..2aa0d35 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -114,7 +114,7 @@ struct lock_class {
 	 */
 	unsigned long ops;
-	const char *name;
+	char name[128];
 	int name_version;
 #ifdef CONFIG_LOCK_STAT
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 55fe0c7..63c4d8f 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -768,7 +768,7 @@ register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
 	class = lock_classes + nr_lock_classes++;
 	debug_atomic_inc(&nr_unused_locks);
 	class->key = key;
-	class->name = lock->name;
+	strcpy(class->name, lock->name);
 	class->subclass = subclass;
 	INIT_LIST_HEAD(&class->lock_entry);
 	INIT_LIST_HEAD(&class->locks_before);

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 194 bytes --]

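[Editorial sketch, not part of the thread.] The failure mode Michael describes is a borrowed pointer outliving the storage it points into, and his patch swaps the borrow for a private copy. The userspace sketch below contrasts the two. The toy_* types and functions are hypothetical and are not the kernel's lockdep structures; "unload" is simulated by overwriting the string so the example stays well-defined, whereas a real module unload unmaps the memory. The sketch uses a bounded copy, which is a variation on the patch's strcpy() and not what the patch itself does.

```c
#include <stdio.h>
#include <string.h>

#define TOY_NAME_LEN 128	/* same size as name[128] in the patch above */

/* Keeps only a pointer into storage owned by someone else. */
struct toy_class_borrowed {
	const char *name;
};

/* Keeps its own copy, so it does not care what happens to the source. */
struct toy_class_owned {
	char name[TOY_NAME_LEN];
};

void toy_register_borrowed(struct toy_class_borrowed *c, const char *name)
{
	c->name = name;
}

void toy_register_owned(struct toy_class_owned *c, const char *name)
{
	/* snprintf truncates to the buffer and always NUL-terminates. */
	snprintf(c->name, sizeof(c->name), "%s", name);
}

/*
 * Simulate the module going away by scribbling over its string storage.
 * len is the full size of the buffer, including the terminating NUL.
 */
void toy_unload(char *module_storage, size_t len)
{
	if (len == 0)
		return;
	memset(module_storage, 'X', len - 1);
	module_storage[len - 1] = '\0';
}
```

After toy_unload() runs on the source buffer, the borrowed record reads back the scribbled bytes while the owned record still holds the original name. The cost of the copying approach is the fixed 128 bytes per class, which may be part of why Michael suspects there is a better way.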
* Re: RFC: Reproducible oops with lockdep on count_matching_names()
  2007-11-01 23:26 ` Michael Wu
@ 2007-11-02 10:58   ` Peter Zijlstra
  2007-11-03 19:58     ` Luis R. Rodriguez
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2007-11-02 10:58 UTC
  To: Michael Wu
  Cc: Luis R. Rodriguez, linux-wireless, John W. Linville, Ingo Molnar, Johannes Berg, linux-kernel

On Thu, 2007-11-01 at 19:26 -0400, Michael Wu wrote:
> On Thursday 01 November 2007 15:17:16 Luis R. Rodriguez wrote:
> > mcgrof@pogo:~/devel/wireless-2.6$ git-describe
> > v2.6.24-rc1-146-g2280253
> >
> > So I hit segfault with lockdep on count_matching_names() on the
> > strcmp() multiple times now. This is reproducible and with different
> > wireless drivers.
> >
> I've found the problem. It appears to be in lockdep. struct lock_class has a
> const char *name field which points to a statically allocated string that
> comes from the code which uses the lock. If that code/string is in a module
> and gets unloaded, the pointer in |name| is no longer valid. Next time this
> field is dereferenced (count_matching_names, in this case), we crash.
>
> The following patch fixes the issue but there's probably a better way.

Thanks, and indeed. From my understanding lockdep_free_key_range()
should destroy all classes of a module on module unload.

So I'm not quite sure what has gone wrong here..

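[Editorial sketch, not part of the thread.] Peter's expectation is that unloading a module drops every lock class belonging to that module's address range. The toy model below shows only that idea: records keyed by an address, and a sweep that removes every record whose key falls in [start, start + size). All toy_* names are hypothetical, and the flat array is an assumption made for brevity; this is not how the kernel's lockdep_free_key_range() is implemented.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CLASSES 8

struct toy_class {
	uintptr_t key;	/* address identifying the class; 0 marks a free slot */
};

static struct toy_class toy_classes[TOY_MAX_CLASSES];

/* Record a class keyed by an address. Returns 0 on success, -1 if full. */
int toy_add_class(uintptr_t key)
{
	for (size_t i = 0; i < TOY_MAX_CLASSES; i++) {
		if (toy_classes[i].key == 0) {
			toy_classes[i].key = key;
			return 0;
		}
	}
	return -1;
}

/* Drop every class whose key lies in [start, start + size). */
int toy_free_key_range(uintptr_t start, size_t size)
{
	int dropped = 0;

	for (size_t i = 0; i < TOY_MAX_CLASSES; i++) {
		uintptr_t key = toy_classes[i].key;

		if (key != 0 && key >= start && key - start < size) {
			toy_classes[i].key = 0;
			dropped++;
		}
	}
	return dropped;
}

/* Number of classes still recorded. */
int toy_class_count(void)
{
	int n = 0;

	for (size_t i = 0; i < TOY_MAX_CLASSES; i++)
		if (toy_classes[i].key != 0)
			n++;
	return n;
}
```

In this model a sweep is only as good as its reach: any record that the sweep does not visit, or whose key sits outside the range while some other field still points into it, survives with a stale pointer. Whether something like that is what happens in the reported oops is exactly what the thread has not yet established.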
* Re: RFC: Reproducible oops with lockdep on count_matching_names()
  2007-11-02 10:58 ` Peter Zijlstra
@ 2007-11-03 19:58   ` Luis R. Rodriguez
  2007-11-03 20:06     ` Michael Buesch
  0 siblings, 1 reply; 13+ messages in thread
From: Luis R. Rodriguez @ 2007-11-03 19:58 UTC
  To: Peter Zijlstra
  Cc: Michael Wu, linux-wireless, John W. Linville, Ingo Molnar, Johannes Berg, linux-kernel, Michael Chan, netdev, Michael Buesch

On 11/2/07, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2007-11-01 at 19:26 -0400, Michael Wu wrote:
> > On Thursday 01 November 2007 15:17:16 Luis R. Rodriguez wrote:
> > > mcgrof@pogo:~/devel/wireless-2.6$ git-describe
> > > v2.6.24-rc1-146-g2280253
> > >
> > > So I hit segfault with lockdep on count_matching_names() on the
> > > strcmp() multiple times now. This is reproducible and with different
> > > wireless drivers.
> > >
> > I've found the problem. It appears to be in lockdep. struct lock_class has a
> > const char *name field which points to a statically allocated string that
> > comes from the code which uses the lock. If that code/string is in a module
> > and gets unloaded, the pointer in |name| is no longer valid. Next time this
> > field is dereferenced (count_matching_names, in this case), we crash.
> >
> > The following patch fixes the issue but there's probably a better way.
>
> Thanks, and indeed. From my understanding lockdep_free_key_range()
> should destroy all classes of a module on module unload.
>
> So I'm not quite sure what has gone wrong here..

I've tried digging more and just am still not sure what caused this.
At first I thought perhaps all_lock_classes list had some element not
yet removed as lockdep_free_key_range() iterates over the hash tables
but this doesn't seem to be the case.

I was using SLAB and ran into other strange oops, as the one below,
but after switching to SLUB, after Michael Buesch's suggestion that
one went away... The lockdep segfault is still present, however.

Just not sure what's going on. Any ideas?

----- oops with slab, not reproducible with slub:

mcgrof@pogo:~$ sudo rmmod tg3
mcgrof@pogo:~$ sudo rmmod sr_mod

*** dmesg -c

ACPI: PCI interrupt for device 0000:02:00.0 disabled
BUG: unable to handle kernel paging request at virtual address f88a4a05
printing eip: f88a4a05 *pde = 02000067 *pte = 00000000
Oops: 0000 [#1]
Modules linked in: sr_mod uinput thinkpad_acpi hwmon backlight nvram
ipv6 acpi_cpufreq cpufreq_userspace cpufreq_powersave cpufreq_ondemand
cpufreq_conservative dock arc4 ecb blkcipher cryptomgr crypto_algapi
rc80211_simple ath5k mac80211 cfg80211 pcmcia crc32 snd_hda_intel
snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_hwdep snd_seq_oss
ipw2200 snd_seq_midi_event ieee80211 ieee80211_crypt sg ehci_hcd
uhci_hcd yenta_socket rsrc_nonstatic snd_seq snd_timer snd_seq_device
firmware_class cdrom pcmcia_core usbcore evdev rng_core rtc snd
soundcore

Pid: 2908, comm: modprobe Not tainted (2.6.24-rc1 #18)
EIP: 0060:[<f88a4a05>] EFLAGS: 00010086 CPU: 0
EIP is at 0xf88a4a05
EAX: c20b75c8 EBX: c2f86f38 ECX: f88a4a05 EDX: c2f86f38
ESI: c20b75c8 EDI: c2f89c00 EBP: c3897bfc ESP: c3897be0
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process modprobe (pid: 2908, ti=c3896000 task=c3935150 task.ti=c3896000)
Stack: c01b2afc c2f82d98 c3897bf4 c01ba8b6 c2f86f38 c20b75c8 c2f82c00 c3897c24
       c02186dd c2f86f38 c3897c24 c01b54c0 c20b75c8 00000001 c20b75c8 c2f86f38
       c20b75c8 c3897c30 c01b54ed 00000001 c3897c54 c01b556c 00000001 c3897cd4
Call Trace:
 [<c0104cec>] show_trace_log_lvl+0x1a/0x2f
 [<c0104d9e>] show_stack_log_lvl+0x9d/0xa5
 [<c0104e53>] show_registers+0xad/0x17c
 [<c0105017>] die+0xf5/0x1c6
 [<c0112715>] do_page_fault+0x450/0x537
 [<c02a835a>] error_code+0x6a/0x70
 [<c02186dd>] scsi_request_fn+0x5f/0x2ec
 [<c01b54ed>] __generic_unplug_device+0x20/0x23
 [<c01b556c>] blk_execute_rq_nowait+0x7c/0x8f
 [<c01b69e5>] blk_execute_rq+0xb1/0xcf
 [<c0217f53>] scsi_execute+0xc4/0xd7
 [<c0218014>] scsi_execute_req+0xae/0xcb
 [<f885f571>] sr_probe+0x1d5/0x557 [sr_mod]
 [<c020fd33>] driver_probe_device+0xe8/0x168
 [<c020fec9>] __driver_attach+0x6a/0xa1
 [<c020f271>] bus_for_each_dev+0x36/0x5b
 [<c020fb7f>] driver_attach+0x19/0x1b
 [<c020f556>] bus_add_driver+0x73/0x1aa
 [<c02100a5>] driver_register+0x67/0x6c
 [<c021b4f8>] scsi_register_driver+0xf/0x11
 [<f8863023>] init_sr+0x23/0x3d [sr_mod]
 [<c013a461>] sys_init_module+0x1142/0x1262
 [<c0103d7e>] sysenter_past_esp+0x5f/0xa5
 =======================
Code: Bad EIP value.
EIP: [<f88a4a05>] 0xf88a4a05 SS:ESP 0068:c3897be0

  Luis

* Re: RFC: Reproducible oops with lockdep on count_matching_names()
  2007-11-03 19:58 ` Luis R. Rodriguez
@ 2007-11-03 20:06   ` Michael Buesch
  2007-11-05 12:00     ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Buesch @ 2007-11-03 20:06 UTC
  To: Luis R. Rodriguez
  Cc: Peter Zijlstra, Michael Wu, linux-wireless, John W. Linville, Ingo Molnar, Johannes Berg, linux-kernel, Michael Chan, netdev

On Saturday 03 November 2007 20:58:09 Luis R. Rodriguez wrote:
> I was using SLAB and ran into other strange oops, as the one below,
> but after switching to SLUB, after Michael Buesch's suggestion that
> one went away... The lockdep segfault is still present, however.

Who is responsible for slab btw?
I mean, someone should be interested in getting this bug fixed. :)
When using slab I see random corruptions. I think related to rmmod, but
I'm not sure. I don't see this with slub.

--
Greetings Michael.

* Re: RFC: Reproducible oops with lockdep on count_matching_names()
  2007-11-03 20:06 ` Michael Buesch
@ 2007-11-05 12:00   ` Peter Zijlstra
  2007-11-05 12:23     ` Pekka Enberg
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2007-11-05 12:00 UTC
  To: Michael Buesch
  Cc: Luis R. Rodriguez, Michael Wu, linux-wireless, John W. Linville, Ingo Molnar, Johannes Berg, linux-kernel, Michael Chan, netdev, Pekka Enberg, Christoph Lameter

On Sat, 2007-11-03 at 21:06 +0100, Michael Buesch wrote:
> On Saturday 03 November 2007 20:58:09 Luis R. Rodriguez wrote:
> > I was using SLAB and ran into other strange oops, as the one below,
> > but after switching to SLUB, after Michael Buesch's suggestion that
> > one went away... The lockdep segfault is still present, however.
>
> Who is responsible for slab btw?
> I mean, someone should be interested in getting this bug fixed. :)
> When using slab I see random corruptions. I think related to rmmod, but
> I'm not sure. I don't see this with slub.

Pekka and Christoph do most SLAB work.

the snipped oops:

> ----- oops with slab, not reproducible with slub:
>
> mcgrof@pogo:~$ sudo rmmod tg3
> mcgrof@pogo:~$ sudo rmmod sr_mod
>
> *** dmesg -c
>
> ACPI: PCI interrupt for device 0000:02:00.0 disabled
> BUG: unable to handle kernel paging request at virtual address f88a4a05
> printing eip: f88a4a05 *pde = 02000067 *pte = 00000000
> Oops: 0000 [#1]
> Modules linked in: sr_mod uinput thinkpad_acpi hwmon backlight nvram
> ipv6 acpi_cpufreq cpufreq_userspace cpufreq_powersave cpufreq_ondemand
> cpufreq_conservative dock arc4 ecb blkcipher cryptomgr crypto_algapi
> rc80211_simple ath5k mac80211 cfg80211 pcmcia crc32 snd_hda_intel
> snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_hwdep snd_seq_oss
> ipw2200 snd_seq_midi_event ieee80211 ieee80211_crypt sg ehci_hcd
> uhci_hcd yenta_socket rsrc_nonstatic snd_seq snd_timer snd_seq_device
> firmware_class cdrom pcmcia_core usbcore evdev rng_core rtc snd
> soundcore
>
> Pid: 2908, comm: modprobe Not tainted (2.6.24-rc1 #18)
> EIP: 0060:[<f88a4a05>] EFLAGS: 00010086 CPU: 0
> EIP is at 0xf88a4a05
> EAX: c20b75c8 EBX: c2f86f38 ECX: f88a4a05 EDX: c2f86f38
> ESI: c20b75c8 EDI: c2f89c00 EBP: c3897bfc ESP: c3897be0
> DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process modprobe (pid: 2908, ti=c3896000 task=c3935150 task.ti=c3896000)
> Stack: c01b2afc c2f82d98 c3897bf4 c01ba8b6 c2f86f38 c20b75c8 c2f82c00 c3897c24
>        c02186dd c2f86f38 c3897c24 c01b54c0 c20b75c8 00000001 c20b75c8 c2f86f38
>        c20b75c8 c3897c30 c01b54ed 00000001 c3897c54 c01b556c 00000001 c3897cd4
> Call Trace:
>  [<c0104cec>] show_trace_log_lvl+0x1a/0x2f
>  [<c0104d9e>] show_stack_log_lvl+0x9d/0xa5
>  [<c0104e53>] show_registers+0xad/0x17c
>  [<c0105017>] die+0xf5/0x1c6
>  [<c0112715>] do_page_fault+0x450/0x537
>  [<c02a835a>] error_code+0x6a/0x70
>  [<c02186dd>] scsi_request_fn+0x5f/0x2ec
>  [<c01b54ed>] __generic_unplug_device+0x20/0x23
>  [<c01b556c>] blk_execute_rq_nowait+0x7c/0x8f
>  [<c01b69e5>] blk_execute_rq+0xb1/0xcf
>  [<c0217f53>] scsi_execute+0xc4/0xd7
> [<c0218014>] scsi_execute_req+0xae/0xcb > [<f885f571>] sr_probe+0x1d5/0x557 [sr_mod] > [<c020fd33>] driver_probe_device+0xe8/0x168 > [<c020fec9>] __driver_attach+0x6a/0xa1 > [<c020f271>] bus_for_each_dev+0x36/0x5b > [<c020fb7f>] driver_attach+0x19/0x1b > [<c020f556>] bus_add_driver+0x73/0x1aa > [<c02100a5>] driver_register+0x67/0x6c > [<c021b4f8>] scsi_register_driver+0xf/0x11 > [<f8863023>] init_sr+0x23/0x3d [sr_mod] > [<c013a461>] sys_init_module+0x1142/0x1262 > [<c0103d7e>] sysenter_past_esp+0x5f/0xa5 > ======================= > Code: Bad EIP value. > EIP: [<f88a4a05>] 0xf88a4a05 SS:ESP 0068:c3897be0 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: Reproducible oops with lockdep on count_matching_names() 2007-11-05 12:00 ` Peter Zijlstra @ 2007-11-05 12:23 ` Pekka Enberg 2007-11-05 13:03 ` Michael Buesch 0 siblings, 1 reply; 13+ messages in thread From: Pekka Enberg @ 2007-11-05 12:23 UTC (permalink / raw) To: Peter Zijlstra Cc: Michael Buesch, Luis R. Rodriguez, Michael Wu, linux-wireless, John W. Linville, Ingo Molnar, Johannes Berg, linux-kernel, Michael Chan, netdev, Christoph Lameter Hi Michael, On Sat, 2007-11-03 at 21:06 +0100, Michael Buesch wrote: > Who is responsible for slab btw? > I mean, someone should be interested in getting this bug fixed. :) > When using slab I see random corruptions. I think related to rmmod, but > I'm not sure. I don't see this with slub. Is CONFIG_DEBUG_SLAB enabled? Usually these kind of random corruptions are caused by someone passing a bad pointer to kfree() or kmem_cache_free(). Pekka ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: Reproducible oops with lockdep on count_matching_names() 2007-11-05 12:23 ` Pekka Enberg @ 2007-11-05 13:03 ` Michael Buesch 2007-11-05 13:56 ` Pekka Enberg 0 siblings, 1 reply; 13+ messages in thread From: Michael Buesch @ 2007-11-05 13:03 UTC (permalink / raw) To: Pekka Enberg Cc: Peter Zijlstra, Luis R. Rodriguez, Michael Wu, linux-wireless, John W. Linville, Ingo Molnar, Johannes Berg, linux-kernel, Michael Chan, netdev, Christoph Lameter On Monday 05 November 2007 13:23:50 Pekka Enberg wrote: > Hi Michael, > > On Sat, 2007-11-03 at 21:06 +0100, Michael Buesch wrote: > > Who is responsible for slab btw? > > I mean, someone should be interested in getting this bug fixed. :) > > When using slab I see random corruptions. I think related to rmmod, but > > I'm not sure. I don't see this with slub. > > Is CONFIG_DEBUG_SLAB enabled? Usually these kind of random corruptions > are caused by someone passing a bad pointer to kfree() or > kmem_cache_free(). Yeah. What I also saw was random "one-bit-errors" once and then on rmmod of modules. I have absolutely no idea how they were caused, though (I read the freeing codes of the stuff hundreds of times). I don't have any of the oops messages anymore. But I do _not_ see this behaviour with slub anymore. -- Greetings Michael. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: Reproducible oops with lockdep on count_matching_names() 2007-11-05 13:03 ` Michael Buesch @ 2007-11-05 13:56 ` Pekka Enberg 2007-11-05 14:26 ` Michael Buesch 0 siblings, 1 reply; 13+ messages in thread From: Pekka Enberg @ 2007-11-05 13:56 UTC (permalink / raw) To: Michael Buesch Cc: Peter Zijlstra, Luis R. Rodriguez, Michael Wu, linux-wireless, John W. Linville, Ingo Molnar, Johannes Berg, linux-kernel, Michael Chan, netdev, Christoph Lameter Hi Michael, On Monday 05 November 2007 13:23:50 Pekka Enberg wrote: > > Is CONFIG_DEBUG_SLAB enabled? Usually these kind of random corruptions > > are caused by someone passing a bad pointer to kfree() or > > kmem_cache_free(). On 11/5/07, Michael Buesch <mb@bu3sch.de> wrote: > Yeah. > > What I also saw was random "one-bit-errors" once and then on rmmod of modules. > I have absolutely no idea how they were caused, though (I read the freeing > codes of the stuff hundreds of times). I don't have any of the oops messages > anymore. > But I do _not_ see this behaviour with slub anymore. It is possible that the corruption is still there but SLUB doesn't show it. Have you tried with slub_debug enabled? 
Anyway, looking at the oops: > BUG: unable to handle kernel paging request at virtual address f88a4a05 > printing eip: f88a4a05 *pde = 02000067 *pte = 00000000 > > EIP: 0060:[<f88a4a05>] EFLAGS: 00010086 CPU: 0 > EIP is at 0xf88a4a05 > EAX: c20b75c8 EBX: c2f86f38 ECX: f88a4a05 EDX: c2f86f38 > ESI: c20b75c8 EDI: c2f89c00 EBP: c3897bfc ESP: c3897be0 > DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 > Process modprobe (pid: 2908, ti=c3896000 task=c3935150 task.ti=c3896000) > Stack: c01b2afc c2f82d98 c3897bf4 c01ba8b6 c2f86f38 c20b75c8 c2f82c00 c3897c24 > c02186dd c2f86f38 c3897c24 c01b54c0 c20b75c8 00000001 c20b75c8 c2f86f38 > c20b75c8 c3897c30 c01b54ed 00000001 c3897c54 c01b556c 00000001 c3897cd4 > Call Trace: > [<c0104cec>] show_trace_log_lvl+0x1a/0x2f > [<c0104d9e>] show_stack_log_lvl+0x9d/0xa5 > [<c0104e53>] show_registers+0xad/0x17c > [<c0105017>] die+0xf5/0x1c6 > [<c0112715>] do_page_fault+0x450/0x537 > [<c02a835a>] error_code+0x6a/0x70 > [<c02186dd>] scsi_request_fn+0x5f/0x2ec > [<c01b54ed>] __generic_unplug_device+0x20/0x23 We jump to a bogus address 0xf88a4a05 via a function pointer from scsi_request_fn(). Can you work out the exact file and line for scsi_request_fn+0x5f (look for "gdb vmlinux" in Documentation/BUG-HUNTING) please? Pekka ^ permalink raw reply [flat|nested] 13+ messages in thread
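The lookup Pekka asks for starts from the `symbol+0xoffset/0xsize` form of a call-trace entry. As a minimal sketch (not part of the thread), the following pulls those fields out of a trace line and builds the expression to type at the `(gdb)` prompt after running `gdb vmlinux`. The helper names `parse_trace_entry` and `gdb_list_command` are hypothetical, and the sketch assumes a vmlinux built with debug info from the same tree as the oopsing kernel.

```python
import re

# Matches a call-trace entry such as "[<c02186dd>] scsi_request_fn+0x5f/0x2ec".
# A trailing module tag like "[sr_mod]" is simply ignored by search().
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_trace_entry(line):
    """Return address, symbol, offset and size of a trace entry, or None."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "symbol": m.group("sym"),
        "offset": int(m.group("off"), 16),
        "size": int(m.group("size"), 16),
    }


def gdb_list_command(entry):
    """Build the gdb command that lists the source around symbol+offset."""
    return "list *({}+{:#x})".format(entry["symbol"], entry["offset"])


if __name__ == "__main__":
    entry = parse_trace_entry("[<c02186dd>] scsi_request_fn+0x5f/0x2ec")
    print(gdb_list_command(entry))
```

For the entry in question this yields `list *(scsi_request_fn+0x5f)`. Symbols that live in a module (for example `sr_probe` in `sr_mod`) are not in vmlinux, so the same command would have to be run against the module's object file instead.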
* Re: RFC: Reproducible oops with lockdep on count_matching_names() 2007-11-05 13:56 ` Pekka Enberg @ 2007-11-05 14:26 ` Michael Buesch 2007-11-05 18:47 ` Christoph Lameter 0 siblings, 1 reply; 13+ messages in thread From: Michael Buesch @ 2007-11-05 14:26 UTC (permalink / raw) To: Pekka Enberg Cc: Peter Zijlstra, Luis R. Rodriguez, Michael Wu, linux-wireless, John W. Linville, Ingo Molnar, Johannes Berg, linux-kernel, Michael Chan, netdev, Christoph Lameter On Monday 05 November 2007 14:56:22 Pekka Enberg wrote: > Hi Michael, > > On Monday 05 November 2007 13:23:50 Pekka Enberg wrote: > > > Is CONFIG_DEBUG_SLAB enabled? Usually these kind of random corruptions > > > are caused by someone passing a bad pointer to kfree() or > > > kmem_cache_free(). > > On 11/5/07, Michael Buesch <mb@bu3sch.de> wrote: > > Yeah. > > > > What I also saw was random "one-bit-errors" once and then on rmmod of modules. > > I have absolutely no idea how they were caused, though (I read the freeing > > codes of the stuff hundreds of times). I don't have any of the oops messages > > anymore. > > But I do _not_ see this behaviour with slub anymore. > > It is possible that the corruption is still there but SLUB doesn't > show it. Have you tried with slub_debug enabled? Hm, I don't really remember. Though, I usually have almost all kernel-hacking options enabled. I'll check and enable some more. 
> > BUG: unable to handle kernel paging request at virtual address f88a4a05 > > printing eip: f88a4a05 *pde = 02000067 *pte = 00000000 > > > > EIP: 0060:[<f88a4a05>] EFLAGS: 00010086 CPU: 0 > > EIP is at 0xf88a4a05 > > EAX: c20b75c8 EBX: c2f86f38 ECX: f88a4a05 EDX: c2f86f38 > > ESI: c20b75c8 EDI: c2f89c00 EBP: c3897bfc ESP: c3897be0 > > DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 > > Process modprobe (pid: 2908, ti=c3896000 task=c3935150 task.ti=c3896000) > > Stack: c01b2afc c2f82d98 c3897bf4 c01ba8b6 c2f86f38 c20b75c8 c2f82c00 c3897c24 > > c02186dd c2f86f38 c3897c24 c01b54c0 c20b75c8 00000001 c20b75c8 c2f86f38 > > c20b75c8 c3897c30 c01b54ed 00000001 c3897c54 c01b556c 00000001 c3897cd4 > > Call Trace: > > [<c0104cec>] show_trace_log_lvl+0x1a/0x2f > > [<c0104d9e>] show_stack_log_lvl+0x9d/0xa5 > > [<c0104e53>] show_registers+0xad/0x17c > > [<c0105017>] die+0xf5/0x1c6 > > [<c0112715>] do_page_fault+0x450/0x537 > > [<c02a835a>] error_code+0x6a/0x70 > > [<c02186dd>] scsi_request_fn+0x5f/0x2ec > > [<c01b54ed>] __generic_unplug_device+0x20/0x23 > > We jump to a bogus address 0xf88a4a05 via a function pointer from > scsi_request_fn(). Can you work out the exact file and line for > scsi_request_fn+0x5f (look for "gdb vmlinux" in > Documentation/BUG-HUNTING) please? That'd be Luis' task then :) -- Greetings Michael. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: RFC: Reproducible oops with lockdep on count_matching_names() 2007-11-05 14:26 ` Michael Buesch @ 2007-11-05 18:47 ` Christoph Lameter 0 siblings, 0 replies; 13+ messages in thread From: Christoph Lameter @ 2007-11-05 18:47 UTC (permalink / raw) To: Michael Buesch Cc: Pekka Enberg, Peter Zijlstra, Luis R. Rodriguez, Michael Wu, linux-wireless, John W. Linville, Ingo Molnar, Johannes Berg, linux-kernel, Michael Chan, netdev On Mon, 5 Nov 2007, Michael Buesch wrote: > Hm, I don't really remember. Though, I usually have almost all kernel-hacking > options enabled. > I'll check and enable some more. slub_debug must be specified on the kernel command line. Alternatively, switch on CONFIG_SLUB_DEBUG_ON in the .config to force it to be always on. ^ permalink raw reply [flat|nested] 13+ messages in thread
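The two ways of turning on SLUB debugging that Christoph mentions can be sketched as the fragment below. It is illustrative only and not taken from the thread: the flag letters after `slub_debug=` are an assumption based on the kernel's SLUB documentation (F sanity checks, Z red zoning, P poisoning, U user tracking), and where the command-line option goes depends on the boot loader in use.

```
# Option 1: kernel command line (appended to the boot loader entry)
slub_debug
# or, with explicit flags (assumed from the SLUB documentation):
slub_debug=FZPU

# Option 2: .config, forces SLUB debugging on at every boot
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
```

The command-line form leaves the default kernel unchanged and only pays the debugging cost on boots where it is requested, while the .config form is convenient on a dedicated test kernel.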
end of thread, other threads:[~2007-11-05 18:47 UTC | newest] Thread overview: 13+ messages 2007-11-01 19:17 RFC: Reproducible oops with lockdep on count_matching_names() Luis R. Rodriguez 2007-11-01 19:49 ` John W. Linville 2007-11-01 21:29 ` Luis R. Rodriguez 2007-11-01 23:26 ` Michael Wu 2007-11-02 10:58 ` Peter Zijlstra 2007-11-03 19:58 ` Luis R. Rodriguez 2007-11-03 20:06 ` Michael Buesch 2007-11-05 12:00 ` Peter Zijlstra 2007-11-05 12:23 ` Pekka Enberg 2007-11-05 13:03 ` Michael Buesch 2007-11-05 13:56 ` Pekka Enberg 2007-11-05 14:26 ` Michael Buesch 2007-11-05 18:47 ` Christoph Lameter