BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

public inbox for linux-kernel@vger.kernel.org

* BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
@ 2013-06-14 12:49 nirinA raseliarison
  2013-06-14 14:30 ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: nirinA raseliarison @ 2013-06-14 12:49 UTC (permalink / raw)
  To: linux-kernel

hello there,
i have this ethernet controller:

  Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 05)

that uses the r8169 module.
it works fine, but sometimes after a reboot and issuing:

  ifconfig eth0 192.168.1.1 up

i get the message below. after another reboot the
message disappears. i also get the same message with 3.9.5 and 3.9.4.

it seems i caught my first oops and don't know what to do with it.
currently running:

  cat /proc/version
  Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1 SMP Fri Jun 14 09:14:50 EAT 2013

  uname -a
  Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64 Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux

thanks,
-----------------8<------------------------------8<---------------------------------------

[   57.877560] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[   57.877603] IP: [<ffffffff81491844>] fw_load_abort.isra.5+0x4/0x20
[   57.877634] PGD 21330a067 PUD 211a3a067 PMD 0
[   57.877660] Oops: 0002 [#1] SMP
[   57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169 microcode mii
[   57.877735] CPU 0
[   57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be filled by O.E.M. To be filled by O.E.M./ONDA H61V Ver:4.01
[   57.877790] RIP: 0010:[<ffffffff81491844>]  [<ffffffff81491844>] fw_load_abort.isra.5+0x4/0x20
[   57.877824] RSP: 0018:ffff8802119a7e80  EFLAGS: 00010246
[   57.877844] RAX: ffff8802158fe250 RBX: ffff880211a03b40 RCX: 0000000000000000
[   57.877869] RDX: ffffffff81c742c8 RSI: ffff8802158fe250 RDI: 0000000000000000
[   57.877895] RBP: ffff8802119a7e80 R08: ffff8802119a6000 R09: 00000000000005aa
[   57.877920] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffff
[   57.877945] R13: ffff880213d34088 R14: 0000000000000003 R15: ffff88020eafc230
[   57.877970] FS:  00007f3c6cb2a740(0000) GS:ffff88021f200000(0000) knlGS:0000000000000000
[   57.877998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   57.878019] CR2: 0000000000000040 CR3: 0000000203155000 CR4: 00000000001407f0
[   57.878044] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   57.878069] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   57.878094] Process firmware (pid: 1950, threadinfo ffff8802119a6000, task ffff8802158fe250)
[   57.878124] Stack:
[   57.878133]  ffff8802119a7eb0 ffffffff81491917 ffff880211a4d5a0 0000000000000003
[   57.878168]  ffff8802119a7f50 ffffffff818765a0 ffff8802119a7ec0 ffffffff81483063
[   57.878203]  ffff8802119a7f08 ffffffff8119bc9e ffff880213d34098 ffff880211a4d5c0
[   57.878237] Call Trace:
[   57.878251]  [<ffffffff81491917>] firmware_loading_store+0x77/0x150
[   57.878275]  [<ffffffff81483063>] dev_attr_store+0x13/0x20
[   57.878297]  [<ffffffff8119bc9e>] sysfs_write_file+0xce/0x140
[   57.878320]  [<ffffffff81133e8a>] vfs_write+0x9a/0x160
[   57.878340]  [<ffffffff81134164>] sys_write+0x44/0x90
[   57.878360]  [<ffffffff817d70ed>] system_call_fastpath+0x1a/0x1f
[   57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5 <f0> 80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f
[   57.881753] RIP  [<ffffffff81491844>] fw_load_abort.isra.5+0x4/0x20
[   57.882888]  RSP <ffff8802119a7e80>
[   57.884019] CR2: 0000000000000040
[   57.885166] ---[ end trace 6705f6d4ce6b6a12 ]---

-- 
nirinA


* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  2013-06-14 12:49 BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 nirinA raseliarison
@ 2013-06-14 14:30 ` Bjorn Helgaas
  2013-06-14 15:45   ` Guenter Roeck
  2013-06-14 17:02   ` Ming Lei
  0 siblings, 2 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2013-06-14 14:30 UTC (permalink / raw)
  To: nirinA raseliarison
  Cc: linux-kernel@vger.kernel.org, Francois Romieu, nic_swsd, Ming Lei,
	Hayes Wang

[+cc Ming, Hayes, Francois, r8169 list]



* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  2013-06-14 14:30 ` Bjorn Helgaas
@ 2013-06-14 15:45   ` Guenter Roeck
  2013-06-14 17:07     ` nirinA raseliarison
  2013-06-14 17:02   ` Ming Lei
  1 sibling, 1 reply; 10+ messages in thread
From: Guenter Roeck @ 2013-06-14 15:45 UTC (permalink / raw)
  To: nirinA raseliarison
  Cc: linux-kernel@vger.kernel.org, Francois Romieu, nic_swsd, Ming Lei,
	Hayes Wang

Please try the following patch.

[ Bjorn, sorry I dropped you from the recipient list, but unfortunately
Google still considers me to be a spammer and doesn't let me send any
e-mail to you ]

Guenter

----------

From 9feae0b1b33721573c41fbf2323db2a12c34c725 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Fri, 14 Jun 2013 08:39:06 -0700
Subject: [PATCH] firmware: Fix race condition in firmware_loading_store

Fix:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffff81491844>] fw_load_abort.isra.5+0x4/0x20
...
Call Trace:
[<ffffffff81491917>] firmware_loading_store+0x77/0x150
[<ffffffff81483063>] dev_attr_store+0x13/0x20
[<ffffffff8119bc9e>] sysfs_write_file+0xce/0x140
[<ffffffff81133e8a>] vfs_write+0x9a/0x160
[<ffffffff81134164>] sys_write+0x44/0x90
[<ffffffff817d70ed>] system_call_fastpath+0x1a/0x1f

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 drivers/base/firmware_class.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4b1f926..f34b489 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -570,12 +570,13 @@ static ssize_t firmware_loading_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	struct firmware_buf *fw_buf = fw_priv->buf;
 	int loading = simple_strtol(buf, NULL, 10);
+	struct firmware_buf *fw_buf;
 	int i;
 
 	mutex_lock(&fw_lock);
 
+	fw_buf = fw_priv->buf;
 	if (!fw_buf)
 		goto out;
 
-- 
1.7.9.7



* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  2013-06-14 15:45   ` Guenter Roeck
@ 2013-06-14 17:07     ` nirinA raseliarison
  2013-06-15  2:32       ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: nirinA raseliarison @ 2013-06-14 17:07 UTC (permalink / raw)
  To: nirinA raseliarison, Guenter Roeck
  Cc: linux-kernel@vger.kernel.org, Francois Romieu, nic_swsd, Ming Lei,
	Hayes Wang

on Fri, 14 Jun 2013 18:45:48 +0300, Guenter Roeck <linux@roeck-us.net>  
wrote:

> Please try the following patch.

patch applied and no longer have the bug message when i
reboot and wake up the ethernet controller.

thanks,


-- 
nirinA


* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  2013-06-14 17:07     ` nirinA raseliarison
@ 2013-06-15  2:32       ` Ming Lei
  2013-06-15  6:30         ` Guenter Roeck
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2013-06-15  2:32 UTC (permalink / raw)
  To: nirinA raseliarison
  Cc: Guenter Roeck, linux-kernel@vger.kernel.org, Francois Romieu,
	nic_swsd, Hayes Wang

On Sat, Jun 15, 2013 at 1:07 AM, nirinA raseliarison
<nirina.raseliarison@gmail.com> wrote:

> patch applied and no longer have the bug message when i
> reboot and wake up the ethernet controller.

I am not sure Guenter's patch really fixes the race, but I'd like to
see Guenter's explanation of his patch.

The race is probably caused by the following sequence:

- the request timeout is triggered by the internal timer

- user space aborts the request before the timeout path reaches this
  line in _request_firmware_load():

             fw_priv->buf = NULL

- the abort called from firmware_loading_store() may then use a freed
  fw buf, since the timeout path frees the fw buffer.

Clearing 'fw_priv->buf' in _request_firmware_load() isn't protected
by fw_lock at the moment, so Guenter's patch can't avoid the race entirely.

Thanks,
--
Ming Lei


* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  2013-06-15  2:32       ` Ming Lei
@ 2013-06-15  6:30         ` Guenter Roeck
  2013-06-15  8:08           ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Guenter Roeck @ 2013-06-15  6:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: nirinA raseliarison, linux-kernel@vger.kernel.org,
	Francois Romieu, nic_swsd, Hayes Wang

I agree; my patch only protects one specific path, and was based on the
observation that access to fw_priv->buf is protected elsewhere in the code.
My suspicion was that fw_priv->buf was freed while waiting for the mutex in
firmware_loading_store().

Your patch is more comprehensive.

Guenter


* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  2013-06-15  6:30         ` Guenter Roeck
@ 2013-06-15  8:08           ` Ming Lei
  2013-06-15 16:43             ` nirinA raseliarison
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2013-06-15  8:08 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: nirinA raseliarison, linux-kernel@vger.kernel.org,
	Francois Romieu, nic_swsd, Hayes Wang

OK, thanks for your reply.

I will post a version for merging; that one moves the
"fw_priv->buf = NULL;" assignment into fw_load_abort() to simplify the change.


Thanks,
--
Ming Lei


* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  2013-06-15  8:08           ` Ming Lei
@ 2013-06-15 16:43             ` nirinA raseliarison
  0 siblings, 0 replies; 10+ messages in thread
From: nirinA raseliarison @ 2013-06-15 16:43 UTC (permalink / raw)
  To: Guenter Roeck, Ming Lei
  Cc: nirinA raseliarison, linux-kernel@vger.kernel.org,
	Francois Romieu, nic_swsd, Hayes Wang

this is just to let you know that i've tested Ming Lei's latest patch.
thank you very much for the fix and the explanation.

-- 
nirinA


* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  2013-06-14 14:30 ` Bjorn Helgaas
  2013-06-14 15:45   ` Guenter Roeck
@ 2013-06-14 17:02   ` Ming Lei
  2013-06-14 18:32     ` nirinA raseliarison
  1 sibling, 1 reply; 10+ messages in thread
From: Ming Lei @ 2013-06-14 17:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: nirinA raseliarison, linux-kernel@vger.kernel.org,
	Francois Romieu, nic_swsd, Hayes Wang, Guenter Roeck

[-- Attachment #1: Type: text/plain, Size: 6227 bytes --]

On Fri, Jun 14, 2013 at 10:30 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Ming, Hayes, Francois, r8169 list]
>
> On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison
> <nirina.raseliarison@gmail.com> wrote:
>> hello there,
>> i have this ethernet controler:
>>
>>  Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet
>> controller (rev 05)
>>
>> that uses the r8169 module.
>> it works fine, but sometimes after a reboot and issueing:
>>
>>  ifconfig eth0 192.168.1.1 up
>>
>> i got the message below. after another reboot the
>> message disappears. i also get the same message this 3.9.5 and 3.9.4.
>>
>> it seems i catch my first oops and don't know what to do with it.
>> currently running:
>>
>>  cat /proc/version
>>  Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1 (GCC) ) #1
>> SMP Fri Jun 14 09:14:50 EAT 2013
>>
>>  uname -a
>>  Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013 x86_64
>> Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux
>>
>> thanks,
>> -----------------8<------------------------------8<---------------------------------------
>>
>> [   57.877560] BUG: unable to handle kernel NULL pointer dereference at
>> 0000000000000040
>> [   57.877603] IP: [<ffffffff81491844>] fw_load_abort.isra.5+0x4/0x20
>> [   57.877634] PGD 21330a067 PUD 211a3a067 PMD 0
>> [   57.877660] Oops: 0002 [#1] SMP
>> [   57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev r8169
>> microcode mii
>> [   57.877735] CPU 0
>> [   57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1 To be
>> filled by O.E.M. To be filled by O.E.M./ONDA H61V Ver:4.01
>> [   57.877790] RIP: 0010:[<ffffffff81491844>]  [<ffffffff81491844>]
>> fw_load_abort.isra.5+0x4/0x20
>> [   57.877824] RSP: 0018:ffff8802119a7e80  EFLAGS: 00010246
>> [   57.877844] RAX: ffff8802158fe250 RBX: ffff880211a03b40 RCX:
>> 0000000000000000
>> [   57.877869] RDX: ffffffff81c742c8 RSI: ffff8802158fe250 RDI:
>> 0000000000000000
>> [   57.877895] RBP: ffff8802119a7e80 R08: ffff8802119a6000 R09:
>> 00000000000005aa
>> [   57.877920] R10: 0000000000000000 R11: 0000000000000000 R12:
>> ffffffffffffffff
>> [   57.877945] R13: ffff880213d34088 R14: 0000000000000003 R15:
>> ffff88020eafc230
>> [   57.877970] FS:  00007f3c6cb2a740(0000) GS:ffff88021f200000(0000)
>> knlGS:0000000000000000
>> [   57.877998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   57.878019] CR2: 0000000000000040 CR3: 0000000203155000 CR4:
>> 00000000001407f0
>> [   57.878044] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [   57.878069] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [   57.878094] Process firmware (pid: 1950, threadinfo ffff8802119a6000,
>> task ffff8802158fe250)
>> [   57.878124] Stack:
>> [   57.878133]  ffff8802119a7eb0 ffffffff81491917 ffff880211a4d5a0
>> 0000000000000003
>> [   57.878168]  ffff8802119a7f50 ffffffff818765a0 ffff8802119a7ec0
>> ffffffff81483063
>> [   57.878203]  ffff8802119a7f08 ffffffff8119bc9e ffff880213d34098
>> ffff880211a4d5c0
>> [   57.878237] Call Trace:
>> [   57.878251]  [<ffffffff81491917>] firmware_loading_store+0x77/0x150
>> [   57.878275]  [<ffffffff81483063>] dev_attr_store+0x13/0x20
>> [   57.878297]  [<ffffffff8119bc9e>] sysfs_write_file+0xce/0x140
>> [   57.878320]  [<ffffffff81133e8a>] vfs_write+0x9a/0x160
>> [   57.878340]  [<ffffffff81134164>] sys_write+0x44/0x90
>> [   57.878360]  [<ffffffff817d70ed>] system_call_fastpath+0x1a/0x1f
>> [   57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79 ff ff
>> ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48 89 e5
>> <f0> 80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f
>> [   57.881753] RIP  [<ffffffff81491844>] fw_load_abort.isra.5+0x4/0x20
>> [   57.882888]  RSP <ffff8802119a7e80>
>> [   57.884019] CR2: 0000000000000040
>> [   57.885166] ---[ end trace 6705f6d4ce6b6a12 ]---
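
A note on reading the trace quoted above: the faulting instruction and address are consistent with a write through a NULL structure pointer. The marked bytes in the Code: line, f0 80 4f 40 04, decode to "lock orb $0x4,0x40(%rdi)". RDI is 0, CR2 is 0x40, and the error code 0002 indicates a kernel-mode write to a page that is not present. The sketch below only illustrates that arithmetic. The struct layout (a status field at offset 0x40) is a hypothetical stand-in chosen to match the trace, not the real struct firmware_buf, and fault_address() is an illustrative helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: 0x40 bytes of other members, then a status word.
 * This is an assumption made for illustration only.
 */
struct fake_buf {
	char other_fields[0x40];
	unsigned long status;
};

/* Compile-time check that the assumed layout puts status at 0x40. */
_Static_assert(offsetof(struct fake_buf, status) == 0x40,
	       "status is expected at offset 0x40 in this sketch");

/*
 * Address the CPU would touch when code accesses base->status.
 * With base == 0 (a NULL pointer) this is just the member offset.
 */
uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct fake_buf, status);
}
```

With a base of 0, fault_address() yields 0x40, which is the value the oops reports in CR2.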

Looks like a double abort race; could you try the patch below?
(also attached for applying)

--
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 6ede229..a217ba8 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -550,7 +550,12 @@ static ssize_t firmware_loading_show(struct device *dev,
 				     struct device_attribute *attr, char *buf)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	int loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	int loading = 0;
+
+	mutex_lock(&fw_lock);
+	if (fw_priv->buf)
+		loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	mutex_unlock(&fw_lock);

 	return sprintf(buf, "%d\n", loading);
 }
@@ -592,12 +597,12 @@ static ssize_t firmware_loading_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	struct firmware_buf *fw_buf = fw_priv->buf;
+	struct firmware_buf *fw_buf;
 	int loading = simple_strtol(buf, NULL, 10);
 	int i;

 	mutex_lock(&fw_lock);
-
+	fw_buf = fw_priv->buf;
 	if (!fw_buf)
 		goto out;

@@ -636,6 +641,7 @@ static ssize_t firmware_loading_store(struct device *dev,
 		/* fallthrough */
 	case -1:
 		fw_load_abort(fw_buf);
+		fw_priv->buf = NULL;
 		break;
 	}
 out:
@@ -704,6 +710,7 @@ static int fw_realloc_buffer(struct firmware_priv *fw_priv, int min_size)
 				    GFP_KERNEL);
 		if (!new_pages) {
 			fw_load_abort(buf);
+			fw_priv->buf = NULL;
 			return -ENOMEM;
 		}
 		memcpy(new_pages, buf->pages,
@@ -721,6 +728,7 @@ static int fw_realloc_buffer(struct firmware_priv *fw_priv, int min_size)

 		if (!buf->pages[buf->nr_pages]) {
 			fw_load_abort(buf);
+			fw_priv->buf = NULL;
 			return -ENOMEM;
 		}
 		buf->nr_pages++;
@@ -805,6 +813,7 @@ static void firmware_class_timeout_work(struct work_struct *work)
 		return;
 	}
 	fw_load_abort(fw_priv->buf);
+	fw_priv->buf = NULL;
 	mutex_unlock(&fw_lock);
 }

@@ -886,8 +895,6 @@ static int _request_firmware_load(struct firmware_priv *fw_priv, bool uevent,

 	cancel_delayed_work_sync(&fw_priv->timeout_work);

-	fw_priv->buf = NULL;
-
 	device_remove_file(f_dev, &dev_attr_loading);
 err_del_bin_attr:
 	device_remove_bin_file(f_dev, &firmware_attr_data);


Thanks,
--
Ming Lei
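
The idea behind the patch can be sketched in user space: every abort path takes the lock, checks the buffer pointer, aborts, and then clears the pointer, so a second abort finds NULL and does nothing. This is a minimal sketch under stated assumptions, not the kernel code. struct buf, struct priv, load_abort(), abort_path() and run_double_abort() are hypothetical stand-ins, and a pthread mutex plays the role of fw_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for firmware_buf / firmware_priv. */
struct buf { unsigned long status; };
struct priv { struct buf *buf; };

#define STATUS_ABORT_BIT 2UL	/* assumed bit number, illustration only */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int abort_calls;		/* protected by lock */

/* Marks the buffer aborted; a NULL buffer here models the oops. */
static void load_abort(struct buf *b)
{
	assert(b != NULL);
	b->status |= 1UL << STATUS_ABORT_BIT;
	abort_calls++;
}

/* One abort path, e.g. a timeout or a write of -1 to "loading". */
static void *abort_path(void *arg)
{
	struct priv *p = arg;

	pthread_mutex_lock(&lock);
	if (p->buf) {			/* re-read the pointer under the lock */
		load_abort(p->buf);
		p->buf = NULL;		/* later aborts become no-ops */
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Runs two abort paths concurrently; returns how many actually aborted. */
int run_double_abort(void)
{
	struct buf b = { 0 };
	struct priv p = { &b };
	pthread_t t1, t2;

	abort_calls = 0;
	pthread_create(&t1, NULL, abort_path, &p);
	pthread_create(&t2, NULL, abort_path, &p);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return abort_calls;
}
```

Whichever thread wins the lock performs the abort and the other sees a NULL pointer, so run_double_abort() returns 1. Reading the pointer before taking the lock, as the unpatched firmware_loading_store() does, would leave a window in which a stale or cleared pointer is used.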

[-- Attachment #2: fw-double-abort.patch --]
[-- Type: application/octet-stream, Size: 2170 bytes --]

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 6ede229..a217ba8 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -550,7 +550,12 @@ static ssize_t firmware_loading_show(struct device *dev,
 				     struct device_attribute *attr, char *buf)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	int loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	int loading = 0;
+
+	mutex_lock(&fw_lock);
+	if (fw_priv->buf)
+		loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
+	mutex_unlock(&fw_lock);
 
 	return sprintf(buf, "%d\n", loading);
 }
@@ -592,12 +597,12 @@ static ssize_t firmware_loading_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	struct firmware_priv *fw_priv = to_firmware_priv(dev);
-	struct firmware_buf *fw_buf = fw_priv->buf;
+	struct firmware_buf *fw_buf;
 	int loading = simple_strtol(buf, NULL, 10);
 	int i;
 
 	mutex_lock(&fw_lock);
-
+	fw_buf = fw_priv->buf;
 	if (!fw_buf)
 		goto out;
 
@@ -636,6 +641,7 @@ static ssize_t firmware_loading_store(struct device *dev,
 		/* fallthrough */
 	case -1:
 		fw_load_abort(fw_buf);
+		fw_priv->buf = NULL;
 		break;
 	}
 out:
@@ -704,6 +710,7 @@ static int fw_realloc_buffer(struct firmware_priv *fw_priv, int min_size)
 				    GFP_KERNEL);
 		if (!new_pages) {
 			fw_load_abort(buf);
+			fw_priv->buf = NULL;
 			return -ENOMEM;
 		}
 		memcpy(new_pages, buf->pages,
@@ -721,6 +728,7 @@ static int fw_realloc_buffer(struct firmware_priv *fw_priv, int min_size)
 
 		if (!buf->pages[buf->nr_pages]) {
 			fw_load_abort(buf);
+			fw_priv->buf = NULL;
 			return -ENOMEM;
 		}
 		buf->nr_pages++;
@@ -805,6 +813,7 @@ static void firmware_class_timeout_work(struct work_struct *work)
 		return;
 	}
 	fw_load_abort(fw_priv->buf);
+	fw_priv->buf = NULL;
 	mutex_unlock(&fw_lock);
 }
 
@@ -886,8 +895,6 @@ static int _request_firmware_load(struct firmware_priv *fw_priv, bool uevent,
 
 	cancel_delayed_work_sync(&fw_priv->timeout_work);
 
-	fw_priv->buf = NULL;
-
 	device_remove_file(f_dev, &dev_attr_loading);
 err_del_bin_attr:
 	device_remove_bin_file(f_dev, &firmware_attr_data);

* Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  2013-06-14 17:02   ` Ming Lei
@ 2013-06-14 18:32     ` nirinA raseliarison
  0 siblings, 0 replies; 10+ messages in thread
From: nirinA raseliarison @ 2013-06-14 18:32 UTC (permalink / raw)
  To: Bjorn Helgaas, Ming Lei
  Cc: nirinA raseliarison, linux-kernel@vger.kernel.org,
	Francois Romieu, nic_swsd, Hayes Wang, Guenter Roeck

on Fri, 14 Jun 2013 20:02:25 +0300, Ming Lei <ming.lei@canonical.com> wrote:

> On Fri, Jun 14, 2013 at 10:30 PM, Bjorn Helgaas <bhelgaas@google.com>  
> wrote:
>> [+cc Ming, Hayes, Francois, r8169 list]
>>
>> On Fri, Jun 14, 2013 at 6:49 AM, nirinA raseliarison
>> <nirina.raseliarison@gmail.com> wrote:
>>> hello there,
>>> i have this ethernet controler:
>>>
>>>  Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast  
>>> Ethernet
>>> controller (rev 05)
>>>
>>> that uses the r8169 module.
>>> it works fine, but sometimes after a reboot and issueing:
>>>
>>>  ifconfig eth0 192.168.1.1 up
>>>
>>> i got the message below. after another reboot the
>>> message disappears. i also get the same message this 3.9.5 and 3.9.4.
>>>
>>> it seems i catch my first oops and don't know what to do with it.
>>> currently running:
>>>
>>>  cat /proc/version
>>>  Linux version 3.9.6.20130614 (root@supernova) (gcc version 4.8.1  
>>> (GCC) ) #1
>>> SMP Fri Jun 14 09:14:50 EAT 2013
>>>
>>>  uname -a
>>>  Linux supernova 3.9.6.20130614 #1 SMP Fri Jun 14 09:14:50 EAT 2013  
>>> x86_64
>>> Intel(R) Celeron(R) CPU G1610 @ 2.60GHz GenuineIntel GNU/Linux
>>>
>>> thanks,
>>> -----------------8<------------------------------8<---------------------------------------
>>>
>>> [   57.877560] BUG: unable to handle kernel NULL pointer dereference at
>>> 0000000000000040
>>> [   57.877603] IP: [<ffffffff81491844>] fw_load_abort.isra.5+0x4/0x20
>>> [   57.877634] PGD 21330a067 PUD 211a3a067 PMD 0
>>> [   57.877660] Oops: 0002 [#1] SMP
>>> [   57.877681] Modules linked in: fuse coretemp kvm_intel kvm evdev  
>>> r8169
>>> microcode mii
>>> [   57.877735] CPU 0
>>> [   57.877746] Pid: 1950, comm: firmware Not tainted 3.9.6.20130614 #1  
>>> To be
>>> filled by O.E.M. To be filled by O.E.M./ONDA H61V Ver:4.01
>>> [   57.877790] RIP: 0010:[<ffffffff81491844>]  [<ffffffff81491844>]
>>> fw_load_abort.isra.5+0x4/0x20
>>> [   57.877824] RSP: 0018:ffff8802119a7e80  EFLAGS: 00010246
>>> [   57.877844] RAX: ffff8802158fe250 RBX: ffff880211a03b40 RCX:
>>> 0000000000000000
>>> [   57.877869] RDX: ffffffff81c742c8 RSI: ffff8802158fe250 RDI:
>>> 0000000000000000
>>> [   57.877895] RBP: ffff8802119a7e80 R08: ffff8802119a6000 R09:
>>> 00000000000005aa
>>> [   57.877920] R10: 0000000000000000 R11: 0000000000000000 R12:
>>> ffffffffffffffff
>>> [   57.877945] R13: ffff880213d34088 R14: 0000000000000003 R15:
>>> ffff88020eafc230
>>> [   57.877970] FS:  00007f3c6cb2a740(0000) GS:ffff88021f200000(0000)
>>> knlGS:0000000000000000
>>> [   57.877998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   57.878019] CR2: 0000000000000040 CR3: 0000000203155000 CR4:
>>> 00000000001407f0
>>> [   57.878044] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>> 0000000000000000
>>> [   57.878069] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>>> 0000000000000400
>>> [   57.878094] Process firmware (pid: 1950, threadinfo  
>>> ffff8802119a6000,
>>> task ffff8802158fe250)
>>> [   57.878124] Stack:
>>> [   57.878133]  ffff8802119a7eb0 ffffffff81491917 ffff880211a4d5a0
>>> 0000000000000003
>>> [   57.878168]  ffff8802119a7f50 ffffffff818765a0 ffff8802119a7ec0
>>> ffffffff81483063
>>> [   57.878203]  ffff8802119a7f08 ffffffff8119bc9e ffff880213d34098
>>> ffff880211a4d5c0
>>> [   57.878237] Call Trace:
>>> [   57.878251]  [<ffffffff81491917>] firmware_loading_store+0x77/0x150
>>> [   57.878275]  [<ffffffff81483063>] dev_attr_store+0x13/0x20
>>> [   57.878297]  [<ffffffff8119bc9e>] sysfs_write_file+0xce/0x140
>>> [   57.878320]  [<ffffffff81133e8a>] vfs_write+0x9a/0x160
>>> [   57.878340]  [<ffffffff81134164>] sys_write+0x44/0x90
>>> [   57.878360]  [<ffffffff817d70ed>] system_call_fastpath+0x1a/0x1f
>>> [   57.879379] Code: 6b ff ff ff 48 89 df 31 db e8 b9 b0 c9 ff e9 79  
>>> ff ff
>>> ff 0f 1f 40 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 55 48  
>>> 89 e5
>>> <f0> 80 4f 40 04 48 83 c7 18 e8 8e a9 bd ff 5d c3 66 66 66 2e 0f
>>> [   57.881753] RIP  [<ffffffff81491844>] fw_load_abort.isra.5+0x4/0x20
>>> [   57.882888]  RSP <ffff8802119a7e80>
>>> [   57.884019] CR2: 0000000000000040
>>> [   57.885166] ---[ end trace 6705f6d4ce6b6a12 ]---
>
> Looks it is a double abort race, could you try below patch?
> (also attached for applying)

i've also applied this patch and so far, after rebooting a few
times, everything seems to work fine.

thanks,

> --
> diff --git a/drivers/base/firmware_class.c  
> b/drivers/base/firmware_class.c
> index 6ede229..a217ba8 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -550,7 +550,12 @@ static ssize_t firmware_loading_show(struct device  
> *dev,
>  				     struct device_attribute *attr, char *buf)
>  {
>  	struct firmware_priv *fw_priv = to_firmware_priv(dev);
> -	int loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
> +	int loading = 0;
> +
> +	mutex_lock(&fw_lock);
> +	if (fw_priv->buf)
> +		loading = test_bit(FW_STATUS_LOADING, &fw_priv->buf->status);
> +	mutex_unlock(&fw_lock);
>
>  	return sprintf(buf, "%d\n", loading);
>  }
> @@ -592,12 +597,12 @@ static ssize_t firmware_loading_store(struct  
> device *dev,
>  				      const char *buf, size_t count)
>  {
>  	struct firmware_priv *fw_priv = to_firmware_priv(dev);
> -	struct firmware_buf *fw_buf = fw_priv->buf;
> +	struct firmware_buf *fw_buf;
>  	int loading = simple_strtol(buf, NULL, 10);
>  	int i;
>
>  	mutex_lock(&fw_lock);
> -
> +	fw_buf = fw_priv->buf;
>  	if (!fw_buf)
>  		goto out;
>
> @@ -636,6 +641,7 @@ static ssize_t firmware_loading_store(struct device  
> *dev,
>  		/* fallthrough */
>  	case -1:
>  		fw_load_abort(fw_buf);
> +		fw_priv->buf = NULL;
>  		break;
>  	}
>  out:
> @@ -704,6 +710,7 @@ static int fw_realloc_buffer(struct firmware_priv
> *fw_priv, int min_size)
>  				    GFP_KERNEL);
>  		if (!new_pages) {
>  			fw_load_abort(buf);
> +			fw_priv->buf = NULL;
>  			return -ENOMEM;
>  		}
>  		memcpy(new_pages, buf->pages,
> @@ -721,6 +728,7 @@ static int fw_realloc_buffer(struct firmware_priv
> *fw_priv, int min_size)
>
>  		if (!buf->pages[buf->nr_pages]) {
>  			fw_load_abort(buf);
> +			fw_priv->buf = NULL;
>  			return -ENOMEM;
>  		}
>  		buf->nr_pages++;
> @@ -805,6 +813,7 @@ static void firmware_class_timeout_work(struct
> work_struct *work)
>  		return;
>  	}
>  	fw_load_abort(fw_priv->buf);
> +	fw_priv->buf = NULL;
>  	mutex_unlock(&fw_lock);
>  }
>
> @@ -886,8 +895,6 @@ static int _request_firmware_load(struct
> firmware_priv *fw_priv, bool uevent,
>
>  	cancel_delayed_work_sync(&fw_priv->timeout_work);
>
> -	fw_priv->buf = NULL;
> -
>  	device_remove_file(f_dev, &dev_attr_loading);
>  err_del_bin_attr:
>  	device_remove_bin_file(f_dev, &firmware_attr_data);
>
>
> Thanks,
> --
> Ming Lei


-- 
nirinA

end of thread (~2013-06-15 16:43 UTC)

Thread overview: 10+ messages
2013-06-14 12:49 BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 nirinA raseliarison
2013-06-14 14:30 ` Bjorn Helgaas
2013-06-14 15:45   ` Guenter Roeck
2013-06-14 17:07     ` nirinA raseliarison
2013-06-15  2:32       ` Ming Lei
2013-06-15  6:30         ` Guenter Roeck
2013-06-15  8:08           ` Ming Lei
2013-06-15 16:43             ` nirinA raseliarison
2013-06-14 17:02   ` Ming Lei
2013-06-14 18:32     ` nirinA raseliarison
