* Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference
2009-07-12 20:26 ` Jiri Slaby
@ 2009-07-12 21:54 ` Parag Warudkar
2009-07-12 21:56 ` Eric Paris
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Parag Warudkar @ 2009-07-12 21:54 UTC (permalink / raw)
To: Jiri Slaby; +Cc: linux-kernel, thomas, sds, jmorris, eparis
On Sun, Jul 12, 2009 at 4:26 PM, Jiri Slaby<jirislaby@gmail.com> wrote:
> On 07/12/2009 07:30 PM, Parag Warudkar wrote:
>> static void selinux_write_opts(struct seq_file *m,
>> 1012 struct security_mnt_opts *opts)
>> 1013 {
>> 1014 int i;
>> 1015 char *prefix;
>> 1016
>> 1017 for (i = 0; i < opts->num_mnt_opts; i++) {
>> 1018 char *has_comma;
>> 1019
>> 1020 if (opts->mnt_opts[i])
>> 1021 has_comma = strchr(opts->mnt_opts[i], ',');
>> ^^^^^^^^^^^^^^^^^^^^^^^^^
>> And that is a NULL pointer dereference - but we just checked for
>> opts->mnt_opts[i] for not NULL.
>
> Note, that there is not a NULL dereference. It dereferences 0x40 which
> came in as %rdi. Looks like somebody assigned garbage in there.
Ah right - anything < PAGE_SIZE is "NULL" dereference to the kernel.
Thanks
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference
2009-07-12 20:26 ` Jiri Slaby
2009-07-12 21:54 ` Parag Warudkar
@ 2009-07-12 21:56 ` Eric Paris
2009-07-13 17:00 ` Thomas Meyer
2009-07-20 19:23 ` Thomas Meyer
3 siblings, 0 replies; 10+ messages in thread
From: Eric Paris @ 2009-07-12 21:56 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Parag Warudkar, linux-kernel, thomas, sds, jmorris, eparis
On Sun, 2009-07-12 at 22:26 +0200, Jiri Slaby wrote:
> On 07/12/2009 07:30 PM, Parag Warudkar wrote:
> > static void selinux_write_opts(struct seq_file *m,
> > 1012 struct security_mnt_opts *opts)
> > 1013 {
> > 1014 int i;
> > 1015 char *prefix;
> > 1016
> > 1017 for (i = 0; i < opts->num_mnt_opts; i++) {
> > 1018 char *has_comma;
> > 1019
> > 1020 if (opts->mnt_opts[i])
> > 1021 has_comma = strchr(opts->mnt_opts[i], ',');
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > And that is a NULL pointer dereference - but we just checked for
> > opts->mnt_opts[i] for not NULL.
>
> Note, that there is not a NULL dereference. It dereferences 0x40 which
> came in as %rdi. Looks like somebody assigned garbage in there.
>
> Or a single bit mem error. Is memtest OK with this machine?
>
> What warning tainted the kernel before this oops is still interesting...
I just looked over the selinux code where we build the
security_mnt_opts. We can do a 0 length kmalloc, but that should hurt
aything. I should probably not be doing any allocations and leaving the
opts->mnt_opts and opts->mnt_opts_flags == NULL, but 0x40 !=
ZERO_SIZE_PTR(0x10) nor is the security_mnt_opts structure anywhere near
large enough to hit an offset of 0x40.....
I really think I'd like to see any previous BUG/WARN messages you got
and like Jiri said, see if memtest86+ runs cleanly....
-Eric
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference
2009-07-12 20:26 ` Jiri Slaby
2009-07-12 21:54 ` Parag Warudkar
2009-07-12 21:56 ` Eric Paris
@ 2009-07-13 17:00 ` Thomas Meyer
2009-07-13 17:45 ` Jiri Slaby
2009-07-20 19:23 ` Thomas Meyer
3 siblings, 1 reply; 10+ messages in thread
From: Thomas Meyer @ 2009-07-13 17:00 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Parag Warudkar, linux-kernel, sds, jmorris, eparis
Am Sonntag, den 12.07.2009, 22:26 +0200 schrieb Jiri Slaby:
> On 07/12/2009 07:30 PM, Parag Warudkar wrote:
> > static void selinux_write_opts(struct seq_file *m,
> > 1012 struct security_mnt_opts *opts)
> > 1013 {
> > 1014 int i;
> > 1015 char *prefix;
> > 1016
> > 1017 for (i = 0; i < opts->num_mnt_opts; i++) {
> > 1018 char *has_comma;
> > 1019
> > 1020 if (opts->mnt_opts[i])
> > 1021 has_comma = strchr(opts->mnt_opts[i], ',');
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > And that is a NULL pointer dereference - but we just checked for
> > opts->mnt_opts[i] for not NULL.
>
> Note, that there is not a NULL dereference. It dereferences 0x40 which
> came in as %rdi. Looks like somebody assigned garbage in there.
>
> Or a single bit mem error. Is memtest OK with this machine?
Will schedule a memtest run.
>
> What warning tainted the kernel before this oops is still interesting...
There seems to be no OOPS before that BUG. Does a WARNING also set the G
flag? Something like that:
Jul 12 16:42:34 localhost kernel: [ 6179.647004] ------------[ cut here ]------------
Jul 12 16:42:34 localhost kernel: [ 6179.647004] WARNING: at lib/kref.c:43 kref_get+0x2f/0x4c()
Jul 12 16:42:34 localhost kernel: [ 6179.647004] Hardware name: MS-7250
Jul 12 16:42:34 localhost kernel: [ 6179.647004] Modules linked in: rndis_wlan floppy olympic forcedeth [last unloaded: scsi_wait_scan]
Jul 12 16:42:34 localhost kernel: [ 6179.647004] Pid: 27483, comm: pm-suspend Tainted: G W 2.6.31-rc2 #79
Jul 12 16:42:34 localhost kernel: [ 6179.647004] Call Trace:
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8106c1c6>] ? warn_slowpath_common+0x84/0xb2
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8134876b>] ? kref_get+0x2f/0x4c
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff813476b2>] ? kobject_get+0x26/0x44
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff815a344f>] ? cpufreq_cpu_get+0x74/0xca
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff815a3675>] ? cpufreq_suspend+0x5f/0x16f
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81044a4f>] ? disable_local_APIC+0x2d/0x6d
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81449b54>] ? sysdev_suspend+0xa7/0x28d
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a1229>] ? suspend_devices_and_enter+0x11c/0x1e6
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a13d8>] ? enter_state+0xe5/0x14c
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a0841>] ? state_store+0xc1/0xf6
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81178bb8>] ? sysfs_write_file+0xe6/0x137
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81117d03>] ? vfs_write+0xb4/0x126
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81117e5d>] ? sys_write+0x55/0x90
Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8102bf2b>] ? system_call_fastpath+0x16/0x1b
Jul 12 16:42:34 localhost kernel: [ 6179.647004] ---[ end trace f7a4d612974b9b3e ]---
Btw this seems also to be new in 2.6.31-rc2:
Jul 12 16:42:34 localhost kernel: [ 6187.256748] sd 2:0:0:0: [sdb] Starting disk
Jul 12 16:42:34 localhost kernel: [ 6190.318012]
Jul 12 16:42:34 localhost kernel: [ 6190.318128] floppy driver state
Jul 12 16:42:34 localhost kernel: [ 6190.318244] -------------------
Jul 12 16:42:34 localhost kernel: [ 6190.320684] now=4300857616 last interrupt=4294760495 diff=6097121 last called handler=ffffffffa00263ea
Jul 12 16:42:34 localhost kernel: [ 6190.320896] timeout_message=lock fdc
Jul 12 16:42:34 localhost kernel: [ 6190.321018] last output bytes:
Jul 12 16:42:34 localhost kernel: [ 6190.321135] f 80 4294754488
Jul 12 16:42:34 localhost kernel: [ 6190.321251] 0 90 4294754489
Jul 12 16:42:34 localhost kernel: [ 6190.321367] 1 91 4294754489
Jul 12 16:42:34 localhost kernel: [ 6190.321484] 8 81 4294754493
Jul 12 16:42:34 localhost kernel: [ 6190.321600] 4 80 4294756489
Jul 12 16:42:34 localhost kernel: [ 6190.321717] 0 90 4294756489
Jul 12 16:42:34 localhost kernel: [ 6190.321833] f 80 4294756489
Jul 12 16:42:34 localhost kernel: [ 6190.321949] 0 90 4294756489
Jul 12 16:42:34 localhost kernel: [ 6190.322073] 0 91 4294756490
Jul 12 16:42:34 localhost kernel: [ 6190.322189] 8 81 4294756494
Jul 12 16:42:34 localhost kernel: [ 6190.322305] f 80 4294758491
Jul 12 16:42:34 localhost kernel: [ 6190.322421] 0 90 4294758491
Jul 12 16:42:34 localhost kernel: [ 6190.322538] 1 91 4294758491
Jul 12 16:42:34 localhost kernel: [ 6190.322654] 8 81 4294758496
Jul 12 16:42:34 localhost kernel: [ 6190.322770] 4 80 4294760491
Jul 12 16:42:34 localhost kernel: [ 6190.322886] 0 90 4294760491
Jul 12 16:42:34 localhost kernel: [ 6190.323010] f 80 4294760491
Jul 12 16:42:34 localhost kernel: [ 6190.323126] 0 90 4294760491
Jul 12 16:42:34 localhost kernel: [ 6190.323243] 0 91 4294760491
Jul 12 16:42:34 localhost kernel: [ 6190.323359] 8 81 4294760495
Jul 12 16:42:34 localhost kernel: [ 6190.323475] last result at 4294760495
Jul 12 16:42:34 localhost kernel: [ 6190.323591] last redo_fd_request at 4296147775
Jul 12 16:42:34 localhost kernel: [ 6190.323708] 20 0
Jul 12 16:42:34 localhost kernel: [ 6190.323869] status=4e
Jul 12 16:42:34 localhost kernel: [ 6190.323984] fdc_busy=1
Jul 12 16:42:34 localhost kernel: [ 6190.324107] do_floppy=ffffffffa0022263
Jul 12 16:42:34 localhost kernel: [ 6190.324224] cont=ffffffffa002d190
Jul 12 16:42:34 localhost kernel: [ 6190.324340] current_req=(null)
Jul 12 16:42:34 localhost kernel: [ 6190.324456] command_status=-1
Jul 12 16:42:34 localhost kernel: [ 6190.324572]
Jul 12 16:42:34 localhost kernel: [ 6190.324689] floppy0: floppy timeout called
Jul 12 16:42:34 localhost kernel: [ 6190.365498] PM: resume devices took 10.626 seconds
greets
thomas
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference
2009-07-13 17:00 ` Thomas Meyer
@ 2009-07-13 17:45 ` Jiri Slaby
0 siblings, 0 replies; 10+ messages in thread
From: Jiri Slaby @ 2009-07-13 17:45 UTC (permalink / raw)
To: Thomas Meyer; +Cc: Parag Warudkar, linux-kernel, sds, jmorris, eparis, cpufreq
On 07/13/2009 07:00 PM, Thomas Meyer wrote:
>> What warning tainted the kernel before this oops is still interesting...
>
> There seems to be no OOPS before that BUG. Does a WARNING also set the G
> flag?
No, W is set by warning. G means no proprietary modules loaded.
> Something like that:
>
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] ------------[ cut here ]------------
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] WARNING: at lib/kref.c:43 kref_get+0x2f/0x4c()
It tried to get a reference of a structure which was already "freed".
Its reference count already had lowered to 0.
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] Hardware name: MS-7250
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] Modules linked in: rndis_wlan floppy olympic forcedeth [last unloaded: scsi_wait_scan]
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] Pid: 27483, comm: pm-suspend Tainted: G W 2.6.31-rc2 #79
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] Call Trace:
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8106c1c6>] ? warn_slowpath_common+0x84/0xb2
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8134876b>] ? kref_get+0x2f/0x4c
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff813476b2>] ? kobject_get+0x26/0x44
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff815a344f>] ? cpufreq_cpu_get+0x74/0xca
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff815a3675>] ? cpufreq_suspend+0x5f/0x16f
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81044a4f>] ? disable_local_APIC+0x2d/0x6d
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81449b54>] ? sysdev_suspend+0xa7/0x28d
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a1229>] ? suspend_devices_and_enter+0x11c/0x1e6
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a13d8>] ? enter_state+0xe5/0x14c
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff810a0841>] ? state_store+0xc1/0xf6
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81178bb8>] ? sysfs_write_file+0xe6/0x137
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81117d03>] ? vfs_write+0xb4/0x126
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff81117e5d>] ? sys_write+0x55/0x90
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] [<ffffffff8102bf2b>] ? system_call_fastpath+0x16/0x1b
> Jul 12 16:42:34 localhost kernel: [ 6179.647004] ---[ end trace f7a4d612974b9b3e ]---
>
> Btw this seems also to be new in 2.6.31-rc2:
This is unrelated I think. I met it many times.
> Jul 12 16:42:34 localhost kernel: [ 6187.256748] sd 2:0:0:0: [sdb] Starting disk
> Jul 12 16:42:34 localhost kernel: [ 6190.318012]
> Jul 12 16:42:34 localhost kernel: [ 6190.318128] floppy driver state
> Jul 12 16:42:34 localhost kernel: [ 6190.318244] -------------------
> Jul 12 16:42:34 localhost kernel: [ 6190.320684] now=4300857616 last interrupt=4294760495 diff=6097121 last called handler=ffffffffa00263ea
> Jul 12 16:42:34 localhost kernel: [ 6190.320896] timeout_message=lock fdc
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference
2009-07-12 20:26 ` Jiri Slaby
` (2 preceding siblings ...)
2009-07-13 17:00 ` Thomas Meyer
@ 2009-07-20 19:23 ` Thomas Meyer
2009-07-22 20:28 ` Rafael J. Wysocki
3 siblings, 1 reply; 10+ messages in thread
From: Thomas Meyer @ 2009-07-20 19:23 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Parag Warudkar, linux-kernel, sds, jmorris, eparis
Am Sonntag, den 12.07.2009, 22:26 +0200 schrieb Jiri Slaby:
> On 07/12/2009 07:30 PM, Parag Warudkar wrote:
> > static void selinux_write_opts(struct seq_file *m,
> > 1012 struct security_mnt_opts *opts)
> > 1013 {
> > 1014 int i;
> > 1015 char *prefix;
> > 1016
> > 1017 for (i = 0; i < opts->num_mnt_opts; i++) {
> > 1018 char *has_comma;
> > 1019
> > 1020 if (opts->mnt_opts[i])
> > 1021 has_comma = strchr(opts->mnt_opts[i], ',');
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > And that is a NULL pointer dereference - but we just checked for
> > opts->mnt_opts[i] for not NULL.
>
> Note, that there is not a NULL dereference. It dereferences 0x40 which
> came in as %rdi. Looks like somebody assigned garbage in there.
>
> Or a single bit mem error. Is memtest OK with this machine?
Yes it ran fine for one cycle (about 2:15 hours).
Maybe memory in acpi S3 is not so stable? Is this possible?
>
> What warning tainted the kernel before this oops is still interesting...
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference
2009-07-20 19:23 ` Thomas Meyer
@ 2009-07-22 20:28 ` Rafael J. Wysocki
2009-07-24 11:03 ` Jiri Slaby
0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2009-07-22 20:28 UTC (permalink / raw)
To: Thomas Meyer
Cc: Jiri Slaby, Parag Warudkar, linux-kernel, sds, jmorris, eparis
On Monday 20 July 2009, Thomas Meyer wrote:
> Am Sonntag, den 12.07.2009, 22:26 +0200 schrieb Jiri Slaby:
> > On 07/12/2009 07:30 PM, Parag Warudkar wrote:
> > > static void selinux_write_opts(struct seq_file *m,
> > > 1012 struct security_mnt_opts *opts)
> > > 1013 {
> > > 1014 int i;
> > > 1015 char *prefix;
> > > 1016
> > > 1017 for (i = 0; i < opts->num_mnt_opts; i++) {
> > > 1018 char *has_comma;
> > > 1019
> > > 1020 if (opts->mnt_opts[i])
> > > 1021 has_comma = strchr(opts->mnt_opts[i], ',');
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > > And that is a NULL pointer dereference - but we just checked for
> > > opts->mnt_opts[i] for not NULL.
> >
> > Note, that there is not a NULL dereference. It dereferences 0x40 which
> > came in as %rdi. Looks like somebody assigned garbage in there.
> >
> > Or a single bit mem error. Is memtest OK with this machine?
> Yes it ran fine for one cycle (about 2:15 hours).
>
> Maybe memory in acpi S3 is not so stable? Is this possible?
Generally, it is, but I haven't seen it happen yet. In theory, if the memory
chips are not refreshed appropriately while suspended, something like this may
happen.
Best,
Rafael
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: 2.6.31-rc2: BUG: unable to handle kernel NULL pointer dereference
2009-07-22 20:28 ` Rafael J. Wysocki
@ 2009-07-24 11:03 ` Jiri Slaby
0 siblings, 0 replies; 10+ messages in thread
From: Jiri Slaby @ 2009-07-24 11:03 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Thomas Meyer, Parag Warudkar, linux-kernel, sds, jmorris, eparis
On 07/22/2009 10:28 PM, Rafael J. Wysocki wrote:
> On Monday 20 July 2009, Thomas Meyer wrote:
>> Maybe memory in acpi S3 is not so stable? Is this possible?
>
> Generally, it is, but I haven't seen it happen yet. In theory, if the memory
> chips are not refreshed appropriately while suspended, something like this may
> happen.
You can check it by a program from:
http://lkml.org/lkml/2008/4/24/439
Only change 1000000000 to circa 5/6 of your memory size. Suspend the
machine after "alloced" appears on the console.
^ permalink raw reply [flat|nested] 10+ messages in thread