public inbox for linux-kernel@vger.kernel.org
* OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
@ 2006-12-18 11:20 Jiri Slaby
  2006-12-18 15:46 ` [linux-pm] " Rafael J. Wysocki
  0 siblings, 1 reply; 12+ messages in thread
From: Jiri Slaby @ 2006-12-18 11:20 UTC (permalink / raw)
  To: Linux kernel mailing list; +Cc: akpm, pavel, linux-pm

Hi.

I got this oops while suspending:
[  309.366557] Disabling non-boot CPUs ...
[  309.386563] CPU 1 is now offline
[  309.387625] CPU1 is down
[  309.387704] Stopping tasks ... done.
[  310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
[  310.456669] SMP
[  310.456814] last sysfs file: /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
[  310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
[  310.457259] CPU:    0
[  310.457260] EIP:    0060:[<c0150c9a>]    Not tainted VLI
[  310.457261] EFLAGS: 00210246   (2.6.20-rc1-mm1 #207)
[  310.457478] EIP is at shrink_slab+0x9e/0x169
[  310.457548] eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
[  310.457623] esi: 00000000   edi: c18fe500   ebp: f7b3fe3c   esp: f7b3fe08
[  310.457696] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[  310.457772] Process swsusp (pid: 3243, ti=f7b3e000 task=f756f030 task.ti=f7b3e000)
[  310.457845] Stack: c175d8a0 c175daa0 c175db00 00000000 00000000 c053cec0 000045ec 000000d0
[  310.458286]        00000000 00000000 00001179 00001179 00000000 f7b3fe94 c0151445 00000001
[  310.458723]        f7b3fe64 00001df1 00000002 00000001 00000001 00038000 00000c79 0000117b
[  310.459199] Call Trace:
[  310.459334]  [<c0103f1b>] show_trace_log_lvl+0x1a/0x30
[  310.459450]  [<c0103fd6>] show_stack_log_lvl+0xa5/0xca
[  310.459562]  [<c01041ce>] show_registers+0x1d3/0x2b8
[  310.459673]  [<c01043d4>] die+0x121/0x243
[  310.459781]  [<c010456c>] do_trap+0x76/0x9c
[  310.459892]  [<c0104bd8>] do_divide_error+0x94/0x9e
[  310.460001]  [<c038a7e4>] error_code+0x7c/0x84
[  310.460113]  [<c0151445>] shrink_all_memory+0x211/0x2eb
[  310.460225]  [<c01418c1>] swsusp_shrink_memory+0x187/0x196
[  310.460335]  [<c0141a07>] prepare_processes+0x35/0xc8
[  310.460446]  [<c0141cce>] pm_suspend_disk+0xd/0x16f
[  310.460558]  [<c0140c87>] enter_state+0x129/0x19b
[  310.460668]  [<c0140d9c>] state_store+0xa3/0xac
[  310.460777]  [<c0198ab0>] subsys_attr_store+0x20/0x25
[  310.460889]  [<c0198b9f>] sysfs_write_file+0x97/0xd8
[  310.460998]  [<c0165262>] vfs_write+0x8b/0x149
[  310.461108]  [<c01658cb>] sys_write+0x3d/0x64
[  310.461216]  [<c0102fe4>] syscall_call+0x7/0xb
[  310.461328]  =======================
[  310.461397] Code: 31 c0 ff 17 89 c3 8b 45 e4 31 d2 f7 77 0c f7 e3 89 45 d8 89 55 dc 89 d1 89 c6 31 d2 85 c9 74 09 89 c8 31 d2 f7 75 f0 89 c1 89 f0 <f7> 75 f0 89 ca 89 45 d8 89 55 dc 8b 45 d8 03 47 10 89 47 10 85
[  310.464079] EIP: [<c0150c9a>] shrink_slab+0x9e/0x169 SS:ESP 0068:f7b3fe08
[  310.464228]

The swsusp script is something like this:
echo platform > /sys/power/disk
echo disk > /sys/power/state

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E


* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-18 11:20 OOPS: divide error while s2dsk (2.6.20-rc1-mm1) Jiri Slaby
@ 2006-12-18 15:46 ` Rafael J. Wysocki
  2006-12-18 17:02   ` Jiri Slaby
  0 siblings, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2006-12-18 15:46 UTC (permalink / raw)
  To: linux-pm; +Cc: Jiri Slaby, Linux kernel mailing list, akpm, linux-pm

Hi,

On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> Hi.
> 
> I got this oops while suspending:
> [  309.366557] Disabling non-boot CPUs ...
> [  309.386563] CPU 1 is now offline
> [  309.387625] CPU1 is down
> [  309.387704] Stopping tasks ... done.
> [  310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> [  310.456669] SMP
> [  310.456814] last sysfs file:
> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> [  310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> [  310.457259] CPU:    0
> [  310.457260] EIP:    0060:[<c0150c9a>]    Not tainted VLI
> [  310.457261] EFLAGS: 00210246   (2.6.20-rc1-mm1 #207)
> [  310.457478] EIP is at shrink_slab+0x9e/0x169

Looks like we have a problem with slab shrinking here.

Could you please use gdb to check what exactly is at shrink_slab+0x9e?

> [  310.457548] eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
> [  310.457623] esi: 00000000   edi: c18fe500   ebp: f7b3fe3c   esp: f7b3fe08
> [  310.457696] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [  310.457772] Process swsusp (pid: 3243, ti=f7b3e000 task=f756f030
> task.ti=f7b3e000)
> [  310.457845] Stack: c175d8a0 c175daa0 c175db00 00000000 00000000 c053cec0
> 000045ec 000000d0
> [  310.458286]        00000000 00000000 00001179 00001179 00000000 f7b3fe94
> c0151445 00000001
> [  310.458723]        f7b3fe64 00001df1 00000002 00000001 00000001 00038000
> 00000c79 0000117b
> [  310.459199] Call Trace:
> [  310.459334]  [<c0103f1b>] show_trace_log_lvl+0x1a/0x30
> [  310.459450]  [<c0103fd6>] show_stack_log_lvl+0xa5/0xca
> [  310.459562]  [<c01041ce>] show_registers+0x1d3/0x2b8
> [  310.459673]  [<c01043d4>] die+0x121/0x243
> [  310.459781]  [<c010456c>] do_trap+0x76/0x9c
> [  310.459892]  [<c0104bd8>] do_divide_error+0x94/0x9e
> [  310.460001]  [<c038a7e4>] error_code+0x7c/0x84
> [  310.460113]  [<c0151445>] shrink_all_memory+0x211/0x2eb
> [  310.460225]  [<c01418c1>] swsusp_shrink_memory+0x187/0x196
> [  310.460335]  [<c0141a07>] prepare_processes+0x35/0xc8
> [  310.460446]  [<c0141cce>] pm_suspend_disk+0xd/0x16f
> [  310.460558]  [<c0140c87>] enter_state+0x129/0x19b
> [  310.460668]  [<c0140d9c>] state_store+0xa3/0xac
> [  310.460777]  [<c0198ab0>] subsys_attr_store+0x20/0x25
> [  310.460889]  [<c0198b9f>] sysfs_write_file+0x97/0xd8
> [  310.460998]  [<c0165262>] vfs_write+0x8b/0x149
> [  310.461108]  [<c01658cb>] sys_write+0x3d/0x64
> [  310.461216]  [<c0102fe4>] syscall_call+0x7/0xb
> [  310.461328]  =======================
> [  310.461397] Code: 31 c0 ff 17 89 c3 8b 45 e4 31 d2 f7 77 0c f7 e3 89 45 d8 89
> 55 dc 89 d1 89 c6 31 d2 85 c9 74 09 89 c8 31 d2 f7 75 f0 89 c1 89 f0 <f7> 75 f0
> 89 ca 89 45 d8 89 55 dc 8b 45 d8 03 47 10 89 47 10 85
> [  310.464079] EIP: [<c0150c9a>] shrink_slab+0x9e/0x169 SS:ESP 0068:f7b3fe08
> [  310.464228]
> 
> swsusp script is something like this:
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
> 
> regards,

Greetings,
Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
		- Stephen King


* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-18 15:46 ` [linux-pm] " Rafael J. Wysocki
@ 2006-12-18 17:02   ` Jiri Slaby
  2006-12-18 20:59     ` Andrew Morton
  2006-12-18 22:38     ` Rafael J. Wysocki
  0 siblings, 2 replies; 12+ messages in thread
From: Jiri Slaby @ 2006-12-18 17:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Jiri Slaby, Linux kernel mailing list, akpm, linux-pm

Rafael J. Wysocki wrote:
> Hi,
> 
> On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
>> Hi.
>>
>> I got this oops while suspending:
>> [  309.366557] Disabling non-boot CPUs ...
>> [  309.386563] CPU 1 is now offline
>> [  309.387625] CPU1 is down
>> [  309.387704] Stopping tasks ... done.
>> [  310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
>> [  310.456669] SMP
>> [  310.456814] last sysfs file:
>> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
>> [  310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
>> [  310.457259] CPU:    0
>> [  310.457260] EIP:    0060:[<c0150c9a>]    Not tainted VLI
>> [  310.457261] EFLAGS: 00210246   (2.6.20-rc1-mm1 #207)
>> [  310.457478] EIP is at shrink_slab+0x9e/0x169
> 
> Looks like we have a problem with slab shrinking here.
> 
> Could you please use gdb to check what exactly is at shrink_slab+0x9e?

Sure, but not till Friday, sorry (I am away).

>> [  310.457548] eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
>> [  310.457623] esi: 00000000   edi: c18fe500   ebp: f7b3fe3c   esp: f7b3fe08
>> [  310.457696] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
>> [  310.457772] Process swsusp (pid: 3243, ti=f7b3e000 task=f756f030
>> task.ti=f7b3e000)
>> [  310.457845] Stack: c175d8a0 c175daa0 c175db00 00000000 00000000 c053cec0
>> 000045ec 000000d0
>> [  310.458286]        00000000 00000000 00001179 00001179 00000000 f7b3fe94
>> c0151445 00000001
>> [  310.458723]        f7b3fe64 00001df1 00000002 00000001 00000001 00038000
>> 00000c79 0000117b
>> [  310.459199] Call Trace:
>> [  310.459334]  [<c0103f1b>] show_trace_log_lvl+0x1a/0x30
>> [  310.459450]  [<c0103fd6>] show_stack_log_lvl+0xa5/0xca
>> [  310.459562]  [<c01041ce>] show_registers+0x1d3/0x2b8
>> [  310.459673]  [<c01043d4>] die+0x121/0x243
>> [  310.459781]  [<c010456c>] do_trap+0x76/0x9c
>> [  310.459892]  [<c0104bd8>] do_divide_error+0x94/0x9e
>> [  310.460001]  [<c038a7e4>] error_code+0x7c/0x84
>> [  310.460113]  [<c0151445>] shrink_all_memory+0x211/0x2eb
>> [  310.460225]  [<c01418c1>] swsusp_shrink_memory+0x187/0x196
>> [  310.460335]  [<c0141a07>] prepare_processes+0x35/0xc8
>> [  310.460446]  [<c0141cce>] pm_suspend_disk+0xd/0x16f
>> [  310.460558]  [<c0140c87>] enter_state+0x129/0x19b
>> [  310.460668]  [<c0140d9c>] state_store+0xa3/0xac
>> [  310.460777]  [<c0198ab0>] subsys_attr_store+0x20/0x25
>> [  310.460889]  [<c0198b9f>] sysfs_write_file+0x97/0xd8
>> [  310.460998]  [<c0165262>] vfs_write+0x8b/0x149
>> [  310.461108]  [<c01658cb>] sys_write+0x3d/0x64
>> [  310.461216]  [<c0102fe4>] syscall_call+0x7/0xb
>> [  310.461328]  =======================
>> [  310.461397] Code: 31 c0 ff 17 89 c3 8b 45 e4 31 d2 f7 77 0c f7 e3 89 45 d8 89
>> 55 dc 89 d1 89 c6 31 d2 85 c9 74 09 89 c8 31 d2 f7 75 f0 89 c1 89 f0 <f7> 75 f0
>> 89 ca 89 45 d8 89 55 dc 8b 45 d8 03 47 10 89 47 10 85
>> [  310.464079] EIP: [<c0150c9a>] shrink_slab+0x9e/0x169 SS:ESP 0068:f7b3fe08
>> [  310.464228]
>>
>> swsusp script is something like this:
>> echo platform > /sys/power/disk
>> echo disk > /sys/power/state

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E


* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-18 17:02   ` Jiri Slaby
@ 2006-12-18 20:59     ` Andrew Morton
  2006-12-18 22:38     ` Rafael J. Wysocki
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2006-12-18 20:59 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Rafael J. Wysocki, linux-pm, Linux kernel mailing list, linux-pm

On Mon, 18 Dec 2006 18:02:20 +0100
Jiri Slaby <jirislaby@gmail.com> wrote:

> Rafael J. Wysocki wrote:
> > Hi,
> > 
> > On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> >> Hi.
> >>
> >> I got this oops while suspending:
> >> [  309.366557] Disabling non-boot CPUs ...
> >> [  309.386563] CPU 1 is now offline
> >> [  309.387625] CPU1 is down
> >> [  309.387704] Stopping tasks ... done.
> >> [  310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> >> [  310.456669] SMP
> >> [  310.456814] last sysfs file:
> >> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> >> [  310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> >> [  310.457259] CPU:    0
> >> [  310.457260] EIP:    0060:[<c0150c9a>]    Not tainted VLI
> >> [  310.457261] EFLAGS: 00210246   (2.6.20-rc1-mm1 #207)
> >> [  310.457478] EIP is at shrink_slab+0x9e/0x169
> > 
> > Looks like we have a problem with slab shrinking here.
> > 
> > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> 
> Sure, but not till Friday, sorry (I am away).

I think there's only one divide in there which can do this, so...

--- a/mm/vmscan.c~shrink_slab-handle-bad-shrinkers
+++ a/mm/vmscan.c
@@ -20,6 +20,7 @@
 #include <linux/pagemap.h>
 #include <linux/init.h>
 #include <linux/highmem.h>
+#include <linux/kallsyms.h>
 #include <linux/vmstat.h>
 #include <linux/file.h>
 #include <linux/writeback.h>
@@ -190,7 +191,13 @@ unsigned long shrink_slab(unsigned long 
 		unsigned long total_scan;
 		unsigned long max_pass = (*shrinker->shrinker)(0, gfp_mask);
 
-		delta = (4 * scanned) / shrinker->seeks;
+		if (!shrinker->seeks) {
+			print_symbol("shrinker %s has zero seeks\n",
+				(unsigned long)shrinker->shrinker);
+			delta = (4 * scanned) / DEFAULT_SEEKS;
+		} else {
+			delta = (4 * scanned) / shrinker->seeks;
+		}
 		delta *= max_pass;
 		do_div(delta, lru_pages + 1);
 		shrinker->nr += delta;
_

A quick grep shows that all set_shrinker() callers are doing the right
thing, so something kooky has happened.



* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-18 17:02   ` Jiri Slaby
  2006-12-18 20:59     ` Andrew Morton
@ 2006-12-18 22:38     ` Rafael J. Wysocki
  2006-12-18 22:44       ` Nigel Cunningham
  2006-12-18 23:17       ` Andrew Morton
  1 sibling, 2 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2006-12-18 22:38 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: linux-pm, Linux kernel mailing list, akpm, linux-pm

On Monday, 18 December 2006 18:02, Jiri Slaby wrote:
> Rafael J. Wysocki wrote:
> > Hi,
> > 
> > On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> >> Hi.
> >>
> >> I got this oops while suspending:
> >> [  309.366557] Disabling non-boot CPUs ...
> >> [  309.386563] CPU 1 is now offline
> >> [  309.387625] CPU1 is down
> >> [  309.387704] Stopping tasks ... done.
> >> [  310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> >> [  310.456669] SMP
> >> [  310.456814] last sysfs file:
> >> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> >> [  310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> >> [  310.457259] CPU:    0
> >> [  310.457260] EIP:    0060:[<c0150c9a>]    Not tainted VLI
> >> [  310.457261] EFLAGS: 00210246   (2.6.20-rc1-mm1 #207)
> >> [  310.457478] EIP is at shrink_slab+0x9e/0x169
> > 
> > Looks like we have a problem with slab shrinking here.
> > 
> > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> 
> Sure, but not till Friday, sorry (I am away).

I reproduced this on one box, but then it turned out that EIP was at line 195
of mm/vmscan.c where there was

do_div(delta, lru_pages + 1);

Well, I have no idea how this can lead to a divide error (lru_pages is
unsigned).

I'm unable to reproduce this on another i386 box, so it seems to be somewhat
configuration specific.

Does 2.6.20-rc1 work for you?

Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
		- Stephen King
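
On the question of how an unsigned lru_pages can still end in a divide
error: unsigned arithmetic wraps, so if the caller passes the all-ones value
(what -1 becomes when stored in an unsigned long), lru_pages + 1 is 0. The
following is only a userspace sketch of that arithmetic, not kernel code;
shrink_divisor() and checked_div() are hypothetical names introduced for
illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the divisor expression used in the do_div() call. */
static unsigned long shrink_divisor(unsigned long lru_pages)
{
	return lru_pages + 1;	/* wraps to 0 when lru_pages == ULONG_MAX */
}

/* Hypothetical helper: divide only when the divisor is non-zero. */
static bool checked_div(unsigned long long *delta, unsigned long divisor)
{
	if (divisor == 0)
		return false;	/* an unchecked divide would fault here */
	*delta /= divisor;
	return true;
}
```

With lru_pages == 9 the divisor is 10 and the division proceeds; with
lru_pages == ULONG_MAX the divisor is 0 and checked_div() refuses, which is
the case an unchecked divide instruction would turn into a trap.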


* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-18 22:38     ` Rafael J. Wysocki
@ 2006-12-18 22:44       ` Nigel Cunningham
  2006-12-18 23:09         ` Rafael J. Wysocki
  2006-12-18 23:17       ` Andrew Morton
  1 sibling, 1 reply; 12+ messages in thread
From: Nigel Cunningham @ 2006-12-18 22:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, linux-pm, Linux kernel mailing list, akpm, linux-pm

Hi.

On Mon, 2006-12-18 at 23:38 +0100, Rafael J. Wysocki wrote:
> On Monday, 18 December 2006 18:02, Jiri Slaby wrote:
> > Rafael J. Wysocki wrote:
> > > Hi,
> > > 
> > > On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> > >> Hi.
> > >>
> > >> I got this oops while suspending:
> > >> [  309.366557] Disabling non-boot CPUs ...
> > >> [  309.386563] CPU 1 is now offline
> > >> [  309.387625] CPU1 is down
> > >> [  309.387704] Stopping tasks ... done.
> > >> [  310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> > >> [  310.456669] SMP
> > >> [  310.456814] last sysfs file:
> > >> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> > >> [  310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> > >> [  310.457259] CPU:    0
> > >> [  310.457260] EIP:    0060:[<c0150c9a>]    Not tainted VLI
> > >> [  310.457261] EFLAGS: 00210246   (2.6.20-rc1-mm1 #207)
> > >> [  310.457478] EIP is at shrink_slab+0x9e/0x169
> > > 
> > > Looks like we have a problem with slab shrinking here.
> > > 
> > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> > 
> > Sure, but not till Friday, sorry (I am away).
> 
> I reproduced this on one box, but then it turned out that EIP was at line 195
> of mm/vmscan.c where there was
> 
> do_div(delta, lru_pages + 1);
> 
> Well, I have no idea how this can lead to a divide error (lru_pages is
> unsigned).
> 
> I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> configuration specific.
> 
> Does 2.6.20-rc1 work for you?

I have a patch in -mm that reduces lru_pages by what shrink_all_zones
returns. Could shrink_all_zones perhaps be returning incorrect values
such that lru_pages ends up becoming -1?

Regards,

Nigel



* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-18 22:44       ` Nigel Cunningham
@ 2006-12-18 23:09         ` Rafael J. Wysocki
  2006-12-18 23:16           ` Nigel Cunningham
  0 siblings, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2006-12-18 23:09 UTC (permalink / raw)
  To: nigel; +Cc: Jiri Slaby, linux-pm, Linux kernel mailing list, akpm, linux-pm

Hi,

On Monday, 18 December 2006 23:44, Nigel Cunningham wrote:
> Hi.
> 
> On Mon, 2006-12-18 at 23:38 +0100, Rafael J. Wysocki wrote:
> > On Monday, 18 December 2006 18:02, Jiri Slaby wrote:
> > > Rafael J. Wysocki wrote:
> > > > Hi,
> > > > 
> > > > On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> > > >> Hi.
> > > >>
> > > >> I got this oops while suspending:
> > > >> [  309.366557] Disabling non-boot CPUs ...
> > > >> [  309.386563] CPU 1 is now offline
> > > >> [  309.387625] CPU1 is down
> > > >> [  309.387704] Stopping tasks ... done.
> > > >> [  310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> > > >> [  310.456669] SMP
> > > >> [  310.456814] last sysfs file:
> > > >> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> > > >> [  310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> > > >> [  310.457259] CPU:    0
> > > >> [  310.457260] EIP:    0060:[<c0150c9a>]    Not tainted VLI
> > > >> [  310.457261] EFLAGS: 00210246   (2.6.20-rc1-mm1 #207)
> > > >> [  310.457478] EIP is at shrink_slab+0x9e/0x169
> > > > 
> > > > Looks like we have a problem with slab shrinking here.
> > > > 
> > > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> > > 
> > > Sure, but not till Friday, sorry (I am away).
> > 
> > I reproduced this on one box, but then it turned out that EIP was at line 195
> > of mm/vmscan.c where there was
> > 
> > do_div(delta, lru_pages + 1);
> > 
> > Well, I have no idea how this can lead to a divide error (lru_pages is
> > unsigned).
> > 
> > I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> > configuration specific.
> > 
> > Does 2.6.20-rc1 work for you?
> 
> I have a patch in -mm that reduces lru_pages by what shrink_all_zones
> returns. Could shrink_all_zones perhaps be returning incorrect values
> such that lru_pages ends up becoming -1?

I don't think so, but look at the appended patch. ;-)

Greetings,
Rafael


---
Fix a (really bad) typo in shrink_all_memory().

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.20-rc1-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.20-rc1-mm1.orig/mm/vmscan.c
+++ linux-2.6.20-rc1-mm1/mm/vmscan.c
@@ -1569,7 +1569,7 @@ unsigned long shrink_all_memory(unsigned
 			sc.swap_cluster_max = nr_pages - ret;
 			freed = shrink_all_zones(nr_to_scan, prio, pass, &sc);
 			ret += freed;
-			lru_pages =- freed;
+			lru_pages -= freed;
 			nr_to_scan = nr_pages - ret;
 			if (ret >= nr_pages)
 				goto out;


* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-18 23:09         ` Rafael J. Wysocki
@ 2006-12-18 23:16           ` Nigel Cunningham
  0 siblings, 0 replies; 12+ messages in thread
From: Nigel Cunningham @ 2006-12-18 23:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, linux-pm, Linux kernel mailing list, akpm, linux-pm

Hi.

On Tue, 2006-12-19 at 00:09 +0100, Rafael J. Wysocki wrote:
> Hi,
> 
> On Monday, 18 December 2006 23:44, Nigel Cunningham wrote:
> > Hi.
> > 
> > On Mon, 2006-12-18 at 23:38 +0100, Rafael J. Wysocki wrote:
> > > On Monday, 18 December 2006 18:02, Jiri Slaby wrote:
> > > > Rafael J. Wysocki wrote:
> > > > > Hi,
> > > > > 
> > > > > On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> > > > >> Hi.
> > > > >>
> > > > >> I got this oops while suspending:
> > > > >> [  309.366557] Disabling non-boot CPUs ...
> > > > >> [  309.386563] CPU 1 is now offline
> > > > >> [  309.387625] CPU1 is down
> > > > >> [  309.387704] Stopping tasks ... done.
> > > > >> [  310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> > > > >> [  310.456669] SMP
> > > > >> [  310.456814] last sysfs file:
> > > > >> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> > > > >> [  310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> > > > >> [  310.457259] CPU:    0
> > > > >> [  310.457260] EIP:    0060:[<c0150c9a>]    Not tainted VLI
> > > > >> [  310.457261] EFLAGS: 00210246   (2.6.20-rc1-mm1 #207)
> > > > >> [  310.457478] EIP is at shrink_slab+0x9e/0x169
> > > > > 
> > > > > Looks like we have a problem with slab shrinking here.
> > > > > 
> > > > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> > > > 
> > > > Sure, but not till Friday, sorry (I am away).
> > > 
> > > I reproduced this on one box, but then it turned out that EIP was at line 195
> > > of mm/vmscan.c where there was
> > > 
> > > do_div(delta, lru_pages + 1);
> > > 
> > > Well, I have no idea how this can lead to a divide error (lru_pages is
> > > unsigned).
> > > 
> > > I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> > > configuration specific.
> > > 
> > > Does 2.6.20-rc1 work for you?
> > 
> > I have a patch in -mm that reduces lru_pages by what shrink_all_zones
> > returns. Could shrink_all_zones perhaps be returning incorrect values
> > such that lru_pages ends up becoming -1?
> 
> I don't think so, but look at the appended patch. ;-)
> 
> Greetings,
> Rafael
> 
> 
> ---
> Fix a (really bad) typo in shrink_all_memory().
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  mm/vmscan.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6.20-rc1-mm1/mm/vmscan.c
> ===================================================================
> --- linux-2.6.20-rc1-mm1.orig/mm/vmscan.c
> +++ linux-2.6.20-rc1-mm1/mm/vmscan.c
> @@ -1569,7 +1569,7 @@ unsigned long shrink_all_memory(unsigned
>  			sc.swap_cluster_max = nr_pages - ret;
>  			freed = shrink_all_zones(nr_to_scan, prio, pass, &sc);
>  			ret += freed;
> -			lru_pages =- freed;
> +			lru_pages -= freed;
>  			nr_to_scan = nr_pages - ret;
>  			if (ret >= nr_pages)
>  				goto out;

Heh, yeah.

Definitely acked! :)

Nigel



* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-18 22:38     ` Rafael J. Wysocki
  2006-12-18 22:44       ` Nigel Cunningham
@ 2006-12-18 23:17       ` Andrew Morton
  2006-12-19  0:52         ` Rafael J. Wysocki
  2006-12-19  1:18         ` David Rientjes
  1 sibling, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2006-12-18 23:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, linux-pm, Linux kernel mailing list, linux-pm

On Mon, 18 Dec 2006 23:38:23 +0100
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> > > Looks like we have a problem with slab shrinking here.
> > > 
> > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> > 
> > Sure, but not till Friday, sorry (I am away).
> 
> I reproduced this on one box, but then it turned out that EIP was at line 195
> of mm/vmscan.c where there was
> 
> do_div(delta, lru_pages + 1);

That implies that we passed it lru_pages=-1.

Presumably the logic in
vmscanc-account-for-memory-already-freed-in-seeking-to.patch caused that.

> Well, I have no idea how this can lead to a divide error (lru_pages is
> unsigned).
> 
> I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> configuration specific.
> 

There is one wart in shrink_all_memory() and I think we should fix that in
2.6.20.

Please check the below.  I'll drop
vmscanc-account-for-memory-already-freed-in-seeking-to.patch.  It has other
stuff in it which we might still need.  But altering sc->swap_cluster_max
in that manner looks odd.



From: Andrew Morton <akpm@osdl.org>

At the end of shrink_all_memory() we forget to recalculate lru_pages: it can
be zero.

Fix that up, and add a helper function for this operation too.

Also, recalculate lru_pages each time around the inner loop to get the
balancing correct.

Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 mm/vmscan.c |   33 ++++++++++++++++-----------------
 1 files changed, 16 insertions(+), 17 deletions(-)

diff -puN mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling mm/vmscan.c
--- a/mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling
+++ a/mm/vmscan.c
@@ -1484,6 +1484,16 @@ static unsigned long shrink_all_zones(un
 	return ret;
 }
 
+static unsigned long count_lru_pages(void)
+{
+	struct zone *zone;
+	unsigned long ret = 0;
+
+	for_each_zone(zone);
+		ret += zone->nr_active + zone->nr_inactive;
+	return ret;
+}
+
 /*
  * Try to free `nr_pages' of memory, system-wide, and return the number of
  * freed pages.
@@ -1498,7 +1508,6 @@ unsigned long shrink_all_memory(unsigned
 	unsigned long ret = 0;
 	int pass;
 	struct reclaim_state reclaim_state;
-	struct zone *zone;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
 		.may_swap = 0,
@@ -1509,10 +1518,7 @@ unsigned long shrink_all_memory(unsigned
 
 	current->reclaim_state = &reclaim_state;
 
-	lru_pages = 0;
-	for_each_zone(zone)
-		lru_pages += zone->nr_active + zone->nr_inactive;
-
+	lru_pages = count_lru_pages();
 	nr_slab = global_page_state(NR_SLAB_RECLAIMABLE);
 	/* If slab caches are huge, it's better to hit them first */
 	while (nr_slab >= lru_pages) {
@@ -1539,13 +1545,6 @@ unsigned long shrink_all_memory(unsigned
 	for (pass = 0; pass < 5; pass++) {
 		int prio;
 
-		/* Needed for shrinking slab caches later on */
-		if (!lru_pages)
-			for_each_zone(zone) {
-				lru_pages += zone->nr_active;
-				lru_pages += zone->nr_inactive;
-			}
-
 		/* Force reclaiming mapped pages in the passes #3 and #4 */
 		if (pass > 2) {
 			sc.may_swap = 1;
@@ -1561,7 +1560,8 @@ unsigned long shrink_all_memory(unsigned
 				goto out;
 
 			reclaim_state.reclaimed_slab = 0;
-			shrink_slab(sc.nr_scanned, sc.gfp_mask, lru_pages);
+			shrink_slab(sc.nr_scanned, sc.gfp_mask,
+					count_lru_pages());
 			ret += reclaim_state.reclaimed_slab;
 			if (ret >= nr_pages)
 				goto out;
@@ -1569,20 +1569,19 @@ unsigned long shrink_all_memory(unsigned
 			if (sc.nr_scanned && prio < DEF_PRIORITY - 2)
 				congestion_wait(WRITE, HZ / 10);
 		}
-
-		lru_pages = 0;
 	}
 
 	/*
 	 * If ret = 0, we could not shrink LRUs, but there may be something
 	 * in slab caches
 	 */
-	if (!ret)
+	if (!ret) {
 		do {
 			reclaim_state.reclaimed_slab = 0;
-			shrink_slab(nr_pages, sc.gfp_mask, lru_pages);
+			shrink_slab(nr_pages, sc.gfp_mask, count_lru_pages());
 			ret += reclaim_state.reclaimed_slab;
 		} while (ret < nr_pages && reclaim_state.reclaimed_slab > 0);
+	}
 
 out:
 	current->reclaim_state = NULL;
_



* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-18 23:17       ` Andrew Morton
@ 2006-12-19  0:52         ` Rafael J. Wysocki
  2006-12-19  1:18         ` David Rientjes
  1 sibling, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2006-12-19  0:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jiri Slaby, linux-pm, Linux kernel mailing list, linux-pm

On Tuesday, 19 December 2006 00:17, Andrew Morton wrote:
> On Mon, 18 Dec 2006 23:38:23 +0100
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > > > Looks like we have a problem with slab shrinking here.
> > > > 
> > > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> > > 
> > > Sure, but not till Friday, sorry (I am away).
> > 
> > I reproduced this on one box, but then it turned out that EIP was at line 195
> > of mm/vmscan.c where there was
> > 
> > do_div(delta, lru_pages + 1);
> 
> That implies that we passed it lru_pages=-1.
> 
> Presumably the logic in
> vmscanc-account-for-memory-already-freed-in-seeking-to.patch caused that.
> 
> > Well, I have no idea how this can lead to a divide error (lru_pages is
> > unsigned).
> > 
> > I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> > configuration specific.
> > 
> 
> There is one wart in shrink_all_memory() and I think we should fix that in
> 2.6.20.
> 
> Please check the below.

Fine by me.

> I'll drop vmscanc-account-for-memory-already-freed-in-seeking-to.patch.  It
> has other stuff in it which we might still need.  But altering
> sc->swap_cluster_max in that manner looks odd.

Agreed.

Greetings,
Rafael


* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-18 23:17       ` Andrew Morton
  2006-12-19  0:52         ` Rafael J. Wysocki
@ 2006-12-19  1:18         ` David Rientjes
  2006-12-19  1:28           ` Andrew Morton
  1 sibling, 1 reply; 12+ messages in thread
From: David Rientjes @ 2006-12-19  1:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, Jiri Slaby, linux-pm,
	Linux kernel mailing list, linux-pm

On Mon, 18 Dec 2006, Andrew Morton wrote:

> diff -puN mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling mm/vmscan.c
> --- a/mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling
> +++ a/mm/vmscan.c
> @@ -1484,6 +1484,16 @@ static unsigned long shrink_all_zones(un
>  	return ret;
>  }
>  
> +static unsigned long count_lru_pages(void)
> +{
> +	struct zone *zone;
> +	unsigned long ret = 0;
> +
> +	for_each_zone(zone);
> +		ret += zone->nr_active + zone->nr_inactive;
> +	return ret;
> +}
> +
>  /*
>   * Try to free `nr_pages' of memory, system-wide, and return the number of
>   * freed pages.

There's an extra semicolon there that results in only the final zone being 
used.

		David


* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
  2006-12-19  1:18         ` David Rientjes
@ 2006-12-19  1:28           ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2006-12-19  1:28 UTC (permalink / raw)
  To: David Rientjes
  Cc: Rafael J. Wysocki, Jiri Slaby, linux-pm,
	Linux kernel mailing list, linux-pm

On Mon, 18 Dec 2006 17:18:12 -0800 (PST)
David Rientjes <rientjes@cs.washington.edu> wrote:

> On Mon, 18 Dec 2006, Andrew Morton wrote:
> 
> > diff -puN mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling mm/vmscan.c
> > --- a/mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling
> > +++ a/mm/vmscan.c
> > @@ -1484,6 +1484,16 @@ static unsigned long shrink_all_zones(un
> >  	return ret;
> >  }
> >  
> > +static unsigned long count_lru_pages(void)
> > +{
> > +	struct zone *zone;
> > +	unsigned long ret = 0;
> > +
> > +	for_each_zone(zone);
> > +		ret += zone->nr_active + zone->nr_inactive;
> > +	return ret;
> > +}
> > +
> >  /*
> >   * Try to free `nr_pages' of memory, system-wide, and return the number of
> >   * freed pages.
> 
> There's an extra semicolon there

Sigh.  coding-while-diseased.

> that results in only the final zone being 
> used.
> 

Actually it'll go oops.  Fixed, thanks.
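
The effect of the stray semicolon can be sketched in userspace. This
assumes, as the "go oops" remark suggests, that the iterator stops once its
cursor becomes NULL; struct zone_stub and for_each_stub() are hypothetical
stand-ins, not the kernel's struct zone or for_each_zone().

```c
#include <stddef.h>

struct zone_stub {			/* hypothetical stand-in for struct zone */
	unsigned long nr_active;
	unsigned long nr_inactive;
	struct zone_stub *next;
};

/* Hypothetical iterator in the style of for_each_zone(): stops at NULL. */
#define for_each_stub(z, head) \
	for ((z) = (head); (z) != NULL; (z) = (z)->next)

/* Where the cursor ends up when a stray ';' leaves the loop body empty. */
static struct zone_stub *cursor_after_empty_loop(struct zone_stub *head)
{
	struct zone_stub *z;

	for_each_stub(z, head);		/* stray semicolon: empty body */
	return z;			/* NULL once the loop has finished */
}

/* The intended loop: the statement below the macro is the loop body. */
static unsigned long count_pages(struct zone_stub *head)
{
	struct zone_stub *z;
	unsigned long ret = 0;

	for_each_stub(z, head)
		ret += z->nr_active + z->nr_inactive;
	return ret;
}
```

With the semicolon in place, the "ret += ..." line would run exactly once,
after the loop, with the cursor already NULL. It would therefore dereference
a null pointer rather than read the last zone.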


Thread overview: 12+ messages
2006-12-18 11:20 OOPS: divide error while s2dsk (2.6.20-rc1-mm1) Jiri Slaby
2006-12-18 15:46 ` [linux-pm] " Rafael J. Wysocki
2006-12-18 17:02   ` Jiri Slaby
2006-12-18 20:59     ` Andrew Morton
2006-12-18 22:38     ` Rafael J. Wysocki
2006-12-18 22:44       ` Nigel Cunningham
2006-12-18 23:09         ` Rafael J. Wysocki
2006-12-18 23:16           ` Nigel Cunningham
2006-12-18 23:17       ` Andrew Morton
2006-12-19  0:52         ` Rafael J. Wysocki
2006-12-19  1:18         ` David Rientjes
2006-12-19  1:28           ` Andrew Morton
