* OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Jiri Slaby @ 2006-12-18 11:20 UTC
To: Linux kernel mailing list; +Cc: akpm, pavel, linux-pm

Hi.

I got this oops while suspending:
[ 309.366557] Disabling non-boot CPUs ...
[ 309.386563] CPU 1 is now offline
[ 309.387625] CPU1 is down
[ 309.387704] Stopping tasks ... done.
[ 310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
[ 310.456669] SMP
[ 310.456814] last sysfs file: /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
[ 310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
[ 310.457259] CPU: 0
[ 310.457260] EIP: 0060:[<c0150c9a>] Not tainted VLI
[ 310.457261] EFLAGS: 00210246 (2.6.20-rc1-mm1 #207)
[ 310.457478] EIP is at shrink_slab+0x9e/0x169
[ 310.457548] eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 00000000
[ 310.457623] esi: 00000000 edi: c18fe500 ebp: f7b3fe3c esp: f7b3fe08
[ 310.457696] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[ 310.457772] Process swsusp (pid: 3243, ti=f7b3e000 task=f756f030 task.ti=f7b3e000)
[ 310.457845] Stack: c175d8a0 c175daa0 c175db00 00000000 00000000 c053cec0 000045ec 000000d0
[ 310.458286] 00000000 00000000 00001179 00001179 00000000 f7b3fe94 c0151445 00000001
[ 310.458723] f7b3fe64 00001df1 00000002 00000001 00000001 00038000 00000c79 0000117b
[ 310.459199] Call Trace:
[ 310.459334] [<c0103f1b>] show_trace_log_lvl+0x1a/0x30
[ 310.459450] [<c0103fd6>] show_stack_log_lvl+0xa5/0xca
[ 310.459562] [<c01041ce>] show_registers+0x1d3/0x2b8
[ 310.459673] [<c01043d4>] die+0x121/0x243
[ 310.459781] [<c010456c>] do_trap+0x76/0x9c
[ 310.459892] [<c0104bd8>] do_divide_error+0x94/0x9e
[ 310.460001] [<c038a7e4>] error_code+0x7c/0x84
[ 310.460113] [<c0151445>] shrink_all_memory+0x211/0x2eb
[ 310.460225] [<c01418c1>] swsusp_shrink_memory+0x187/0x196
[ 310.460335] [<c0141a07>] prepare_processes+0x35/0xc8
[ 310.460446] [<c0141cce>] pm_suspend_disk+0xd/0x16f
[ 310.460558] [<c0140c87>] enter_state+0x129/0x19b
[ 310.460668] [<c0140d9c>] state_store+0xa3/0xac
[ 310.460777] [<c0198ab0>] subsys_attr_store+0x20/0x25
[ 310.460889] [<c0198b9f>] sysfs_write_file+0x97/0xd8
[ 310.460998] [<c0165262>] vfs_write+0x8b/0x149
[ 310.461108] [<c01658cb>] sys_write+0x3d/0x64
[ 310.461216] [<c0102fe4>] syscall_call+0x7/0xb
[ 310.461328] =======================
[ 310.461397] Code: 31 c0 ff 17 89 c3 8b 45 e4 31 d2 f7 77 0c f7 e3 89 45 d8 89 55 dc 89 d1 89 c6 31 d2 85 c9 74 09 89 c8 31 d2 f7 75 f0 89 c1 89 f0 <f7> 75 f0 89 ca 89 45 d8 89 55 dc 8b 45 d8 03 47 10 89 47 10 85
[ 310.464079] EIP: [<c0150c9a>] shrink_slab+0x9e/0x169 SS:ESP 0068:f7b3fe08
[ 310.464228]

swsusp script is something like this:
echo platform > /sys/power/disk
echo disk > /sys/power/state

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E

* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Rafael J. Wysocki @ 2006-12-18 15:46 UTC
To: linux-pm; +Cc: Jiri Slaby, Linux kernel mailing list, akpm, linux-pm

Hi,

On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> Hi.
>
> I got this oops while suspending:
> [ 309.366557] Disabling non-boot CPUs ...
> [ 309.386563] CPU 1 is now offline
> [ 309.387625] CPU1 is down
> [ 309.387704] Stopping tasks ... done.
> [ 310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> [ 310.456669] SMP
> [ 310.456814] last sysfs file:
> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> [ 310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> [ 310.457259] CPU: 0
> [ 310.457260] EIP: 0060:[<c0150c9a>] Not tainted VLI
> [ 310.457261] EFLAGS: 00210246 (2.6.20-rc1-mm1 #207)
> [ 310.457478] EIP is at shrink_slab+0x9e/0x169

Looks like we have a problem with slab shrinking here.

Could you please use gdb to check what exactly is at shrink_slab+0x9e?

> [ 310.457548] eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 00000000
> [ 310.457623] esi: 00000000 edi: c18fe500 ebp: f7b3fe3c esp: f7b3fe08
> [ 310.457696] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> [ 310.457772] Process swsusp (pid: 3243, ti=f7b3e000 task=f756f030
> task.ti=f7b3e000)
> [ 310.457845] Stack: c175d8a0 c175daa0 c175db00 00000000 00000000 c053cec0
> 000045ec 000000d0
> [ 310.458286] 00000000 00000000 00001179 00001179 00000000 f7b3fe94
> c0151445 00000001
> [ 310.458723] f7b3fe64 00001df1 00000002 00000001 00000001 00038000
> 00000c79 0000117b
> [ 310.459199] Call Trace:
> [ 310.459334] [<c0103f1b>] show_trace_log_lvl+0x1a/0x30
> [ 310.459450] [<c0103fd6>] show_stack_log_lvl+0xa5/0xca
> [ 310.459562] [<c01041ce>] show_registers+0x1d3/0x2b8
> [ 310.459673] [<c01043d4>] die+0x121/0x243
> [ 310.459781] [<c010456c>] do_trap+0x76/0x9c
> [ 310.459892] [<c0104bd8>] do_divide_error+0x94/0x9e
> [ 310.460001] [<c038a7e4>] error_code+0x7c/0x84
> [ 310.460113] [<c0151445>] shrink_all_memory+0x211/0x2eb
> [ 310.460225] [<c01418c1>] swsusp_shrink_memory+0x187/0x196
> [ 310.460335] [<c0141a07>] prepare_processes+0x35/0xc8
> [ 310.460446] [<c0141cce>] pm_suspend_disk+0xd/0x16f
> [ 310.460558] [<c0140c87>] enter_state+0x129/0x19b
> [ 310.460668] [<c0140d9c>] state_store+0xa3/0xac
> [ 310.460777] [<c0198ab0>] subsys_attr_store+0x20/0x25
> [ 310.460889] [<c0198b9f>] sysfs_write_file+0x97/0xd8
> [ 310.460998] [<c0165262>] vfs_write+0x8b/0x149
> [ 310.461108] [<c01658cb>] sys_write+0x3d/0x64
> [ 310.461216] [<c0102fe4>] syscall_call+0x7/0xb
> [ 310.461328] =======================
> [ 310.461397] Code: 31 c0 ff 17 89 c3 8b 45 e4 31 d2 f7 77 0c f7 e3 89 45 d8 89
> 55 dc 89 d1 89 c6 31 d2 85 c9 74 09 89 c8 31 d2 f7 75 f0 89 c1 89 f0 <f7> 75 f0
> 89 ca 89 45 d8 89 55 dc 8b 45 d8 03 47 10 89 47 10 85
> [ 310.464079] EIP: [<c0150c9a>] shrink_slab+0x9e/0x169 SS:ESP 0068:f7b3fe08
> [ 310.464228]
>
> swsusp script is something like this:
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
>
> regards,

Greetings,
Rafael

-- 
If you don't have the time to read,
you don't have the time or the tools to write.
		- Stephen King

* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Jiri Slaby @ 2006-12-18 17:02 UTC
To: Rafael J. Wysocki
Cc: linux-pm, Jiri Slaby, Linux kernel mailing list, akpm, linux-pm

Rafael J. Wysocki wrote:
> Hi,
>
> On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
>> Hi.
>>
>> I got this oops while suspending:
>> [ 309.366557] Disabling non-boot CPUs ...
>> [ 309.386563] CPU 1 is now offline
>> [ 309.387625] CPU1 is down
>> [ 309.387704] Stopping tasks ... done.
>> [ 310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
>> [ 310.456669] SMP
>> [ 310.456814] last sysfs file:
>> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
>> [ 310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
>> [ 310.457259] CPU: 0
>> [ 310.457260] EIP: 0060:[<c0150c9a>] Not tainted VLI
>> [ 310.457261] EFLAGS: 00210246 (2.6.20-rc1-mm1 #207)
>> [ 310.457478] EIP is at shrink_slab+0x9e/0x169
>
> Looks like we have a problem with slab shrinking here.
>
> Could you please use gdb to check what exactly is at shrink_slab+0x9e?

Sure, but not till Friday, sorry (I am away).

>> [ 310.457548] eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 00000000
>> [ 310.457623] esi: 00000000 edi: c18fe500 ebp: f7b3fe3c esp: f7b3fe08
>> [ 310.457696] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
>> [ 310.457772] Process swsusp (pid: 3243, ti=f7b3e000 task=f756f030
>> task.ti=f7b3e000)
>> [ 310.457845] Stack: c175d8a0 c175daa0 c175db00 00000000 00000000 c053cec0
>> 000045ec 000000d0
>> [ 310.458286] 00000000 00000000 00001179 00001179 00000000 f7b3fe94
>> c0151445 00000001
>> [ 310.458723] f7b3fe64 00001df1 00000002 00000001 00000001 00038000
>> 00000c79 0000117b
>> [ 310.459199] Call Trace:
>> [ 310.459334] [<c0103f1b>] show_trace_log_lvl+0x1a/0x30
>> [ 310.459450] [<c0103fd6>] show_stack_log_lvl+0xa5/0xca
>> [ 310.459562] [<c01041ce>] show_registers+0x1d3/0x2b8
>> [ 310.459673] [<c01043d4>] die+0x121/0x243
>> [ 310.459781] [<c010456c>] do_trap+0x76/0x9c
>> [ 310.459892] [<c0104bd8>] do_divide_error+0x94/0x9e
>> [ 310.460001] [<c038a7e4>] error_code+0x7c/0x84
>> [ 310.460113] [<c0151445>] shrink_all_memory+0x211/0x2eb
>> [ 310.460225] [<c01418c1>] swsusp_shrink_memory+0x187/0x196
>> [ 310.460335] [<c0141a07>] prepare_processes+0x35/0xc8
>> [ 310.460446] [<c0141cce>] pm_suspend_disk+0xd/0x16f
>> [ 310.460558] [<c0140c87>] enter_state+0x129/0x19b
>> [ 310.460668] [<c0140d9c>] state_store+0xa3/0xac
>> [ 310.460777] [<c0198ab0>] subsys_attr_store+0x20/0x25
>> [ 310.460889] [<c0198b9f>] sysfs_write_file+0x97/0xd8
>> [ 310.460998] [<c0165262>] vfs_write+0x8b/0x149
>> [ 310.461108] [<c01658cb>] sys_write+0x3d/0x64
>> [ 310.461216] [<c0102fe4>] syscall_call+0x7/0xb
>> [ 310.461328] =======================
>> [ 310.461397] Code: 31 c0 ff 17 89 c3 8b 45 e4 31 d2 f7 77 0c f7 e3 89 45 d8 89
>> 55 dc 89 d1 89 c6 31 d2 85 c9 74 09 89 c8 31 d2 f7 75 f0 89 c1 89 f0 <f7> 75 f0
>> 89 ca 89 45 d8 89 55 dc 8b 45 d8 03 47 10 89 47 10 85
>> [ 310.464079] EIP: [<c0150c9a>] shrink_slab+0x9e/0x169 SS:ESP 0068:f7b3fe08
>> [ 310.464228]
>>
>> swsusp script is something like this:
>> echo platform > /sys/power/disk
>> echo disk > /sys/power/state

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E

* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Andrew Morton @ 2006-12-18 20:59 UTC
To: Jiri Slaby
Cc: Rafael J. Wysocki, linux-pm, Linux kernel mailing list, linux-pm

On Mon, 18 Dec 2006 18:02:20 +0100
Jiri Slaby <jirislaby@gmail.com> wrote:

> Rafael J. Wysocki wrote:
> > Hi,
> >
> > On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> >> Hi.
> >>
> >> I got this oops while suspending:
> >> [ 309.366557] Disabling non-boot CPUs ...
> >> [ 309.386563] CPU 1 is now offline
> >> [ 309.387625] CPU1 is down
> >> [ 309.387704] Stopping tasks ... done.
> >> [ 310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> >> [ 310.456669] SMP
> >> [ 310.456814] last sysfs file:
> >> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> >> [ 310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> >> [ 310.457259] CPU: 0
> >> [ 310.457260] EIP: 0060:[<c0150c9a>] Not tainted VLI
> >> [ 310.457261] EFLAGS: 00210246 (2.6.20-rc1-mm1 #207)
> >> [ 310.457478] EIP is at shrink_slab+0x9e/0x169
> >
> > Looks like we have a problem with slab shrinking here.
> >
> > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
>
> Sure, but not till Friday, sorry (I am away).

I think there's only one divide in there which can do this, so...

--- a/mm/vmscan.c~shrink_slab-handle-bad-shrinkers
+++ a/mm/vmscan.c
@@ -20,6 +20,7 @@
 #include <linux/pagemap.h>
 #include <linux/init.h>
 #include <linux/highmem.h>
+#include <linux/kallsyms.h>
 #include <linux/vmstat.h>
 #include <linux/file.h>
 #include <linux/writeback.h>
@@ -190,7 +191,13 @@ unsigned long shrink_slab(unsigned long
 		unsigned long total_scan;
 		unsigned long max_pass = (*shrinker->shrinker)(0, gfp_mask);
 
-		delta = (4 * scanned) / shrinker->seeks;
+		if (!shrinker->seeks) {
+			print_symbol("shrinker %s has zero seeks\n",
+				(unsigned long)shrinker->shrinker);
+			delta = (4 * scanned) / DEFAULT_SEEKS;
+		} else {
+			delta = (4 * scanned) / shrinker->seeks;
+		}
 		delta *= max_pass;
 		do_div(delta, lru_pages + 1);
 		shrinker->nr += delta;
_

A quick grep shows that all set_shrinker() callers are doing the right thing,
so something kooky has happened.

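For readers who want to try the arithmetic outside the kernel, the calculation touched by the patch above can be sketched in plain userspace C. This is only a sketch under stated assumptions: scan_delta() is a hypothetical helper, DEFAULT_SEEKS is given an assumed value of 2, and the check on lru_pages + 1 is an extra guard added for illustration that the patch itself does not contain.

```c
#include <stdint.h>

#define DEFAULT_SEEKS 2	/* assumed value; the patch only names the constant */

/*
 * Hypothetical userspace version of the delta computation in shrink_slab().
 * Returns 0 and stores the result in *delta, or -1 if the final divisor
 * (lru_pages + 1) would be zero.
 */
static int scan_delta(unsigned long scanned, int seeks,
		      unsigned long max_pass, unsigned long lru_pages,
		      uint64_t *delta)
{
	/* Unsigned arithmetic wraps: this is 0 when lru_pages is all ones. */
	unsigned long divisor = lru_pages + 1;
	uint64_t d;

	if (seeks <= 0)		/* bad shrinker: fall back to the default */
		seeks = DEFAULT_SEEKS;
	if (divisor == 0)	/* illustrative guard, not in the patch */
		return -1;

	d = (4 * (uint64_t)scanned) / (unsigned int)seeks;
	d *= max_pass;
	d /= divisor;		/* the kernel uses do_div() for this step */
	*delta = d;
	return 0;
}
```

With scanned = 100, seeks = 2, max_pass = 10 and lru_pages = 99, the sketch computes (4 * 100 / 2) * 10 / 100 = 20. Passing (unsigned long)-1 as lru_pages makes it return -1 instead of dividing by zero.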
* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Rafael J. Wysocki @ 2006-12-18 22:38 UTC
To: Jiri Slaby; +Cc: linux-pm, Linux kernel mailing list, akpm, linux-pm

On Monday, 18 December 2006 18:02, Jiri Slaby wrote:
> Rafael J. Wysocki wrote:
> > Hi,
> >
> > On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> >> Hi.
> >>
> >> I got this oops while suspending:
> >> [ 309.366557] Disabling non-boot CPUs ...
> >> [ 309.386563] CPU 1 is now offline
> >> [ 309.387625] CPU1 is down
> >> [ 309.387704] Stopping tasks ... done.
> >> [ 310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> >> [ 310.456669] SMP
> >> [ 310.456814] last sysfs file:
> >> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> >> [ 310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> >> [ 310.457259] CPU: 0
> >> [ 310.457260] EIP: 0060:[<c0150c9a>] Not tainted VLI
> >> [ 310.457261] EFLAGS: 00210246 (2.6.20-rc1-mm1 #207)
> >> [ 310.457478] EIP is at shrink_slab+0x9e/0x169
> >
> > Looks like we have a problem with slab shrinking here.
> >
> > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
>
> Sure, but not till Friday, sorry (I am away).

I reproduced this on one box, but then it turned out that EIP was at line 195
of mm/vmscan.c where there was

	do_div(delta, lru_pages + 1);

Well, I have no idea how this can lead to a divide error (lru_pages is
unsigned).

I'm unable to reproduce this on another i386 box, so it seems to be somewhat
configuration specific.

Does 2.6.20-rc1 work for you?

Rafael

-- 
If you don't have the time to read,
you don't have the time or the tools to write.
		- Stephen King

* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Nigel Cunningham @ 2006-12-18 22:44 UTC
To: Rafael J. Wysocki
Cc: Jiri Slaby, linux-pm, Linux kernel mailing list, akpm, linux-pm

Hi.

On Mon, 2006-12-18 at 23:38 +0100, Rafael J. Wysocki wrote:
> On Monday, 18 December 2006 18:02, Jiri Slaby wrote:
> > Rafael J. Wysocki wrote:
> > > Hi,
> > >
> > > On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> > >> Hi.
> > >>
> > >> I got this oops while suspending:
> > >> [ 309.366557] Disabling non-boot CPUs ...
> > >> [ 309.386563] CPU 1 is now offline
> > >> [ 309.387625] CPU1 is down
> > >> [ 309.387704] Stopping tasks ... done.
> > >> [ 310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> > >> [ 310.456669] SMP
> > >> [ 310.456814] last sysfs file:
> > >> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> > >> [ 310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> > >> [ 310.457259] CPU: 0
> > >> [ 310.457260] EIP: 0060:[<c0150c9a>] Not tainted VLI
> > >> [ 310.457261] EFLAGS: 00210246 (2.6.20-rc1-mm1 #207)
> > >> [ 310.457478] EIP is at shrink_slab+0x9e/0x169
> > >
> > > Looks like we have a problem with slab shrinking here.
> > >
> > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> >
> > Sure, but not till Friday, sorry (I am away).
>
> I reproduced this on one box, but then it turned out that EIP was at line 195
> of mm/vmscan.c where there was
>
> 	do_div(delta, lru_pages + 1);
>
> Well, I have no idea how this can lead to a divide error (lru_pages is
> unsigned).
>
> I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> configuration specific.
>
> Does 2.6.20-rc1 work for you?

I have a patch in -mm that reduces lru_pages by what shrink_all_zones
returns. Could shrink_all_zones perhaps be returning incorrect values
such that lru_pages ends up becoming -1?

Regards,

Nigel

* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Rafael J. Wysocki @ 2006-12-18 23:09 UTC
To: nigel; +Cc: Jiri Slaby, linux-pm, Linux kernel mailing list, akpm, linux-pm

Hi,

On Monday, 18 December 2006 23:44, Nigel Cunningham wrote:
> Hi.
>
> On Mon, 2006-12-18 at 23:38 +0100, Rafael J. Wysocki wrote:
> > On Monday, 18 December 2006 18:02, Jiri Slaby wrote:
> > > Rafael J. Wysocki wrote:
> > > > Hi,
> > > >
> > > > On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> > > >> Hi.
> > > >>
> > > >> I got this oops while suspending:
> > > >> [ 309.366557] Disabling non-boot CPUs ...
> > > >> [ 309.386563] CPU 1 is now offline
> > > >> [ 309.387625] CPU1 is down
> > > >> [ 309.387704] Stopping tasks ... done.
> > > >> [ 310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> > > >> [ 310.456669] SMP
> > > >> [ 310.456814] last sysfs file:
> > > >> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> > > >> [ 310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> > > >> [ 310.457259] CPU: 0
> > > >> [ 310.457260] EIP: 0060:[<c0150c9a>] Not tainted VLI
> > > >> [ 310.457261] EFLAGS: 00210246 (2.6.20-rc1-mm1 #207)
> > > >> [ 310.457478] EIP is at shrink_slab+0x9e/0x169
> > > >
> > > > Looks like we have a problem with slab shrinking here.
> > > >
> > > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> > >
> > > Sure, but not till Friday, sorry (I am away).
> >
> > I reproduced this on one box, but then it turned out that EIP was at line 195
> > of mm/vmscan.c where there was
> >
> > 	do_div(delta, lru_pages + 1);
> >
> > Well, I have no idea how this can lead to a divide error (lru_pages is
> > unsigned).
> >
> > I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> > configuration specific.
> >
> > Does 2.6.20-rc1 work for you?
>
> I have a patch in -mm that reduces lru_pages by what shrink_all_zones
> returns. Could shrink_all_zones perhaps be returning incorrect values
> such that lru_pages ends up becoming -1?

I don't think so, but look at the appended patch. ;-)

Greetings,
Rafael


---
Fix a (really bad) typo in shrink_all_memory().

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.20-rc1-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.20-rc1-mm1.orig/mm/vmscan.c
+++ linux-2.6.20-rc1-mm1/mm/vmscan.c
@@ -1569,7 +1569,7 @@ unsigned long shrink_all_memory(unsigned
 			sc.swap_cluster_max = nr_pages - ret;
 			freed = shrink_all_zones(nr_to_scan, prio, pass, &sc);
 			ret += freed;
-			lru_pages =- freed;
+			lru_pages -= freed;
 			nr_to_scan = nr_pages - ret;
 			if (ret >= nr_pages)
 				goto out;

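The typo fixed above is easy to misread, so here is a minimal userspace sketch of what the two spellings compute. In C, `lru_pages =- freed` is parsed as `lru_pages = -freed`, and because lru_pages is unsigned, negating a small value wraps around to a very large one. The function names are hypothetical; only the two assignment statements mirror the patch.

```c
/*
 * Userspace illustration of the one-character typo. The helpers are
 * hypothetical; they only isolate the two assignment statements.
 */

/* "=-" is parsed as "= -": the old value of lru_pages is thrown away. */
static unsigned long update_with_typo(unsigned long lru_pages,
				      unsigned long freed)
{
	lru_pages =- freed;	/* same as: lru_pages = -freed; */
	return lru_pages;
}

/* "-=" subtracts the freed pages from the running total, as intended. */
static unsigned long update_fixed(unsigned long lru_pages,
				  unsigned long freed)
{
	lru_pages -= freed;
	return lru_pages;
}
```

For example, with lru_pages = 1000 and freed = 1, the fixed form gives 999, while the typo gives the largest unsigned long value. Adding 1 to that wraps to 0, which would make the divisor in do_div(delta, lru_pages + 1) zero. The thread does not establish which value freed actually had on the affected machines, so this is one way the divide error could arise rather than a confirmed trace of it.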
* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Nigel Cunningham @ 2006-12-18 23:16 UTC
To: Rafael J. Wysocki
Cc: Jiri Slaby, linux-pm, Linux kernel mailing list, akpm, linux-pm

Hi.

On Tue, 2006-12-19 at 00:09 +0100, Rafael J. Wysocki wrote:
> Hi,
>
> On Monday, 18 December 2006 23:44, Nigel Cunningham wrote:
> > Hi.
> >
> > On Mon, 2006-12-18 at 23:38 +0100, Rafael J. Wysocki wrote:
> > > On Monday, 18 December 2006 18:02, Jiri Slaby wrote:
> > > > Rafael J. Wysocki wrote:
> > > > > Hi,
> > > > >
> > > > > On Monday, 18 December 2006 12:20, Jiri Slaby wrote:
> > > > >> Hi.
> > > > >>
> > > > >> I got this oops while suspending:
> > > > >> [ 309.366557] Disabling non-boot CPUs ...
> > > > >> [ 309.386563] CPU 1 is now offline
> > > > >> [ 309.387625] CPU1 is down
> > > > >> [ 309.387704] Stopping tasks ... done.
> > > > >> [ 310.030991] Shrinking memory... -<0>divide error: 0000 [#1]
> > > > >> [ 310.456669] SMP
> > > > >> [ 310.456814] last sysfs file:
> > > > >> /devices/pci0000:00/0000:00:1e.0/0000:02:08.0/eth0/statistics/collisions
> > > > >> [ 310.456919] Modules linked in: eth1394 floppy ohci1394 ide_cd ieee1394 cdrom
> > > > >> [ 310.457259] CPU: 0
> > > > >> [ 310.457260] EIP: 0060:[<c0150c9a>] Not tainted VLI
> > > > >> [ 310.457261] EFLAGS: 00210246 (2.6.20-rc1-mm1 #207)
> > > > >> [ 310.457478] EIP is at shrink_slab+0x9e/0x169
> > > > >
> > > > > Looks like we have a problem with slab shrinking here.
> > > > >
> > > > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> > > >
> > > > Sure, but not till Friday, sorry (I am away).
> > >
> > > I reproduced this on one box, but then it turned out that EIP was at line 195
> > > of mm/vmscan.c where there was
> > >
> > > 	do_div(delta, lru_pages + 1);
> > >
> > > Well, I have no idea how this can lead to a divide error (lru_pages is
> > > unsigned).
> > >
> > > I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> > > configuration specific.
> > >
> > > Does 2.6.20-rc1 work for you?
> >
> > I have a patch in -mm that reduces lru_pages by what shrink_all_zones
> > returns. Could shrink_all_zones perhaps be returning incorrect values
> > such that lru_pages ends up becoming -1?
>
> I don't think so, but look at the appended patch. ;-)
>
> Greetings,
> Rafael
>
>
> ---
> Fix a (really bad) typo in shrink_all_memory().
>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  mm/vmscan.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.20-rc1-mm1/mm/vmscan.c
> ===================================================================
> --- linux-2.6.20-rc1-mm1.orig/mm/vmscan.c
> +++ linux-2.6.20-rc1-mm1/mm/vmscan.c
> @@ -1569,7 +1569,7 @@ unsigned long shrink_all_memory(unsigned
>  			sc.swap_cluster_max = nr_pages - ret;
>  			freed = shrink_all_zones(nr_to_scan, prio, pass, &sc);
>  			ret += freed;
> -			lru_pages =- freed;
> +			lru_pages -= freed;
>  			nr_to_scan = nr_pages - ret;
>  			if (ret >= nr_pages)
>  				goto out;

Heh, yeah. Definitely acked! :)

Nigel

* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Andrew Morton @ 2006-12-18 23:17 UTC
To: Rafael J. Wysocki
Cc: Jiri Slaby, linux-pm, Linux kernel mailing list, linux-pm

On Mon, 18 Dec 2006 23:38:23 +0100
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> > > Looks like we have a problem with slab shrinking here.
> > >
> > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> >
> > Sure, but not till Friday, sorry (I am away).
>
> I reproduced this on one box, but then it turned out that EIP was at line 195
> of mm/vmscan.c where there was
>
> 	do_div(delta, lru_pages + 1);

That implies that we passed it lru_pages=-1.

Presumably the logic in
vmscanc-account-for-memory-already-freed-in-seeking-to.patch caused that.

> Well, I have no idea how this can lead to a divide error (lru_pages is
> unsigned).
>
> I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> configuration specific.
>

There is one wart in shrink_all_memory() and I think we should fix that in
2.6.20.

Please check the below.

I'll drop vmscanc-account-for-memory-already-freed-in-seeking-to.patch. It
has other stuff in it which we might still need. But altering
sc->swap_cluster_max in that manner looks odd.


From: Andrew Morton <akpm@osdl.org>

At the end of shrink_all_memory() we forget to recalculate lru_pages: it can be
zero.

Fix that up, and add a helper function for this operation too.

Also, recalculate lru_pages each time around the inner loop to get the
balancing correct.

Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 mm/vmscan.c |   33 ++++++++++++++++-----------------
 1 files changed, 16 insertions(+), 17 deletions(-)

diff -puN mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling mm/vmscan.c
--- a/mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling
+++ a/mm/vmscan.c
@@ -1484,6 +1484,16 @@ static unsigned long shrink_all_zones(un
 	return ret;
 }
 
+static unsigned long count_lru_pages(void)
+{
+	struct zone *zone;
+	unsigned long ret = 0;
+
+	for_each_zone(zone);
+		ret += zone->nr_active + zone->nr_inactive;
+	return ret;
+}
+
 /*
  * Try to free `nr_pages' of memory, system-wide, and return the number of
  * freed pages.
@@ -1498,7 +1508,6 @@ unsigned long shrink_all_memory(unsigned
 	unsigned long ret = 0;
 	int pass;
 	struct reclaim_state reclaim_state;
-	struct zone *zone;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
 		.may_swap = 0,
@@ -1509,10 +1518,7 @@ unsigned long shrink_all_memory(unsigned
 
 	current->reclaim_state = &reclaim_state;
 
-	lru_pages = 0;
-	for_each_zone(zone)
-		lru_pages += zone->nr_active + zone->nr_inactive;
-
+	lru_pages = count_lru_pages();
 	nr_slab = global_page_state(NR_SLAB_RECLAIMABLE);
 	/* If slab caches are huge, it's better to hit them first */
 	while (nr_slab >= lru_pages) {
@@ -1539,13 +1545,6 @@ unsigned long shrink_all_memory(unsigned
 	for (pass = 0; pass < 5; pass++) {
 		int prio;
 
-		/* Needed for shrinking slab caches later on */
-		if (!lru_pages)
-			for_each_zone(zone) {
-				lru_pages += zone->nr_active;
-				lru_pages += zone->nr_inactive;
-			}
-
 		/* Force reclaiming mapped pages in the passes #3 and #4 */
 		if (pass > 2) {
 			sc.may_swap = 1;
@@ -1561,7 +1560,8 @@ unsigned long shrink_all_memory(unsigned
 				goto out;
 
 			reclaim_state.reclaimed_slab = 0;
-			shrink_slab(sc.nr_scanned, sc.gfp_mask, lru_pages);
+			shrink_slab(sc.nr_scanned, sc.gfp_mask,
+					count_lru_pages());
 			ret += reclaim_state.reclaimed_slab;
 			if (ret >= nr_pages)
 				goto out;
@@ -1569,20 +1569,19 @@ unsigned long shrink_all_memory(unsigned
 			if (sc.nr_scanned && prio < DEF_PRIORITY - 2)
 				congestion_wait(WRITE, HZ / 10);
 		}
-
-		lru_pages = 0;
 	}
 
 	/*
 	 * If ret = 0, we could not shrink LRUs, but there may be something
 	 * in slab caches
 	 */
-	if (!ret)
+	if (!ret) {
 		do {
 			reclaim_state.reclaimed_slab = 0;
-			shrink_slab(nr_pages, sc.gfp_mask, lru_pages);
+			shrink_slab(nr_pages, sc.gfp_mask, count_lru_pages());
 			ret += reclaim_state.reclaimed_slab;
 		} while (ret < nr_pages && reclaim_state.reclaimed_slab > 0);
+	}
 
 out:
 	current->reclaim_state = NULL;
_

* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Rafael J. Wysocki @ 2006-12-19 0:52 UTC
To: Andrew Morton; +Cc: Jiri Slaby, linux-pm, Linux kernel mailing list, linux-pm

On Tuesday, 19 December 2006 00:17, Andrew Morton wrote:
> On Mon, 18 Dec 2006 23:38:23 +0100
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > > > Looks like we have a problem with slab shrinking here.
> > > >
> > > > Could you please use gdb to check what exactly is at shrink_slab+0x9e?
> > >
> > > Sure, but not till Friday, sorry (I am away).
> >
> > I reproduced this on one box, but then it turned out that EIP was at line 195
> > of mm/vmscan.c where there was
> >
> > 	do_div(delta, lru_pages + 1);
>
> That implies that we passed it lru_pages=-1.
>
> Presumably the logic in
> vmscanc-account-for-memory-already-freed-in-seeking-to.patch caused that.
>
> > Well, I have no idea how this can lead to a divide error (lru_pages is
> > unsigned).
> >
> > I'm unable to reproduce this on another i386 box, so it seems to be somewhat
> > configuration specific.
> >
>
> There is one wart in shrink_all_memory() and I think we should fix that in
> 2.6.20.
>
> Please check the below.

Fine by me.

> I'll drop vmscanc-account-for-memory-already-freed-in-seeking-to.patch. It
> has other stuff in it which we might still need. But altering
> sc->swap_cluster_max in that manner looks odd.

Agreed.

Greetings,
Rafael

* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: David Rientjes @ 2006-12-19 1:18 UTC
To: Andrew Morton
Cc: Rafael J. Wysocki, Jiri Slaby, linux-pm, Linux kernel mailing list, linux-pm

On Mon, 18 Dec 2006, Andrew Morton wrote:

> diff -puN mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling mm/vmscan.c
> --- a/mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling
> +++ a/mm/vmscan.c
> @@ -1484,6 +1484,16 @@ static unsigned long shrink_all_zones(un
>  	return ret;
>  }
>  
> +static unsigned long count_lru_pages(void)
> +{
> +	struct zone *zone;
> +	unsigned long ret = 0;
> +
> +	for_each_zone(zone);
> +		ret += zone->nr_active + zone->nr_inactive;
> +	return ret;
> +}
> +
>  /*
>   * Try to free `nr_pages' of memory, system-wide, and return the number of
>   * freed pages.

There's an extra semicolon there that results in only the final zone being
used.

		David

* Re: [linux-pm] OOPS: divide error while s2dsk (2.6.20-rc1-mm1)
From: Andrew Morton @ 2006-12-19 1:28 UTC
To: David Rientjes
Cc: Rafael J. Wysocki, Jiri Slaby, linux-pm, Linux kernel mailing list, linux-pm

On Mon, 18 Dec 2006 17:18:12 -0800 (PST)
David Rientjes <rientjes@cs.washington.edu> wrote:

> On Mon, 18 Dec 2006, Andrew Morton wrote:
>
> > diff -puN mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling mm/vmscan.c
> > --- a/mm/vmscan.c~shrink_all_memory-fix-lru_pages-handling
> > +++ a/mm/vmscan.c
> > @@ -1484,6 +1484,16 @@ static unsigned long shrink_all_zones(un
> >  	return ret;
> >  }
> >  
> > +static unsigned long count_lru_pages(void)
> > +{
> > +	struct zone *zone;
> > +	unsigned long ret = 0;
> > +
> > +	for_each_zone(zone);
> > +		ret += zone->nr_active + zone->nr_inactive;
> > +	return ret;
> > +}
> > +
> >  /*
> >   * Try to free `nr_pages' of memory, system-wide, and return the number of
> >   * freed pages.
>
> There's an extra semicolon there

Sigh. coding-while-diseased.

> that results in only the final zone being
> used.
>

Actually it'll go oops. Fixed, thanks.

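The effect of the stray semicolon discussed above can be shown with a small userspace sketch. The struct and the FOR_EACH_ZONE macro below are hypothetical stand-ins, not the kernel's definitions; the sketch assumes that the loop ends when the cursor becomes NULL, which is consistent with the remark above that the code would oops rather than just use the final zone.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-zone counters. */
struct zone {
	unsigned long nr_active;
	unsigned long nr_inactive;
	struct zone *next;
};

/* Hypothetical iterator: walks a NULL-terminated list of zones. */
#define FOR_EACH_ZONE(z, head) \
	for ((z) = (head); (z) != NULL; (z) = (z)->next)

/* Intended helper: sum active and inactive pages over every zone. */
static unsigned long count_lru_pages(struct zone *head)
{
	struct zone *zone;
	unsigned long ret = 0;

	FOR_EACH_ZONE(zone, head)
		ret += zone->nr_active + zone->nr_inactive;
	return ret;
}

/*
 * With a semicolon right after the loop header, the loop body is empty.
 * The loop runs to completion and leaves the cursor at NULL, so the
 * statement that follows would dereference a NULL pointer. This helper
 * returns the cursor instead of dereferencing it.
 */
static struct zone *cursor_after_empty_loop(struct zone *head)
{
	struct zone *zone;

	FOR_EACH_ZONE(zone, head)
		;	/* empty body, as produced by the stray semicolon */
	return zone;
}
```

For a two-zone list with counters (1, 2) and (3, 4), count_lru_pages() returns 10, while cursor_after_empty_loop() returns NULL for any list.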
Thread overview: 12+ messages
2006-12-18 11:20 OOPS: divide error while s2dsk (2.6.20-rc1-mm1) Jiri Slaby
2006-12-18 15:46 ` [linux-pm] " Rafael J. Wysocki
2006-12-18 17:02   ` Jiri Slaby
2006-12-18 20:59     ` Andrew Morton
2006-12-18 22:38     ` Rafael J. Wysocki
2006-12-18 22:44       ` Nigel Cunningham
2006-12-18 23:09         ` Rafael J. Wysocki
2006-12-18 23:16           ` Nigel Cunningham
2006-12-18 23:17       ` Andrew Morton
2006-12-19  0:52         ` Rafael J. Wysocki
2006-12-19  1:18         ` David Rientjes
2006-12-19  1:28           ` Andrew Morton