* Re: Bad page state on AMD Opteron Dual System with kernel 2.6.13-rc6-git13
       [not found] <20050826165342.GA11796@pbkg4>
@ 2005-08-28  0:20 ` Daniel Drake
  2005-08-29  5:24   ` Tim Weippert
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Drake @ 2005-08-28  0:20 UTC
  To: Tim Weippert; +Cc: akpm, cpufreq, discuss, linux-kernel, davej

Hi,

Tim Weippert wrote:
> i have read some postings concerning the following Kernel Messages:
>
> Aug 26 18:04:01 montdsnsu3 kernel: grep[11619] general protection
> rip:2aaaaaaaed43 rsp:7fffff9c0740 error:0
> Aug 26 18:08:02 montdsnsu3 kernel: ping[14867] general protection
> rip:2aaaaaaaed43 rsp:7fffffdbf300 error:0
> Aug 26 18:08:03 montdsnsu3 kernel: grep[14987] general protection
> rip:2aaaaaaaed43 rsp:7fffffdbfce0 error:0
> Aug 26 18:08:03 montdsnsu3 kernel: grep[15041] general protection
> rip:2aaaaaaaed43 rsp:7fffff9bf550 error:0
>
> And the Bad Page State Messages:
>
> Bad page state at prep_new_page (in process 'sh', page ffff8100011a69c8)
> flags:0x0100000000000014 mapping:0000000000000000 mapcount:-3 count:0
> Backtrace:
>
> Call Trace:<ffffffff8015a653>{bad_page+99}
> <ffffffff8015aa31>{prep_new_page+65}
> <ffffffff8015b1e2>{buffered_rmqueue+306}
> <ffffffff8015b425>{__alloc_pages+261}
> <ffffffff8015b7c5>{get_zeroed_page+37}
> <ffffffff801691b7>{__pmd_alloc+55}
> <ffffffff8016614e>{copy_page_range+462}
> <ffffffff80131da4>{copy_mm+820}
> <ffffffff80132cba>{copy_process+2282}
> <ffffffff80133407>{do_fork+215}
> <ffffffff8010db7a>{system_call+126}
> <ffffffff8010deeb>{ptregscall_common+103}
>
> Trying to fix it up, but a reboot is needed

Seems to be an identical problem as was filed here:

http://bugs.gentoo.org/show_bug.cgi?id=103497

This bug report seems to suggest that the ondemand scaling governor may be
at fault. Does your setup use this too?

(CC'ing some extra people to make sure problem is known)

>
> I have the same issues on an SUN V20z with an dual opteron 248.
>
> montdsnsu3:~# lspci
> 0000:00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev
> 07)
> 0000:00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev
> 05)
> 0000:00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE
> (rev 03)
> 0000:00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
> 0000:00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X
> Bridge (rev 12)
> 0000:00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev
> 01)
> 0000:00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X
> Bridge (rev 12)
> 0000:00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev
> 01)
> 0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 0000:00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 0000:00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 0000:00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 0000:00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 0000:01:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB
> (rev 0b)
> 0000:01:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB
> (rev 0b)
> 0000:01:05.0 VGA compatible controller: Trident Microsystems Blade 3D
> PCI/AGP (rev 3a)
> 0000:02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704
> Gigabit Ethernet (rev 03)
> 0000:02:02.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704
> Gigabit Ethernet (rev 03)
> 0000:02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030
> PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
>
>
> With the running kernel i get 2 kernel panics within the last week and
> the machine crash totally.
>
> I would like to offer my help if i can do anything in debugging this or
> deal with more informations to fix this issue.
>
> HTH,
>
> weiti
>
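All four general protection lines quoted in this message report the same rip value. A minimal sketch for extracting those fields from a syslog excerpt and counting faults per instruction address follows. It assumes Python 3 and log lines that have already been unwrapped and stripped of "> " quote markers, as in SAMPLE; the names parse_gp_faults and count_by_rip are illustrative and not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches the 2.6-era x86_64 user-space fault line seen in the report, e.g.
#   grep[11619] general protection rip:2aaaaaaaed43 rsp:7fffff9c0740 error:0
GP_FAULT = re.compile(
    r"(?P<comm>[\w.-]+)\[(?P<pid>\d+)\] general protection\s+"
    r"rip:(?P<rip>[0-9a-f]+) rsp:(?P<rsp>[0-9a-f]+) error:(?P<error>\d+)"
)

# The four lines from the report, joined back onto single lines.
SAMPLE = """\
Aug 26 18:04:01 montdsnsu3 kernel: grep[11619] general protection rip:2aaaaaaaed43 rsp:7fffff9c0740 error:0
Aug 26 18:08:02 montdsnsu3 kernel: ping[14867] general protection rip:2aaaaaaaed43 rsp:7fffffdbf300 error:0
Aug 26 18:08:03 montdsnsu3 kernel: grep[14987] general protection rip:2aaaaaaaed43 rsp:7fffffdbfce0 error:0
Aug 26 18:08:03 montdsnsu3 kernel: grep[15041] general protection rip:2aaaaaaaed43 rsp:7fffff9bf550 error:0
"""


def parse_gp_faults(log_text):
    """Return one dict (comm, pid, rip, rsp, error) per fault line found."""
    return [match.groupdict() for match in GP_FAULT.finditer(log_text)]


def count_by_rip(faults):
    """Count how many faults were reported at each instruction pointer."""
    return Counter(fault["rip"] for fault in faults)


if __name__ == "__main__":
    for rip, hits in count_by_rip(parse_gp_faults(SAMPLE)).items():
        print(f"rip {rip}: {hits} fault(s)")
```

Grouping by rip is only one possible choice; it makes it easy to see whether unrelated programs (here grep and ping) fault at the same address, which may hint at a shared mapping rather than a bug in one binary.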
* Re: Bad page state on AMD Opteron Dual System with kernel 2.6.13-rc6-git13
  2005-08-28  0:20 ` Bad page state on AMD Opteron Dual System with kernel 2.6.13-rc6-git13 Daniel Drake
@ 2005-08-29  5:24   ` Tim Weippert
  2005-08-29 10:28     ` Tim Weippert
  2005-08-29 10:49     ` Daniel Drake
  0 siblings, 2 replies; 4+ messages in thread
From: Tim Weippert @ 2005-08-29  5:24 UTC
  To: Daniel Drake; +Cc: akpm, cpufreq, discuss, linux-kernel, davej

On Sun, Aug 28, 2005 at 01:20:51AM +0100, Daniel Drake wrote:
> Hi,
>
> Tim Weippert wrote:
> >i have read some postings concerning the following Kernel Messages:
> >
> >Aug 26 18:04:01 montdsnsu3 kernel: grep[11619] general protection
> >rip:2aaaaaaaed43 rsp:7fffff9c0740 error:0
> >Aug 26 18:08:02 montdsnsu3 kernel: ping[14867] general protection
> >rip:2aaaaaaaed43 rsp:7fffffdbf300 error:0
> >Aug 26 18:08:03 montdsnsu3 kernel: grep[14987] general protection
> >rip:2aaaaaaaed43 rsp:7fffffdbfce0 error:0
> >Aug 26 18:08:03 montdsnsu3 kernel: grep[15041] general protection
> >rip:2aaaaaaaed43 rsp:7fffff9bf550 error:0
> >
> >And the Bad Page State Messages:
> >
> >Bad page state at prep_new_page (in process 'sh', page ffff8100011a69c8)
> >flags:0x0100000000000014 mapping:0000000000000000 mapcount:-3 count:0
> >Backtrace:
> >
> >Call Trace:<ffffffff8015a653>{bad_page+99}
> ><ffffffff8015aa31>{prep_new_page+65}
> > <ffffffff8015b1e2>{buffered_rmqueue+306}
> ><ffffffff8015b425>{__alloc_pages+261}
> > <ffffffff8015b7c5>{get_zeroed_page+37}
> ><ffffffff801691b7>{__pmd_alloc+55}
> > <ffffffff8016614e>{copy_page_range+462}
> ><ffffffff80131da4>{copy_mm+820}
> > <ffffffff80132cba>{copy_process+2282}
> ><ffffffff80133407>{do_fork+215}
> > <ffffffff8010db7a>{system_call+126}
> ><ffffffff8010deeb>{ptregscall_common+103}
> >
> >Trying to fix it up, but a reboot is needed
>
> Seems to be an identical problem as was filed here:
>
> http://bugs.gentoo.org/show_bug.cgi?id=103497
>
> This bug report seems to suggest that the ondemand scaling governor may be
> at fault. Does your setup use this too?
>
> (CC'ing some extra people to make sure problem is known)
>

As this is a server, I don't even use cpufreq on this machine, so I
think this isn't the same problem ...

Kind regards,

weiti

P.S.: Please CC me, because I am not subscribed to lkml.

-- 
The punctuation and spelling of this email are purely fictitious.
Any resemblance to current or former rules would be purely
coincidental and is unintentional.

Tim Weippert <weiti@topf-sicret.org>
http://www.topf-sicret.org/
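Whether cpufreq is active, and with which governor, can be checked directly from sysfs. The sketch below is a hedged illustration, assuming Python 3 and the usual /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor layout; the helper names read_governors and uses_ondemand are made up for this example. An empty result means no CPU exposes a cpufreq directory, which would be consistent with cpufreq not being in use.

```python
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for each CPU that exposes cpufreq in sysfs.

    sysfs_root is a parameter so the function can be pointed at a test tree;
    the default is the conventional location (an assumption about the system).
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    governors = {}
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def uses_ondemand(governors):
    """True if at least one CPU currently runs the ondemand governor."""
    return any(name == "ondemand" for name in governors.values())


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq governors exposed in sysfs")
    for cpu, governor in found.items():
        print(f"{cpu}: {governor}")
    print("ondemand in use:", uses_ondemand(found))
```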
* Re: Bad page state on AMD Opteron Dual System with kernel 2.6.13-rc6-git13
  2005-08-29  5:24   ` Tim Weippert
@ 2005-08-29 10:28     ` Tim Weippert
  2005-08-29 10:49     ` Daniel Drake
  1 sibling, 0 replies; 4+ messages in thread
From: Tim Weippert @ 2005-08-29 10:28 UTC
  Cc: akpm, davej, discuss, cpufreq, linux-kernel, Daniel Drake

Hi,

On Mon, Aug 29, 2005 at 07:24:54AM +0200, Tim Weippert wrote:
> On Sun, Aug 28, 2005 at 01:20:51AM +0100, Daniel Drake wrote:
> >
> > Seems to be an identical problem as was filed here:
> >
> > http://bugs.gentoo.org/show_bug.cgi?id=103497
> >
> > This bug report seems to suggest that the ondemand scaling governor may be
> > at fault. Does your setup use this too?
> >
> > (CC'ing some extra people to make sure problem is known)
> >
>
> As this is a server, I don't even use cpufreq on this machine, so I
> think this isn't the same problem ...

Update: with stable 2.6.13 I get nearly the same behavior, plus one new oops:

swap_free: Bad swap file entry c000007fffff802f
swap_free: Bad swap file entry c800007fffff802f
swap_free: Bad swap file entry d000007fffff802f
swap_free: Bad swap file entry d800007fffff802f
swap_free: Bad swap file entry e000007fffff802f
swap_free: Bad swap file entry 4000000000000000
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at "mm/rmap.c":493
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: autofs4 floppy i2c_amd756 i2c_core hw_random ohci_hcd tg3 tsdev evdev evbug psmouse genrtc unix
Pid: 9014, comm: sh Not tainted 2.6.13
RIP: 0010:[<ffffffff8016e9ab>] <ffffffff8016e9ab>{page_remove_rmap+43}
RSP: 0018:ffff8100481c3da0  EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff81004a5fc420 RCX: ffff81000000d000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8100011a69c8
RBP: 0000000000484000 R08: 0000000000000001 R09: 000000000000000f
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000078bfbff
R13: ffff810040e133e0 R14: ffff8100011a69c8 R15: 0000000000000000
FS:  00000000457ff970(0000) GS:ffffffff8056f880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaaabd000 CR3: 0000000048205000 CR4: 00000000000006e0
Process sh (pid: 9014, threadinfo ffff8100481c2000, task ffff810048e7e270)
Stack: ffffffff801663f4 0000000000497000 ffff81004937f010 0000000000497000
       0000000000497000 0000000000496fff ffff8100497dd000 0000000000497000
       ffffffff801666ab 0000000000000000
Call Trace:<ffffffff801663f4>{zap_pte_range+436} <ffffffff801666ab>{unmap_page_range+507}
       <ffffffff80166815>{unmap_vmas+293} <ffffffff8016c4d2>{exit_mmap+162}
       <ffffffff801318b1>{mmput+49} <ffffffff80136d3a>{do_exit+442}
       <ffffffff801370c0>{sys_exit_group+0} <ffffffff8010db7a>{system_call+126}

Code: 0f 0b a3 b4 5b 3f 80 ff ff ff ff c2 ed 01 66 66 66 90 66 66
RIP <ffffffff8016e9ab>{page_remove_rmap+43} RSP <ffff8100481c3da0>
 <1>Fixing recursive fault but reboot is needed!

With this I get a hanging [sh] process which can't be killed and only
goes away after a reboot:

www-data  7701  0.0  0.3  74448  6452 ?  S  11:56  0:00 /usr/sbin/cactid 0 93
www-data  7721  0.0  0.5  56296 10504 ?  S  11:56  0:00  \_ /usr/bin/php /usr/share/cacti/site/script_server.php cactid 0
www-data  9014  0.0  0.0      0     0 ?  D  11:56  0:00      \_ [sh]

The machine is a Cacti system with generally high load ... it seems the
kernel only has problems under higher load.

HTH,

weiti

-- 
The punctuation and spelling of this email are purely fictitious.
Any resemblance to current or former rules would be purely
coincidental and is unintentional.

Tim Weippert <weiti@topf-sicret.org>
http://www.topf-sicret.org/
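The stuck [sh] process in the ps listing is in state D (uninterruptible sleep). A rough sketch for listing such processes by reading /proc/<pid>/stat directly is shown here. It assumes Python 3 on Linux; parse_stat_line and uninterruptible_processes are hypothetical helper names, and a process briefly in state D during normal I/O is not by itself a sign of trouble, so this is only a starting point for spotting tasks that stay stuck.

```python
from pathlib import Path


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the last ')' is located instead of splitting on whitespace.
    """
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes currently in state D."""
    root = Path(proc_root)
    if not root.is_dir():
        return []
    stuck = []
    for stat_file in root.glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_stat_line(stat_file.read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-read, or the line was malformed
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(f"{pid}\t{comm}")
```

Running it a few times some seconds apart and comparing the results is one way to tell a persistently hung task, like the one reported here, from ordinary short waits.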
* Re: Bad page state on AMD Opteron Dual System with kernel 2.6.13-rc6-git13
  2005-08-29  5:24   ` Tim Weippert
  2005-08-29 10:28     ` Tim Weippert
@ 2005-08-29 10:49     ` Daniel Drake
  1 sibling, 0 replies; 4+ messages in thread
From: Daniel Drake @ 2005-08-29 10:49 UTC
  To: Tim Weippert; +Cc: akpm, cpufreq, discuss, linux-kernel, davej

Tim Weippert wrote:
> As this is a server, I don't even use cpufreq on this machine, so I
> think this isn't the same problem ...

Maybe you could post your experiences here:

http://bugzilla.kernel.org/show_bug.cgi?id=5133

Daniel