* error: Eeek! page_mapcount(page) went negative! (-1) with different process and kernels
@ 2007-10-16 17:17 Arnaud Fontaine
2007-10-16 18:35 ` Dave Jones
From: Arnaud Fontaine @ 2007-10-16 17:17 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 60 bytes --]
Hello,
We often get the following error from the kernel:
[-- Attachment #2: kern.log --]
[-- Type: text/plain, Size: 3158 bytes --]
sshd[1551] trap invalid opcode rip:2aeacc0677a0 rsp:7fffe0c7e688 error:0
Eeek! page_mapcount(page) went negative! (-1)
page pfn = 7f7a8
page->flags = 400000000001002c
page->count = 1
page->mapping = ffff810056170550
vma->vm_ops = 0xffffffff80667ba0
vma->vm_ops->nopage = _stext+0x7fdf7000/0x20
vma->vm_ops->fault = filemap_fault+0x0/0x450
vma->vm_file->f_op->mmap = generic_file_mmap+0x0/0x50
------------[ cut here ]------------
kernel BUG at mm/rmap.c:630!
invalid opcode: 0000 [1] SMP
CPU 0
Pid: 2554, comm: atop Not tainted 2.6.23.1-ipot #1
RIP: 0010:[<ffffffff8027550b>] [<ffffffff8027550b>] page_remove_rmap+0x12b/0x140
RSP: 0018:ffff810075183d98 EFLAGS: 00010296
RAX: 000000000000003b RBX: ffff810002be2cc0 RCX: 0000000000000001
RDX: ffffffff80663968 RSI: 0000000000000086 RDI: ffffffff80663960
RBP: ffff81007dccf5d0 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 00002ac92645c000
R13: ffff810002be2cc0 R14: 00002ac926462000 R15: 0000000000026000
FS: 0000000000000000(0000) GS:ffffffff806b4000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002ac92705a020 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process atop (pid: 2554, threadinfo ffff810075182000, task ffff81007fe6f7d0)
Stack: ffff8100013656f8 00002ac926462000 ffff8100751782e0 ffffffff8026cef3
0000000000000000 ffffffff807235e0 ffff810002c0e5f8 00002ac926461fff
0000000000000000 ffff810075183eb8 ffffffffffffffff 0000000000000000
Call Trace:
[<ffffffff8026cef3>] unmap_vmas+0x4f3/0x7e0
[<ffffffff80271518>] exit_mmap+0x78/0x100
[<ffffffff80230966>] mmput+0x26/0xb0
[<ffffffff80236458>] do_exit+0x198/0x900
[<ffffffff80236bec>] do_group_exit+0x2c/0x80
[<ffffffff8020bbde>] system_call+0x7e/0x83
Code: 0f 0b eb fe 48 8b 53 10 e9 65 ff ff ff 0f 1f 84 00 00 00 00
RIP [<ffffffff8027550b>] page_remove_rmap+0x12b/0x140
RSP <ffff810075183d98>
Fixing recursive fault but reboot is needed!
Bad page state in process 'kswapd0'
page:ffff810002be2cc0 flags:0x4000000000010008 mapping:0000000000000000 mapcount:-1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Call Trace:
[<ffffffff80263650>] bad_page+0x60/0xa0
[<ffffffff80263f65>] free_hot_cold_page+0x345/0x350
[<ffffffff80263f94>] __pagevec_free+0x24/0x30
[<ffffffff80267210>] __pagevec_release_nonlru+0x60/0x70
[<ffffffff8026885a>] shrink_page_list+0x26a/0x640
[<ffffffff80267cfb>] isolate_lru_pages+0x8b/0x240
[<ffffffff80268d79>] shrink_inactive_list+0x149/0x3f0
[<ffffffff802690f0>] shrink_zone+0xd0/0x140
[<ffffffff8026975d>] kswapd+0x3dd/0x520
[<ffffffff80247560>] autoremove_wake_function+0x0/0x30
[<ffffffff80269380>] kswapd+0x0/0x520
[<ffffffff8024719b>] kthread+0x4b/0x80
[<ffffffff8020c9f8>] child_rip+0xa/0x12
[<ffffffff80247150>] kthread+0x0/0x80
[<ffffffff8020c9ee>] child_rip+0x0/0x12
list_lists[27598]: segfault at 0000000000000038 rip 000000000043b95a rsp 00007fff368010d0 error 4
[-- Attachment #3: Type: text/plain, Size: 240 bytes --]
We have tested with different kernels (2.6.23.1 and 2.6.22), and the same
error happens with different processes. Any idea what could cause this
error?
Please Cc me as I'm not subscribed to the list.
Regards,
Arnaud Fontaine
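Some background on the message itself. In kernels of this vintage (2.6.2x), each `struct page` carries a `_mapcount` field that starts at -1, and `page_mapcount()` returns `_mapcount + 1`, the number of page-table entries currently mapping the page. `page_remove_rmap()` decrements the counter each time a mapping is torn down (here during `exit_mmap()` on process exit, according to the call trace) and, if `page_mapcount()` drops below zero, prints the "Eeek!" report and calls `BUG()`. The Python sketch below is a simplified user-space model of that bookkeeping, not kernel code; `PageModel` and its methods are hypothetical names. It shows that a single unbalanced unmap, whether caused by a kernel bug or by a corrupted counter, is enough to trip the check.

```python
class PageModel:
    """Toy model of a page's reverse-mapping counter (hypothetical names).

    _mapcount starts at -1, so page_mapcount() == _mapcount + 1 is the
    number of mappings, matching the convention described above.
    """

    def __init__(self) -> None:
        self._mapcount = -1  # freshly allocated, unmapped page

    def page_mapcount(self) -> int:
        return self._mapcount + 1

    def add_rmap(self) -> None:
        # One more page-table entry now maps this page.
        self._mapcount += 1

    def remove_rmap(self) -> None:
        # One page-table entry stopped mapping this page.
        self._mapcount -= 1
        if self.page_mapcount() < 0:
            # This is where the kernel prints the "Eeek!" report and calls BUG().
            raise RuntimeError(
                f"Eeek! page_mapcount(page) went negative! ({self.page_mapcount()})"
            )


if __name__ == "__main__":
    page = PageModel()
    page.add_rmap()         # mapped once, e.g. by one process
    page.remove_rmap()      # unmapped at exit: page_mapcount() is back to 0
    try:
        page.remove_rmap()  # a second, unbalanced unmap
    except RuntimeError as err:
        print(err)          # Eeek! page_mapcount(page) went negative! (-1)
```

The model only captures the counting rule; it says nothing about which of the two causes (software or hardware) is behind a given report.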
* Re: error: Eeek! page_mapcount(page) went negative! (-1) with different process and kernels
2007-10-16 17:17 error: Eeek! page_mapcount(page) went negative! (-1) with different process and kernels Arnaud Fontaine
@ 2007-10-16 18:35 ` Dave Jones
2007-10-16 23:03 ` Arnaud Fontaine
From: Dave Jones @ 2007-10-16 18:35 UTC (permalink / raw)
To: Arnaud Fontaine; +Cc: linux-kernel
On Tue, Oct 16, 2007 at 07:17:32PM +0200, Arnaud Fontaine wrote:
> Hello,
>
> We often get the following error from the kernel:
>
> sshd[1551] trap invalid opcode rip:2aeacc0677a0 rsp:7fffe0c7e688 error:0
> Eeek! page_mapcount(page) went negative! (-1)
> page pfn = 7f7a8
> page->flags = 400000000001002c
> page->count = 1
> page->mapping = ffff810056170550
> vma->vm_ops = 0xffffffff80667ba0
> vma->vm_ops->nopage = _stext+0x7fdf7000/0x20
> vma->vm_ops->fault = filemap_fault+0x0/0x450
> vma->vm_file->f_op->mmap = generic_file_mmap+0x0/0x50
> ....
>
> We have tested with different kernels (2.6.23.1 and 2.6.22), and the same
> error happens with different processes. Any idea what could cause this
> error?
Many of these that I've seen have turned out to be a hardware problem.
Try running memtest86+ on that machine for a while.
It doesn't catch all problems, but it will highlight more common memory faults.
Dave
--
http://www.codemonkey.org.uk
* Re: error: Eeek! page_mapcount(page) went negative! (-1) with different process and kernels
2007-10-16 18:35 ` Dave Jones
@ 2007-10-16 23:03 ` Arnaud Fontaine
2007-10-17 2:36 ` Dave Jones
2007-10-18 11:25 ` Goswin von Brederlow
From: Arnaud Fontaine @ 2007-10-16 23:03 UTC (permalink / raw)
To: linux-kernel
>>>>> "Dave" == Dave Jones <davej@redhat.com> writes:
Dave> Many of these that I've seen have turned out to be a hardware
Dave> problem. Try running memtest86+ on that machine for a while.
Dave> It doesn't catch all problems, but it will highlight more
Dave> common memory faults.
Hello,
We ran memtest86+ before going into production, about one month ago. Do
you think it could be a hardware problem anyway?
Regards,
Arnaud Fontaine
* Re: error: Eeek! page_mapcount(page) went negative! (-1) with different process and kernels
2007-10-16 23:03 ` Arnaud Fontaine
@ 2007-10-17 2:36 ` Dave Jones
2007-10-18 11:25 ` Goswin von Brederlow
From: Dave Jones @ 2007-10-17 2:36 UTC (permalink / raw)
To: Arnaud Fontaine; +Cc: linux-kernel
On Wed, Oct 17, 2007 at 01:03:02AM +0200, Arnaud Fontaine wrote:
> >>>>> "Dave" == Dave Jones <davej@redhat.com> writes:
>
> Dave> Many of these that I've seen have turned out to be a hardware
> Dave> problem. Try running memtest86+ on that machine for a while.
> Dave> It doesn't catch all problems, but it will highlight more
> Dave> common memory faults.
>
> Hello,
>
> We ran memtest86+ before going into production, about one month ago. Do
> you think it could be a hardware problem anyway?
Not impossible. Hardware failures can occur at any time.
Somewhat unlikely though. As I mentioned, memtest also doesn't trap
all hardware problems. I have a board that passes memtest with flying
colours, yet dies under even slight load. Examination of the board
shows that it has leaking capacitors.
Dave
--
http://www.codemonkey.org.uk
* Re: error: Eeek! page_mapcount(page) went negative! (-1) with different process and kernels
2007-10-16 23:03 ` Arnaud Fontaine
2007-10-17 2:36 ` Dave Jones
@ 2007-10-18 11:25 ` Goswin von Brederlow
From: Goswin von Brederlow @ 2007-10-18 11:25 UTC (permalink / raw)
To: Arnaud Fontaine; +Cc: linux-kernel
Arnaud Fontaine <arnaud@andesi.org> writes:
>>>>>> "Dave" == Dave Jones <davej@redhat.com> writes:
>
> Dave> Many of these that I've seen have turned out to be a hardware
> Dave> problem. Try running memtest86+ on that machine for a while.
> Dave> It doesn't catch all problems, but it will highlight more
> Dave> common memory faults.
>
> Hello,
>
> We ran memtest86+ before going into production, about one month ago. Do
> you think it could be a hardware problem anyway?
I find that a lot of the time memtest does not reveal an error. Only
when you combine multiple sources of load, or use random access, do you
get errors: for example, compiling a kernel while doing heavy I/O on the
disk. But that might just be me. Errors are rather random occurrences.
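The point about random access can be illustrated with a toy user-space pattern test. The sketch below is a hypothetical example, not memtest86+ or any existing tool, and `pattern()` and `random_access_check()` are illustrative names: it fills a buffer with a position-derived byte pattern, then reads every byte back in a shuffled order and counts mismatches. It only exercises memory handed to this one Python process, and interpreter overhead keeps the access rate low, so a result of 0 proves little; a non-zero result on an otherwise healthy system would point at the hardware.

```python
import random


def pattern(index: int, seed: int) -> int:
    """Byte expected at `index` for a given pass seed (hypothetical scheme)."""
    # Multiplying by an odd constant makes consecutive indices get different bytes.
    return (index * 2654435761 + seed) & 0xFF


def random_access_check(size: int = 1 << 20, passes: int = 4, seed: int = 0) -> int:
    """Write a pattern sequentially, then verify it in shuffled order.

    Returns the total number of mismatching bytes over all passes.
    """
    rng = random.Random(seed)
    buf = bytearray(size)
    order = list(range(size))
    mismatches = 0
    for _ in range(passes):
        pass_seed = rng.randrange(256)  # a different pattern on each pass
        for i in range(size):           # write phase: sequential
            buf[i] = pattern(i, pass_seed)
        rng.shuffle(order)              # read phase: every byte once, random order
        for i in order:
            if buf[i] != pattern(i, pass_seed):
                mismatches += 1
    return mismatches
```

With its defaults, `random_access_check()` checks 1 MiB in four passes and should return 0 on healthy memory. Running several copies alongside a kernel build and heavy disk I/O comes closer to the combined load described above.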
Compiling a kernel repeatedly, with several builds in parallel, is
usually a good test. If it sometimes fails to compile, then it is almost
certainly a hardware error.
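The repeated parallel-build test can be scripted. This is a minimal Python sketch under stated assumptions: the tree paths under `/tmp` and the `make clean && make -j2` command are hypothetical placeholders, and `build_once()` and `stress_builds()` are illustrative helpers, not an existing tool. It runs the same build in several trees at once, repeats that for a number of rounds, and collects every run that exited with a non-zero status.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_once(cmd: list[str], workdir: str) -> int:
    """Run one build command in workdir (output discarded); return its exit status."""
    result = subprocess.run(
        cmd, cwd=workdir, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def stress_builds(
    cmd: list[str], workdirs: list[str], rounds: int
) -> list[tuple[int, str, int]]:
    """Run cmd in all workdirs concurrently, `rounds` times in a row.

    Returns a (round, workdir, exit_status) tuple for every failed run.
    """
    if not workdirs:
        return []
    failures = []
    # One worker thread per tree; each thread just waits on its child process.
    with ThreadPoolExecutor(max_workers=len(workdirs)) as pool:
        for rnd in range(rounds):
            statuses = pool.map(lambda d: build_once(cmd, d), workdirs)
            for workdir, status in zip(workdirs, statuses):
                if status != 0:
                    failures.append((rnd, workdir, status))
    return failures


if __name__ == "__main__":
    # Hypothetical setup: copies of one already-configured kernel tree.
    candidates = ("/tmp/linux-a", "/tmp/linux-b", "/tmp/linux-c")
    trees = [t for t in candidates if os.path.isdir(t)]
    # sh -c is only used to chain a clean with a full rebuild each round.
    cmd = ["sh", "-c", "make clean && make -j2"]
    if not trees:
        print("no build trees found; edit the hypothetical paths above")
    for rnd, tree, status in stress_builds(cmd, trees, rounds=10):
        print(f"round {rnd}: build in {tree} failed with status {status}")
```

Because the source and configuration never change between rounds, a build that fails in one round and succeeds in the next is the kind of sporadic failure attributed to hardware here. Output is discarded for brevity; keeping the build logs makes it easier to tell a crashing compiler from a genuinely broken tree.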
Regards,
Goswin