The Linux Kernel Mailing List
* Re: Re: unused swap offset / bad page map.
From: Alin Dobre @ 2013-11-12 18:05 UTC (permalink / raw)
  To: linux-kernel

On 27/08/13 17:30, Dave Jones wrote:
> Seems to do the trick.

We are running many virtualization hosts with Linux 3.11.3, qemu 1.6.1 + 
kvm and ksm. The hosts have 128GB RAM, 10GB swap and 24x AMD Opteron 
6238 cores.

Several times over the past few weeks, we have seen the OOM killer come 
to life and quickly kill a large number of VMs on a host, even though 
there appeared to be free memory on that host when the killing started.

However, the OOM killings are preceded by some other traces, similar to 
the ones that Dave reported a couple of months ago in this very 
thread (https://lkml.org/lkml/2013/8/7/27).

The relevant kernel log lines read:

20:30:44 kernel: swap_free: Unused swap file entry 200000000000200
20:30:44 kernel: BUG: Bad page map in process qemu-system-x86 pte:00040002 pmd:1ecc0d4067
20:30:44 kernel: addr:00007f5b8b404000 vm_flags:80100073 anon_vma:ffff880ff0e9df00 mapping:          (null) index:7f5b8b404
20:30:44 kernel: CPU: 9 PID: 22652 Comm: qemu-system-x86 Not tainted 3.11.2-elastic #2
20:30:44 kernel: Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 2.0b       03/01/2012
20:30:44 kernel: 00007f5b8b404000 ffff8807b76b1ab8 ffffffff817ee7a6 00000000000400f6
20:30:44 kernel: ffff880ea36a0e60 ffff8807b76b1b08 ffffffff81135ed5 000000000000000e
20:30:44 kernel: 00000007f5b8b404 ffff8807b76b1b08 00007f5b8b404000 ffff880ea36a0e60
20:30:44 kernel: Call Trace:
20:30:44 kernel: [<ffffffff817ee7a6>] dump_stack+0x55/0x86
20:30:44 kernel: [<ffffffff81135ed5>] print_bad_pte+0x1f5/0x213
20:30:44 kernel: [<ffffffff811379fd>] unmap_single_vma+0x509/0x6d6
20:30:44 kernel: [<ffffffff81138291>] unmap_vmas+0x4d/0x80
20:30:44 kernel: [<ffffffff8113e615>] exit_mmap+0x93/0x11e
20:30:44 kernel: [<ffffffff810bc2fb>] mmput+0x51/0xdb
20:30:44 kernel: [<ffffffff810c00b1>] do_exit+0x33c/0x8a2
20:30:44 kernel: [<ffffffff810f58ab>] ? get_futex_key+0x87/0x20c
20:30:44 kernel: [<ffffffff810c7215>] ? __dequeue_signal+0x16/0x114
20:30:44 kernel: [<ffffffff810c06af>] do_group_exit+0x6a/0x9d
20:30:44 kernel: [<ffffffff810c956a>] get_signal_to_deliver+0x488/0x4a7
20:30:44 kernel: [<ffffffff81032db9>] do_signal+0x47/0x48f
20:30:44 kernel: [<ffffffff8110dc29>] ? rcu_eqs_enter+0x7d/0x82
20:30:44 kernel: [<ffffffff810e0ff4>] ? account_user_time+0x6a/0x95
20:30:44 kernel: [<ffffffff810e13b6>] ? vtime_account_user+0x5d/0x65
20:30:44 kernel: [<ffffffff81033229>] do_notify_resume+0x28/0x6a
20:30:44 kernel: [<ffffffff817f6358>] int_signal+0x12/0x17
20:30:44 kernel: Disabling lock debugging due to kernel taint
20:30:44 kernel: 33550335 pages RAM
20:30:44 kernel: 561601 pages reserved
20:30:44 kernel: 24628376 pages shared
20:30:44 kernel: 7190750 pages non-shared
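
As a rough sanity check, the page counters at the end of the log can be 
converted to sizes. The sketch below assumes a 4 KiB page size (the usual 
base page size on x86-64; the log itself does not state it), and the 
helper name pages_to_gib is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, typical for x86-64


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB under the assumed page size."""
    return pages * page_size / 2**30


# Counters copied from the kernel log lines above.
counters = {
    "RAM": 33550335,
    "reserved": 561601,
    "shared": 24628376,
    "non-shared": 7190750,
}

if __name__ == "__main__":
    for name, pages in counters.items():
        print(f"{name:>10}: {pages_to_gib(pages):8.2f} GiB")
```

Under that assumption the "pages RAM" figure comes out just under 128 GiB, 
which is consistent with the 128GB hosts described above.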

Since we are using a 3.11.3 kernel, it already contains Cyrill's fix. 
However, our kernel log is very similar to Dave's report, so we are 
wondering whether our mass OOM kill is another problem in the same area.

Any thoughts on this? I can provide more information from the logs if 
necessary. My colleague Richard originally reported the mass OOM kill 
in detail at http://article.gmane.org/gmane.linux.kernel.mm/108703.

Cheers,
Alin.
