From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Samu Kallio <samu.kallio@aberdeencloud.com>, mingo@redhat.com
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	LKML <linux-kernel@vger.kernel.org>,
	xen-devel@lists.xensource.com
Subject: Re: x86: mm: Fix vmalloc_fault oops during lazy MMU updates.
Date: Fri, 22 Feb 2013 20:06:07 -0500
Message-ID: <20130223010607.GA15337@phenom.dumpdata.com>
In-Reply-To: <CAOn_CrHifkBieZumEBje1FiSU3abJrZ8QRKK4FGkKbp217He9w@mail.gmail.com>

On Thu, Feb 21, 2013 at 05:56:35PM +0200, Samu Kallio wrote:
> On Thu, Feb 21, 2013 at 2:33 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Sun, Feb 17, 2013 at 02:35:52AM -0000, Samu Kallio wrote:
> >> In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
> >> when lazy MMU updates are enabled, because set_pgd effects are being
> >> deferred.
> >>
> >> One instance of this problem is during process mm cleanup with memory
> >> cgroups enabled. The chain of events is as follows:
> >>
> >> - zap_pte_range enables lazy MMU updates
> >> - zap_pte_range eventually calls mem_cgroup_charge_statistics,
> >>   which accesses the vmalloc'd mem_cgroup per-cpu stat area
> >> - vmalloc_fault is triggered which tries to sync the corresponding
> >>   PGD entry with set_pgd, but the update is deferred
> >> - vmalloc_fault oopses due to a mismatch in the PUD entries
> >>
> >> Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
> >> changes visible to the consistency checks.
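
The chain of events described above can be illustrated with a toy
user-space model in C. This is only a sketch under stated assumptions, not
the kernel code or the patch itself: every name here (pgd_entry, struct
lazy_mmu, model_set_pgd, model_flush_lazy_mmu, model_vmalloc_fault) is
hypothetical, a "PGD entry" is reduced to a plain 64-bit value with 0
meaning "none", and the PUD consistency check is collapsed into a direct
comparison against the reference entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PGD entry; 0 plays the role of "none". */
typedef uint64_t pgd_entry;

#define MAX_DEFERRED 8

/* One queued page-table write that has not been applied yet. */
struct deferred_write {
    pgd_entry *slot;
    pgd_entry value;
};

/* Toy lazy-MMU state: while 'lazy' is set, writes are queued, not applied. */
struct lazy_mmu {
    bool lazy;
    size_t pending;
    struct deferred_write queue[MAX_DEFERRED];
};

/* Models set_pgd: immediate when not lazy, deferred when lazy. */
void model_set_pgd(struct lazy_mmu *m, pgd_entry *slot, pgd_entry value)
{
    if (m->lazy) {
        assert(m->pending < MAX_DEFERRED);
        m->queue[m->pending++] = (struct deferred_write){ slot, value };
    } else {
        *slot = value;
    }
}

/* Models a lazy-MMU flush: apply every queued write, in order. */
void model_flush_lazy_mmu(struct lazy_mmu *m)
{
    for (size_t i = 0; i < m->pending; i++)
        *m->queue[i].slot = m->queue[i].value;
    m->pending = 0;
}

/*
 * Models the sync-then-check step of a vmalloc fault. Returns true when
 * the entry matches the reference afterwards; false corresponds to the
 * situation in which the real kernel would hit BUG().
 */
bool model_vmalloc_fault(struct lazy_mmu *m, pgd_entry *pgd,
                         const pgd_entry *pgd_ref, bool flush_after_set)
{
    if (*pgd == 0) {
        model_set_pgd(m, pgd, *pgd_ref);
        if (flush_after_set)
            model_flush_lazy_mmu(m);
    }
    return *pgd == *pgd_ref;
}
```

With lazy mode enabled, calling model_vmalloc_fault with flush_after_set
set to false leaves the entry unchanged and the check fails, which mirrors
the deferred set_pgd case; passing true applies the queued write first and
the check passes.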
> >
> > How do you reproduce this? Is there a BUG() or WARN() trace that
> > is triggered when this happens?
> 
> In my case I've seen this triggered on an Amazon EC2 (Xen PV) instance
> under heavy load spawning many LXC containers. The best I can say at
> this point is that the frequency of this bug seems to be linked to how
> busy the machine is.
> 
> The earliest report of this problem was from 3.3:
>     http://comments.gmane.org/gmane.linux.kernel.cgroups/5540
> I can personally confirm the issue since 3.5.
> 
> Here's a sample bug report from a 3.7 kernel (vanilla with the Xen XSAVE
> patch for EC2 compatibility). The latest kernel version on which I have
> tested and seen this problem occur is 3.7.9.

Ingo,

I am OK with this patch. Are you OK taking this in or should I take
it (and add the nice RIP below)?

It should also have CC: stable@vger.kernel.org on it.

FYI, there is also a Red Hat bug for this: https://bugzilla.redhat.com/show_bug.cgi?id=914737

> 
> [11852214.733630] ------------[ cut here ]------------
> [11852214.733642] kernel BUG at arch/x86/mm/fault.c:397!
> [11852214.733648] invalid opcode: 0000 [#1] SMP
> [11852214.733654] Modules linked in: veth xt_nat xt_comment fuse btrfs libcrc32c zlib_deflate ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack bridge stp llc iptable_filter ip_tables x_tables ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd xts lrw gf128mul microcode ext4 crc16 jbd2 mbcache
> [11852214.733695] CPU 1
> [11852214.733700] Pid: 1617, comm: qmgr Not tainted 3.7.0-1-ec2 #1
> [11852214.733705] RIP: e030:[<ffffffff8143018d>]  [<ffffffff8143018d>] vmalloc_fault+0x14b/0x249
> [11852214.733725] RSP: e02b:ffff88083e57d7f8  EFLAGS: 00010046
> [11852214.733730] RAX: 0000000854046000 RBX: ffffe8ffffc80d70 RCX: ffff880000000000
> [11852214.733736] RDX: 00003ffffffff000 RSI: ffff880854046ff8 RDI: 0000000000000000
> [11852214.733744] RBP: ffff88083e57d818 R08: 0000000000000000 R09: ffff880000000ff8
> [11852214.733750] R10: 0000000000007ff0 R11: 0000000000000001 R12: ffff880854686e88
> [11852214.733758] R13: ffffffff8180ce88 R14: ffff88083e57d948 R15: 0000000000000000
> [11852214.733768] FS:  00007ff3bf0f8740(0000) GS:ffff88088b480000(0000) knlGS:0000000000000000
> [11852214.733777] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [11852214.733782] CR2: ffffe8ffffc80d70 CR3: 0000000854686000 CR4: 0000000000002660
> [11852214.733790] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [11852214.733796] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [11852214.733803] Process qmgr (pid: 1617, threadinfo ffff88083e57c000, task ffff88084474b3e0)
> [11852214.733810] Stack:
> [11852214.733814]  0000000000000029 0000000000000002 ffffe8ffffc80d70 ffff88083e57d948
> [11852214.733828]  ffff88083e57d928 ffffffff8103e0c7 0000000000000000 ffff88083e57d8d0
> [11852214.733840]  ffff88084474b3e0 0000000000000060 0000000000000000 0000000000006cf6
> [11852214.733852] Call Trace:
> [11852214.733861]  [<ffffffff8103e0c7>] __do_page_fault+0x2c7/0x4a0
> [11852214.733871]  [<ffffffff81004ac2>] ? xen_mc_flush+0xb2/0x1b0
> [11852214.733880]  [<ffffffff810032ce>] ? xen_end_context_switch+0x1e/0x30
> [11852214.733888]  [<ffffffff810043cb>] ? xen_write_msr_safe+0x9b/0xc0
> [11852214.733900]  [<ffffffff810125b3>] ? __switch_to+0x163/0x4a0
> [11852214.733907]  [<ffffffff8103e2de>] do_page_fault+0xe/0x10
> [11852214.733919]  [<ffffffff81437f98>] page_fault+0x28/0x30
> [11852214.733930]  [<ffffffff8115e873>] ? mem_cgroup_charge_statistics.isra.12+0x13/0x50
> [11852214.733940]  [<ffffffff8116012e>] __mem_cgroup_uncharge_common+0xce/0x2d0
> [11852214.733948]  [<ffffffff81007fee>] ? xen_pte_val+0xe/0x10
> [11852214.733958]  [<ffffffff8116391a>] mem_cgroup_uncharge_page+0x2a/0x30
> [11852214.733966]  [<ffffffff81139e78>] page_remove_rmap+0xf8/0x150
> [11852214.733976]  [<ffffffff8112d78a>] ? vm_normal_page+0x1a/0x80
> [11852214.733984]  [<ffffffff8112e5b3>] unmap_single_vma+0x573/0x860
> [11852214.733994]  [<ffffffff81114520>] ? release_pages+0x1f0/0x230
> [11852214.734004]  [<ffffffff810054aa>] ? __xen_pgd_walk+0x16a/0x260
> [11852214.734018]  [<ffffffff8112f0b2>] unmap_vmas+0x52/0xa0
> [11852214.734026]  [<ffffffff81136e08>] exit_mmap+0x98/0x170
> [11852214.734034]  [<ffffffff8104b929>] mmput+0x59/0x110
> [11852214.734043]  [<ffffffff81053d95>] exit_mm+0x105/0x130
> [11852214.734051]  [<ffffffff814376e0>] ? _raw_spin_lock_irq+0x10/0x40
> [11852214.734059]  [<ffffffff81053f27>] do_exit+0x167/0x900
> [11852214.734070]  [<ffffffff8106093d>] ? __sigqueue_free+0x3d/0x50
> [11852214.734079]  [<ffffffff81060b9e>] ? __dequeue_signal+0x10e/0x1f0
> [11852214.734087]  [<ffffffff810549ff>] do_group_exit+0x3f/0xb0
> [11852214.734097]  [<ffffffff81063431>] get_signal_to_deliver+0x1c1/0x5e0
> [11852214.734107]  [<ffffffff8101334f>] do_signal+0x3f/0x960
> [11852214.734114]  [<ffffffff811aae61>] ? ep_poll+0x2a1/0x360
> [11852214.734122]  [<ffffffff81083420>] ? try_to_wake_up+0x2d0/0x2d0
> [11852214.734129]  [<ffffffff81013cd8>] do_notify_resume+0x48/0x60
> [11852214.734138]  [<ffffffff81438a5a>] int_signal+0x12/0x17
> [11852214.734143] Code: ff ff 3f 00 00 48 21 d0 4c 8d 0c 30 ff 14 25 b8 f3 81 81 48 21 d0 48 01 c6 48 83 3e 00 0f 84 fa 00 00 00 49 8b 39 48 85 ff 75 02 <0f> 0b ff 14 25 e0 f3 81 81 49 89 c0 48 8b 3e ff 14 25 e0 f3 81
> [11852214.734212] RIP  [<ffffffff8143018d>] vmalloc_fault+0x14b/0x249
> [11852214.734222]  RSP <ffff88083e57d7f8>
> [11852214.734231] ---[ end trace 81ac798210f95867 ]---
> [11852214.734237] Fixing recursive fault but reboot is needed!
> 
> > Also, please CC me next time.
> 
> Will do. I originally CC'd Jeremy since he made some lazy MMU related
> cleanups in arch/x86/mm/fault.c, and I thought he might have a comment
> on this.

Thread overview: 4+ messages
2013-02-17  2:35 [PATCH] x86: mm: Fix vmalloc_fault oops during lazy MMU updates Samu Kallio
2013-02-21 12:33 ` Konrad Rzeszutek Wilk
2013-02-21 15:56   ` Samu Kallio
2013-02-23  1:06     ` Konrad Rzeszutek Wilk [this message]
