Re: acpidump crashes on some machines

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Andre Przywara <andre.przywara@amd.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	xen-devel <xen-devel@lists.xensource.com>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: acpidump crashes on some machines
Date: Wed, 20 Jun 2012 10:51:27 -0400
Message-ID: <20120620145127.GD12787@phenom.dumpdata.com>
In-Reply-To: <4FE1C423.6070001@amd.com>

On Wed, Jun 20, 2012 at 02:37:55PM +0200, Andre Przywara wrote:
> Hi,
> 
> we have some problems with acpidump running on Xen Dom0. On 64 bit
> Dom0 it will trigger the OOM killer, on 32 bit Dom0s it will cause a
> kernel crash.
> The hypervisor does not matter, I tried 4.1.3-rc2 as well as various
> unstable versions including 25467, also 32-bit versions of 4.1.
> The Dom0 kernels were always PVOPS versions, the problem starts
> with 3.2-rc1~194 and is still in 3.5.0-rc3.
> Also you need to restrict the Dom0 memory with dom0_mem=
> The crash says (on a 3.4.3 32bit Dom0 kernel):
> uruk:~ # ./acpidump32
> [  158.843444] ------------[ cut here ]------------
> [  158.843460] kernel BUG at mm/rmap.c:1027!
> [  158.843466] invalid opcode: 0000 [#1] SMP
> [  158.843472] Modules linked in:
> [  158.843478]
> [  158.843483] Pid: 4874, comm: acpidump32 Tainted: G        W
> 3.4.0+ #105 empty empty/S3993
> [  158.843493] EIP: 0061:[<c10b0e27>] EFLAGS: 00010246 CPU: 3
> [  158.843505] EIP is at __page_set_anon_rmap+0x12/0x45
> [  158.843511] EAX: d6022dc0 EBX: dfecb6e0 ECX: b76faf64 EDX: b76faf64
> [  158.843516] ESI: 00000000 EDI: b76faf64 EBP: d6091e8c ESP: d6091e84
> [  158.843522]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
> [  158.843529] CR0: 8005003b CR2: b76faf64 CR3: 17633000 CR4: 00000660
> [  158.843535] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [  158.843581] DR6: ffff0ff0 DR7: 00000400
> [  158.843586] Process acpidump32 (pid: 4874, ti=d6090000
> task=d60b34f0 task.ti=d6090000)
> [  158.843591] Stack:
> [  158.843594]  dfecb6e0 00000001 d6091ea8 c10b15c4 00000000
> d6022dc0 d61fbdd8 d6022dc0
> [  158.843610]  00000000 d6091efc c10aacbe 00000000 99948025
> 80000001 d8aa1f80 80000001
> [  158.843631]  dfefc800 00000000 d8aa1f80 00000000 166b7025
> d7f407d0 b76faf64 99948025
> [  158.843649] Call Trace:
> [  158.843656]  [<c10b15c4>] do_page_add_anon_rmap+0x5b/0x64
> [  158.843664]  [<c10aacbe>] handle_pte_fault+0x81d/0xa06
> [  158.843674]  [<c10ab0ff>] handle_mm_fault+0x1fa/0x209
> [  158.843683]  [<c159e4e8>] ? spurious_fault+0x104/0x104
> [  158.843688]  [<c159e881>] do_page_fault+0x399/0x3b4
> [  158.843696]  [<c10c639d>] ? filp_close+0x55/0x5f
> [  158.843701]  [<c10c6408>] ? sys_close+0x61/0xa0
> [  158.843706]  [<c159e4e8>] ? spurious_fault+0x104/0x104
> [  158.843714]  [<c159c452>] error_code+0x5a/0x60
> [  158.843720]  [<c159e4e8>] ? spurious_fault+0x104/0x104
> [  158.843724] Code: e8 45 91 00 00 89 c2 eb 09 2b 50 04 c1 ea 0c 03
> 50 4c 89 53 08 5b 5e 5d c3 55 89 e5 56 53 89 c3 89 d0 89 ca 8b 70 44
> 85 f6 75 02 <0f> 0b f6 43 04 01 75 27 83 7d 08 00 75 02 8b 36 46 89
> 73 04 f6
> [  158.843824] EIP: [<c10b0e27>] __page_set_anon_rmap+0x12/0x45
> SS:ESP 0069:d6091e84
> [  158.843848] ---[ end trace 4eaa2a86a8e2da24 ]---
> [  158.843854] note: acpidump32[4874] exited with preempt_count 1
> 
> 
> On 64bit the OOM goes around, finally killing the login shell:
> uruk:~ # ./acpidump_inst
> acpi_map_memory(917504, 131072);
> opened /dev/mem (fd=3)
> calling mmap(NULL, 131072, PROT_READ, MAP_PRIVATE, fd, e0000);
>   mmap returned 0xf7571000, function returns 0xf7571000
> acpi_map_table(cfef0f64, "XSDT");
> acpi_map_memory(3488550756, 36);
> opened /dev/mem (fd=3)
> calling mmap(NULL, 3976, PROT_READ, MAP_PRIVATE, fd, cfef0000);
>   mmap returned 0xf76fd000, function returns 0xf76fdf64
>   having mapped table header
>   reading signature:
> 
> Welcome to SUSE Linux Enterprise Server 11 SP1  (i586) - Kernel
> 3.5.0-rc3+ (hvc0).
> 
> uruk login:
> -----------
> This dump shows that the bug happens the moment acpidump accesses
> the mmapped ACPI table at @cfef0000 (the lower map at e0000 works).

What is the e0000 one? I don't see that region being marked as
reserved in your E820.

> 
> This is extra unfortunate as in SLES11 acpidump will be called by
> the kbd init script (querying the BIOS NumLock setting!)

Ah. Is the acpidump source available somewhere so that I can easily
compile it? Should I get it from here:
http://www.lesswatts.org/projects/acpi/utilities.php

> 
> I bisected the Dom0 kernel to find this one (v3.2-rc1~194):
> commit 5eef150c1d7e41baaefd00dd56c153debcd86aee
> Merge: 315eb8a f3f436e
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Tue Oct 25 09:17:07 2011 +0200
> 
>     Merge branch 'stable/e820-3.2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
> 
>     * 'stable/e820-3.2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:

Oh boy. v3.2 .. that is eons ago! :-)

>       xen: release all pages within 1-1 p2m mappings
>       xen: allow extra memory to be in multiple regions
>       xen: allow balloon driver to use more than one memory region
>       xen/balloon: simplify test for the end of usable RAM
>       xen/balloon: account for pages released during memory setup
> 
> 
> I tried to find something obvious, but to no avail. At least the new
> E820 looks sane, nothing that would prevent the mapping of the
> requested regions. Reverting this commit will not work easily on
> newer kernels, also is probably not desirable.

The one thing that comes to my mind is the 1-1 mapping having
some issues. Can you boot the kernel with 'debug loglevel=8'? That should
print something like this:

Setting pfn cfef0->cfef7 to 1-1
or such during bootup.

> 
> But it does not show on every machine here, so the machine E820
> could actually be a differentiator. This particular box was a dual
> socket Barcelona server with 12GB of memory.
> 
> This whole PV memory management goes beyond my knowledge, so I'd
> like to ask for help on this issue.
> If you need more information (I attached the boot log, which shows
> the two E820 tables), please ask. I can also quickly do some
> experiments if needed.

This is a strange one - the P2M code should fetch the MFN (so it should
give you cfef0) whenever anybody asks for that. Let's double-check that.

Can you try this little module?
[not compile tested]


#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/pagemap.h>
#include <linux/init.h>
#include <xen/xen.h>
#include <asm/xen/page.h>	/* pfn_to_mfn(), get_phys_to_machine(), mfn_to_virt() */
#define ACPITEST  "0.1"

MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>");
MODULE_DESCRIPTION("acpitest");
MODULE_LICENSE("GPL");
MODULE_VERSION(ACPITEST);

static int __init acpitest_init(void)
{
	unsigned long pfn = 0xcfef0;	/* frame holding the XSDT at cfef0f64 */
	unsigned long mfn;
	void *data;

	/* For a 1-1 region the P2M lookup should hand back the PFN itself. */
	mfn = pfn_to_mfn(pfn);
	WARN(pfn != mfn, "We get %lx instead of %lx!\n", mfn, pfn);
	if (pfn != mfn) {
		/* Show the raw P2M entry to see what is actually stored. */
		printk(KERN_INFO "raw p2m (%lx) gives us: %lx\n", pfn, get_phys_to_machine(pfn));
		return -EINVAL;
	}
	data = mfn_to_virt(mfn);
	printk(KERN_INFO "va is 0x%p\n", data);
	print_hex_dump_bytes("acpi:", DUMP_PREFIX_OFFSET, data, PAGE_SIZE);

	return 0;
}
static void __exit acpitest_exit(void)
{
}
module_init(acpitest_init);
module_exit(acpitest_exit);
