linux-kernel.vger.kernel.org archive mirror
From: Vivek Goyal <vgoyal@redhat.com>
To: linux kernel mailing list <linux-kernel@vger.kernel.org>,
	HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Kexec Mailing List <kexec@lists.infradead.org>,
	Baoquan He <bhe@redhat.com>, WANG Chao <chaowang@redhat.com>,
	Dave Young <dyoung@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Subject: /proc/vmcore mmap() failure issue
Date: Wed, 13 Nov 2013 15:41:30 -0500
Message-ID: <20131113204130.GD7613@redhat.com>

Hi Hatayama,

We are seeing /proc/vmcore mmap() failures, after which makedumpfile
exits without saving the dump and the system reboots.

I tried the latest makedumpfile (devel branch) with a 3.12 kernel.

I think this issue happens only on some machines, and it looks like it
happens when the end of a System RAM chunk in the first kernel is not page
aligned. For example, this is what System RAM looks like on one machine
where I noticed it:

00100000-dafa57ff : System RAM
  01000000-015892fa : Kernel code
  015892fb-0195c9ff : Kernel data
  01ae6000-01d31fff : Kernel bss
  24000000-33ffffff : Crash kernel
dafa5800-dbffffff : reserved

Notice that System RAM ends at dafa57ff, which is not on a page boundary,
and the next reserved range (starting at dafa5800) does not start on a
page boundary. I think that the reserved range is referenced through
some ACPI data. More on this later.
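
For reference, the alignment arithmetic can be checked with a small
userspace C sketch. It assumes 4 KiB pages, and the macro and helper
names are made up for illustration (they are not kernel APIs).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 0x1000ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* True if addr sits exactly on a page boundary. */
bool is_page_aligned(uint64_t addr)
{
    return (addr & ~SKETCH_PAGE_MASK) == 0;
}

/* Start address of the page that contains addr. */
uint64_t page_start(uint64_t addr)
{
    return addr & SKETCH_PAGE_MASK;
}

/*
 * True if the inclusive range [start, last] shares its final page with
 * whatever follows it, i.e. the first byte after the range is not
 * page aligned.
 */
bool range_ends_mid_page(uint64_t start, uint64_t last)
{
    assert(start <= last);
    return !is_page_aligned(last + 1);
}
```

With the ranges above, range_ends_mid_page(0x00100000, 0xdafa57ff) is
true, and both 0xdafa57ff and 0xdafa5800 fall in the page starting at
0xdafa5000, so System RAM and the reserved range share that page. The
crash kernel range 24000000-33ffffff, by contrast, ends on a page
boundary.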

So we added some printk() messages to get more info. In a nutshell,
remap_pfn_range() fails when we try to map the last section of System
RAM, which does not end on a page boundary.

remap_pfn_range()
   track_pfn_remap() {
        /*
         * For anything smaller than the vma size we set prot based on the
         * lookup.
         */
        flags = lookup_memtype(paddr);

        /* Check memtype for the remaining pages */
        while (size > PAGE_SIZE) {
                size -= PAGE_SIZE;
                paddr += PAGE_SIZE;
                if (flags != lookup_memtype(paddr))
                        return -EINVAL; /* <-- this is where it fails */
        }
   }

So we pass a range to track_pfn_remap(), say pfn=0xdad62 size=0x244000.
It calls lookup_memtype() on every page in the range and makes sure
they all have the same memtype; otherwise it fails. Guess what, they are
all the same except the last page, 0xdafa5000 (the one where System RAM
does not end on a page boundary).
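
To make the failure concrete, here is a userspace C sketch of the same
loop. It is only a model: the enum, the mock lookup and the function
names are hypothetical, 4 KiB pages are assumed, and the mock simply
hard-codes the one page that ends up with a different memtype on this
machine.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  0x1000ULL   /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-ins for the two memory types seen in this report. */
enum sketch_memtype { SKETCH_UC_MINUS, SKETCH_WB };

/*
 * Mock of lookup_memtype(): only the page at 0xdafa5000 is treated as
 * registered (write-back); every other page gets the default.
 */
enum sketch_memtype sketch_lookup_memtype(uint64_t paddr)
{
    uint64_t page = paddr & ~(SKETCH_PAGE_SIZE - 1);

    return page == 0xdafa5000ULL ? SKETCH_WB : SKETCH_UC_MINUS;
}

/*
 * Same loop shape as the excerpt above: returns 0 if every page in the
 * range has the memtype of the first page, -EINVAL otherwise. On
 * failure, *bad_paddr receives the address of the mismatching page.
 */
int sketch_check_range(uint64_t pfn, uint64_t size, uint64_t *bad_paddr)
{
    uint64_t paddr = pfn << SKETCH_PAGE_SHIFT;
    enum sketch_memtype flags = sketch_lookup_memtype(paddr);

    while (size > SKETCH_PAGE_SIZE) {
        size -= SKETCH_PAGE_SIZE;
        paddr += SKETCH_PAGE_SIZE;
        if (flags != sketch_lookup_memtype(paddr)) {
            *bad_paddr = paddr;
            return -EINVAL;
        }
    }
    return 0;
}
```

Calling sketch_check_range(0xdad62, 0x244000, &bad) returns -EINVAL
with bad set to 0xdafa5000, since 0xdad62000 + 0x243000 = 0xdafa5000 is
the last page of the range. Shrinking the size to 0x243000, so that the
range stops one page earlier, makes the check pass.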

I dug deeper into lookup_memtype() and noticed that the regular
ranges are not registered anywhere and their flags are _PAGE_CACHE_UC_MINUS.
But the last unaligned page is registered in the memtype rb tree and
has the attribute _PAGE_CACHE_WB.

Then I hooked into reserve_memtype() to figure out who is registering
page 0xdafa5000, and it is acpi_init() that does it.

[    0.721655] Hardware name: <edited>
[    0.730590]  ffff8800340f3830 ffff8800340f37c0 ffffffff81575509 00000000dafa5000
[    0.738010]  ffff8800340f3800 ffffffff810566cc 00000000000dafa5 00000000dafa5000
[    0.745428]  00000000dafa6000 00000000dafa5000 0000000000000000 0000000000001000
[    0.752845] Call Trace:
[    0.755288]  [<ffffffff81575509>] dump_stack+0x45/0x56
[    0.760414]  [<ffffffff810566cc>] reserve_memtype+0x31c/0x3f0
[    0.766144]  [<ffffffff810537ef>] __ioremap_caller+0x12f/0x360
[    0.771963]  [<ffffffff8130ad56>] ? acpi_os_release_object+0xe/0x12
[    0.778217]  [<ffffffff815686ba>] ? acpi_os_map_memory+0xf6/0x14e
[    0.784295]  [<ffffffff81053a54>] ioremap_cache+0x14/0x20
[    0.789679]  [<ffffffff815686ba>] acpi_os_map_memory+0xf6/0x14e
[    0.795582]  [<ffffffff81322ac9>] acpi_ex_system_memory_space_handler+0xdd/0x1ca
[    0.802961]  [<ffffffff8131ca48>] acpi_ev_address_space_dispatch+0x1b0/0x208
[    0.809993]  [<ffffffff8131fd49>] acpi_ex_access_region+0x20e/0x2a2
[    0.816244]  [<ffffffff81149464>] ? __alloc_pages_nodemask+0x134/0x300
[    0.822754]  [<ffffffff813200e4>] acpi_ex_field_datum_io+0xf6/0x171
[    0.829004]  [<ffffffff81320301>] acpi_ex_extract_from_field+0xd7/0x20a
[    0.835602]  [<ffffffff81331d80>] ? acpi_ut_create_internal_object_dbg+0x23/0x8a
[    0.842981]  [<ffffffff8131f8e7>] acpi_ex_read_data_from_field+0x10f/0x14b
[    0.849838]  [<ffffffff81322e16>] acpi_ex_resolve_node_to_value+0x18e/0x21c
[    0.856780]  [<ffffffff813230a6>] acpi_ex_resolve_to_value+0x202/0x209
[    0.863291]  [<ffffffff81319486>] acpi_ds_evaluate_name_path+0x7b/0xf5
[    0.869803]  [<ffffffff81319834>] acpi_ds_exec_end_op+0x98/0x3e8
[    0.875793]  [<ffffffff8132aca4>] acpi_ps_parse_loop+0x514/0x560
[    0.881784]  [<ffffffff8132b738>] acpi_ps_parse_aml+0x98/0x28c
[    0.887601]  [<ffffffff8132bf8d>] acpi_ps_execute_method+0x1c1/0x26c
[    0.893939]  [<ffffffff813269c5>] acpi_ns_evaluate+0x1c1/0x258
[    0.899755]  [<ffffffff8131cb98>] acpi_ev_execute_reg_method+0xca/0x112
[    0.906353]  [<ffffffff8131cd6e>] acpi_ev_reg_run+0x48/0x52
[    0.911910]  [<ffffffff81328fad>] acpi_ns_walk_namespace+0xc8/0x17f
[    0.918160]  [<ffffffff8131cd26>] ? acpi_ev_detach_region+0x146/0x146
[    0.924585]  [<ffffffff8131cdbc>] acpi_ev_execute_reg_methods+0x44/0xf7
[    0.931184]  [<ffffffff819b2324>] ? acpi_sleep_proc_init+0x2a/0x2a
[    0.937349]  [<ffffffff8130ac66>] ? acpi_os_wait_semaphore+0x43/0x57
[    0.943686]  [<ffffffff81331a3f>] ? acpi_ut_acquire_mutex+0x48/0x88
[    0.949938]  [<ffffffff8131ceb8>] acpi_ev_initialize_op_regions+0x49/0x71
[    0.956709]  [<ffffffff819b2324>] ? acpi_sleep_proc_init+0x2a/0x2a
[    0.962873]  [<ffffffff81333310>] acpi_initialize_objects+0x23/0x4f
[    0.969125]  [<ffffffff819b23b4>] acpi_init+0x90/0x268

So basically, this split page seems to be the problem. Some other code
thinks that it has access to the full page, goes ahead and registers
it with the PAT rb tree, and this causes problems in the mmap() code.

I suspect we might have to go back to the idea of copying the first and
last non-page-aligned ranges into the new kernel's memory and reading
them from there to solve this issue. Do you have other ideas?
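
As a rough illustration of which bytes that idea would touch, the
userspace C sketch below splits a range into a partial first page, a
run of whole pages, and a partial last page. This is not a proposed
patch: the struct and function names are hypothetical, 4 KiB pages are
assumed, and ranges are half-open [start, end).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000ULL   /* assumption: 4 KiB pages */

/* All ranges are half-open [start, end); empty means start == end. */
struct sketch_split {
    uint64_t head_start, head_end;   /* partial first page (copy) */
    uint64_t mid_start, mid_end;     /* whole pages (can be remapped) */
    uint64_t tail_start, tail_end;   /* partial last page (copy) */
};

uint64_t sketch_round_down(uint64_t addr)
{
    return addr & ~(SKETCH_PAGE_SIZE - 1);
}

uint64_t sketch_round_up(uint64_t addr)
{
    /* Overflow near UINT64_MAX is ignored in this sketch. */
    return sketch_round_down(addr + SKETCH_PAGE_SIZE - 1);
}

struct sketch_split sketch_split_range(uint64_t start, uint64_t end)
{
    struct sketch_split s;
    uint64_t mid_start, mid_end;

    assert(start <= end);
    mid_start = sketch_round_up(start);
    mid_end = sketch_round_down(end);

    if (mid_start > mid_end) {
        /* The whole range sits inside one page: treat it all as head. */
        s.head_start = start;
        s.head_end = end;
        s.mid_start = s.mid_end = end;
        s.tail_start = s.tail_end = end;
        return s;
    }

    s.head_start = start;
    s.head_end = mid_start;
    s.mid_start = mid_start;
    s.mid_end = mid_end;
    s.tail_start = mid_end;
    s.tail_end = end;
    return s;
}
```

For the System RAM range on this machine, [0x00100000, 0xdafa5800),
the head is empty, the middle is [0x00100000, 0xdafa5000), and the tail
is [0xdafa5000, 0xdafa5800), i.e. only 0x800 bytes would need to be
copied.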

Thanks
Vivek
