From: joeyli <jlee@suse.com>
To: Baoquan He <bhe@redhat.com>
Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com>,
Vivek Goyal <vgoyal@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>,
x86@kernel.org, Stephen Rothwell <sfr@canb.auug.org.au>,
Viresh Kumar <viresh.kumar@linaro.org>,
Takashi Iwai <tiwai@suse.de>,
Jiang Liu <jiang.liu@linux.intel.com>,
Andy Lutomirski <luto@kernel.org>,
linux-kernel@vger.kernel.org, akpm@linux-foundation.org
Subject: Re: [PATCH] kexec: fix out of the ELF headers buffer issue in syscall kexec_file_load()
Date: Mon, 28 Sep 2015 17:39:45 +0800
Message-ID: <20150928093945.GC2115@linux-rxt1.site>
In-Reply-To: <20150928080757.GA29662@dhcp-128-28.nay.redhat.com>
On Mon, Sep 28, 2015 at 04:07:57PM +0800, Baoquan He wrote:
> On 09/28/15 at 02:41pm, Lee, Chun-Yi wrote:
> > On big machines, the number of CPUs can nearly fill the ELF headers
> > buffer, which is page aligned (4096, 8192, ...). A page fault then
> > happens randomly.
> >
> > This patch changes fill_up_crash_elf_data() to use
> > walk_system_ram_res() instead of walk_system_ram_range() when counting
> > the maximum number of crash memory ranges. walk_system_ram_range()
> > filters out small memory regions that reside in the same page, but
> > walk_system_ram_res() does not.
> >
> > The original page fault sometimes happened on big machines while
> > preparing the ELF headers:
> >
> > [ 305.291522] BUG: unable to handle kernel paging request at ffffc90613fc9000
> > [ 305.299621] IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
> > [ 305.308300] PGD e000032067 PUD 6dcbec54067 PMD 9dc9bdeb067 PTE 0
> > [ 305.315393] Oops: 0002 [#1] SMP
> > [...snip]
> > [ 305.420953] task: ffff8e1c01ced600 ti: ffff8e1c03ec2000 task.ti: ffff8e1c03ec2000
> > [ 305.429292] RIP: 0010:[<ffffffff8103d645>] [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
> > [...snip]
> >
> > Tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback()
> > shows that the code uses walk_system_ram_res() to fill the crash memory
> > region information into the program headers, so it counts the small
> > memory regions that reside within a page. However, the kernel was using
> > walk_system_ram_range() in fill_up_crash_elf_data() to count the number
> > of crash memory regions, and that function filters out small regions.
> >
> > I printed those small memory regions, for example:
> >
> > kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
> >
> > Based on the logic of walk_system_ram_range(), this memory region will
> > be filtered out:
> >
> > pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
> > end_pfn = (0x77592258 + 0xdc0 - 1 + 1) >> 12 = 0x77593
> > end_pfn - pfn = 0x77593 - 0x77593 = 0 <=== if (end_pfn > pfn) [FAIL]
> >
> > So the max_nr_ranges counted by the kernel doesn't include small memory
> > regions. That causes the page fault in the later code path that
> > prepares the ELF headers.
> >
> > This issue was hidden on small machines that don't have many CPUs,
> > because the free space in the ELF headers buffer can cover the small
> > memory regions. But when the machine has more CPUs, or the number of
> > memory regions nearly fills the page-aligned buffer (e.g. 4096,
> > 8192, ...), the issue happens randomly.
>
> CC akpm too.
>
> I read the code again and I think it makes sense to use
> walk_system_ram_res. prepare_elf64_headers also uses
> walk_system_ram_res; that's why you could find this bug. Otherwise we
> would never have found it, and those small regions which only spread
> over one page would be lost in the vmcore.
>
> Besides, could you please rearrange your patch log? It's not easy to
> see what this patch does.
>
To avoid confusion, I will simplify the patch description: I will remove
the part about the number of CPUs but keep the explanation of the
difference between walk_system_ram_res and walk_system_ram_range.
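
To illustrate that difference, here is a minimal userspace sketch (not
kernel code) of the page rounding from the quoted calculation. The helper
names and the SKETCH_ macros are hypothetical and exist only for this
example; a 4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages, as in the quoted calculation. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical helper: true when the region [start, start + size) still
 * covers at least one whole page after the start is rounded up and the end
 * is rounded down to page frame numbers. This mirrors the
 * "if (end_pfn > pfn)" check described for walk_system_ram_range().
 */
static bool sketch_counted_by_pfn_walk(uint64_t start, uint64_t size)
{
	uint64_t end = start + size - 1;	/* inclusive end address */
	uint64_t pfn = (start + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	uint64_t end_pfn = (end + 1) >> SKETCH_PAGE_SHIFT;

	return end_pfn > pfn;
}

/*
 * Hypothetical helper: a resource-based walk, as described for
 * walk_system_ram_res(), reports every non-empty region regardless of
 * page alignment.
 */
static bool sketch_counted_by_res_walk(uint64_t start, uint64_t size)
{
	(void)start;	/* the start address does not affect the result */
	return size > 0;
}
```

For the region from the log (start 0x77592258, size 0xdc0),
sketch_counted_by_pfn_walk() returns false while
sketch_counted_by_res_walk() returns true. That is the mismatch between
the number of ranges counted and the number of ranges later filled in.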
Thanks a lot!
Joey Lee
> >
> > Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
> > ---
> > arch/x86/kernel/crash.c | 5 ++---
> > 1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index e068d66..ad273b3d 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -185,8 +185,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > }
> >
> > #ifdef CONFIG_KEXEC_FILE
> > -static int get_nr_ram_ranges_callback(unsigned long start_pfn,
> > - unsigned long nr_pfn, void *arg)
> > +static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
> > {
> > int *nr_ranges = arg;
> >
> > @@ -214,7 +213,7 @@ static void fill_up_crash_elf_data(struct crash_elf_data *ced,
> >
> > ced->image = image;
> >
> > - walk_system_ram_range(0, -1, &nr_ranges,
> > + walk_system_ram_res(0, -1, &nr_ranges,
> > get_nr_ram_ranges_callback);
> >
> > ced->max_nr_ranges = nr_ranges;
> > --
> > 2.1.4
> >