* [PATCH] Put each per-cpu kdump ELF notes into a single page
@ 2014-09-05 16:33 Petr Tesarik
2014-09-11 19:37 ` Petr Tesarik
2014-09-11 20:01 ` Vivek Goyal
0 siblings, 2 replies; 7+ messages in thread
From: Petr Tesarik @ 2014-09-05 16:33 UTC (permalink / raw)
To: Eric Biederman; +Cc: kexec, linux-kernel
On architectures that use percpu-vm, the percpu region is not guaranteed
to be contiguous in physical space. However, fs/proc/vmcore.c expects
all ELF notes to be contiguous. If the ELF note happens to occupy
two non-adjacent physical pages, part of the note may be read from an
incorrect memory location by the kdump kernel, resulting in failure to
initialize /proc/vmcore (if the content of the following physical page,
incorrectly interpreted as an ELF note specifies a large number), wrong
register values or other apparent random memory corruption.
There is currently no mechanism to pass the virtual-to-physical mapping
of the percpu allocation to the kdump kernel. So, instead, I'm changing
the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less
than PAGE_SIZE, aligning the buffer to the nearest higher power of 2
is enough to make sure that the buffer cannot cross a page boundary,
effectively ensuring that the whole buffer is contiguous in physical
space.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
---
kernel/kexec.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 2bee072..cdab59d 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
static int __init crash_notes_memory_init(void)
{
/* Allocate memory for saving cpu registers. */
- crash_notes = alloc_percpu(note_buf_t);
+ crash_notes = __alloc_percpu(sizeof(note_buf_t),
+ roundup_pow_of_two(sizeof(note_buf_t)));
if (!crash_notes) {
pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
return -ENOMEM;
--
1.8.4.5
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page
2014-09-05 16:33 [PATCH] Put each per-cpu kdump ELF notes into a single page Petr Tesarik
@ 2014-09-11 19:37 ` Petr Tesarik
2014-09-11 20:01 ` Vivek Goyal
1 sibling, 0 replies; 7+ messages in thread
From: Petr Tesarik @ 2014-09-11 19:37 UTC (permalink / raw)
To: Eric Biederman; +Cc: kexec, linux-kernel, Vivek Goyal
Hi all,
is anything wrong with the patch below? Are there questions? I thought
this would be an easy one-line bugfix...
Regards,
Petr Tesarik
On Fri, 5 Sep 2014 18:33:14 +0200
Petr Tesarik <ptesarik@suse.cz> wrote:
> On architectures that use percpu-vm, the percpu region is not guaranteed
> to be contiguous in physical space. However, fs/proc/vmcore.c expects
> all ELF notes to be contiguous. If the ELF note happens to occupy
> two non-adjacent physical pages, part of the note may be read from an
> incorrect memory location by the kdump kernel, resulting in failure to
> initialize /proc/vmcore (if the content of the following physical page,
> incorrectly interpreted as an ELF note specifies a large number), wrong
> register values or other apparent random memory corruption.
>
> There is currently no mechanism to pass the virtual-to-physical mapping
> of the percpu allocation to the kdump kernel. So, instead, I'm changing
> the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less
> than PAGE_SIZE, aligning the buffer to the nearest higher power of 2
> is enough to make sure that the buffer cannot cross a page boundary,
> effectively ensuring that the whole buffer is contiguous in physical
> space.
>
> Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
> ---
> kernel/kexec.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 2bee072..cdab59d 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> static int __init crash_notes_memory_init(void)
> {
> /* Allocate memory for saving cpu registers. */
> - crash_notes = alloc_percpu(note_buf_t);
> + crash_notes = __alloc_percpu(sizeof(note_buf_t),
> + roundup_pow_of_two(sizeof(note_buf_t)));
> if (!crash_notes) {
> pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
> return -ENOMEM;
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page
2014-09-05 16:33 [PATCH] Put each per-cpu kdump ELF notes into a single page Petr Tesarik
2014-09-11 19:37 ` Petr Tesarik
@ 2014-09-11 20:01 ` Vivek Goyal
2014-09-11 20:43 ` Petr Tesarik
1 sibling, 1 reply; 7+ messages in thread
From: Vivek Goyal @ 2014-09-11 20:01 UTC (permalink / raw)
To: Petr Tesarik; +Cc: kexec, Eric Biederman, linux-kernel
On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote:
> On architectures that use percpu-vm, the percpu region is not guaranteed
> to be contiguous in physical space.
Petr,
Which are those arches?
> However, fs/proc/vmcore.c expects
> all ELF notes to be contiguous. If the ELF note happens to occupy
> two non-adjacent physical pages, part of the note may be read from an
> incorrect memory location by the kdump kernel, resulting in failure to
> initialize /proc/vmcore (if the content of the following physical page,
> incorrectly interpreted as an ELF note specifies a large number), wrong
> register values or other apparent random memory corruption.
>
> There is currently no mechanism to pass the virtual-to-physical mapping
> of the percpu allocation to the kdump kernel. So, instead, I'm changing
> the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less
> than PAGE_SIZE, aligning the buffer to the nearest higher power of 2
> is enough to make sure that the buffer cannot cross a page boundary,
> effectively ensuring that the whole buffer is contiguous in physical
> space.
>
> Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
> ---
> kernel/kexec.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 2bee072..cdab59d 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> static int __init crash_notes_memory_init(void)
> {
> /* Allocate memory for saving cpu registers. */
> - crash_notes = alloc_percpu(note_buf_t);
> + crash_notes = __alloc_percpu(sizeof(note_buf_t),
> + roundup_pow_of_two(sizeof(note_buf_t)));
I think some of the changelog should show up here as comment in short
form. I don't think it is obvious that why we are using __alloc_percpu()
and why aligning to nearst higher power of 2 is needed here. Please also
mention here which arches run into issues.
Thanks
Vivek
> if (!crash_notes) {
> pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
> return -ENOMEM;
> --
> 1.8.4.5
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page
2014-09-11 20:01 ` Vivek Goyal
@ 2014-09-11 20:43 ` Petr Tesarik
2014-09-11 21:16 ` Vivek Goyal
0 siblings, 1 reply; 7+ messages in thread
From: Petr Tesarik @ 2014-09-11 20:43 UTC (permalink / raw)
To: Vivek Goyal; +Cc: kexec, Eric Biederman, linux-kernel
On Thu, 11 Sep 2014 16:01:10 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote:
> > On architectures that use percpu-vm, the percpu region is not guaranteed
> > to be contiguous in physical space.
>
> Petr,
>
> Which are those arches?
All except nommu. Actually, percpu-km will be used instead even on MMU
if SMP is disabled, but since SMP is pretty standard now, I guess the
vast majority of all kernels out there is affected. ;-)
> > However, fs/proc/vmcore.c expects
> > all ELF notes to be contiguous. If the ELF note happens to occupy
> > two non-adjacent physical pages, part of the note may be read from an
> > incorrect memory location by the kdump kernel, resulting in failure to
> > initialize /proc/vmcore (if the content of the following physical page,
> > incorrectly interpreted as an ELF note specifies a large number), wrong
> > register values or other apparent random memory corruption.
> >
> > There is currently no mechanism to pass the virtual-to-physical mapping
> > of the percpu allocation to the kdump kernel. So, instead, I'm changing
> > the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less
> > than PAGE_SIZE, aligning the buffer to the nearest higher power of 2
> > is enough to make sure that the buffer cannot cross a page boundary,
> > effectively ensuring that the whole buffer is contiguous in physical
> > space.
> >
> > Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
> > ---
> > kernel/kexec.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/kexec.c b/kernel/kexec.c
> > index 2bee072..cdab59d 100644
> > --- a/kernel/kexec.c
> > +++ b/kernel/kexec.c
> > @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> > static int __init crash_notes_memory_init(void)
> > {
> > /* Allocate memory for saving cpu registers. */
> > - crash_notes = alloc_percpu(note_buf_t);
> > + crash_notes = __alloc_percpu(sizeof(note_buf_t),
> > + roundup_pow_of_two(sizeof(note_buf_t)));
>
> I think some of the changelog should show up here as comment in short
> form. I don't think it is obvious that why we are using __alloc_percpu()
> and why aligning to nearst higher power of 2 is needed here. Please also
> mention here which arches run into issues.
OK, I'll add it as a comment in the code. I'll see if I can make it
short but still understandable.
Thanks,
Petr Tesarik
> Thanks
> Vivek
>
> > if (!crash_notes) {
> > pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
> > return -ENOMEM;
> > --
> > 1.8.4.5
> >
> > _______________________________________________
> > kexec mailing list
> > kexec@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page
2014-09-11 20:43 ` Petr Tesarik
@ 2014-09-11 21:16 ` Vivek Goyal
2014-09-11 22:15 ` Petr Tesarik
0 siblings, 1 reply; 7+ messages in thread
From: Vivek Goyal @ 2014-09-11 21:16 UTC (permalink / raw)
To: Petr Tesarik; +Cc: kexec, Eric Biederman, linux-kernel
On Thu, Sep 11, 2014 at 10:43:30PM +0200, Petr Tesarik wrote:
> On Thu, 11 Sep 2014 16:01:10 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
>
> > On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote:
> > > On architectures that use percpu-vm, the percpu region is not guaranteed
> > > to be contiguous in physical space.
> >
> > Petr,
> >
> > Which are those arches?
>
> All except nommu. Actually, percpu-km will be used instead even on MMU
> if SMP is disabled, but since SMP is pretty standard now, I guess the
> vast majority of all kernels out there is affected. ;-)
Hi Petr,
To make sure I understand it correctly I will just summarize what you
said.
alloc_percpu() code does not guarantee that an object will be on physically
contiguous pages if object crosses page boundary. That's why we are forcing
allocation of object aligned to nearest higher power of two boundary of
object size and that way object will always be on same page (as long as object
is not bigger than a page).
Is that a fair summary?
Thanks
Vivek
>
> > > However, fs/proc/vmcore.c expects
> > > all ELF notes to be contiguous. If the ELF note happens to occupy
> > > two non-adjacent physical pages, part of the note may be read from an
> > > incorrect memory location by the kdump kernel, resulting in failure to
> > > initialize /proc/vmcore (if the content of the following physical page,
> > > incorrectly interpreted as an ELF note specifies a large number), wrong
> > > register values or other apparent random memory corruption.
> > >
> > > There is currently no mechanism to pass the virtual-to-physical mapping
> > > of the percpu allocation to the kdump kernel. So, instead, I'm changing
> > > the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less
> > > than PAGE_SIZE, aligning the buffer to the nearest higher power of 2
> > > is enough to make sure that the buffer cannot cross a page boundary,
> > > effectively ensuring that the whole buffer is contiguous in physical
> > > space.
> > >
> > > Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
> > > ---
> > > kernel/kexec.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/kexec.c b/kernel/kexec.c
> > > index 2bee072..cdab59d 100644
> > > --- a/kernel/kexec.c
> > > +++ b/kernel/kexec.c
> > > @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> > > static int __init crash_notes_memory_init(void)
> > > {
> > > /* Allocate memory for saving cpu registers. */
> > > - crash_notes = alloc_percpu(note_buf_t);
> > > + crash_notes = __alloc_percpu(sizeof(note_buf_t),
> > > + roundup_pow_of_two(sizeof(note_buf_t)));
> >
> > I think some of the changelog should show up here as comment in short
> > form. I don't think it is obvious that why we are using __alloc_percpu()
> > and why aligning to nearst higher power of 2 is needed here. Please also
> > mention here which arches run into issues.
>
> OK, I'll add it as a comment in the code. I'll see if I can make it
> short but still understandable.
>
> Thanks,
> Petr Tesarik
>
> > Thanks
> > Vivek
> >
> > > if (!crash_notes) {
> > > pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
> > > return -ENOMEM;
> > > --
> > > 1.8.4.5
> > >
> > > _______________________________________________
> > > kexec mailing list
> > > kexec@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/kexec
> >
> > _______________________________________________
> > kexec mailing list
> > kexec@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page
2014-09-11 21:16 ` Vivek Goyal
@ 2014-09-11 22:15 ` Petr Tesarik
2014-09-12 11:52 ` Vivek Goyal
0 siblings, 1 reply; 7+ messages in thread
From: Petr Tesarik @ 2014-09-11 22:15 UTC (permalink / raw)
To: Vivek Goyal; +Cc: kexec, Eric Biederman, linux-kernel
On Thu, 11 Sep 2014 17:16:37 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Sep 11, 2014 at 10:43:30PM +0200, Petr Tesarik wrote:
> > On Thu, 11 Sep 2014 16:01:10 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > > On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote:
> > > > On architectures that use percpu-vm, the percpu region is not guaranteed
> > > > to be contiguous in physical space.
> > >
> > > Petr,
> > >
> > > Which are those arches?
> >
> > All except nommu. Actually, percpu-km will be used instead even on MMU
> > if SMP is disabled, but since SMP is pretty standard now, I guess the
> > vast majority of all kernels out there is affected. ;-)
>
> Hi Petr,
>
> To make sure I understand it correctly I will just summarize what you
> said.
>
> alloc_percpu() code does not guarantee that an object will be on physically
> contiguous pages if object crosses page boundary. That's why we are forcing
> allocation of object aligned to nearest higher power of two boundary of
> object size and that way object will always be on same page (as long as object
> is not bigger than a page).
>
> Is that a fair summary?
Yes. I might add a note why physically contiguous memory is needed
here, but maybe it's obvious to anyone dealing with kdump.
Thanks,
Petr Tesarik
> Thanks
> Vivek
>
> >
> > > > However, fs/proc/vmcore.c expects
> > > > all ELF notes to be contiguous. If the ELF note happens to occupy
> > > > two non-adjacent physical pages, part of the note may be read from an
> > > > incorrect memory location by the kdump kernel, resulting in failure to
> > > > initialize /proc/vmcore (if the content of the following physical page,
> > > > incorrectly interpreted as an ELF note specifies a large number), wrong
> > > > register values or other apparent random memory corruption.
> > > >
> > > > There is currently no mechanism to pass the virtual-to-physical mapping
> > > > of the percpu allocation to the kdump kernel. So, instead, I'm changing
> > > > the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less
> > > > than PAGE_SIZE, aligning the buffer to the nearest higher power of 2
> > > > is enough to make sure that the buffer cannot cross a page boundary,
> > > > effectively ensuring that the whole buffer is contiguous in physical
> > > > space.
> > > >
> > > > Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
> > > > ---
> > > > kernel/kexec.c | 3 ++-
> > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/kexec.c b/kernel/kexec.c
> > > > index 2bee072..cdab59d 100644
> > > > --- a/kernel/kexec.c
> > > > +++ b/kernel/kexec.c
> > > > @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> > > > static int __init crash_notes_memory_init(void)
> > > > {
> > > > /* Allocate memory for saving cpu registers. */
> > > > - crash_notes = alloc_percpu(note_buf_t);
> > > > + crash_notes = __alloc_percpu(sizeof(note_buf_t),
> > > > + roundup_pow_of_two(sizeof(note_buf_t)));
> > >
> > > I think some of the changelog should show up here as comment in short
> > > form. I don't think it is obvious that why we are using __alloc_percpu()
> > > and why aligning to nearst higher power of 2 is needed here. Please also
> > > mention here which arches run into issues.
> >
> > OK, I'll add it as a comment in the code. I'll see if I can make it
> > short but still understandable.
> >
> > Thanks,
> > Petr Tesarik
> >
> > > Thanks
> > > Vivek
> > >
> > > > if (!crash_notes) {
> > > > pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
> > > > return -ENOMEM;
> > > > --
> > > > 1.8.4.5
> > > >
> > > > _______________________________________________
> > > > kexec mailing list
> > > > kexec@lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/kexec
> > >
> > > _______________________________________________
> > > kexec mailing list
> > > kexec@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/kexec
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page
2014-09-11 22:15 ` Petr Tesarik
@ 2014-09-12 11:52 ` Vivek Goyal
0 siblings, 0 replies; 7+ messages in thread
From: Vivek Goyal @ 2014-09-12 11:52 UTC (permalink / raw)
To: Petr Tesarik; +Cc: kexec, Eric Biederman, linux-kernel
On Fri, Sep 12, 2014 at 12:15:37AM +0200, Petr Tesarik wrote:
> On Thu, 11 Sep 2014 17:16:37 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
>
> > On Thu, Sep 11, 2014 at 10:43:30PM +0200, Petr Tesarik wrote:
> > > On Thu, 11 Sep 2014 16:01:10 -0400
> > > Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > > On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote:
> > > > > On architectures that use percpu-vm, the percpu region is not guaranteed
> > > > > to be contiguous in physical space.
> > > >
> > > > Petr,
> > > >
> > > > Which are those arches?
> > >
> > > All except nommu. Actually, percpu-km will be used instead even on MMU
> > > if SMP is disabled, but since SMP is pretty standard now, I guess the
> > > vast majority of all kernels out there is affected. ;-)
> >
> > Hi Petr,
> >
> > To make sure I understand it correctly I will just summarize what you
> > said.
> >
> > alloc_percpu() code does not guarantee that an object will be on physically
> > contiguous pages if object crosses page boundary. That's why we are forcing
> > allocation of object aligned to nearest higher power of two boundary of
> > object size and that way object will always be on same page (as long as object
> > is not bigger than a page).
> >
> > Is that a fair summary?
>
> Yes. I might add a note why physically contiguous memory is needed
> here, but maybe it's obvious to anyone dealing with kdump.
I think adding couple of lines to explain why physically contiguous notes
are needed is a good idea. It will not be ovious to anybody new to kdump.
Thanks
Vivek
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-09-12 11:52 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-05 16:33 [PATCH] Put each per-cpu kdump ELF notes into a single page Petr Tesarik
2014-09-11 19:37 ` Petr Tesarik
2014-09-11 20:01 ` Vivek Goyal
2014-09-11 20:43 ` Petr Tesarik
2014-09-11 21:16 ` Vivek Goyal
2014-09-11 22:15 ` Petr Tesarik
2014-09-12 11:52 ` Vivek Goyal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).