* [PATCH] Put each per-cpu kdump ELF notes into a single page @ 2014-09-05 16:33 Petr Tesarik 2014-09-11 19:37 ` Petr Tesarik 2014-09-11 20:01 ` Vivek Goyal 0 siblings, 2 replies; 7+ messages in thread From: Petr Tesarik @ 2014-09-05 16:33 UTC (permalink / raw) To: Eric Biederman; +Cc: kexec, linux-kernel On architectures that use percpu-vm, the percpu region is not guaranteed to be contiguous in physical space. However, fs/proc/vmcore.c expects all ELF notes to be contiguous. If the ELF note happens to occupy two non-adjacent physical pages, part of the note may be read from an incorrect memory location by the kdump kernel, resulting in failure to initialize /proc/vmcore (if the content of the following physical page, incorrectly interpreted as an ELF note specifies a large number), wrong register values or other apparent random memory corruption. There is currently no mechanism to pass the virtual-to-physical mapping of the percpu allocation to the kdump kernel. So, instead, I'm changing the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less than PAGE_SIZE, aligning the buffer to the nearest higher power of 2 is enough to make sure that the buffer cannot cross a page boundary, effectively ensuring that the whole buffer is contiguous in physical space. Signed-off-by: Petr Tesarik <ptesarik@suse.cz> --- kernel/kexec.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/kexec.c b/kernel/kexec.c index 2bee072..cdab59d 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu) static int __init crash_notes_memory_init(void) { /* Allocate memory for saving cpu registers. */ - crash_notes = alloc_percpu(note_buf_t); + crash_notes = __alloc_percpu(sizeof(note_buf_t), + roundup_pow_of_two(sizeof(note_buf_t))); if (!crash_notes) { pr_warn("Kexec: Memory allocation for saving cpu register states failed\n"); return -ENOMEM; -- 1.8.4.5 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page 2014-09-05 16:33 [PATCH] Put each per-cpu kdump ELF notes into a single page Petr Tesarik @ 2014-09-11 19:37 ` Petr Tesarik 2014-09-11 20:01 ` Vivek Goyal 1 sibling, 0 replies; 7+ messages in thread From: Petr Tesarik @ 2014-09-11 19:37 UTC (permalink / raw) To: Eric Biederman; +Cc: Vivek Goyal, kexec, linux-kernel Hi all, is anything wrong with the patch below? Are there questions? I thought this would be an easy one-line bugfix... Regards, Petr Tesarik On Fri, 5 Sep 2014 18:33:14 +0200 Petr Tesarik <ptesarik@suse.cz> wrote: > On architectures that use percpu-vm, the percpu region is not guaranteed > to be contiguous in physical space. However, fs/proc/vmcore.c expects > all ELF notes to be contiguous. If the ELF note happens to occupy > two non-adjacent physical pages, part of the note may be read from an > incorrect memory location by the kdump kernel, resulting in failure to > initialize /proc/vmcore (if the content of the following physical page, > incorrectly interpreted as an ELF note specifies a large number), wrong > register values or other apparent random memory corruption. > > There is currently no mechanism to pass the virtual-to-physical mapping > of the percpu allocation to the kdump kernel. So, instead, I'm changing > the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less > than PAGE_SIZE, aligning the buffer to the nearest higher power of 2 > is enough to make sure that the buffer cannot cross a page boundary, > effectively ensuring that the whole buffer is contiguous in physical > space. > > Signed-off-by: Petr Tesarik <ptesarik@suse.cz> > --- > kernel/kexec.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/kexec.c b/kernel/kexec.c > index 2bee072..cdab59d 100644 > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu) > static int __init crash_notes_memory_init(void) > { > /* Allocate memory for saving cpu registers. */ > - crash_notes = alloc_percpu(note_buf_t); > + crash_notes = __alloc_percpu(sizeof(note_buf_t), > + roundup_pow_of_two(sizeof(note_buf_t))); > if (!crash_notes) { > pr_warn("Kexec: Memory allocation for saving cpu register states failed\n"); > return -ENOMEM; ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page 2014-09-05 16:33 [PATCH] Put each per-cpu kdump ELF notes into a single page Petr Tesarik 2014-09-11 19:37 ` Petr Tesarik @ 2014-09-11 20:01 ` Vivek Goyal 2014-09-11 20:43 ` Petr Tesarik 1 sibling, 1 reply; 7+ messages in thread From: Vivek Goyal @ 2014-09-11 20:01 UTC (permalink / raw) To: Petr Tesarik; +Cc: Eric Biederman, kexec, linux-kernel On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote: > On architectures that use percpu-vm, the percpu region is not guaranteed > to be contiguous in physical space. Petr, Which are those arches? > However, fs/proc/vmcore.c expects > all ELF notes to be contiguous. If the ELF note happens to occupy > two non-adjacent physical pages, part of the note may be read from an > incorrect memory location by the kdump kernel, resulting in failure to > initialize /proc/vmcore (if the content of the following physical page, > incorrectly interpreted as an ELF note specifies a large number), wrong > register values or other apparent random memory corruption. > > There is currently no mechanism to pass the virtual-to-physical mapping > of the percpu allocation to the kdump kernel. So, instead, I'm changing > the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less > than PAGE_SIZE, aligning the buffer to the nearest higher power of 2 > is enough to make sure that the buffer cannot cross a page boundary, > effectively ensuring that the whole buffer is contiguous in physical > space. > > Signed-off-by: Petr Tesarik <ptesarik@suse.cz> > --- > kernel/kexec.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/kexec.c b/kernel/kexec.c > index 2bee072..cdab59d 100644 > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu) > static int __init crash_notes_memory_init(void) > { > /* Allocate memory for saving cpu registers. */ > - crash_notes = alloc_percpu(note_buf_t); > + crash_notes = __alloc_percpu(sizeof(note_buf_t), > + roundup_pow_of_two(sizeof(note_buf_t))); I think some of the changelog should show up here as comment in short form. I don't think it is obvious that why we are using __alloc_percpu() and why aligning to nearst higher power of 2 is needed here. Please also mention here which arches run into issues. Thanks Vivek > if (!crash_notes) { > pr_warn("Kexec: Memory allocation for saving cpu register states failed\n"); > return -ENOMEM; > -- > 1.8.4.5 > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page 2014-09-11 20:01 ` Vivek Goyal @ 2014-09-11 20:43 ` Petr Tesarik 2014-09-11 21:16 ` Vivek Goyal 0 siblings, 1 reply; 7+ messages in thread From: Petr Tesarik @ 2014-09-11 20:43 UTC (permalink / raw) To: Vivek Goyal; +Cc: kexec, Eric Biederman, linux-kernel On Thu, 11 Sep 2014 16:01:10 -0400 Vivek Goyal <vgoyal@redhat.com> wrote: > On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote: > > On architectures that use percpu-vm, the percpu region is not guaranteed > > to be contiguous in physical space. > > Petr, > > Which are those arches? All except nommu. Actually, percpu-km will be used instead even on MMU if SMP is disabled, but since SMP is pretty standard now, I guess the vast majority of all kernels out there is affected. ;-) > > However, fs/proc/vmcore.c expects > > all ELF notes to be contiguous. If the ELF note happens to occupy > > two non-adjacent physical pages, part of the note may be read from an > > incorrect memory location by the kdump kernel, resulting in failure to > > initialize /proc/vmcore (if the content of the following physical page, > > incorrectly interpreted as an ELF note specifies a large number), wrong > > register values or other apparent random memory corruption. > > > > There is currently no mechanism to pass the virtual-to-physical mapping > > of the percpu allocation to the kdump kernel. So, instead, I'm changing > > the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less > > than PAGE_SIZE, aligning the buffer to the nearest higher power of 2 > > is enough to make sure that the buffer cannot cross a page boundary, > > effectively ensuring that the whole buffer is contiguous in physical > > space. > > > > Signed-off-by: Petr Tesarik <ptesarik@suse.cz> > > --- > > kernel/kexec.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/kexec.c b/kernel/kexec.c > > index 2bee072..cdab59d 100644 > > --- a/kernel/kexec.c > > +++ b/kernel/kexec.c > > @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu) > > static int __init crash_notes_memory_init(void) > > { > > /* Allocate memory for saving cpu registers. */ > > - crash_notes = alloc_percpu(note_buf_t); > > + crash_notes = __alloc_percpu(sizeof(note_buf_t), > > + roundup_pow_of_two(sizeof(note_buf_t))); > > I think some of the changelog should show up here as comment in short > form. I don't think it is obvious that why we are using __alloc_percpu() > and why aligning to nearst higher power of 2 is needed here. Please also > mention here which arches run into issues. OK, I'll add it as a comment in the code. I'll see if I can make it short but still understandable. Thanks, Petr Tesarik > Thanks > Vivek > > > if (!crash_notes) { > > pr_warn("Kexec: Memory allocation for saving cpu register states failed\n"); > > return -ENOMEM; > > -- > > 1.8.4.5 > > > > _______________________________________________ > > kexec mailing list > > kexec@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/kexec > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page 2014-09-11 20:43 ` Petr Tesarik @ 2014-09-11 21:16 ` Vivek Goyal 2014-09-11 22:15 ` Petr Tesarik 0 siblings, 1 reply; 7+ messages in thread From: Vivek Goyal @ 2014-09-11 21:16 UTC (permalink / raw) To: Petr Tesarik; +Cc: kexec, Eric Biederman, linux-kernel On Thu, Sep 11, 2014 at 10:43:30PM +0200, Petr Tesarik wrote: > On Thu, 11 Sep 2014 16:01:10 -0400 > Vivek Goyal <vgoyal@redhat.com> wrote: > > > On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote: > > > On architectures that use percpu-vm, the percpu region is not guaranteed > > > to be contiguous in physical space. > > > > Petr, > > > > Which are those arches? > > All except nommu. Actually, percpu-km will be used instead even on MMU > if SMP is disabled, but since SMP is pretty standard now, I guess the > vast majority of all kernels out there is affected. ;-) Hi Petr, To make sure I understand it correctly I will just summarize what you said. alloc_percpu() code does not guarantee that an object will be on physically contiguous pages if object crosses page boundary. That's why we are forcing allocation of object aligned to nearest higher power of two boundary of object size and that way object will always be on same page (as long as object is not bigger than a page). Is that a fair summary? Thanks Vivek > > > > However, fs/proc/vmcore.c expects > > > all ELF notes to be contiguous. If the ELF note happens to occupy > > > two non-adjacent physical pages, part of the note may be read from an > > > incorrect memory location by the kdump kernel, resulting in failure to > > > initialize /proc/vmcore (if the content of the following physical page, > > > incorrectly interpreted as an ELF note specifies a large number), wrong > > > register values or other apparent random memory corruption. > > > > > > There is currently no mechanism to pass the virtual-to-physical mapping > > > of the percpu allocation to the kdump kernel. So, instead, I'm changing > > > the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less > > > than PAGE_SIZE, aligning the buffer to the nearest higher power of 2 > > > is enough to make sure that the buffer cannot cross a page boundary, > > > effectively ensuring that the whole buffer is contiguous in physical > > > space. > > > > > > Signed-off-by: Petr Tesarik <ptesarik@suse.cz> > > > --- > > > kernel/kexec.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/kernel/kexec.c b/kernel/kexec.c > > > index 2bee072..cdab59d 100644 > > > --- a/kernel/kexec.c > > > +++ b/kernel/kexec.c > > > @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu) > > > static int __init crash_notes_memory_init(void) > > > { > > > /* Allocate memory for saving cpu registers. */ > > > - crash_notes = alloc_percpu(note_buf_t); > > > + crash_notes = __alloc_percpu(sizeof(note_buf_t), > > > + roundup_pow_of_two(sizeof(note_buf_t))); > > > > I think some of the changelog should show up here as comment in short > > form. I don't think it is obvious that why we are using __alloc_percpu() > > and why aligning to nearst higher power of 2 is needed here. Please also > > mention here which arches run into issues. > > OK, I'll add it as a comment in the code. I'll see if I can make it > short but still understandable. > > Thanks, > Petr Tesarik > > > Thanks > > Vivek > > > > > if (!crash_notes) { > > > pr_warn("Kexec: Memory allocation for saving cpu register states failed\n"); > > > return -ENOMEM; > > > -- > > > 1.8.4.5 > > > > > > _______________________________________________ > > > kexec mailing list > > > kexec@lists.infradead.org > > > http://lists.infradead.org/mailman/listinfo/kexec > > > > _______________________________________________ > > kexec mailing list > > kexec@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page 2014-09-11 21:16 ` Vivek Goyal @ 2014-09-11 22:15 ` Petr Tesarik 2014-09-12 11:52 ` Vivek Goyal 0 siblings, 1 reply; 7+ messages in thread From: Petr Tesarik @ 2014-09-11 22:15 UTC (permalink / raw) To: Vivek Goyal; +Cc: kexec, Eric Biederman, linux-kernel On Thu, 11 Sep 2014 17:16:37 -0400 Vivek Goyal <vgoyal@redhat.com> wrote: > On Thu, Sep 11, 2014 at 10:43:30PM +0200, Petr Tesarik wrote: > > On Thu, 11 Sep 2014 16:01:10 -0400 > > Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote: > > > > On architectures that use percpu-vm, the percpu region is not guaranteed > > > > to be contiguous in physical space. > > > > > > Petr, > > > > > > Which are those arches? > > > > All except nommu. Actually, percpu-km will be used instead even on MMU > > if SMP is disabled, but since SMP is pretty standard now, I guess the > > vast majority of all kernels out there is affected. ;-) > > Hi Petr, > > To make sure I understand it correctly I will just summarize what you > said. > > alloc_percpu() code does not guarantee that an object will be on physically > contiguous pages if object crosses page boundary. That's why we are forcing > allocation of object aligned to nearest higher power of two boundary of > object size and that way object will always be on same page (as long as object > is not bigger than a page). > > Is that a fair summary? Yes. I might add a note why physically contiguous memory is needed here, but maybe it's obvious to anyone dealing with kdump. Thanks, Petr Tesarik > Thanks > Vivek > > > > > > > However, fs/proc/vmcore.c expects > > > > all ELF notes to be contiguous. If the ELF note happens to occupy > > > > two non-adjacent physical pages, part of the note may be read from an > > > > incorrect memory location by the kdump kernel, resulting in failure to > > > > initialize /proc/vmcore (if the content of the following physical page, > > > > incorrectly interpreted as an ELF note specifies a large number), wrong > > > > register values or other apparent random memory corruption. > > > > > > > > There is currently no mechanism to pass the virtual-to-physical mapping > > > > of the percpu allocation to the kdump kernel. So, instead, I'm changing > > > > the alignment of the ELF note buffer. Since sizeof(note_buf_t) is less > > > > than PAGE_SIZE, aligning the buffer to the nearest higher power of 2 > > > > is enough to make sure that the buffer cannot cross a page boundary, > > > > effectively ensuring that the whole buffer is contiguous in physical > > > > space. > > > > > > > > Signed-off-by: Petr Tesarik <ptesarik@suse.cz> > > > > --- > > > > kernel/kexec.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/kernel/kexec.c b/kernel/kexec.c > > > > index 2bee072..cdab59d 100644 > > > > --- a/kernel/kexec.c > > > > +++ b/kernel/kexec.c > > > > @@ -1610,7 +1610,8 @@ void crash_save_cpu(struct pt_regs *regs, int cpu) > > > > static int __init crash_notes_memory_init(void) > > > > { > > > > /* Allocate memory for saving cpu registers. */ > > > > - crash_notes = alloc_percpu(note_buf_t); > > > > + crash_notes = __alloc_percpu(sizeof(note_buf_t), > > > > + roundup_pow_of_two(sizeof(note_buf_t))); > > > > > > I think some of the changelog should show up here as comment in short > > > form. I don't think it is obvious that why we are using __alloc_percpu() > > > and why aligning to nearst higher power of 2 is needed here. Please also > > > mention here which arches run into issues. > > > > OK, I'll add it as a comment in the code. I'll see if I can make it > > short but still understandable. > > > > Thanks, > > Petr Tesarik > > > > > Thanks > > > Vivek > > > > > > > if (!crash_notes) { > > > > pr_warn("Kexec: Memory allocation for saving cpu register states failed\n"); > > > > return -ENOMEM; > > > > -- > > > > 1.8.4.5 > > > > > > > > _______________________________________________ > > > > kexec mailing list > > > > kexec@lists.infradead.org > > > > http://lists.infradead.org/mailman/listinfo/kexec > > > > > > _______________________________________________ > > > kexec mailing list > > > kexec@lists.infradead.org > > > http://lists.infradead.org/mailman/listinfo/kexec > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Put each per-cpu kdump ELF notes into a single page 2014-09-11 22:15 ` Petr Tesarik @ 2014-09-12 11:52 ` Vivek Goyal 0 siblings, 0 replies; 7+ messages in thread From: Vivek Goyal @ 2014-09-12 11:52 UTC (permalink / raw) To: Petr Tesarik; +Cc: kexec, Eric Biederman, linux-kernel On Fri, Sep 12, 2014 at 12:15:37AM +0200, Petr Tesarik wrote: > On Thu, 11 Sep 2014 17:16:37 -0400 > Vivek Goyal <vgoyal@redhat.com> wrote: > > > On Thu, Sep 11, 2014 at 10:43:30PM +0200, Petr Tesarik wrote: > > > On Thu, 11 Sep 2014 16:01:10 -0400 > > > Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > > > On Fri, Sep 05, 2014 at 06:33:14PM +0200, Petr Tesarik wrote: > > > > > On architectures that use percpu-vm, the percpu region is not guaranteed > > > > > to be contiguous in physical space. > > > > > > > > Petr, > > > > > > > > Which are those arches? > > > > > > All except nommu. Actually, percpu-km will be used instead even on MMU > > > if SMP is disabled, but since SMP is pretty standard now, I guess the > > > vast majority of all kernels out there is affected. ;-) > > > > Hi Petr, > > > > To make sure I understand it correctly I will just summarize what you > > said. > > > > alloc_percpu() code does not guarantee that an object will be on physically > > contiguous pages if object crosses page boundary. That's why we are forcing > > allocation of object aligned to nearest higher power of two boundary of > > object size and that way object will always be on same page (as long as object > > is not bigger than a page). > > > > Is that a fair summary? > > Yes. I might add a note why physically contiguous memory is needed > here, but maybe it's obvious to anyone dealing with kdump. I think adding couple of lines to explain why physically contiguous notes are needed is a good idea. It will not be ovious to anybody new to kdump. Thanks Vivek ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-09-12 11:52 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-09-05 16:33 [PATCH] Put each per-cpu kdump ELF notes into a single page Petr Tesarik 2014-09-11 19:37 ` Petr Tesarik 2014-09-11 20:01 ` Vivek Goyal 2014-09-11 20:43 ` Petr Tesarik 2014-09-11 21:16 ` Vivek Goyal 2014-09-11 22:15 ` Petr Tesarik 2014-09-12 11:52 ` Vivek Goyal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox