public inbox for kexec@lists.infradead.org
From: Dave Young <dyoung@redhat.com>
To: Tim Hartrick <tim@edgecast.com>
Cc: Tejun Heo <tj@kernel.org>, WANG Cong <xiyou.wangcong@gmail.com>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] percpu: fix chunk range calculation
Date: Tue, 22 Nov 2011 10:52:24 +0800
Message-ID: <4ECB0E68.9060600@redhat.com>
In-Reply-To: <CAMMEr5my3pt0HXWd1EdwAeZKAJ8JP04tM_WqvHdCNvW=Q3ifvg@mail.gmail.com>

On 11/22/2011 12:20 AM, Tim Hartrick wrote:

> Dave,
> 
> I will give it a try ASAP.  Unfortunately, ASAP won't be until Friday.


Never mind -- vmcore_init works in my test, so I think this patch fixes
the bug, though there are still other kdump issues with the mainline kernel.

> 
> tim
> 
> On Nov 20, 2011 6:43 PM, "Dave Young" <dyoung@redhat.com> wrote:
> 
>     On 11/19/2011 02:55 AM, Tejun Heo wrote:
> 
>     > Percpu allocator recorded the cpus which map to the first and last
>     > units in pcpu_first/last_unit_cpu respectively and used them to
>     > determine the address range of a chunk - e.g. it assumed that the
>     > first unit has the lowest address in a chunk while the last unit has
>     > the highest address.
>     >
>     > This simply isn't true.  Groups in a chunk can have arbitrary positive
>     > or negative offsets from the previous one and there is no guarantee
>     > that the first unit occupies the lowest offset while the last one the
>     > highest.
>     >
>     > Fix it by actually comparing unit offsets to determine cpus occupying
>     > the lowest and highest offsets.  Also, rename pcu_first/last_unit_cpu
>     > to pcpu_low/high_unit_cpu to avoid confusion.
>     >
>     > The chunk address range is used to flush cache on vmalloc area
>     > map/unmap and decide whether a given address is in the first chunk by
>     > per_cpu_ptr_to_phys() and the bug was discovered by invalid
>     > per_cpu_ptr_to_phys() translation for crash_note.
>     >
>     > Kudos to Dave Young for tracking down the problem.
> 
> 
>     Tejun, thanks
> 
>     Given that an addr which is not in the first chunk must be in the
>     vmalloc area, the logic below should be right:
>            if (in_first_chunk) {
>                    if (!is_vmalloc_addr(addr))
>                            return __pa(addr);
>                    else
>                            return page_to_phys(vmalloc_to_page(addr));
>            } else
>                    if (!is_vmalloc_addr(addr)) /* not possible */
>                            return __pa(addr);
>                    else
>                            return page_to_phys(pcpu_addr_to_page(addr));
> 
>     So how about simply removing the first-chunk check to simplify the
>     code as follows:
> 
>     phys_addr_t per_cpu_ptr_to_phys(void *addr)
>     {
>            if (!is_vmalloc_addr(addr))
>                    return __pa(addr);
>            else
>                    return page_to_phys(pcpu_addr_to_page(addr));
>     }
> 
>     >
>     > Signed-off-by: Tejun Heo <tj@kernel.org>
>     > Reported-by: WANG Cong <xiyou.wangcong@gmail.com>
>     > Reported-by: Dave Young <dyoung@redhat.com>
>     > LKML-Reference: <4EC21F67.10905@redhat.com>
>     > Cc: stable@kernel.org
>     > ---
>     > Heh, that's a stupid bug.  Can you please verify this fixes the
>     > problem?
> 
> 
>     I can confirm that the per-cpu translation works with this patch, but I
>     have not managed to finish the kdump test because kexec-tools refuses
>     to load the crash kernel. I need to do more debugging on kexec-tools.
> 
>     Tim, could you help test whether this patch works for your kdump case?
> 
>     >
>     > Thank you very much.
>     >
>     >  mm/percpu-vm.c |   12 ++++++------
>     >  mm/percpu.c    |   34 ++++++++++++++++++++--------------
>     >  2 files changed, 26 insertions(+), 20 deletions(-)
>     >
>     > diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
>     > index ea53496..bfad724 100644
>     > --- a/mm/percpu-vm.c
>     > +++ b/mm/percpu-vm.c
>     > @@ -143,8 +143,8 @@ static void pcpu_pre_unmap_flush(struct pcpu_chunk *chunk,
>     >                                int page_start, int page_end)
>     >  {
>     >       flush_cache_vunmap(
>     > -             pcpu_chunk_addr(chunk, pcpu_first_unit_cpu, page_start),
>     > -             pcpu_chunk_addr(chunk, pcpu_last_unit_cpu, page_end));
>     > +             pcpu_chunk_addr(chunk, pcpu_low_unit_cpu, page_start),
>     > +             pcpu_chunk_addr(chunk, pcpu_high_unit_cpu, page_end));
>     >  }
>     >
>     >  static void __pcpu_unmap_pages(unsigned long addr, int nr_pages)
>     > @@ -206,8 +206,8 @@ static void pcpu_post_unmap_tlb_flush(struct pcpu_chunk *chunk,
>     >                                     int page_start, int page_end)
>     >  {
>     >       flush_tlb_kernel_range(
>     > -             pcpu_chunk_addr(chunk, pcpu_first_unit_cpu, page_start),
>     > -             pcpu_chunk_addr(chunk, pcpu_last_unit_cpu, page_end));
>     > +             pcpu_chunk_addr(chunk, pcpu_low_unit_cpu, page_start),
>     > +             pcpu_chunk_addr(chunk, pcpu_high_unit_cpu, page_end));
>     >  }
>     >
>     >  static int __pcpu_map_pages(unsigned long addr, struct page **pages,
>     > @@ -284,8 +284,8 @@ static void pcpu_post_map_flush(struct pcpu_chunk *chunk,
>     >                               int page_start, int page_end)
>     >  {
>     >       flush_cache_vmap(
>     > -             pcpu_chunk_addr(chunk, pcpu_first_unit_cpu, page_start),
>     > -             pcpu_chunk_addr(chunk, pcpu_last_unit_cpu, page_end));
>     > +             pcpu_chunk_addr(chunk, pcpu_low_unit_cpu, page_start),
>     > +             pcpu_chunk_addr(chunk, pcpu_high_unit_cpu, page_end));
>     >  }
>     >
>     >  /**
>     > diff --git a/mm/percpu.c b/mm/percpu.c
>     > index bf80e55..93b5a7c 100644
>     > --- a/mm/percpu.c
>     > +++ b/mm/percpu.c
>     > @@ -116,9 +116,9 @@ static int pcpu_atom_size __read_mostly;
>     >  static int pcpu_nr_slots __read_mostly;
>     >  static size_t pcpu_chunk_struct_size __read_mostly;
>     >
>     > -/* cpus with the lowest and highest unit numbers */
>     > -static unsigned int pcpu_first_unit_cpu __read_mostly;
>     > -static unsigned int pcpu_last_unit_cpu __read_mostly;
>     > +/* cpus with the lowest and highest unit addresses */
>     > +static unsigned int pcpu_low_unit_cpu __read_mostly;
>     > +static unsigned int pcpu_high_unit_cpu __read_mostly;
>     >
>     >  /* the address of the first chunk which starts with the kernel static area */
>     >  void *pcpu_base_addr __read_mostly;
>     > @@ -984,19 +984,19 @@ phys_addr_t per_cpu_ptr_to_phys(void *addr)
>     >  {
>     >       void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
>     >       bool in_first_chunk = false;
>     > -     unsigned long first_start, first_end;
>     > +     unsigned long first_low, first_high;
>     >       unsigned int cpu;
>     >
>     >       /*
>     > -      * The following test on first_start/end isn't strictly
>     > +      * The following test on unit_low/high isn't strictly
>     >        * necessary but will speed up lookups of addresses which
>     >        * aren't in the first chunk.
>     >        */
>     > -     first_start = pcpu_chunk_addr(pcpu_first_chunk, pcpu_first_unit_cpu, 0);
>     > -     first_end = pcpu_chunk_addr(pcpu_first_chunk, pcpu_last_unit_cpu,
>     > -                                 pcpu_unit_pages);
>     > -     if ((unsigned long)addr >= first_start &&
>     > -         (unsigned long)addr < first_end) {
>     > +     first_low = pcpu_chunk_addr(pcpu_first_chunk, pcpu_low_unit_cpu, 0);
>     > +     first_high = pcpu_chunk_addr(pcpu_first_chunk, pcpu_high_unit_cpu,
>     > +                                  pcpu_unit_pages);
>     > +     if ((unsigned long)addr >= first_low &&
>     > +         (unsigned long)addr < first_high) {
>     >               for_each_possible_cpu(cpu) {
>     >                       void *start = per_cpu_ptr(base, cpu);
>     >
>     > @@ -1233,7 +1233,9 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
>     >
>     >       for (cpu = 0; cpu < nr_cpu_ids; cpu++)
>     >               unit_map[cpu] = UINT_MAX;
>     > -     pcpu_first_unit_cpu = NR_CPUS;
>     > +
>     > +     pcpu_low_unit_cpu = NR_CPUS;
>     > +     pcpu_high_unit_cpu = NR_CPUS;
>     >
>     >       for (group = 0, unit = 0; group < ai->nr_groups; group++, unit += i) {
>     >               const struct pcpu_group_info *gi = &ai->groups[group];
>     > @@ -1253,9 +1255,13 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
>     >                       unit_map[cpu] = unit + i;
>     >                       unit_off[cpu] = gi->base_offset + i * ai->unit_size;
>     >
>     > -                     if (pcpu_first_unit_cpu == NR_CPUS)
>     > -                             pcpu_first_unit_cpu = cpu;
>     > -                     pcpu_last_unit_cpu = cpu;
>     > +                     /* determine low/high unit_cpu */
>     > +                     if (pcpu_low_unit_cpu == NR_CPUS ||
>     > +                         unit_off[cpu] < unit_off[pcpu_low_unit_cpu])
>     > +                             pcpu_low_unit_cpu = cpu;
>     > +                     if (pcpu_high_unit_cpu == NR_CPUS ||
>     > +                         unit_off[cpu] > unit_off[pcpu_high_unit_cpu])
>     > +                             pcpu_high_unit_cpu = cpu;
>     >               }
>     >       }
>     >       pcpu_nr_units = unit;
>     >
>     > _______________________________________________
>     > kexec mailing list
>     > kexec@lists.infradead.org
>     > http://lists.infradead.org/mailman/listinfo/kexec
> 
> 
> 
>     --
>     Thanks
>     Dave
> 
> 
> 



-- 
Thanks
Dave



Thread overview: 39+ messages
2011-10-11 23:39 Crash during vmcore_init Tim Hartrick
2011-11-14 13:39 ` WANG Cong
2011-11-14 18:50   ` Tim Hartrick
2011-11-15  8:14     ` Dave Young
2011-11-15 13:47       ` Américo Wang
2011-11-15 13:50         ` Américo Wang
2011-11-15 22:32       ` Tim Hartrick
2011-11-16  2:22         ` Dave Young
2011-11-16 18:20           ` Tim Hartrick
2011-11-17  3:30             ` Dave Young
2011-11-17  4:34               ` Tejun Heo
2011-11-17  4:46                 ` Dave Young
2011-11-17  5:22                   ` Tim Hartrick
2011-11-17  7:21                     ` Dave Young
2011-11-17  7:23                       ` Tejun Heo
2011-11-17  7:42                         ` Américo Wang
2011-11-17 16:40                       ` Tim Hartrick
2011-11-18  8:43                         ` Dave Young
2011-11-18  8:45                           ` Dave Young
2011-11-18 18:55                             ` [PATCH] percpu: fix chunk range calculation Tejun Heo
2011-11-21  1:45                               ` Dave Young
2011-11-21 16:20                                 ` Tim Hartrick
2011-11-22  2:52                                   ` Dave Young [this message]
2011-11-21 17:01                                 ` Tejun Heo
2011-11-22  3:00                                   ` Dave Young
2011-11-22 16:02                                     ` Tejun Heo
2011-11-21 21:10                               ` Tejun Heo
2011-11-22  2:48                                 ` Dave Young
2011-11-22 16:19                                   ` Tejun Heo
2011-11-15 14:13     ` Crash during vmcore_init Américo Wang
2011-11-15 22:57       ` Tim Hartrick
2011-11-16 12:47         ` Américo Wang
2011-11-16 13:19           ` Tim Hartrick
2011-11-16 13:31             ` Américo Wang
2011-11-16 13:44               ` Tim Hartrick
     [not found]               ` <1321462343.4198.29.camel@boudreau>
2011-11-17  6:48                 ` Américo Wang
2011-11-17 16:08                   ` Tim Hartrick
2011-11-17 16:31                     ` Tim Hartrick
2011-11-16 15:52           ` Tim Hartrick
