All of lore.kernel.org
 help / color / mirror / Atom feed
From: Uladzislau Rezki <urezki@gmail.com>
To: Hailong Liu <hailong.liu@oppo.com>
Cc: Uladzislau Rezki <urezki@gmail.com>, Baoquan He <bhe@redhat.com>,
	Nick Bowler <nbowler@draconx.ca>,
	linux-kernel@vger.kernel.org,
	Linux regressions mailing list <regressions@lists.linux.dev>,
	linux-mm@kvack.org, sparclinux@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
Date: Wed, 26 Jun 2024 12:51:42 +0200	[thread overview]
Message-ID: <Znvyvof4CnFETJ-v@pc636> (raw)
In-Reply-To: <20240626100342.2dudj6fjjx6srban@oppo.com>

On Wed, Jun 26, 2024 at 06:03:42PM +0800, Hailong Liu wrote:
> On Wed, 26. Jun 11:15, Uladzislau Rezki wrote:
> > On Wed, Jun 26, 2024 at 01:12:06PM +0800, Hailong Liu wrote:
> > > On Tue, 25. Jun 22:05, Uladzislau Rezki wrote:
> > > > > > > > > /**
> > > > > > > > >  * cpumask_next - get the next cpu in a cpumask
> > > > > > > > >  * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > > > >  * @srcp: the cpumask pointer
> > > > > > > > >  *
> > > > > > > > >  * Return: >= nr_cpu_ids if no further cpus set.
> > > > > > > >
> > > > > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > > > > be possible CPU.
> > > > > > > >
> > > > > > > > Do I miss some corner cases?
> > > > > > > >
> > > > > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > > > > >
> > > > > > > Hailong Liu has proposed more simpler version:
> > > > > > >
> > > > > > > <snip>
> > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > > > > --- a/mm/vmalloc.c
> > > > > > > +++ b/mm/vmalloc.c
> > > > > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > > > >  addr_to_vb_xa(unsigned long addr)
> > > > > > >  {
> > > > > > >         int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > +       int cpu = cpumask_nth(index, cpu_possible_mask);
> > > > > > >
> > > > > > > -       return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > +       return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > > > > <snip>
> > > > > > >
> > > > > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > > > > >
> > > > > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > > > > by the nr_cpu_ids.
> > > > > > >
> > > > > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > > > > easier to back-port on stable kernels.
> > > > > >
> > > > > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > > > > Please feel free to post this and the hash array patch separately for
> > > > > > formal reviewing.
> > > > > >
> > > > > Agreed! The patch about hash array i will post later.
> > > > >
> > > > > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > > > > again. I doubt it may take more checking then cpu_possible(), given most
> > > > > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > > > > this moment.
> > > > > >
> > > > > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > > > > {
> > > > > >         return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > > > > }
> > > > > >
> > > > > Yep, i do not think it is a big problem based on your noted fact.
> > > > >
> > > > Checked. There is a difference:
> > > >
> > > > 1. Default
> > > >
> > > > <snip>
> > > > ...
> > > > +   15.95%     6.05%  [kernel]        [k] __vmap_pages_range_noflush
> > > > +   15.91%     1.74%  [kernel]        [k] addr_to_vb_xa <---------------
> > > > +   15.13%    12.05%  [kernel]        [k] vunmap_p4d_range
> > > > +   14.17%    13.38%  [kernel]        [k] __find_nth_bit <--------------
> > > > +   10.62%     0.00%  [kernel]        [k] ret_from_fork_asm
> > > > +   10.62%     0.00%  [kernel]        [k] ret_from_fork
> > > > +   10.62%     0.00%  [kernel]        [k] kthread
> > > > ...
> > > > <snip>
> > > >
> > > > 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
> > > >
> > > > <snip>
> > > > ...
> > > > +    6.84%     0.29%  [kernel]          [k] alloc_vmap_area
> > > > +    6.80%     6.70%  [kernel]          [k] native_queued_spin_lock_slowpath
> > > > +    4.24%     0.09%  [kernel]          [k] free_vmap_block
> > > > +    2.41%     2.38%  [kernel]          [k] addr_to_vb_xa <-----------
> > > > +    1.94%     1.91%  [kernel]          [k] xas_start
> > > > ...
> > > > <snip>
> > > >
> > > > It is _worth_ to check if an index is in possible mask:
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 45e1506d58c3..af20f78c2cbf 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > >  static struct xarray *
> > > >  addr_to_vb_xa(unsigned long addr)
> > > >  {
> > > > -       int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > +       int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > IIUC, use nr_cpu_ids here maybe incorrect.
> > >
> > > take b101 as example, nr_cpu_ids is 3. if index is 2 cpumask_nth(2, cpu_possible_mask);
> > > might return 64.
> > >
> > But then a CPU2 becomes possible? Cutting by % nr_cpu_ids generates values < nr_cpu_ids.
> > So, last CPU is always possible and we never do cpumask_nth() on a last possible CPU.
> >
> > What i miss here?
> >
> Sorry, I forget to reply to all :), I write a demo to test as follows:
> 
> static int cpumask_init(void)
> {
>        struct cpumask mask;
>        unsigned int cpu_id;
>        cpumask_clear(&mask);
> 
>        cpumask_set_cpu(1, &mask);
>        cpumask_set_cpu(3, &mask);
>        cpumask_set_cpu(5, &mask);
> 
>        cpu_id = find_last_bit(cpumask_bits(&mask), NR_CPUS) + 1;
>        pr_info("cpu_id:%d\n", cpu_id);
> 
>        for (; i < nr_cpu_ids; i++) {
>                pr_info("%d: cpu_%d\n", i, cpumask_nth(i, &mask));
>        }
> 
>        return 0;
> }
> 
> [    1.337020][    T1] cpu_id:6
> [    1.337338][    T1] 0: cpu_1
> [    1.337558][    T1] 1: cpu_3
> [    1.337751][    T1] 2: cpu_5
> [    1.337960][    T1] 3: cpu_64
> [    1.338183][    T1] 4: cpu_64
> [    1.338387][    T1] 5: cpu_64
> [    1.338594][    T1] 6: cpu_64
> 
> In summary, the nr_cpu_ids = last_bit + 1, and cpumask_nth() return the nth cpu_id.
> 
OK, i misread the cpumask_nth(). We should go with *_next() variant instead.

Thank you for pointing this. Below is updated version with extra comment:

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 45e1506d58c3..03b82fb8ecd3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2542,7 +2542,15 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
 static struct xarray *
 addr_to_vb_xa(unsigned long addr)
 {
-       int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+       int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
+
+       /*
+        * Please note, nr_cpu_ids points on a highest set
+        * possible bit, i.e. we never invoke cpumask_next()
+        * if an index points on it which is nr_cpu_ids - 1.
+        */
+       if (!cpu_possible(index))
+               index = cpumask_next(index, cpu_possible_mask);
 
        return &per_cpu(vmap_block_queue, index).vmap_blocks;
 }
<snip>

Thanks!

--
Uladzislau Rezki

  parent reply	other threads:[~2024-06-26 10:51 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-20  6:19 PROBLEM: kernel crashes when running xfsdump since ~6.4 Nick Bowler
2024-06-20  6:37 ` Hailong Liu
2024-06-20 14:36   ` Nick Bowler
2024-06-20 18:02 ` Nick Bowler
2024-06-21  3:30   ` Hailong Liu
2024-06-21  7:07     ` Baoquan He
2024-06-21  9:44       ` Uladzislau Rezki
2024-06-21 10:45         ` Hailong Liu
2024-06-21 11:15         ` Hailong Liu
2024-06-24 12:18           ` Uladzislau Rezki
2024-06-25  9:26             ` Hailong Liu
2024-06-25  9:55               ` Uladzislau Rezki
2024-06-21 13:42         ` Michael Kelley
2024-06-24 12:17           ` Uladzislau Rezki
2024-06-21 14:02         ` Baoquan He
2024-06-24 12:16           ` Uladzislau Rezki
2024-06-25  3:30             ` Baoquan He
2024-06-25 10:32               ` Uladzislau Rezki
2024-06-25 11:40                 ` Baoquan He
2024-06-25 12:40                   ` Uladzislau Rezki
2024-06-25 13:02                     ` Baoquan He
2024-06-25 15:33                       ` Uladzislau Rezki
2024-06-25 15:49                         ` Baoquan He
2024-06-25 16:49                           ` Uladzislau Rezki
2024-06-25 20:05                             ` Uladzislau Rezki
2024-06-26  0:38                               ` Baoquan He
2024-06-26  5:12                               ` Hailong Liu
2024-06-26  9:15                                 ` Uladzislau Rezki
2024-06-26 10:03                                   ` Hailong Liu
2024-06-26 10:51                                     ` Baoquan He
2024-06-26 10:53                                       ` Uladzislau Rezki
2024-06-26 11:30                                       ` Hailong Liu
2024-06-26 11:45                                         ` Uladzislau Rezki
2024-06-26 10:51                                     ` Uladzislau Rezki [this message]
2024-06-26 13:34                                       ` Nick Bowler
2024-06-26 13:38                                         ` Uladzislau Rezki
2024-06-25 11:19               ` Hailong Liu
2024-06-25 12:41               ` Uladzislau Rezki
2024-06-24 12:20   ` Uladzislau Rezki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Znvyvof4CnFETJ-v@pc636 \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=hailong.liu@oppo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nbowler@draconx.ca \
    --cc=regressions@lists.linux.dev \
    --cc=sparclinux@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.