From: Baoquan He <bhe@redhat.com>
To: Hailong Liu <hailong.liu@oppo.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
Nick Bowler <nbowler@draconx.ca>,
linux-kernel@vger.kernel.org,
Linux regressions mailing list <regressions@lists.linux.dev>,
linux-mm@kvack.org, sparclinux@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
Date: Wed, 26 Jun 2024 18:51:32 +0800 [thread overview]
Message-ID: <ZnvytLzoLrVwymXv@MiWiFi-R3L-srv> (raw)
In-Reply-To: <20240626100342.2dudj6fjjx6srban@oppo.com>
On 06/26/24 at 06:03pm, Hailong Liu wrote:
> On Wed, 26. Jun 11:15, Uladzislau Rezki wrote:
> > On Wed, Jun 26, 2024 at 01:12:06PM +0800, Hailong Liu wrote:
> > > On Tue, 25. Jun 22:05, Uladzislau Rezki wrote:
> > > > > > > > > /**
> > > > > > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > > > > * @srcp: the cpumask pointer
> > > > > > > > > *
> > > > > > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > > > > > >
> > > > > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > > > > be possible CPU.
> > > > > > > >
> > > > > > > > Do I miss some corner cases?
> > > > > > > >
> > > > > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > > > > >
> > > > > > > Hailong Liu has proposed more simpler version:
> > > > > > >
> > > > > > > <snip>
> > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > > > > --- a/mm/vmalloc.c
> > > > > > > +++ b/mm/vmalloc.c
> > > > > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > > {
> > > > > > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > > > > > >
> > > > > > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > > > > <snip>
> > > > > > >
> > > > > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > > > > >
> > > > > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > > > > by the nr_cpu_ids.
> > > > > > >
> > > > > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > > > > easier to back-port on stable kernels.
> > > > > >
> > > > > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > > > > Please feel free to post this and the hash array patch separately for
> > > > > > formal reviewing.
> > > > > >
> > > > > Agreed! The patch about hash array i will post later.
> > > > >
> > > > > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > > > > again. I doubt it may take more checking then cpu_possible(), given most
> > > > > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > > > > this moment.
> > > > > >
> > > > > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > > > > {
> > > > > > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > > > > }
> > > > > >
> > > > > Yep, i do not think it is a big problem based on your noted fact.
> > > > >
> > > > Checked. There is a difference:
> > > >
> > > > 1. Default
> > > >
> > > > <snip>
> > > > ...
> > > > + 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
> > > > + 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
> > > > + 15.13% 12.05% [kernel] [k] vunmap_p4d_range
> > > > + 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
> > > > + 10.62% 0.00% [kernel] [k] ret_from_fork_asm
> > > > + 10.62% 0.00% [kernel] [k] ret_from_fork
> > > > + 10.62% 0.00% [kernel] [k] kthread
> > > > ...
> > > > <snip>
> > > >
> > > > 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
> > > >
> > > > <snip>
> > > > ...
> > > > + 6.84% 0.29% [kernel] [k] alloc_vmap_area
> > > > + 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
> > > > + 4.24% 0.09% [kernel] [k] free_vmap_block
> > > > + 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
> > > > + 1.94% 1.91% [kernel] [k] xas_start
> > > > ...
> > > > <snip>
> > > >
> > > > It is _worth_ to check if an index is in possible mask:
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 45e1506d58c3..af20f78c2cbf 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > static struct xarray *
> > > > addr_to_vb_xa(unsigned long addr)
> > > > {
> > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > IIUC, use nr_cpu_ids here maybe incorrect.
> > >
> > > take b101 as example, nr_cpu_ids is 3. if index is 2 cpumask_nth(2, cpu_possible_mask);
> > > might return 64.
> > >
> > But then a CPU2 becomes possible? Cutting by % nr_cpu_ids generates values < nr_cpu_ids.
> > So, last CPU is always possible and we never do cpumask_nth() on a last possible CPU.
> >
> > What i miss here?
> >
> Sorry, I forget to reply to all :), I write a demo to test as follows:
>
> static int cpumask_init(void)
> {
> struct cpumask mask;
> unsigned int cpu_id;
> cpumask_clear(&mask);
>
> cpumask_set_cpu(1, &mask);
> cpumask_set_cpu(3, &mask);
> cpumask_set_cpu(5, &mask);
>
> cpu_id = find_last_bit(cpumask_bits(&mask), NR_CPUS) + 1;
> pr_info("cpu_id:%d\n", cpu_id);
>
> for (; i < nr_cpu_ids; i++) {
> pr_info("%d: cpu_%d\n", i, cpumask_nth(i, &mask));
> }
>
> return 0;
> }
>
> [ 1.337020][ T1] cpu_id:6
> [ 1.337338][ T1] 0: cpu_1
> [ 1.337558][ T1] 1: cpu_3
> [ 1.337751][ T1] 2: cpu_5
> [ 1.337960][ T1] 3: cpu_64
> [ 1.338183][ T1] 4: cpu_64
> [ 1.338387][ T1] 5: cpu_64
> [ 1.338594][ T1] 6: cpu_64
>
> In summary, the nr_cpu_ids = last_bit + 1, and cpumask_nth() return the nth cpu_id.
I think just using below change for a quick fix is enough. It doesn't
have the issue cpumask_nth() has and very simple. For most of systems,
it only adds an extra cpu_possible(idex) checking.
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 633363997dec..59a8951cc6c0 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
static struct xarray *
addr_to_vb_xa(unsigned long addr)
{
- int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+ int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
+
+ if (!cpu_possible(idex))
+ index = cpumask_next(index, cpu_possible_mask);
return &per_cpu(vmap_block_queue, index).vmap_blocks;
}
next prev parent reply other threads:[~2024-06-26 10:51 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-20 6:19 PROBLEM: kernel crashes when running xfsdump since ~6.4 Nick Bowler
2024-06-20 6:37 ` Hailong Liu
2024-06-20 14:36 ` Nick Bowler
2024-06-20 18:02 ` Nick Bowler
2024-06-21 3:30 ` Hailong Liu
2024-06-21 7:07 ` Baoquan He
2024-06-21 9:44 ` Uladzislau Rezki
2024-06-21 10:45 ` Hailong Liu
2024-06-21 11:15 ` Hailong Liu
2024-06-24 12:18 ` Uladzislau Rezki
2024-06-25 9:26 ` Hailong Liu
2024-06-25 9:55 ` Uladzislau Rezki
2024-06-21 13:42 ` Michael Kelley
2024-06-24 12:17 ` Uladzislau Rezki
2024-06-21 14:02 ` Baoquan He
2024-06-24 12:16 ` Uladzislau Rezki
2024-06-25 3:30 ` Baoquan He
2024-06-25 10:32 ` Uladzislau Rezki
2024-06-25 11:40 ` Baoquan He
2024-06-25 12:40 ` Uladzislau Rezki
2024-06-25 13:02 ` Baoquan He
2024-06-25 15:33 ` Uladzislau Rezki
2024-06-25 15:49 ` Baoquan He
2024-06-25 16:49 ` Uladzislau Rezki
2024-06-25 20:05 ` Uladzislau Rezki
2024-06-26 0:38 ` Baoquan He
2024-06-26 5:12 ` Hailong Liu
2024-06-26 9:15 ` Uladzislau Rezki
2024-06-26 10:03 ` Hailong Liu
2024-06-26 10:51 ` Baoquan He [this message]
2024-06-26 10:53 ` Uladzislau Rezki
2024-06-26 11:30 ` Hailong Liu
2024-06-26 11:45 ` Uladzislau Rezki
2024-06-26 10:51 ` Uladzislau Rezki
2024-06-26 13:34 ` Nick Bowler
2024-06-26 13:38 ` Uladzislau Rezki
2024-06-25 11:19 ` Hailong Liu
2024-06-25 12:41 ` Uladzislau Rezki
2024-06-24 12:20 ` Uladzislau Rezki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZnvytLzoLrVwymXv@MiWiFi-R3L-srv \
--to=bhe@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hailong.liu@oppo.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nbowler@draconx.ca \
--cc=regressions@lists.linux.dev \
--cc=sparclinux@vger.kernel.org \
--cc=urezki@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.