All of lore.kernel.org
 help / color / mirror / Atom feed
From: Baoquan He <bhe@redhat.com>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Nick Bowler <nbowler@draconx.ca>,
	Hailong Liu <hailong.liu@oppo.com>,
	linux-kernel@vger.kernel.org,
	Linux regressions mailing list <regressions@lists.linux.dev>,
	linux-mm@kvack.org, sparclinux@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
Date: Tue, 25 Jun 2024 11:30:33 +0800	[thread overview]
Message-ID: <Zno52QBG0g5Z+otD@MiWiFi-R3L-srv> (raw)
In-Reply-To: <Znljtv5n-6EBgpsF@pc636>

On 06/24/24 at 02:16pm, Uladzislau Rezki wrote:
> On Fri, Jun 21, 2024 at 10:02:50PM +0800, Baoquan He wrote:
> > On 06/21/24 at 11:44am, Uladzislau Rezki wrote:
> > > On Fri, Jun 21, 2024 at 03:07:16PM +0800, Baoquan He wrote:
> > > > On 06/21/24 at 11:30am, Hailong Liu wrote:
> > > > > On Thu, 20. Jun 14:02, Nick Bowler wrote:
> > > > > > On 2024-06-20 02:19, Nick Bowler wrote:
> > ......
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index be2dd281ea76..18e87cafbaf2 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -2542,7 +2542,7 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > >  static struct xarray *
> > > >  addr_to_vb_xa(unsigned long addr)
> > > >  {
> > > > -	int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > +	int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > >  
> > > >  	return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > >  }
> > > > 
> > > The problem i see is about not-initializing of the:
> > > <snip>
> > > 	for_each_possible_cpu(i) {
> > > 		struct vmap_block_queue *vbq;
> > > 		struct vfree_deferred *p;
> > > 
> > > 		vbq = &per_cpu(vmap_block_queue, i);
> > > 		spin_lock_init(&vbq->lock);
> > > 		INIT_LIST_HEAD(&vbq->free);
> > > 		p = &per_cpu(vfree_deferred, i);
> > > 		init_llist_head(&p->list);
> > > 		INIT_WORK(&p->wq, delayed_vfree_work);
> > > 		xa_init(&vbq->vmap_blocks);
> > > 	}
> > > <snip>
> > > 
> > > correctly or fully. It is my bad i did not think that CPUs in a possible mask
> > > can be non sequential :-/
> > > 
> > > nr_cpu_ids - is not the max possible CPU. For example, in Nick case,
> > > when he has two CPUs, num_possible_cpus() and nr_cpu_ids are the same.
> > 
> > I checked the generic version of setup_nr_cpu_ids(), from codes, they
> > are different with my understanding.
> > 
> > kernel/smp.c
> > void __init setup_nr_cpu_ids(void)
> > {
> >         set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
> > }
> > 
> I see that it is not a weak function, so it is generic, thus the
> behavior can not be overwritten, which is great. This does what we
> need.
> 
> Thank you for checking this you are right!

Thanks for confirming this.

> 
> Then it is just a matter of proper initialization of the hash:
> 
> <snip>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 5d3aa2dc88a8..1733946f7a12 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -5087,7 +5087,13 @@ void __init vmalloc_init(void)
>          */
>         vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
>  
> -       for_each_possible_cpu(i) {
> +       /*
> +        * We use "nr_cpu_ids" here because some architectures
> +        * may have "gaps" in cpu-possible-mask. It is OK for
> +        * per-cpu approaches but is not OK for cases where it
> +        * can be used as hashes also.
> +        */
> +       for (i = 0; i < nr_cpu_ids; i++) {

I was wrong about earlier comments. Percpu variables are only available
on possible CPUs. For those nonexistent possible CPUs of static percpu
variable vmap_block_queue, there isn't memory allocated and mapped for
them. So accessing into them will cause problem.

In Nick's case, there are only CPU0, CPU2. If you access
&per_cpu(vmap_block_queue, 1), problem occurs. So I think we may need to
change to take other way for vbq. E.g:
1) Storing the vb in the nearest neighbouring vbq on possible CPU as
   below draft patch;
2) create an normal array to store vbq of size nr_cpu_ids, then we can
   store/fetch each vbq on non-possible CPU?

The way 1) is simpler, the existing code can be adapted a little just as
below.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 633363997dec..59a8951cc6c0 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
 static struct xarray *
 addr_to_vb_xa(unsigned long addr)
 {
-	int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+	int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
+
+	if (!cpu_possible(idex))
+		index = cpumask_next(index, cpu_possible_mask);
 
 	return &per_cpu(vmap_block_queue, index).vmap_blocks;
 }
@@ -2556,9 +2559,15 @@ addr_to_vb_xa(unsigned long addr)
 
 static unsigned long addr_to_vb_idx(unsigned long addr)
 {
+	int id = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
+	int id_dest = id; 
+
+	if (!cpu_possible(id))
+		id_dest = cpumask_next(id, cpu_possible_mask);
+
 	addr -= VMALLOC_START & ~(VMAP_BLOCK_SIZE-1);
 	addr /= VMAP_BLOCK_SIZE;
-	return addr;
+	return addr + (id_dest - id);
 }


  reply	other threads:[~2024-06-25  3:30 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-20  6:19 PROBLEM: kernel crashes when running xfsdump since ~6.4 Nick Bowler
2024-06-20  6:37 ` Hailong Liu
2024-06-20 14:36   ` Nick Bowler
2024-06-20 18:02 ` Nick Bowler
2024-06-21  3:30   ` Hailong Liu
2024-06-21  7:07     ` Baoquan He
2024-06-21  9:44       ` Uladzislau Rezki
2024-06-21 10:45         ` Hailong Liu
2024-06-21 11:15         ` Hailong Liu
2024-06-24 12:18           ` Uladzislau Rezki
2024-06-25  9:26             ` Hailong Liu
2024-06-25  9:55               ` Uladzislau Rezki
2024-06-21 13:42         ` Michael Kelley
2024-06-24 12:17           ` Uladzislau Rezki
2024-06-21 14:02         ` Baoquan He
2024-06-24 12:16           ` Uladzislau Rezki
2024-06-25  3:30             ` Baoquan He [this message]
2024-06-25 10:32               ` Uladzislau Rezki
2024-06-25 11:40                 ` Baoquan He
2024-06-25 12:40                   ` Uladzislau Rezki
2024-06-25 13:02                     ` Baoquan He
2024-06-25 15:33                       ` Uladzislau Rezki
2024-06-25 15:49                         ` Baoquan He
2024-06-25 16:49                           ` Uladzislau Rezki
2024-06-25 20:05                             ` Uladzislau Rezki
2024-06-26  0:38                               ` Baoquan He
2024-06-26  5:12                               ` Hailong Liu
2024-06-26  9:15                                 ` Uladzislau Rezki
2024-06-26 10:03                                   ` Hailong Liu
2024-06-26 10:51                                     ` Baoquan He
2024-06-26 10:53                                       ` Uladzislau Rezki
2024-06-26 11:30                                       ` Hailong Liu
2024-06-26 11:45                                         ` Uladzislau Rezki
2024-06-26 10:51                                     ` Uladzislau Rezki
2024-06-26 13:34                                       ` Nick Bowler
2024-06-26 13:38                                         ` Uladzislau Rezki
2024-06-25 11:19               ` Hailong Liu
2024-06-25 12:41               ` Uladzislau Rezki
2024-06-24 12:20   ` Uladzislau Rezki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zno52QBG0g5Z+otD@MiWiFi-R3L-srv \
    --to=bhe@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hailong.liu@oppo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nbowler@draconx.ca \
    --cc=regressions@lists.linux.dev \
    --cc=sparclinux@vger.kernel.org \
    --cc=urezki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.