All of lore.kernel.org
 help / color / mirror / Atom feed
From: Uladzislau Rezki <urezki@gmail.com>
To: rulinhuang <rulin.huang@intel.com>
Cc: akpm@linux-foundation.org, colin.king@intel.com,
	hch@infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, lstoakes@gmail.com, tianyou.li@intel.com,
	tim.c.chen@intel.com, urezki@gmail.com, wangyang.guo@intel.com,
	zhiguo.zhou@intel.com
Subject: Re: [PATCH v3] mm/vmalloc: lock contention optimization under multi-threading
Date: Wed, 21 Feb 2024 09:36:44 +0100	[thread overview]
Message-ID: <ZdW2HB-XdAJKph5s@pc636> (raw)
In-Reply-To: <20240221032905.11392-1-rulin.huang@intel.com>

On Tue, Feb 20, 2024 at 10:29:05PM -0500, rulinhuang wrote:
> When allocating a new memory area where the mapping address range is
> known, it is observed that the vmap_area lock is acquired twice.
> The first acquisition occurs in the alloc_vmap_area() function when
> inserting the vm area into the vm mapping red-black tree. The second
> acquisition occurs in the setup_vmalloc_vm() function when updating the
> properties of the vm, such as flags and address, etc.
> Combine these two operations together in alloc_vmap_area(), which
> improves scalability when the vmap_area lock is contended. By doing so,
> the need to acquire the lock twice can also be eliminated.
> With the above change, tested on intel icelake platform(160 vcpu, kernel
> v6.7), a 6% performance improvement and a 7% reduction in overall
> spinlock hotspot are gained on
> stress-ng/pthread(https://github.com/ColinIanKing/stress-ng), which is
> the stress test of thread creations.
> 
> Reviewed-by: Chen Tim C <tim.c.chen@intel.com>
> Reviewed-by: King Colin <colin.king@intel.com>
> Signed-off-by: rulinhuang <rulin.huang@intel.com>
> ---
> V1 -> V2: Avoided the partial initialization issue of vm and 
> separated insert_vmap_area() from alloc_vmap_area()
> V2 -> V3: Rebased on 6.8-rc5
> ---
>  mm/vmalloc.c | 36 +++++++++++++++++++++---------------
>  1 file changed, 21 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d12a17fc0c17..768e45f2ed94 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1630,17 +1630,18 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	va->vm = NULL;
>  	va->flags = va_flags;
>  
> -	spin_lock(&vmap_area_lock);
> -	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
> -	spin_unlock(&vmap_area_lock);
> -
>  	BUG_ON(!IS_ALIGNED(va->va_start, align));
>  	BUG_ON(va->va_start < vstart);
>  	BUG_ON(va->va_end > vend);
>  
>  	ret = kasan_populate_vmalloc(addr, size);
>  	if (ret) {
> -		free_vmap_area(va);
> +		/*
> +		 * Insert/Merge it back to the free tree/list.
> +		 */
> +		spin_lock(&free_vmap_area_lock);
> +		merge_or_add_vmap_area_augment(va, &free_vmap_area_root, &free_vmap_area_list);
> +		spin_unlock(&free_vmap_area_lock);
>  		return ERR_PTR(ret);
>  	}
>  
> @@ -1669,6 +1670,13 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	return ERR_PTR(-EBUSY);
>  }
>  
> +static inline void insert_vmap_area_with_lock(struct vmap_area *va)
> +{
> +	spin_lock(&vmap_area_lock);
> +	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
> +	spin_unlock(&vmap_area_lock);
> +}
> +
>  int register_vmap_purge_notifier(struct notifier_block *nb)
>  {
>  	return blocking_notifier_chain_register(&vmap_notify_list, nb);
> @@ -2045,6 +2053,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
>  		return ERR_CAST(va);
>  	}
>  
> +	insert_vmap_area_with_lock(va);
> +
>  	vaddr = vmap_block_vaddr(va->va_start, 0);
>  	spin_lock_init(&vb->lock);
>  	vb->va = va;
> @@ -2398,6 +2408,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
>  		if (IS_ERR(va))
>  			return NULL;
>  
> +		insert_vmap_area_with_lock(va);
> +
>  		addr = va->va_start;
>  		mem = (void *)addr;
>  	}
> @@ -2538,7 +2550,7 @@ static void vmap_init_free_space(void)
>  	}
>  }
>  
> -static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
> +static inline void setup_vmalloc_vm(struct vm_struct *vm,
>  	struct vmap_area *va, unsigned long flags, const void *caller)
>  {
>  	vm->flags = flags;
> @@ -2548,14 +2560,6 @@ static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
>  	va->vm = vm;
>  }
>  
> -static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
> -			      unsigned long flags, const void *caller)
> -{
> -	spin_lock(&vmap_area_lock);
> -	setup_vmalloc_vm_locked(vm, va, flags, caller);
> -	spin_unlock(&vmap_area_lock);
> -}
> -
>  static void clear_vm_uninitialized_flag(struct vm_struct *vm)
>  {
>  	/*
> @@ -2600,6 +2604,8 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
>  
>  	setup_vmalloc_vm(area, va, flags, caller);
>  
> +	insert_vmap_area_with_lock(va);
> +
>
>  	/*
>  	 * Mark pages for non-VM_ALLOC mappings as accessible. Do it now as a
>  	 * best-effort approach, as they can be mapped outside of vmalloc code.
> @@ -4166,7 +4172,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  	for (area = 0; area < nr_vms; area++) {
>  		insert_vmap_area(vas[area], &vmap_area_root, &vmap_area_list);
>  
> -		setup_vmalloc_vm_locked(vms[area], vas[area], VM_ALLOC,
> +		setup_vmalloc_vm(vms[area], vas[area], VM_ALLOC,
>  				 pcpu_get_vm_areas);
>  	}
>  	spin_unlock(&vmap_area_lock);
> 
> base-commit: b401b621758e46812da61fa58a67c3fd8d91de0d
> -- 
> 2.43.0
> 
Spreading the insert_vmap_area_lock() across several callers, like:
__get_vm_area_node(), new_vmap_block(), vm_map_ram(), etc is not good
approach, simply because it changes the behaviour and people might
miss this point.

Could you please re-spin it on the mm-unstable, because the vmalloc
code was changes a lot? From my side i can check and help you how to
fix it in a better way. Because the v3 should be improved anyaway.

Apparently i have not seen you messages for some reason, i do not
understand why. I started to get emails with below topic:

"Bounce probe for linux-kernel@vger.kernel.org (no action required)"

--
Uladzislau Rezki


  reply	other threads:[~2024-02-21  8:36 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-07  3:30 [PATCH] mm/vmalloc: lock contention optimization under multi-threading rulinhuang
2024-02-07  9:24 ` Uladzislau Rezki
2024-02-09 11:51   ` rulinhuang
2024-02-20  9:05     ` [PATCH v2] " rulinhuang
2024-02-20 19:54       ` Andrew Morton
2024-02-21  3:34         ` rulinhuang
2024-02-20  9:12   ` [PATCH] " rulinhuang
2024-02-21  8:38     ` Uladzislau Rezki
2024-02-21  3:29 ` [PATCH v3] " rulinhuang
2024-02-21  8:36   ` Uladzislau Rezki [this message]
2024-02-22 12:09     ` rulinhuang
2024-02-22 12:10     ` rulinhuang
2024-02-22 12:52       ` Uladzislau Rezki
2024-02-22 15:36         ` Baoquan He
2024-02-23 13:09         ` rulinhuang
2024-02-22 12:05 ` [PATCH v4] " rulinhuang
2024-02-23 13:03 ` [PATCH v5] " rulinhuang
2024-02-23 14:03   ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZdW2HB-XdAJKph5s@pc636 \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=colin.king@intel.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=rulin.huang@intel.com \
    --cc=tianyou.li@intel.com \
    --cc=tim.c.chen@intel.com \
    --cc=wangyang.guo@intel.com \
    --cc=zhiguo.zhou@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.