Re: [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area_root rb-tree

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Uladzislau Rezki <urezki@gmail.com>
To: Wen Gu <guwen@linux.alibaba.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>,
	shaozhengchao <shaozhengchao@huawei.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area_root rb-tree
Date: Fri, 5 Jan 2024 11:50:08 +0100	[thread overview]
Message-ID: <ZZfe4O850fFaxOgQ@pc638.lan> (raw)
In-Reply-To: <238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@linux.alibaba.com>

Hello, Wen Gu.

> 
> Hi Uladzislau Rezki,
> 
> I really like your work, it is great and helpful!
> 
> Currently, I am working on using shared memory communication (SMC [1])
> to transparently accelerate TCP communication between two peers within
> the same OS instance[2].
> 
> In this scenario, a vzalloced kernel buffer acts as a shared memory and
> will be simultaneous read or written by two SMC sockets, thus forming an
> SMC connection.
> 
> 
>           socket1                    socket2
>              |                          ^
>              |                          |    userspace
>       ---- write -------------------- read ------
>              |    +-----------------+   |    kernel
>              +--->|  shared memory  |---+
>                   | (vzalloced now) |
>                   +-----------------+
> 
> Then I encountered the performance regression caused by lock contention
> in find_vmap_area() when multiple threads transfer data through multiple
> SMC connections on machines with many CPUs[3].
> 
> According to perf, the performance bottleneck is caused by the global
> vmap_area_lock contention[4]:
> 
> - writer:
> 
>   smc_tx_sendmsg
>   -> memcpy_from_msg
>      -> copy_from_iter
>         -> check_copy_size
>            -> check_object_size
>               -> if (CONFIG_HARDENED_USERCOPY is set) check_heap_object
>                  -> if(vm) find_vmap_area
>                     -> try to hold vmap_area_lock lock
> - reader:
> 
>   smc_rx_recvmsg
>    -> memcpy_to_msg
>       -> copy_to_iter
>          -> check_copy_size
>             -> check_object_size
>                -> if (CONFIG_HARDENED_USERCOPY is set) check_heap_object
>                   -> if(vm) find_vmap_area
>                      -> try to hold vmap_area_lock lock
> 
> Fortunately, thank you for this patch set, the global vmap_area_lock was
> removed and per node lock vn->busy.lock is introduced. it is really helpful:
> 
> In 48 CPUs qemu environment, the Requests/s increased by 5 times:
> - nginx
> - wrk -c 1000 -t 96 -d 30 http://127.0.0.1:80
> 
>                 vzalloced shmem      vzalloced shmem(with this patch set)
> Requests/sec          113536.56            583729.93
> 
> 
Thank you for the confirmation that your workload is improved. The "nginx"
is 5 times better!

> But it also has some overhead, compared to using kzalloced shared memory
> or unsetting CONFIG_HARDENED_USERCOPY, which won't involve finding vmap area:
> 
>                 kzalloced shmem      vzalloced shmem(unset CONFIG_HARDENED_USERCOPY)
> Requests/sec          831950.39            805164.78
> 
> 
The CONFIG_HARDENED_USERCOPY prevents coping "wrong" memory regions. That is 
why if it is a vmalloced memory it wants to make sure it is really true,
if not user-copy is aborted.

So there is an extra work that involves finding a VA associated with an address.

> So, as a newbie in Linux-mm, I would like to ask for some suggestions:
> 
> Is it possible to further eliminate the overhead caused by lock contention
> in find_vmap_area() in this scenario (maybe this is asking too much), or the
> only way out is not setting CONFIG_HARDENED_USERCOPY or not using vzalloced
> buffer in the situation where cocurrent kernel-userspace-copy happens?
> 
Could you please try below patch, if it improves this series further?
Just in case:

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e30dabf68263..40acf53cadfb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -772,7 +772,7 @@ static DEFINE_PER_CPU(struct vmap_area *, ne_fit_preload_node);
 struct rb_list {
 	struct rb_root root;
 	struct list_head head;
-	spinlock_t lock;
+	rwlock_t lock;
 };
 
 struct vmap_pool {
@@ -947,19 +947,19 @@ find_vmap_area_exceed_addr_lock(unsigned long addr, struct vmap_area **va)
 	for (i = 0; i < nr_vmap_nodes; i++) {
 		vn = &vmap_nodes[i];
 
-		spin_lock(&vn->busy.lock);
+		read_lock(&vn->busy.lock);
 		va_lowest = __find_vmap_area_exceed_addr(addr, &vn->busy.root);
 		if (va_lowest) {
 			if (!va_node || va_lowest->va_start < (*va)->va_start) {
 				if (va_node)
-					spin_unlock(&va_node->busy.lock);
+					read_unlock(&va_node->busy.lock);
 
 				*va = va_lowest;
 				va_node = vn;
 				continue;
 			}
 		}
-		spin_unlock(&vn->busy.lock);
+		read_unlock(&vn->busy.lock);
 	}
 
 	return va_node;
@@ -1695,9 +1695,9 @@ static void free_vmap_area(struct vmap_area *va)
 	/*
 	 * Remove from the busy tree/list.
 	 */
-	spin_lock(&vn->busy.lock);
+	write_lock(&vn->busy.lock);
 	unlink_va(va, &vn->busy.root);
-	spin_unlock(&vn->busy.lock);
+	write_unlock(&vn->busy.lock);
 
 	/*
 	 * Insert/Merge it back to the free tree/list.
@@ -1901,9 +1901,9 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 
 	vn = addr_to_node(va->va_start);
 
-	spin_lock(&vn->busy.lock);
+	write_lock(&vn->busy.lock);
 	insert_vmap_area(va, &vn->busy.root, &vn->busy.head);
-	spin_unlock(&vn->busy.lock);
+	write_unlock(&vn->busy.lock);
 
 	BUG_ON(!IS_ALIGNED(va->va_start, align));
 	BUG_ON(va->va_start < vstart);
@@ -2123,10 +2123,10 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end,
 		if (RB_EMPTY_ROOT(&vn->lazy.root))
 			continue;
 
-		spin_lock(&vn->lazy.lock);
+		write_lock(&vn->lazy.lock);
 		WRITE_ONCE(vn->lazy.root.rb_node, NULL);
 		list_replace_init(&vn->lazy.head, &vn->purge_list);
-		spin_unlock(&vn->lazy.lock);
+		write_unlock(&vn->lazy.lock);
 
 		start = min(start, list_first_entry(&vn->purge_list,
 			struct vmap_area, list)->va_start);
@@ -2223,9 +2223,9 @@ static void free_vmap_area_noflush(struct vmap_area *va)
 	vn = is_vn_id_valid(vn_id) ?
 		id_to_node(vn_id):addr_to_node(va->va_start);
 
-	spin_lock(&vn->lazy.lock);
+	write_lock(&vn->lazy.lock);
 	insert_vmap_area(va, &vn->lazy.root, &vn->lazy.head);
-	spin_unlock(&vn->lazy.lock);
+	write_unlock(&vn->lazy.lock);
 
 	trace_free_vmap_area_noflush(va_start, nr_lazy, nr_lazy_max);
 
@@ -2272,9 +2272,9 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 	do {
 		vn = &vmap_nodes[i];
 
-		spin_lock(&vn->busy.lock);
+		read_lock(&vn->busy.lock);
 		va = __find_vmap_area(addr, &vn->busy.root);
-		spin_unlock(&vn->busy.lock);
+		read_unlock(&vn->busy.lock);
 
 		if (va)
 			return va;
@@ -2293,11 +2293,11 @@ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
 	do {
 		vn = &vmap_nodes[i];
 
-		spin_lock(&vn->busy.lock);
+		write_lock(&vn->busy.lock);
 		va = __find_vmap_area(addr, &vn->busy.root);
 		if (va)
 			unlink_va(va, &vn->busy.root);
-		spin_unlock(&vn->busy.lock);
+		write_unlock(&vn->busy.lock);
 
 		if (va)
 			return va;
@@ -2514,9 +2514,9 @@ static void free_vmap_block(struct vmap_block *vb)
 	BUG_ON(tmp != vb);
 
 	vn = addr_to_node(vb->va->va_start);
-	spin_lock(&vn->busy.lock);
+	write_lock(&vn->busy.lock);
 	unlink_va(vb->va, &vn->busy.root);
-	spin_unlock(&vn->busy.lock);
+	write_unlock(&vn->busy.lock);
 
 	free_vmap_area_noflush(vb->va);
 	kfree_rcu(vb, rcu_head);
@@ -2942,9 +2942,9 @@ static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
 {
 	struct vmap_node *vn = addr_to_node(va->va_start);
 
-	spin_lock(&vn->busy.lock);
+	read_lock(&vn->busy.lock);
 	setup_vmalloc_vm_locked(vm, va, flags, caller);
-	spin_unlock(&vn->busy.lock);
+	read_unlock(&vn->busy.lock);
 }
 
 static void clear_vm_uninitialized_flag(struct vm_struct *vm)
@@ -4214,19 +4214,19 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 
 	next_va:
 		next = va->va_end;
-		spin_unlock(&vn->busy.lock);
+		read_unlock(&vn->busy.lock);
 	} while ((vn = find_vmap_area_exceed_addr_lock(next, &va)));
 
 finished_zero:
 	if (vn)
-		spin_unlock(&vn->busy.lock);
+		read_unlock(&vn->busy.lock);
 
 	/* zero-fill memory holes */
 	return count - remains + zero_iter(iter, remains);
 finished:
 	/* Nothing remains, or We couldn't copy/zero everything. */
 	if (vn)
-		spin_unlock(&vn->busy.lock);
+		read_unlock(&vn->busy.lock);
 
 	return count - remains;
 }
@@ -4563,11 +4563,11 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 	for (area = 0; area < nr_vms; area++) {
 		struct vmap_node *vn = addr_to_node(vas[area]->va_start);
 
-		spin_lock(&vn->busy.lock);
+		write_lock(&vn->busy.lock);
 		insert_vmap_area(vas[area], &vn->busy.root, &vn->busy.head);
 		setup_vmalloc_vm_locked(vms[area], vas[area], VM_ALLOC,
 				 pcpu_get_vm_areas);
-		spin_unlock(&vn->busy.lock);
+		write_unlock(&vn->busy.lock);
 	}
 
 	/*
@@ -4687,7 +4687,7 @@ bool vmalloc_dump_obj(void *object)
 
 	vn = addr_to_node((unsigned long)objp);
 
-	if (spin_trylock(&vn->busy.lock)) {
+	if (read_trylock(&vn->busy.lock)) {
 		va = __find_vmap_area(addr, &vn->busy.root);
 
 		if (va && va->vm) {
@@ -4697,7 +4697,7 @@ bool vmalloc_dump_obj(void *object)
 			success = true;
 		}
 
-		spin_unlock(&vn->busy.lock);
+		read_unlock(&vn->busy.lock);
 	}
 
 	if (success)
@@ -4742,13 +4742,13 @@ static void show_purge_info(struct seq_file *m)
 	for (i = 0; i < nr_vmap_nodes; i++) {
 		vn = &vmap_nodes[i];
 
-		spin_lock(&vn->lazy.lock);
+		read_lock(&vn->lazy.lock);
 		list_for_each_entry(va, &vn->lazy.head, list) {
 			seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n",
 				(void *)va->va_start, (void *)va->va_end,
 				va->va_end - va->va_start);
 		}
-		spin_unlock(&vn->lazy.lock);
+		read_unlock(&vn->lazy.lock);
 	}
 }
 
@@ -4762,7 +4762,7 @@ static int vmalloc_info_show(struct seq_file *m, void *p)
 	for (i = 0; i < nr_vmap_nodes; i++) {
 		vn = &vmap_nodes[i];
 
-		spin_lock(&vn->busy.lock);
+		read_lock(&vn->busy.lock);
 		list_for_each_entry(va, &vn->busy.head, list) {
 			if (!va->vm) {
 				if (va->flags & VMAP_RAM)
@@ -4808,7 +4808,7 @@ static int vmalloc_info_show(struct seq_file *m, void *p)
 			show_numa_info(m, v);
 			seq_putc(m, '\n');
 		}
-		spin_unlock(&vn->busy.lock);
+		read_unlock(&vn->busy.lock);
 	}
 
 	/*
@@ -4902,11 +4902,11 @@ static void vmap_init_nodes(void)
 		vn = &vmap_nodes[n];
 		vn->busy.root = RB_ROOT;
 		INIT_LIST_HEAD(&vn->busy.head);
-		spin_lock_init(&vn->busy.lock);
+		rwlock_init(&vn->busy.lock);
 
 		vn->lazy.root = RB_ROOT;
 		INIT_LIST_HEAD(&vn->lazy.head);
-		spin_lock_init(&vn->lazy.lock);
+		rwlock_init(&vn->lazy.lock);
 
 		for (i = 0; i < MAX_VA_SIZE_PAGES; i++) {
 			INIT_LIST_HEAD(&vn->pool[i].head);
<snip>

Thank you!

--
Uladzislau Rezki

next prev parent reply	other threads:[~2024-01-05 10:50 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-02 18:46 [PATCH v3 00/11] Mitigate a vmap lock contention v3 Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 01/11] mm: vmalloc: Add va_alloc() helper Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 02/11] mm: vmalloc: Rename adjust_va_to_fit_type() function Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 03/11] mm: vmalloc: Move vmap_init_free_space() down in vmalloc.c Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area_root rb-tree Uladzislau Rezki (Sony)
2024-01-05  8:10   ` Wen Gu
2024-01-05 10:50     ` Uladzislau Rezki [this message]
2024-01-06  9:17       ` Wen Gu
2024-01-06 16:36         ` Uladzislau Rezki
2024-01-07  6:59           ` Hillf Danton
2024-01-08  7:45             ` Wen Gu
2024-01-08 18:37               ` Uladzislau Rezki
2024-01-16 23:25   ` Lorenzo Stoakes
2024-01-18 13:15     ` Uladzislau Rezki
2024-01-20 12:55       ` Lorenzo Stoakes
2024-01-22 17:44         ` Uladzislau Rezki
2024-01-02 18:46 ` [PATCH v3 05/11] mm/vmalloc: remove vmap_area_list Uladzislau Rezki (Sony)
2024-01-16 23:36   ` Lorenzo Stoakes
2024-01-02 18:46 ` [PATCH v3 06/11] mm: vmalloc: Remove global purge_vmap_area_root rb-tree Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 07/11] mm: vmalloc: Offload free_vmap_area_lock lock Uladzislau Rezki (Sony)
2024-01-03 11:08   ` Hillf Danton
2024-01-03 15:47     ` Uladzislau Rezki
2024-01-11  9:02   ` Dave Chinner
2024-01-11 15:54     ` Uladzislau Rezki
2024-01-11 20:37       ` Dave Chinner
2024-01-12 12:18         ` Uladzislau Rezki
2024-01-16 22:12           ` Dave Chinner
2024-01-18 18:15             ` Uladzislau Rezki
2024-02-08  0:25   ` Baoquan He
2024-02-08 13:57     ` Uladzislau Rezki
2024-02-28  9:48   ` Baoquan He
2024-02-28 10:39     ` Uladzislau Rezki
2024-02-28 12:26       ` Baoquan He
2024-03-22 18:21   ` Guenter Roeck
2024-03-22 19:03     ` Uladzislau Rezki
2024-03-22 20:53       ` Guenter Roeck
2024-01-02 18:46 ` [PATCH v3 08/11] mm: vmalloc: Support multiple nodes in vread_iter Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 09/11] mm: vmalloc: Support multiple nodes in vmallocinfo Uladzislau Rezki (Sony)
2024-01-02 18:46 ` [PATCH v3 10/11] mm: vmalloc: Set nr_nodes based on CPUs in a system Uladzislau Rezki (Sony)
2024-01-11  9:25   ` Dave Chinner
2024-01-15 19:09     ` Uladzislau Rezki
2024-01-16 22:06       ` Dave Chinner
2024-01-18 18:23         ` Uladzislau Rezki
2024-01-18 21:28           ` Dave Chinner
2024-01-19 10:32             ` Uladzislau Rezki
2024-01-02 18:46 ` [PATCH v3 11/11] mm: vmalloc: Add a shrinker to drain vmap pools Uladzislau Rezki (Sony)
2024-02-22  8:35 ` [PATCH v3 00/11] Mitigate a vmap lock contention v3 Uladzislau Rezki
2024-02-22 23:15   ` Pedro Falcato
2024-02-23  9:34     ` Uladzislau Rezki
2024-02-23 10:26       ` Baoquan He
2024-02-23 11:06         ` Uladzislau Rezki
2024-02-23 15:57           ` Baoquan He
2024-02-23 18:55             ` Uladzislau Rezki
2024-02-28  9:27               ` Baoquan He
2024-02-29 10:38                 ` Uladzislau Rezki

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:e30dabf6826 dfblob:40acf53cadf )
 OR (
bs:"Re: [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area_root rb-tree" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZZfe4O850fFaxOgQ@pc638.lan \
    --to=urezki@gmail.com \
    --cc=guwen@linux.alibaba.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=shaozhengchao@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.