public inbox for bpf@vger.kernel.org
* [PATCH bpf v2 0/2] Wait for busy refill_work when destroying bpf memory allocator
@ 2022-10-21 11:49 Hou Tao
  2022-10-21 11:49 ` [PATCH bpf v2 1/2] bpf: " Hou Tao
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Hou Tao @ 2022-10-21 11:49 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Stanislav Fomichev
  Cc: Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Jiri Olsa,
	John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

This patchset fixes a problem in bpf memory allocator destruction on
PREEMPT_RT kernels and on kernels where arch_irq_work_has_interrupt()
is false (e.g. a 1-cpu arm32 host or mips). The root cause is that
refill_work may still be busy while the allocator is being destroyed,
which can lead to an oops or other problems as shown in patch #1. Patch #1
fixes the problem by waiting for the irq work to complete during
destruction, and patch #2 is a clean-up patch based on patch #1. Please
see the individual patches for more details.

Comments are always welcome.

Change Log:
v2:
  * patch 1: fix typos and add notes about the overhead of irq_work_sync()
  * patch 1 & 2: add Acked-by tags from sdf@google.com

v1: https://lore.kernel.org/bpf/20221019115539.983394-1-houtao@huaweicloud.com/T/#t

Hou Tao (2):
  bpf: Wait for busy refill_work when destroying bpf memory allocator
  bpf: Use __llist_del_all() whenever possbile during memory draining

 kernel/bpf/memalloc.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

-- 
2.29.2



* [PATCH bpf v2 1/2] bpf: Wait for busy refill_work when destroying bpf memory allocator
  2022-10-21 11:49 [PATCH bpf v2 0/2] Wait for busy refill_work when destroying bpf memory allocator Hou Tao
@ 2022-10-21 11:49 ` Hou Tao
  2022-10-21 11:49 ` [PATCH bpf v2 2/2] bpf: Use __llist_del_all() whenever possbile during memory draining Hou Tao
  2022-10-22  2:30 ` [PATCH bpf v2 0/2] Wait for busy refill_work when destroying bpf memory allocator patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: Hou Tao @ 2022-10-21 11:49 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Stanislav Fomichev
  Cc: Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Jiri Olsa,
	John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

A busy irq work is an unfinished irq work: it is either in the pending
state or in the running state. When the bpf memory allocator is being
destroyed, refill_work may be busy on a PREEMPT_RT kernel, in which irq
work is invoked in a per-CPU RT kthread. It may also be busy on a kernel
where arch_irq_work_has_interrupt() is false (e.g. a 1-cpu arm32 host or
mips), in which irq work is invoked in the timer interrupt.

A busy refill_work leads to several issues. The obvious one is
concurrent operations on free_by_rcu and free_llist between the irq work
and memory draining. Another is that call_rcu_in_progress is not
reliable for checking for a pending RCU callback, because do_call_rcu()
may not have been invoked by the irq work yet. The last is a
use-after-free if the irq work is freed before its callback is invoked,
as shown below:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor instruction fetch in kernel mode
 #PF: error_code(0x0010) - not-present page
 PGD 12ab94067 P4D 12ab94067 PUD 1796b4067 PMD 0
 Oops: 0010 [#1] PREEMPT_RT SMP
 CPU: 5 PID: 64 Comm: irq_work/5 Not tainted 6.0.0-rt11+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 RIP: 0010:0x0
 Code: Unable to access opcode bytes at 0xffffffffffffffd6.
 RSP: 0018:ffffadc080293e78 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffffcdc07fb6a388 RCX: ffffa05000a2e000
 RDX: ffffa05000a2e000 RSI: ffffffff96cc9827 RDI: ffffcdc07fb6a388
 ......
 Call Trace:
  <TASK>
  irq_work_single+0x24/0x60
  irq_work_run_list+0x24/0x30
  run_irq_workd+0x23/0x30
  smpboot_thread_fn+0x203/0x300
  kthread+0x126/0x150
  ret_from_fork+0x1f/0x30
  </TASK>

irq_work_sync() makes the concurrency easy to handle, has no overhead on
kernels that are non-PREEMPT_RT and have an irq work interrupt, and needs
only a short wait under PREEMPT_RT (when running two test_maps on a
PREEMPT_RT kernel on a 72-cpu host, the max wait time is about 8ms and
the 99th percentile is 10us). So just use irq_work_sync() to wait for
busy refill_work to complete before memory draining and memory freeing.
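
For readers who want to see the ordering problem in isolation, here is a
minimal single-threaded userspace model. It is only a sketch: every
model_* name is hypothetical, none of this is kernel code, and the
"sync" here simply runs the deferred work itself because a
single-threaded model has nothing to wait on. It shows that destroying
with a sync lets the queued refill finish before draining, while
destroying without it leaves a busy work item behind at free time.

```c
/* Userspace model only; the model_* names are illustrative, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_work {
	bool busy;				/* queued, callback not finished yet */
	void (*func)(struct model_work *w);
};

struct model_cache {
	struct model_work refill_work;
	int free_cnt;				/* stands in for the free lists */
};

/* Single deferred-work slot; stands in for the per-CPU irq work list. */
static struct model_work *deferred;

/* Queue work to run later, as on PREEMPT_RT or without an irq work interrupt. */
static void model_queue(struct model_work *w)
{
	assert(deferred == NULL);		/* the model has only one slot */
	w->busy = true;
	deferred = w;
}

/* Stands in for the irq_work kthread or the timer tick running the work. */
static void model_run_deferred(void)
{
	struct model_work *w = deferred;

	if (!w)
		return;
	deferred = NULL;
	w->func(w);
	w->busy = false;
}

/* Plays the role of irq_work_sync(): returns once w is no longer busy. */
static void model_work_sync(struct model_work *w)
{
	while (w->busy)
		model_run_deferred();
}

/* The refill callback: adds four objects to the owning cache. */
static void model_refill(struct model_work *w)
{
	struct model_cache *c = (struct model_cache *)
		((char *)w - offsetof(struct model_cache, refill_work));

	c->free_cnt += 4;
}

static struct model_cache *model_create(void)
{
	struct model_cache *c = calloc(1, sizeof(*c));

	assert(c);
	c->refill_work.func = model_refill;
	return c;
}

/*
 * Destroy the cache and return how many objects were drained.
 * *still_busy reports whether the work was still queued at free time,
 * which is the situation that ends in a use-after-free in the kernel.
 */
static int model_destroy(struct model_cache *c, bool sync, bool *still_busy)
{
	int drained;

	if (sync)
		model_work_sync(&c->refill_work);
	drained = c->free_cnt;
	c->free_cnt = 0;
	*still_busy = c->refill_work.busy;
	if (*still_busy)
		deferred = NULL;	/* drop it so the model itself stays safe */
	free(c);
	return drained;
}
```

Queueing the refill and then calling model_destroy() with sync set
drains the four refilled objects and reports no busy work; with sync
cleared, nothing is drained and the work is still busy when the cache is
freed.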

Fixes: 7c8199e24fa0 ("bpf: Introduce any context BPF specific memory allocator.")
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/memalloc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 2433be58bb85..4e4b3250aada 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -498,6 +498,16 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 		rcu_in_progress = 0;
 		for_each_possible_cpu(cpu) {
 			c = per_cpu_ptr(ma->cache, cpu);
+			/*
+			 * refill_work may be unfinished for PREEMPT_RT kernel
+			 * in which irq work is invoked in a per-CPU RT thread.
+			 * It is also possible for kernel with
+			 * arch_irq_work_has_interrupt() being false and irq
+			 * work is invoked in timer interrupt. So waiting for
+			 * the completion of irq work to ease the handling of
+			 * concurrency.
+			 */
+			irq_work_sync(&c->refill_work);
 			drain_mem_cache(c);
 			rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
 		}
@@ -512,6 +522,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 			cc = per_cpu_ptr(ma->caches, cpu);
 			for (i = 0; i < NUM_CACHES; i++) {
 				c = &cc->cache[i];
+				irq_work_sync(&c->refill_work);
 				drain_mem_cache(c);
 				rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
 			}
-- 
2.29.2



* [PATCH bpf v2 2/2] bpf: Use __llist_del_all() whenever possbile during memory draining
  2022-10-21 11:49 [PATCH bpf v2 0/2] Wait for busy refill_work when destroying bpf memory allocator Hou Tao
  2022-10-21 11:49 ` [PATCH bpf v2 1/2] bpf: " Hou Tao
@ 2022-10-21 11:49 ` Hou Tao
  2022-10-22  2:30 ` [PATCH bpf v2 0/2] Wait for busy refill_work when destroying bpf memory allocator patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: Hou Tao @ 2022-10-21 11:49 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Stanislav Fomichev
  Cc: Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Jiri Olsa,
	John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

Except for the waiting_for_gp list, there are no concurrent operations
on the free_by_rcu, free_llist and free_llist_extra lists, so use
__llist_del_all() instead of llist_del_all() for them. The waiting_for_gp
list can be deleted by an RCU callback concurrently, so still use
llist_del_all() for it.
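
As a rough illustration of the difference between the two helpers, the
sketch below models a lock-less singly linked list in userspace with C11
atomics. The sketch_* names are hypothetical and this is not the kernel
implementation; it only assumes that llist_del_all() detaches the list
with one atomic exchange while __llist_del_all() uses a plain read
followed by a plain write, which is safe only when nothing else touches
the list.

```c
/* Userspace sketch with C11 atomics; sketch_* names are illustrative only. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *next;
	int val;
};

struct sketch_head {
	_Atomic(struct sketch_node *) first;
};

/* Lock-less push: retry the compare-and-swap until the head is updated. */
static void sketch_add(struct sketch_head *h, struct sketch_node *n)
{
	struct sketch_node *old;

	assert(n);
	old = atomic_load_explicit(&h->first, memory_order_relaxed);
	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&h->first, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/* Modeled on llist_del_all(): one atomic exchange, safe against concurrent users. */
static struct sketch_node *sketch_del_all(struct sketch_head *h)
{
	return atomic_exchange_explicit(&h->first, (struct sketch_node *)NULL,
					memory_order_acquire);
}

/*
 * Modeled on __llist_del_all(): separate read and write. A node pushed
 * between the two steps would be lost, so the caller must guarantee
 * exclusive access to the list.
 */
static struct sketch_node *sketch_del_all_exclusive(struct sketch_head *h)
{
	struct sketch_node *first;

	first = atomic_load_explicit(&h->first, memory_order_relaxed);
	atomic_store_explicit(&h->first, (struct sketch_node *)NULL,
			      memory_order_relaxed);
	return first;
}

/* Walk a detached list and add up the values. */
static int sketch_sum(const struct sketch_node *n)
{
	int sum = 0;

	for (; n; n = n->next)
		sum += n->val;
	return sum;
}
```

Both variants return the same detached list when there is no concurrent
access; the exclusive one just skips the atomic read-modify-write.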

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/memalloc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 4e4b3250aada..8f0d65f2474a 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -423,14 +423,17 @@ static void drain_mem_cache(struct bpf_mem_cache *c)
 	/* No progs are using this bpf_mem_cache, but htab_map_free() called
 	 * bpf_mem_cache_free() for all remaining elements and they can be in
 	 * free_by_rcu or in waiting_for_gp lists, so drain those lists now.
+	 *
+	 * Except for waiting_for_gp list, there are no concurrent operations
+	 * on these lists, so it is safe to use __llist_del_all().
 	 */
 	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
 		free_one(c, llnode);
 	llist_for_each_safe(llnode, t, llist_del_all(&c->waiting_for_gp))
 		free_one(c, llnode);
-	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist))
+	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_llist))
 		free_one(c, llnode);
-	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
+	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_llist_extra))
 		free_one(c, llnode);
 }
 
-- 
2.29.2



* Re: [PATCH bpf v2 0/2] Wait for busy refill_work when destroying bpf memory allocator
  2022-10-21 11:49 [PATCH bpf v2 0/2] Wait for busy refill_work when destroying bpf memory allocator Hou Tao
  2022-10-21 11:49 ` [PATCH bpf v2 1/2] bpf: " Hou Tao
  2022-10-21 11:49 ` [PATCH bpf v2 2/2] bpf: Use __llist_del_all() whenever possbile during memory draining Hou Tao
@ 2022-10-22  2:30 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-10-22  2:30 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, ast, sdf, martin.lau, andrii, song, haoluo, yhs, daniel,
	kpsingh, jolsa, john.fastabend, houtao1

Hello:

This series was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Fri, 21 Oct 2022 19:49:11 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
> 
> The patchset aims to fix one problem of bpf memory allocator destruction
> when there is PREEMPT_RT kernel or kernel with arch_irq_work_has_interrupt()
> being false (e.g. 1-cpu arm32 host or mips). The root cause is that
> there may be busy refill_work when the allocator is destroying and it
> may incur oops or other problems as shown in patch #1. Patch #1 fixes
> the problem by waiting for the completion of irq work during destroying
> and patch #2 is just a clean-up patch based on patch #1. Please see
> individual patches for more details.
> 
> [...]

Here is the summary with links:
  - [bpf,v2,1/2] bpf: Wait for busy refill_work when destroying bpf memory allocator
    https://git.kernel.org/bpf/bpf/c/3d05818707bb
  - [bpf,v2,2/2] bpf: Use __llist_del_all() whenever possbile during memory draining
    https://git.kernel.org/bpf/bpf/c/fa4447cb73b2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



