[PATCH bpf-next v3] bpf: Check map->usercnt after timer->timer is assigned

From: Hou Tao @ 2023-10-30  6:36 UTC
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Hsin-Wei Hung,
	houtao1

From: Hou Tao <houtao1@huawei.com>

When uref release and bpf timer init run concurrently, the following
sequence is possible. It breaks the guarantee provided by bpf_timer,
because the bpf_timer stays alive after the userspace application
releases or unpins the map. It also leads to a kmemleak report on old
kernel versions that do not release the bpf_timer when the map is
released.

bpf program X:

bpf_timer_init()
  lock timer->lock
    read timer->timer as NULL
    read map->usercnt != 0

                process Y:

                close(map_fd)
                  // put last uref
                  bpf_map_put_uref()
                    atomic_dec_and_test(map->usercnt)
                      array_map_free_timers()
                        bpf_timer_cancel_and_free()
                          // just return
                          read timer->timer is NULL

    t = bpf_map_kmalloc_node()
    timer->timer = t
  unlock timer->lock
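
The interleaving above can be replayed deterministically in userspace.
The following is a minimal single-threaded sketch, not kernel code: the
toy_* types and functions are hypothetical stand-ins for map->usercnt,
timer->timer and the uref release path, and the call to toy_put_uref()
marks the point where process Y is assumed to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's structures. */
struct toy_hrtimer { bool freed; };
struct toy_map     { long usercnt; };
struct toy_timer   { struct toy_hrtimer *timer; };

/* Process Y: drop a uref; on the last one, free whatever timer is visible. */
static void toy_put_uref(struct toy_map *map, struct toy_timer *timer)
{
	assert(map->usercnt > 0);
	if (--map->usercnt != 0)
		return;
	if (!timer->timer)
		return;			/* "just return": nothing to free yet */
	timer->timer->freed = true;
	timer->timer = NULL;
}

/* Old order: check usercnt, (process Y runs), then assign timer->timer.
 * Returns true when a live timer is left on a map that has no urefs.
 */
static bool toy_old_order_leaves_timer(void)
{
	struct toy_hrtimer t = { .freed = false };
	struct toy_map map = { .usercnt = 1 };
	struct toy_timer timer = { .timer = NULL };

	if (map.usercnt == 0)
		return false;		/* -EPERM path, not taken here */
	toy_put_uref(&map, &timer);	/* process Y slips in */
	timer.timer = &t;
	return map.usercnt == 0 && timer.timer != NULL && !t.freed;
}

/* New order: (process Y runs), assign timer->timer, then check usercnt. */
static bool toy_new_order_leaves_timer(void)
{
	struct toy_hrtimer t = { .freed = false };
	struct toy_map map = { .usercnt = 1 };
	struct toy_timer timer = { .timer = NULL };

	toy_put_uref(&map, &timer);	/* process Y slips in */
	timer.timer = &t;
	if (map.usercnt == 0) {
		timer.timer = NULL;	/* undo the assignment on the error path */
		t.freed = true;
	}
	return map.usercnt == 0 && timer.timer != NULL && !t.freed;
}
```

Calling toy_old_order_leaves_timer() reports a leftover timer, while
toy_new_order_leaves_timer() does not. The sketch only replays one
interleaving and ignores locking and memory ordering.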

Fix the problem by checking map->usercnt after timer->timer is assigned,
so that when uref release and bpf timer init run concurrently, either
bpf_timer_cancel_and_free() called from uref release reads a non-NULL
timer or the newly added atomic64_read() returns a zero usercnt.

Because atomic_dec_and_test(map->usercnt) and READ_ONCE(timer->timer)
in bpf_timer_cancel_and_free() are not protected by a lock, add a
memory barrier to guarantee the order between map->usercnt and
timer->timer. Also use WRITE_ONCE(timer->timer, x) to match the lockless
read of timer->timer in bpf_timer_cancel_and_free().
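
The ordering argument is the classic store-buffering pattern: each side
writes one variable and then reads the other, with a full barrier in
between. Below is a rough userspace analogue using C11 atomics, given
only as a sketch. The toy_* names are hypothetical, the relaxed
accesses stand in for WRITE_ONCE()/READ_ONCE(), and the seq_cst fences
stand in for smp_mb() and for the full ordering that a value-returning
atomic RMW provides in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two shared locations in the patch. */
struct toy_state {
	atomic_long usercnt;		/* plays the role of map->usercnt */
	_Atomic(void *) timer;		/* plays the role of timer->timer */
};

/* Init side: publish the timer, full fence, then check usercnt.
 * Returns true if the timer stays published, false on the -EPERM path.
 */
static bool toy_init_side(struct toy_state *s, void *t)
{
	assert(t != NULL);
	atomic_store_explicit(&s->timer, t, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&s->usercnt, memory_order_relaxed) == 0) {
		/* No uref left: take the pointer back before failing. */
		atomic_store_explicit(&s->timer, NULL, memory_order_relaxed);
		return false;
	}
	return true;
}

/* Release side: drop a uref, full fence, then read the timer pointer.
 * Returns true if this was the last uref and a timer was visible.
 */
static bool toy_release_side(struct toy_state *s)
{
	if (atomic_fetch_sub_explicit(&s->usercnt, 1,
				      memory_order_relaxed) != 1)
		return false;
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&s->timer, memory_order_relaxed) != NULL;
}
```

With a fence on both sides, it cannot happen that toy_init_side() reads
a non-zero usercnt while toy_release_side() drops the last uref and
still reads a NULL timer. Both sides may observe each other at the same
time; the sketch does not model timer->lock, which serializes that case
in the kernel.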

Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
Closes: https://lore.kernel.org/bpf/CABcoxUaT2k9hWsS1tNgXyoU3E-=PuOgMn737qK984fbFmfYixQ@mail.gmail.com
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
v3:
 * patch #1: only check map->usercnt once and call kfree() with the
             spin-lock held in the error handling path (Alexei).
             Update the commit message to explain that the patch only
             fixes the broken-guarantee problem for bpf_timer. The
             kmemleak problem will be fixed by another patchset.
 * patch #2: remove the selftest patch because it demonstrates the
             use-after-free problem for map-in-map, of which the
             kmemleak problem is only a symptom. It will be
             re-added and refined in another patchset.

v2: https://lore.kernel.org/bpf/20231020014214.2471419-1-houtao@huaweicloud.com
  * patch #1: use smp_mb() instead of smp_mb__before_atomic()
  * patch #2: use WRITE_ONCE(timer->timer, x) to match the lockless read
              of timer->timer

v1: https://lore.kernel.org/bpf/20231017125717.241101-1-houtao@huaweicloud.com

 kernel/bpf/helpers.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e46ac288a1080..aed93df5c8aa0 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1177,13 +1177,6 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
 		ret = -EBUSY;
 		goto out;
 	}
-	if (!atomic64_read(&map->usercnt)) {
-		/* maps with timers must be either held by user space
-		 * or pinned in bpffs.
-		 */
-		ret = -EPERM;
-		goto out;
-	}
 	/* allocate hrtimer via map_kmalloc to use memcg accounting */
 	t = bpf_map_kmalloc_node(map, sizeof(*t), GFP_ATOMIC, map->numa_node);
 	if (!t) {
@@ -1196,7 +1189,21 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
 	rcu_assign_pointer(t->callback_fn, NULL);
 	hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
 	t->timer.function = bpf_timer_cb;
-	timer->timer = t;
+	WRITE_ONCE(timer->timer, t);
+	/* Guarantee the order between timer->timer and map->usercnt. So
+	 * when there are concurrent uref release and bpf timer init, either
+	 * bpf_timer_cancel_and_free() called by uref release reads a no-NULL
+	 * timer or atomic64_read() below returns a zero usercnt.
+	 */
+	smp_mb();
+	if (!atomic64_read(&map->usercnt)) {
+		/* maps with timers must be either held by user space
+		 * or pinned in bpffs.
+		 */
+		WRITE_ONCE(timer->timer, NULL);
+		kfree(t);
+		ret = -EPERM;
+	}
 out:
 	__bpf_spin_unlock_irqrestore(&timer->lock);
 	return ret;
@@ -1374,7 +1381,7 @@ void bpf_timer_cancel_and_free(void *val)
 	/* The subsequent bpf_timer_start/cancel() helpers won't be able to use
 	 * this timer, since it won't be initialized.
 	 */
-	timer->timer = NULL;
+	WRITE_ONCE(timer->timer, NULL);
 out:
 	__bpf_spin_unlock_irqrestore(&timer->lock);
 	if (!t)
-- 
2.29.2



* Re: [PATCH bpf-next v3] bpf: Check map->usercnt after timer->timer is assigned
From: patchwork-bot+netdevbpf @ 2023-11-02  6:00 UTC
  To: Hou Tao
  Cc: bpf, martin.lau, alexei.starovoitov, andrii, song, haoluo,
	yonghong.song, daniel, kpsingh, sdf, jolsa, john.fastabend,
	hsinweih, houtao1

Hello:

This patch was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Mon, 30 Oct 2023 14:36:16 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> When there are concurrent uref release and bpf timer init operations,
> the following sequence diagram is possible. It will break the guarantee
> provided by bpf_timer: bpf_timer will still be alive after userspace
> application releases or unpins the map. It also will lead to kmemleak
> for old kernel version which doesn't release bpf_timer when map is
> released.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v3] bpf: Check map->usercnt after timer->timer is assigned
    https://git.kernel.org/bpf/bpf/c/fd381ce60a2d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


