* [PATCH bpf 0/3] bpf, sockmap: Fix the element replace
@ 2024-12-02 11:29 Michal Luczaj
2024-12-02 11:29 ` [PATCH bpf 1/3] bpf, sockmap: Fix update element with same Michal Luczaj
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Michal Luczaj @ 2024-12-02 11:29 UTC (permalink / raw)
To: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Shuah Khan, Jakub Sitnicki, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: bpf, linux-kselftest, netdev, Michal Luczaj
Series takes care of two issues with sockmap update: inconsistent behaviour
after update with same, and race/refcount imbalance on element replace.
I am hesitant if patch 3/3 ("bpf, sockmap: Fix race between element replace
and close()") is the right approach. I might have missed some detail of the
current __sock_map_delete() implementation. I'd be grateful for comments,
thanks.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
Michal Luczaj (3):
bpf, sockmap: Fix update element with same
selftest/bpf: Extend test for sockmap update with same
bpf, sockmap: Fix race between element replace and close()
net/core/sock_map.c | 6 +++---
tools/testing/selftests/bpf/prog_tests/sockmap_basic.c | 8 +++++---
2 files changed, 8 insertions(+), 6 deletions(-)
---
base-commit: 537a2525eaf76ea9b0dca62b994500d8670b39d5
change-id: 20241201-sockmap-replace-67c7077f3a31
Best regards,
--
Michal Luczaj <mhal@rbox.co>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf 1/3] bpf, sockmap: Fix update element with same
2024-12-02 11:29 [PATCH bpf 0/3] bpf, sockmap: Fix the element replace Michal Luczaj
@ 2024-12-02 11:29 ` Michal Luczaj
2024-12-09 5:47 ` John Fastabend
2024-12-02 11:29 ` [PATCH bpf 2/3] selftest/bpf: Extend test for sockmap update " Michal Luczaj
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Michal Luczaj @ 2024-12-02 11:29 UTC (permalink / raw)
To: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Shuah Khan, Jakub Sitnicki, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: bpf, linux-kselftest, netdev, Michal Luczaj
Consider a sockmap entry being updated with the same socket:
osk = stab->sks[idx];
sock_map_add_link(psock, link, map, &stab->sks[idx]);
stab->sks[idx] = sk;
if (osk)
sock_map_unref(osk, &stab->sks[idx]);
Due to sock_map_unref(), which invokes sock_map_del_link(), all the psock's
links for stab->sks[idx] are torn:
list_for_each_entry_safe(link, tmp, &psock->link, list) {
if (link->link_raw == link_raw) {
...
list_del(&link->list);
sk_psock_free_link(link);
}
}
And that includes the new link sock_map_add_link() added just before the
unref.
This results in a sockmap holding a socket, but without the respective
link. This in turn means that close(sock) won't trigger the cleanup, i.e. a
closed socket will not be automatically removed from the sockmap.
Stop tearing the links when a matching link_raw is found.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
net/core/sock_map.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 78347d7d25ef31525f8ec0a755a18e5793ad92c0..20b348b1964a10a1b0bfbe1a90a4a4cd99715b81 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -159,6 +159,7 @@ static void sock_map_del_link(struct sock *sk,
verdict_stop = true;
list_del(&link->list);
sk_psock_free_link(link);
+ break;
}
}
spin_unlock_bh(&psock->link_lock);
--
2.46.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf 2/3] selftest/bpf: Extend test for sockmap update with same
2024-12-02 11:29 [PATCH bpf 0/3] bpf, sockmap: Fix the element replace Michal Luczaj
2024-12-02 11:29 ` [PATCH bpf 1/3] bpf, sockmap: Fix update element with same Michal Luczaj
@ 2024-12-02 11:29 ` Michal Luczaj
2024-12-09 5:50 ` John Fastabend
2024-12-02 11:29 ` [PATCH bpf 3/3] bpf, sockmap: Fix race between element replace and close() Michal Luczaj
2024-12-10 16:50 ` [PATCH bpf 0/3] bpf, sockmap: Fix the element replace patchwork-bot+netdevbpf
3 siblings, 1 reply; 9+ messages in thread
From: Michal Luczaj @ 2024-12-02 11:29 UTC (permalink / raw)
To: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Shuah Khan, Jakub Sitnicki, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: bpf, linux-kselftest, netdev, Michal Luczaj
Verify that the sockmap link was not severed, and socket's entry is indeed
removed from the map when the corresponding descriptor gets closed.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
tools/testing/selftests/bpf/prog_tests/sockmap_basic.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index fdff0652d7ef5cdd4892af8c2c83cbf18cbf163f..248754296d972286e45d79331e95a8a6ae824590 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -934,8 +934,10 @@ static void test_sockmap_same_sock(void)
err = socketpair(AF_UNIX, SOCK_STREAM, 0, stream);
ASSERT_OK(err, "socketpair(af_unix, sock_stream)");
- if (err)
+ if (err) {
+ close(tcp);
goto out;
+ }
for (i = 0; i < 2; i++) {
err = bpf_map_update_elem(map, &zero, &stream[0], BPF_ANY);
@@ -954,14 +956,14 @@ static void test_sockmap_same_sock(void)
ASSERT_OK(err, "bpf_map_update_elem(tcp)");
}
+ close(tcp);
err = bpf_map_delete_elem(map, &zero);
- ASSERT_OK(err, "bpf_map_delete_elem(entry)");
+ ASSERT_ERR(err, "bpf_map_delete_elem(entry)");
close(stream[0]);
close(stream[1]);
out:
close(dgram);
- close(tcp);
close(udp);
test_sockmap_pass_prog__destroy(skel);
}
--
2.46.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH bpf 3/3] bpf, sockmap: Fix race between element replace and close()
2024-12-02 11:29 [PATCH bpf 0/3] bpf, sockmap: Fix the element replace Michal Luczaj
2024-12-02 11:29 ` [PATCH bpf 1/3] bpf, sockmap: Fix update element with same Michal Luczaj
2024-12-02 11:29 ` [PATCH bpf 2/3] selftest/bpf: Extend test for sockmap update " Michal Luczaj
@ 2024-12-02 11:29 ` Michal Luczaj
2024-12-09 6:11 ` John Fastabend
2024-12-10 16:50 ` [PATCH bpf 0/3] bpf, sockmap: Fix the element replace patchwork-bot+netdevbpf
3 siblings, 1 reply; 9+ messages in thread
From: Michal Luczaj @ 2024-12-02 11:29 UTC (permalink / raw)
To: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Shuah Khan, Jakub Sitnicki, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: bpf, linux-kselftest, netdev, Michal Luczaj
Element replace (with a socket different from the one stored) may race with
socket's close() link popping & unlinking. __sock_map_delete()
unconditionally unrefs the (wrong) element:
// set map[0] = s0
map_update_elem(map, 0, s0)
// drop fd of s0
close(s0)
sock_map_close()
lock_sock(sk) (s0!)
sock_map_remove_links(sk)
link = sk_psock_link_pop()
sock_map_unlink(sk, link)
sock_map_delete_from_link
// replace map[0] with s1
map_update_elem(map, 0, s1)
sock_map_update_elem
(s1!) lock_sock(sk)
sock_map_update_common
psock = sk_psock(sk)
spin_lock(&stab->lock)
osk = stab->sks[idx]
sock_map_add_link(..., &stab->sks[idx])
sock_map_unref(osk, &stab->sks[idx])
psock = sk_psock(osk)
sk_psock_put(sk, psock)
if (refcount_dec_and_test(&psock))
sk_psock_drop(sk, psock)
spin_unlock(&stab->lock)
unlock_sock(sk)
__sock_map_delete
spin_lock(&stab->lock)
sk = *psk // s1 replaced s0; sk == s1
if (!sk_test || sk_test == sk) // sk_test (s0) != sk (s1); no branch
sk = xchg(psk, NULL)
if (sk)
sock_map_unref(sk, psk) // unref s1; sks[idx] will dangle
psock = sk_psock(sk)
sk_psock_put(sk, psock)
if (refcount_dec_and_test())
sk_psock_drop(sk, psock)
spin_unlock(&stab->lock)
release_sock(sk)
Then close(map) enqueues bpf_map_free_deferred, which finally calls
sock_map_free(). This results in some refcount_t warnings along with a
KASAN splat[1].
Fix __sock_map_delete(), do not allow sock_map_unref() on elements that may
have been replaced.
[1]:
BUG: KASAN: slab-use-after-free in sock_map_free+0x10e/0x330
Write of size 4 at addr ffff88811f5b9100 by task kworker/u64:12/1063
CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Not tainted 6.12.0+ #125
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Workqueue: events_unbound bpf_map_free_deferred
Call Trace:
<TASK>
dump_stack_lvl+0x68/0x90
print_report+0x174/0x4f6
kasan_report+0xb9/0x190
kasan_check_range+0x10f/0x1e0
sock_map_free+0x10e/0x330
bpf_map_free_deferred+0x173/0x320
process_one_work+0x846/0x1420
worker_thread+0x5b3/0xf80
kthread+0x29e/0x360
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 1202:
kasan_save_stack+0x1e/0x40
kasan_save_track+0x10/0x30
__kasan_slab_alloc+0x85/0x90
kmem_cache_alloc_noprof+0x131/0x450
sk_prot_alloc+0x5b/0x220
sk_alloc+0x2c/0x870
unix_create1+0x88/0x8a0
unix_create+0xc5/0x180
__sock_create+0x241/0x650
__sys_socketpair+0x1ce/0x420
__x64_sys_socketpair+0x92/0x100
do_syscall_64+0x93/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 46:
kasan_save_stack+0x1e/0x40
kasan_save_track+0x10/0x30
kasan_save_free_info+0x37/0x60
__kasan_slab_free+0x4b/0x70
kmem_cache_free+0x1a1/0x590
__sk_destruct+0x388/0x5a0
sk_psock_destroy+0x73e/0xa50
process_one_work+0x846/0x1420
worker_thread+0x5b3/0xf80
kthread+0x29e/0x360
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x1a/0x30
The buggy address belongs to the object at ffff88811f5b9080
which belongs to the cache UNIX-STREAM of size 1984
The buggy address is located 128 bytes inside of
freed 1984-byte region [ffff88811f5b9080, ffff88811f5b9840)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11f5b8
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff888127d49401
flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: f5(slab)
raw: 0017ffffc0000040 ffff8881042e4500 dead000000000122 0000000000000000
raw: 0000000000000000 00000000800f000f 00000001f5000000 ffff888127d49401
head: 0017ffffc0000040 ffff8881042e4500 dead000000000122 0000000000000000
head: 0000000000000000 00000000800f000f 00000001f5000000 ffff888127d49401
head: 0017ffffc0000003 ffffea00047d6e01 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88811f5b9000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88811f5b9080: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88811f5b9100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88811f5b9180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88811f5b9200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Disabling lock debugging due to kernel taint
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 14 PID: 1063 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150
CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Tainted: G B 6.12.0+ #125
Tainted: [B]=BAD_PAGE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Workqueue: events_unbound bpf_map_free_deferred
RIP: 0010:refcount_warn_saturate+0xce/0x150
Code: 34 73 eb 03 01 e8 82 53 ad fe 0f 0b eb b1 80 3d 27 73 eb 03 00 75 a8 48 c7 c7 80 bd 95 84 c6 05 17 73 eb 03 01 e8 62 53 ad fe <0f> 0b eb 91 80 3d 06 73 eb 03 00 75 88 48 c7 c7 e0 bd 95 84 c6 05
RSP: 0018:ffff88815c49fc70 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88811f5b9100 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed10bcde6349
R10: ffff8885e6f31a4b R11: 0000000000000000 R12: ffff88813be0b000
R13: ffff88811f5b9100 R14: ffff88811f5b9080 R15: ffff88813be0b024
FS: 0000000000000000(0000) GS:ffff8885e6f00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055dda99b0250 CR3: 000000015dbac000 CR4: 0000000000752ef0
PKRU: 55555554
Call Trace:
<TASK>
? __warn.cold+0x5f/0x1ff
? refcount_warn_saturate+0xce/0x150
? report_bug+0x1ec/0x390
? handle_bug+0x58/0x90
? exc_invalid_op+0x13/0x40
? asm_exc_invalid_op+0x16/0x20
? refcount_warn_saturate+0xce/0x150
sock_map_free+0x2e5/0x330
bpf_map_free_deferred+0x173/0x320
process_one_work+0x846/0x1420
worker_thread+0x5b3/0xf80
kthread+0x29e/0x360
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x1a/0x30
</TASK>
irq event stamp: 10741
hardirqs last enabled at (10741): [<ffffffff84400ec6>] asm_sysvec_apic_timer_interrupt+0x16/0x20
hardirqs last disabled at (10740): [<ffffffff811e532d>] handle_softirqs+0x60d/0x770
softirqs last enabled at (10506): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
softirqs last disabled at (10301): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
refcount_t: underflow; use-after-free.
WARNING: CPU: 14 PID: 1063 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150
CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Tainted: G B W 6.12.0+ #125
Tainted: [B]=BAD_PAGE, [W]=WARN
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Workqueue: events_unbound bpf_map_free_deferred
RIP: 0010:refcount_warn_saturate+0xee/0x150
Code: 17 73 eb 03 01 e8 62 53 ad fe 0f 0b eb 91 80 3d 06 73 eb 03 00 75 88 48 c7 c7 e0 bd 95 84 c6 05 f6 72 eb 03 01 e8 42 53 ad fe <0f> 0b e9 6e ff ff ff 80 3d e6 72 eb 03 00 0f 85 61 ff ff ff 48 c7
RSP: 0018:ffff88815c49fc70 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88811f5b9100 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: 0000000000000003 R08: 0000000000000001 R09: ffffed10bcde6349
R10: ffff8885e6f31a4b R11: 0000000000000000 R12: ffff88813be0b000
R13: ffff88811f5b9100 R14: ffff88811f5b9080 R15: ffff88813be0b024
FS: 0000000000000000(0000) GS:ffff8885e6f00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055dda99b0250 CR3: 000000015dbac000 CR4: 0000000000752ef0
PKRU: 55555554
Call Trace:
<TASK>
? __warn.cold+0x5f/0x1ff
? refcount_warn_saturate+0xee/0x150
? report_bug+0x1ec/0x390
? handle_bug+0x58/0x90
? exc_invalid_op+0x13/0x40
? asm_exc_invalid_op+0x16/0x20
? refcount_warn_saturate+0xee/0x150
sock_map_free+0x2d3/0x330
bpf_map_free_deferred+0x173/0x320
process_one_work+0x846/0x1420
worker_thread+0x5b3/0xf80
kthread+0x29e/0x360
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x1a/0x30
</TASK>
irq event stamp: 10741
hardirqs last enabled at (10741): [<ffffffff84400ec6>] asm_sysvec_apic_timer_interrupt+0x16/0x20
hardirqs last disabled at (10740): [<ffffffff811e532d>] handle_softirqs+0x60d/0x770
softirqs last enabled at (10506): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
softirqs last disabled at (10301): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
net/core/sock_map.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 20b348b1964a10a1b0bfbe1a90a4a4cd99715b81..f1b9b3958792cd599efcb591742874e9b3f4a76b 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -412,12 +412,11 @@ static void *sock_map_lookup_sys(struct bpf_map *map, void *key)
static int __sock_map_delete(struct bpf_stab *stab, struct sock *sk_test,
struct sock **psk)
{
- struct sock *sk;
+ struct sock *sk = NULL;
int err = 0;
spin_lock_bh(&stab->lock);
- sk = *psk;
- if (!sk_test || sk_test == sk)
+ if (!sk_test || sk_test == *psk)
sk = xchg(psk, NULL);
if (likely(sk))
--
2.46.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* RE: [PATCH bpf 1/3] bpf, sockmap: Fix update element with same
2024-12-02 11:29 ` [PATCH bpf 1/3] bpf, sockmap: Fix update element with same Michal Luczaj
@ 2024-12-09 5:47 ` John Fastabend
2024-12-09 9:54 ` Michal Luczaj
0 siblings, 1 reply; 9+ messages in thread
From: John Fastabend @ 2024-12-09 5:47 UTC (permalink / raw)
To: Michal Luczaj, Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Shuah Khan, Jakub Sitnicki, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: bpf, linux-kselftest, netdev, Michal Luczaj
Michal Luczaj wrote:
> Consider a sockmap entry being updated with the same socket:
>
> osk = stab->sks[idx];
> sock_map_add_link(psock, link, map, &stab->sks[idx]);
> stab->sks[idx] = sk;
> if (osk)
> sock_map_unref(osk, &stab->sks[idx]);
>
> Due to sock_map_unref(), which invokes sock_map_del_link(), all the psock's
> links for stab->sks[idx] are torn:
>
> list_for_each_entry_safe(link, tmp, &psock->link, list) {
> if (link->link_raw == link_raw) {
> ...
> list_del(&link->list);
> sk_psock_free_link(link);
> }
> }
>
> And that includes the new link sock_map_add_link() added just before the
> unref.
>
> This results in a sockmap holding a socket, but without the respective
> link. This in turn means that close(sock) won't trigger the cleanup, i.e. a
> closed socket will not be automatically removed from the sockmap.
>
> Stop tearing the links when a matching link_raw is found.
>
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
Thanks. LGTM.
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
> net/core/sock_map.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 78347d7d25ef31525f8ec0a755a18e5793ad92c0..20b348b1964a10a1b0bfbe1a90a4a4cd99715b81 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -159,6 +159,7 @@ static void sock_map_del_link(struct sock *sk,
> verdict_stop = true;
> list_del(&link->list);
> sk_psock_free_link(link);
> + break;
> }
> }
> spin_unlock_bh(&psock->link_lock);
>
> --
> 2.46.2
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [PATCH bpf 2/3] selftest/bpf: Extend test for sockmap update with same
2024-12-02 11:29 ` [PATCH bpf 2/3] selftest/bpf: Extend test for sockmap update " Michal Luczaj
@ 2024-12-09 5:50 ` John Fastabend
0 siblings, 0 replies; 9+ messages in thread
From: John Fastabend @ 2024-12-09 5:50 UTC (permalink / raw)
To: Michal Luczaj, Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Shuah Khan, Jakub Sitnicki, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: bpf, linux-kselftest, netdev, Michal Luczaj
Michal Luczaj wrote:
> Verify that the sockmap link was not severed, and socket's entry is indeed
> removed from the map when the corresponding descriptor gets closed.
>
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [PATCH bpf 3/3] bpf, sockmap: Fix race between element replace and close()
2024-12-02 11:29 ` [PATCH bpf 3/3] bpf, sockmap: Fix race between element replace and close() Michal Luczaj
@ 2024-12-09 6:11 ` John Fastabend
0 siblings, 0 replies; 9+ messages in thread
From: John Fastabend @ 2024-12-09 6:11 UTC (permalink / raw)
To: Michal Luczaj, Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Shuah Khan, Jakub Sitnicki, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: bpf, linux-kselftest, netdev, Michal Luczaj
Michal Luczaj wrote:
> Element replace (with a socket different from the one stored) may race with
> socket's close() link popping & unlinking. __sock_map_delete()
> unconditionally unrefs the (wrong) element:
>
> // set map[0] = s0
> map_update_elem(map, 0, s0)
>
> // drop fd of s0
> close(s0)
> sock_map_close()
> lock_sock(sk) (s0!)
> sock_map_remove_links(sk)
> link = sk_psock_link_pop()
> sock_map_unlink(sk, link)
> sock_map_delete_from_link
> // replace map[0] with s1
> map_update_elem(map, 0, s1)
> sock_map_update_elem
> (s1!) lock_sock(sk)
> sock_map_update_common
> psock = sk_psock(sk)
> spin_lock(&stab->lock)
> osk = stab->sks[idx]
> sock_map_add_link(..., &stab->sks[idx])
> sock_map_unref(osk, &stab->sks[idx])
> psock = sk_psock(osk)
> sk_psock_put(sk, psock)
> if (refcount_dec_and_test(&psock))
> sk_psock_drop(sk, psock)
> spin_unlock(&stab->lock)
> unlock_sock(sk)
> __sock_map_delete
> spin_lock(&stab->lock)
> sk = *psk // s1 replaced s0; sk == s1
> if (!sk_test || sk_test == sk) // sk_test (s0) != sk (s1); no branch
> sk = xchg(psk, NULL)
> if (sk)
> sock_map_unref(sk, psk) // unref s1; sks[idx] will dangle
> psock = sk_psock(sk)
> sk_psock_put(sk, psock)
> if (refcount_dec_and_test())
> sk_psock_drop(sk, psock)
> spin_unlock(&stab->lock)
> release_sock(sk)
>
> Then close(map) enqueues bpf_map_free_deferred, which finally calls
> sock_map_free(). This results in some refcount_t warnings along with a
> KASAN splat[1].
>
[...]
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
> net/core/sock_map.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 20b348b1964a10a1b0bfbe1a90a4a4cd99715b81..f1b9b3958792cd599efcb591742874e9b3f4a76b 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -412,12 +412,11 @@ static void *sock_map_lookup_sys(struct bpf_map *map, void *key)
> static int __sock_map_delete(struct bpf_stab *stab, struct sock *sk_test,
> struct sock **psk)
> {
> - struct sock *sk;
> + struct sock *sk = NULL;
> int err = 0;
>
> spin_lock_bh(&stab->lock);
> - sk = *psk;
> - if (!sk_test || sk_test == sk)
> + if (!sk_test || sk_test == *psk)
> sk = xchg(psk, NULL);
>
> if (likely(sk))
>
> --
> 2.46.2
>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf 1/3] bpf, sockmap: Fix update element with same
2024-12-09 5:47 ` John Fastabend
@ 2024-12-09 9:54 ` Michal Luczaj
0 siblings, 0 replies; 9+ messages in thread
From: Michal Luczaj @ 2024-12-09 9:54 UTC (permalink / raw)
To: John Fastabend, Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Jakub Sitnicki, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: bpf, linux-kselftest, netdev
On 12/9/24 06:47, John Fastabend wrote:
> Michal Luczaj wrote:
>> Consider a sockmap entry being updated with the same socket:
>>
>> osk = stab->sks[idx];
>> sock_map_add_link(psock, link, map, &stab->sks[idx]);
>> stab->sks[idx] = sk;
>> if (osk)
>> sock_map_unref(osk, &stab->sks[idx]);
>>
>> Due to sock_map_unref(), which invokes sock_map_del_link(), all the psock's
>> links for stab->sks[idx] are torn:
>>
>> list_for_each_entry_safe(link, tmp, &psock->link, list) {
>> if (link->link_raw == link_raw) {
>> ...
>> list_del(&link->list);
>> sk_psock_free_link(link);
>> }
>> }
>>
>> And that includes the new link sock_map_add_link() added just before the
>> unref.
>>
>> This results in a sockmap holding a socket, but without the respective
>> link. This in turn means that close(sock) won't trigger the cleanup, i.e. a
>> closed socket will not be automatically removed from the sockmap.
>>
>> Stop tearing the links when a matching link_raw is found.
>>
>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
>> ---
>
> Thanks. LGTM.
>
> Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Thanks, and sorry for a missing tag:
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf 0/3] bpf, sockmap: Fix the element replace
2024-12-02 11:29 [PATCH bpf 0/3] bpf, sockmap: Fix the element replace Michal Luczaj
` (2 preceding siblings ...)
2024-12-02 11:29 ` [PATCH bpf 3/3] bpf, sockmap: Fix race between element replace and close() Michal Luczaj
@ 2024-12-10 16:50 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-12-10 16:50 UTC (permalink / raw)
To: Michal Luczaj
Cc: andrii, eddyz87, mykolal, ast, daniel, martin.lau, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
jakub, davem, edumazet, kuba, pabeni, horms, bpf, linux-kselftest,
netdev
Hello:
This series was applied to bpf/bpf.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:
On Mon, 02 Dec 2024 12:29:22 +0100 you wrote:
> Series takes care of two issues with sockmap update: inconsistent behaviour
> after update with same, and race/refcount imbalance on element replace.
>
> I am hesitant if patch 3/3 ("bpf, sockmap: Fix race between element replace
> and close()") is the right approach. I might have missed some detail of the
> current __sock_map_delete() implementation. I'd be grateful for comments,
> thanks.
>
> [...]
Here is the summary with links:
- [bpf,1/3] bpf, sockmap: Fix update element with same
https://git.kernel.org/bpf/bpf/c/75e072a390da
- [bpf,2/3] selftest/bpf: Extend test for sockmap update with same
https://git.kernel.org/bpf/bpf/c/11d5245f608f
- [bpf,3/3] bpf, sockmap: Fix race between element replace and close()
https://git.kernel.org/bpf/bpf/c/ed1fc5d76b81
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-12-10 16:50 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-02 11:29 [PATCH bpf 0/3] bpf, sockmap: Fix the element replace Michal Luczaj
2024-12-02 11:29 ` [PATCH bpf 1/3] bpf, sockmap: Fix update element with same Michal Luczaj
2024-12-09 5:47 ` John Fastabend
2024-12-09 9:54 ` Michal Luczaj
2024-12-02 11:29 ` [PATCH bpf 2/3] selftest/bpf: Extend test for sockmap update " Michal Luczaj
2024-12-09 5:50 ` John Fastabend
2024-12-02 11:29 ` [PATCH bpf 3/3] bpf, sockmap: Fix race between element replace and close() Michal Luczaj
2024-12-09 6:11 ` John Fastabend
2024-12-10 16:50 ` [PATCH bpf 0/3] bpf, sockmap: Fix the element replace patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).