* XDP/eBPF map thread safety in kernel (e.g. lookup/delete) @ 2024-10-23 11:54 Vadim Goncharov 2024-10-23 12:10 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 5+ messages in thread From: Vadim Goncharov @ 2024-10-23 11:54 UTC (permalink / raw) To: xdp-newbies Hello, Where to find exact documentation about what happens in kernel BPF helpers calls with respect to locking? For example, I have `bpf_map_lookup_elem()` in one thread, then work on pointer, and at this time, another thread does `bpf_map_delete_elem()` for exactly same key. What happens to memory the first thread still continue to work on? Is it now dangling pointer to nowhere? In my particular case it's a bpf_timer callback who does `bpf_map_delete_elem()`. I'd prefer for it to not delete entry if another thread did `lookup` and works already, is it possible to do so (in a performant way)? -- WBR, @nuclight ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete) 2024-10-23 11:54 XDP/eBPF map thread safety in kernel (e.g. lookup/delete) Vadim Goncharov @ 2024-10-23 12:10 ` Toke Høiland-Jørgensen 2024-10-23 12:31 ` Vadim Goncharov [not found] ` <20241023152810.42936dc4@nuclight.lan> 0 siblings, 2 replies; 5+ messages in thread From: Toke Høiland-Jørgensen @ 2024-10-23 12:10 UTC (permalink / raw) To: Vadim Goncharov, xdp-newbies Vadim Goncharov <vadimnuclight@gmail.com> writes: > Hello, > > Where to find exact documentation about what happens in kernel BPF > helpers calls with respect to locking? For example, I have > `bpf_map_lookup_elem()` in one thread, then work on pointer, and at this > time, another thread does `bpf_map_delete_elem()` for exactly same key. > What happens to memory the first thread still continue to work on? Is > it now dangling pointer to nowhere? > > In my particular case it's a bpf_timer callback who does > `bpf_map_delete_elem()`. I'd prefer for it to not delete entry if > another thread did `lookup` and works already, is it possible to do so > (in a performant way)? Map elements are RCU protected, so you already get exactly the behaviour you're after: if another thread deletes a map element that you already looked up, the element is guaranteed to stick around in memory until the BPF program exits. It won't be valid anymore *after* that of course, so if you're doing concurrent updates it's you own responsibility to sync appropriately. But there is no risk of the pointer suddenly being invalid in the middle of the program execution (so no crashes) :) -Toke ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete) 2024-10-23 12:10 ` Toke Høiland-Jørgensen @ 2024-10-23 12:31 ` Vadim Goncharov [not found] ` <20241023152810.42936dc4@nuclight.lan> 1 sibling, 0 replies; 5+ messages in thread From: Vadim Goncharov @ 2024-10-23 12:31 UTC (permalink / raw) To: xdp-newbies On Wed, 23 Oct 2024 14:10:11 +0200 Toke Høiland-Jørgensen <toke@redhat.com> wrote: > Vadim Goncharov <vadimnuclight@gmail.com> writes: > > > Hello, > > > > Where to find exact documentation about what happens in kernel BPF > > helpers calls with respect to locking? For example, I have > > `bpf_map_lookup_elem()` in one thread, then work on pointer, and at > > this time, another thread does `bpf_map_delete_elem()` for exactly > > same key. What happens to memory the first thread still continue to > > work on? Is it now dangling pointer to nowhere? > > > > In my particular case it's a bpf_timer callback who does > > `bpf_map_delete_elem()`. I'd prefer for it to not delete entry if > > another thread did `lookup` and works already, is it possible to do > > so (in a performant way)? > > Map elements are RCU protected, I see this phrase everywhere, but always without details, whether it's about just map structures itself (OK, this minimum is guaranteed) or the user data, too. > so you already get exactly the > behaviour you're after: if another thread deletes a map element that > you already looked up, the element is guaranteed to stick around in > memory until the BPF program exits. > > It won't be valid anymore *after* that of course, so if you're doing > concurrent updates it's you own responsibility to sync appropriately. > But there is no risk of the pointer suddenly being invalid in the > middle of the program execution (so no crashes) :) OK, no crash is half of good, thanks. But how'd I lock it from deletion? A "concurrent updates" are usually about memory area that still persist as a whole, but in my particular case it's a bpf_timer callback who does bpf_map_delete_elem(). This is a conntrack-like map - an entry contains some information about connection, a struct bpf_timer and expire field when this entry should be deleted due to inactivity, based on information in a connection (thus *_LRU map types are not suitable for me). So it's possible for a race condition here: 1. Thread 1 receives packet, does bpf_map_lookup_elem() and begins processing, at end of which bpf_timer_start() will be called to reschedule removal into future. 2. At moment after lookup, timer callback fires and does bpf_map_delete_elem() while first thread is not yet finished. So how do I prevent connection record loss in such scenario? -- WBR, @nuclight ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <20241023152810.42936dc4@nuclight.lan>]
[parent not found: <875xphftdq.fsf@toke.dk>]
* Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete) [not found] ` <875xphftdq.fsf@toke.dk> @ 2024-10-24 21:18 ` Vadim Goncharov 2024-10-25 11:06 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 5+ messages in thread From: Vadim Goncharov @ 2024-10-24 21:18 UTC (permalink / raw) To: Toke Høiland-Jørgensen, xdp-newbies On Thu, 24 Oct 2024 16:49:37 +0200 Toke Høiland-Jørgensen <toke@redhat.com> wrote: > Vadim Goncharov <vadimnuclight@gmail.com> writes: > > > On Wed, 23 Oct 2024 14:10:11 +0200 > > Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > >> Vadim Goncharov <vadimnuclight@gmail.com> writes: > >> > >> > Hello, > >> > > >> > Where to find exact documentation about what happens in kernel > >> > BPF helpers calls with respect to locking? For example, I have > >> > `bpf_map_lookup_elem()` in one thread, then work on pointer, and > >> > at this time, another thread does `bpf_map_delete_elem()` for > >> > exactly same key. What happens to memory the first thread still > >> > continue to work on? Is it now dangling pointer to nowhere? > >> > > >> > In my particular case it's a bpf_timer callback who does > >> > `bpf_map_delete_elem()`. I'd prefer for it to not delete entry if > >> > another thread did `lookup` and works already, is it possible to > >> > do so (in a performant way)? > >> > >> Map elements are RCU protected, > > > > I see this phrase everywhere, but always without details, whether > > it's about just map structures itself (OK, this minimum is > > guaranteed) or the user data, too. > > The user data in a map item is allocated together with the > kernel-internal bits, so this goes for everything. OK, got it. > >> so you already get exactly the > >> behaviour you're after: if another thread deletes a map element > >> that you already looked up, the element is guaranteed to stick > >> around in memory until the BPF program exits. > >> > >> It won't be valid anymore *after* that of course, so if you're > >> doing concurrent updates it's you own responsibility to sync > >> appropriately. But there is no risk of the pointer suddenly being > >> invalid in the middle of the program execution (so no crashes) :) > > > > OK, no crash is half of good, thanks. But how'd I lock it from > > deletion? A "concurrent updates" are usually about memory area that > > still persist as a whole, but in my particular case it's a > > bpf_timer callback who does bpf_map_delete_elem(). This is a > > conntrack-like map - an entry contains some information about > > connection, a struct bpf_timer and expire field when this entry > > should be deleted due to inactivity, based on information in a > > connection (thus *_LRU map types are not suitable for me). So it's > > possible for a race condition here: > > > > 1. Thread 1 receives packet, does bpf_map_lookup_elem() and begins > > processing, at end of which bpf_timer_start() will be called to > > reschedule removal into future. > > > > 2. At moment after lookup, timer callback fires and does > > bpf_map_delete_elem() while first thread is not yet finished. > > > > So how do I prevent connection record loss in such scenario? > > Yeah, there is no protection against this; you will have to do your > own locking to prevent it. Something like putting a spinlock into > your data structure and using that to synchronise access, maybe? Well, if I'll put bpf_spin_lock into map element, then it looks like the following scenario is possible: 1. Thread 1 receives packet, does bpf_map_lookup_elem(map, key) 1a. Timer callback(map, same_key) does bpf_spin_lock on same_key 2. Thread 2 does bpf_spin_lock on same_key and waits. 2a. Timer callback sees lock successful, sets DELETED flag in entry and bpf_map_delete_elem()'s it. 3. Thread 1 unlocked and either updates map entry directly by pointer, or reinserts new entry reinitializing timer. Am I missing some race condition still present here? atomics or something -- WBR, @nuclight ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete) 2024-10-24 21:18 ` Vadim Goncharov @ 2024-10-25 11:06 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 5+ messages in thread From: Toke Høiland-Jørgensen @ 2024-10-25 11:06 UTC (permalink / raw) To: Vadim Goncharov, xdp-newbies Vadim Goncharov <vadimnuclight@gmail.com> writes: >> > OK, no crash is half of good, thanks. But how'd I lock it from >> > deletion? A "concurrent updates" are usually about memory area that >> > still persist as a whole, but in my particular case it's a >> > bpf_timer callback who does bpf_map_delete_elem(). This is a >> > conntrack-like map - an entry contains some information about >> > connection, a struct bpf_timer and expire field when this entry >> > should be deleted due to inactivity, based on information in a >> > connection (thus *_LRU map types are not suitable for me). So it's >> > possible for a race condition here: >> > >> > 1. Thread 1 receives packet, does bpf_map_lookup_elem() and begins >> > processing, at end of which bpf_timer_start() will be called to >> > reschedule removal into future. >> > >> > 2. At moment after lookup, timer callback fires and does >> > bpf_map_delete_elem() while first thread is not yet finished. >> > >> > So how do I prevent connection record loss in such scenario? >> >> Yeah, there is no protection against this; you will have to do your >> own locking to prevent it. Something like putting a spinlock into >> your data structure and using that to synchronise access, maybe? > > Well, if I'll put bpf_spin_lock into map element, then it looks like > the following scenario is possible: > > 1. Thread 1 receives packet, does bpf_map_lookup_elem(map, key) > > 1a. Timer callback(map, same_key) does bpf_spin_lock on same_key > > 2. Thread 2 does bpf_spin_lock on same_key and waits. > > 2a. Timer callback sees lock successful, sets DELETED flag in entry > and bpf_map_delete_elem()'s it. > > 3. Thread 1 unlocked and either updates map entry directly by pointer, > or reinserts new entry reinitializing timer. Yeah, something like this. > Am I missing some race condition still present here? atomics or > something Unfortunately, I think the delete and update can still race. Deletion is based on the key only, there is no cmpxchg() semantics. So I don't see any reason why this sequence of events couldn't happen: - T2 sets the DELETED flag - T2 releases the lock, and then gets preempted - T1 grabs the lock, sees the deleted flag, allocates a new item and inserts it with update() - T2 wakes back up and does a bpf_map_delete() with the expired key, which then ends up deleting the new entry The only way I can think of to avoid this, is if the timer callback uses bpf_map_lookup_and_delete_item(), then rechecks the value and re-inserts it if the DELETED flag disappeared between operations. But this still leaves a window where the entry doesn't exist. Maybe someone else has a good idea for how to avoid this being racy? -Toke ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-10-25 11:07 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-23 11:54 XDP/eBPF map thread safety in kernel (e.g. lookup/delete) Vadim Goncharov
2024-10-23 12:10 ` Toke Høiland-Jørgensen
2024-10-23 12:31 ` Vadim Goncharov
[not found] ` <20241023152810.42936dc4@nuclight.lan>
[not found] ` <875xphftdq.fsf@toke.dk>
2024-10-24 21:18 ` Vadim Goncharov
2024-10-25 11:06 ` Toke Høiland-Jørgensen
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.