* XDP/eBPF map thread safety in kernel (e.g. lookup/delete)
@ 2024-10-23 11:54 Vadim Goncharov
2024-10-23 12:10 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 5+ messages in thread
From: Vadim Goncharov @ 2024-10-23 11:54 UTC (permalink / raw)
To: xdp-newbies
Hello,
Where to find exact documentation about what happens in kernel BPF
helpers calls with respect to locking? For example, I have
`bpf_map_lookup_elem()` in one thread, then work on pointer, and at this
time, another thread does `bpf_map_delete_elem()` for exactly same key.
What happens to memory the first thread still continue to work on? Is
it now dangling pointer to nowhere?
In my particular case it's a bpf_timer callback who does
`bpf_map_delete_elem()`. I'd prefer for it to not delete entry if
another thread did `lookup` and works already, is it possible to do so
(in a performant way)?
--
WBR, @nuclight
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete)
2024-10-23 11:54 XDP/eBPF map thread safety in kernel (e.g. lookup/delete) Vadim Goncharov
@ 2024-10-23 12:10 ` Toke Høiland-Jørgensen
2024-10-23 12:31 ` Vadim Goncharov
[not found] ` <20241023152810.42936dc4@nuclight.lan>
0 siblings, 2 replies; 5+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-10-23 12:10 UTC (permalink / raw)
To: Vadim Goncharov, xdp-newbies
Vadim Goncharov <vadimnuclight@gmail.com> writes:
> Hello,
>
> Where to find exact documentation about what happens in kernel BPF
> helpers calls with respect to locking? For example, I have
> `bpf_map_lookup_elem()` in one thread, then work on pointer, and at this
> time, another thread does `bpf_map_delete_elem()` for exactly same key.
> What happens to memory the first thread still continue to work on? Is
> it now dangling pointer to nowhere?
>
> In my particular case it's a bpf_timer callback who does
> `bpf_map_delete_elem()`. I'd prefer for it to not delete entry if
> another thread did `lookup` and works already, is it possible to do so
> (in a performant way)?
Map elements are RCU protected, so you already get exactly the behaviour
you're after: if another thread deletes a map element that you already
looked up, the element is guaranteed to stick around in memory until the
BPF program exits.
It won't be valid anymore *after* that of course, so if you're doing
concurrent updates it's you own responsibility to sync appropriately.
But there is no risk of the pointer suddenly being invalid in the middle
of the program execution (so no crashes) :)
-Toke
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete)
2024-10-23 12:10 ` Toke Høiland-Jørgensen
@ 2024-10-23 12:31 ` Vadim Goncharov
[not found] ` <20241023152810.42936dc4@nuclight.lan>
1 sibling, 0 replies; 5+ messages in thread
From: Vadim Goncharov @ 2024-10-23 12:31 UTC (permalink / raw)
To: xdp-newbies
On Wed, 23 Oct 2024 14:10:11 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> Vadim Goncharov <vadimnuclight@gmail.com> writes:
>
> > Hello,
> >
> > Where to find exact documentation about what happens in kernel BPF
> > helpers calls with respect to locking? For example, I have
> > `bpf_map_lookup_elem()` in one thread, then work on pointer, and at
> > this time, another thread does `bpf_map_delete_elem()` for exactly
> > same key. What happens to memory the first thread still continue to
> > work on? Is it now dangling pointer to nowhere?
> >
> > In my particular case it's a bpf_timer callback who does
> > `bpf_map_delete_elem()`. I'd prefer for it to not delete entry if
> > another thread did `lookup` and works already, is it possible to do
> > so (in a performant way)?
>
> Map elements are RCU protected,
I see this phrase everywhere, but always without details, whether it's
about just map structures itself (OK, this minimum is guaranteed) or
the user data, too.
> so you already get exactly the
> behaviour you're after: if another thread deletes a map element that
> you already looked up, the element is guaranteed to stick around in
> memory until the BPF program exits.
>
> It won't be valid anymore *after* that of course, so if you're doing
> concurrent updates it's you own responsibility to sync appropriately.
> But there is no risk of the pointer suddenly being invalid in the
> middle of the program execution (so no crashes) :)
OK, no crash is half of good, thanks. But how'd I lock it from
deletion? A "concurrent updates" are usually about memory area that
still persist as a whole, but in my particular case it's a
bpf_timer callback who does bpf_map_delete_elem(). This is a
conntrack-like map - an entry contains some information about
connection, a struct bpf_timer and expire field when this entry should
be deleted due to inactivity, based on information in a connection
(thus *_LRU map types are not suitable for me). So it's possible for a
race condition here:
1. Thread 1 receives packet, does bpf_map_lookup_elem() and begins
processing, at end of which bpf_timer_start() will be called to
reschedule removal into future.
2. At moment after lookup, timer callback fires and does
bpf_map_delete_elem() while first thread is not yet finished.
So how do I prevent connection record loss in such scenario?
--
WBR, @nuclight
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete)
[not found] ` <875xphftdq.fsf@toke.dk>
@ 2024-10-24 21:18 ` Vadim Goncharov
2024-10-25 11:06 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 5+ messages in thread
From: Vadim Goncharov @ 2024-10-24 21:18 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, xdp-newbies
On Thu, 24 Oct 2024 16:49:37 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> Vadim Goncharov <vadimnuclight@gmail.com> writes:
>
> > On Wed, 23 Oct 2024 14:10:11 +0200
> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> >> Vadim Goncharov <vadimnuclight@gmail.com> writes:
> >>
> >> > Hello,
> >> >
> >> > Where to find exact documentation about what happens in kernel
> >> > BPF helpers calls with respect to locking? For example, I have
> >> > `bpf_map_lookup_elem()` in one thread, then work on pointer, and
> >> > at this time, another thread does `bpf_map_delete_elem()` for
> >> > exactly same key. What happens to memory the first thread still
> >> > continue to work on? Is it now dangling pointer to nowhere?
> >> >
> >> > In my particular case it's a bpf_timer callback who does
> >> > `bpf_map_delete_elem()`. I'd prefer for it to not delete entry if
> >> > another thread did `lookup` and works already, is it possible to
> >> > do so (in a performant way)?
> >>
> >> Map elements are RCU protected,
> >
> > I see this phrase everywhere, but always without details, whether
> > it's about just map structures itself (OK, this minimum is
> > guaranteed) or the user data, too.
>
> The user data in a map item is allocated together with the
> kernel-internal bits, so this goes for everything.
OK, got it.
> >> so you already get exactly the
> >> behaviour you're after: if another thread deletes a map element
> >> that you already looked up, the element is guaranteed to stick
> >> around in memory until the BPF program exits.
> >>
> >> It won't be valid anymore *after* that of course, so if you're
> >> doing concurrent updates it's you own responsibility to sync
> >> appropriately. But there is no risk of the pointer suddenly being
> >> invalid in the middle of the program execution (so no crashes) :)
> >
> > OK, no crash is half of good, thanks. But how'd I lock it from
> > deletion? A "concurrent updates" are usually about memory area that
> > still persist as a whole, but in my particular case it's a
> > bpf_timer callback who does bpf_map_delete_elem(). This is a
> > conntrack-like map - an entry contains some information about
> > connection, a struct bpf_timer and expire field when this entry
> > should be deleted due to inactivity, based on information in a
> > connection (thus *_LRU map types are not suitable for me). So it's
> > possible for a race condition here:
> >
> > 1. Thread 1 receives packet, does bpf_map_lookup_elem() and begins
> > processing, at end of which bpf_timer_start() will be called to
> > reschedule removal into future.
> >
> > 2. At moment after lookup, timer callback fires and does
> > bpf_map_delete_elem() while first thread is not yet finished.
> >
> > So how do I prevent connection record loss in such scenario?
>
> Yeah, there is no protection against this; you will have to do your
> own locking to prevent it. Something like putting a spinlock into
> your data structure and using that to synchronise access, maybe?
Well, if I'll put bpf_spin_lock into map element, then it looks like
the following scenario is possible:
1. Thread 1 receives packet, does bpf_map_lookup_elem(map, key)
1a. Timer callback(map, same_key) does bpf_spin_lock on same_key
2. Thread 2 does bpf_spin_lock on same_key and waits.
2a. Timer callback sees lock successful, sets DELETED flag in entry
and bpf_map_delete_elem()'s it.
3. Thread 1 unlocked and either updates map entry directly by pointer,
or reinserts new entry reinitializing timer.
Am I missing some race condition still present here? atomics or something
--
WBR, @nuclight
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete)
2024-10-24 21:18 ` Vadim Goncharov
@ 2024-10-25 11:06 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 5+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-10-25 11:06 UTC (permalink / raw)
To: Vadim Goncharov, xdp-newbies
Vadim Goncharov <vadimnuclight@gmail.com> writes:
>> > OK, no crash is half of good, thanks. But how'd I lock it from
>> > deletion? A "concurrent updates" are usually about memory area that
>> > still persist as a whole, but in my particular case it's a
>> > bpf_timer callback who does bpf_map_delete_elem(). This is a
>> > conntrack-like map - an entry contains some information about
>> > connection, a struct bpf_timer and expire field when this entry
>> > should be deleted due to inactivity, based on information in a
>> > connection (thus *_LRU map types are not suitable for me). So it's
>> > possible for a race condition here:
>> >
>> > 1. Thread 1 receives packet, does bpf_map_lookup_elem() and begins
>> > processing, at end of which bpf_timer_start() will be called to
>> > reschedule removal into future.
>> >
>> > 2. At moment after lookup, timer callback fires and does
>> > bpf_map_delete_elem() while first thread is not yet finished.
>> >
>> > So how do I prevent connection record loss in such scenario?
>>
>> Yeah, there is no protection against this; you will have to do your
>> own locking to prevent it. Something like putting a spinlock into
>> your data structure and using that to synchronise access, maybe?
>
> Well, if I'll put bpf_spin_lock into map element, then it looks like
> the following scenario is possible:
>
> 1. Thread 1 receives packet, does bpf_map_lookup_elem(map, key)
>
> 1a. Timer callback(map, same_key) does bpf_spin_lock on same_key
>
> 2. Thread 2 does bpf_spin_lock on same_key and waits.
>
> 2a. Timer callback sees lock successful, sets DELETED flag in entry
> and bpf_map_delete_elem()'s it.
>
> 3. Thread 1 unlocked and either updates map entry directly by pointer,
> or reinserts new entry reinitializing timer.
Yeah, something like this.
> Am I missing some race condition still present here? atomics or
> something
Unfortunately, I think the delete and update can still race. Deletion is
based on the key only, there is no cmpxchg() semantics. So I don't see
any reason why this sequence of events couldn't happen:
- T2 sets the DELETED flag
- T2 releases the lock, and then gets preempted
- T1 grabs the lock, sees the deleted flag, allocates a new item and
inserts it with update()
- T2 wakes back up and does a bpf_map_delete() with the expired key,
which then ends up deleting the new entry
The only way I can think of to avoid this, is if the timer callback uses
bpf_map_lookup_and_delete_item(), then rechecks the value and re-inserts
it if the DELETED flag disappeared between operations. But this still
leaves a window where the entry doesn't exist.
Maybe someone else has a good idea for how to avoid this being racy?
-Toke
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-10-25 11:07 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-23 11:54 XDP/eBPF map thread safety in kernel (e.g. lookup/delete) Vadim Goncharov
2024-10-23 12:10 ` Toke Høiland-Jørgensen
2024-10-23 12:31 ` Vadim Goncharov
[not found] ` <20241023152810.42936dc4@nuclight.lan>
[not found] ` <875xphftdq.fsf@toke.dk>
2024-10-24 21:18 ` Vadim Goncharov
2024-10-25 11:06 ` Toke Høiland-Jørgensen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox