Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Vadim Goncharov <vadimnuclight@gmail.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>, xdp-newbies@vger.kernel.org
Subject: Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete)
Date: Fri, 25 Oct 2024 00:18:19 +0300	[thread overview]
Message-ID: <20241025001819.2475ec77@nuclight.lan> (raw)
In-Reply-To: <875xphftdq.fsf@toke.dk>

On Thu, 24 Oct 2024 16:49:37 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Vadim Goncharov <vadimnuclight@gmail.com> writes:
> 
> > On Wed, 23 Oct 2024 14:10:11 +0200
> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >  
> >> Vadim Goncharov <vadimnuclight@gmail.com> writes:
> >>   
> >> > Hello,
> >> >
> >> > Where to find exact documentation about what happens in kernel
> >> > BPF helpers calls with respect to locking? For example, I have
> >> > `bpf_map_lookup_elem()` in one thread, then work on pointer, and
> >> > at this time, another thread does `bpf_map_delete_elem()` for
> >> > exactly same key. What happens to memory the first thread still
> >> > continue to work on? Is it now dangling pointer to nowhere?
> >> >
> >> > In my particular case it's a bpf_timer callback who does
> >> > `bpf_map_delete_elem()`. I'd prefer for it to not delete entry if
> >> > another thread did `lookup` and works already, is it possible to
> >> > do so (in a performant way)?    
> >> 
> >> Map elements are RCU protected,   
> >
> > I see this phrase everywhere, but always without details, whether
> > it's about just map structures itself (OK, this minimum is
> > guaranteed) or the user data, too.  
> 
> The user data in a map item is allocated together with the
> kernel-internal bits, so this goes for everything.

OK, got it.

> >> so you already get exactly the
> >> behaviour you're after: if another thread deletes a map element
> >> that you already looked up, the element is guaranteed to stick
> >> around in memory until the BPF program exits.
> >> 
> >> It won't be valid anymore *after* that of course, so if you're
> >> doing concurrent updates it's you own responsibility to sync
> >> appropriately. But there is no risk of the pointer suddenly being
> >> invalid in the middle of the program execution (so no crashes) :)  
> >
> > OK, no crash is half of good, thanks. But how'd I lock it from
> > deletion? A "concurrent updates" are usually about memory area that
> > still persist as a whole, but in my particular case it's a
> > bpf_timer callback who does bpf_map_delete_elem(). This is a
> > conntrack-like map - an entry contains some information about
> > connection, a struct bpf_timer and expire field when this entry
> > should be deleted due to inactivity, based on information in a
> > connection (thus *_LRU map types are not suitable for me). So it's
> > possible for a race condition here:
> >
> > 1.  Thread 1 receives packet, does bpf_map_lookup_elem() and begins
> >     processing, at end of which bpf_timer_start() will be called to
> >     reschedule removal into future.
> >
> > 2.  At moment after lookup, timer callback fires and does
> >     bpf_map_delete_elem() while first thread is not yet finished.
> >
> > So how do I prevent connection record loss in such scenario?  
> 
> Yeah, there is no protection against this; you will have to do your
> own locking to prevent it. Something like putting a spinlock into
> your data structure and using that to synchronise access, maybe?

Well, if I'll put bpf_spin_lock into map element, then it looks like
the following scenario is possible:

1. Thread 1 receives packet, does bpf_map_lookup_elem(map, key)

     1a. Timer callback(map, same_key) does bpf_spin_lock on same_key

2. Thread 2 does bpf_spin_lock on same_key and waits.

     2a. Timer callback sees lock successful, sets DELETED flag in entry
         and bpf_map_delete_elem()'s it.

3. Thread 1 unlocked and either updates map entry directly by pointer,
    or reinserts new entry reinitializing timer.

Am I missing some race condition still present here? atomics or something

-- 
WBR, @nuclight

next prev parent reply	other threads:[~2024-10-24 21:18 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-23 11:54 XDP/eBPF map thread safety in kernel (e.g. lookup/delete) Vadim Goncharov
2024-10-23 12:10 ` Toke Høiland-Jørgensen
2024-10-23 12:31   ` Vadim Goncharov
     [not found]   ` <20241023152810.42936dc4@nuclight.lan>
     [not found]     ` <875xphftdq.fsf@toke.dk>
2024-10-24 21:18       ` Vadim Goncharov [this message]
2024-10-25 11:06         ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241025001819.2475ec77@nuclight.lan \
    --to=vadimnuclight@gmail.com \
    --cc=toke@redhat.com \
    --cc=xdp-newbies@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.