From: Vadim Goncharov <vadimnuclight@gmail.com>
To: bpf@vger.kernel.org
Subject: map records expiration problem / multi-references (conntrack)
Date: Thu, 10 Oct 2024 22:47:08 +0300
Message-ID: <20241010224708.67f18726@nuclight.lan>
Hello,
I am trying to implement a somewhat relaxed version of TCP connection
tracking in XDP/eBPF (to defend against DDoS attacks). To do it
correctly, records must expire after different timeouts, e.g. 20 seconds
for the SYN state, 1 minute for the established state, and 10 seconds
for FIN/RST.
Using the *_LRU map variants is NOT an option: since this is anti-DDoS,
an attacker could evict legitimate connections with fresh ones, because
those maps offer no explicit control over the expiration policy.
In a classic programming environment this is simple: a conntrack record,
in addition to a `when_expire_unixtime` field, would have a LIST_ENTRY
and, whenever its expiration time changes, be relinked from the old
time's list to the new one, with locks held on the record and on both
list heads. Then a per-second timer cleans up entire lists whose time is
in the past.
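For comparison, the classic design from the previous paragraph might look roughly like this in plain userspace C. This is a minimal single-threaded sketch. The per-time lists are assumed to be a ring of one-second buckets (a simple timing wheel). `enum ct_state`, `ct_timeout()`, `conn_touch()` and `expire_tick()` are hypothetical names. The locking mentioned above is omitted.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Timeouts from the text above; the state names are illustrative. */
enum ct_state { CT_SYN, CT_ESTABLISHED, CT_FIN_RST };

static uint64_t ct_timeout(enum ct_state s)
{
    switch (s) {
    case CT_SYN:         return 20;
    case CT_ESTABLISHED: return 60;
    default:             return 10;   /* CT_FIN_RST */
    }
}

/* One-second buckets; must exceed the longest timeout (60 s) so that at
 * tick T the bucket T % NBUCKETS only holds records due at T (or earlier). */
#define NBUCKETS 64

struct conn {
    uint64_t when_expire_unixtime;   /* 0 means "not linked anywhere" */
    LIST_ENTRY(conn) exp_link;       /* linkage in exactly one bucket */
    /* 4-tuple, state, counters, ... */
};

LIST_HEAD(conn_list, conn);
static struct conn_list buckets[NBUCKETS];   /* zero-initialised = empty */

/* Called on every state change: unlink from the old time's list and link
 * into the new one. A multi-threaded version would hold the record's lock
 * and both bucket locks here; this sketch is single-threaded. */
static void conn_touch(struct conn *c, enum ct_state s, uint64_t now)
{
    uint64_t when = now + ct_timeout(s);

    if (c->when_expire_unixtime != 0)
        LIST_REMOVE(c, exp_link);
    c->when_expire_unixtime = when;
    LIST_INSERT_HEAD(&buckets[when % NBUCKETS], c, exp_link);
}

/* Per-second timer body: free every record in the bucket for `now` whose
 * time is not in the future. Returns the number of records freed. */
static int expire_tick(uint64_t now)
{
    struct conn *c = LIST_FIRST(&buckets[now % NBUCKETS]);
    int freed = 0;

    while (c != NULL) {
        struct conn *next = LIST_NEXT(c, exp_link);   /* read before free */

        if (c->when_expire_unixtime <= now) {
            LIST_REMOVE(c, exp_link);
            free(c);
            freed++;
        }
        c = next;
    }
    return freed;
}
```

A caller would invoke `conn_touch()` on every state transition and `expire_tick(now)` once per second. Each tick walks only the records due in that second, not the whole table.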
But not in XDP/eBPF. I've run into multiple problems while trying
different ideas.
First, let's assume 100 million conntrack records. We can't have a
`bpf_timer` instance in every record, as that would not scale to 100M.
So we still need a single timer, as in the classic variant.
And there are no linked lists in eBPF, and no pointers from multiple
maps to the same object, so I came up with the idea to (ab)use LPM_TRIE
as an "index" by time and 4-tuple, with the value being a bitset of the
main maps (TCP, UDP, ...) in which records should be expired. Then I
found that:
* I can't hold `bpf_spin_lock` on several maps at once, yet values can
be modified by several threads in parallel (both the old and the new
LPM values have to be modified)
* `bpf_map_get_next_key()` is not available to BPF programs in the
kernel (only via the syscall)! So a single BPF timer callback can't
fetch just the needed records in a loop.
* the kernel helper `bpf_for_each_map_elem` is unavailable for
LPM_TRIE, only for array/hash maps. This is very strange, as having a
get_next_key implementation makes it trivial to implement for_each for
*any* map type.
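To illustrate the last point, here is a userspace C model of a generic for_each built only from "get next key" and "lookup". It is a sketch under stated assumptions. `struct map_ops` and `map_for_each()` are hypothetical, loosely modelled on the `map_get_next_key`/`map_lookup_elem` callbacks of the kernel's `struct bpf_map_ops`. This is not how `bpf_for_each_map_elem` is actually implemented. It also ignores RCU and concurrency beyond skipping keys that vanish mid-walk.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEY_SIZE 64

/* Hypothetical per-map-type operations (not a real kernel structure). */
struct map_ops {
    /* Writes the key after `key` (the first key if `key` is NULL) into
     * `next_key`; returns 0, or -ENOENT when there are no more keys. */
    int (*get_next_key)(void *map, const void *key, void *next_key);
    /* Returns a pointer to the value for `key`, or NULL if absent. */
    void *(*lookup)(void *map, const void *key);
};

/* Return 0 to continue, nonzero to stop (bpf_for_each_map_elem callbacks
 * likewise return 0 to continue and 1 to stop). */
typedef int (*for_each_cb)(const void *key, void *value, void *ctx);

/* Generic for_each built only on get_next_key + lookup.
 * Returns the number of elements passed to cb, or -EINVAL. */
static int map_for_each(void *map, const struct map_ops *ops, size_t key_size,
                        for_each_cb cb, void *ctx)
{
    /* 8-byte aligned buffers so ops may read keys as integers */
    uint64_t cur[MAX_KEY_SIZE / sizeof(uint64_t)];
    uint64_t next[MAX_KEY_SIZE / sizeof(uint64_t)];
    const void *prev = NULL;   /* NULL requests the first key */
    int visited = 0;

    if (key_size == 0 || key_size > MAX_KEY_SIZE)
        return -EINVAL;

    while (ops->get_next_key(map, prev, next) == 0) {
        void *value;

        memcpy(cur, next, key_size);
        prev = cur;
        value = ops->lookup(map, cur);
        if (value == NULL)      /* removed since get_next_key: skip it */
            continue;
        visited++;
        if (cb(cur, value, ctx) != 0)
            break;
    }
    return visited;
}
```

One caveat with this style of iteration is deletion from inside the callback. Some map types, for example the hash map, restart from the first key when asked for the successor of a key that no longer exists.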
So *userland* has to clean up those records, but doing that through
syscalls performs much worse; and `BPF_MAP_TYPE_RINGBUF` is no help here
either...
The question is: how do I implement expiration properly in eBPF/XDP?
Is there anything I missed?
--
WBR, @nuclight