Re: [RFC PATCH] bpf: introduce new bpf map type BPF_MAP_TYPE_WILDCARD

BPF List
 help / color / mirror / Atom feed

From: sdf@google.com
To: Anton Protopopov <aspsk@isovalent.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	John Fastabend <john.fastabend@gmail.com>,
	Martynas Pumputis <m@lambda.lt>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	Eric Torng <torng@msu.edu>
Subject: Re: [RFC PATCH] bpf: introduce new bpf map type BPF_MAP_TYPE_WILDCARD
Date: Wed, 7 Sep 2022 16:09:17 -0700	[thread overview]
Message-ID: <YxkknQJC1vWmU/o9@google.com> (raw)
In-Reply-To: <20220907080140.290413-1-aspsk@isovalent.com>

[sorry for the noise, got a bounce from the list, resend with the
message trimmed]

On 09/07, Anton Protopopov wrote:
> Add a new map, BPF_MAP_TYPE_WILDCARD, which provides means to implement  
> generic
> online packet classification. Here "online" stands for "fast lookups and  
> fast
> updates", and "generic" means that a user can create maps with custom  
> lookup
> schemes—different numbers of fields and different interpretation of  
> individual
> fields (prefix bitmasks, ranges, and direct matches).

> In particular, in Cilium we have several use cases for such a map:

>    * XDP Prefilter is a simple XDP-based DDoS mitigation system provided by
>      Cilium. At the moment it only supports filtering by source CIDRs. It  
> would
>      benefit from using this new map, as it allows to utilize wildcard  
> rules
>      without big penalty comparing to one hash or LPM lookup utilized now.

>    * XDP Packet Recorder (see  
> https://lpc.events/event/11/contributions/953/)

>    * K8S and Cilium Network Policies: as of k8s 1.25 port ranges are  
> considered
>      to be a stable feature, and this map allows to implement this easily  
> (and
>      also to provide more sophisticated filtering for Cilium Network  
> Policies)

> Keys for wildcard maps are defined using the struct wildcard_key  
> structure.
> Besides the type field, it contains the data field of an arbitrary size.  
> To
> educate map about what's contained inside the data field, two additional
> structures are used. The first one, struct wildcard_desc, also of  
> arbitrary
> size, tells how many fields are contained inside data, and the struct
> wildcard_rule_desc structure defines how individual fields look like.

> Fields (rules) can be of three types:  
> BPF_WILDCARD_RULE_{PREFIX,RANGE,MATCH}.

[..]

> The PREFIX rule means that inside data we have a binary value and a binary
> (prefix) mask:

>                 size             u32
>          <----------------> <----------->
>     ... |    rule value    |   prefix   | ...

> Here rule value is a binary value, e.g., 123.324.128.0, and prefix is a  
> u32 bit
> variable; we only use lower 8 bits of it. We allow 8, 16, 32, 64, and 128  
> bit
> values for PREFIX rules.

I haven't looked at the code, so pardon stupid questions. This sounds
like optimized LPM-trie?

If not, what's the difference?

If yes, can this be special cased/optimized in the existing LPM-trie
optimization? I think we've tried in the past, mostly gave up and
"fixed" by caching the state in the socket local storage.

Also, fixed 8/16/32/64 prefixes seems like a big limitation? At least if
I were to store ipv4 from the classless (cidr) world..

> The RANGE rule is determined by two binary values: minimum and maximum,  
> treated
> as unsigned integers of appropriate size:

>                 size               size
>          <----------------> <---------------->
>     ... |  min rule value  |  max rule value  | ...

> We only allow the 8, 16, 32, and 64-bit for RANGE rules.

That seems useful. I was thinking about similar 'rangemap' where
we can effectively store and lookup [a,b] ranges. Might be useful
in firewalls for storing lists of ports efficiently. So why not
a separate map instead? Are you able to share a lot of code with
the general matching map?

> The MATCH rule is determined by one binary value, and is basically the  
> same as
> (X,sizeof(X)*8) PREFIX rule, but can be processed a bit faster:

>                 size
>          <---------------->
>     ... |    rule value    | ...

> To speed up processing all the rules, including the prefix field, should  
> be
> stored in host byte order, and all elements in network byte order. 16-byte
> fields are stored as {lo,hi}—lower eight bytes, then higher eight bytes.

> For elements only values are stored.

Can these optimization be auto-magically derived whenever PREFIX map
with the right values is created? Why put this decision on the users?

> To simplify definition of key structures, the  
> BPF_WILDCARD_DESC_{1,2,3,4,5}
> macros should be used. For example, one can define an IPv4 4-tuple keys as
> follows:

>     BPF_WILDCARD_DESC_4(
>          capture4_wcard,
>          BPF_WILDCARD_RULE_PREFIX, __u32, saddr,
>          BPF_WILDCARD_RULE_PREFIX, __u32, daddr,
>          BPF_WILDCARD_RULE_RANGE, __u16, sport,
>          BPF_WILDCARD_RULE_RANGE, __u16, dport
>     );

> This macro will define the following structure:

>     struct capture4_wcard_key {
>          __u32 type;
>          __u32 priority;
>          union {
>              struct {
>                      __u32 saddr;
>                      __u32 saddr_prefix;
>                      __u32 daddr;
>                      __u32 daddr_prefix;
>                      __u16 sport_min;
>                      __u16 sport_max;
>                      __u16 dport_min;
>                      __u16 dport_max;
>              } __packed rule;
>              struct {
>                      __u32 saddr;
>                      __u32 daddr;
>                      __u16 sport;
>                      __u16 dport;
>              } __packed;
>          };
>     } __packed;

> Here type field should contain either BPF_WILDCARD_KEY_RULE or
> BPF_WILDCARD_KEY_ELEM so that kernel can differentiate between rules and
> elements. The rule structure is used to define (and lookup) rules, the  
> unnamed
> structure can be used to specify elements when matching them with rules.

> In order to simplify definition of a corresponding struct wildcard_desc,  
> the
> BPF_WILDCARD_DESC_* macros will create yet another structure:

>     struct capture4_wcard_desc {
>          __uint(n_rules, 4);
>          struct {
>                  __uint(type, BPF_WILDCARD_RULE_PREFIX);
>                  __uint(size, sizeof(__u32));
>          } saddr;
>          struct {
>                  __uint(type, BPF_WILDCARD_RULE_PREFIX);
>                  __uint(size, sizeof(__u32));
>          } daddr;
>          struct {
>                  __uint(type, BPF_WILDCARD_RULE_RANGE);
>                  __uint(size, sizeof(__u16));
>          } sport;
>          struct {
>                  __uint(type, BPF_WILDCARD_RULE_RANGE);
>                  __uint(size, sizeof(__u16));
>          } dport;
>     };

> This structure can be used in a (BTF) map definition as follows:

>      __type(wildcard_desc, struct capture4_wcard_desc);

> Then libbpf will create a corresponding struct wildcard_desc and pass it  
> to
> kernel in bpf_attr using new map_extra_data/map_extra_data_size fields.

[..]

> The map implementation allows users to specify which algorithm to use to  
> store
> rules and lookup packets. Currently, three algorithms are supported:

>    * Brute Force (suitable for map sizes of about 32 or below elements)

>    * Tuple Merge (a variant of the Tuple Merge algorithm described in the
>      "TupleMerge: Fast Software Packet Processing for Online Packet
>      Classification" white paper, see  
> https://nonsns.github.io/paper/rossi19ton.pdf.
>      The Tuple Merge algorithm is not protected by any patents.)

>    * Static Tuple Merge (a variant of Tuple Merge where a set of lookup  
> tables
>      is directly provided by a user)

As a user that has no clue how this map works, how do I decide which
algorithm to use? What I like with the current maps is that they don't
leak too much of their inner state. These controls seems a bit low
level?

next prev parent reply	other threads:[~2022-09-07 23:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-07  8:01 [RFC PATCH] bpf: introduce new bpf map type BPF_MAP_TYPE_WILDCARD Anton Protopopov
2022-09-07 23:09 ` sdf [this message]
2022-09-08  8:48   ` Anton Protopopov
2022-09-08 22:37     ` sdf
2022-09-09  9:37       ` Anton Protopopov
2022-09-09 18:55         ` sdf
2022-09-12  8:21           ` Anton Protopopov
2022-09-08  1:22 ` Daniel Xu
2022-09-08  8:03   ` Anton Protopopov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YxkknQJC1vWmU/o9@google.com \
    --to=sdf@google.com \
    --cc=andrii@kernel.org \
    --cc=aspsk@isovalent.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=m@lambda.lt \
    --cc=martin.lau@linux.dev \
    --cc=razor@blackwall.org \
    --cc=torng@msu.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox