From: Stanislav Fomichev <sdf@google.com>
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
simon.horman@netronome.com, willemb@google.com,
peterpenkov96@gmail.com, Stanislav Fomichev <sdf@google.com>
Subject: [PATCH bpf 5/5] flow_dissector: document BPF flow dissector environment
Date: Mon, 1 Apr 2019 13:57:34 -0700 [thread overview]
Message-ID: <20190401205734.4400-6-sdf@google.com> (raw)
In-Reply-To: <20190401205734.4400-1-sdf@google.com>
Short doc on what BPF flow dissector should expect in the input
__sk_buff and flow_keys.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
.../networking/bpf_flow_dissector.txt | 115 ++++++++++++++++++
1 file changed, 115 insertions(+)
create mode 100644 Documentation/networking/bpf_flow_dissector.txt
diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt
new file mode 100644
index 000000000000..513be8e20afb
--- /dev/null
+++ b/Documentation/networking/bpf_flow_dissector.txt
@@ -0,0 +1,115 @@
+==================
+BPF Flow Dissector
+==================
+
+Overview
+========
+
+Flow dissector is a routine that parses metadata out of the packets. It's
+used in the various places in the networking subsystem (RFS, flow hash, etc).
+
+BPF flow dissector is an attempt to reimplement C-based flow dissector logic
+in BPF to gain all the benefits of BPF verifier (namely, limits on the
+number of instructions and tail calls).
+
+API
+===
+
+BPF flow dissector programs operate on an __sk_buff. However, only the
+limited set of fields is allowed: data, data_end and flow_keys. flow_keys
+is 'struct bpf_flow_keys' and contains flow dissector input and
+output arguments.
+
+The inputs are:
+ * nhoff - initial offset of the networking header
+ * thoff - initial offset of the transport header, initialized to nhoff
+ * n_proto - L3 protocol type, parsed out of L2 header
+
+Flow dissector BPF program should fill out the rest of the 'struct
+bpf_flow_keys' fields. Input arguments nhoff/thoff/n_proto should be also
+adjusted accordingly.
+
+The return code of the BPF program is either BPF_OK to indicate successful
+dissection, or BPF_DROP to indicate parsing error.
+
+__sk_buff->data
+===============
+
+In the VLAN-less case, this is what the initial state of the BPF flow
+dissector looks like:
++------+------+------------+-----------+
+| DMAC | SMAC | ETHER_TYPE | L3_HEADER |
++------+------+------------+-----------+
+ ^
+ |
+ +-- flow dissector starts here
+
+skb->data + flow_keys->nhoff point to the first byte of L3_HEADER.
+flow_keys->thoff = nhoff
+flow_keys->n_proto = ETHER_TYPE
+
+
+In case of VLAN, flow dissector can be called with the two different states.
+
+Pre-VLAN parsing:
++------+------+------+-----+-----------+-----------+
+| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
++------+------+------+-----+-----------+-----------+
+ ^
+ |
+ +-- flow dissector starts here
+
+skb->data + flow_keys->nhoff point the to first byte of TCI.
+flow_keys->thoff = nhoff
+flow_keys->n_proto = TPID
+
+Please note that TPID can be 802.1AD and, hence, BPF program would
+have to parse VLAN information twice for double tagged packets.
+
+
+Post-VLAN parsing:
++------+------+------+-----+-----------+-----------+
+| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
++------+------+------+-----+-----------+-----------+
+ ^
+ |
+ +-- flow dissector starts here
+
+skb->data + flow_keys->nhoff point the to first byte of L3_HEADER.
+flow_keys->thoff = nhoff
+flow_keys->n_proto = ETHER_TYPE
+
+In this case VLAN information has been processed before the flow dissector
+and BPF flow dissector is not required to handle it.
+
+
+The takeaway here is as follows: BPF flow dissector program can be called with
+the optional VLAN header and should gracefully handle both cases: when single
+or double VLAN is present and when it is not present. The same program
+can be called for both cases and would have to be written carefully to
+handle both cases.
+
+
+Reference Implementation
+========================
+
+See tools/testing/selftests/bpf/progs/bpf_flow.c for the reference
+implementation and tools/testing/selftests/bpf/flow_dissector_load.[hc] for
+the loader. bpftool can be used to load BPF flow dissector program as well.
+
+The reference implementation is organized as follows:
+* jmp_table map that contains sub-programs for each supported L3 protocol
+* _dissect routine - entry point; it does input n_proto parsing and does
+ bpf_tail_call to the appropriate L3 handler
+
+Since BPF at this point doesn't support looping (or any jumping back),
+jmp_table is used instead to handle multiple levels of encapsulation (and
+IPv6 options).
+
+
+Current Limitations
+===================
+BPF flow dissector doesn't support exporting all the metadata that in-kernel
+C-based implementation can export. Notable example is single VLAN (802.1Q)
+and double VLAN (802.1AD) tags. Please refer to the 'struct bpf_flow_keys'
+for a set of information that's currently can be exported from the BPF context.
--
2.21.0.392.gf8f6787159e-goog
next prev parent reply other threads:[~2019-04-01 20:57 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-01 20:57 [PATCH bpf 0/5] flow_dissector: lay groundwork for calling BPF hook from eth_get_headlen Stanislav Fomichev
2019-04-01 20:57 ` [PATCH bpf 1/5] selftests/bpf: fix vlan handling in flow dissector program Stanislav Fomichev
2019-04-01 20:57 ` [PATCH bpf 2/5] net/flow_dissector: pass flow_keys->n_proto to BPF programs Stanislav Fomichev
2019-04-01 20:57 ` [PATCH bpf 3/5] flow_dissector: fix clamping of BPF flow_keys for non-zero nhoff Stanislav Fomichev
2019-04-01 20:57 ` [PATCH bpf 4/5] flow_dissector: allow access only to a subset of __sk_buff fields Stanislav Fomichev
2019-04-01 20:57 ` Stanislav Fomichev [this message]
2019-04-02 20:54 ` [PATCH bpf 5/5] flow_dissector: document BPF flow dissector environment Petar Penkov
2019-04-02 21:00 ` Stanislav Fomichev
2019-04-03 18:34 ` Jesper Dangaard Brouer
2019-04-03 18:50 ` Stanislav Fomichev
2019-04-02 20:17 ` [PATCH bpf 0/5] flow_dissector: lay groundwork for calling BPF hook from eth_get_headlen Willem de Bruijn
2019-04-02 20:52 ` Petar Penkov
2019-04-03 14:54 ` Daniel Borkmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190401205734.4400-6-sdf@google.com \
--to=sdf@google.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
--cc=peterpenkov96@gmail.com \
--cc=simon.horman@netronome.com \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).