From: Stanislav Fomichev <sdf@fomichev.me>
To: Petar Penkov <peterpenkov96@gmail.com>
Cc: Stanislav Fomichev <sdf@google.com>,
Network Development <netdev@vger.kernel.org>,
bpf <bpf@vger.kernel.org>, David Miller <davem@davemloft.net>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Simon Horman <simon.horman@netronome.com>,
Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH bpf 5/5] flow_dissector: document BPF flow dissector environment
Date: Tue, 2 Apr 2019 14:00:46 -0700
Message-ID: <20190402210046.GI7431@mini-arch.hsd1.ca.comcast.net>
In-Reply-To: <CA+DcSEgPWDBW1nnweb8pCeOjeLGj7LRgTei+YO-+JKY2VZDz1w@mail.gmail.com>
On 04/02, Petar Penkov wrote:
> On Mon, Apr 1, 2019 at 1:57 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Short doc on what the BPF flow dissector should expect in the input
> > __sk_buff and flow_keys.
> >
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> > .../networking/bpf_flow_dissector.txt | 115 ++++++++++++++++++
> > 1 file changed, 115 insertions(+)
> > create mode 100644 Documentation/networking/bpf_flow_dissector.txt
> >
> > diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt
> > new file mode 100644
> > index 000000000000..513be8e20afb
> > --- /dev/null
> > +++ b/Documentation/networking/bpf_flow_dissector.txt
> > @@ -0,0 +1,115 @@
> > +==================
> > +BPF Flow Dissector
> > +==================
> > +
> > +Overview
> > +========
> > +
> > +Flow dissector is a routine that parses metadata out of packets. It's
> > +used in various places in the networking subsystem (RFS, flow hash, etc).
> > +
> > +BPF flow dissector is an attempt to reimplement C-based flow dissector logic
> > +in BPF to gain all the benefits of the BPF verifier (namely, limits on the
> > +number of instructions and tail calls).
> > +
> > +API
> > +===
> > +
> > +BPF flow dissector programs operate on an __sk_buff. However, only a
> > +limited set of fields is allowed: data, data_end and flow_keys. flow_keys
> > +is 'struct bpf_flow_keys' and contains flow dissector input and
> > +output arguments.
> > +
> > +The inputs are:
> > + * nhoff - initial offset of the networking header
> > + * thoff - initial offset of the transport header, initialized to nhoff
> > + * n_proto - L3 protocol type, parsed out of L2 header
> > +
> > +The flow dissector BPF program should fill out the rest of the 'struct
> > +bpf_flow_keys' fields. Input arguments nhoff/thoff/n_proto should also be
> > +adjusted accordingly.
> > +
> > +The return code of the BPF program is either BPF_OK to indicate successful
> > +dissection, or BPF_DROP to indicate a parsing error.
> I don't think this is actually enforced. I believe the current code
> just checks whether the status is BPF_OK or not, rather than
> distinguishing between BPF_OK, BPF_DROP, and anything else.
It's not universally enforced, but some codepaths in the kernel look at
the returned value (e.g. skb_get_poff and eth_get_headlen), so it's
better to set the expectations :-)
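To make the input/output contract and the return-code convention concrete, here is a minimal userspace sketch. It is not BPF code and is not taken from the reference implementation. It shows roughly what an IPv4 handler is expected to do with the flow_keys inputs. Everything prefixed with sketch_/SKETCH_ is hypothetical:

- struct sketch_flow_keys stands in for a subset of 'struct bpf_flow_keys' and uses host byte order.
- The (data, len) pair stands in for __sk_buff->data/data_end.
- The return values only mirror BPF_OK (0) and BPF_DROP (2) from enum bpf_ret_code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror BPF_OK / BPF_DROP from enum bpf_ret_code (linux/bpf.h). */
#define SKETCH_BPF_OK   0
#define SKETCH_BPF_DROP 2
#define SKETCH_ETH_P_IP 0x0800

/* Hypothetical stand-in for a subset of struct bpf_flow_keys.
 * Unlike the kernel struct, n_proto and addresses are in host byte order. */
struct sketch_flow_keys {
	uint16_t nhoff;      /* input: offset of the L3 header */
	uint16_t thoff;      /* input: == nhoff; output: offset of L4 header */
	uint16_t n_proto;    /* input: L3 protocol parsed from the L2 header */
	uint16_t addr_proto; /* output: which address family was filled in */
	uint8_t  ip_proto;   /* output: L4 protocol number */
	uint32_t ipv4_src;   /* output */
	uint32_t ipv4_dst;   /* output */
};

static uint32_t sketch_load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* data/len play the role of __sk_buff->data and data_end. */
static int sketch_dissect_ipv4(const uint8_t *data, size_t len,
			       struct sketch_flow_keys *keys)
{
	const uint8_t *iph;
	size_t ihl;

	assert(data != NULL && keys != NULL);

	/* Only IPv4 is handled in this sketch. */
	if (keys->n_proto != SKETCH_ETH_P_IP)
		return SKETCH_BPF_DROP;

	/* Bounds check: the fixed 20-byte IPv4 header must fit in the packet. */
	if ((size_t)keys->nhoff + 20 > len)
		return SKETCH_BPF_DROP;
	iph = data + keys->nhoff;

	/* Version must be 4 and IHL at least 5 words; options must fit too. */
	ihl = (size_t)(iph[0] & 0x0f) * 4;
	if ((iph[0] >> 4) != 4 || ihl < 20 || (size_t)keys->nhoff + ihl > len)
		return SKETCH_BPF_DROP;

	keys->addr_proto = SKETCH_ETH_P_IP;
	keys->ip_proto = iph[9];
	keys->ipv4_src = sketch_load_be32(iph + 12);
	keys->ipv4_dst = sketch_load_be32(iph + 16);
	/* Adjust thoff so it points past the IP header (and any options). */
	keys->thoff = (uint16_t)(keys->nhoff + ihl);
	return SKETCH_BPF_OK;
}
```

A real BPF program would do the same kind of bounds checks against data_end, since the verifier rejects unchecked direct packet access. It would also keep n_proto and the addresses in network byte order.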
> > +
> > +__sk_buff->data
> > +===============
> > +
> > +In the VLAN-less case, this is what the initial state of the BPF flow
> > +dissector looks like:
> > ++------+------+------------+-----------+
> > +| DMAC | SMAC | ETHER_TYPE | L3_HEADER |
> > ++------+------+------------+-----------+
> > +                            ^
> > +                            |
> > +                            +-- flow dissector starts here
> > +
> > +skb->data + flow_keys->nhoff points to the first byte of L3_HEADER.
> > +flow_keys->thoff = nhoff
> > +flow_keys->n_proto = ETHER_TYPE
> > +
> > +
> > +In the VLAN case, the flow dissector can be called in two different states.
> > +
> > +Pre-VLAN parsing:
> > ++------+------+------+-----+-----------+-----------+
> > +| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
> > ++------+------+------+-----+-----------+-----------+
> > +                      ^
> > +                      |
> > +                      +-- flow dissector starts here
> > +
> > +skb->data + flow_keys->nhoff points to the first byte of TCI.
> > +flow_keys->thoff = nhoff
> > +flow_keys->n_proto = TPID
> > +
> > +Please note that TPID can be 802.1AD and, hence, the BPF program would
> > +have to parse VLAN information twice for double-tagged packets.
> > +
> > +
> > +Post-VLAN parsing:
> > ++------+------+------+-----+-----------+-----------+
> > +| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
> > ++------+------+------+-----+-----------+-----------+
> > +                                        ^
> > +                                        |
> > +                                        +-- flow dissector starts here
> > +
> > +skb->data + flow_keys->nhoff points to the first byte of L3_HEADER.
> > +flow_keys->thoff = nhoff
> > +flow_keys->n_proto = ETHER_TYPE
> > +
> > +In this case, VLAN information has been processed before the flow dissector
> > +runs, and the BPF flow dissector is not required to handle it.
> > +
> > +
> > +The takeaway here is as follows: a BPF flow dissector program can be called
> > +with an optional VLAN header at nhoff and should gracefully handle all cases:
> > +when a single or double VLAN is present and when it is not present. Since the
> > +same program is called in every case, it has to be written carefully to
> > +handle them all.
> > +
> > +
> > +Reference Implementation
> > +========================
> > +
> > +See tools/testing/selftests/bpf/progs/bpf_flow.c for the reference
> > +implementation and tools/testing/selftests/bpf/flow_dissector_load.[hc] for
> > +the loader. bpftool can be used to load a BPF flow dissector program as well.
> > +
> > +The reference implementation is organized as follows:
> > +* jmp_table map that contains sub-programs for each supported L3 protocol
> > +* _dissect routine - entry point; it parses the input n_proto and does a
> > + bpf_tail_call to the appropriate L3 handler
> > +
> > +Since BPF at this point doesn't support looping (or any jumping back),
> > +jmp_table is used instead to handle multiple levels of encapsulation (and
> > +IPv6 options).
> > +
> > +
> > +Current Limitations
> > +===================
> > +The BPF flow dissector doesn't support exporting all the metadata that the
> > +in-kernel C-based implementation can export. A notable example is single VLAN
> > +(802.1Q) and double VLAN (802.1AD) tags. Please refer to 'struct bpf_flow_keys'
> > +for the set of information that can currently be exported from the BPF context.
> > --
> > 2.21.0.392.gf8f6787159e-goog
> >
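The VLAN takeaway and the jmp_table organization described in the patch can also be sketched in userspace C. This is an illustration under stated assumptions, not the selftest code. It relies on three hypothetical pieces:

- sketch_skip_vlans() handles both entry states. In the pre-VLAN state, n_proto is a TPID and nhoff points at the TCI. In the post-VLAN state, n_proto is already the L3 protocol.
- A fixed two-iteration bound stands in for looping.
- A small function-pointer table stands in for the jmp_table program array that the reference implementation reaches via bpf_tail_call.

All sketch_/SKETCH_ names are hypothetical, and values are kept in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BPF_OK       0      /* mirrors BPF_OK */
#define SKETCH_BPF_DROP     2      /* mirrors BPF_DROP */
#define SKETCH_ETH_P_IP     0x0800
#define SKETCH_ETH_P_8021Q  0x8100
#define SKETCH_ETH_P_8021AD 0x88A8

/* Hypothetical stand-in for the input fields of struct bpf_flow_keys. */
struct sketch_keys {
	uint16_t nhoff;
	uint16_t thoff;
	uint16_t n_proto;
};

static uint16_t sketch_load_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static int sketch_is_vlan(uint16_t proto)
{
	return proto == SKETCH_ETH_P_8021Q || proto == SKETCH_ETH_P_8021AD;
}

/* Pre-VLAN state: nhoff points at a TCI and n_proto is the TPID.
 * Post-VLAN state: n_proto is already the L3 protocol, so this is a no-op. */
static int sketch_skip_vlans(const uint8_t *data, size_t len,
			     struct sketch_keys *keys)
{
	int i;

	/* Fixed bound (outer 802.1AD + inner 802.1Q) instead of open looping. */
	for (i = 0; i < 2 && sketch_is_vlan(keys->n_proto); i++) {
		/* Each tag body is a 2-byte TCI followed by the next EtherType. */
		if ((size_t)keys->nhoff + 4 > len)
			return SKETCH_BPF_DROP;
		keys->n_proto = sketch_load_be16(data + keys->nhoff + 2);
		keys->nhoff += 4;
		keys->thoff = keys->nhoff; /* thoff tracks nhoff until L3 parsing */
	}
	/* A third tag is treated as a parse error in this sketch. */
	return sketch_is_vlan(keys->n_proto) ? SKETCH_BPF_DROP : SKETCH_BPF_OK;
}

/* Minimal IPv4 handler: only validates IHL and advances thoff. */
static int sketch_handle_ipv4(const uint8_t *data, size_t len,
			      struct sketch_keys *keys)
{
	size_t ihl;

	if ((size_t)keys->nhoff + 20 > len)
		return SKETCH_BPF_DROP;
	ihl = (size_t)(data[keys->nhoff] & 0x0f) * 4;
	if (ihl < 20 || (size_t)keys->nhoff + ihl > len)
		return SKETCH_BPF_DROP;
	keys->thoff = (uint16_t)(keys->nhoff + ihl);
	return SKETCH_BPF_OK;
}

typedef int (*sketch_l3_handler)(const uint8_t *, size_t, struct sketch_keys *);

/* Userspace analog of jmp_table: one sub-program per supported L3 protocol. */
static const struct {
	uint16_t proto;
	sketch_l3_handler fn;
} sketch_table[] = {
	{ SKETCH_ETH_P_IP, sketch_handle_ipv4 },
};

/* Analog of the _dissect entry point: normalize n_proto, then dispatch. */
static int sketch_dissect(const uint8_t *data, size_t len,
			  struct sketch_keys *keys)
{
	size_t i;
	int ret;

	assert(data != NULL && keys != NULL);
	ret = sketch_skip_vlans(data, len, keys);
	if (ret != SKETCH_BPF_OK)
		return ret;
	for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++)
		if (sketch_table[i].proto == keys->n_proto)
			return sketch_table[i].fn(data, len, keys);
	return SKETCH_BPF_DROP; /* unsupported L3 protocol */
}
```

Whether the program starts in the pre-VLAN state (nhoff at the first TCI, n_proto = TPID) or the post-VLAN state (nhoff at L3_HEADER), the same entry point ends with nhoff at the L3 header. In real BPF, the table lookup would instead be a bpf_tail_call() into the jmp_table program array (a BPF_MAP_TYPE_PROG_ARRAY), with each handler loaded as a separate program.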