From: Brenden Blanco <bblanco@plumgrid.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
davem@davemloft.net, netdev@vger.kernel.org, tom@herbertland.com,
alexei.starovoitov@gmail.com, ogerlitz@mellanox.com,
eric.dumazet@gmail.com, ecree@solarflare.com,
john.fastabend@gmail.com, tgraf@suug.ch,
johannes@sipsolutions.net, eranlinuxmellanox@gmail.com,
lorenzo@google.com
Subject: Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
Date: Fri, 8 Apr 2016 09:48:55 -0700
Message-ID: <20160408164854.GB28353@gmail.com>
In-Reply-To: <5707916A.2030305@iogearbox.net>
On Fri, Apr 08, 2016 at 01:09:30PM +0200, Daniel Borkmann wrote:
> On 04/08/2016 12:36 PM, Jesper Dangaard Brouer wrote:
> >On Thu, 7 Apr 2016 21:48:46 -0700
> >Brenden Blanco <bblanco@plumgrid.com> wrote:
> >
> >>Add a new bpf prog type that is intended to run in early stages of the
> >>packet rx path. Only minimal packet metadata will be available, hence a
> >>new context type, struct bpf_phys_dev_md, is exposed to userspace. So
> >>far only expose the readable packet length, and only in read mode.
> >>
> >>The PHYS_DEV name is chosen to represent that the program is meant only
> >>for physical adapters, rather than all netdevs.
> >>
> >>While the user visible struct is new, the underlying context must be
> >>implemented as a minimal skb in order for the packet load_* instructions
> >>to work. The skb filled in by the driver must have skb->len, skb->head,
> >>and skb->data set, and skb->data_len == 0.
> >>
> >>Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
> >>---
> >> include/uapi/linux/bpf.h | 14 ++++++++++
> >> kernel/bpf/verifier.c | 1 +
> >> net/core/filter.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
> >> 3 files changed, 83 insertions(+)
> >>
> >>diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>index 70eda5a..3018d83 100644
> >>--- a/include/uapi/linux/bpf.h
> >>+++ b/include/uapi/linux/bpf.h
> >>@@ -93,6 +93,7 @@ enum bpf_prog_type {
> >> BPF_PROG_TYPE_SCHED_CLS,
> >> BPF_PROG_TYPE_SCHED_ACT,
> >> BPF_PROG_TYPE_TRACEPOINT,
> >>+ BPF_PROG_TYPE_PHYS_DEV,
> >> };
> >>
> >> #define BPF_PSEUDO_MAP_FD 1
> >>@@ -368,6 +369,19 @@ struct __sk_buff {
> >> __u32 tc_classid;
> >> };
> >>
> >>+/* user return codes for PHYS_DEV prog type */
> >>+enum bpf_phys_dev_action {
> >>+ BPF_PHYS_DEV_DROP,
> >>+ BPF_PHYS_DEV_OK,
> >>+};
> >
> >I can imagine these extra return codes:
> >
> > BPF_PHYS_DEV_MODIFIED, /* Packet page/payload modified */
> > BPF_PHYS_DEV_STOLEN, /* E.g. forward use-case */
> > BPF_PHYS_DEV_SHARED, /* Queue for async processing, e.g. tcpdump use-case */
> >
> >The "STOLEN" and "SHARED" use-cases require some refcnt manipulations,
> >which we can look at when we get that far...
>
> I'd probably let the tcpdump case be handled by the rest of the stack.
> Forwarding could be to some txqueue or perhaps directly to a virtual net
> device e.g. the veth end sitting in a container (where fake skb would
> need to be promoted to a real one). Or, perhaps instead of veth end,
> directly demuxed into a target socket queue in that container ...
> Alternatively for tcpdump use-case, you could also do sampling on the
> bpf_phy_dev with eBPF maps.
+1, there is plenty of infrastructure to deal with this already.
>
> >For the "MODIFIED" case, people caring about checksumming, might want
> >to voice their concern if we want additional info or return codes,
> >indicating if software need to recalc CSUM (or e.g. if only IP-pseudo
> >hdr was modified).
> >
> >>+/* user accessible metadata for PHYS_DEV packet hook
> >>+ * new fields must be added to the end of this structure
> >>+ */
> >>+struct bpf_phys_dev_md {
> >>+ __u32 len;
> >>+};
> >
> >This is userspace visible. I can easily imagine this structure will get
> >extended. How does a userspace eBPF program know that the struct got
> >extended? (bet you got some scheme, normally I would add a "version" as
> >first elem).
Since fields are only ever added to the end, programs that access beyond
the struct as understood by the running kernel will be rejected. The
struct ordering is a hard requirement. We've gone through quite a few
upgrades of this style of struct and not had any issues that I am aware of.
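
To make the append-only rule concrete, here is a minimal userspace sketch
of the kind of bounds check involved. It is not the code from this patch:
the helper name phys_dev_md_access_ok() and its exact conditions are
assumptions for illustration. The point is only that an access past the
end of the struct, as the running kernel defines it, fails the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the context struct from the patch; only `len` exists so far. */
struct bpf_phys_dev_md {
	uint32_t len;
};

/*
 * Hypothetical stand-in for a per-prog-type context access check.
 * An access is accepted only if it is an aligned 4-byte read that lies
 * entirely inside the struct as this "kernel" was built with.
 */
static bool phys_dev_md_access_ok(size_t off, size_t size, bool is_write)
{
	if (is_write)
		return false;	/* the context is read-only so far */
	if (size != sizeof(uint32_t))
		return false;	/* only whole-field loads */
	if (off % sizeof(uint32_t) != 0)
		return false;	/* must start on a field boundary */
	if (off > sizeof(struct bpf_phys_dev_md) - size)
		return false;	/* beyond what this kernel knows about */
	return true;
}
```

A program built against a newer header that reads a field at offset 4
would fail this check on a kernel that only knows about `len`, which is
why no explicit version field is needed.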
> >
> >BTW, how and where is this "struct bpf_phys_dev_md" allocated?
>
> It isn't, see bpf_phys_dev_convert_ctx_access(). At load time the verifier
> will convert/rewrite this based on offsetof() to a real access of the per
> cpu sk_buff, that's the only purpose.
>
> Cheers,
> Daniel
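
For readers following along, the offsetof()-based rewrite Daniel describes
can be sketched roughly as below. This is a userspace illustration only:
struct fake_skb, convert_ctx_offset() and load_ctx_u32() are hypothetical
names, and the real conversion emits BPF instructions at load time rather
than returning an offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-visible context from the patch. Never allocated; layout only. */
struct bpf_phys_dev_md {
	uint32_t len;
};

/* Hypothetical stand-in for the minimal skb the driver fills in. */
struct fake_skb {
	void *head;
	void *data;
	uint32_t len;
	uint32_t data_len;
};

/*
 * Map an offset into the user-visible struct to the offset of the
 * backing field in the real object. Returns -1 for unknown offsets.
 */
static long convert_ctx_offset(size_t user_off)
{
	switch (user_off) {
	case offsetof(struct bpf_phys_dev_md, len):
		return (long)offsetof(struct fake_skb, len);
	default:
		return -1;
	}
}

/* Perform the rewritten 4-byte load against the backing object. */
static uint32_t load_ctx_u32(const struct fake_skb *skb, long off)
{
	uint32_t v;

	memcpy(&v, (const char *)skb + off, sizeof(v));
	return v;
}
```

The program only ever names offsets in struct bpf_phys_dev_md; the
translation decides which real field is read, so the user-visible layout
can stay stable while the backing structure changes.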