From: Jesper Dangaard Brouer <brouer@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
Daniel Borkmann <borkmann@iogearbox.net>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
Lorenzo Bianconi <lorenzo@kernel.org>,
marek@cloudflare.com, Jakub Kicinski <kuba@kernel.org>,
eyal.birger@gmail.com, brouer@redhat.com
Subject: Re: [PATCH bpf-next V5 4/5] bpf: drop MTU check when doing TC-BPF redirect to ingress
Date: Mon, 2 Nov 2020 13:46:58 +0100 [thread overview]
Message-ID: <20201102134658.081fd974@carbon> (raw)
In-Reply-To: <5f9c7935c6991_16d420838@john-XPS-13-9370.notmuch>
On Fri, 30 Oct 2020 13:36:05 -0700
John Fastabend <john.fastabend@gmail.com> wrote:
> Jesper Dangaard Brouer wrote:
> > The use-case for dropping the MTU check when TC-BPF does redirect to
> > ingress, is described by Eyal Birger in email[0]. The summary is the
> > ability to increase packet size (e.g. with IPv6 headers for NAT64) and
> > ingress redirect packet and let normal netstack fragment packet as needed.
> >
> > [0] https://lore.kernel.org/netdev/CAHsH6Gug-hsLGHQ6N0wtixdOa85LDZ3HNRHVd0opR=19Qo4W4Q@mail.gmail.com/
> >
> > V4:
> > - Keep net_device "up" (IFF_UP) check.
> > - Adjustment to handle bpf_redirect_peer() helper
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> > include/linux/netdevice.h | 31 +++++++++++++++++++++++++++++--
> > net/core/dev.c | 19 ++-----------------
> > net/core/filter.c | 14 +++++++++++---
> > 3 files changed, 42 insertions(+), 22 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 964b494b0e8d..bd02ddab8dfe 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3891,11 +3891,38 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
> > bool is_skb_forwardable(const struct net_device *dev,
> > const struct sk_buff *skb);
> >
> > +static __always_inline bool __is_skb_forwardable(const struct net_device *dev,
> > + const struct sk_buff *skb,
> > + const bool check_mtu)
>
> It looks like if check_mtu=false then this is just an interface up check.
> Can we leave is_skb_forwardable logic alone and just change the spots where
> this is called with false to something with a name that describes the check,
> such as is_dev_up(dev). I think it will make this change smaller and the
> code easier to read. Did I miss something?
People should realized that this is constructed such, the compiler will
compile-time remove the actual argument (the const bool check_mtu).
And this propagates also to ____dev_forward_skb() where the call places
are also inlined.
Yes, this (check_mtu=false) is basically an interface up check, but the
only place it is used directly is in the ndo_get_peer_dev() case, and
reading the code I find it more readable that is says
__is_skb_forwardable because this is used as part of a forwarding step,
and is_dev_up() doesn't convey the intent in this use-case.
> > +{
> > + const u32 vlan_hdr_len = 4; /* VLAN_HLEN */
> > + unsigned int len;
> > +
> > + if (!(dev->flags & IFF_UP))
> > + return false;
> > +
> > + if (!check_mtu)
> > + return true;
> > +
> > + len = dev->mtu + dev->hard_header_len + vlan_hdr_len;
> > + if (skb->len <= len)
> > + return true;
> > +
> > + /* if TSO is enabled, we don't care about the length as the packet
> > + * could be forwarded without being segmented before
> > + */
> > + if (skb_is_gso(skb))
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > static __always_inline int ____dev_forward_skb(struct net_device *dev,
> > - struct sk_buff *skb)
> > + struct sk_buff *skb,
> > + const bool check_mtu)
> > {
>
> I guess you will get some duplication here if you have a dev_forward_skb()
> and a dev_forward_skb_nocheck() or something. Take it or leave it. I know
> I've added my share of bool swivel bits like this, but better to avoid
> it if possible IMO.
As I wrote the bool will actually get compile-time removed, so I don't
see that as problematic. And I avoided replicating the code in more
places.
The problematic part (which you didn't comment) on is this:
On Fri, 30 Oct 2020 17:51:07 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> diff --git a/net/core/filter.c b/net/core/filter.c
> index bd4a416bd9ad..71b78b8d443c 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2083,13 +2083,21 @@ static const struct bpf_func_proto bpf_csum_level_proto = {
>
> static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
> {
> - return dev_forward_skb(dev, skb);
> + int ret = ____dev_forward_skb(dev, skb, false);
> +
> + if (likely(!ret)) {
> + skb->protocol = eth_type_trans(skb, dev);
> + skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
> + ret = netif_rx(skb);
> + }
> +
> + return ret;
> }
I'm replicating two lines from dev_forward_skb(), but I couldn't find a
way to avoid this, without causing larger code changes (and slower code).
> Other than style aspects it looks correct to me.
>
> > if (skb_orphan_frags(skb, GFP_ATOMIC) ||
> > - unlikely(!is_skb_forwardable(dev, skb))) {
> > + unlikely(!__is_skb_forwardable(dev, skb, check_mtu))) {
> > atomic_long_inc(&dev->rx_dropped);
> > kfree_skb(skb);
> > return NET_RX_DROP;
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 9499a414d67e..445ccf92c149 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -2188,28 +2188,13 @@ static inline void net_timestamp_set(struct sk_buff *skb)
> >
>
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
next prev parent reply other threads:[~2020-11-02 12:47 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-30 16:50 [PATCH bpf-next V5 0/5] Subj: bpf: New approach for BPF MTU handling Jesper Dangaard Brouer
2020-10-30 16:50 ` [PATCH bpf-next V5 1/5] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
2020-10-30 16:50 ` [PATCH bpf-next V5 2/5] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
2020-10-30 19:40 ` John Fastabend
2020-11-02 9:28 ` Jesper Dangaard Brouer
2020-11-02 15:59 ` David Ahern
2020-11-02 16:18 ` John Fastabend
2020-10-31 15:52 ` David Ahern
2020-10-30 16:51 ` [PATCH bpf-next V5 3/5] bpf: add BPF-helper for MTU checking Jesper Dangaard Brouer
2020-10-30 20:23 ` John Fastabend
2020-11-02 11:15 ` Jesper Dangaard Brouer
2020-11-02 18:04 ` John Fastabend
2020-11-02 20:10 ` Jesper Dangaard Brouer
2020-11-12 12:58 ` Jesper Dangaard Brouer
2020-10-30 16:51 ` [PATCH bpf-next V5 4/5] bpf: drop MTU check when doing TC-BPF redirect to ingress Jesper Dangaard Brouer
2020-10-30 20:36 ` John Fastabend
2020-11-02 12:46 ` Jesper Dangaard Brouer [this message]
2020-11-02 16:23 ` John Fastabend
2020-10-30 16:51 ` [PATCH bpf-next V5 5/5] bpf: make it possible to identify BPF redirected SKBs Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201102134658.081fd974@carbon \
--to=brouer@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=borkmann@iogearbox.net \
--cc=bpf@vger.kernel.org \
--cc=eyal.birger@gmail.com \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=lmb@cloudflare.com \
--cc=lorenzo@kernel.org \
--cc=marek@cloudflare.com \
--cc=maze@google.com \
--cc=netdev@vger.kernel.org \
--cc=shaun@tigera.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.