From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Saeed Mahameed <saeedm@mellanox.com>
Cc: "toke@redhat.com" <toke@redhat.com>,
"gtzalik@amazon.com" <gtzalik@amazon.com>,
"ilias.apalodimas@linaro.org" <ilias.apalodimas@linaro.org>,
"borkmann@iogearbox.net" <borkmann@iogearbox.net>,
"alexander.duyck@gmail.com" <alexander.duyck@gmail.com>,
"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
"akiyano@amazon.com" <akiyano@amazon.com>,
"zorik@amazon.com" <zorik@amazon.com>,
"alexei.starovoitov@gmail.com" <alexei.starovoitov@gmail.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"jeffrey.t.kirsher@intel.com" <jeffrey.t.kirsher@intel.com>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"dsahern@gmail.com" <dsahern@gmail.com>,
"lorenzo@kernel.org" <lorenzo@kernel.org>,
"willemdebruijn.kernel@gmail.com"
<willemdebruijn.kernel@gmail.com>,
brouer@redhat.com,
Steffen Klassert <steffen.klassert@secunet.com>,
Willy Tarreau <w@1wt.eu>
Subject: Re: [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
Date: Tue, 14 Apr 2020 14:46:37 +0200 [thread overview]
Message-ID: <20200414144637.0dafdda5@carbon> (raw)
In-Reply-To: <ed0ce4d76e77b23aa3edcd821d5a4867e8bb27b1.camel@mellanox.com>
On Thu, 9 Apr 2020 03:31:14 +0000
Saeed Mahameed <saeedm@mellanox.com> wrote:
> On Wed, 2020-04-08 at 13:53 +0200, Jesper Dangaard Brouer wrote:
> > Finally, after all drivers have a frame size, allow BPF-helper
> > bpf_xdp_adjust_tail() to grow or extend packet size at frame tail.
> >
>
> can you provide a list of usecases for why tail extension is necessary
> ?
Use-cases:
(1) IPsec / XFRM needs a tail extend[1][2].
(2) DNS-cache replies in XDP.
(3) HA-proxy ALOHA would need it to convert to XDP.
> and what do you have in mind as immediate use of bpf_xdp_adjust_tail()
> ?
I guess Steffen Klassert's ipsec use-case(1) it the most immediate.
[1] http://vger.kernel.org/netconf2019_files/xfrm_xdp.pdf
[2] http://vger.kernel.org/netconf2019.html
> both cover letter and commit messages didn't list any actual use case..
Sorry about that.
> > Remember that helper/macro xdp_data_hard_end have reserved some
> > tailroom. Thus, this helper makes sure that the BPF-prog don't have
> > access to this tailroom area.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> > include/uapi/linux/bpf.h | 4 ++--
> > net/core/filter.c | 18 ++++++++++++++++--
> > 2 files changed, 18 insertions(+), 4 deletions(-)
> >
[... cut ...]
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 7628b947dbc3..4d58a147eed0 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3422,12 +3422,26 @@ static const struct bpf_func_proto
> > bpf_xdp_adjust_head_proto = {
> >
> > BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
> > {
> > + void *data_hard_end = xdp_data_hard_end(xdp);
> > void *data_end = xdp->data_end + offset;
> >
> > - /* only shrinking is allowed for now. */
> > - if (unlikely(offset >= 0))
> > + /* Notice that xdp_data_hard_end have reserved some tailroom */
> > + if (unlikely(data_end > data_hard_end))
> > return -EINVAL;
> >
>
> i don't know if i like this approach for couple of reasons.
>
> 1. drivers will provide arbitrary frames_sz, which is normally larger
> than mtu, and could be a full page size, for XDP_TX action this can be
> problematic if xdp progs will allow oversized packets to get caught at
> the driver level..
We already check if MTU is exceeded for a specific device when we
redirect into this, see helper xdp_ok_fwd_dev(). For the XDP_TX case,
I guess some drivers bypass that check, which should be fixed. The
XDP_TX case is IMHO a place where we allow drivers do special
optimizations, thus drivers can choose to do something faster than
calling generic helper xdp_ok_fwd_dev().
>
> 2. xdp_data_hard_end(xdp) has a hardcoded assumption of the skb shinfo
> and it introduces a reverse dependency between xdp buff and skbuff
>
(I'll address this in another mail)
> both of the above can be solved if the drivers provided the max
> allowed frame size, already accounting for mtu and shinfo when setting
> xdp_buff.frame_sz at the driver level.
It seems we look at the problem from two different angles. You have
the drivers perspective, while I have the network stacks perspective
(the XDP_PASS case). The mlx5 driver treats XDP as a special case, by
hiding or confining xdp_buff to functions fairly deep in the
call-stack. My goal is different (moving SKB out of drivers), I see
the xdp_buff/xdp_frame as the main packet object in the drivers, that
gets send up the network stack (after converting to xdp_frame) and
converted into SKB in core-code (yes, there is a long road-ahead). The
larger tailroom can be used by netstack in SKB-coalesce.
The next step is making xdp_buff (and xdp_frame) multi-buffer aware.
This is why I reserve room for skb_shared_info. I have considered
reducing the size of xdp_buff.frame_sz, with sizeof(skb_shared_info),
but it got kind of ugly having this in each drivers.
I also considered having drivers setup a direct pointer to
{skb,xdp}_shared_info section in xdp_buff, because will make it more
flexible (for what I imagined Alexander Duyck want). (But we can still
do/change that later, once we start work in multi-buffer code)
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
next prev parent reply other threads:[~2020-04-14 12:47 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
2020-04-08 11:50 ` [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
2020-04-08 17:53 ` Jakub Kicinski
2020-04-09 0:48 ` Saeed Mahameed
2020-04-09 1:13 ` Jakub Kicinski
2020-04-09 23:07 ` Saeed Mahameed
2020-04-09 23:27 ` Jakub Kicinski
2020-04-14 14:16 ` Jesper Dangaard Brouer
2020-04-09 0:50 ` Saeed Mahameed
2020-04-16 13:02 ` Jesper Dangaard Brouer
2020-04-17 23:09 ` Saeed Mahameed
2020-04-08 11:50 ` [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-08 11:50 ` [PATCH RFC v2 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
2020-04-08 11:50 ` [PATCH RFC v2 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-08 11:50 ` [PATCH RFC v2 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
2020-04-08 13:09 ` Lorenzo Bianconi
2020-04-14 8:07 ` Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-08 14:56 ` Haiyang Zhang
2020-04-08 11:51 ` [PATCH RFC v2 13/33] qlogic/qede: " Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
2020-04-22 8:39 ` Jubran, Samih
2020-04-08 11:51 ` [PATCH RFC v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
2020-04-08 12:57 ` Tariq Toukan
2020-04-14 8:19 ` Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 17/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
2020-04-08 12:52 ` Tariq Toukan
2020-04-16 12:04 ` Jesper Dangaard Brouer
2020-04-09 9:28 ` Maxim Mikityanskiy
2020-04-08 11:52 ` [PATCH RFC v2 18/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 19/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
2020-04-08 17:53 ` Jakub Kicinski
2020-04-14 14:02 ` Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 20/33] tun: add XDP frame size Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 21/33] vhost_net: also populate " Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 22/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 23/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 24/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-08 11:52 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 25/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
2020-04-08 11:52 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 26/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-08 11:52 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-08 21:48 ` David Miller
2020-04-08 21:48 ` [Intel-wired-lan] " David Miller
2020-04-14 10:16 ` Jesper Dangaard Brouer
2020-04-14 10:16 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 27/33] ice: " Jesper Dangaard Brouer
2020-04-08 11:52 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-10 0:59 ` kbuild test robot
2020-04-14 10:39 ` Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 28/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
2020-04-08 11:52 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-08 17:31 ` Björn Töpel
2020-04-08 17:31 ` [Intel-wired-lan] " =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
2020-04-09 9:33 ` Maxim Mikityanskiy
2020-04-09 9:33 ` [Intel-wired-lan] " Maxim Mikityanskiy
2020-04-08 11:53 ` [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
2020-04-09 3:31 ` Saeed Mahameed
2020-04-14 12:46 ` Jesper Dangaard Brouer [this message]
2020-04-18 3:33 ` Saeed Mahameed
2020-04-14 9:56 ` Jesper Dangaard Brouer
2020-04-14 10:11 ` Toke Høiland-Jørgensen
2020-04-08 11:53 ` [PATCH RFC v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
2020-04-08 21:49 ` David Miller
2020-04-14 9:43 ` Jesper Dangaard Brouer
2020-04-08 11:53 ` [PATCH RFC v2 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
2020-04-08 11:53 ` [PATCH RFC v2 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
2020-04-08 11:53 ` [PATCH RFC v2 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
2020-04-08 16:55 ` [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Alexei Starovoitov
2020-04-08 16:55 ` [Intel-wired-lan] " Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200414144637.0dafdda5@carbon \
--to=brouer@redhat.com \
--cc=akiyano@amazon.com \
--cc=alexander.duyck@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=borkmann@iogearbox.net \
--cc=bpf@vger.kernel.org \
--cc=dsahern@gmail.com \
--cc=gtzalik@amazon.com \
--cc=ilias.apalodimas@linaro.org \
--cc=jeffrey.t.kirsher@intel.com \
--cc=john.fastabend@gmail.com \
--cc=lorenzo@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=saeedm@mellanox.com \
--cc=steffen.klassert@secunet.com \
--cc=toke@redhat.com \
--cc=w@1wt.eu \
--cc=willemdebruijn.kernel@gmail.com \
--cc=zorik@amazon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.