From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Brenden Blanco <bblanco@plumgrid.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org,
Jamal Hadi Salim <jhs@mojatatu.com>,
Saeed Mahameed <saeedm@dev.mellanox.co.il>,
Martin KaFai Lau <kafai@fb.com>, Ari Saha <as754m@att.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Or Gerlitz <gerlitz.or@gmail.com>,
john.fastabend@gmail.com, hannes@stressinduktion.org,
Thomas Graf <tgraf@suug.ch>, Tom Herbert <tom@herbertland.com>,
Daniel Borkmann <daniel@iogearbox.net>,
brouer@redhat.com
Subject: Re: [PATCH v8 04/11] net/mlx4_en: add support for fast rx drop bpf program
Date: Thu, 14 Jul 2016 09:25:43 +0200
Message-ID: <20160714092543.776f8d8c@redhat.com>
In-Reply-To: <1468309894-26258-5-git-send-email-bblanco@plumgrid.com>

I would really really like to see the XDP program associated with the
RX ring queues, instead of a single XDP program covering the entire NIC.
(Just move the bpf_prog pointer to struct mlx4_en_rx_ring)
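Roughly what I have in mind (sketch only, not compile tested; the
member name "prog" is just an example):

	/* mlx4_en.h: each RX ring carries its own XDP program pointer */
	struct mlx4_en_rx_ring {
		/* ... existing members ... */
		struct bpf_prog *prog;	/* XDP program for this ring */
	};

	/* en_rx.c, mlx4_en_process_rx_cq(): read the per-ring pointer */
	prog = READ_ONCE(ring->prog);
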
So, why is this so important? It is a fundamental architectural choice.

With a single XDP program per NIC, we are no better than DPDK, where a
single application monopolizes the entire NIC. Recently netmap added
support for running on a single specific queue[1]. This is the number
one argument our customers give for not wanting to run DPDK: they need
to dedicate an entire NIC per high-speed application.

As John Fastabend says, his NICs have thousands of queues, and he wants
to bind applications to specific queues. This idea of binding queues to
applications goes all the way back to Van Jacobson's 2006
netchannels[2], where creating an application channel allows for a
lock-free single-producer single-consumer (SPSC) queue directly into
the application. An XDP program "locked" to a single RX queue can make
these optimizations; a global XDP program cannot.
Why this change now? Why can't this wait?

I'm starting to see more and more code assuming that a single global
XDP program owns the NIC. This will be harder and harder to clean up.

I'm fine with the first patch iteration only supporting setting the
XDP program on all RX queues (e.g. returning EOPNOTSUPP for specific
queues). I'm only requesting that the program pointer is moved to
struct mlx4_en_rx_ring, and that appropriate refcnt handling is done,
along the lines of the sketch below.
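Something like this is what I have in mind (untested sketch; the
function name mlx4_en_xdp_set_all_rings is just an example, and it
assumes the per-ring "prog" member from above plus that
bpf_prog_inc()/bpf_prog_put(), or an equivalent bulk-ref helper, can be
used from the driver):

	/* Untested sketch (assumes <linux/bpf.h> and "mlx4_en.h"):
	 * attach the same XDP program to every RX ring, taking one
	 * reference per ring, and release each ring's previous
	 * program (if any).
	 */
	static int mlx4_en_xdp_set_all_rings(struct mlx4_en_priv *priv,
					     struct bpf_prog *prog)
	{
		struct bpf_prog *old_prog;
		int i;

		for (i = 0; i < priv->rx_ring_num; i++) {
			if (prog && IS_ERR(bpf_prog_inc(prog)))
				return -EBUSY;	/* unwind elided for brevity */
			old_prog = xchg(&priv->rx_ring[i]->prog, prog);
			if (old_prog)
				bpf_prog_put(old_prog);
		}
		return 0;
	}

With that in place, per-queue attach later becomes a matter of calling
the same per-ring assignment for a single ring index.
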
On Tue, 12 Jul 2016 00:51:27 -0700
Brenden Blanco <bblanco@plumgrid.com> wrote:
> Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.
>
> In tc/socket bpf programs, helpers linearize skb fragments as needed
> when the program touches the packet data. However, in the pursuit of
> speed, XDP programs will not be allowed to use these slower functions,
> especially if it involves allocating an skb.
>
[...]
>
> Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 51 ++++++++++++++++++++++++++
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 37 +++++++++++++++++--
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 5 +++
> 3 files changed, 89 insertions(+), 4 deletions(-)
>
[...]
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index c1b3a9c..adfa123 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -743,6 +743,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> struct mlx4_en_rx_ring *ring = priv->rx_ring[cq->ring];
> struct mlx4_en_rx_alloc *frags;
> struct mlx4_en_rx_desc *rx_desc;
> + struct bpf_prog *prog;
> struct sk_buff *skb;
> int index;
> int nr;
> @@ -759,6 +760,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> if (budget <= 0)
> return polled;
>
> + prog = READ_ONCE(priv->prog);
This should instead read the per-ring program pointer:

	prog = READ_ONCE(ring->prog);
> /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
> * descriptor offset can be deduced from the CQE index instead of
> * reading 'cqe->index' */
> @@ -835,6 +838,35 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> l2_tunnel = (dev->hw_enc_features & NETIF_F_RXCSUM) &&
> (cqe->vlan_my_qpn & cpu_to_be32(MLX4_CQE_L2_TUNNEL));
>
> + /* A bpf program gets first chance to drop the packet. It may
> + * read bytes but not past the end of the frag.
> + */
> + if (prog) {
> + struct xdp_buff xdp;
> + dma_addr_t dma;
> + u32 act;
> +
> + dma = be64_to_cpu(rx_desc->data[0].addr);
> + dma_sync_single_for_cpu(priv->ddev, dma,
> + priv->frag_info[0].frag_size,
> + DMA_FROM_DEVICE);
> +
> + xdp.data = page_address(frags[0].page) +
> + frags[0].page_offset;
> + xdp.data_end = xdp.data + length;
> +
> + act = bpf_prog_run_xdp(prog, &xdp);
> + switch (act) {
> + case XDP_PASS:
> + break;
> + default:
> + bpf_warn_invalid_xdp_action(act);
> + case XDP_ABORTED:
> + case XDP_DROP:
> + goto next;
> + }
> + }
[...]
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index d39bf59..35ecfa2 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
[...]
> @@ -590,6 +594,7 @@ struct mlx4_en_priv {
> struct hlist_head mac_hash[MLX4_EN_MAC_HASH_SIZE];
> struct hwtstamp_config hwtstamp_config;
> u32 counter_index;
> + struct bpf_prog *prog;
Move to struct mlx4_en_rx_ring.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer