From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Fastabend
Subject: Re: [PATCH v15] net/veth/XDP: Line-rate packet forwarding in kernel
Date: Mon, 2 Apr 2018 11:03:37 -0700
Message-ID: <7cfca503-3e17-6287-8888-92d43ce7a2e7@gmail.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
To: "Md. Islam" , netdev@vger.kernel.org, David Miller , David Ahern ,
	stephen@networkplumber.org, agaceph@gmail.com, Pavel Emelyanov ,
	Eric Dumazet , alexei.starovoitov@gmail.com, brouer@redhat.com
Return-path:
Received: from mail-pl0-f68.google.com ([209.85.160.68]:41585 "EHLO
	mail-pl0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753423AbeDBSEL (ORCPT );
	Mon, 2 Apr 2018 14:04:11 -0400
Received: by mail-pl0-f68.google.com with SMTP id bj1-v6so2666634plb.8
	for ; Mon, 02 Apr 2018 11:04:11 -0700 (PDT)
In-Reply-To:
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 04/01/2018 05:47 PM, Md. Islam wrote:
> This patch implements IPv4 forwarding on xdp_buff. I added a new
> config option XDP_ROUTER. The kernel forwards packets through the
> fast path when this option is enabled, but it requires driver
> support. Currently it only works with veth. Here I have modified veth
> such that it outputs xdp_buff. I created a testbed in Mininet; the
> Mininet script (topology.py) is attached. The topology is:
>
> h1 -----r1-----h2 (r1 acts as a router)
>
> This patch improves the throughput from 53.8 Gb/s to 60 Gb/s on my
> machine. Median RTT also improved from around .055 ms to around
> .035 ms.
>
> Then I disabled hyperthreading and CPU frequency scaling in order to
> utilize the CPU cache (DPDK also utilizes the CPU cache to improve
> forwarding). This further improves per-packet forwarding latency
> from around 400 ns to 200 ns. More specifically, header parsing and
> FIB lookup only take around 82 ns. This shows that this could be
> used to implement line-rate packet forwarding in the kernel.
>
> The patch has been generated on 4.15.0+. Please let me know your
> feedback and suggestions, and whether this approach makes sense.

Makes sense, although let's try to avoid hard-coding routing into XDP
xmit routines. See details below.

> +#ifdef CONFIG_XDP_ROUTER
> +int veth_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
> +{

This is nice, but instead of building a new CONFIG_XDP_ROUTER, just
enable standard XDP for veth plus a new helper call to do routing.
Then it will be immediately usable from any XDP-enabled device.
> +	struct veth_priv *priv = netdev_priv(dev);
> +	struct net_device *rcv;
> +	struct ethhdr *ethh;
> +	struct sk_buff *skb;
> +	int length = xdp->data_end - xdp->data;
> +
> +	rcu_read_lock();
> +	rcv = rcu_dereference(priv->peer);
> +	if (unlikely(!rcv)) {
> +		kfree(xdp);
> +		goto drop;
> +	}
> +
> +	/* Update MAC address and checksum */
> +	ethh = eth_hdr_xdp(xdp);
> +	ether_addr_copy(ethh->h_source, dev->dev_addr);
> +	ether_addr_copy(ethh->h_dest, rcv->dev_addr);
> +
> +	/* if IP forwarding is enabled on the receiver,
> +	 * call xdp_router_forward()
> +	 */
> +	if (is_forwarding_enabled(rcv)) {
> +		prefetch_xdp(xdp);
> +		if (likely(xdp_router_forward(rcv, xdp) == NET_RX_SUCCESS)) {
> +			struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
> +
> +			u64_stats_update_begin(&stats->syncp);
> +			stats->bytes += length;
> +			stats->packets++;
> +			u64_stats_update_end(&stats->syncp);
> +			goto success;
> +		}
> +	}
> +
> +	/* Local deliver */
> +	skb = (struct sk_buff *)xdp->data_meta;
> +	if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
> +		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
> +
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->bytes += length;
> +		stats->packets++;
> +		u64_stats_update_end(&stats->syncp);
> +	} else {
> +drop:
> +		atomic64_inc(&priv->dropped);
> +	}
> +success:
> +	rcu_read_unlock();
> +	return NETDEV_TX_OK;
> +}
> +#endif
> +
>  static const struct net_device_ops veth_netdev_ops = {
>  	.ndo_init		= veth_dev_init,
>  	.ndo_open		= veth_open,
> @@ -290,6 +370,9 @@ static const struct net_device_ops veth_netdev_ops = {
>  	.ndo_get_iflink		= veth_get_iflink,
>  	.ndo_features_check	= passthru_features_check,
>  	.ndo_set_rx_headroom	= veth_set_rx_headroom,
> +#ifdef CONFIG_XDP_ROUTER
> +	.ndo_xdp_xmit		= veth_xdp_xmit,
> +#endif
>  };
>

[...]

> +#ifdef CONFIG_XDP_ROUTER
> +int ip_route_lookup(__be32 daddr, __be32 saddr,
> +		    u8 tos, struct net_device *dev,
> +		    struct fib_result *res);
> +#endif
> +

Can the above be a normal BPF helper that returns an ifindex? Then
something roughly like this pattern would work for all drivers with
redirect support,

	route_ifindex = ip_route_lookup(__daddr, ....)
	if (!route_ifindex)
		return do_foo()
	return xdp_redirect(route_ifindex);

So my suggestion is,

 1. enable veth XDP (including redirect support)
 2. add a helper to look up a route in the routing table

Alternatively, you can skip step (2) and encode the routing table in
BPF directly. Maybe we need a more efficient data structure, but that
should also work.

Thanks,
John
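
P.S. To make (2) a bit more concrete, here is a rough, untested sketch
of the kind of XDP program I have in mind. bpf_ip_route_lookup() is
only a placeholder for the proposed helper (the name and signature are
made up here for illustration and do not exist today); bpf_redirect()
and the rest are the existing XDP interfaces.

/* Sketch only: bpf_ip_route_lookup() stands in for the proposed routing
 * helper and does not exist; everything else is existing XDP API.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Hypothetical helper: return egress ifindex for daddr, <= 0 if no route. */
extern int bpf_ip_route_lookup(struct xdp_md *ctx, __be32 daddr);

SEC("xdp")
int xdp_router(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	struct ethhdr *eth = data;
	struct iphdr *iph = data + sizeof(*eth);
	int ifindex;

	/* Bounds check before touching headers (the verifier requires it). */
	if ((void *)(iph + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;

	ifindex = bpf_ip_route_lookup(ctx, iph->daddr);
	if (ifindex <= 0)
		return XDP_PASS;	/* no route: let the stack handle it */

	/* MAC rewrite of the frame would go here before redirecting. */
	return bpf_redirect(ifindex, 0);
}

char _license[] SEC("license") = "GPL";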
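
And for the alternative of encoding the routing table in BPF directly,
an LPM (longest-prefix-match) trie map seems like the natural starting
point; BPF_MAP_TYPE_LPM_TRIE already exists. Below is a sketch of the
pieces that would replace the bpf_ip_route_lookup() call in the program
above, with user space expected to mirror the FIB into the map (the map
name, value layout, and sizes are arbitrary).

/* Next-hop ifindex keyed by destination prefix; user space keeps this
 * map in sync with the routing table.
 */
struct lpm_key {
	__u32 prefixlen;	/* number of significant bits in daddr */
	__be32 daddr;		/* network byte order */
};

struct bpf_map_def SEC("maps") fib_map = {
	.type		= BPF_MAP_TYPE_LPM_TRIE,
	.key_size	= sizeof(struct lpm_key),
	.value_size	= sizeof(__u32),	/* egress ifindex */
	.max_entries	= 1024,
	.map_flags	= BPF_F_NO_PREALLOC,	/* required for LPM tries */
};

	/* In place of the helper call in the program above: */
	struct lpm_key key = {
		.prefixlen	= 32,		/* look up the full address */
		.daddr		= iph->daddr,
	};
	__u32 *oif = bpf_map_lookup_elem(&fib_map, &key);

	if (!oif)
		return XDP_PASS;		/* no matching prefix */
	return bpf_redirect(*oif, 0);

Whether a map-based trie can keep up with the ~82 ns in-kernel FIB
lookup you measured is exactly the "more efficient data structure"
question, but it has the advantage of needing no new kernel code.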