From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Fastabend
Subject: Re: [PATCH v15] net/veth/XDP: Line-rate packet forwarding in kernel
Date: Mon, 2 Apr 2018 11:03:37 -0700
Message-ID: <7cfca503-3e17-6287-8888-92d43ce7a2e7@gmail.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
To: "Md. Islam" , netdev@vger.kernel.org, David Miller , David Ahern ,
	stephen@networkplumber.org, agaceph@gmail.com, Pavel Emelyanov ,
	Eric Dumazet , alexei.starovoitov@gmail.com, brouer@redhat.com
Return-path:
Received: from mail-pl0-f68.google.com ([209.85.160.68]:41585 "EHLO
	mail-pl0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753423AbeDBSEL (ORCPT );
	Mon, 2 Apr 2018 14:04:11 -0400
Received: by mail-pl0-f68.google.com with SMTP id bj1-v6so2666634plb.8
	for ; Mon, 02 Apr 2018 11:04:11 -0700 (PDT)
In-Reply-To:
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 04/01/2018 05:47 PM, Md. Islam wrote:
> This patch implements IPv4 forwarding on xdp_buff. I added a new
> config option XDP_ROUTER. The kernel forwards packets through the
> fast path when this option is enabled, but it requires driver
> support. Currently it only works with veth. Here I have modified veth
> such that it outputs xdp_buff. I created a testbed in Mininet; the
> Mininet script (topology.py) is attached. The topology is:
>
> h1 -----r1-----h2 (r1 acts as a router)
>
> This patch improves the throughput from 53.8 Gb/s to 60 Gb/s on my
> machine. Median RTT also improved from around .055 ms to around
> .035 ms.
>
> Then I disabled hyperthreading and CPU frequency scaling in order to
> utilize the CPU cache (DPDK also utilizes the CPU cache to improve
> forwarding). This further improves per-packet forwarding latency
> from around 400 ns to 200 ns. More specifically, header parsing and
> FIB lookup only take around 82 ns. This shows that this could be
> used to implement line-rate packet forwarding in the kernel.
>
> The patch has been generated on 4.15.0+. Please let me know your
> feedback and suggestions, and whether this approach makes sense.

Makes sense, although let's try to avoid hard-coding routing into XDP
xmit routines. See details below.

> +#ifdef CONFIG_XDP_ROUTER
> +int veth_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
> +{

This is nice, but instead of building a new CONFIG_XDP_ROUTER, just
enable standard XDP for veth plus a new helper call to do routing.
Then it will be immediately usable from any XDP-enabled device.
> +	struct veth_priv *priv = netdev_priv(dev);
> +	struct net_device *rcv;
> +	struct ethhdr *ethh;
> +	struct sk_buff *skb;
> +	int length = xdp->data_end - xdp->data;
> +
> +	rcu_read_lock();
> +	rcv = rcu_dereference(priv->peer);
> +	if (unlikely(!rcv)) {
> +		kfree(xdp);
> +		goto drop;
> +	}
> +
> +	/* Update MAC address and checksum */
> +	ethh = eth_hdr_xdp(xdp);
> +	ether_addr_copy(ethh->h_source, dev->dev_addr);
> +	ether_addr_copy(ethh->h_dest, rcv->dev_addr);
> +
> +	/* if IP forwarding is enabled on the receiver,
> +	 * call xdp_router_forward()
> +	 */
> +	if (is_forwarding_enabled(rcv)) {
> +		prefetch_xdp(xdp);
> +		if (likely(xdp_router_forward(rcv, xdp) == NET_RX_SUCCESS)) {
> +			struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
> +
> +			u64_stats_update_begin(&stats->syncp);
> +			stats->bytes += length;
> +			stats->packets++;
> +			u64_stats_update_end(&stats->syncp);
> +			goto success;
> +		}
> +	}
> +
> +	/* Local deliver */
> +	skb = (struct sk_buff *)xdp->data_meta;
> +	if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
> +		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
> +
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->bytes += length;
> +		stats->packets++;
> +		u64_stats_update_end(&stats->syncp);
> +	} else {
> +drop:
> +		atomic64_inc(&priv->dropped);
> +	}
> +success:
> +	rcu_read_unlock();
> +	return NETDEV_TX_OK;
> +}
> +#endif
> +
>  static const struct net_device_ops veth_netdev_ops = {
>  	.ndo_init		= veth_dev_init,
>  	.ndo_open		= veth_open,
> @@ -290,6 +370,9 @@ static const struct net_device_ops veth_netdev_ops = {
>  	.ndo_get_iflink		= veth_get_iflink,
>  	.ndo_features_check	= passthru_features_check,
>  	.ndo_set_rx_headroom	= veth_set_rx_headroom,
> +#ifdef CONFIG_XDP_ROUTER
> +	.ndo_xdp_xmit		= veth_xdp_xmit,
> +#endif
>  };
>

[...]

> +#ifdef CONFIG_XDP_ROUTER
> +int ip_route_lookup(__be32 daddr, __be32 saddr,
> +		    u8 tos, struct net_device *dev,
> +		    struct fib_result *res);
> +#endif
> +

Can the above be a normal BPF helper that returns an ifindex? Then
something roughly like this pattern would work for all drivers with
redirect support,

	route_ifindex = ip_route_lookup(__daddr, ....)
	if (!route_ifindex)
		return do_foo()
	return xdp_redirect(route_ifindex);

So my suggestion is,

 1. enable veth XDP (including redirect support)
 2. add a helper to look up a route in the routing table

Alternatively, you can skip step (2) and encode the routing table in
BPF directly. Maybe we need a more efficient data structure, but that
should also work.

Thanks,
John
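
P.S. To make (2) a bit more concrete, here is a rough, untested sketch
of the kind of XDP program I have in mind. bpf_ip_route_lookup() is
only a placeholder for the proposed helper (the name and signature are
made up here for illustration and do not exist today); bpf_redirect()
and the rest are the existing XDP interfaces.

/* Sketch only: bpf_ip_route_lookup() stands in for the proposed routing
 * helper and does not exist; everything else is existing XDP API.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Hypothetical helper: return egress ifindex for daddr, <= 0 if no route. */
extern int bpf_ip_route_lookup(struct xdp_md *ctx, __be32 daddr);

SEC("xdp")
int xdp_router(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	struct ethhdr *eth = data;
	struct iphdr *iph = data + sizeof(*eth);
	int ifindex;

	/* Bounds check before touching headers (the verifier requires it). */
	if ((void *)(iph + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;

	ifindex = bpf_ip_route_lookup(ctx, iph->daddr);
	if (ifindex <= 0)
		return XDP_PASS;	/* no route: let the stack handle it */

	/* MAC rewrite of the frame would go here before redirecting. */
	return bpf_redirect(ifindex, 0);
}

char _license[] SEC("license") = "GPL";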
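
And for the alternative of encoding the routing table in BPF directly,
an LPM (longest-prefix-match) trie map seems like the natural starting
point; BPF_MAP_TYPE_LPM_TRIE already exists. Below is a sketch of the
pieces that would replace the bpf_ip_route_lookup() call in the program
above, with user space expected to mirror the FIB into the map (the map
name, value layout, and sizes are arbitrary).

/* Next-hop ifindex keyed by destination prefix; user space keeps this
 * map in sync with the routing table.
 */
struct lpm_key {
	__u32 prefixlen;	/* number of significant bits in daddr */
	__be32 daddr;		/* network byte order */
};

struct bpf_map_def SEC("maps") fib_map = {
	.type		= BPF_MAP_TYPE_LPM_TRIE,
	.key_size	= sizeof(struct lpm_key),
	.value_size	= sizeof(__u32),	/* egress ifindex */
	.max_entries	= 1024,
	.map_flags	= BPF_F_NO_PREALLOC,	/* required for LPM tries */
};

	/* In place of the helper call in the program above: */
	struct lpm_key key = {
		.prefixlen	= 32,		/* look up the full address */
		.daddr		= iph->daddr,
	};
	__u32 *oif = bpf_map_lookup_elem(&fib_map, &key);

	if (!oif)
		return XDP_PASS;		/* no matching prefix */
	return bpf_redirect(*oif, 0);

Whether a map-based trie can keep up with the ~82 ns in-kernel FIB
lookup you measured is exactly the "more efficient data structure"
question, but it has the advantage of needing no new kernel code.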