netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Martin Varghese <martinvarghesenokia@gmail.com>
Cc: Network Development <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	corbet@lwn.net, scott.drennan@nokia.com,
	Jiri Benc <jbenc@redhat.com>,
	martin.varghese@nokia.com
Subject: Re: [PATCH net-next 1/2] UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
Date: Tue, 8 Oct 2019 12:28:23 -0400	[thread overview]
Message-ID: <CA+FuTSc=uTot72dxn7VRfCv59GcfWb32ZM5XU1_GHt3Ci3PL_A@mail.gmail.com> (raw)
In-Reply-To: <5979d1bf0b5521c66f2f6fa31b7e1cbdddd8cea8.1570455278.git.martinvarghesenokia@gmail.com>

On Tue, Oct 8, 2019 at 5:51 AM Martin Varghese
<martinvarghesenokia@gmail.com> wrote:
>
> From: Martin <martin.varghese@nokia.com>
>
> The Bareudp tunnel module provides a generic L3 encapsulation
> tunnelling module for tunnelling different protocols like MPLS,
> IP,NSH etc inside a UDP tunnel.
>
> Signed-off-by: Martin Varghese <martinvarghesenokia@gmail.com>
> ---
>  Documentation/networking/bareudp.txt |  23 +
>  drivers/net/Kconfig                  |  13 +
>  drivers/net/Makefile                 |   1 +
>  drivers/net/bareudp.c                | 930 +++++++++++++++++++++++++++++++++++
>  include/net/bareudp.h                |  19 +
>  include/uapi/linux/if_link.h         |  12 +
>  6 files changed, 998 insertions(+)
>  create mode 100644 Documentation/networking/bareudp.txt
>  create mode 100644 drivers/net/bareudp.c
>  create mode 100644 include/net/bareudp.h
>
> diff --git a/Documentation/networking/bareudp.txt b/Documentation/networking/bareudp.txt
> new file mode 100644
> index 0000000..d2530e2
> --- /dev/null
> +++ b/Documentation/networking/bareudp.txt
> @@ -0,0 +1,23 @@
> +Bare UDP Tunnelling Module Documentation
> +========================================
> +
> +There are various L3 encapsulation standards using UDP being discussed to
> +leverage the UDP based load balancing capability of different networks.
> +MPLSoUDP (https://tools.ietf.org/html/rfc7510)is one among them.
> +
> +The Bareudp tunnel module provides a generic L3 encapsulation tunnelling
> +support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside
> +a UDP tunnel.

This patch set introduces a lot of code, much of which duplicates
existing tunnel devices, whether FOU using ipip tunnels and
fou_build_header or separate devices like vxlan and geneve. Let's try
to reuse what's there (and tested).

How does this differ from foo-over-udp (FOU). In supporting MPLS
alongside IP? If so, can it reuse the existing code, possibly with
patches to that existing tunnel logic?

I happened to have been playing around with MPLS in GRE recently. Have
not tried the same over UDP tunnels, but most of it should apply?

  ip tunnel add gre1 mode gre local 192.168.1.1 remote 192.168.1.2 dev eth0
  ip addr add 192.168.101.1 peer 192.168.101.2 dev gre1
  ip link set dev gre1 up

  sysctl net.mpls.conf.gre1.input=1
  sysctl net.mpls.platform_labels=17
  ip addr add 192.168.201.1/24 dev gre1
  ip route add 192.168.202.0/24 encap mpls 17 dev gre1
  ip -f mpls route add 16 dev lo


> +static int bareudp_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
> +{
> +       struct bareudp_sock *bs;
> +       struct ethhdr *eh;
> +       struct bareudp_dev *bareudp;
> +       struct metadata_dst *tun_dst = NULL;
> +       struct pcpu_sw_netstats *stats;
> +       unsigned int len;
> +       int err = 0;
> +       void *oiph;
> +       u16 proto;
> +
> +       if (unlikely(!pskb_may_pull(skb, BAREUDP_BASE_HLEN)))
> +               goto drop;
> +
> +       bs = rcu_dereference_sk_user_data(sk);
> +       if (!bs)
> +               goto drop;
> +
> +       bareudp = bs->bareudp;
> +       proto = bareudp->ethertype;
> +
> +       if (iptunnel_pull_header(skb, BAREUDP_BASE_HLEN,
> +                                proto,
> +                                !net_eq(bareudp->net,
> +                                        dev_net(bareudp->dev)))) {
> +               bareudp->dev->stats.rx_dropped++;
> +               goto drop;
> +       }
> +       tun_dst = udp_tun_rx_dst(skb, bareudp_get_sk_family(bs), TUNNEL_KEY,
> +                                0, 0);
> +       if (!tun_dst) {
> +               bareudp->dev->stats.rx_dropped++;
> +               goto drop;
> +       }
> +       skb_dst_set(skb, &tun_dst->dst);

Is this dst metadata a firm requirement? It is optional in vxlan, say.
If here, too, please split such parts off into separate follow-on
patches.


> +       skb_push(skb, sizeof(struct ethhdr));
> +       eh = (struct ethhdr *)skb->data;
> +       eh->h_proto = proto;
> +
> +       skb_reset_mac_header(skb);
> +       skb->protocol = eth_type_trans(skb, bareudp->dev);
> +       skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
> +       oiph = skb_network_header(skb);
> +       skb_reset_network_header(skb);
> +
> +       if (bareudp_get_sk_family(bs) == AF_INET)

This should be derived from packet contents, not socket state.
Although the one implies the other, I imagine.

> +static struct rtable *bareudp_get_v4_rt(struct sk_buff *skb,
> +                                       struct net_device *dev,
> +                                       struct bareudp_sock *bs4,
> +                                       struct flowi4 *fl4,
> +                                       const struct ip_tunnel_info *info)
> +{
> +       bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> +       struct bareudp_dev *bareudp = netdev_priv(dev);
> +       struct dst_cache *dst_cache;
> +       struct rtable *rt = NULL;
> +       __u8 tos;
> +
> +       if (!bs4)
> +               return ERR_PTR(-EIO);
> +
> +       memset(fl4, 0, sizeof(*fl4));
> +       fl4->flowi4_mark = skb->mark;
> +       fl4->flowi4_proto = IPPROTO_UDP;
> +       fl4->daddr = info->key.u.ipv4.dst;
> +       fl4->saddr = info->key.u.ipv4.src;
> +
> +       tos = info->key.tos;
> +       fl4->flowi4_tos = RT_TOS(tos);
> +
> +       dst_cache = (struct dst_cache *)&info->dst_cache;
> +       if (use_cache) {
> +               rt = dst_cache_get_ip4(dst_cache, &fl4->saddr);
> +               if (rt)
> +                       return rt;
> +       }
> +       rt = ip_route_output_key(bareudp->net, fl4);
> +       if (IS_ERR(rt)) {
> +               netdev_dbg(dev, "no route to %pI4\n", &fl4->daddr);
> +               return ERR_PTR(-ENETUNREACH);
> +       }
> +       if (rt->dst.dev == dev) { /* is this necessary? */
> +               netdev_dbg(dev, "circular route to %pI4\n", &fl4->daddr);
> +               ip_rt_put(rt);
> +               return ERR_PTR(-ELOOP);
> +       }
> +       if (use_cache)
> +               dst_cache_set_ip4(dst_cache, &rt->dst, fl4->saddr);
> +       return rt;
> +}
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +static struct dst_entry *bareudp_get_v6_dst(struct sk_buff *skb,
> +                                           struct net_device *dev,
> +                                           struct bareudp_sock *bs6,
> +                                           struct flowi6 *fl6,
> +                                           const struct ip_tunnel_info *info)
> +{
> +       bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> +       struct bareudp_dev *bareudp = netdev_priv(dev);
> +       struct dst_entry *dst = NULL;
> +       struct dst_cache *dst_cache;
> +       __u8 prio;
> +
> +       if (!bs6)
> +               return ERR_PTR(-EIO);
> +
> +       memset(fl6, 0, sizeof(*fl6));
> +       fl6->flowi6_mark = skb->mark;
> +       fl6->flowi6_proto = IPPROTO_UDP;
> +       fl6->daddr = info->key.u.ipv6.dst;
> +       fl6->saddr = info->key.u.ipv6.src;
> +       prio = info->key.tos;
> +
> +       fl6->flowlabel = ip6_make_flowinfo(RT_TOS(prio),
> +                                          info->key.label);
> +       dst_cache = (struct dst_cache *)&info->dst_cache;
> +       if (use_cache) {
> +               dst = dst_cache_get_ip6(dst_cache, &fl6->saddr);
> +               if (dst)
> +                       return dst;
> +       }
> +       if (ipv6_stub->ipv6_dst_lookup(bareudp->net, bs6->sock->sk, &dst,
> +                                      fl6)) {
> +               netdev_dbg(dev, "no route to %pI6\n", &fl6->daddr);
> +               return ERR_PTR(-ENETUNREACH);
> +       }
> +       if (dst->dev == dev) { /* is this necessary? */
> +               netdev_dbg(dev, "circular route to %pI6\n", &fl6->daddr);
> +               dst_release(dst);
> +               return ERR_PTR(-ELOOP);
> +       }
> +
> +       if (use_cache)
> +               dst_cache_set_ip6(dst_cache, dst, &fl6->saddr);
> +       return dst;
> +}
> +#endif

The route lookup logic is very similar to vxlan_get_route and
vxlan6_get_route. Can be reused?

  parent reply	other threads:[~2019-10-08 16:29 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08  9:48 [PATCH net-next 0/2] Bareudp Tunnel Module Martin Varghese
2019-10-08  9:48 ` [PATCH net-next 1/2] UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc Martin Varghese
2019-10-08 14:06   ` Jonathan Corbet
2019-10-08 14:57   ` Randy Dunlap
2019-10-08 16:04   ` Stephen Hemminger
2019-10-08 16:05   ` Stephen Hemminger
2019-10-08 16:28   ` Willem de Bruijn [this message]
2019-10-09 12:48     ` Martin Varghese
2019-10-09 14:58       ` Willem de Bruijn
2019-10-09 15:21         ` Willem de Bruijn
2019-10-09 15:42         ` Jiri Benc
2019-10-09 16:15           ` Willem de Bruijn
2019-10-18 20:03           ` Tom Herbert
2019-10-21 17:18             ` Jiri Benc
2019-10-17 13:20     ` Martin Varghese
2019-10-17 20:48       ` Willem de Bruijn
2019-10-18  8:20         ` Martin Varghese
2019-10-18 14:59           ` Willem de Bruijn
2019-10-23  2:40             ` Martin Varghese
2019-11-07 13:38             ` Martin Varghese
2019-11-07 15:53               ` Willem de Bruijn
2019-11-07 16:12                 ` Martin Varghese
2019-11-07 16:35                   ` Willem de Bruijn
2019-11-07 17:31                     ` Jiri Benc
2019-11-07 18:59                       ` Willem de Bruijn
2019-11-07 19:05                         ` Jiri Benc
2019-11-07 19:13                           ` Willem de Bruijn
2019-11-11 16:02                     ` Martin Varghese
2019-11-11 21:45                       ` Willem de Bruijn
2019-11-14 13:17                         ` Martin Varghese
2019-10-08  9:49 ` [PATCH net-next 2/2] Special handling for IP & MPLS Martin Varghese
2019-10-08 14:58   ` Randy Dunlap
2019-10-08 16:09   ` Willem de Bruijn
2019-10-09 13:38     ` Martin Varghese
2019-10-09 15:06       ` Willem de Bruijn
2019-10-09 15:19         ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+FuTSc=uTot72dxn7VRfCv59GcfWb32ZM5XU1_GHt3Ci3PL_A@mail.gmail.com' \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=jbenc@redhat.com \
    --cc=martin.varghese@nokia.com \
    --cc=martinvarghesenokia@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=scott.drennan@nokia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).