All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Varghese <martinvarghesenokia@gmail.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Network Development <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	corbet@lwn.net, scott.drennan@nokia.com,
	Jiri Benc <jbenc@redhat.com>,
	martin.varghese@nokia.com
Subject: Re: [PATCH net-next 1/2] UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc.
Date: Thu, 17 Oct 2019 18:50:29 +0530	[thread overview]
Message-ID: <20191017132029.GA9982@martin-VirtualBox> (raw)
In-Reply-To: <CA+FuTSc=uTot72dxn7VRfCv59GcfWb32ZM5XU1_GHt3Ci3PL_A@mail.gmail.com>

On Tue, Oct 08, 2019 at 12:28:23PM -0400, Willem de Bruijn wrote:
> On Tue, Oct 8, 2019 at 5:51 AM Martin Varghese
> <martinvarghesenokia@gmail.com> wrote:
> >
> > From: Martin <martin.varghese@nokia.com>
> >
> > The Bareudp tunnel module provides a generic L3 encapsulation
> > tunnelling module for tunnelling different protocols like MPLS,
> > IP,NSH etc inside a UDP tunnel.
> >
> > Signed-off-by: Martin Varghese <martinvarghesenokia@gmail.com>
> > ---
> >  Documentation/networking/bareudp.txt |  23 +
> >  drivers/net/Kconfig                  |  13 +
> >  drivers/net/Makefile                 |   1 +
> >  drivers/net/bareudp.c                | 930 +++++++++++++++++++++++++++++++++++
> >  include/net/bareudp.h                |  19 +
> >  include/uapi/linux/if_link.h         |  12 +
> >  6 files changed, 998 insertions(+)
> >  create mode 100644 Documentation/networking/bareudp.txt
> >  create mode 100644 drivers/net/bareudp.c
> >  create mode 100644 include/net/bareudp.h
> >
> > diff --git a/Documentation/networking/bareudp.txt b/Documentation/networking/bareudp.txt
> > new file mode 100644
> > index 0000000..d2530e2
> > --- /dev/null
> > +++ b/Documentation/networking/bareudp.txt
> > @@ -0,0 +1,23 @@
> > +Bare UDP Tunnelling Module Documentation
> > +========================================
> > +
> > +There are various L3 encapsulation standards using UDP being discussed to
> > +leverage the UDP based load balancing capability of different networks.
> > +MPLSoUDP (https://tools.ietf.org/html/rfc7510)is one among them.
> > +
> > +The Bareudp tunnel module provides a generic L3 encapsulation tunnelling
> > +support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside
> > +a UDP tunnel.
> 
> This patch set introduces a lot of code, much of which duplicates
> existing tunnel devices, whether FOU using ipip tunnels and
> fou_build_header or separate devices like vxlan and geneve. Let's try
> to reuse what's there (and tested).
> 
> How does this differ from foo-over-udp (FOU). In supporting MPLS
> alongside IP? If so, can it reuse the existing code, possibly with
> patches to that existing tunnel logic?
> 
> I happened to have been playing around with MPLS in GRE recently. Have
> not tried the same over UDP tunnels, but most of it should apply?
> 
>   ip tunnel add gre1 mode gre local 192.168.1.1 remote 192.168.1.2 dev eth0
>   ip addr add 192.168.101.1 peer 192.168.101.2 dev gre1
>   ip link set dev gre1 up
> 
>   sysctl net.mpls.conf.gre1.input=1
>   sysctl net.mpls.platform_labels=17
>   ip addr add 192.168.201.1/24 dev gre1
>   ip route add 192.168.202.0/24 encap mpls 17 dev gre1
>   ip -f mpls route add 16 dev lo
> 
> 
> > +static int bareudp_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
> > +{
> > +       struct bareudp_sock *bs;
> > +       struct ethhdr *eh;
> > +       struct bareudp_dev *bareudp;
> > +       struct metadata_dst *tun_dst = NULL;
> > +       struct pcpu_sw_netstats *stats;
> > +       unsigned int len;
> > +       int err = 0;
> > +       void *oiph;
> > +       u16 proto;
> > +
> > +       if (unlikely(!pskb_may_pull(skb, BAREUDP_BASE_HLEN)))
> > +               goto drop;
> > +
> > +       bs = rcu_dereference_sk_user_data(sk);
> > +       if (!bs)
> > +               goto drop;
> > +
> > +       bareudp = bs->bareudp;
> > +       proto = bareudp->ethertype;
> > +
> > +       if (iptunnel_pull_header(skb, BAREUDP_BASE_HLEN,
> > +                                proto,
> > +                                !net_eq(bareudp->net,
> > +                                        dev_net(bareudp->dev)))) {
> > +               bareudp->dev->stats.rx_dropped++;
> > +               goto drop;
> > +       }
> > +       tun_dst = udp_tun_rx_dst(skb, bareudp_get_sk_family(bs), TUNNEL_KEY,
> > +                                0, 0);
> > +       if (!tun_dst) {
> > +               bareudp->dev->stats.rx_dropped++;
> > +               goto drop;
> > +       }
> > +       skb_dst_set(skb, &tun_dst->dst);
> 
> Is this dst metadata a firm requirement? It is optional in vxlan, say.
> If here, too, please split such parts off into separate follow-on
> patches.
> 
> 
> > +       skb_push(skb, sizeof(struct ethhdr));
> > +       eh = (struct ethhdr *)skb->data;
> > +       eh->h_proto = proto;
> > +
> > +       skb_reset_mac_header(skb);
> > +       skb->protocol = eth_type_trans(skb, bareudp->dev);
> > +       skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
> > +       oiph = skb_network_header(skb);
> > +       skb_reset_network_header(skb);
> > +
> > +       if (bareudp_get_sk_family(bs) == AF_INET)
> 
> This should be derived from packet contents, not socket state.
> Although the one implies the other, I imagine.
>

The IP Stack check IP headers & puts the packet in the correct socket, hence checking the ip headers again is reduntant correct?
In geneve & vxlan it is done the same way.

 
> > +static struct rtable *bareudp_get_v4_rt(struct sk_buff *skb,
> > +                                       struct net_device *dev,
> > +                                       struct bareudp_sock *bs4,
> > +                                       struct flowi4 *fl4,
> > +                                       const struct ip_tunnel_info *info)
> > +{
> > +       bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> > +       struct bareudp_dev *bareudp = netdev_priv(dev);
> > +       struct dst_cache *dst_cache;
> > +       struct rtable *rt = NULL;
> > +       __u8 tos;
> > +
> > +       if (!bs4)
> > +               return ERR_PTR(-EIO);
> > +
> > +       memset(fl4, 0, sizeof(*fl4));
> > +       fl4->flowi4_mark = skb->mark;
> > +       fl4->flowi4_proto = IPPROTO_UDP;
> > +       fl4->daddr = info->key.u.ipv4.dst;
> > +       fl4->saddr = info->key.u.ipv4.src;
> > +
> > +       tos = info->key.tos;
> > +       fl4->flowi4_tos = RT_TOS(tos);
> > +
> > +       dst_cache = (struct dst_cache *)&info->dst_cache;
> > +       if (use_cache) {
> > +               rt = dst_cache_get_ip4(dst_cache, &fl4->saddr);
> > +               if (rt)
> > +                       return rt;
> > +       }
> > +       rt = ip_route_output_key(bareudp->net, fl4);
> > +       if (IS_ERR(rt)) {
> > +               netdev_dbg(dev, "no route to %pI4\n", &fl4->daddr);
> > +               return ERR_PTR(-ENETUNREACH);
> > +       }
> > +       if (rt->dst.dev == dev) { /* is this necessary? */
> > +               netdev_dbg(dev, "circular route to %pI4\n", &fl4->daddr);
> > +               ip_rt_put(rt);
> > +               return ERR_PTR(-ELOOP);
> > +       }
> > +       if (use_cache)
> > +               dst_cache_set_ip4(dst_cache, &rt->dst, fl4->saddr);
> > +       return rt;
> > +}
> > +
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +static struct dst_entry *bareudp_get_v6_dst(struct sk_buff *skb,
> > +                                           struct net_device *dev,
> > +                                           struct bareudp_sock *bs6,
> > +                                           struct flowi6 *fl6,
> > +                                           const struct ip_tunnel_info *info)
> > +{
> > +       bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> > +       struct bareudp_dev *bareudp = netdev_priv(dev);
> > +       struct dst_entry *dst = NULL;
> > +       struct dst_cache *dst_cache;
> > +       __u8 prio;
> > +
> > +       if (!bs6)
> > +               return ERR_PTR(-EIO);
> > +
> > +       memset(fl6, 0, sizeof(*fl6));
> > +       fl6->flowi6_mark = skb->mark;
> > +       fl6->flowi6_proto = IPPROTO_UDP;
> > +       fl6->daddr = info->key.u.ipv6.dst;
> > +       fl6->saddr = info->key.u.ipv6.src;
> > +       prio = info->key.tos;
> > +
> > +       fl6->flowlabel = ip6_make_flowinfo(RT_TOS(prio),
> > +                                          info->key.label);
> > +       dst_cache = (struct dst_cache *)&info->dst_cache;
> > +       if (use_cache) {
> > +               dst = dst_cache_get_ip6(dst_cache, &fl6->saddr);
> > +               if (dst)
> > +                       return dst;
> > +       }
> > +       if (ipv6_stub->ipv6_dst_lookup(bareudp->net, bs6->sock->sk, &dst,
> > +                                      fl6)) {
> > +               netdev_dbg(dev, "no route to %pI6\n", &fl6->daddr);
> > +               return ERR_PTR(-ENETUNREACH);
> > +       }
> > +       if (dst->dev == dev) { /* is this necessary? */
> > +               netdev_dbg(dev, "circular route to %pI6\n", &fl6->daddr);
> > +               dst_release(dst);
> > +               return ERR_PTR(-ELOOP);
> > +       }
> > +
> > +       if (use_cache)
> > +               dst_cache_set_ip6(dst_cache, dst, &fl6->saddr);
> > +       return dst;
> > +}
> > +#endif
> 
> The route lookup logic is very similar to vxlan_get_route and
> vxlan6_get_route. Can be reused?

I had a look at the vxlan & geneve and it seems the corresponding functions  in those modules are tightly coupled  to the rest of the module design.
More specifically wrt the ttl inheritance & the caching behaviour. It may not be possible for those modules to use a new generic API unless without a change in those module design.

The bareudp module is a generic L3 encapsulation module. It could be used to tunnel different l3 protocols. TTL Inheritance behaviour when tunnelled
could be different for these inner protocols. Hence moving  this function to a common place will make it tough to change it later when a need arises for a new protocol

Otherwise we should have more generic function which takes the  generic IP header params as arguments. Then the point is we don’t need a function like that
We can just fill up "struct flowi4" and call ip_route_output_key or dst_cache_get_ip4 to get the route table entry


Thanks
Martin

  parent reply	other threads:[~2019-10-17 13:20 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08  9:48 [PATCH net-next 0/2] Bareudp Tunnel Module Martin Varghese
2019-10-08  9:48 ` [PATCH net-next 1/2] UDP tunnel encapsulation module for tunnelling different protocols like MPLS,IP,NSH etc Martin Varghese
2019-10-08 14:06   ` Jonathan Corbet
2019-10-08 14:57   ` Randy Dunlap
2019-10-08 16:04   ` Stephen Hemminger
2019-10-08 16:05   ` Stephen Hemminger
2019-10-08 16:28   ` Willem de Bruijn
2019-10-09 12:48     ` Martin Varghese
2019-10-09 14:58       ` Willem de Bruijn
2019-10-09 15:21         ` Willem de Bruijn
2019-10-09 15:42         ` Jiri Benc
2019-10-09 16:15           ` Willem de Bruijn
2019-10-18 20:03           ` Tom Herbert
2019-10-21 17:18             ` Jiri Benc
2019-10-17 13:20     ` Martin Varghese [this message]
2019-10-17 20:48       ` Willem de Bruijn
2019-10-18  8:20         ` Martin Varghese
2019-10-18 14:59           ` Willem de Bruijn
2019-10-23  2:40             ` Martin Varghese
2019-11-07 13:38             ` Martin Varghese
2019-11-07 15:53               ` Willem de Bruijn
2019-11-07 16:12                 ` Martin Varghese
2019-11-07 16:35                   ` Willem de Bruijn
2019-11-07 17:31                     ` Jiri Benc
2019-11-07 18:59                       ` Willem de Bruijn
2019-11-07 19:05                         ` Jiri Benc
2019-11-07 19:13                           ` Willem de Bruijn
2019-11-11 16:02                     ` Martin Varghese
2019-11-11 21:45                       ` Willem de Bruijn
2019-11-14 13:17                         ` Martin Varghese
2019-10-08  9:49 ` [PATCH net-next 2/2] Special handling for IP & MPLS Martin Varghese
2019-10-08 14:58   ` Randy Dunlap
2019-10-08 16:09   ` Willem de Bruijn
2019-10-09 13:38     ` Martin Varghese
2019-10-09 15:06       ` Willem de Bruijn
2019-10-09 15:19         ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191017132029.GA9982@martin-VirtualBox \
    --to=martinvarghesenokia@gmail.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=jbenc@redhat.com \
    --cc=martin.varghese@nokia.com \
    --cc=netdev@vger.kernel.org \
    --cc=scott.drennan@nokia.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.