Netdev List
From: David Ahern <dsa@cumulusnetworks.com>
To: Daniel Borkmann <daniel@iogearbox.net>, davem@davemloft.net
Cc: netdev@vger.kernel.org, Mahesh Bandewar <maheshb@google.com>,
	Florian Westphal <fw@strlen.de>, Martynas Pumputis <m@lambda.lt>
Subject: Re: [PATCH net] ipvlan, l3mdev: fix broken l3s mode wrt local routes
Date: Wed, 30 Jan 2019 15:24:17 -0700
Message-ID: <2c0c0ea4-274f-b19e-8c4e-71940243bba9@cumulusnetworks.com>
In-Reply-To: <20190130114948.24227-1-daniel@iogearbox.net>

On 1/30/19 4:49 AM, Daniel Borkmann wrote:
> While implementing ipvlan l3 and l3s mode for a Kubernetes CNI plugin,
> I ran into the issue that while l3 mode is working fine, l3s mode
> does not have any connectivity to kube-apiserver and hence all pods
> end up in Error state as well. The ipvlan master device sits on
> top of a bond device and hostns traffic to kube-apiserver (also running
> in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
> where the latter is the address of the bond0. While in l3 mode, a
> curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
> works fine from hostns, neither of them does in case of l3s. In the
> latter, only a curl to https://127.0.0.1:37573 appeared to work, whereas
> for local addresses of bond0 I saw the kernel suddenly starting to emit
> ARP requests to query the HW address of bond0, which remained unanswered
> and left neighbor entries in INCOMPLETE state. These ARP requests only
> happen while in l3s.
> 
> Debugging this further, I found the issue is that l3s mode is piggy-
> backing on l3 master device, and in this case local routes are using
> l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
> f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev
> if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be
> a loopback"). I found that reverting them back to using
> net->loopback_dev fixed ipvlan l3s connectivity and got everything
> working for the CNI.
> 
> Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the
> l3mdev paper in [0], the sole reason why ipvlan l3s is relying
> on l3 master device is to get the l3mdev_ip_rcv() receive hook for
> setting the dst entry of the input route without adding its own
> ipvlan-specific hacks into the receive path. However, any l3 domain
> semantics beyond just that are breaking l3s operation. Note that
> ipvlan also has the ability to dynamically switch its internal
> operation from l3 to l3s for all ports via ipvlan_set_port_mode()
> at runtime. In any case, l3 vs l3s solely distinguishes itself by
> 'de-confusing' netfilter through switching skb->dev to ipvlan slave
> device late in NF_INET_LOCAL_IN before handing the skb to L4.
> 
> The minimal fix taken here is to add an IFF_L3MDEV_RX_HANDLER flag which,
> if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
> without any additional l3mdev semantics on top. This should also have
> minimal impact since dev->priv_flags is already hot in cache. With
> this set, l3s mode is working fine and I also get things like
> masquerading pod traffic on the ipvlan master properly working.
> 
>   [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
> 
> Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
> Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback")
> Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Mahesh Bandewar <maheshb@google.com>
> Cc: David Ahern <dsa@cumulusnetworks.com>
> Cc: Florian Westphal <fw@strlen.de>
> Cc: Martynas Pumputis <m@lambda.lt>
> ---
>  drivers/net/ipvlan/ipvlan_main.c | 6 +++---
>  include/linux/netdevice.h        | 8 ++++++++
>  include/net/l3mdev.h             | 3 ++-
>  3 files changed, 13 insertions(+), 4 deletions(-)
> 
I am not surprised that ipvlan needs a finer-grained selection of the
l3mdev hooks.

Acked-by: David Ahern <dsa@cumulusnetworks.com>
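
For readers following the thread, the flag split described in the quoted
patch can be sketched as a small user-space model. This is only an
illustration under stated assumptions, not the kernel code: the struct
layout, the bit values, and the helper names l3_rcv_hook_dev(),
local_route_dev() and netif_has_l3_rx_handler() are hypothetical, and only
the IFF_* flag names are taken from the patch description. It shows a
device marked as a full l3 master getting both the receive hook and the
local-route override, while a device carrying only the new flag gets the
receive hook and keeps loopback for local routes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real flags are defined in the kernel headers. */
enum {
	IFF_L3MDEV_MASTER     = 1 << 0,
	IFF_L3MDEV_SLAVE      = 1 << 1,
	IFF_L3MDEV_RX_HANDLER = 1 << 2,
};

/* Simplified stand-in for the kernel's struct net_device. */
struct net_device {
	const char *name;
	unsigned int priv_flags;
	struct net_device *master; /* upper device of an enslaved port, else NULL */
};

static inline bool netif_is_l3_master(const struct net_device *dev)
{
	return dev->priv_flags & IFF_L3MDEV_MASTER;
}

static inline bool netif_is_l3_slave(const struct net_device *dev)
{
	return dev->priv_flags & IFF_L3MDEV_SLAVE;
}

/* Assumed helper name for testing the new flag. */
static inline bool netif_has_l3_rx_handler(const struct net_device *dev)
{
	return dev->priv_flags & IFF_L3MDEV_RX_HANDLER;
}

/* Device whose L3 receive hook would run for a packet on dev, or NULL. */
static inline struct net_device *l3_rcv_hook_dev(struct net_device *dev)
{
	if (netif_is_l3_slave(dev))
		return dev->master;
	if (netif_is_l3_master(dev) || netif_has_l3_rx_handler(dev))
		return dev;
	return NULL;
}

/*
 * Device used for the dst of a local input route: the l3 master when one
 * applies, otherwise loopback. The RX_HANDLER flag alone does not count.
 */
static inline struct net_device *local_route_dev(struct net_device *dev,
						 struct net_device *loopback)
{
	struct net_device *l3 = NULL;

	if (netif_is_l3_master(dev))
		l3 = dev;
	else if (netif_is_l3_slave(dev))
		l3 = dev->master;
	return l3 ? l3 : loopback;
}
```

In this model, a device with IFF_L3MDEV_MASTER set is returned by both
helpers, which mirrors the broken l3s case where local routes stop
pointing at loopback. A device with only IFF_L3MDEV_RX_HANDLER set is
still returned by l3_rcv_hook_dev(), but local_route_dev() falls back to
the loopback device.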
