Netdev List
 help / color / mirror / Atom feed
From: Nikolay Aleksandrov <razor@blackwall.org>
To: Jonathan Toppins <jtoppins@redhat.com>, netdev@vger.kernel.org
Cc: toke@redhat.com, Long Xin <lxin@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Jay Vosburgh <j.vosburgh@gmail.com>,
	Veaceslav Falico <vfalico@gmail.com>,
	Andy Gospodarek <andy@greyhouse.net>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v3] bond: add mac filter option for balance-xor
Date: Mon, 16 May 2022 20:22:40 +0300	[thread overview]
Message-ID: <53357187-aedf-20dd-a331-bc501aa0484e@blackwall.org> (raw)
In-Reply-To: <6431569f-fb09-096e-7a89-284a71aa5c0f@redhat.com>

On 16/05/2022 17:06, Jonathan Toppins wrote:
> On 5/15/22 02:32, Nikolay Aleksandrov wrote:
>> On 15/05/2022 00:41, Nikolay Aleksandrov wrote:
>>> On 13/05/2022 20:43, Jonathan Toppins wrote:
>>>> Implement a MAC filter that prevents duplicate frame delivery when
>>>> handling BUM traffic. This attempts to partially replicate OvS SLB
>>>> Bonding[1] like functionality without requiring significant change
>>>> in the Linux bridging code.
>>>>
>>>> A typical network setup for this feature would be:
>>>>
>>>>              .--------------------------------------------.
>>>>              |         .--------------------.             |
>>>>              |         |                    |             |
>>>>         .-------------------.               |             |
>>>>         |    | Bond 0  |    |               |             |
>>>>         | .--'---. .---'--. |               |             |
>>>>    .----|-| eth0 |-| eth1 |-|----.    .-----+----.   .----+------.
>>>>    |    | '------' '------' |    |    | Switch 1 |   | Switch 2  |
>>>>    |    '---,---------------'    |    |          +---+           |
>>>>    |       /                     |    '----+-----'   '----+------'
>>>>    |  .---'---.    .------.      |         |              |
>>>>    |  |  br0  |----| VM 1 |      |      ~~~~~~~~~~~~~~~~~~~~~
>>>>    |  '-------'    '------'      |     (                     )
>>>>    |      |        .------.      |     ( Rest of Network     )
>>>>    |      '--------| VM # |      |     (_____________________)
>>>>    |               '------'      |
>>>>    |  Host 1                     |
>>>>    '-----------------------------'
>>>>
>>>> Where 'VM1' and 'VM#' are hosts connected to a Linux bridge, br0, with
>>>> bond0 and its associated links, eth0 & eth1, provide ingress/egress. One
>>>> can assume bond0, br1, and hosts VM1 to VM# are all contained in a
>>>> single box, as depicted. Interfaces eth0 and eth1 provide redundant
>>>> connections to the data center with the requirement to use all bandwidth
>>>> when the system is functioning normally. Switch 1 and Switch 2 are
>>>> physical switches that do not implement any advanced L2 management
>>>> features such as MLAG, Cisco's VPC, or LACP.
>>>>
>>>> Combining this feature with vlan+srcmac hash policy allows a user to
>>>> create an access network without the need to use expensive switches that
>>>> support features like Cisco's VCP.
>>>>
>>>> [1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding
>>>>
>>>> Co-developed-by: Long Xin <lxin@redhat.com>
>>>> Signed-off-by: Long Xin <lxin@redhat.com>
>>>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>>>> ---
>>>>
>>>> Notes:
>>>>      v2:
>>>>       * dropped needless abstraction functions and put code in module init
>>>>       * renamed variable "rc" to "ret" to stay consistent with most of the
>>>>         code
>>>>       * fixed parameter setting management, when arp-monitor is turned on
>>>>         this feature will be turned off similar to how miimon and arp-monitor
>>>>         interact
>>>>       * renamed bond_xor_recv to bond_mac_filter_recv for a little more
>>>>         clarity
>>>>       * it appears the implied default return code for any bonding recv probe
>>>>         must be `RX_HANDLER_ANOTHER`. Changed the default return code of
>>>>         bond_mac_filter_recv to use this return value to not break skb
>>>>         processing when the skb dev is switched to the bond dev:
>>>>           `skb->dev = bond->dev`
>>>>           v3: Nik's comments
>>>>       * clarified documentation
>>>>       * fixed inline and basic reverse Christmas tree formatting
>>>>       * zero'ed entry in mac_create
>>>>       * removed read_lock taking in bond_mac_filter_recv
>>>>       * made has_expired() atomic and removed critical sections
>>>>         surrounding calls to has_expired(), this also removed the
>>>>         use-after-free that would have occurred:
>>>>             spin_lock_irqsave(&entry->lock, flags);
>>>>                 if (has_expired(bond, entry))
>>>>                     mac_delete(bond, entry);
>>>>             spin_unlock_irqrestore(&entry->lock, flags); <---
>>>>       * moved init/destroy of mac_filter_tbl to bond_open/bond_close
>>>>         this removed the complex option dependencies, the only behavioural
>>>>         change the user will see is if the bond is up and mac_filter is
>>>>         enabled if they try and set arp_interval they will receive -EBUSY
>>>>       * in bond_changelink moved processing of mac_filter option just below
>>>>         mode processing
>>>>
>>>>   Documentation/networking/bonding.rst  |  20 +++
>>>>   drivers/net/bonding/Makefile          |   2 +-
>>>>   drivers/net/bonding/bond_mac_filter.c | 201 ++++++++++++++++++++++++++
>>>>   drivers/net/bonding/bond_mac_filter.h |  37 +++++
>>>>   drivers/net/bonding/bond_main.c       |  30 ++++
>>>>   drivers/net/bonding/bond_netlink.c    |  13 ++
>>>>   drivers/net/bonding/bond_options.c    |  81 +++++++++--
>>>>   drivers/net/bonding/bonding_priv.h    |   1 +
>>>>   include/net/bond_options.h            |   1 +
>>>>   include/net/bonding.h                 |   3 +
>>>>   include/uapi/linux/if_link.h          |   1 +
>>>>   11 files changed, 373 insertions(+), 17 deletions(-)
>>>>   create mode 100644 drivers/net/bonding/bond_mac_filter.c
>>>>   create mode 100644 drivers/net/bonding/bond_mac_filter.h
>>>>
>>>
>> [snip]
>>
>> The same problem solved using a few nftables rules (in case you don't want to load eBPF):
>> $ nft 'add table netdev nt'
>> $ nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }'
>> $ nft 'add chain netdev nt bond0IngressFilter { type filter hook ingress device bond0 priority 0; }'
>> $ nft 'add set netdev nt macset { type ether_addr; flags timeout; }'
>> $ nft 'add rule netdev nt bond0EgressFilter set update ether saddr timeout 5s @macset'
>> $ nft 'add rule netdev nt bond0IngressFilter ether saddr @macset counter drop'
>>
> 
> I get the following when trying to apply this on a fedora 35 install.
> 
> root@fedora ~]# ip link add bond0 type bond mode balance-xor xmit_hash_policy vlan+srcmac
> [root@fedora ~]# nft 'add table netdev nt'
> [root@fedora ~]# nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }'
> Error: unknown chain hook
> add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }
>                                                          ^^^^^^
> [root@fedora ~]# uname -a
> Linux fedora 5.17.5-200.fc35.x86_64 #1 SMP PREEMPT Thu Apr 28 15:41:41 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
> 

Well, take it up with the Fedora nftables package maintainer. :) 

Your nftables version is old (I'd guess <1.0.1):
 commit 510c4fad7e78
 Author: Lukas Wunner <lukas@wunner.de>
 Date:   Wed Mar 11 13:20:06 2020 +0100

     src: Support netdev egress hook

$ git tag --contains 510c4fad7e78f
v1.0.1
v1.0.2

I just tested it[1] on Linux 5.16.18-200.fc35.x86_64 #1 SMP PREEMPT Mon Mar 28 14:10:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cheers,
 Nik

[1] You can clearly see the dynamically learned mac on egress (52:54:00:23:5f:13) and traffic
    with that source is now blocked on ingress.

$ nft -a list table netdev nt
	set macset { # handle 10
		type ether_addr
		size 65535
		flags timeout
		elements = { 52:54:00:23:5f:13 timeout 5s expires 4s192ms }
	}

	chain bond0EgressFilter { # handle 8
		type filter hook egress device "bond0" priority filter; policy accept;
		update @macset { ether saddr timeout 5s } # handle 11
	}

	chain bond0IngressFilter { # handle 9
		type filter hook ingress device "bond0" priority filter; policy accept;
	}


  reply	other threads:[~2022-05-16 17:22 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-13 17:43 [PATCH net-next v3] bond: add mac filter option for balance-xor Jonathan Toppins
2022-05-14 21:41 ` Nikolay Aleksandrov
2022-05-15  6:32   ` Nikolay Aleksandrov
2022-05-16 14:06     ` Jonathan Toppins
2022-05-16 17:22       ` Nikolay Aleksandrov [this message]
2022-05-16 17:24         ` Nikolay Aleksandrov
2022-05-23 13:35     ` Jonathan Toppins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53357187-aedf-20dd-a331-bc501aa0484e@blackwall.org \
    --to=razor@blackwall.org \
    --cc=andy@greyhouse.net \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=j.vosburgh@gmail.com \
    --cc=jtoppins@redhat.com \
    --cc=kuba@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lxin@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=toke@redhat.com \
    --cc=vfalico@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox