From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jay Vosburgh Subject: Re: [PATCH v2 net] bonding: fix 802.3ad state sent to partner when unbinding slave Date: Tue, 27 Nov 2018 07:20:23 -0800 Message-ID: <26036.1543332023@famine> References: <20181127145656.GA31534@peltzi-E7470.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT Cc: netdev@vger.kernel.org To: Toni Peltonen Return-path: Received: from youngberry.canonical.com ([91.189.89.112]:45496 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729209AbeK1CSm (ORCPT ); Tue, 27 Nov 2018 21:18:42 -0500 In-reply-to: <20181127145656.GA31534@peltzi-E7470.localdomain> Content-ID: <26035.1543332023.1@famine> Sender: netdev-owner@vger.kernel.org List-ID: Toni Peltonen wrote: >Previously when unbinding a slave the 802.3ad implementation only told >partner that the port is not suitable for aggregation by setting the port >aggregation state from aggregatable to individual. This is not enough. If the >physical layer still stays up and we only unbinded this port from the bond there >is nothing in the aggregation status alone to prevent the partner from sending >traffic towards us. To ensure that the partner doesn't consider this >port at all anymore we should also disable collecting and distributing to >signal that this actor is going away. Also clear AD_STATE_SYNCHRONIZATION to >ensure partner exits collecting + distributing state. > >I have tested this behaviour againts Arista EOS switches with mlx5 cards >(physical link stays up even when interface is down) and simulated >the same situation virtually Linux <-> Linux with two network namespaces >running two veth device pairs. In both cases setting aggregation to >individual doesn't alone prevent traffic from being to sent towards this >port given that the link stays up in partners end. Partner still keeps >it's end in collecting + distributing state and continues until timeout is >reached. In most cases this means we are losing the traffic partner sends >towards our port while we wait for timeout. This is most visible with slow >periodic time (LACP rate slow). > >Other open source implementations like Open VSwitch and libreswitch, and >vendor implementations like Arista EOS, seem to disable collecting + >distributing to when doing similar port disabling/detaching/removing change. >With this patch kernel implementation would behave the same way and ensure >partner doesn't consider our actor viable anymore. > >Signed-off-by: Toni Peltonen Signed-off-by: Jay Vosburgh >--- >v2 changes: >* Fix typo in commit message >* Also clear AD_STATE_SYNCHRONIZATION >--- > drivers/net/bonding/bond_3ad.c | 3 +++ > 1 file changed, 3 insertions(+) > >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c >index f43fb2f958a5..93dfcef8afc4 100644 >--- a/drivers/net/bonding/bond_3ad.c >+++ b/drivers/net/bonding/bond_3ad.c >@@ -2086,6 +2086,9 @@ void bond_3ad_unbind_slave(struct slave *slave) > aggregator->aggregator_identifier); > > /* Tell the partner that this port is not suitable for aggregation */ >+ port->actor_oper_port_state &= ~AD_STATE_SYNCHRONIZATION; >+ port->actor_oper_port_state &= ~AD_STATE_COLLECTING; >+ port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING; > port->actor_oper_port_state &= ~AD_STATE_AGGREGATION; > __update_lacpdu_from_port(port); > ad_lacpdu_send(port); >-- >2.19.0 >