From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jay Vosburgh Subject: Re: [PATCH net] bonding: fix 802.3ad state sent to partner when unbinding slave Date: Mon, 26 Nov 2018 16:29:58 -0800 Message-ID: <6004.1543278598@famine> References: <20181126225438.GA2252@peltzi-E7470.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT Cc: netdev@vger.kernel.org To: Toni Peltonen Return-path: Received: from youngberry.canonical.com ([91.189.89.112]:55935 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727456AbeK0L0C (ORCPT ); Tue, 27 Nov 2018 06:26:02 -0500 In-reply-to: <20181126225438.GA2252@peltzi-E7470.localdomain> Content-ID: <6003.1543278598.1@famine> Sender: netdev-owner@vger.kernel.org List-ID: Toni Peltonen wrote: >Previously when unbinding a slave the 802.3ad implementation only told >partner that the port is not suitable for aggregation by setting the port >aggregation state from aggregatable to individual. This is not enough. If the >physical layer still stays up and we only unbinded this port from the bond there >is nothing in the aggregation status alone to prevent the partner from sending >traffic towards us. To ensure that the partner doesn't consider this >port at all anymore we should also disable collecting and distributing to >signal that this actor is going away. > >I have tested this behaviour againts Arista EOS switches with mlx5 cards >(physical link stays up when even when interface is down) and simulated >the same situation virtually Linux <-> Linux with two network namespaces >running two veth device pairs. In both cases setting aggregation to >individual doesn't alone prevent traffic from being to sent towards this >port given that the link stays up in partners end. Partner still keeps >it's end in collecting + distributing state and continues until timeout is >reached. In most cases this means we are losing the traffic partner sends >towards our port while we wait for timeout. This is most visible with slow >periodic time (LAPC rate slow). "LAPC" -> "LACP" >Other open source implementations like Open VSwitch and libreswitch, and >vendor implementations like Arista EOS, seem to disable collecting + >distributing to when doing similar port disabling/detaching/removing change. >With this patch kernel implementation would behave the same way and ensure >partner doesn't consider our actor viable anymore. After re-reading the relevant bits of 802.1AX (particularly 5.4.9 on recordPDU and update_Selected) I'm going to suggest also clearing AD_STATE_SYNCHRONIZATION, based on: Partner_Oper_Port_State.Synchronization is also set to TRUE if the value of Actor_State.Aggregation in the received PDU is set to FALSE (i.e., indicates an Individual link), Actor_State.Synchronization in the received PDU is set to TRUE, and LACP will actively maintain the link. Per the above, learing _SYNC in the LACPDU should un-sync the port, inducing the Mux state machine (figure 5-15) to exit C_D state and go to ATTACHED state (disabling Coll/Dist). But, either way, as this is a hint to get the link partner to stop using the port, this looks reasonable to me. Acked-by: Jay Vosburgh -J >Signed-off-by: Toni Peltonen >--- > drivers/net/bonding/bond_3ad.c | 2 ++ > 1 file changed, 2 insertions(+) > >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c >index f43fb2f958a5..6776c33753dc 100644 >--- a/drivers/net/bonding/bond_3ad.c >+++ b/drivers/net/bonding/bond_3ad.c >@@ -2086,6 +2086,8 @@ void bond_3ad_unbind_slave(struct slave *slave) > aggregator->aggregator_identifier); > > /* Tell the partner that this port is not suitable for aggregation */ >+ port->actor_oper_port_state &= ~AD_STATE_COLLECTING; >+ port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING; > port->actor_oper_port_state &= ~AD_STATE_AGGREGATION; > __update_lacpdu_from_port(port); > ad_lacpdu_send(port); >-- >2.19.0 --- -Jay Vosburgh, jay.vosburgh@canonical.com