From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jonathan Toppins Subject: Re: [PATCH v2 net] bonding: fix 802.3ad state sent to partner when unbinding slave Date: Tue, 27 Nov 2018 13:12:08 -0500 Message-ID: <259f9a02-ad13-838e-00e2-4742f6a69a86@redhat.com> References: <20181127145656.GA31534@peltzi-E7470.localdomain> Reply-To: jtoppins@redhat.com Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Cc: jay.vosburgh@canonical.com To: Toni Peltonen , netdev@vger.kernel.org Return-path: Received: from mx1.redhat.com ([209.132.183.28]:53934 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727730AbeK1FK4 (ORCPT ); Wed, 28 Nov 2018 00:10:56 -0500 In-Reply-To: <20181127145656.GA31534@peltzi-E7470.localdomain> Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 11/27/2018 09:56 AM, Toni Peltonen wrote: > Previously when unbinding a slave the 802.3ad implementation only told > partner that the port is not suitable for aggregation by setting the port > aggregation state from aggregatable to individual. This is not enough. If the > physical layer still stays up and we only unbinded this port from the bond there > is nothing in the aggregation status alone to prevent the partner from sending > traffic towards us. To ensure that the partner doesn't consider this > port at all anymore we should also disable collecting and distributing to > signal that this actor is going away. Also clear AD_STATE_SYNCHRONIZATION to > ensure partner exits collecting + distributing state. > > I have tested this behaviour againts Arista EOS switches with mlx5 cards > (physical link stays up even when interface is down) and simulated > the same situation virtually Linux <-> Linux with two network namespaces > running two veth device pairs. In both cases setting aggregation to > individual doesn't alone prevent traffic from being to sent towards this > port given that the link stays up in partners end. Partner still keeps > it's end in collecting + distributing state and continues until timeout is > reached. In most cases this means we are losing the traffic partner sends > towards our port while we wait for timeout. This is most visible with slow > periodic time (LACP rate slow). > > Other open source implementations like Open VSwitch and libreswitch, and > vendor implementations like Arista EOS, seem to disable collecting + > distributing to when doing similar port disabling/detaching/removing change. > With this patch kernel implementation would behave the same way and ensure > partner doesn't consider our actor viable anymore. > > Signed-off-by: Toni Peltonen > --- > v2 changes: > * Fix typo in commit message > * Also clear AD_STATE_SYNCHRONIZATION > --- > drivers/net/bonding/bond_3ad.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c > index f43fb2f958a5..93dfcef8afc4 100644 > --- a/drivers/net/bonding/bond_3ad.c > +++ b/drivers/net/bonding/bond_3ad.c > @@ -2086,6 +2086,9 @@ void bond_3ad_unbind_slave(struct slave *slave) > aggregator->aggregator_identifier); > > /* Tell the partner that this port is not suitable for aggregation */ > + port->actor_oper_port_state &= ~AD_STATE_SYNCHRONIZATION; required by IEEE std 801.AX 2014; figure 6-18 port disabled state > + port->actor_oper_port_state &= ~AD_STATE_COLLECTING; > + port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING; required by IEEE std 801.AX 2014; figure 6-21 detached state > port->actor_oper_port_state &= ~AD_STATE_AGGREGATION;> __update_lacpdu_from_port(port); > ad_lacpdu_send(port); > Acked-by: Jonathan Toppins