From: Veli-Matti Lintu
Subject: Re: [PATCH net] bonding: fix 802.3ad aggregator reselection
Date: Tue, 5 Jul 2016 17:01:57 +0300
To: Jay Vosburgh
Cc: netdev, Veaceslav Falico, Andy Gospodarek, zhuyj, "David S. Miller"
References: <10542.1466716851@famine> <20295.1467215964@famine>

2016-06-30 14:15 GMT+03:00 Veli-Matti Lintu:
> 2016-06-29 18:59 GMT+03:00 Jay Vosburgh:
>> Veli-Matti Lintu wrote:
>> I tried this locally, but don't see any failure (at the end, the
>> "Switch A" agg is still active with the single port). I am starting
>> with just two ports in each aggregator (instead of three), so that may
>> be relevant.
>
> When the connection problem occurs, /proc/net/bonding/bond0 always
> shows the aggregator that has a link up active. Dumpcap sees at least
> broadcast traffic on the port, but I haven't done extensive analysis
> on that yet. All TCP connections are cut until the bond is up again
> when more ports are enabled on the switch. ping doesn't work either
> way.

I did some further testing on this, and it looks like I can get it working by enabling the ports in the new aggregator in ad_agg_selection_logic(), the same way the ports in the old aggregator are disabled there. Normally the ports seem to get enabled from ad_mux_machine() in "case AD_MUX_COLLECTING_DISTRIBUTING", but something different happens there: the port does get enabled, yet no traffic passes through it. So far I haven't been able to figure out what goes wrong.
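For anyone following the reselection logic, a minimal userspace sketch of the idea may help. It models only the state change the test patch in this message makes: when a different aggregator is selected, the old aggregator's ports are disabled and the new aggregator's ports are enabled at once, rather than waiting for the mux machine. The struct and function names (model_port, model_agg, model_set_agg_ports(), model_reselect(), model_assert_reselected()) are hypothetical simplifications, not the bonding driver's types or helpers, and the model does not reproduce the actual problem of a port that is enabled yet passes no traffic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the bonding driver's
 * struct port and struct aggregator. These are NOT the kernel types. */
struct model_port {
	const char *name;
	bool enabled;
	struct model_port *next_port_in_aggregator;
};

struct model_agg {
	int aggregator_identifier;
	struct model_port *lag_ports;	/* head of a singly linked port list */
};

/* Set every port in agg to the given state; a NULL agg is a no-op. */
static void model_set_agg_ports(struct model_agg *agg, bool enable)
{
	struct model_port *port;

	if (!agg)
		return;
	for (port = agg->lag_ports; port; port = port->next_port_in_aggregator)
		port->enabled = enable;
}

/* Hypothetical model of reselection: disable the ports of the previously
 * active aggregator, then (the step the test patch adds) enable the ports
 * of the newly selected one immediately, instead of waiting for the mux
 * state machine to reach COLLECTING_DISTRIBUTING. */
static void model_reselect(struct model_agg *active, struct model_agg *best)
{
	if (best == active)
		return;
	model_set_agg_ports(active, false);
	model_set_agg_ports(best, true);
}

/* Debug self-check: after a reselection, no port of the old aggregator
 * may be enabled and every port of the new one must be. */
static void model_assert_reselected(const struct model_agg *old,
				    const struct model_agg *best)
{
	const struct model_port *port;

	for (port = old ? old->lag_ports : NULL; port;
	     port = port->next_port_in_aggregator)
		assert(!port->enabled);
	for (port = best ? best->lag_ports : NULL; port;
	     port = port->next_port_in_aggregator)
		assert(port->enabled);
}
```

For example, if aggregator 1 holds eth0 and eth1 (both enabled) and aggregator 2 holds eth2 (disabled), calling model_reselect() with aggregator 1 as the active one and aggregator 2 as the best one leaves eth0 and eth1 disabled and eth2 enabled; passing the same aggregator for both arguments changes nothing.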
When the connection is lost, dumpcap sees traffic on the only active port in the bond, but nothing seems to pick it up. If I disable and re-enable the same port, traffic starts flowing again normally.

Here's the patch I used for testing on top of 4.7.0-rc6. I haven't tested this with other modes or hardware setups yet.

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index ca81f46..45c06c4 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1706,6 +1706,25 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 				__disable_port(port);
 			}
 		}
+
+		/* Enable ports in the new aggregator */
+		if (best) {
+			netdev_dbg(bond->dev, "Enable ports\n");
+
+			for (port = best->lag_ports; port;
+			     port = port->next_port_in_aggregator) {
+				netdev_dbg(bond->dev, "Agg: %d, P=%d: Port: %s; Enabled=%d\n",
+					   best->aggregator_identifier,
+					   best->num_of_ports,
+					   port->slave->dev->name,
+					   __port_is_enabled(port));
+
+				if (!__port_is_enabled(port))
+					__enable_port(port);
+			}
+		}
+
+
 		/* Slave array needs update. */
 		*update_slave_arr = true;
 	}

Veli-Matti