netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH next] bonding: pair enable_port with slave_arr_updates
@ 2022-01-27  0:26 Mahesh Bandewar
  2022-01-27 23:54 ` Jay Vosburgh
  0 siblings, 1 reply; 3+ messages in thread
From: Mahesh Bandewar @ 2022-01-27  0:26 UTC (permalink / raw)
  To: Netdev, Jay Vosburgh, Andy Gospodarek, Veaceslav Falico
  Cc: David Miller, Jakub Kicinski, Mahesh Bandewar, Mahesh Bandewar

When 803.2ad mode enables a participating port, it should update
the slave-array. I have observed that the member links are participating
and are part of the active aggregator while the traffic is egressing via
only one member link (in a case where two links are participating). Via
krpobes I discovered that that slave-arr has only one link added while
the other participating link wasn't part of the slave-arr.

I couldn't see what caused that situation but the simple code-walk
through provided me hints that the enable_port wasn't always associated
with the slave-array update.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/bonding/bond_3ad.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 6006c2e8fa2b..f20bbc18a03f 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1024,6 +1024,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
 
 					__enable_port(port);
 				}
+				/* Slave array needs update */
+				*update_slave_arr = true;
 			}
 			break;
 		default:
@@ -1779,6 +1781,8 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 			     port = port->next_port_in_aggregator) {
 				__enable_port(port);
 			}
+			/* Slave array needs update. */
+			*update_slave_arr = true;
 		}
 	}
 
-- 
2.35.0.rc0.227.g00780c9af4-goog


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH next] bonding: pair enable_port with slave_arr_updates
  2022-01-27  0:26 [PATCH next] bonding: pair enable_port with slave_arr_updates Mahesh Bandewar
@ 2022-01-27 23:54 ` Jay Vosburgh
  2022-01-28  1:26   ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 1 reply; 3+ messages in thread
From: Jay Vosburgh @ 2022-01-27 23:54 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Netdev, Andy Gospodarek, Veaceslav Falico, David Miller,
	Jakub Kicinski, Mahesh Bandewar

Mahesh Bandewar <maheshb@google.com> wrote:

>When 803.2ad mode enables a participating port, it should update
>the slave-array. I have observed that the member links are participating
>and are part of the active aggregator while the traffic is egressing via
>only one member link (in a case where two links are participating). Via
>krpobes I discovered that that slave-arr has only one link added while
>the other participating link wasn't part of the slave-arr.
>
>I couldn't see what caused that situation but the simple code-walk
>through provided me hints that the enable_port wasn't always associated
>with the slave-array update.
>
>Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>---
> drivers/net/bonding/bond_3ad.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 6006c2e8fa2b..f20bbc18a03f 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -1024,6 +1024,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
> 
> 					__enable_port(port);
> 				}
>+				/* Slave array needs update */
>+				*update_slave_arr = true;
> 			}

	Shouldn't this be in the same block as the __enable_port() call?
If I'm reading the code correctly, as written this will trigger an
update of the array on every pass of the state machine (every 100ms) if
any port is in COLLECTING_DISTRIBUTING state, which is the usual case.

> 			break;
> 		default:
>@@ -1779,6 +1781,8 @@ static void ad_agg_selection_logic(struct aggregator *agg,
> 			     port = port->next_port_in_aggregator) {
> 				__enable_port(port);
> 			}
>+			/* Slave array needs update. */
>+			*update_slave_arr = true;
> 		}

	I suspect this change would only affect your issue if the port
in question was failing to partner (i.e., the peer wasn't running LACP
or there was some failure in the LACP negotiation).  If the ports in
your test were in the same aggregator, that shouldn't be the case, as I
believe unpartnered ports are always individual (not in an aggregator).

	Do you have a test?

	-J

> 	}
> 
>-- 
>2.35.0.rc0.227.g00780c9af4-goog
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH next] bonding: pair enable_port with slave_arr_updates
  2022-01-27 23:54 ` Jay Vosburgh
@ 2022-01-28  1:26   ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 0 replies; 3+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2022-01-28  1:26 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Netdev, Andy Gospodarek, Veaceslav Falico, David Miller,
	Jakub Kicinski, Mahesh Bandewar

On Thu, Jan 27, 2022 at 3:54 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Mahesh Bandewar <maheshb@google.com> wrote:
>
> >When 803.2ad mode enables a participating port, it should update
> >the slave-array. I have observed that the member links are participating
> >and are part of the active aggregator while the traffic is egressing via
> >only one member link (in a case where two links are participating). Via
> >krpobes I discovered that that slave-arr has only one link added while
> >the other participating link wasn't part of the slave-arr.
> >
> >I couldn't see what caused that situation but the simple code-walk
> >through provided me hints that the enable_port wasn't always associated
> >with the slave-array update.
> >
> >Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> >---
> > drivers/net/bonding/bond_3ad.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> >index 6006c2e8fa2b..f20bbc18a03f 100644
> >--- a/drivers/net/bonding/bond_3ad.c
> >+++ b/drivers/net/bonding/bond_3ad.c
> >@@ -1024,6 +1024,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
> >
> >                                       __enable_port(port);
> >                               }
> >+                              /* Slave array needs update */
> >+                              *update_slave_arr = true;
> >                       }
>
>         Shouldn't this be in the same block as the __enable_port() call?
absolutely! It's inefficient to have outside of that if-clause and
would unnecessarily update slave-arr when it's not even needed. My
bad, I'll fix it in v2

> If I'm reading the code correctly, as written this will trigger an
> update of the array on every pass of the state machine (every 100ms) if
> any port is in COLLECTING_DISTRIBUTING state, which is the usual case.
>
> >                       break;
> >               default:
> >@@ -1779,6 +1781,8 @@ static void ad_agg_selection_logic(struct aggregator *agg,
> >                            port = port->next_port_in_aggregator) {
> >                               __enable_port(port);
> >                       }
> >+                      /* Slave array needs update. */
> >+                      *update_slave_arr = true;
> >               }
>
>         I suspect this change would only affect your issue if the port
> in question was failing to partner (i.e., the peer wasn't running LACP
> or there was some failure in the LACP negotiation).  If the ports in
> your test were in the same aggregator, that shouldn't be the case, as I
> believe unpartnered ports are always individual (not in an aggregator).
The condition seems to manifest randomly on some machines and not
always. All links are part of the same aggregator but some transient
situation does break the bond and almost always it reforms but
occasionally it gets into this state I mentioned.

My primary motive behind this fix/patch is to update the slave-arr
when LACP state is changing (for whatever reasons). Enabling port
seems to be an event which must be associated with updating the array
and found these two locations in the code where there is a chance that
update_array may not happen when a port gets enabled.

>
>         Do you have a test?
>
I'm not sure what triggers it and hence I don't have the exact repro
steps / test.

>         -J
>
> >       }
> >
> >--
> >2.35.0.rc0.227.g00780c9af4-goog
> >
>
> ---
>         -Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-01-28  1:27 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-27  0:26 [PATCH next] bonding: pair enable_port with slave_arr_updates Mahesh Bandewar
2022-01-27 23:54 ` Jay Vosburgh
2022-01-28  1:26   ` Mahesh Bandewar (महेश बंडेवार)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).