netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Bonding Draft Proposal] Add lacp_prio Support for ad_select?
@ 2025-06-24  6:53 Hangbin Liu
  2025-06-26 23:28 ` Jay Vosburgh
  0 siblings, 1 reply; 6+ messages in thread
From: Hangbin Liu @ 2025-06-24  6:53 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jiri Pirko, netdev

Hi Jay,

We have a customer setup involving two separate switches with identical
L2/VLAN configurations. Each switch forms an independent aggregator
(port-channel), and the end host connects to both with the same number of
links and equivalent bandwidth.

As a result, the host ends up with two aggregators under a single bond
interface. Since the user cannot arbitrarily override port count or
bandwidth, they are asking for a new mechanism, lacp_prio, to influence
aggregator selection via ad_select.

Do you think this is a reasonable addition?

If yes, what would be the best way to compare priorities?

1. Port Priority Only. Currently initialized to 0xff. We could add a parameter
   allowing users to configure it.
   a) Use the highest port priority within each aggregator for comparison
   b) Sum all port priorities in each aggregator and compare the totals

2. Full LACP Info Comparison. Compare fields such as system_priority, system,
   port_priority, port_id, etc.

At present, the libteam code has implemented an approach that selects the
aggregator based on the highest system_priority/system from both local and
partner data.[1]

Looking forward to your thoughts.

Best regards,
Hangbin

[1] https://github.com/jpirko/libteam/blob/master/teamd/teamd_runner_lacp.c#L402

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bonding Draft Proposal] Add lacp_prio Support for ad_select?
  2025-06-24  6:53 [Bonding Draft Proposal] Add lacp_prio Support for ad_select? Hangbin Liu
@ 2025-06-26 23:28 ` Jay Vosburgh
  2025-06-27  4:33   ` Hangbin Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Jay Vosburgh @ 2025-06-26 23:28 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jiri Pirko, netdev

Hangbin Liu <liuhangbin@gmail.com> wrote:

>Hi Jay,
>
>We have a customer setup involving two separate switches with identical
>L2/VLAN configurations. Each switch forms an independent aggregator
>(port-channel), and the end host connects to both with the same number of
>links and equivalent bandwidth.
>
>As a result, the host ends up with two aggregators under a single bond
>interface. Since the user cannot arbitrarily override port count or
>bandwidth, they are asking for a new mechanism, lacp_prio, to influence
>aggregator selection via ad_select.
>
>Do you think this is a reasonable addition?

	In principle, I don't see a reason not to use the system
priority, et al, to influence the aggregator selection when bonding ends
up with multiple aggregators.  I'm undecided as to whether it should be
a separate ad_select policy or a "tiebreaker," but a separate policy is
probably simpler to deal with.

>If yes, what would be the best way to compare priorities?
>
>1. Port Priority Only. Currently initialized to 0xff. We could add a parameter
>   allowing users to configure it.
>   a) Use the highest port priority within each aggregator for comparison
>   b) Sum all port priorities in each aggregator and compare the totals

	I'm not a fan of this, as explained below.

	Also, note that in LACP-land, when comparing priorities, the
higher priority is numerically smaller, which makes "add them up and
compare" a little counter intuitive to me.

>2. Full LACP Info Comparison. Compare fields such as system_priority, system,
>   port_priority, port_id, etc.

	I think it makes more sense to use the System ID (system
priority and aggregator MAC address) from the LAG ID of the local
aggregator.  In the bonding implementation, an aggregator is assigned a
MAC when an interface is added, so the only aggregators lacking a MAC
are ones that have no ports (which can't be active).

	If we want to use the partner System ID, that's a little more
complicated.  If aggregators in question both have LACP partners, then
the System IDs will be unique, since the MAC addresses will differ.  If
the aggregators don't have LACP partners, then they'll be individual
ports, and the partner information won't be available.

	Modulo the fact that bonding assigns a MAC to an aggregator
before the standard does (for the System ID), this is approximately
what's described in 802.1AX-2014 6.7.1, although the context there is
criteria for prioritizing between ports during selection for aggregation
when limited capabilities exist (i.e., 6 ports but only the ability to
accomodate 4 in an aggregrator).

	FWIW, the 802.1AX standard is pretty quiet on this whole
situation.  It recognises that "A System may contain multiple
Aggregators, serving multiple Aggregator Clients" (802.1AX-2014 6.2.1)
but doesn't have any verbiage that I can find on requirements for
choosing between multiple such Aggregators if only one can be active.  I
think the presumption in the standard is that the multiple aggregators
would or could be active simultaneously as independent entities.

	Anyway, the upshot is that we can pretty much choose as we see
fit for this particular case.

>At present, the libteam code has implemented an approach that selects the
>aggregator based on the highest system_priority/system from both local and
>partner data.[1]

	Just looking at libteam code, it's more complicated than that.
The documentation says "Aggregator with highest priority according to
LACP standard will be selected," and the code looks to be doing memcmp()
on

struct lacpdu_info {
	uint16_t		system_priority;
	uint8_t			system[ETH_ALEN]; /* ID */
	uint16_t		key;
	uint16_t		port_priority;
	uint16_t		port; /* ID */
	uint8_t			state;
} __attribute__((__packed__));

	which is essentially the local portion of a LAG ID per
802.1AX-2014 6.3.6.1, with the key and state set to zero for the
comparison.  Also, the function you reference, get_lacp_port_prio_info,
is choosing between the actor and partner information for reasons that
aren't explained in the code, but I suspect might to comply with the
statement in 6.3.6.1:

	"To simplify comparison of LAG IDs it is conventional to order
	these such that S is the numerically smaller of S and T."

	where S and T are System Identifiers (comprised of System
Priority and the MAC address), or perhaps 6.7.1, as described above.

	Anyway, without knowing exactly why the team function is doing
what it does, I'm not sure if that's the proper algorithm to use.  Jiri
is on Cc, maybe he remembers.

	-J

>Looking forward to your thoughts.
>
>Best regards,
>Hangbin
>
>[1] https://github.com/jpirko/libteam/blob/master/teamd/teamd_runner_lacp.c#L402

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bonding Draft Proposal] Add lacp_prio Support for ad_select?
  2025-06-26 23:28 ` Jay Vosburgh
@ 2025-06-27  4:33   ` Hangbin Liu
  2025-07-01 17:08     ` Jay Vosburgh
  0 siblings, 1 reply; 6+ messages in thread
From: Hangbin Liu @ 2025-06-27  4:33 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jiri Pirko, netdev

On Thu, Jun 26, 2025 at 04:28:35PM -0700, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@gmail.com> wrote:
> 
> >Hi Jay,
> >
> >We have a customer setup involving two separate switches with identical
> >L2/VLAN configurations. Each switch forms an independent aggregator
> >(port-channel), and the end host connects to both with the same number of
> >links and equivalent bandwidth.
> >
> >As a result, the host ends up with two aggregators under a single bond
> >interface. Since the user cannot arbitrarily override port count or
> >bandwidth, they are asking for a new mechanism, lacp_prio, to influence
> >aggregator selection via ad_select.
> >
> >Do you think this is a reasonable addition?
> 
> 	In principle, I don't see a reason not to use the system
> priority, et al, to influence the aggregator selection when bonding ends
> up with multiple aggregators.  I'm undecided as to whether it should be
> a separate ad_select policy or a "tiebreaker," but a separate policy is
> probably simpler to deal with.

There is only one system priority in the bond, which means all aggregators
share the same system priority — right?

Or do you mean we should also take the partner's system priority into account?

> 
> >If yes, what would be the best way to compare priorities?
> >
> >1. Port Priority Only. Currently initialized to 0xff. We could add a parameter
> >   allowing users to configure it.
> >   a) Use the highest port priority within each aggregator for comparison
> >   b) Sum all port priorities in each aggregator and compare the totals
> 
> 	I'm not a fan of this, as explained below.
> 
> 	Also, note that in LACP-land, when comparing priorities, the
> higher priority is numerically smaller, which makes "add them up and
> compare" a little counter intuitive to me.

Yeah..

> 
> >2. Full LACP Info Comparison. Compare fields such as system_priority, system,
> >   port_priority, port_id, etc.
> 
> 	I think it makes more sense to use the System ID (system
> priority and aggregator MAC address) from the LAG ID of the local
> aggregator.  In the bonding implementation, an aggregator is assigned a
> MAC when an interface is added, so the only aggregators lacking a MAC
> are ones that have no ports (which can't be active).

Same question, the system priority and aggregator MAC address are all same
in the same bonding interface. So how can we prioritize between two
aggregators within the same bond?

Unless we take the partner's System ID into account. Which looks like, if
we want to choose a better aggregator in bond, we need to config the switch side...

> 
> 	If we want to use the partner System ID, that's a little more
> complicated.  If aggregators in question both have LACP partners, then
> the System IDs will be unique, since the MAC addresses will differ.  If
> the aggregators don't have LACP partners, then they'll be individual
> ports, and the partner information won't be available.

Can we active a aggregator that don't have LACP partner? If not, then
we don't need to compare that aggregator.

> 
> 	Modulo the fact that bonding assigns a MAC to an aggregator
> before the standard does (for the System ID), this is approximately
> what's described in 802.1AX-2014 6.7.1, although the context there is
> criteria for prioritizing between ports during selection for aggregation
> when limited capabilities exist (i.e., 6 ports but only the ability to
> accomodate 4 in an aggregrator).
> 
> 	FWIW, the 802.1AX standard is pretty quiet on this whole
> situation.  It recognises that "A System may contain multiple
> Aggregators, serving multiple Aggregator Clients" (802.1AX-2014 6.2.1)
> but doesn't have any verbiage that I can find on requirements for
> choosing between multiple such Aggregators if only one can be active.  I
> think the presumption in the standard is that the multiple aggregators
> would or could be active simultaneously as independent entities.
> 
> 	Anyway, the upshot is that we can pretty much choose as we see
> fit for this particular case.

Yes

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bonding Draft Proposal] Add lacp_prio Support for ad_select?
  2025-06-27  4:33   ` Hangbin Liu
@ 2025-07-01 17:08     ` Jay Vosburgh
  2025-07-02  7:44       ` Hangbin Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Jay Vosburgh @ 2025-07-01 17:08 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jiri Pirko, netdev

Hangbin Liu <liuhangbin@gmail.com> wrote:

>On Thu, Jun 26, 2025 at 04:28:35PM -0700, Jay Vosburgh wrote:
>> Hangbin Liu <liuhangbin@gmail.com> wrote:
>> 
>> >Hi Jay,
>> >
>> >We have a customer setup involving two separate switches with identical
>> >L2/VLAN configurations. Each switch forms an independent aggregator
>> >(port-channel), and the end host connects to both with the same number of
>> >links and equivalent bandwidth.
>> >
>> >As a result, the host ends up with two aggregators under a single bond
>> >interface. Since the user cannot arbitrarily override port count or
>> >bandwidth, they are asking for a new mechanism, lacp_prio, to influence
>> >aggregator selection via ad_select.
>> >
>> >Do you think this is a reasonable addition?
>> 
>> 	In principle, I don't see a reason not to use the system
>> priority, et al, to influence the aggregator selection when bonding ends
>> up with multiple aggregators.  I'm undecided as to whether it should be
>> a separate ad_select policy or a "tiebreaker," but a separate policy is
>> probably simpler to deal with.
>
>There is only one system priority in the bond, which means all aggregators
>share the same system priority — right?
>
>Or do you mean we should also take the partner's system priority into account?

	Ok, this is what I get for reading the standard and not checking
the code carefully while I do it.

	That said, I still think that:

	(a) the system priority is the logical fit for this purpose, at
least from what the standard seems to intend, given that "System" there
means "Aggregation System," and,

	(b) the implementation we have (as well as few switches I
checked) is totally impractical for using system priority in this
manner, as all of them implement it as a global, or at least not on a
per-Aggregation System basis.

	Granted, the switches I have are pretty old, but even those that
require ports to be explicitly configured for particular channel groups
don't permit setting the system priority on a per-channel group (i.e.,
Aggregation System) basis.

>> >If yes, what would be the best way to compare priorities?
>> >
>> >1. Port Priority Only. Currently initialized to 0xff. We could add a parameter
>> >   allowing users to configure it.
>> >   a) Use the highest port priority within each aggregator for comparison
>> >   b) Sum all port priorities in each aggregator and compare the totals
>> 
>> 	I'm not a fan of this, as explained below.
>> 
>> 	Also, note that in LACP-land, when comparing priorities, the
>> higher priority is numerically smaller, which makes "add them up and
>> compare" a little counter intuitive to me.
>
>Yeah..
>
>> 
>> >2. Full LACP Info Comparison. Compare fields such as system_priority, system,
>> >   port_priority, port_id, etc.
>> 
>> 	I think it makes more sense to use the System ID (system
>> priority and aggregator MAC address) from the LAG ID of the local
>> aggregator.  In the bonding implementation, an aggregator is assigned a
>> MAC when an interface is added, so the only aggregators lacking a MAC
>> are ones that have no ports (which can't be active).
>
>Same question, the system priority and aggregator MAC address are all same
>in the same bonding interface. So how can we prioritize between two
>aggregators within the same bond?
>
>Unless we take the partner's System ID into account. Which looks like, if
>we want to choose a better aggregator in bond, we need to config the switch side...

	Again, moving on from my lack of paying complete attention,
looking at the teamd implementation, I think that's what it does:

static void get_lacp_port_prio_info(struct lacp_port *lacp_port,
				    struct lacpdu_info *prio_info)
{
	int prio_diff;
	int system_diff;

	prio_diff = ntohs(lacp_port->actor.system_priority) -
		    ntohs(lacp_port->partner.system_priority);
	system_diff = memcmp(lacp_port->actor.system,
			     lacp_port->partner.system, ETH_ALEN);
	if (prio_diff < 0 || (prio_diff == 0 && system_diff < 0))
		*prio_info = lacp_port->actor;
	if (prio_diff > 0 || (prio_diff == 0 && system_diff >= 0))
		*prio_info = lacp_port->partner;

	Right here, it chooses between returning the actor or partner
data based on comparisons of the actor and partner System Identifier,
returning the data for whichever is higher priority after comparing the
System Identifiers (which is the System Priority plus Partner System,
the latter of which is the MAC address, 802.1AX-2014 6.3.2).

	I would hazard to guess that Jiri did it this way for the same
reason we're having this conversation: there's not really a better way
without rearranging a lot of the innards of how configuration of this
stuff is done.

	It looks like lacp_find_new_agg_lead() runs though all of the
ports in all of the aggregators and chooses the aggregator with the
"best" port of all.

	One downside if we were to adapt this logic or something similar
to bonding is that there's currently no way to set the Port Priority of
interfaces in the bond.  There is a "prio" that can be set via ip set
... type bond_slave prio X, which is IFLA_BOND_SLAVE_PRIO, but that's a
failover priority, not the LACP Port Priority.

	So right now, if the above logic were put into bonding, the
local selection criteria would end up based entirely on the port number,
which isn't configurable, and so doesn't seem especially better than
what we have now.

>> 	If we want to use the partner System ID, that's a little more
>> complicated.  If aggregators in question both have LACP partners, then
>> the System IDs will be unique, since the MAC addresses will differ.  If
>> the aggregators don't have LACP partners, then they'll be individual
>> ports, and the partner information won't be available.
>
>Can we active a aggregator that don't have LACP partner? If not, then
>we don't need to compare that aggregator.

	Yes, we can.  If no ports have a LACP partner, then all of the
ports are "individual," which act like an aggregator with just one port.
If all ports in the bond are individual, then one is chosen to become
active.

	The rationale behind this is to permit LACP-unaware hosts or
things like PXE to be able to communicate when LACP is not up and
running on the host system.

>> 	Modulo the fact that bonding assigns a MAC to an aggregator
>> before the standard does (for the System ID), this is approximately
>> what's described in 802.1AX-2014 6.7.1, although the context there is
>> criteria for prioritizing between ports during selection for aggregation
>> when limited capabilities exist (i.e., 6 ports but only the ability to
>> accomodate 4 in an aggregrator).
>> 
>> 	FWIW, the 802.1AX standard is pretty quiet on this whole
>> situation.  It recognises that "A System may contain multiple
>> Aggregators, serving multiple Aggregator Clients" (802.1AX-2014 6.2.1)
>> but doesn't have any verbiage that I can find on requirements for
>> choosing between multiple such Aggregators if only one can be active.  I
>> think the presumption in the standard is that the multiple aggregators
>> would or could be active simultaneously as independent entities.
>> 
>> 	Anyway, the upshot is that we can pretty much choose as we see
>> fit for this particular case.
>
>Yes

	From the above, I suspect we'll have to add some additional
configuration parameters somewhere.  It would be nice if the System
Priority were configurable on a per-aggregator basis, but that seems
complicated from a UI perspective (other than something like a mapping
of agg ID to system prio).

	-J

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bonding Draft Proposal] Add lacp_prio Support for ad_select?
  2025-07-01 17:08     ` Jay Vosburgh
@ 2025-07-02  7:44       ` Hangbin Liu
  2025-07-03 19:42         ` Jay Vosburgh
  0 siblings, 1 reply; 6+ messages in thread
From: Hangbin Liu @ 2025-07-02  7:44 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jiri Pirko, netdev

Hi Jay,
On Tue, Jul 01, 2025 at 10:08:56AM -0700, Jay Vosburgh wrote:
> 	It looks like lacp_find_new_agg_lead() runs though all of the
> ports in all of the aggregators and chooses the aggregator with the
> "best" port of all.

Yes, based on the ad_select policy.

> 
> 	One downside if we were to adapt this logic or something similar
> to bonding is that there's currently no way to set the Port Priority of
> interfaces in the bond.  There is a "prio" that can be set via ip set
> ... type bond_slave prio X, which is IFLA_BOND_SLAVE_PRIO, but that's a
> failover priority, not the LACP Port Priority.

How about adding a similar parameter, e.g., ad_actor_port_prio?
Currently, the actor port priority is initialized directly as 0xFF.
We could introduce a per-port ad_actor_port_prio to be used in the
ad_select policy.

I understand that, according to the IEEE standard, port priority is used to
select the best port among multiple ports within a single aggregator.
However, since the IEEE standard doesn't define how to select between two
aggregators, we could repurpose this value similarly to how the bandwidth
and count options work in the current ad_select policy.

> 
> 	So right now, if the above logic were put into bonding, the
> local selection criteria would end up based entirely on the port number,
> which isn't configurable, and so doesn't seem especially better than
> what we have now.

[...]
> 
> 	From the above, I suspect we'll have to add some additional
> configuration parameters somewhere.  It would be nice if the System
> Priority were configurable on a per-aggregator basis, but that seems
> complicated from a UI perspective (other than something like a mapping
> of agg ID to system prio).

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bonding Draft Proposal] Add lacp_prio Support for ad_select?
  2025-07-02  7:44       ` Hangbin Liu
@ 2025-07-03 19:42         ` Jay Vosburgh
  0 siblings, 0 replies; 6+ messages in thread
From: Jay Vosburgh @ 2025-07-03 19:42 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jiri Pirko, netdev

Hangbin Liu <liuhangbin@gmail.com> wrote:

>Hi Jay,
>On Tue, Jul 01, 2025 at 10:08:56AM -0700, Jay Vosburgh wrote:
>> 	It looks like lacp_find_new_agg_lead() runs though all of the
>> ports in all of the aggregators and chooses the aggregator with the
>> "best" port of all.
>
>Yes, based on the ad_select policy.
>
>> 
>> 	One downside if we were to adapt this logic or something similar
>> to bonding is that there's currently no way to set the Port Priority of
>> interfaces in the bond.  There is a "prio" that can be set via ip set
>> ... type bond_slave prio X, which is IFLA_BOND_SLAVE_PRIO, but that's a
>> failover priority, not the LACP Port Priority.
>
>How about adding a similar parameter, e.g., ad_actor_port_prio?
>Currently, the actor port priority is initialized directly as 0xFF.
>We could introduce a per-port ad_actor_port_prio to be used in the
>ad_select policy.

	This is an easy choice, we should be able to adjust the Port
Priority on a per-interface basis.  At the moment, we don't use it for
anything, but that may change.

>I understand that, according to the IEEE standard, port priority is used to
>select the best port among multiple ports within a single aggregator.
>However, since the IEEE standard doesn't define how to select between two
>aggregators, we could repurpose this value similarly to how the bandwidth
>and count options work in the current ad_select policy.

	Well, one specific use is in Annex C.3 (of 802.1AX-2014), for
use in cases where an aggregator can accomodate N active ports, but
there are N+X ports that are available to that aggregator.  In this
case, the Port Priority can be used to select the "best" subset of the
available ports.  I recall there's some more detailed verbiage along
these same lines somewhere else in the standard, but it's eluding me at
the moment.

	But, yes, we could in principle say something like "the best
aggregator is the one with the higher total sum of Port Priorities
across its active ports."  I'm not sure right offhand if that's the best
algorithm, but I'm fairly sure it's legal from the standard's point of
view.

	-J

>> 
>> 	So right now, if the above logic were put into bonding, the
>> local selection criteria would end up based entirely on the port number,
>> which isn't configurable, and so doesn't seem especially better than
>> what we have now.
>
>[...]
>> 
>> 	From the above, I suspect we'll have to add some additional
>> configuration parameters somewhere.  It would be nice if the System
>> Priority were configurable on a per-aggregator basis, but that seems
>> complicated from a UI perspective (other than something like a mapping
>> of agg ID to system prio).
>
>Thanks
>Hangbin

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-07-03 19:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-24  6:53 [Bonding Draft Proposal] Add lacp_prio Support for ad_select? Hangbin Liu
2025-06-26 23:28 ` Jay Vosburgh
2025-06-27  4:33   ` Hangbin Liu
2025-07-01 17:08     ` Jay Vosburgh
2025-07-02  7:44       ` Hangbin Liu
2025-07-03 19:42         ` Jay Vosburgh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).