From: Carlos Bilbao <carlos.bilbao.osdev@gmail.com>
To: Jay Vosburgh <jv@jvosburgh.net>, Tonghao Zhang <tonghao@bamaicloud.com>
Cc: carlos.bilbao@kernel.org, andrew+netdev@lunn.ch,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, sforshee@kernel.org, bilbao@vt.edu
Subject: Re: [PATCH] bonding: Improve the accuracy of LACPDU transmissions
Date: Tue, 24 Jun 2025 17:33:38 -0500 [thread overview]
Message-ID: <f34cec6a-67f4-444c-8807-5fbf3c4335b7@gmail.com> (raw)
In-Reply-To: <32313811-2215-41c3-852f-8e257487dfb6@gmail.com>
On 6/24/25 17:27, Carlos Bilbao wrote:
> Hello both,
>
> On 6/24/25 16:15, Jay Vosburgh wrote:
>> Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>>
>>>
>>>
>>>> 2025年6月19日 03:53,carlos.bilbao@kernel.org 写道:
>>>>
>>>> From: Carlos Bilbao <carlos.bilbao@kernel.org>
>>>>
>>>> Improve the timing accuracy of LACPDU transmissions in the bonding
>>>> 802.3ad
>>>> (LACP) driver. The current approach relies on a decrementing
>>>> counter to
>>>> limit the transmission rate. In our experience, this method is
>>>> susceptible
>>>> to delays (such as those caused by CPU contention or soft lockups)
>>>> which
>>>> can lead to accumulated drift in the LACPDU send interval. Over
>>>> time, this
>>>> drift can cause synchronization issues with the top-of-rack (ToR)
>>>> switch
>>>> managing the LAG, manifesting as lag map flapping. This in turn can
>>>> trigger
>>>> temporary interface removal and potential packet loss.
>> So, you're saying that contention or soft lockups are causing
>> the queue_delayed_work() of bond_3ad_state_machine_handler() to be
>> scheduled late, and, then, because the LACPDU TX limiter is based on the
>> number of state machine executions (which should be every 100ms), it is
>> then late sending LACPDUs?
>>
>> If the core problem is that the state machine as a whole isn't
>> running regularly, how is doing a clock-based time check reliable? Or
>> should I take the word "improve" from the Subject literally, and assume
>> it's making things "better" but not "totally perfect"?
>>
>> Is the sync issue with the TOR due to missing / delayed LACPDUs,
>> or is there more to it? At the fast LACP rate, the periodic timeout is
>> 3 seconds for a nominal 1 second LACPDU interval, which is fairly
>> generous.
Also, I didn’t ignore your comments -- I agree there’s a deeper issue at
the state machine level. I’m looking into it and will get back to y'all.
>>
>>>> This patch improves stability with a jiffies-based mechanism to
>>>> track and
>>>> enforce the minimum transmission interval; keeping track of when
>>>> the next
>>>> LACPDU should be sent.
>>>>
>>>> Suggested-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
>>>> Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@kernel.org>
>>>> ---
>>>> drivers/net/bonding/bond_3ad.c | 18 ++++++++----------
>>>> include/net/bond_3ad.h | 5 +----
>>>> 2 files changed, 9 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/bond_3ad.c
>>>> b/drivers/net/bonding/bond_3ad.c
>>>> index c6807e473ab7..47610697e4e5 100644
>>>> --- a/drivers/net/bonding/bond_3ad.c
>>>> +++ b/drivers/net/bonding/bond_3ad.c
>>>> @@ -1375,10 +1375,12 @@ static void ad_churn_machine(struct port
>>>> *port)
>>>> */
>>>> static void ad_tx_machine(struct port *port)
>>>> {
>>>> - /* check if tx timer expired, to verify that we do not send more
>>>> than
>>>> - * 3 packets per second
>>>> - */
>>>> - if (port->sm_tx_timer_counter && !(--port->sm_tx_timer_counter)) {
>>>> + unsigned long now = jiffies;
>>>> +
>>>> + /* Check if enough time has passed since the last LACPDU sent */
>>>> + if (time_after_eq(now, port->sm_tx_next_jiffies)) {
>>>> + port->sm_tx_next_jiffies += ad_ticks_per_sec / AD_MAX_TX_IN_SECOND;
>>>> +
>>>> /* check if there is something to send */
>>>> if (port->ntt && (port->sm_vars & AD_PORT_LACP_ENABLED)) {
>>>> __update_lacpdu_from_port(port);
>>>> @@ -1395,10 +1397,6 @@ static void ad_tx_machine(struct port *port)
>>>> port->ntt = false;
>>>> }
>>>> }
>>>> - /* restart tx timer(to verify that we will not exceed
>>>> - * AD_MAX_TX_IN_SECOND
>>>> - */
>>>> - port->sm_tx_timer_counter = ad_ticks_per_sec/AD_MAX_TX_IN_SECOND;
>>>> }
>>>> }
>>>>
>>>> @@ -2199,9 +2197,9 @@ void bond_3ad_bind_slave(struct slave *slave)
>>>> /* actor system is the bond's system */
>>>> __ad_actor_update_port(port);
>>>> /* tx timer(to verify that no more than MAX_TX_IN_SECOND
>>>> - * lacpdu's are sent in one second)
>>>> + * lacpdu's are sent in the configured interval (1 or 30 secs))
>>>> */
>>>> - port->sm_tx_timer_counter = ad_ticks_per_sec/AD_MAX_TX_IN_SECOND;
>>>> + port->sm_tx_next_jiffies = jiffies + ad_ticks_per_sec /
>>>> AD_MAX_TX_IN_SECOND;
>>> If CONFIG_HZ is 1000, there is 1000 tick per second, but
>>> "ad_ticks_per_sec / AD_MAX_TX_IN_SECOND” == 10/3 == 3, so that means
>>> send lacp packets every 3 ticks ?
>> Agreed, I think the math is off here.
>>
>> ad_ticks_per_sec is 10, it's the number of times the entire LACP
>> state machine runs per second. It is unrelated to jiffies, and can't be
>> used directly with jiffy units (the duration of which varies depending
>> on what CONFIG_HZ is). I agree that it's confusingly similar to
>> ad_delta_in_ticks, which is measured in jiffy units.
>>
>> You'll probably want to use msecs_to_jiffies() somewhere.
>
>
> Thank you for taking the time to review my patch!
>
> I agree, the math is off here and I plan to fix that in v2. It shouldn't
> be:
>
> port->sm_tx_next_jiffies = jiffies + ad_ticks_per_sec /
> AD_MAX_TX_IN_SECOND;
> port->sm_tx_next_jiffies = jiffies + 3;
>
> ... instead, it should be:
>
> port->sm_tx_next_jiffies = jiffies + HZ / AD_MAX_TX_IN_SECOND;
>
> ... which, assuming CONFIG_HZ is 1000, it would be:
>
> port->sm_tx_next_jiffies = jiffies + 333;
>
> This math also works in terms of the units too:
>
> HZ = ticks / sec
> AD_MAX_X_IN_SECOND = packets / sec
>
> so:
>
> (ticks / sec) / (packets /sec) = ticks / packet
>
>
>>
>> How did you test this to insure the TX machine doesn't overrun
>> (i.e., exceed AD_MAX_TX_IN_SECOND LACPDU transmissions in one second)?
>>
>> -J
>>
>>>> __disable_port(port);
>>>>
>>>> diff --git a/include/net/bond_3ad.h b/include/net/bond_3ad.h
>>>> index 2053cd8e788a..956d4cb45db1 100644
>>>> --- a/include/net/bond_3ad.h
>>>> +++ b/include/net/bond_3ad.h
>>>> @@ -231,10 +231,7 @@ typedef struct port {
>>>> mux_states_t sm_mux_state; /* state machine mux state */
>>>> u16 sm_mux_timer_counter; /* state machine mux timer counter */
>>>> tx_states_t sm_tx_state; /* state machine tx state */
>>>> - u16 sm_tx_timer_counter; /* state machine tx timer counter
>>>> - * (always on - enter to transmit
>>>> - * state 3 time per second)
>>>> - */
>>>> + unsigned long sm_tx_next_jiffies;/* expected jiffies for next
>>>> LACPDU sent */
>>>> u16 sm_churn_actor_timer_counter;
>>>> u16 sm_churn_partner_timer_counter;
>>>> u32 churn_actor_count;
>>>> --
>>>> 2.43.0
>>>>
>>>>
>>>>
>> ---
>> -Jay Vosburgh, jv@jvosburgh.net
>>
>
> Thanks,
>
> Carlos
>
Thanks,
Carlos
next prev parent reply other threads:[~2025-06-24 22:33 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-18 19:53 [PATCH] bonding: Improve the accuracy of LACPDU transmissions carlos.bilbao
2025-06-20 3:15 ` Tonghao Zhang
2025-06-24 21:15 ` Jay Vosburgh
2025-06-24 22:27 ` Carlos Bilbao
2025-06-24 22:33 ` Carlos Bilbao [this message]
2025-06-24 22:47 ` Seth Forshee
2025-06-25 16:06 ` Seth Forshee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f34cec6a-67f4-444c-8807-5fbf3c4335b7@gmail.com \
--to=carlos.bilbao.osdev@gmail.com \
--cc=andrew+netdev@lunn.ch \
--cc=bilbao@vt.edu \
--cc=carlos.bilbao@kernel.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=jv@jvosburgh.net \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sforshee@kernel.org \
--cc=tonghao@bamaicloud.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).