From: Hangbin Liu <liuhangbin@gmail.com>
To: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: netdev@vger.kernel.org, Veaceslav Falico <vfalico@gmail.com>,
Andy Gospodarek <andy@greyhouse.net>,
Jiri Pirko <jiri@resnulli.us>,
Jonathan Toppins <jtoppins@redhat.com>
Subject: Re: [Draft PATCH net-next] Bonding: add missed_max option
Date: Mon, 1 Nov 2021 18:46:45 +0800 [thread overview]
Message-ID: <YX/FlQKJNea3c4/B@Laptop-X1> (raw)
In-Reply-To: <7320.1635545478@famine>
Hi Jay, Denis,
Thanks for the comments. Please see my replies below.
On Fri, Oct 29, 2021 at 03:11:18PM -0700, Jay Vosburgh wrote:
> >+missed_max
> >+
> >+ Maximum number of arp_interval for missed ARP replies.
> >+ If this number is exceeded, link is reported as down.
> >+
> >+ Note a small value means a strict time. e.g. missed_max is 1 means
> >+ the correct arp reply must be recived during the interval.
> >+
> >+ default 3
>
> I'd suggest "arp" in the option name to make the scope more
> obvious.
I didn't add arp prefix in purpose because I'd like to re-use this parameter
after adding bonding IPv6 NS/NA support. I will add this reason in the commit
description.
Or if you like to make a difference for ARP and IPv6 NS, I can add the arp_
prefix.
> >@@ -3224,7 +3224,7 @@ static int bond_ab_arp_inspect(struct bonding *bond)
> >
> > /* Backup slave is down if:
> > * - No current_arp_slave AND
> >- * - more than 3*delta since last receive AND
> >+ * - more than missed_max*delta since last receive AND
> > * - the bond has an IP address
> > *
> > * Note: a non-null current_arp_slave indicates
> >@@ -3236,20 +3236,20 @@ static int bond_ab_arp_inspect(struct bonding *bond)
> > */
> > if (!bond_is_active_slave(slave) &&
> > !rcu_access_pointer(bond->current_arp_slave) &&
> >- !bond_time_in_interval(bond, last_rx, 3)) {
> >+ !bond_time_in_interval(bond, last_rx, bond->params.missed_max)) {
> > bond_propose_link_state(slave, BOND_LINK_DOWN);
> > commit++;
> > }
> >
> > /* Active slave is down if:
> >- * - more than 2*delta since transmitting OR
> >- * - (more than 2*delta since receive AND
> >+ * - more than missed_max*delta since transmitting OR
> >+ * - (more than missed_max*delta since receive AND
> > * the bond has an IP address)
> > */
> > trans_start = dev_trans_start(slave->dev);
> > if (bond_is_active_slave(slave) &&
> >- (!bond_time_in_interval(bond, trans_start, 2) ||
> >- !bond_time_in_interval(bond, last_rx, 2))) {
> >+ (!bond_time_in_interval(bond, trans_start, bond->params.missed_max) ||
> >+ !bond_time_in_interval(bond, last_rx, bond->params.missed_max))) {
> > bond_propose_link_state(slave, BOND_LINK_DOWN);
> > commit++;
> > }
>
> The above two changes make the backup and active logic both
> switch to using the missed_max value (i.e., both set to the same value),
> when previously these two cases used differing values (2 for active, 3
> for backup).
>
> Historically, these intervals were staggered deliberately; an
> old comment removed by b2220cad583c9b states:
>
> if ((slave != bond->curr_active_slave) &&
> (!bond->current_arp_slave) &&
> (time_after_eq(jiffies, slave_last_rx(bond, slave) + 3*delta_in_ticks))) {
> /* a backup slave has gone down; three times
> * the delta allows the current slave to be
> * taken out before the backup slave.
>
> I think it would be prudent to insure that having the active and
> backup timeouts set in lockstep does not result in an undesirable change
> of behavior.
Yes, I'm also a little concern about this. What about make the backup slave
timeout 1 plus delta then active slave. i.e. for backup slave
bond_time_in_interval(bond, last_rx, bond->params.missed_max + 1)
> >@@ -270,6 +272,15 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
> > .values = bond_intmax_tbl,
> > .set = bond_option_arp_interval_set
> > },
> >+ [BOND_OPT_MISSED_MAX] = {
> >+ .id = BOND_OPT_MISSED_MAX,
> >+ .name = "missed_max",
> >+ .desc = "Maximum number of missed ARP interval",
> >+ .unsuppmodes = BIT(BOND_MODE_8023AD) | BIT(BOND_MODE_TLB) |
> >+ BIT(BOND_MODE_ALB),
> >+ .values = bond_intmax_tbl,
>
> This allows missed_max to be set to 0; is that intended to be a
> valid setting?
In bond_time_in_interval() there is an arp_interval/2 extra time. Do you think
if this is enough for fast network when we set missed_max to 0?
Of course in the very fast network the value should be at lease 1 in case
the ARP send/recv time frame is same.
Thanks
Hangbin
prev parent reply other threads:[~2021-11-01 10:46 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-29 6:55 [Draft PATCH net-next] Bonding: add missed_max option Hangbin Liu
2021-10-29 13:45 ` Denis Kirjanov
2021-10-29 22:11 ` Jay Vosburgh
2021-11-01 10:46 ` Hangbin Liu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YX/FlQKJNea3c4/B@Laptop-X1 \
--to=liuhangbin@gmail.com \
--cc=andy@greyhouse.net \
--cc=jay.vosburgh@canonical.com \
--cc=jiri@resnulli.us \
--cc=jtoppins@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=vfalico@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox