From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Aichun Li <liaichun@huawei.com>
Cc: davem@davemloft.net, kuba@kernel.org, vfalico@gmail.com,
andy@greyhouse.net, netdev@vger.kernel.org, rose.chen@huawei.com,
moyufeng <moyufeng@huawei.com>
Subject: Re: [PATCH net v2]bonding: check port and aggregator when select
Date: Tue, 02 Feb 2021 12:02:09 -0800 [thread overview]
Message-ID: <19742.1612296129@famine> (raw)
In-Reply-To: <20210128082034.866-1-liaichun@huawei.com>
Aichun Li <liaichun@huawei.com> wrote:
>When the network service is repeatedly restarted in 802.3ad, there is a low
> probability that oops occurs.
>Test commands:systemctl restart network
>
>1.crash: __enable_port():port->slave is NULL
[...]
> PC: ffff000000e2fcd0 [ad_agg_selection_logic+328]
[...]
>2.I also have another call stack, same as in another person's post:
>https://lore.kernel.org/netdev/52630cba-cc60-a024-8dd0-8319e5245044@huawei.com/
What hardware platform are you using here?
moyufeng <moyufeng@huawei.com> appears to be using the same
platform, and I've not had any success so far with the provided script
to reproduce the issue. I'm using an x86_64 system, however, so I
wonder if perhaps this platform needs a barrier somewhere that x86 does
not, or there's something different in the timing of the device driver
close logic.
>Signed-off-by: Aichun Li <liaichun@huawei.com>
>---
> drivers/net/bonding/bond_3ad.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index aa001b16765a..9c8894631bdd 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -183,7 +183,7 @@ static inline void __enable_port(struct port *port)
> {
> struct slave *slave = port->slave;
>
>- if ((slave->link == BOND_LINK_UP) && bond_slave_is_up(slave))
>+ if (slave && (slave->link == BOND_LINK_UP) && bond_slave_is_up(slave))
> bond_set_slave_active_flags(slave, BOND_SLAVE_NOTIFY_LATER);
> }
This change seems like a band aid to cover the real problem.
The caller of __enable_port is ad_agg_selection_logic, and it shouldn't
be possible for port->slave to be NULL when assigned to an aggregator.
>@@ -1516,6 +1516,7 @@ static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
> port->actor_port_number,
> port->aggregator->aggregator_identifier);
> } else {
>+ port->aggregator = &(SLAVE_AD_INFO(slave)->aggregator);
> slave_err(bond->dev, port->slave->dev,
> "Port %d did not find a suitable aggregator\n",
> port->actor_port_number);
This change isn't correct; it's assigning the port to a more or
less random aggregator. This would eliminate the panic, but isn't doing
the right thing. At this point in the code, the selection logic has
failed to find an aggregator that matches, and also failed to find a
free aggregator.
I do need to fix up the failure handling here when it hits the
"did not find a suitable agg" case; the code here is simply wrong, and
has been wrong since the beginning. I'll hack the driver to induce this
situation rather than reproducing whatever problem is making it unable
to find a suitable aggregator.
-J
---
-Jay Vosburgh, jay.vosburgh@canonical.com
next prev parent reply other threads:[~2021-02-02 20:03 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-28 8:20 [PATCH net v2]bonding: check port and aggregator when select Aichun Li
2021-02-01 21:52 ` Jakub Kicinski
2021-02-02 20:02 ` Jay Vosburgh [this message]
-- strict thread matches above, loose matches on Subject: below --
2021-02-03 1:54 liaichun
2021-04-15 11:58 liaichun
2021-05-24 15:21 liaichun
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=19742.1612296129@famine \
--to=jay.vosburgh@canonical.com \
--cc=andy@greyhouse.net \
--cc=davem@davemloft.net \
--cc=kuba@kernel.org \
--cc=liaichun@huawei.com \
--cc=moyufeng@huawei.com \
--cc=netdev@vger.kernel.org \
--cc=rose.chen@huawei.com \
--cc=vfalico@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).