From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: David Miller <davem@davemloft.net>
Cc: jinyiting@huawei.com, vfalico@gmail.com, andy@greyhouse.net,
kuba@kernel.org, netdev@vger.kernel.org, security@kernel.org,
linux-kernel@vger.kernel.org, xuhanbing@huawei.com,
wangxiaogang3@huawei.com
Subject: Re: [PATCH] bonding: 3ad: Fix the conflict between bond_update_slave_arr and the state machine
Date: Mon, 26 Apr 2021 12:29:22 -0700
Message-ID: <31539.1619465362@famine>
In-Reply-To: <20210426.120822.232032630973964712.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote:
>From: Jay Vosburgh <jay.vosburgh@canonical.com>
>Date: Mon, 26 Apr 2021 08:22:37 -0700
>
>> David Miller <davem@davemloft.net> wrote:
>>
>>>From: jinyiting <jinyiting@huawei.com>
>>>Date: Wed, 21 Apr 2021 16:38:21 +0800
>>>
>>>> The bond works in mode 4, and down/up operations are performed on a
>>>> bond that has negotiated normally. There is a probability that
>>>> bond->slave_arr ends up NULL.
>>>>
>>>> Test commands:
>>>> ifconfig bond1 down
>>>> ifconfig bond1 up
>>>>
>>>> The conflict occurs in the following process:
>>>>
>>>> __dev_open (CPU A)
>>>> --bond_open
>>>> --queue_delayed_work(bond->wq,&bond->ad_work,0);
>>>> --bond_update_slave_arr
>>>> --bond_3ad_get_active_agg_info
>>>>
>>>> ad_work(CPU B)
>>>> --bond_3ad_state_machine_handler
>>>> --ad_agg_selection_logic
>>>>
>>>> ad_work runs on CPU B. ad_agg_selection_logic first clears
>>>> agg->is_active on all aggregators. If bond_3ad_get_active_agg_info runs
>>>> on CPU A before CPU B has selected the new active aggregator, it fails
>>>> and bond->slave_arr is set to NULL, even though the best aggregator in
>>>> ad_agg_selection_logic has not changed and there was no need to update
>>>> the slave array.
>>>>
>>>> The conflict occurs because ad_agg_selection_logic clears
>>>> agg->is_active under mode_lock, while bond_open -> bond_update_slave_arr
>>>> inspects agg->is_active outside the lock.
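The window described above can be illustrated with a deterministic, single-threaded model of the interleaving. This is a hypothetical userspace sketch, not bonding code: struct model_bond, select_begin(), select_end() and both reader functions are invented for illustration, and the mode_lock_held flag merely stands in for bond->mode_lock. Selection is split into two steps so that a reader can run between them, as CPU A does in the trace above.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_AGGS      3
#define NO_ACTIVE_AGG (-1) /* reader found no active aggregator */
#define WOULD_WAIT    (-2) /* reader would have to wait for the lock */

/* Hypothetical stand-in for the bond: aggregator flags plus a lock flag. */
struct model_bond {
	bool is_active[NUM_AGGS];
	bool mode_lock_held; /* models bond->mode_lock */
};

/* First half of selection: take the lock and clear every is_active flag. */
static void select_begin(struct model_bond *b)
{
	b->mode_lock_held = true;
	for (int i = 0; i < NUM_AGGS; i++)
		b->is_active[i] = false;
}

/* Second half: mark the chosen aggregator active and drop the lock. */
static void select_end(struct model_bond *b, int best)
{
	b->is_active[best] = true;
	b->mode_lock_held = false;
}

static int find_active(const struct model_bond *b)
{
	for (int i = 0; i < NUM_AGGS; i++)
		if (b->is_active[i])
			return i;
	return NO_ACTIVE_AGG;
}

/* Reader that ignores the lock, like the racy path described above. */
static int get_active_unlocked(const struct model_bond *b)
{
	return find_active(b);
}

/* Reader that honours the lock: while selection is in progress it cannot
 * proceed, so it never observes the cleared intermediate state. */
static int get_active_locked(const struct model_bond *b)
{
	if (b->mode_lock_held)
		return WOULD_WAIT;
	return find_active(b);
}
```

In this model the unlocked reader sees no active aggregator even though aggregator 1 is selected again afterwards, which mirrors the spurious NULL slave_arr; a reader that honours the lock can only run before or after the whole selection.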
>>>>
>>>> Also, it is normal for bond_update_slave_arr to sleep when allocating
>>>> memory, so replace the WARN_ON with a call to might_sleep.
>>>>
>>>> Signed-off-by: jinyiting <jinyiting@huawei.com>
>>>> ---
>>>>
>>>> Previous versions:
>>>> * https://lore.kernel.org/netdev/612b5e32-ea11-428e-0c17-e2977185f045@huawei.com/
>>>>
>>>> drivers/net/bonding/bond_main.c | 7 ++++---
>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>>> index 74cbbb2..83ef62d 100644
>>>> --- a/drivers/net/bonding/bond_main.c
>>>> +++ b/drivers/net/bonding/bond_main.c
>>>> @@ -4406,7 +4404,9 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>>>> if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>>>> struct ad_info ad_info;
>>>>
>>>> + spin_lock_bh(&bond->mode_lock);
>>>
>>>The code paths that call this function with mode_lock held will now deadlock.
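The deadlock concern rests on kernel spinlocks not being recursive: a thread that already holds mode_lock and calls spin_lock_bh() on it again spins forever. A userspace analogue, assuming POSIX threads, is an error-checking mutex, which reports the same situation as EDEADLK instead of hanging. The helper name relock_error() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Lock an error-checking mutex twice from the same thread and return the
 * error code from the second attempt. A PTHREAD_MUTEX_NORMAL mutex (or a
 * kernel spinlock) would simply deadlock at that point instead. */
static int relock_error(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t lock;
	int second;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&lock, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&lock);          /* first acquisition succeeds */
	second = pthread_mutex_lock(&lock); /* caller already holds it */

	pthread_mutex_unlock(&lock);
	pthread_mutex_destroy(&lock);
	return second;
}
```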
>>
>> No path should be calling bond_update_slave_arr with mode_lock
>> already held (it expects RTNL only); did you find one?
>>
>> My concern is that there's something else that does the opposite
>> order, i.e., mode_lock first, then RTNL, but I haven't found an example.
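The opposite-order worry is the classic ABBA pattern: if one path takes RTNL and then mode_lock while another takes mode_lock and then RTNL, two CPUs can each hold one lock and wait forever for the other. The sketch below is a deliberately simplified order checker with fixed, hypothetical ranks (RTNL before mode_lock); the real lockdep instead learns lock dependencies at runtime, so this only illustrates the rule being checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical static ranks: a lock may only be taken while every lock
 * already held has a strictly lower rank. */
enum lock_rank { RANK_RTNL = 1, RANK_MODE_LOCK = 2 };

#define MAX_HELD 8

struct held_locks {
	int ranks[MAX_HELD];
	int depth;
};

/* Record an acquisition; return false if it would violate the ordering.
 * Because acquisitions are kept in strictly increasing rank order, the
 * most recently taken lock has the highest rank, so checking it covers
 * every held lock (including re-taking the same lock). */
static bool acquire(struct held_locks *h, enum lock_rank r)
{
	if (h->depth > 0 && h->ranks[h->depth - 1] >= (int)r)
		return false; /* out of order: potential ABBA deadlock */
	assert(h->depth < MAX_HELD);
	h->ranks[h->depth++] = (int)r;
	return true;
}

/* Release the most recently acquired lock (LIFO, for simplicity). */
static void release(struct held_locks *h)
{
	assert(h->depth > 0);
	h->depth--;
}
```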
>>
>
>This patch is removing a lockdep assertion making sure that mode_lock was held
>when this function was called. That should have been triggering all the time, right?

The line in question is:

#ifdef CONFIG_LOCKDEP
	WARN_ON(lockdep_is_held(&bond->mode_lock));
#endif

The WARN_ON triggers if mode_lock is held; it does not assert that
mode_lock is held. I think that's wrong anyway, since mode_lock could be
held by some other thread, leading to false positives; hence the change
to might_sleep.

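The two checks differ in polarity. The kernel's lockdep_assert_held() complains when the lock is not held, whereas WARN_ON(lockdep_is_held(...)) complains when it is held. The toy functions below are hypothetical and model only that polarity; a boolean stands in for the value lockdep_is_held() would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Models WARN_ON(lockdep_is_held(&lock)): fires when the lock IS held. */
static bool warn_if_held_fires(bool lock_held)
{
	return lock_held;
}

/* Models lockdep_assert_held(&lock): fires when the lock is NOT held. */
static bool assert_held_fires(bool lock_held)
{
	return !lock_held;
}
```

So a function that must be entered without mode_lock, as bond_update_slave_arr is expected to be, matches the WARN_ON form, while a function that requires the lock would use lockdep_assert_held().
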
-J
---
-Jay Vosburgh, jay.vosburgh@canonical.com
Thread overview: 5+ messages
2021-04-21 8:38 [PATCH] bonding: 3ad: Fix the conflict between bond_update_slave_arr and the state machine jinyiting
2021-04-23 20:07 ` David Miller
2021-04-26 15:22 ` Jay Vosburgh
2021-04-26 19:08 ` David Miller
2021-04-26 19:29 ` Jay Vosburgh [this message]