From: John Fastabend <john.fastabend@gmail.com>
To: Scott Feldman <sfeldma@gmail.com>, Jiri Pirko <jiri@resnulli.us>
Cc: Netdev <netdev@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Ido Schimmel <idosch@mellanox.com>, Elad Raz <eladr@mellanox.com>,
Florian Fainelli <f.fainelli@gmail.com>,
Guenter Roeck <linux@roeck-us.net>,
Vivien Didelot <vivien.didelot@savoirfairelinux.com>,
"andrew@lunn.ch" <andrew@lunn.ch>,
David Laight <David.Laight@aculab.com>,
"stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
Date: Fri, 16 Oct 2015 10:16:30 -0700 [thread overview]
Message-ID: <562130EE.3040000@gmail.com> (raw)
In-Reply-To: <CAE4R7bDLSOTutnJ65nkFbHF-gC6euGTFwjuK2f-AXfjCZuEqqw@mail.gmail.com>
On 15-10-16 09:20 AM, Scott Feldman wrote:
> On Fri, Oct 16, 2015 at 1:23 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Thu, Oct 15, 2015 at 05:21:22PM CEST, john.fastabend@gmail.com wrote:
>>> On 15-10-14 10:40 AM, Jiri Pirko wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>>> or if he has to defer the att_set processing for later.
>>>>
>>>> This also allows drivers to sleep inside attr_set and report operation
>>>> status back to switchdev core. Switchdev core then warns if status is
>>>> not ok, instead of silent errors happening in drivers.
>>>>
>>>> Benefit from newly introduced switchdev deferred ops infrastructure.
>>>>
>>>
>>> A nit but the patch description should note your setting the defer bit
>>> on the bridge set state.
>>>
>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>>>> include/net/switchdev.h | 1 +
>>>> net/bridge/br_stp.c | 3 +-
>>>> net/switchdev/switchdev.c | 108 ++++++++++++++++++----------------------------
>>>> 3 files changed, 46 insertions(+), 66 deletions(-)
>>>>
>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>> index d1c7f90..f7de6f8 100644
>>>> --- a/include/net/switchdev.h
>>>> +++ b/include/net/switchdev.h
>>>> @@ -17,6 +17,7 @@
>>>>
>>>> #define SWITCHDEV_F_NO_RECURSE BIT(0)
>>>> #define SWITCHDEV_F_SKIP_EOPNOTSUPP BIT(1)
>>>> +#define SWITCHDEV_F_DEFER BIT(2)
>>>>
>>>> struct switchdev_trans_item {
>>>> struct list_head list;
>>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>>> index db6d243de..80c34d7 100644
>>>> --- a/net/bridge/br_stp.c
>>>> +++ b/net/bridge/br_stp.c
>>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>> {
>>>> struct switchdev_attr attr = {
>>>> .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>>> + .flags = SWITCHDEV_F_DEFER,
>>>> .u.stp_state = state,
>>>> };
>>>
>>>
>>> This creates a possible race (with 6/8) I think, please check!
>>
>> Wait. This patch does not change the previous behaviour. Patch 6 does,
>> so I don't understand why you are asking here. Confusing.
>>
>>
>>>
>>> In del_nbp() we call br_stp_disable_port() to set the port state
>>> to BR_STATE_DISABLE and disabling learning events. But with this
>>> patch it can be deferred. Also note the STP agent may be in userspace
>>> which actually seems more likely the case because you likely want to
>>> run some more modern variant of STP than the kernel supports.
>>>
>>> So at some point in the future the driver will turn off learning. At
>>> the same time we call br_fdb_delete_by_port which calls a deferred
>>> set of fdb deletes.
>>>
>>> I don't see how you guarantee learning is off before you start doing
>>> the deletes here and possibly learning new addresses after the software
>>> side believes the port is down.
>>>
>>> So
>>>
>>> br_stp_disable_port
>>> br_fdb_delete_by_port
>>> {fdb_del_external_learn}
>>> [hw learns a fdb]
>>> [hw disables learning]
>>>
>>> What stops this from happening?
>>
>> Okay. This behaviour is the same as without the patchset. What would
>> resolve the issue it to put switchdev_deferred_process() after
>> br_stp_disable_port() and before br_fdb_delete_by_port() call.
>> That would enforce stp change to happen in hw before fdbs are explicitly
>> deleted. Sound good to you?
>
> Doesn't HW already see things in the right order since items are
> dequeued from the deferred list in the order queued?
>
I just noticed there is the rtnl lock around the deferred process so
this should mean only a single deferred task is run at a time. OK sort
of a big hammer though.
+static void switchdev_deferred_process_work(struct work_struct *work)
+{
+ rtnl_lock();
+ switchdev_deferred_process();
+ rtnl_unlock();
+}
But I don't see how this block of code works,
spin_lock_bh(&br->lock);
br_stp_disable_port(p);
spin_unlock_bh(&br->lock);
br_ifinfo_notify(RTM_DELLINK, p);
list_del_rcu(&p->list);
nbp_vlan_flush(p);
br_fdb_delete_by_port(br, p, 0, 1);
switchdev_deferred_process();
If you defer disabling learning then setup a bunch of deletes also
deferred, what guarantees that learning was disabled before you setup
the deferred fdb delete list?
It seems you may call rocker_port_fdb_learn_work() after the
br_fdb_delete_by_port call above but before the deferred learn disable
call which would have disabled it. Also the rocker_port_fdb_learn_work
queue is on a lw->work and doesn't use the same locking?
Its sort of an interesting hieararchy of work queues being built up
here. I'm being a bit overly paranoid at this point because I've seen
some really ugly bugs around these types of things that are difficult
to spot because they are correctness bugs instead of hard crashes.
.John
next prev parent reply other threads:[~2015-10-16 17:16 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 1/8] switchdev: introduce switchdev deferred ops infrastructure Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 2/8] switchdev: make struct switchdev_attr parameter const for attr_set calls Jiri Pirko
2015-10-14 19:01 ` Vivien Didelot
2015-10-14 17:40 ` [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
2015-10-15 4:34 ` Scott Feldman
2015-10-15 5:57 ` Jiri Pirko
2015-10-15 15:21 ` John Fastabend
2015-10-16 8:23 ` Jiri Pirko
2015-10-16 16:20 ` Scott Feldman
2015-10-16 17:16 ` John Fastabend [this message]
2015-10-17 2:11 ` John Fastabend
2015-10-14 17:40 ` [patch net-next v5 4/8] switchdev: remove pointers from switchdev objects Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 5/8] switchdev: introduce possibility to defer obj_add/del Jiri Pirko
2015-10-19 7:55 ` Scott Feldman
2015-10-19 8:24 ` Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 6/8] bridge: defer switchdev fdb del call in fdb_del_external_learn Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 7/8] rocker: remove nowait from switchdev callbacks Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 8/8] switchdev: assert rtnl mutex when going over lower netdevs Jiri Pirko
2015-10-15 13:10 ` [patch net-next v5 0/8] switchdev: change locking David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=562130EE.3040000@gmail.com \
--to=john.fastabend@gmail.com \
--cc=David.Laight@aculab.com \
--cc=andrew@lunn.ch \
--cc=davem@davemloft.net \
--cc=eladr@mellanox.com \
--cc=f.fainelli@gmail.com \
--cc=idosch@mellanox.com \
--cc=jiri@resnulli.us \
--cc=linux@roeck-us.net \
--cc=netdev@vger.kernel.org \
--cc=sfeldma@gmail.com \
--cc=stephen@networkplumber.org \
--cc=vivien.didelot@savoirfairelinux.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).