All of lore.kernel.org
 help / color / mirror / Atom feed
From: roopa <roopa@cumulusnetworks.com>
To: Scott Feldman <sfeldma@gmail.com>
Cc: "Jiří Pírko" <jiri@resnulli.us>,
	"Nikolay Aleksandrov" <nikolay@cumulusnetworks.com>,
	Netdev <netdev@vger.kernel.org>,
	bridge@lists.linux-foundation.org,
	"Wilson Kok" <wkok@cumulusnetworks.com>,
	"David Miller" <davem@davemloft.net>
Subject: Re: [Bridge] [PATCH net-next] bridge: skip fdb add if the port shouldn't learn
Date: Tue, 02 Jun 2015 10:14:41 -0700	[thread overview]
Message-ID: <556DE481.5030201@cumulusnetworks.com> (raw)
In-Reply-To: <CAE4R7bDtY1jDcLnvEMx-y4U=SVvf867SPZc3vRjfrUYBA8W=0g@mail.gmail.com>

On 5/27/15, 9:01 AM, Scott Feldman wrote:
> On Wed, May 27, 2015 at 1:35 AM, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com> wrote:
>> On Wed, May 27, 2015 at 9:59 AM, Scott Feldman <sfeldma@gmail.com> wrote:
>>> On Wed, May 27, 2015 at 12:05 AM, Nikolay Aleksandrov
>>> <nikolay@cumulusnetworks.com> wrote:
>>>> On Tue, May 26, 2015 at 7:28 PM, Stephen Hemminger
>>>> <stephen@networkplumber.org> wrote:
>>>>> On Thu, 21 May 2015 03:42:57 -0700
>>>>> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>>>>>
>>>>>> From: Wilson Kok <wkok@cumulusnetworks.com>
>>>>>>
>>>>>> Check in fdb_add_entry() if the source port should learn, similar
>>>>>> check is used in br_fdb_update.
>>>>>> Note that new fdb entries which are added manually or
>>>>>> as local ones are still permitted.
>>>>>> This patch has been tested by running traffic via a bridge port and
>>>>>> switching the port's state, also by manually adding/removing entries
>>>>>> from the bridge's fdb.
>>>>>>
>>>>>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>>>>>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>> What is the problem this is trying to solve?
>>>>>
>>>>> I think user should be allowed to manually add any entry
>>>>> even if learning.
>>>> Hi Stephen,
>>>> I have been thinking about the use case and have discussed it
>>>> internally with colleagues and the patch
>>>> author, the main problem is when there's an external software that
>>>> adds dynamic entries (learning) and
>>>> it could experience a race condition, here's a possible situation:
>>>> * external software learns a mac from hw, sends an add to kernel
>>>>   * right before that, port goes blocking (or down) and kernel flushes
>>>>     mac, sends notification about the state change and mac flush
>>>>   * right after that, kernel gets the previous add from external software, it's
>>>>     allowed to add, and then sends an add notification
>>>>   * mean while, external software processes the link block/down and mac flush,
>>>>     followed by the mac add from kernel.  At this point, external software can't
>>>>     really know whether it's a user adding the mac intentionally or it's
>>>>     a race.
>>>>
>>>> This issue can't really be avoided in user-space.
>>>> As I've noted local and static entries are still allowed, and iproute2
>>>> bridge utility always
>>>> marks the entries as static (NUD_NOARP), this only affects external
>>>> dynamic entries which
>>>> are usually sent by something that does the learning externally.
>>>> I'll keep digging to see if there's another way to go about this since
>>>> I'd like to give the user
>>>> full freedom. Personally I don't have strong feeling for this patch
>>>> and if it's not preferred then
>>>> I'll post a revert.
>>> So there is already a switchdev API to add/del an externally learned
>>> FDB entry which holds rtnl_lock and avoids these races.  I would
>>> suggest using that and revert this patch.
>>>
>>> See call_switchdev_notifiers(SWITCHDEV_FDB_ADD|SWITCHDEV_FDB_DEL) and
>>> the handler in br.c:br_switchdev_event().
>>>
>>> -scott
>> Hmm, I'm new to the switchdev API and am possibly missing something,
>> but how do you suggest to use it here ?
> You need to call
> call_switchdev_notifiers(SWITCHDEV_FDB_ADD|SWITCHDEV_FDB_DEL) when the
> device learns a new mac/vlan on the port interface.
>
>> How can we differentiate between user-added entry and an externally
>> learned one ?
> Externally added ones will be marked with NTF_EXT_LEARNED set in
> ndm->ndm_flags in the netlink echo.  Manually added ones from the user
> will have ndm->ndm_state set to NUD_NOARP in the netlink echo.
>
>> Do you mean to use (for example) the NTF_EXT_LEARNED flag when adding
>> an entry from user-space so
>> the API can get called in br_fdb_add ?
> No.  br_fdb_add is the bridge's .ndo_fdb_add handler called when user
> manually adds an FDB entry using netlink RTM_NEWNEIGH.  For externally
> learned entries, use the internal
> call_switchdev_notifiers(SWITCHDEV_FDB_ADD|SWITCHDEV_FDB_DEL).

scott, I am assuming you are ok with an external learning entity (user 
space driver or a controller) pushing entries
with the NTF_EXT_LEARNED correct ?. Because NTF_EXT_LEARNED is in uapi 
(and analogous to RTNH_F_OFFLOAD in the fib offload world IMO)

WARNING: multiple messages have this Message-ID (diff)
From: roopa <roopa@cumulusnetworks.com>
To: Scott Feldman <sfeldma@gmail.com>
Cc: "Jiří Pírko" <jiri@resnulli.us>,
	"Nikolay Aleksandrov" <nikolay@cumulusnetworks.com>,
	Netdev <netdev@vger.kernel.org>,
	bridge@lists.linux-foundation.org,
	"Wilson Kok" <wkok@cumulusnetworks.com>,
	"David Miller" <davem@davemloft.net>
Subject: Re: [PATCH net-next] bridge: skip fdb add if the port shouldn't learn
Date: Tue, 02 Jun 2015 10:14:41 -0700	[thread overview]
Message-ID: <556DE481.5030201@cumulusnetworks.com> (raw)
In-Reply-To: <CAE4R7bDtY1jDcLnvEMx-y4U=SVvf867SPZc3vRjfrUYBA8W=0g@mail.gmail.com>

On 5/27/15, 9:01 AM, Scott Feldman wrote:
> On Wed, May 27, 2015 at 1:35 AM, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com> wrote:
>> On Wed, May 27, 2015 at 9:59 AM, Scott Feldman <sfeldma@gmail.com> wrote:
>>> On Wed, May 27, 2015 at 12:05 AM, Nikolay Aleksandrov
>>> <nikolay@cumulusnetworks.com> wrote:
>>>> On Tue, May 26, 2015 at 7:28 PM, Stephen Hemminger
>>>> <stephen@networkplumber.org> wrote:
>>>>> On Thu, 21 May 2015 03:42:57 -0700
>>>>> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>>>>>
>>>>>> From: Wilson Kok <wkok@cumulusnetworks.com>
>>>>>>
>>>>>> Check in fdb_add_entry() if the source port should learn, similar
>>>>>> check is used in br_fdb_update.
>>>>>> Note that new fdb entries which are added manually or
>>>>>> as local ones are still permitted.
>>>>>> This patch has been tested by running traffic via a bridge port and
>>>>>> switching the port's state, also by manually adding/removing entries
>>>>>> from the bridge's fdb.
>>>>>>
>>>>>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
>>>>>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>> What is the problem this is trying to solve?
>>>>>
>>>>> I think user should be allowed to manually add any entry
>>>>> even if learning.
>>>> Hi Stephen,
>>>> I have been thinking about the use case and have discussed it
>>>> internally with colleagues and the patch
>>>> author, the main problem is when there's an external software that
>>>> adds dynamic entries (learning) and
>>>> it could experience a race condition, here's a possible situation:
>>>> * external software learns a mac from hw, sends an add to kernel
>>>>   * right before that, port goes blocking (or down) and kernel flushes
>>>>     mac, sends notification about the state change and mac flush
>>>>   * right after that, kernel gets the previous add from external software, it's
>>>>     allowed to add, and then sends an add notification
>>>>   * mean while, external software processes the link block/down and mac flush,
>>>>     followed by the mac add from kernel.  At this point, external software can't
>>>>     really know whether it's a user adding the mac intentionally or it's
>>>>     a race.
>>>>
>>>> This issue can't really be avoided in user-space.
>>>> As I've noted local and static entries are still allowed, and iproute2
>>>> bridge utility always
>>>> marks the entries as static (NUD_NOARP), this only affects external
>>>> dynamic entries which
>>>> are usually sent by something that does the learning externally.
>>>> I'll keep digging to see if there's another way to go about this since
>>>> I'd like to give the user
>>>> full freedom. Personally I don't have strong feeling for this patch
>>>> and if it's not preferred then
>>>> I'll post a revert.
>>> So there is already a switchdev API to add/del an externally learned
>>> FDB entry which holds rtnl_lock and avoids these races.  I would
>>> suggest using that and revert this patch.
>>>
>>> See call_switchdev_notifiers(SWITCHDEV_FDB_ADD|SWITCHDEV_FDB_DEL) and
>>> the handler in br.c:br_switchdev_event().
>>>
>>> -scott
>> Hmm, I'm new to the switchdev API and am possibly missing something,
>> but how do you suggest to use it here ?
> You need to call
> call_switchdev_notifiers(SWITCHDEV_FDB_ADD|SWITCHDEV_FDB_DEL) when the
> device learns a new mac/vlan on the port interface.
>
>> How can we differentiate between user-added entry and an externally
>> learned one ?
> Externally added ones will be marked with NTF_EXT_LEARNED set in
> ndm->ndm_flags in the netlink echo.  Manually added ones from the user
> will have ndm->ndm_state set to NUD_NOARP in the netlink echo.
>
>> Do you mean to use (for example) the NTF_EXT_LEARNED flag when adding
>> an entry from user-space so
>> the API can get called in br_fdb_add ?
> No.  br_fdb_add is the bridge's .ndo_fdb_add handler called when user
> manually adds an FDB entry using netlink RTM_NEWNEIGH.  For externally
> learned entries, use the internal
> call_switchdev_notifiers(SWITCHDEV_FDB_ADD|SWITCHDEV_FDB_DEL).

scott, I am assuming you are ok with an external learning entity (user 
space driver or a controller) pushing entries
with the NTF_EXT_LEARNED correct ?. Because NTF_EXT_LEARNED is in uapi 
(and analogous to RTNH_F_OFFLOAD in the fib offload world IMO)

  parent reply	other threads:[~2015-06-02 17:14 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-21 10:42 [Bridge] [PATCH net-next] bridge: skip fdb add if the port shouldn't learn Nikolay Aleksandrov
2015-05-21 10:42 ` Nikolay Aleksandrov
2015-05-25  2:59 ` [Bridge] " David Miller
2015-05-25  2:59   ` David Miller
2015-05-25 11:35   ` [Bridge] " Nikolay Aleksandrov
2015-05-25 11:35     ` Nikolay Aleksandrov
2015-05-25 11:41   ` [Bridge] " Nikolay Aleksandrov
2015-05-25 11:41     ` Nikolay Aleksandrov
2015-05-25 13:39 ` [Bridge] [PATCH net-next v2] " Nikolay Aleksandrov
2015-05-25 13:39   ` Nikolay Aleksandrov
2015-05-26 17:28 ` [Bridge] [PATCH net-next] " Stephen Hemminger
2015-05-26 17:28   ` Stephen Hemminger
2015-05-27  7:05   ` [Bridge] " Nikolay Aleksandrov
2015-05-27  7:05     ` Nikolay Aleksandrov
2015-05-27  7:59     ` [Bridge] " Scott Feldman
2015-05-27  7:59       ` Scott Feldman
2015-05-27  8:35       ` [Bridge] " Nikolay Aleksandrov
2015-05-27  8:35         ` Nikolay Aleksandrov
2015-05-27 16:01         ` [Bridge] " Scott Feldman
2015-05-27 16:01           ` Scott Feldman
2015-05-27 16:14           ` [Bridge] " Nikolay Aleksandrov
2015-05-27 16:14             ` Nikolay Aleksandrov
2015-05-27 20:41             ` [Bridge] " Scott Feldman
2015-05-27 20:41               ` Scott Feldman
2015-06-02 17:14           ` roopa [this message]
2015-06-02 17:14             ` roopa
2015-06-03  5:57             ` [Bridge] " Scott Feldman
2015-06-03  5:57               ` Scott Feldman
2015-06-04  8:14               ` [Bridge] " Nikolay Aleksandrov
2015-06-04  8:14                 ` Nikolay Aleksandrov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=556DE481.5030201@cumulusnetworks.com \
    --to=roopa@cumulusnetworks.com \
    --cc=bridge@lists.linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@cumulusnetworks.com \
    --cc=sfeldma@gmail.com \
    --cc=wkok@cumulusnetworks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.