From: John Fastabend <john.r.fastabend@intel.com>
To: Stephen Hemminger <shemminger@vyatta.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
	bhutchings@solarflare.com, roprabhu@cisco.com,
	netdev@vger.kernel.org, mst@redhat.com, chrisw@redhat.com,
	davem@davemloft.net, gregory.v.rose@intel.com,
	kvm@vger.kernel.org, sri@us.ibm.com, kernel@wantstofly.org
Subject: Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
Date: Wed, 29 Feb 2012 10:19:48 -0800
Message-ID: <4F4E6C44.9070502@intel.com>
In-Reply-To: <20120229095204.48885405@nehalam.linuxnetplumber.net>

On 2/29/2012 9:52 AM, Stephen Hemminger wrote:
> On Wed, 29 Feb 2012 09:25:56 -0800
> John Fastabend <john.r.fastabend@intel.com> wrote:
> 
>> On 2/29/2012 5:56 AM, Jamal Hadi Salim wrote:
>>> On Tue, 2012-02-28 at 20:40 -0800, John Fastabend wrote:
>>>
>>>> OK back to this. The last piece is where to put these messages...
>>>> we could take PF_ROUTE:RTM_*NEIGH
>>>>
>>>>      PF_ROUTE:RTM_NEWNEIGH - Add a new FDB entry to an offloaded
>>>>                              switch.
>>>>      PF_ROUTE:RTM_DELNEIGH - Delete a FDB entry from an offloaded
>>>>                              switch.
>>>>      PF_ROUTE:RTM_GETNEIGH - Dumps the embedded FDB table
>>>>
>>>
>>> Why RTM_*NEIGH? RTM tends to map to Route/L3 and NEIGH tends to map
>>> to ndisc or ARP both tied to IP address resolution. While both ARP/Ndisc
>>> may play a role in the user space app populating the FDB, i dont think
>>> they are necessary players.
>>> Learning could be via a table entry miss and packet redirect to user
>>> space.
>>> So my suggestion is to use FDB_*ENTRY for names
>>>  
>>
>> Well I think NETLINK_ROUTE is the most correct type to use in this
>> case. Per netlink.h its for routing and device hooks.
>>
>> #define NETLINK_ROUTE           0       /* Routing/device hook                          */
>>
>> And NETLINK_ROUTE msg_types use the RTM_* prefix. The _*NEIGH postfix
>> were merely a copy from the SW BRIDGE code paths. How about,
>>
>> PF_BRIDGE:RTM_FDB_NEWENTRY
>> PF_BRIDGE:RTM_FDB_DELENTRY
>> PF_BRIDGE:RTM_FDB_GETENTRY
>>
>> And a new group RTNLGRP_FDB. Also using NETLINK_ROUTE gives the correct
>> rtnl locking semantics for free.
>>
>>>> The neighbor code is using the PF_UNSPEC protocol type so we won't
>>>> collide with these unless someone was using PF_ROUTE and relying on
>>>> falling back to PF_UNSPEC however I couldn't find any programs that
>>>> did this iproute2 certainly doesn't. And the bridge pieces are using
>>>> PF_BRIDGE so no collision there.
>>>
>>> They have to be different calls from the calls that talk to the s/ware
>>> bridge. In my opinion, as controversial as this may sound, you need to
>>> be flexible enough that some vendor can replace these calls with
>>> proprietary calls which are more efficient for their hardware. So a
>>> "plugin" to replace these calls in the user space code would be a 
>>> good idea. Alternatively, you could make that something they do at
>>> the driver level i.e from user space to kernel it is "hardware, please
>>> addthistotheFDBtable()" call and the implementation of that could be
>>> proprietary to the specific hardware.
>>>
>>
>> Agreed. I think adding some ndo_ops for bridging offloads here would
>> work. For example the DSA infrastructure and/or macvlan devices might
>> need this. Along the lines of extending this RFC,
>>
>> [RFC] hardware bridging support for DSA switches
>> http://patchwork.ozlabs.org/patch/16578/
> 
> I want to see a unified API so that user space control applications (RSTP, TRILL?)
> can use one set of netlink calls for both software bridge and hardware offloaded
> bridges.  Does this proposal meet that requirement?
> 

With the patches I sent out last night, the same netlink calls are used
for both SW and HW, with a flag set in ndm_flags to indicate a hardware
entry. The flag is needed when a port has offload support and is also
a slave of a SW bridge. Another option would be to apply the command to both
the hardware and software tables. This might be good enough, and user space
would not have to distinguish between HW and SW bridges. It also helps with my
original use case, where I want the SW and HW bridge FDBs to stay in sync.
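
As a rough illustration of the flag approach (a sketch under stated
assumptions, not the code from the patches), the request below is the same
RTM_NEWNEIGH / PF_BRIDGE message in both cases, and only one bit in ndm_flags
differs. The flag name EXAMPLE_NTF_HW and its value, and the helper
build_fdb_add(), are hypothetical names made up for this example; the
structures and macros come from the standard Linux uapi headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>        /* AF_BRIDGE */
#include <linux/netlink.h>     /* struct nlmsghdr, NLMSG_* */
#include <linux/rtnetlink.h>   /* RTM_NEWNEIGH, struct rtattr, RTA_* */
#include <linux/neighbour.h>   /* struct ndmsg, NDA_LLADDR, NUD_PERMANENT */

/* HYPOTHETICAL: placeholder for the "hardware entry" bit in ndm_flags.
 * The name and value are assumptions for illustration only. */
#define EXAMPLE_NTF_HW 0x02

/*
 * Fill buf with an RTM_NEWNEIGH request describing one FDB entry.
 * buf must be 4-byte aligned. Returns the message length in bytes,
 * or 0 if buf is too small. Nothing is sent; sending would need a
 * NETLINK_ROUTE socket.
 */
static size_t build_fdb_add(void *buf, size_t buflen, int ifindex,
                            const uint8_t mac[6], int hw_entry)
{
    struct nlmsghdr *nlh = buf;
    struct ndmsg *ndm;
    struct rtattr *rta;
    size_t hdr = NLMSG_ALIGN(NLMSG_LENGTH(sizeof(*ndm)));
    size_t total = hdr + RTA_ALIGN(RTA_LENGTH(6));

    if (buflen < total)
        return 0;
    memset(buf, 0, total);

    nlh->nlmsg_len = (uint32_t)total;
    nlh->nlmsg_type = RTM_NEWNEIGH;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;

    ndm = NLMSG_DATA(nlh);
    ndm->ndm_family = AF_BRIDGE;
    ndm->ndm_ifindex = ifindex;
    ndm->ndm_state = NUD_PERMANENT;
    /* Only this field differs between a SW and a HW request. */
    ndm->ndm_flags = hw_entry ? EXAMPLE_NTF_HW : 0;

    /* Append the MAC address as an NDA_LLADDR attribute. */
    rta = (struct rtattr *)((char *)nlh + hdr);
    rta->rta_type = NDA_LLADDR;
    rta->rta_len = RTA_LENGTH(6);
    memcpy(RTA_DATA(rta), mac, 6);

    assert(nlh->nlmsg_len <= buflen);
    return total;
}
```

Calling build_fdb_add() with hw_entry set to 0 or 1 yields messages that are
identical except for ndm_flags, which is the point of the flag approach: user
space keeps one message format, and the kernel decides which table to touch.
Under the alternative mentioned above (apply the command to both tables), the
flag would simply go unused.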

In response to Jamal's comment I proposed changing the type to RTM_FDB_XXXENTRY
but the message contents are the same in both cases.

Jamal, why do "They have to be different calls"? I'm not so sure anymore...
Moving to RTM_FDB_XXXENTRY saved some refactoring in the bridge module, but that
is just cosmetic.

Thanks,
John
