netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] bridge and friends: reduce TheLinuxWay(tm)
@ 2013-10-14 21:32 Jamal Hadi Salim
  2013-10-14 21:41 ` Stephen Hemminger
  0 siblings, 1 reply; 13+ messages in thread
From: Jamal Hadi Salim @ 2013-10-14 21:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Vlad Yasevich, netdev@vger.kernel.org

IOW, TheCutNpasteTrain.

There's a lot of clutter on the netlink interface used
by bridge and vxlan.
1) A lot of things which are boolean on/off end up using a uchar.
Example:
IFLA_BRPORT_MODE, IFLA_BRPORT_GUARD, IFLA_BRPORT_PROTECT,
IFLA_BRPORT_LEARNING, IFLA_BRPORT_UNICAST_FLOOD, BRIDGE_VLAN_INFO_PVID,
BRIDGE_VLAN_INFO_UNTAGGED, BRIDGE_MODE_VEPA,
IFLA_VXLAN_PROXY,IFLA_VXLAN_RSC, etc

2) There's a few fields which are basically intended to project the
same message to the kernel but are redefined a few times:
Example:
BRIDGE_VLAN_INFO_MASTER vs NTF_MASTER vs BRIDGE_FLAGS_MASTER

Also i am not sure why multicast snooping needs its own subheader.

Is it too late to make changes? git logs shows some of these feature
have only been on the last 2-3 months.
One approach to resolve this is introduce a new BRIDGE_FLAGS TLV
which will work like the ifi_flags/change and put all these flags in
one spot. New iproute will use these and the old one will continue using
the old approach. It would require to EOL the old interface at some
point.

If this is agreeable and it is not on someone todo list - I will send
patches later on.

cheers,
jamal

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-14 21:32 [RFC] bridge and friends: reduce TheLinuxWay(tm) Jamal Hadi Salim
@ 2013-10-14 21:41 ` Stephen Hemminger
  2013-10-16 13:54   ` Jamal Hadi Salim
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2013-10-14 21:41 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Stephen Hemminger, Vlad Yasevich, netdev@vger.kernel.org

On Mon, 14 Oct 2013 17:32:05 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> IOW, TheCutNpasteTrain.
> 
> There's a lot of clutter on the netlink interface used
> by bridge and vxlan.
> 1) A lot of things which are boolean on/off end up using a uchar.
> Example:
> IFLA_BRPORT_MODE, IFLA_BRPORT_GUARD, IFLA_BRPORT_PROTECT,
> IFLA_BRPORT_LEARNING, IFLA_BRPORT_UNICAST_FLOOD, BRIDGE_VLAN_INFO_PVID,
> BRIDGE_VLAN_INFO_UNTAGGED, BRIDGE_MODE_VEPA,
> IFLA_VXLAN_PROXY,IFLA_VXLAN_RSC, etc
> 
> 2) There's a few fields which are basically intended to project the
> same message to the kernel but are redefined a few times:
> Example:
> BRIDGE_VLAN_INFO_MASTER vs NTF_MASTER vs BRIDGE_FLAGS_MASTER
> 
> Also i am not sure why multicast snooping needs its own subheader.
> 
> Is it too late to make changes? git logs shows some of these feature
> have only been on the last 2-3 months.
> One approach to resolve this is introduce a new BRIDGE_FLAGS TLV
> which will work like the ifi_flags/change and put all these flags in
> one spot. New iproute will use these and the old one will continue using
> the old approach. It would require to EOL the old interface at some
> point.

Unfortunately, by now this is all set in ABI.
It was a side effect of the per-feature evolutionary style of development.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-14 21:41 ` Stephen Hemminger
@ 2013-10-16 13:54   ` Jamal Hadi Salim
  2013-10-16 14:05     ` Jamal Hadi Salim
  2013-10-16 17:11     ` Vlad Yasevich
  0 siblings, 2 replies; 13+ messages in thread
From: Jamal Hadi Salim @ 2013-10-16 13:54 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Stephen Hemminger, Vlad Yasevich, netdev@vger.kernel.org

On 10/14/13 17:41, Stephen Hemminger wrote:

> Unfortunately, by now this is all set in ABI.
> It was a side effect of the per-feature evolutionary style of development.

Sadly, I agree. This is the dark side of "have code will travel";
you let these things out in the wild, they breed and you cant
take them back.
BTW: I dont think what i suggested will be a harmful refactoring because
no existing interfaces are removed - your call.

In similar vein:
What is the motivation behind IFLA_EXT_MASK? Could you not have used
ifm ifindex to relay the interface of interest? Currently the ifindex is
not used at all. IMO, the following interfaces are useful:
- get attributes for all bridge ports (this is there)
- get attributes for bridge interface XXX; there using IFLA_EXT_MASK
I think it should be using ifm->ifindex
- get attributes for all bridge ports for bridge br-blah (not there)
you could also use the ifindex of br-blah here instead

Separate issue:
To provide equivalence to brctl:
- PF_BRIDGE should allow me to attach a bridge port to bridge of choice
with SETLINK

cheers,
jamal

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-16 13:54   ` Jamal Hadi Salim
@ 2013-10-16 14:05     ` Jamal Hadi Salim
  2013-10-16 17:11     ` Vlad Yasevich
  1 sibling, 0 replies; 13+ messages in thread
From: Jamal Hadi Salim @ 2013-10-16 14:05 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Stephen Hemminger, Vlad Yasevich, netdev@vger.kernel.org

On 10/16/13 09:54, Jamal Hadi Salim wrote:

> - get attributes for bridge interface XXX; there using IFLA_EXT_MASK
> I think it should be using ifm->ifindex


No this doesnt work. Filtering needs to be done in user space ;-<
IMO: The behavior here should be no different then when getting a link.
Will you accept a patch for this?

cheers,
jamal

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-16 13:54   ` Jamal Hadi Salim
  2013-10-16 14:05     ` Jamal Hadi Salim
@ 2013-10-16 17:11     ` Vlad Yasevich
  2013-10-16 17:49       ` Jamal Hadi Salim
  1 sibling, 1 reply; 13+ messages in thread
From: Vlad Yasevich @ 2013-10-16 17:11 UTC (permalink / raw)
  To: Jamal Hadi Salim, Stephen Hemminger
  Cc: Stephen Hemminger, netdev@vger.kernel.org

On 10/16/2013 09:54 AM, Jamal Hadi Salim wrote:
> On 10/14/13 17:41, Stephen Hemminger wrote:
>
>> Unfortunately, by now this is all set in ABI.
>> It was a side effect of the per-feature evolutionary style of
>> development.
>
> Sadly, I agree. This is the dark side of "have code will travel";
> you let these things out in the wild, they breed and you cant
> take them back.
> BTW: I dont think what i suggested will be a harmful refactoring because
> no existing interfaces are removed - your call.
>
> In similar vein:
> What is the motivation behind IFLA_EXT_MASK?

This was to display or filter out virtual function data.

> Could you not have used
> ifm ifindex to relay the interface of interest? Currently the ifindex is
> not used at all. IMO, the following interfaces are useful:
> - get attributes for all bridge ports (this is there)
> - get attributes for bridge interface XXX; there using IFLA_EXT_MASK
> I think it should be using ifm->ifindex

I probably doesn't need to use this as we want the bridge data, not the 
VF data stored as part of PF interface.

> - get attributes for all bridge ports for bridge br-blah (not there)
> you could also use the ifindex of br-blah here instead

This would be usefull.


>
> Separate issue:
> To provide equivalence to brctl:
> - PF_BRIDGE should allow me to attach a bridge port to bridge of choice
> with SETLINK

You can already do this with:

	ip link set dev ethX master brX

I know, not very intuative, but it's there :(

-vlad



>
> cheers,
> jamal
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-16 17:11     ` Vlad Yasevich
@ 2013-10-16 17:49       ` Jamal Hadi Salim
  2013-10-16 18:35         ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Jamal Hadi Salim @ 2013-10-16 17:49 UTC (permalink / raw)
  To: vyasevic, Stephen Hemminger; +Cc: Stephen Hemminger, netdev@vger.kernel.org

On 10/16/13 13:11, Vlad Yasevich wrote:
> On 10/16/2013 09:54 AM, Jamal Hadi Salim wrote:
>
>
> This was to display or filter out virtual function data.
>

Stephen explained - and it seems to make sense (working around
netlink weaknesses). It could be made more generic.


> I probably doesn't need to use this as we want the bridge data, not the
> VF data stored as part of PF interface.
>

I think we need some filtering in the kernel. As it stands today,
unfortunately you get everything ;-> I was suprised at the amount of
data i get from the kernel these days when i ask for a simple netdev
info (I think no less than 1K per netdev; everything from /proc entries
from some bread crumbs i dont see any use for.

>> - get attributes for all bridge ports for bridge br-blah (not there)
>> you could also use the ifindex of br-blah here instead
>
> This would be usefull.
>

Its a usability improvement.
We still have the ifi->ifindex available. It could be used to signal
which bridge device's ports i want.
But then i get _all_ the ports for that bridge.

>
>
> You can already do this with:
>
>      ip link set dev ethX master brX
>

Yes, I know - it sucks from a usability pov.

cheers,
jamal

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-16 17:49       ` Jamal Hadi Salim
@ 2013-10-16 18:35         ` Eric Dumazet
  2013-10-16 18:49           ` Stephen Hemminger
  2013-10-16 18:50           ` Vlad Yasevich
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2013-10-16 18:35 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: vyasevic, Stephen Hemminger, Stephen Hemminger,
	netdev@vger.kernel.org

On Wed, 2013-10-16 at 13:49 -0400, Jamal Hadi Salim wrote:

> I think we need some filtering in the kernel. As it stands today,
> unfortunately you get everything ;-> I was suprised at the amount of
> data i get from the kernel these days when i ask for a simple netdev
> info (I think no less than 1K per netdev; everything from /proc entries
> from some bread crumbs i dont see any use for.

By the way, "ip link show dev xxxx" seems to dump all devices info from
the kernel...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-16 18:35         ` Eric Dumazet
@ 2013-10-16 18:49           ` Stephen Hemminger
  2013-10-16 18:50           ` Vlad Yasevich
  1 sibling, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2013-10-16 18:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jamal Hadi Salim, vyasevic, Stephen Hemminger,
	netdev@vger.kernel.org

On Wed, 16 Oct 2013 11:35:50 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wed, 2013-10-16 at 13:49 -0400, Jamal Hadi Salim wrote:
> 
> > I think we need some filtering in the kernel. As it stands today,
> > unfortunately you get everything ;-> I was suprised at the amount of
> > data i get from the kernel these days when i ask for a simple netdev
> > info (I think no less than 1K per netdev; everything from /proc entries
> > from some bread crumbs i dont see any use for.
> 
> By the way, "ip link show dev xxxx" seems to dump all devices info from
> the kernel...
> 
> 
As I remember.

The problem is that according to original netlink design there is supposed to
be a NLM_F_MATCH, but it seems it never got implemented right, and the legacy
of breaking applications keeps it from happening.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-16 18:35         ` Eric Dumazet
  2013-10-16 18:49           ` Stephen Hemminger
@ 2013-10-16 18:50           ` Vlad Yasevich
  2013-10-16 19:02             ` Jamal Hadi Salim
  1 sibling, 1 reply; 13+ messages in thread
From: Vlad Yasevich @ 2013-10-16 18:50 UTC (permalink / raw)
  To: Eric Dumazet, Jamal Hadi Salim
  Cc: Stephen Hemminger, Stephen Hemminger, netdev@vger.kernel.org

On 10/16/2013 02:35 PM, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 13:49 -0400, Jamal Hadi Salim wrote:
>
>> I think we need some filtering in the kernel. As it stands today,
>> unfortunately you get everything ;-> I was suprised at the amount of
>> data i get from the kernel these days when i ask for a simple netdev
>> info (I think no less than 1K per netdev; everything from /proc entries
>> from some bread crumbs i dont see any use for.
>
> By the way, "ip link show dev xxxx" seems to dump all devices info from
> the kernel...
>
>

Right.  ip link show is dumb.  It asks for info from all devices and
then filters based on the device you asked for, but the filtering
is done in iproute instead of the kernel.

brdige command inherited this through code re-use.

-vlad

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-16 18:50           ` Vlad Yasevich
@ 2013-10-16 19:02             ` Jamal Hadi Salim
  2013-10-16 20:19               ` Jamal Hadi Salim
  0 siblings, 1 reply; 13+ messages in thread
From: Jamal Hadi Salim @ 2013-10-16 19:02 UTC (permalink / raw)
  To: vyasevic, Eric Dumazet
  Cc: Stephen Hemminger, Stephen Hemminger, netdev@vger.kernel.org

On 10/16/13 14:50, Vlad Yasevich wrote:

> Right.  ip link show is dumb.  It asks for info from all devices and
> then filters based on the device you asked for, but the filtering
> is done in iproute instead of the kernel.
>

That maybe small flaw in iproute2 - but there's no issue in the kernel.
You can send an ifi and specify the proper ifindex of choice.
But you cant do the same with bridge.
Thats what i was whining to Stephen about. I get every
bridge port with no exception and there's no way to specify one
(the ifi ifindex is never used).
Think 20K bridge ports - each giving me 1K of data ...
scalability problem

cheers,
jamal

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-16 19:02             ` Jamal Hadi Salim
@ 2013-10-16 20:19               ` Jamal Hadi Salim
  2013-10-16 20:22                 ` Vlad Yasevich
  0 siblings, 1 reply; 13+ messages in thread
From: Jamal Hadi Salim @ 2013-10-16 20:19 UTC (permalink / raw)
  To: vyasevic, Eric Dumazet
  Cc: Stephen Hemminger, Stephen Hemminger, netdev@vger.kernel.org


And another little glitch, iflink specifies

     [IFLA_AF_SPEC] = {
         [AF_INET] = {
             [IFLA_INET_CONF] = ...,
         },
         [AF_INET6] = {
             [IFLA_INET6_FLAGS] = ...,
             [IFLA_INET6_CONF] = ...,
         }

But then bridge hijacks it and specifies:
   [IFLA_AF_SPEC] = {
       [IFLA_BRIDGE_FLAGS]
       [IFLA_BRIDGE_MODE]
       [IFLA_BRIDGE_VLAN_INFO]
   }

Was that intended as:

  [IFLA_AF_SPEC] = {
    [AF_BRIDGE] = {
        [IFLA_BRIDGE_FLAGS]
        [IFLA_BRIDGE_MODE]
        [IFLA_BRIDGE_VLAN_INFO]
    }
  }

Now granted that bridging uses PF_BRIDGE as the family and iflink uses
PF_UNSPEC - but i do think this one is polluting the namespace.

cheers,
jamal

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-16 20:19               ` Jamal Hadi Salim
@ 2013-10-16 20:22                 ` Vlad Yasevich
  2013-10-18 10:42                   ` Jamal Hadi Salim
  0 siblings, 1 reply; 13+ messages in thread
From: Vlad Yasevich @ 2013-10-16 20:22 UTC (permalink / raw)
  To: Jamal Hadi Salim, Eric Dumazet
  Cc: Stephen Hemminger, Stephen Hemminger, netdev@vger.kernel.org

On 10/16/2013 04:19 PM, Jamal Hadi Salim wrote:
>
> And another little glitch, iflink specifies
>
>      [IFLA_AF_SPEC] = {
>          [AF_INET] = {
>              [IFLA_INET_CONF] = ...,
>          },
>          [AF_INET6] = {
>              [IFLA_INET6_FLAGS] = ...,
>              [IFLA_INET6_CONF] = ...,
>          }
>
> But then bridge hijacks it and specifies:
>    [IFLA_AF_SPEC] = {
>        [IFLA_BRIDGE_FLAGS]
>        [IFLA_BRIDGE_MODE]
>        [IFLA_BRIDGE_VLAN_INFO]
>    }
>
> Was that intended as:
>
>   [IFLA_AF_SPEC] = {
>     [AF_BRIDGE] = {
>         [IFLA_BRIDGE_FLAGS]
>         [IFLA_BRIDGE_MODE]
>         [IFLA_BRIDGE_VLAN_INFO]
>     }
>   }
>
> Now granted that bridging uses PF_BRIDGE as the family and iflink uses
> PF_UNSPEC - but i do think this one is polluting the namespace.
>

Yes, I know about that one.  BRIDGE_FLAGS started the polution and then 
we continued it with the other attributes....

-vlad

> cheers,
> jamal

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] bridge and friends: reduce TheLinuxWay(tm)
  2013-10-16 20:22                 ` Vlad Yasevich
@ 2013-10-18 10:42                   ` Jamal Hadi Salim
  0 siblings, 0 replies; 13+ messages in thread
From: Jamal Hadi Salim @ 2013-10-18 10:42 UTC (permalink / raw)
  To: vyasevic, Eric Dumazet
  Cc: Stephen Hemminger, Stephen Hemminger, netdev@vger.kernel.org

On 10/16/13 16:22, Vlad Yasevich wrote:

> Yes, I know about that one.  BRIDGE_FLAGS started the polution and then
> we continued it with the other attributes....
>

Understood - note the email has subject TheLinuxWay ;->
So i fear this one also falls under "it is already in the wild, cant do
anything about it".

BTW: the flooding and learning knobs are still not exposed to iproute2.

In regards to a GET always being translated to DUMP forcing user space
to filter:
If neither you nor Stephen planning to take of it, I will because it
is a scaling problem.  Please let me know.


cheers,
jamal

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-10-18 10:42 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-14 21:32 [RFC] bridge and friends: reduce TheLinuxWay(tm) Jamal Hadi Salim
2013-10-14 21:41 ` Stephen Hemminger
2013-10-16 13:54   ` Jamal Hadi Salim
2013-10-16 14:05     ` Jamal Hadi Salim
2013-10-16 17:11     ` Vlad Yasevich
2013-10-16 17:49       ` Jamal Hadi Salim
2013-10-16 18:35         ` Eric Dumazet
2013-10-16 18:49           ` Stephen Hemminger
2013-10-16 18:50           ` Vlad Yasevich
2013-10-16 19:02             ` Jamal Hadi Salim
2013-10-16 20:19               ` Jamal Hadi Salim
2013-10-16 20:22                 ` Vlad Yasevich
2013-10-18 10:42                   ` Jamal Hadi Salim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).