Distributed Switch Architecture(DSA)

netdev.vger.kernel.org archive mirror

* Distributed Switch Architecture(DSA)
@ 2010-06-18  7:06 Joakim Tjernlund
  2010-06-18  7:33 ` Lennert Buytenhek
  0 siblings, 1 reply; 14+ messages in thread
From: Joakim Tjernlund @ 2010-06-18  7:06 UTC (permalink / raw)
  To: Lennert Buytenhek, netdev

I am trying to wrap my head around DSA and I need some help.

Assume the example from Lennert:

		 +-----------+       +-----------+
		 |           | RGMII |           |
		 |           +-------+           +------ 1000baseT MDI ("WAN")
		 |           |       |  6-port   +------ 1000baseT MDI ("LAN1")
		 |    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
		 |           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
		 |           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
		 |           |       |           |
		 +-----------+       +-----------+

If I understand this correctly I get at least 5 virtual I/Fs corresponding
to WAN, LAN1-4, but how is the RGMII I/F modelled?
I guess I will have one "real" ethX I/F which maps to RGMII but do I get one
virtual I/F too?
What use are these virtual I/Fs? Just to read status from the corresponding
ports? Can one TX and RX network pkgs over these I/Fs too?

Now I want to add STP/RSTP to the switch. How would one do that?

 Jocke

* Re: Distributed Switch Architecture(DSA)
  2010-06-18  7:06 Distributed Switch Architecture(DSA) Joakim Tjernlund
@ 2010-06-18  7:33 ` Lennert Buytenhek
  2010-06-18  9:15   ` Joakim Tjernlund
  0 siblings, 1 reply; 14+ messages in thread
From: Lennert Buytenhek @ 2010-06-18  7:33 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev

On Fri, Jun 18, 2010 at 09:06:52AM +0200, Joakim Tjernlund wrote:

> I am trying to wrap my head around DSA and I need some help.
> 
> Assume the example from Lennert:
> 
> 		 +-----------+       +-----------+
> 		 |           | RGMII |           |
> 		 |           +-------+           +------ 1000baseT MDI ("WAN")
> 		 |           |       |  6-port   +------ 1000baseT MDI ("LAN1")
> 		 |    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
> 		 |           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
> 		 |           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
> 		 |           |       |           |
> 		 +-----------+       +-----------+
> 
> If I understand this correctly I get at least 5 virtual I/Fs corresponding
> to WAN, LAN1-4, but how is the RGMII I/F modelled?

The RGMII interface is just the interface that your "real" network
driver exports.  In the case of the Kirkwood 6281 A0 Reference Design
(which I developed this code on), that would be eth0.  After the DSA
driver is instantiated, you don't send or receive over eth0 directly
anymore -- eth0 becomes purely a transport for DSA-tagged packets.
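
As a rough illustration of what "transport for DSA-tagged packets" means,
here is a minimal Python sketch.  The 4-byte tag layout (a marker byte, a
port byte and two pad bytes inserted after the source MAC), the port
numbering and the function names are all invented for this example; this
is not the real Marvell tag format or the kernel code.

```python
from typing import Tuple

# Hypothetical port numbering for the board in the diagram (an assumption).
PORTS = {0: "wan", 1: "lan1", 2: "lan2", 3: "lan3", 4: "lan4"}
TAG_MARKER = 0xD5  # invented marker byte, NOT the real Marvell encoding
TAG_LEN = 4
MAC_HDR_LEN = 12   # destination MAC (6 bytes) + source MAC (6 bytes)


def tag_frame(frame: bytes, port: int) -> bytes:
    """TX path: insert a toy tag naming the egress port after the MACs."""
    if port not in PORTS:
        raise ValueError(f"unknown switch port {port}")
    tag = bytes([TAG_MARKER, port, 0, 0])
    return frame[:MAC_HDR_LEN] + tag + frame[MAC_HDR_LEN:]


def untag_frame(tagged: bytes) -> Tuple[str, bytes]:
    """RX path: strip the toy tag, return (slave interface, inner frame)."""
    tag = tagged[MAC_HDR_LEN:MAC_HDR_LEN + TAG_LEN]
    if len(tag) != TAG_LEN or tag[0] != TAG_MARKER:
        raise ValueError("frame on the master interface carries no tag")
    port = tag[1]
    if port not in PORTS:
        raise ValueError(f"tag names unknown switch port {port}")
    inner = tagged[:MAC_HDR_LEN] + tagged[MAC_HDR_LEN + TAG_LEN:]
    return PORTS[port], inner


if __name__ == "__main__":
    frame = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
    on_wire = tag_frame(frame, 3)          # what eth0 would carry
    ifname, inner = untag_frame(on_wire)   # what lan3 would receive
    print(ifname, inner == frame)
```

The point of the sketch is only that everything crossing eth0 carries a
port identifier, which is why eth0 itself is no longer useful as an
ordinary interface.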

> I guess I will have one "real" ethX I/F which maps to RGMII but do I
> get one virtual I/F too?

You get a virtual interface for each of the ports on the switch (that
are not CPU or inter-switch ports), i.e. all ports on the right of the
diagram -- wan, lan1, lan2, lan3, lan4.  These interfaces are created
by net/dsa/slave.c and are called DSA interfaces or slave interfaces.

> What use are these virtual I/Fs? Just to read status from the
> corresponding ports?

That's one of the purposes, yes.  There's a polling routine that
periodically checks the status of each of the ports on the switch (via
the MII management interface) and feeds back that status to the virtual
interfaces.  I.e. if you plug a cable into lan3, you'll see a syslog
message about the link on the virtual interface lan3 having come up,
with the link speed, etc.
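
A minimal Python sketch of that polling idea follows.  The read_status
callable is a hypothetical stand-in for an MII management read, and the
message format is made up; the real routine lives in the kernel and looks
different.

```python
from typing import Callable, Dict, List, Tuple

# (link_up, speed_mbps): a stand-in for what an MII status read might yield.
LinkStatus = Tuple[bool, int]


def poll_ports(
    read_status: Callable[[str], LinkStatus],
    last_seen: Dict[str, LinkStatus],
    ports: List[str],
) -> List[str]:
    """Poll each port once; return log lines for ports whose link changed."""
    messages = []
    for name in ports:
        status = read_status(name)
        if last_seen.get(name) != status:
            link_up, speed = status
            if link_up:
                messages.append(f"{name}: link up, {speed} Mb/s")
            else:
                messages.append(f"{name}: link down")
            last_seen[name] = status  # remember what was reported
    return messages


if __name__ == "__main__":
    hw = {"lan3": (False, 0)}                    # fake hardware state
    seen: Dict[str, LinkStatus] = {}
    poll_ports(lambda p: hw[p], seen, ["lan3"])  # record the initial state
    hw["lan3"] = (True, 1000)                    # cable plugged into lan3
    for line in poll_ports(lambda p: hw[p], seen, ["lan3"]):
        print(line)
```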

> Can one TX and RX network pkgs over these I/Fs too?

Sure -- that's the whole point.

> Now I want to add STP/RSTP to the switch. How would one do that?

First, you'll want the hardware bridging patches that I posted to
netdev@ a while back, e.g.:

	http://patchwork.ozlabs.org/patch/16578/

They aren't upstream-mergeable in their current form, but they
do the job.  These will propagate brctl addif/delif calls into the switch
chip, so that switching between those ports will be done in hardware.

Now if all you want is regular STP, with that patch you'll be done --
the ->bridge_set_stp_state() hook propagates the spanning tree state of
each of the DSA virtual interfaces into the switch chip automatically.

If you want to use a userspace STP implementation, you'll just have to
make sure that STP state (listening/learning/blocking/forwarding/etc) is
correctly propagated to the switch chip similarly to how it's done in the
patch.
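
To sketch what "propagating STP state" could look like, here is a small
Python model.  FakeSwitch, HwPortState and the mapping table are
hypothetical; in particular, folding "listening" into the hardware
"blocking" mode is an assumption made for this example, not something
taken from the patch.

```python
from enum import Enum
from typing import Dict, Iterable


class StpState(Enum):
    DISABLED = "disabled"
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"


class HwPortState(Enum):
    # Hypothetical hardware port modes; a real chip defines its own encoding.
    DISABLED = "disabled"
    BLOCKING = "blocking"
    LEARNING = "learning"
    FORWARDING = "forwarding"


# Assumption: the chip has no separate "listening" mode, so it maps to blocking.
STP_TO_HW: Dict[StpState, HwPortState] = {
    StpState.DISABLED: HwPortState.DISABLED,
    StpState.BLOCKING: HwPortState.BLOCKING,
    StpState.LISTENING: HwPortState.BLOCKING,
    StpState.LEARNING: HwPortState.LEARNING,
    StpState.FORWARDING: HwPortState.FORWARDING,
}


class FakeSwitch:
    """Stand-in for the switch chip; records the state written to each port."""

    def __init__(self, ports: Iterable[str]) -> None:
        self.port_state = {p: HwPortState.DISABLED for p in ports}

    def set_port_state(self, port: str, state: HwPortState) -> None:
        if port not in self.port_state:
            raise KeyError(port)
        self.port_state[port] = state


def propagate_stp_state(switch: FakeSwitch, port: str,
                        stp_state: StpState) -> HwPortState:
    """Translate a bridge STP state and write it to the (fake) switch port."""
    hw_state = STP_TO_HW[stp_state]
    switch.set_port_state(port, hw_state)
    return hw_state


if __name__ == "__main__":
    sw = FakeSwitch(["lan1", "lan2"])
    for state in (StpState.LISTENING, StpState.LEARNING, StpState.FORWARDING):
        hw = propagate_stp_state(sw, "lan1", state)
        print("lan1", state.value, "->", hw.value)
```

A userspace STP daemon would need to drive the same kind of translation
for every state change it decides on.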

(Ideally, these patches should be reworked to receive bridge configuration
and port status changes via netlink.  Unfortunately, I was asked to return
all my Marvell hardware when I left Marvell, so someone else will have to
do this work.)

* Re: Distributed Switch Architecture(DSA)
  2010-06-18  7:33 ` Lennert Buytenhek
@ 2010-06-18  9:15   ` Joakim Tjernlund
  2010-06-18  9:59     ` Lennert Buytenhek
  0 siblings, 1 reply; 14+ messages in thread
From: Joakim Tjernlund @ 2010-06-18  9:15 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netdev

Lennert Buytenhek <buytenh@wantstofly.org> wrote on 2010/06/18 09:33:09:
>
> On Fri, Jun 18, 2010 at 09:06:52AM +0200, Joakim Tjernlund wrote:
>
> > I am trying to wrap my head around DSA and I need some help.
> >
> > Assume the example from Lennert:
> >
> >        +-----------+       +-----------+
> >        |           | RGMII |           |
> >        |           +-------+           +------ 1000baseT MDI ("WAN")
> >        |           |       |  6-port   +------ 1000baseT MDI ("LAN1")
> >        |    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
> >        |           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
> >        |           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
> >        |           |       |           |
> >        +-----------+       +-----------+
> >
> > If I understand this correctly I get at least 5 virtual I/Fs corresponding
> > to WAN, LAN1-4, but how is the RGMII I/F modelled?
>
> The RGMII interface is just the interface that your "real" network
> driver exports.  In the case of the Kirkwood 6281 A0 Reference Design
> (which I developed this code on), that would be eth0.  After the DSA
> driver is instantiated, you don't send or receive over eth0 directly
> anymore -- eth0 becomes purely a transport for DSA-tagged packets.

hmm, but how do I send normal pkgs from the CPU to the switch then?
I envision I would get some interface in the CPU I can set an IP address
on and use as a normal I/F which would be switched by the HW switch to
the appropriate port.

>
>
> > I guess I will have one "real" ethX I/F which maps to RGMII but do I
> > get one virtual I/F too?
>
> You get a virtual interface for each of the ports on the switch (that
> are not CPU or inter-switch ports), i.e. all ports on the right of the
> diagram -- wan, lan1, lan2, lan3, lan4.  These interfaces are created
> by net/dsa/slave.c and are called DSA interfaces or slave interfaces.
>
>
> > What use are these virtual I/Fs? Just to read status from the
> > corresponding ports?
>
> That's one of the purposes, yes.  There's a polling routine that
> periodically checks the status of each of the ports on the switch (via
> the MII management interface) and feeds back that status to the virtual
> interfaces.  I.e. if you plug a cable into lan3, you'll see a syslog
> message about the link on the virtual interface lan3 having come up,
> with the link speed, etc.
>
>
> > Can one TX and RX network pkgs over these I/Fs too?
>
> Sure -- that's the whole point.

TX:ing pkgs on such virtual I/F would go directly to the port, bypassing
normal switching?
What about RX? What decides which pkg to route through the switch and
which pkg to send up to the virtual I/F?

>
>
> > Now I want to add STP/RSTP to the switch. How would one do that?
>
> First, you'll want the hardware bridging patches that I posted to
> netdev@ a while back, e.g.:
>
>    http://patchwork.ozlabs.org/patch/16578/

I see, will have to study this a bit closer. One question though,
does this disable MAC learning in the linux bridge?

Do you have any idea how to do DSA on a Broadcom switch?
The control plane is attached via PCI and has a big
user space lib/apps to manage the switch.

>
> They aren't upstream-mergeable in their current form, but they
> do the job.  These will propagate brctl addif/delif calls into the switch
> chip, so that switching between those ports will be done in hardware.
>
> Now if all you want is regular STP, with that patch you'll be done --
> the ->bridge_set_stp_state() hook propagates the spanning tree state of
> each of the DSA virtual interfaces into the switch chip automatically.
>
> If you want to use a userspace STP implementation, you'll just have to
> make sure that STP state (listening/learning/blocking/forwarding/etc) is
> correctly propagated to the switch chip similarly to how it's done in the
> patch.
>
> (Ideally, these patches should be reworked to receive bridge configuration
> and port status changes via netlink.  Unfortunately, I was asked to return
> all my Marvell hardware when I left Marvell, so someone else will have to
> do this work.)
>


* Re: Distributed Switch Architecture(DSA)
  2010-06-18  9:15   ` Joakim Tjernlund
@ 2010-06-18  9:59     ` Lennert Buytenhek
  2010-06-18 11:09       ` Joakim Tjernlund
  0 siblings, 1 reply; 14+ messages in thread
From: Lennert Buytenhek @ 2010-06-18  9:59 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev

On Fri, Jun 18, 2010 at 11:15:09AM +0200, Joakim Tjernlund wrote:

> > > I am trying to wrap my head around DSA and I need some help.
> > >
> > > Assume the example from Lennert:
> > >
> > >        +-----------+       +-----------+
> > >        |           | RGMII |           |
> > >        |           +-------+           +------ 1000baseT MDI ("WAN")
> > >        |           |       |  6-port   +------ 1000baseT MDI ("LAN1")
> > >        |    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
> > >        |           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
> > >        |           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
> > >        |           |       |           |
> > >        +-----------+       +-----------+
> > >
> > > If I understand this correctly I get at least 5 virtual I/Fs corresponding
> > > to WAN, LAN1-4, but how is the RGMII I/F modelled?
> >
> > The RGMII interface is just the interface that your "real" network
> > driver exports.  In the case of the Kirkwood 6281 A0 Reference Design
> > (which I developed this code on), that would be eth0.  After the DSA
> > driver is instantiated, you don't send or receive over eth0 directly
> > anymore -- eth0 becomes purely a transport for DSA-tagged packets.
> 
> hmm, but how do I send normal pkgs from the CPU to the switch then?

Define what you mean by 'normal pkgs'.


> I envision I would get some interface in the CPU I can set an IP address
> on and use as a normal I/F which would be switched by the HW switch to
> the appropriate port.

Yes, these are the DSA/slave interfaces created by net/dsa/slave.c.
You are free to attach IP addresses to the wan/lanX interfaces, and
things will work as you'd expect them to.


> > > I guess I will have one "real" ethX I/F which maps to RGMII but do I
> > > get one virtual I/F too?
> >
> > You get a virtual interface for each of the ports on the switch (that
> > are not CPU or inter-switch ports), i.e. all ports on the right of the
> > diagram -- wan, lan1, lan2, lan3, lan4.  These interfaces are created
> > by net/dsa/slave.c and are called DSA interfaces or slave interfaces.
> >
> >
> > > What use are these virtual I/Fs? Just to read status from the
> > > corresponding ports?
> >
> > That's one of the purposes, yes.  There's a polling routine that
> > periodically checks the status of each of the ports on the switch (via
> > the MII management interface) and feeds back that status to the virtual
> > interfaces.  I.e. if you plug a cable into lan3, you'll see a syslog
> > message about the link on the virtual interface lan3 having come up,
> > with the link speed, etc.
> >
> >
> > > Can one TX and RX network pkgs over these I/Fs too?
> >
> > Sure -- that's the whole point.
> 
> TX:ing pkgs on such virtual I/F would go directly to the port, bypassing
> normal switching?

Define what you mean by 'normal switching'.


> What about RX? What decides which pkg to route through the switch and
> which pkg to send up to the virtual I/F?

By default, which is until you enable bridging on some subset of the
ports, all ports have their own address database, and all received
packets are passed directly up to the CPU, where the DSA code will
then make those packets be received on the DSA slave interfaces.


> > > Now I want to add STP/RSTP to the switch. How would one do that?
> >
> > First, you'll want the hardware bridging patches that I posted to
> > netdev@ a while back, e.g.:
> >
> >    http://patchwork.ozlabs.org/patch/16578/
> 
> I see, will have to study this a bit closer. One question though,
> does this disable MAC learning in the linux bridge?

No, why should it?


> Do you have any idea how to do DSA on a Broadcom switch?

I have no idea.  When I originally submitted the DSA code for merging,
I contacted Broadcom people about adding support for Broadcom switch
chips to it, but I never heard back from them.

* Re: Distributed Switch Architecture(DSA)
  2010-06-18  9:59     ` Lennert Buytenhek
@ 2010-06-18 11:09       ` Joakim Tjernlund
  2010-06-18 12:12         ` Lennert Buytenhek
  0 siblings, 1 reply; 14+ messages in thread
From: Joakim Tjernlund @ 2010-06-18 11:09 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netdev

Lennert Buytenhek <buytenh@wantstofly.org> wrote on 2010/06/18 11:59:23:
>
> On Fri, Jun 18, 2010 at 11:15:09AM +0200, Joakim Tjernlund wrote:
>
> > > > I am trying to wrap my head around DSA and I need some help.
> > > >
> > > > Assume the example from Lennert:
> > > >
> > > >        +-----------+       +-----------+
> > > >        |           | RGMII |           |
> > > >        |           +-------+           +------ 1000baseT MDI ("WAN")
> > > >        |           |       |  6-port   +------ 1000baseT MDI ("LAN1")
> > > >        |    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
> > > >        |           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
> > > >        |           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
> > > >        |           |       |           |
> > > >        +-----------+       +-----------+
> > > >
> > > > If I understand this correctly I get at least 5 virtual I/Fs corresponding
> > > > to WAN, LAN1-4, but how is the RGMII I/F modelled?
> > >
> > > The RGMII interface is just the interface that your "real" network
> > > driver exports.  In the case of the Kirkwood 6281 A0 Reference Design
> > > (which I developed this code on), that would be eth0.  After the DSA
> > > driver is instantiated, you don't send or receive over eth0 directly
> > > anymore -- eth0 becomes purely a transport for DSA-tagged packets.
> >
> > hmm, but how do I send normal pkgs from the CPU to the switch then?
>
> Define what you mean by 'normal pkgs'.

An ethernet broadcast pkg flooded onto all ports.
A normal ethernet host DST address would be looked up by
the switch HW and sent to the appropriate port.

>
>
> > I envision I would get some interface in the CPU I can set an IP address
> > on and use as a normal I/F which would be switched by the HW switch to
> > the appropriate port.
>
> Yes, these are the DSA/slave interfaces created by net/dsa/slave.c.
> You are free to attach IP addresses to the wan/lanX interfaces, and
> things will work as you'd expect them to.

Not sure what to expect here actually.

>
>
> > > > I guess I will have one "real" ethX I/F which maps to RGMII but do I
> > > > get one virtual I/F too?
> > >
> > > You get a virtual interface for each of the ports on the switch (that
> > > are not CPU or inter-switch ports), i.e. all ports on the right of the
> > > diagram -- wan, lan1, lan2, lan3, lan4.  These interfaces are created
> > > by net/dsa/slave.c and are called DSA interfaces or slave interfaces.
> > >
> > >
> > > > What use are these virtual I/Fs? Just to read status from the
> > > > corresponding ports?
> > >
> > > That's one of the purposes, yes.  There's a polling routine that
> > > periodically checks the status of each of the ports on the switch (via
> > > the MII management interface) and feeds back that status to the virtual
> > > interfaces.  I.e. if you plug a cable into lan3, you'll see a syslog
> > > message about the link on the virtual interface lan3 having come up,
> > > with the link speed, etc.
> > >
> > >
> > > > Can one TX and RX network pkgs over these I/Fs too?
> > >
> > > Sure -- that's the whole point.
> >
> > TX:ing pkgs on such virtual I/F would go directly to the port, bypassing
> > normal switching?
>
> Define what you mean by 'normal switching'.
>
>
> > What about RX? What decides which pkg to route through the switch and
> > which pkg to send up to the virtual I/F?
>
> By default, which is until you enable bridging on some subset of the
> ports, all ports have their own address database, and all received
> packets are passed directly up to the CPU, where the DSA code will
> then make those packets be received on the DSA slave interfaces.

ah, so until I enable bridging, all ports are viewed as a separate
network I/F?
Once I create a linux bridge device and add the virtual I/Fs, one
enables the bridge function.
One drawback with that is that you kill the bridge when you reboot
linux.

>
>
> > > > Now I want to add STP/RSTP to the switch. How would one do that?
> > >
> > > First, you'll want the hardware bridging patches that I posted to
> > > netdev@ a while back, e.g.:
> > >
> > >    http://patchwork.ozlabs.org/patch/16578/
> >
> > I see, will have to study this a bit closer. One question though,
> > does this disable MAC learning in the linux bridge?
>
> No, why should it?

Doesn't the HW switch handle all MAC learning? Why duplicate
this in the SW bridge?
I figured the HW switch would offload this task from the SW bridge.

>
>
> > Do you have any idea how to do DSA on a Broadcom switch?
>
> I have no idea.  When I originally submitted the DSA code for merging,
> I contacted Broadcom people about adding support for Broadcom switch
> chips to it, but I never heard back from them.

OK. With DSA, how does one configure VLANs, policing and parameters in the
HW switch that don't map or exist in the linux bridge?

 Jocke


* Re: Distributed Switch Architecture(DSA)
  2010-06-18 11:09       ` Joakim Tjernlund
@ 2010-06-18 12:12         ` Lennert Buytenhek
  2010-06-18 15:13           ` Joakim Tjernlund
  0 siblings, 1 reply; 14+ messages in thread
From: Lennert Buytenhek @ 2010-06-18 12:12 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev

On Fri, Jun 18, 2010 at 01:09:32PM +0200, Joakim Tjernlund wrote:

> > > > > I am trying to wrap my head around DSA and I need some help.
> > > > >
> > > > > Assume the example from Lennert:
> > > > >
> > > > >        +-----------+       +-----------+
> > > > >        |           | RGMII |           |
> > > > >        |           +-------+           +------ 1000baseT MDI ("WAN")
> > > > >        |           |       |  6-port   +------ 1000baseT MDI ("LAN1")
> > > > >        |    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
> > > > >        |           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
> > > > >        |           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
> > > > >        |           |       |           |
> > > > >        +-----------+       +-----------+
> > > > >
> > > > > If I understand this correctly I get at least 5 virtual I/Fs corresponding
> > > > > to WAN, LAN1-4, but how is the RGMII I/F modelled?
> > > >
> > > > The RGMII interface is just the interface that your "real" network
> > > > driver exports.  In the case of the Kirkwood 6281 A0 Reference Design
> > > > (which I developed this code on), that would be eth0.  After the DSA
> > > > driver is instantiated, you don't send or receive over eth0 directly
> > > > anymore -- eth0 becomes purely a transport for DSA-tagged packets.
> > >
> > > hmm, but how do I send normal pkgs from the CPU to the switch then?
> >
> > Define what you mean by 'normal pkgs'.
> 
> An ethernet broadcast pkg flooded onto all ports.

This statement assumes that all ports have been configured into a
bridge, which is not the default case.  (And why would it be?  Having each
port in the same VLAN/subnet is only one of the many possible ways of
configuring your switch ports -- and regular (non-DSA) Linux network
interfaces aren't bridged together by default either.)  I.e. after boot,
each of the switch ports behaves as if it's independent.

> A normal ethernet host DST address would be looked up by
> the switch HW and sent to the appropriate port.

In current upstream kernels, if you in fact bridge all switch ports
together using Linux bridging, this address lookup will be done by the
Linux bridging code.

> > > I envision I would get some interface in the CPU I can set an IP address
> > > on and use as a normal I/F which would be switched by the HW switch to
> > > the appropriate port.
> >
> > Yes, these are the DSA/slave interfaces created by net/dsa/slave.c.
> > You are free to attach IP addresses to the wan/lanX interfaces, and
> > things will work as you'd expect them to.
> 
> Not sure what to expect here actually.

That the DSA interfaces will behave just like non-DSA Linux network
interfaces.

> > > What about RX? What decides which pkg to route through the switch and
> > > which pkg to send up to the virtual I/F?
> >
> > By default, which is until you enable bridging on some subset of the
> > ports, all ports have their own address database, and all received
> > packets are passed directly up to the CPU, where the DSA code will
> > then make those packets be received on the DSA slave interfaces.
> 
> ah, so until I enable bridging, all ports are viewed as a separate
> network I/F?

Yes.  The original DSA commit message says as much:

    The switch driver presents each port on the switch as a separate
    network interface to Linux, [...]

> Once I create a linux bridge device and add the virtual I/Fs, one
> enables the bridge function.

Yes and no.  Right now there is no hardware switch offload code in the
upstream kernel, so all bridging will still be done in software.  You
will need something along the lines of the patch I pointed you to to
enable hardware bridging.

> One drawback with that is that you kill the bridge when you reboot
> linux.

With the hardware bridging patch, hardware bridging will continue if
you don't break down your br0 interface before rebooting.  (Of course,
your board might still have a hardware reset line that resets the
switch when the CPU resets.)

> > > > > Now I want to add STP/RSTP to the switch. How would one do that?
> > > >
> > > > First, you'll want the hardware bridging patches that I posted to
> > > > netdev@ a while back, e.g.:
> > > >
> > > >    http://patchwork.ozlabs.org/patch/16578/
> > >
> > > I see, will have to study this a bit closer. One question though,
> > > does this disable MAC learning in the linux bridge?
> >
> > No, why should it?
> 
> Doesn't the HW switch handle all MAC learning? Why duplicate
> this in the SW bridge?
> I figured the HW switch would offload this task from the SW bridge.

Imagine the case where you bridge lan1, lan2 (both on the switch chip)
into br0, together with wlan0 (which is not on the switch chip).

Now a packet is sent out of br0.  Should it be sent to wlan0 or to the
switch chip?  How will you make this decision without an address database
on the Linux side?
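
To make that concrete, here is a minimal Python sketch of a Linux-side
address database for such a mixed bridge.  BridgeFdb and its methods are
hypothetical names for this example; the real bridge code is in the
kernel and also handles ageing, VLANs and much more.

```python
from typing import Dict, List, Optional


class BridgeFdb:
    """Toy forwarding database for a bridge such as br0."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = list(ports)
        self.table: Dict[str, str] = {}  # MAC address -> bridge port

    def learn(self, src_mac: str, in_port: str) -> None:
        """Remember behind which bridge port a source address was seen."""
        self.table[src_mac] = in_port

    def egress_ports(self, dst_mac: str,
                     in_port: Optional[str] = None) -> List[str]:
        """Known address: one port.  Unknown: flood to all except in_port."""
        port = self.table.get(dst_mac)
        if port is not None:
            return [] if port == in_port else [port]
        return [p for p in self.ports if p != in_port]


if __name__ == "__main__":
    fdb = BridgeFdb(["lan1", "lan2", "wlan0"])
    fdb.learn("02:00:00:00:00:aa", "wlan0")  # a wireless station
    fdb.learn("02:00:00:00:00:bb", "lan1")   # a host behind the switch chip
    # Packets sent out of br0 itself have no ingress port (in_port=None).
    print(fdb.egress_ports("02:00:00:00:00:aa"))
    print(fdb.egress_ports("02:00:00:00:00:cc"))
```

Without the table, every packet sent out of br0 would have to be flooded
to both wlan0 and the switch chip.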

> > > Do you have any idea how to do DSA on a Broadcom switch?
> >
> > I have no idea.  When I originally submitted the DSA code for merging,
> > I contacted Broadcom people about adding support for Broadcom switch
> > chips to it, but I never heard back from them.
> 
> OK. With DSA, how does one configure VLANs, policing and parameters in the
> HW switch that don't map or exist in the linux bridge?

The idea is to use existing kernel interfaces for this as much as
possible.  So e.g. if you do:

	vconfig add lan1 123
	vconfig add lan2 123
	brctl addbr br123
	brctl addif br123 lan1.123
	brctl addif br123 lan2.123

Then the DSA code (or some userspace netlink listener helper, or some
combination of both) should ideally also detect that VLAN 123 on
interfaces lan1 and lan2 are to be bridged together, and program the
switch chip accordingly.  I think all VLAN configurations that at least
the Marvell hardware supports can be expressed this way.
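
The detection step could be sketched roughly as below, in Python.  The
function name and the input format (bridge name mapped to its member
interfaces) are assumptions for the example; a real helper would get this
information from the kernel or from netlink rather than from a dict.

```python
from typing import Dict, List, Set, Tuple


def hw_vlan_groups(
    bridges: Dict[str, List[str]]
) -> Dict[Tuple[str, int], Set[str]]:
    """Find VLAN sub-interfaces that share a bridge.

    Returns {(bridge, vlan_id): {base interfaces}} for every group of at
    least two ports, i.e. the groups a switch chip could bridge in hardware.
    """
    groups: Dict[Tuple[str, int], Set[str]] = {}
    for bridge, members in bridges.items():
        for ifname in members:
            base, sep, vid = ifname.partition(".")
            if sep and vid.isdigit():  # looks like "lan1.123"
                groups.setdefault((bridge, int(vid)), set()).add(base)
    # A single port in a VLAN needs no hardware bridging.
    return {key: ports for key, ports in groups.items() if len(ports) >= 2}


if __name__ == "__main__":
    config = {"br123": ["lan1.123", "lan2.123"]}
    for (bridge, vid), ports in hw_vlan_groups(config).items():
        print(f"{bridge}: bridge VLAN {vid} between {sorted(ports)}")
```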

To configure things like ingress/egress rate limiting and such in the
switch chip for which there is no Linux counterpart interface, I suppose
some sysfs interface or so might suffice.

* Re: Distributed Switch Architecture(DSA)
  2010-06-18 12:12         ` Lennert Buytenhek
@ 2010-06-18 15:13           ` Joakim Tjernlund
  2010-06-18 20:12             ` Lennert Buytenhek
  0 siblings, 1 reply; 14+ messages in thread
From: Joakim Tjernlund @ 2010-06-18 15:13 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netdev

Lennert Buytenhek <buytenh@wantstofly.org> wrote on 2010/06/18 14:12:23:
>
> On Fri, Jun 18, 2010 at 01:09:32PM +0200, Joakim Tjernlund wrote:
>
> > > > > > I am trying to wrap my head around DSA and I need some help.
> > > > > >
> > > > > > Assume the example from Lennert:
> > > > > >
> > > > > >        +-----------+       +-----------+
> > > > > >        |           | RGMII |           |
> > > > > >        |           +-------+           +------ 1000baseT MDI ("WAN")
> > > > > >        |           |       |  6-port   +------ 1000baseT MDI ("LAN1")
> > > > > >        |    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
> > > > > >        |           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
> > > > > >        |           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
> > > > > >        |           |       |           |
> > > > > >        +-----------+       +-----------+
> > > > > >
> > > > > > If I understand this correctly I get at least 5 virtual I/Fs corresponding
> > > > > > to WAN, LAN1-4, but how is the RGMII I/F modelled?
> > > > >
> > > > > The RGMII interface is just the interface that your "real" network
> > > > > driver exports.  In the case of the Kirkwood 6281 A0 Reference Design
> > > > > (which I developed this code on), that would be eth0.  After the DSA
> > > > > driver is instantiated, you don't send or receive over eth0 directly
> > > > > anymore -- eth0 becomes purely a transport for DSA-tagged packets.
> > > >
> > > > hmm, but how do I send normal pkgs from the CPU to the switch then?
> > >
> > > Define what you mean by 'normal pkgs'.
> >
> > An ethernet broadcast pkg flooded onto all ports.
>
> This statement assumes that all ports have been configured into a
> bridge, which is not the default case.  (And why would it be?  Having each
> port in the same VLAN/subnet is only one of the many possible ways of
> configuring your switch ports -- and regular (non-DSA) Linux network
> interfaces aren't bridged together by default either.)  I.e. after boot,
> each of the switch ports behaves as if it's independent.
>
>
> > A normal ethernet host DST address would be looked up by
> > the switch HW and sent to the appropriate port.
>
> In current upstream kernels, if you in fact bridge all switch ports
> together using Linux bridging, this address lookup will be done by the
> Linux bridging code.

Yes, I am getting there mentally. I just have a hard time letting go of
viewing the HW switch as an external entity :)

[SNIP]

>
> > Once I create a linux bridge device and add the virtual I/Fs, one
> > enables the bridge function.
>
> Yes and no.  Right now there is no hardware switch offload code in the
> upstream kernel, so all bridging will still be done in software.  You
> will need something along the lines of the patch I pointed you to to
> enable hardware bridging.
>
>
> > One drawback with that is that you kill the bridge when you reboot
> > linux.
>
> With the hardware bridging patch, hardware bridging will continue if
> you don't break down your br0 interface before rebooting.  (Of course,
> your board might still have a hardware reset line that resets the
> switch when the CPU resets.)

hmm, one will have to recreate the exact config in several steps (create br0, add each
I/F etc.). I guess if done carefully one can avoid disturbing the switch.

>
> > > > > > Now I want to add STP/RSTP to the switch. How would one do that?
> > > > >
> > > > > First, you'll want the hardware bridging patches that I posted to
> > > > > netdev@ a while back, e.g.:
> > > > >
> > > > >    http://patchwork.ozlabs.org/patch/16578/
> > > >
> > > > I see, will have to study this a bit closer. One question though,
> > > > does this disable MAC learning in the linux bridge?
> > >
> > > No, why should it?
> >
> > Doesn't the HW switch handle all MAC learning? Why duplicate
> > this in the SW bridge?
> > I figured the HW switch would offload this task from the SW bridge.
>
> Imagine the case where you bridge lan1, lan2 (both on the switch chip)
> into br0, together with wlan0 (which is not on the switch chip).
>
> Now a packet is sent out of br0.  Should it be sent to wlan0 or to the
> switch chip?  How will you make this decision without an address database
> on the Linux side?

True, in this case you need it, but when only HW switch I/Fs are bridged you
don't need it, and there can be several hundred MAC addresses passing
through the HW switch. It would be nice if one didn't need to pass
all those up to the SW bridge, especially if you have a small embedded
CPU.

>
>
> > > > Do you have any idea how to do DSA on a Broadcom switch?
> > >
> > > I have no idea.  When I originally submitted the DSA code for merging,
> > > I contacted Broadcom people about adding support for Broadcom switch
> > > chips to it, but I never heard back from them.
> >
> > OK. With DSA, how does one configure VLANs, policing and parameters in the
> > HW switch that don't map or exist in the linux bridge?
>
> The idea is to use existing kernel interfaces for this as much as
> possible.  So e.g. if you do:
>
>    vconfig add lan1 123
>    vconfig add lan2 123
>    brctl addbr br123
>    brctl addif br123 lan1.123
>    brctl addif br123 lan2.123
>
> Then the DSA code (or some userspace netlink listener helper, or some
> combination of both) should ideally also detect that VLAN 123 on
> interfaces lan1 and lan2 are to be bridged together, and program the
> switch chip accordingly.  I think all VLAN configurations that at least
> the Marvell hardware supports can be expressed this way.

Yes, but I imagine that this breaks down when you want to do something a bit more
advanced. For example, I don't think linux VLANs support "shared VLAN learning" (SVL),
and to configure a HW switch to do SVL one would first have to implement
that in Linux VLAN and then add the DSA code to get the config to the switch.

Not sure how one would express whether VLAN tags should be stripped off or not when
egressing the HW switch's physical port.

Furthermore, suppose one has a big HW switch, say 48 ports, and lots of VLANs in that
HW switch. One would then have to create a lot of virtual I/Fs and VLANs in linux
just to configure the HW switch. This wastes resources on the CPU.

>
> To configure things like ingress/egress rate limiting and such in the
> switch chip for which there is no Linux counterpart interface, I suppose
> some sysfs interface or so might suffice.

Yes, there are aspects of a HW switch that don't map into DSA currently.
Perhaps one should add some framework to support this?

     Jocke



* Re: Distributed Switch Architecture(DSA)
  2010-06-18 15:13           ` Joakim Tjernlund
@ 2010-06-18 20:12             ` Lennert Buytenhek
  2010-06-19 14:22               ` Joakim Tjernlund
  0 siblings, 1 reply; 14+ messages in thread
From: Lennert Buytenhek @ 2010-06-18 20:12 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev

On Fri, Jun 18, 2010 at 05:13:03PM +0200, Joakim Tjernlund wrote:

> > > > > > > Now I want to add STP/RSTP to the switch. How would one do that?
> > > > > >
> > > > > > First, you'll want the hardware bridging patches that I posted to
> > > > > > netdev@ a while back, e.g.:
> > > > > >
> > > > > >    http://patchwork.ozlabs.org/patch/16578/
> > > > >
> > > > > I see, will have to study this a bit closer. One question though,
> > > > > does this disable MAC learning in the linux bridge?
> > > >
> > > > No, why should it?
> > >
> > > Doesn't the HW switch handle all MAC leaning? Why duplicate
> > > this in the SW bridge?
> > > I figured the HW switch would offload the SW bridge this task.
> >
> > Imagine the case where you bridge lan1, lan2 (both on the switch chip)
> > into br0, together with wlan0 (which is not on the switch chip).
> >
> > Now a packet is sent out of br0.  Should it be sent to wlan0 or to the
> > switch chip?  How will you make this decision without an address database
> > on the Linux side?
> 
> True, in this case you need it, but for only HW switch I/Fs you don't
> need it and there can be several hundreds of MAC addresses passing
> trough the HW switch. It would be nice if one didn't need to pass
> all those up to the SW bridge, especially if you have a small embedded
> CPU.

I think you overestimate the effect that address learning will have on
the host CPU.  It only needs to happen for the first packet for every
new MAC address, and address flooding attacks are something you'll need
to address in either case.

If you're really worried about this scenario, then just configure your
boot loader to bridge all switch ports together, and don't load the DSA
driver.  The switch will then appear as a single interface, 'eth0' (or
whatever your SoC calls it), over which you can talk directly without
any form of tagging.  You won't be able to use any advanced features,
though.
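
To put a rough shape on the learning cost mentioned above, here is a
minimal Python sketch of a learning forwarding database.  The class and
method names are made up for illustration, and this is not how the Linux
bridge code is structured; it only shows that the insert path runs once
per new source MAC, while later packets from the same address are a
plain lookup.  The lookup is also what answers the "wlan0 or switch
chip?" question for transmitted packets.

```python
class LearningFdb:
    """Toy forwarding database: maps a source MAC to the port it was seen on."""

    def __init__(self):
        self.table = {}         # mac -> port name
        self.learn_events = 0   # how many times a new entry had to be created

    def rx(self, src_mac, port):
        """Record where src_mac was seen; return True if this was a new entry."""
        known = self.table.get(src_mac)
        if known == port:
            return False        # already known on this port: nothing to learn
        if known is None:
            self.learn_events += 1
        self.table[src_mac] = port  # new entry, or station moved to another port
        return known is None

    def lookup(self, dst_mac):
        """Return the port for dst_mac, or None if the frame must be flooded."""
        return self.table.get(dst_mac)


fdb = LearningFdb()
for _ in range(1000):           # 1000 packets from one station on lan1
    fdb.rx("00:11:22:33:44:55", "lan1")
fdb.rx("66:77:88:99:aa:bb", "wlan0")
```

In this toy run only two learn events occur for 1001 received packets.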


> > > > > Do you have any idea how to do DSA on a Broadcom switch?
> > > >
> > > > I have no idea.  When I originally submitted the DSA code for merging,
> > > > I contacted Broadcom people about adding support for Broadcom switch
> > > > chips to it, but I never heard back from them.
> > >
> > > OK. With DSA, how does one configure VLANs, policing and parameters in the
> > > HW switch that don't map or exist in the linux bridge?
> >
> > The idea is to use existing kernel interface for this as much as
> > possible.  So e.g. if you do:
> >
> >    vconfig add lan1 123
> >    vconfig add lan2 123
> >    brctl addbr br123
> >    brctl addif br123 lan1.123
> >    brctl addif br123 lan2.123
> >
> > Then the DSA code (or some userspace netlink listener helper, or some
> > combination of both) should ideally also detect that VLAN 123 on
> > interfaces lan1 and lan2 are to be bridged together, and program the
> > switch chip accordingly.  I think all VLAN configurations that at least
> > the Marvell hardware supports can be expressed this way.
> 
> Yes, but I image that this breaks down when you want to do something
> a bit more advanced. For example I don't think linux VLANs supports
> "shared VLAN learning"(SVL) and to configure a HW switch to do SVL
> one would first have to impl. that in Linux VLAN and then add the DSA
> code to get the config to the switch.

Yes.  But that's really the best way to do it, in my humble opinion.

If you don't go the host networking stack integration route, you end
up with something like the vendor drivers.  Which work fine for most
scenarios.. until you want to do something like talking TCP/IP using
the host TCP stack over some of the switch ports, at which point the
lack of host networking stack integration comes to bite you.


> Not sure how one would express whether VLAN tags should be stripped
> off or not when egressing the HW switch's physical port.

If you transmit a packet onto 'lan', it will be sent to the switch chip
with an "untagged" DSA tag.  If you transmit a packet onto 'lan.123',
it will be sent to the switch chip with a "tagged" DSA tag.  See
net/dsa/tag_dsa.c for details.
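
As a rough illustration of the difference, the Python sketch below
builds the 4-byte tag for both cases.  The bit layout (0x40 for an
untagged and 0x60 for a tagged FROM_CPU tag in the first byte, the port
number shifted left by 3 in the second, the 802.1Q priority and VLAN ID
carried over in the last two) is written from memory of the Marvell tag
format and should be treated as an assumption; net/dsa/tag_dsa.c is the
authoritative reference.  The function name and parameters are
hypothetical.

```python
import struct


def build_from_cpu_tag(switch_index, port, vlan_tci=None):
    """Build a 4-byte FROM_CPU DSA tag (layout assumed, see net/dsa/tag_dsa.c).

    vlan_tci is the 16-bit 802.1Q TCI (priority, CFI, VLAN ID) when the packet
    was sent on a VLAN interface such as lan.123, or None for plain 'lan'.
    """
    if not 0 <= switch_index < 32 or not 0 <= port < 32:
        raise ValueError("switch index and port must fit in 5 bits")

    if vlan_tci is None:
        # "Untagged" DSA tag: the frame carries no 802.1Q information.
        return bytes([0x40 | switch_index, port << 3, 0x00, 0x00])

    byte0 = 0x60 | switch_index        # "tagged" FROM_CPU tag
    byte1 = port << 3
    if vlan_tci & 0x1000:              # CFI bit moves into byte 1
        byte1 |= 0x01
        vlan_tci &= ~0x1000
    return bytes([byte0, byte1]) + struct.pack("!H", vlan_tci & 0xFFFF)
```

For example, build_from_cpu_tag(0, 2, 123) describes a packet for VLAN
123 leaving through port 2 of switch 0.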


> Furthermore, suppose one have a big HW switch, 48 ports, and lots of
> VLANs in that HW switch one would have to create a lot of virtual I/Fs
> and VLANs in linux just to configure the HW switch. This wastes
> resources on the CPU.

Where the 'resource waste' is on the order of a couple of tens or
hundreds of kilobytes of RAM.  If this is a problem for your host
CPU, I think you have bigger problems anyway.


> > To configure things like ingress/egress rate limiting and such in the
> > switch chip for which there is no Linux counterpart interface, I suppose
> > some sysfs interface or so might suffice.
> 
> Yes, there are aspects of a HW switch that doesn't map into DSA currently.
> Perhaps one should add some framework to support this?

Sounds good.


* Re: Distributed Switch Architecture(DSA)
  2010-06-18 20:12             ` Lennert Buytenhek
@ 2010-06-19 14:22               ` Joakim Tjernlund
  2010-06-19 16:56                 ` Lennert Buytenhek
  0 siblings, 1 reply; 14+ messages in thread
From: Joakim Tjernlund @ 2010-06-19 14:22 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netdev

Lennert Buytenhek <buytenh@wantstofly.org> wrote on 2010/06/18 22:12:43:
> From: Lennert Buytenhek <buytenh@wantstofly.org>
> To: Joakim Tjernlund <joakim.tjernlund@transmode.se>
> Cc: netdev@vger.kernel.org
> Date: 2010/06/18 22:12
> Subject: Re: Distributed Switch Architecture(DSA)
>
> On Fri, Jun 18, 2010 at 05:13:03PM +0200, Joakim Tjernlund wrote:
>
> > > > > > > > Now I want to add STP/RSTP to the switch. How would one do that?
> > > > > > >
> > > > > > > First, you'll want the hardware bridging patches that I posted to
> > > > > > > netdev@ a while back, e.g.:
> > > > > > >
> > > > > > >    http://patchwork.ozlabs.org/patch/16578/
> > > > > >
> > > > > > I see, will have to study this a bit closer. One question though,
> > > > > > does this disable MAC learning in the linux bridge?
> > > > >
> > > > > No, why should it?
> > > >
> > > > Doesn't the HW switch handle all MAC leaning? Why duplicate
> > > > this in the SW bridge?
> > > > I figured the HW switch would offload the SW bridge this task.
> > >
> > > Imagine the case where you bridge lan1, lan2 (both on the switch chip)
> > > into br0, together with wlan0 (which is not on the switch chip).
> > >
> > > Now a packet is sent out of br0.  Should it be sent to wlan0 or to the
> > > switch chip?  How will you make this decision without an address database
> > > on the Linux side?
> >
> > True, in this case you need it, but for only HW switch I/Fs you don't
> > need it and there can be several hundreds of MAC addresses passing
> > trough the HW switch. It would be nice if one didn't need to pass
> > all those up to the SW bridge, especially if you have a small embedded
> > CPU.
>
> I think you overestimate the effect that address learning will have on
> the host CPU.  It only needs to happen for the first packet for every
> new MAC address, and address flooding attacks is something you'll need
> to address in either case.

Possibly, I am just being careful.

>
> If you're really worried about this scenario, then just configure your
> boot loader to bridge all switch ports together, and don't load the DSA
> driver.  The switch will then appear as a single interface, 'eth0' (or
> whatever your SoC calls it), over which you can talk directly without
> any form of tagging.  You won't be able to use any advanced features,
> though.

Na, that is no fun :)

>
>
> > > > > > Do you have any idea how to do DSA on a Broadcom switch?
> > > > >
> > > > > I have no idea.  When I originally submitted the DSA code for merging,
> > > > > I contacted Broadcom people about adding support for Broadcom switch
> > > > > chips to it, but I never heard back from them.
> > > >
> > > > OK. With DSA, how does one configure VLANs, policing and parameters in the
> > > > HW switch that don't map or exist in the linux bridge?
> > >
> > > The idea is to use existing kernel interface for this as much as
> > > possible.  So e.g. if you do:
> > >
> > >    vconfig add lan1 123
> > >    vconfig add lan2 123
> > >    brctl addbr br123
> > >    brctl addif br123 lan1.123
> > >    brctl addif br123 lan2.123
> > >
> > > Then the DSA code (or some userspace netlink listener helper, or some
> > > combination of both) should ideally also detect that VLAN 123 on
> > > interfaces lan1 and lan2 are to be bridged together, and program the
> > > switch chip accordingly.  I think all VLAN configurations that at least
> > > the Marvell hardware supports can be expressed this way.
> >
> > Yes, but I image that this breaks down when you want to do something
> > a bit more advanced. For example I don't think linux VLANs supports
> > "shared VLAN learning"(SVL) and to configure a HW switch to do SVL
> > one would first have to impl. that in Linux VLAN and then add the DSA
> > code to get the config to the switch.
>
> Yes.  But that's really the best way to do it, in my humble opinion.

I will buy that for the moment. I can't see a better way either if
you truly want to integrate a HW switch into linux. I just wish
Linux VLANs had some support for SVL too

>
> If you don't go the host networking stack integration route, you end
> up with something like the vendor drivers.  Which work fine for most
> scenarios.. until you want to do something like talking TCP/IP using
> the host TCP stack over some of the switch ports, at which point the
> lack of host networking stack integration comes to bite you.

Just doing STP will bite you :)

>
>
> > Not sure how one would express whether VLAN tags should be stripped
> > off or not when egressing the HW switch's physical port.
>
> If you transmit a packet onto 'lan', it will be sent to the switch chip
> with an "untagged" DSA tag.  If you transmit a packet onto 'lan.123',
> it will be sent to the switch chip with a "tagged" DSA tag.  See
> net/dsa/tag_dsa.c for details.

Ah, now I get it, thanks.
However, how does this work for LAN to LAN pkgs? LAN1 and LAN2 could be
in the same VLAN but one is implicit(port) VLAN and the
other is explicit. How do I config the HW switch to do that?

>
>
> > Furthermore, suppose one have a big HW switch, 48 ports, and lots of
> > VLANs in that HW switch one would have to create a lot of virtual I/Fs
> > and VLANs in linux just to configure the HW switch. This wastes
> > resources on the CPU.
>
> Where the 'resource waste' is on the order of a couple of tens or
> hundreds of kilobytes of RAM.  If this is a problem for your host
> CPU, I think you have bigger problems anyway.

That is not a very good argument; this is how bloat builds.
However, let's ignore this for now.

>
>
> > > To configure things like ingress/egress rate limiting and such in the
> > > switch chip for which there is no Linux counterpart interface, I suppose
> > > some sysfs interface or so might suffice.
> >
> > Yes, there are aspects of a HW switch that doesn't map into DSA currently.
> > Perhaps one should add some framework to support this?
>
> Sounds good.

Any idea what such a framework should look like? What transport
mechanism is suitable for talking to a user space daemon?



* Re: Distributed Switch Architecture(DSA)
  2010-06-19 14:22               ` Joakim Tjernlund
@ 2010-06-19 16:56                 ` Lennert Buytenhek
  2010-06-19 18:48                   ` Joakim Tjernlund
  0 siblings, 1 reply; 14+ messages in thread
From: Lennert Buytenhek @ 2010-06-19 16:56 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev

On Sat, Jun 19, 2010 at 04:22:18PM +0200, Joakim Tjernlund wrote:

> > > > > OK. With DSA, how does one configure VLANs, policing and
> > > > > parameters in the HW switch that don't map or exist in the
> > > > > linux bridge?
> > > >
> > > > The idea is to use existing kernel interface for this as much as
> > > > possible.  So e.g. if you do:
> > > >
> > > >    vconfig add lan1 123
> > > >    vconfig add lan2 123
> > > >    brctl addbr br123
> > > >    brctl addif br123 lan1.123
> > > >    brctl addif br123 lan2.123
> > > >
> > > > Then the DSA code (or some userspace netlink listener helper, or some
> > > > combination of both) should ideally also detect that VLAN 123 on
> > > > interfaces lan1 and lan2 are to be bridged together, and program the
> > > > switch chip accordingly.  I think all VLAN configurations that at least
> > > > the Marvell hardware supports can be expressed this way.
> > >
> > > Yes, but I image that this breaks down when you want to do something
> > > a bit more advanced. For example I don't think linux VLANs supports
> > > "shared VLAN learning"(SVL) and to configure a HW switch to do SVL
> > > one would first have to impl. that in Linux VLAN and then add the DSA
> > > code to get the config to the switch.
> >
> > Yes.  But that's really the best way to do it, in my humble opinion.
> 
> I will buy that for the moment. I can't see a better way either if
> you truly want to integrate a HW switch into linux. I just wish
> Linux VLANs had some support for SVL too

You know how to fix that. :)


> > If you don't go the host networking stack integration route, you end
> > up with something like the vendor drivers.  Which work fine for most
> > scenarios.. until you want to do something like talking TCP/IP using
> > the host TCP stack over some of the switch ports, at which point the
> > lack of host networking stack integration comes to bite you.
> 
> Just doing STP will bite you :)

Most people deal with this by running a userland STP daemon that uses
raw sockets to inject manually (i.e. in userspace) DSA-tagged packets
onto the eth0 (or whatever) interface.  This "works" (for some
definitions of 'works') for UDP apps such as a DHCP server as well --
this crappy approach unfortunately only really breaks down for TCP.
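
For reference, the userland approach described above amounts to
something like the following Python sketch: splice a 4-byte tag in after
the two MAC addresses and write the frame to the CPU-facing interface
through a packet socket.  The tag bytes, the source MAC and the function
names are placeholders chosen for illustration; a real daemon would
build the tag for the specific switch and port.  Sending needs
CAP_NET_RAW on Linux, so the send helper is only defined here, not
called.

```python
import socket

ETH_ALEN = 6
STP_GROUP_MAC = bytes.fromhex("0180c2000000")  # 802.1D bridge group address


def insert_switch_tag(frame, tag):
    """Return frame with tag spliced in right after the dst and src MACs."""
    if len(frame) < 2 * ETH_ALEN:
        raise ValueError("frame too short to hold two MAC addresses")
    return frame[:2 * ETH_ALEN] + tag + frame[2 * ETH_ALEN:]


def send_raw(ifname, frame):
    """Write a fully built frame to ifname (Linux only, needs CAP_NET_RAW)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        return sock.send(frame)


# Example: a frame addressed to the STP group address, body left as a stub.
src_mac = bytes.fromhex("020000000001")           # placeholder source MAC
payload = bytes(2)                                # stands in for length + BPDU
placeholder_tag = bytes([0x40, 0x08, 0x00, 0x00]) # placeholder tag bytes
tagged = insert_switch_tag(STP_GROUP_MAC + src_mac + payload, placeholder_tag)
```

The point of the sketch is that the application itself has to know about
the tag, which is exactly what stops working once the kernel's TCP stack
is the one generating the packets.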


> > > Not sure how one would express whether VLAN tags should be stripped
> > > off or not when egressing the HW switch's physical port.
> >
> > If you transmit a packet onto 'lan', it will be sent to the switch chip
> > with an "untagged" DSA tag.  If you transmit a packet onto 'lan.123',
> > it will be sent to the switch chip with a "tagged" DSA tag.  See
> > net/dsa/tag_dsa.c for details.
> 
> Ah, now I get it, thanks.
> However, how does this work for LAN to LAN pkgs? LAN1 and LAN2 could be
> in the same VLAN but one is implicit(port) VLAN and the
> other is explicit.

If you tell the HW switch to forward these packets, they will never
appear at the CPU interface, so the DSA tagging/untagging doesn't enter
the picture.


> How do I config the HW switch to do that?

Tell the switch that the vlan is native on one of the ports but not on
the other.  It's been a while since I looked at the chip docs but there
are ways of doing this.
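
Conceptually this is a per-port, per-VLAN egress setting.  The Python
model below is only a way to picture it; the table layout and names are
invented here and do not correspond to any particular chip's registers
or to existing DSA code.  It also ignores the address database and
simply forwards to every other member port.

```python
# Per-VLAN egress policy: port name -> True if frames leave with an 802.1Q tag.
vlan_egress = {
    123: {"lan1": False,   # VLAN 123 is native (untagged) on lan1
          "lan2": True},   # ... and carried tagged on lan2
}


def egress_frames(vid, ingress_port):
    """Return (port, tagged) pairs a frame in VLAN vid is forwarded to."""
    members = vlan_egress.get(vid, {})
    if ingress_port not in members:
        return []                       # ingress port is not in this VLAN
    return [(port, tagged) for port, tagged in members.items()
            if port != ingress_port]
```

A frame entering on lan1 leaves lan2 with a tag, and a frame entering on
lan2 leaves lan1 without one.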


> > > Furthermore, suppose one have a big HW switch, 48 ports, and lots of
> > > VLANs in that HW switch one would have to create a lot of virtual I/Fs
> > > and VLANs in linux just to configure the HW switch. This wastes
> > > resources on the CPU.
> >
> > Where the 'resource waste' is on the order of a couple of tens or
> > hundreds of kilobytes of RAM.  If this is a problem for your host
> > CPU, I think you have bigger problems anyway.
> 
> That is not a very good argument, this is how bloat builds.

If you have a better way of getting all the features while spending
less resources, please step forward with your ideas.  The current design
is the best I could come up with, but I'm sure it's not optimal in its
current form.


> > > > To configure things like ingress/egress rate limiting and such in the
> > > > switch chip for which there is no Linux counterpart interface, I suppose
> > > > some sysfs interface or so might suffice.
> > >
> > > Yes, there are aspects of a HW switch that doesn't map into DSA currently.
> > > Perhaps one should add some framework to support this?
> >
> > Sounds good.
> 
> Any idea how such an framework should look like? What transport
> mechanism is suitable to talk to a user space daemon?

Have a look at netlink.
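
As a starting point, a userspace helper that watches for interface
changes over rtnetlink could look roughly like the Python sketch below.
It only parses the generic netlink message header and reports link
add/remove events; decoding the attributes and deciding what to program
into the switch is left out.  The function names are made up for this
example, and the listener is defined but not started since it needs a
Linux host.

```python
import socket
import struct

NLMSG_HDR = struct.Struct("=IHHII")   # nlmsghdr: len, type, flags, seq, pid
RTM_NEWLINK = 16
RTM_DELLINK = 17
RTMGRP_LINK = 1                       # multicast group for link notifications


def parse_nlmsg_headers(data):
    """Yield (msg_type, payload) for each netlink message in data."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        if length < NLMSG_HDR.size or offset + length > len(data):
            break                       # truncated or malformed message
        yield msg_type, data[offset + NLMSG_HDR.size:offset + length]
        offset += (length + 3) & ~3     # messages are 4-byte aligned


def listen_for_link_changes():
    """Print link add/remove events (Linux only); runs until interrupted."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       socket.NETLINK_ROUTE) as sock:
        sock.bind((0, RTMGRP_LINK))
        while True:
            for msg_type, _payload in parse_nlmsg_headers(sock.recv(65535)):
                if msg_type in (RTM_NEWLINK, RTM_DELLINK):
                    print("link event", msg_type)
```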


* Re: Distributed Switch Architecture(DSA)
  2010-06-19 16:56                 ` Lennert Buytenhek
@ 2010-06-19 18:48                   ` Joakim Tjernlund
  2010-06-19 18:57                     ` Lennert Buytenhek
  0 siblings, 1 reply; 14+ messages in thread
From: Joakim Tjernlund @ 2010-06-19 18:48 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netdev

Lennert Buytenhek <buytenh@wantstofly.org> wrote on 2010/06/19 18:56:43:
>
> On Sat, Jun 19, 2010 at 04:22:18PM +0200, Joakim Tjernlund wrote:
>
> > > > > > OK. With DSA, how does one configure VLANs, policing and
> > > > > > parameters in the HW switch that don't map or exist in the
> > > > > > linux bridge?
> > > > >
> > > > > The idea is to use existing kernel interface for this as much as
> > > > > possible.  So e.g. if you do:
> > > > >
> > > > >    vconfig add lan1 123
> > > > >    vconfig add lan2 123
> > > > >    brctl addbr br123
> > > > >    brctl addif br123 lan1.123
> > > > >    brctl addif br123 lan2.123
> > > > >
> > > > > Then the DSA code (or some userspace netlink listener helper, or some
> > > > > combination of both) should ideally also detect that VLAN 123 on
> > > > > interfaces lan1 and lan2 are to be bridged together, and program the
> > > > > switch chip accordingly.  I think all VLAN configurations that at least
> > > > > the Marvell hardware supports can be expressed this way.
> > > >
> > > > Yes, but I image that this breaks down when you want to do something
> > > > a bit more advanced. For example I don't think linux VLANs supports
> > > > "shared VLAN learning"(SVL) and to configure a HW switch to do SVL
> > > > one would first have to impl. that in Linux VLAN and then add the DSA
> > > > code to get the config to the switch.
> > >
> > > Yes.  But that's really the best way to do it, in my humble opinion.
> >
> > I will buy that for the moment. I can't see a better way either if
> > you truly want to integrate a HW switch into linux. I just wish
> > Linux VLANs had some support for SVL too
>
> You know how to fix that. :)

Possibly :)

>
>
> > > If you don't go the host networking stack integration route, you end
> > > up with something like the vendor drivers.  Which work fine for most
> > > scenarios.. until you want to do something like talking TCP/IP using
> > > the host TCP stack over some of the switch ports, at which point the
> > > lack of host networking stack integration comes to bite you.
> >
> > Just doing STP will bite you :)
>
> Most people deal with this by running a userland STP daemon that uses
> raw sockets to inject manually (i.e. in userspace) DSA-tagged packets
> onto the eth0 (or whatever) interface.  This "works" (for some
> definitions of 'works') for UDP apps such as a DHCP server as well --
> this crappy approach unfortunately only really breaks down for TCP.
>
>
> > > > Not sure how one would express whether VLAN tags should be stripped
> > > > off or not when egressing the HW switch's physical port.
> > >
> > > If you transmit a packet onto 'lan', it will be sent to the switch chip
> > > with an "untagged" DSA tag.  If you transmit a packet onto 'lan.123',
> > > it will be sent to the switch chip with a "tagged" DSA tag.  See
> > > net/dsa/tag_dsa.c for details.
> >
> > Ah, now I get it, thanks.
> > However, how does this work for LAN to LAN pkgs? LAN1 and LAN2 could be
> > in the same VLAN but one is implicit(port) VLAN and the
> > other is explicit.
>
> If you tell the HW switch to forward these packets, they will never
> appear at the CPU interface, so the DSA tagging/untagging doesn't enter
> the picture.

"tell the HW switch"? Doesn't DSA do that already? If not, what
is the point of DSA then if it doesn't use the native forwarding
capabilities of the HW switch?

>
>
> > How do I config the HW switch to do that?
>
> Tell the switch that the vlan is native on one of the ports but not on
> the other.  It's been a while since I looked at the chip docs but there
> are ways of doing this.

The current DSA impl. does not support this? There should be some
way to manage this within the DSA framework.

>
>
> > > > Furthermore, suppose one have a big HW switch, 48 ports, and lots of
> > > > VLANs in that HW switch one would have to create a lot of virtual I/Fs
> > > > and VLANs in linux just to configure the HW switch. This wastes
> > > > resources on the CPU.
> > >
> > > Where the 'resource waste' is on the order of a couple of tens or
> > > hundreds of kilobytes of RAM.  If this is a problem for your host
> > > CPU, I think you have bigger problems anyway.
> >
> > That is not a very good argument, this is how bloat builds.
>
> If you have a better way of getting all the features while spending
> less resources, please step forward with your ideas.  The current design
> is the best I could come up with, but I'm sure it's not optimal in its
> current form.

I don't, I am not that familiar with the inner workings of the Linux
networking code.

>
>
> > > > > To configure things like ingress/egress rate limiting and such in the
> > > > > switch chip for which there is no Linux counterpart interface, I suppose
> > > > > some sysfs interface or so might suffice.
> > > >
> > > > Yes, there are aspects of a HW switch that doesn't map into DSA currently.
> > > > Perhaps one should add some framework to support this?
> > >
> > > Sounds good.
> >
> > Any idea how such an framework should look like? What transport
> > mechanism is suitable to talk to a user space daemon?
>
> Have a look at netlink.

I was afraid you would say that, I have no experience with netlink :)



* Re: Distributed Switch Architecture(DSA)
  2010-06-19 18:48                   ` Joakim Tjernlund
@ 2010-06-19 18:57                     ` Lennert Buytenhek
  2010-06-20 14:41                       ` Joakim Tjernlund
  0 siblings, 1 reply; 14+ messages in thread
From: Lennert Buytenhek @ 2010-06-19 18:57 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev

On Sat, Jun 19, 2010 at 08:48:31PM +0200, Joakim Tjernlund wrote:

> > > > > Not sure how one would express whether VLAN tags should be stripped
> > > > > off or not when egressing the HW switch's physical port.
> > > >
> > > > If you transmit a packet onto 'lan', it will be sent to the switch chip
> > > > with an "untagged" DSA tag.  If you transmit a packet onto 'lan.123',
> > > > it will be sent to the switch chip with a "tagged" DSA tag.  See
> > > > net/dsa/tag_dsa.c for details.
> > >
> > > Ah, now I get it, thanks.
> > > However, how does this work for LAN to LAN pkgs? LAN1 and LAN2 could be
> > > in the same VLAN but one is implicit(port) VLAN and the
> > > other is explicit.
> >
> > If you tell the HW switch to forward these packets, they will never
> > appear at the CPU interface, so the DSA tagging/untagging doesn't enter
> > the picture.
> 
> "tell the HW switch"? Doesn't DSA do that already?

Not in its current iteration, as I've explained in previous emails.


> If not, what is the point of DSA then if it doesn't use the native
> forwarding capabilities of the HW switch?

The point is and always was to provide a framework for proper integration
of hardware switch chips into the Linux kernel.  This framework doesn't
become useless just because it doesn't already support every single
hardware feature at this point.


> > > How do I config the HW switch to do that?
> >
> > Tell the switch that the vlan is native on one of the ports but not on
> > the other.  It's been a while since I looked at the chip docs but there
> > are ways of doing this.
> 
> The current DSA impl. does not support this? There should be some
> way to manage this within the DSA framework.

Have you even tried the DSA code?


* Re: Distributed Switch Architecture(DSA)
  2010-06-19 18:57                     ` Lennert Buytenhek
@ 2010-06-20 14:41                       ` Joakim Tjernlund
  2010-07-05 17:24                         ` Lennert Buytenhek
  0 siblings, 1 reply; 14+ messages in thread
From: Joakim Tjernlund @ 2010-06-20 14:41 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netdev

Lennert Buytenhek <buytenh@wantstofly.org> wrote on 2010/06/19 20:57:39:
>
> On Sat, Jun 19, 2010 at 08:48:31PM +0200, Joakim Tjernlund wrote:
>
> > > > > > Not sure how one would express whether VLAN tags should be stripped
> > > > > > off or not when egressing the HW switch's physical port.
> > > > >
> > > > > If you transmit a packet onto 'lan', it will be sent to the switch chip
> > > > > with an "untagged" DSA tag.  If you transmit a packet onto 'lan.123',
> > > > > it will be sent to the switch chip with a "tagged" DSA tag.  See
> > > > > net/dsa/tag_dsa.c for details.
> > > >
> > > > Ah, now I get it, thanks.
> > > > However, how does this work for LAN to LAN pkgs? LAN1 and LAN2 could be
> > > > in the same VLAN but one is implicit(port) VLAN and the
> > > > other is explicit.
> > >
> > > If you tell the HW switch to forward these packets, they will never
> > > appear at the CPU interface, so the DSA tagging/untagging doesn't enter
> > > the picture.
> >
> > "tell the HW switch"? Doesn't DSA do that already?
>
> Not in its current iteration, as I've explained in previous emails.

Sorry, I didn't quite get that.

>
>
> > If not, what is the point of DSA then if it doesn't use the native
> > forwarding capabilities of the HW switch?
>
> The point is and always was to provide a framework for proper integration
> of hardware switch chips into the Linux kernel.  This framework doesn't
> become useless just because it doesn't already support every single
> hardware feature at this point.

Right, sorry if I sounded a bit harsh.

So DSA currently does a very minimal config of the HW switch to get
things going.
If one wants to do something fancier, one has to
add a control plane to DSA which would possibly talk
to a user space app. Is that correct?

>
>
> > > > How do I config the HW switch to do that?
> > >
> > > Tell the switch that the vlan is native on one of the ports but not on
> > > the other.  It's been a while since I looked at the chip docs but there
> > > are ways of doing this.
> >
> > The current DSA impl. does not support this? There should be some
> > way to manage this within the DSA framework.
>
> Have you even tried the DSA code?

Not yet and I don't have any MV HW either :(



* Re: Distributed Switch Architecture(DSA)
  2010-06-20 14:41                       ` Joakim Tjernlund
@ 2010-07-05 17:24                         ` Lennert Buytenhek
  0 siblings, 0 replies; 14+ messages in thread
From: Lennert Buytenhek @ 2010-07-05 17:24 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev

On Sun, Jun 20, 2010 at 04:41:31PM +0200, Joakim Tjernlund wrote:

> > > If not, what is the point of DSA then if it doesn't use the native
> > > forwarding capabilities of the HW switch?
> >
> > The point is and always was to provide a framework for proper integration
> > of hardware switch chips into the Linux kernel.  This framework doesn't
> > become useless just because it doesn't already support every single
> > hardware feature at this point.
> 
> Right, sorry if I sounded a bit harsh.
> 
> So DSA currently does a very minimal config of the HW switch to get
> things going.

Correct.


> If you want to do something more fancy one has to
> add a control plane to DSA which would possibly talk
> to a user space app. Is that correct?

Yes and no -- yes in the sense that if you want to use more functionality
of the switch chip, you'll have to add some code that extracts that info
from the Linux network interface config and turns it into commands for the
switch chip, and no in the sense that I'm not sure yet what the best way
to implement this would be.  (Doing it all in userspace is one option.)
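
One way to picture the "extract and translate" step is the Python sketch
below, which takes a bridge-to-members mapping (as the vconfig/brctl
example earlier in the thread would produce) and derives which ports
share which VLAN.  The input format, the reliance on the 'port.vid'
naming convention and the function name are assumptions for
illustration; how the result would be written to the chip is left out,
since that is the open design question.

```python
def vlan_port_groups(bridges):
    """Map (bridge, VLAN ID) to the set of ports bridged together in that VLAN.

    bridges: dict of bridge name -> list of member interface names, where
    VLAN interfaces follow the 'port.vid' convention (e.g. 'lan1.123').
    Members without a VLAN suffix are ignored, and no check is made that a
    port really belongs to the switch chip.
    """
    result = {}
    for bridge, members in bridges.items():
        per_vid = {}
        for ifname in members:
            port, sep, vid = ifname.partition(".")
            if sep and vid.isdigit():
                per_vid.setdefault(int(vid), set()).add(port)
        for vid, ports in per_vid.items():
            if len(ports) > 1:          # a single port needs no forwarding
                result[(bridge, vid)] = ports
    return result


config = {"br123": ["lan1.123", "lan2.123"], "br0": ["lan3", "wlan0"]}
groups = vlan_port_groups(config)
```

Here groups holds one entry saying that lan1 and lan2 should be bridged
in hardware for VLAN 123; br0 contributes nothing because its members
carry no VLAN suffix.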
