Netdev List


* Re: [RFC PATCH 1/2] gianfar: convert to phylink
From: Vladimir Oltean @ 2019-08-24 15:21 UTC (permalink / raw)
  To: Arseny Solokha
  Cc: Claudiu Manoil, Russell King, Ioana Ciornei, Andrew Lunn, netdev,
	Florian Fainelli
In-Reply-To: <87lfwfio13.fsf@eldim>

Hi Arseny.

On Tue, 30 Jul 2019 at 17:40, Arseny Solokha <asolokha@kb.kras.ru> wrote:
>
> > Hi Arseny,
> >
> > Nice project!
>
> Vladimir, Russell, thanks for your review. I'm on vacation now, so I won't be
> able to fully address your comments for a few weeks: while I can build the
> code, I won't have access to hardware to test.
>
> So it seems this patch will turn into a series where we'll have some cleanup
> patches preceding the actual conversion (and the latter will also contain a
> documentation change in Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> which I've overlooked in the first submission). I'll try to post trivial
> cleanups independently while still on vacation.
>

Yes, ideally the cleanup would be separate from the conversion.

>
> >> @@ -891,11 +912,21 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
> >>
> >>         err = of_property_read_string(np, "phy-connection-type", &ctype);
> >>
> >> -       /* We only care about rgmii-id.  The rest are autodetected */
> >> -       if (err == 0 && !strcmp(ctype, "rgmii-id"))
> >> -               priv->interface = PHY_INTERFACE_MODE_RGMII_ID;
> >> -       else
> >> +       /* We only care about rgmii-id and sgmii - the former
> >> +        * is indistinguishable from rgmii in hardware, and phylink needs
> >> +        * the latter to be set appropriately for correct phy configuration.
> >> +        * The rest are autodetected
> >> +        */
> >> +       if (err == 0) {
> >> +               if (!strcmp(ctype, "rgmii-id"))
> >> +                       priv->interface = PHY_INTERFACE_MODE_RGMII_ID;
> >> +               else if (!strcmp(ctype, "sgmii"))
> >> +                       priv->interface = PHY_INTERFACE_MODE_SGMII;
> >> +               else
> >> +                       priv->interface = PHY_INTERFACE_MODE_MII;
> >> +       } else {
> >>                 priv->interface = PHY_INTERFACE_MODE_MII;
> >> +       }
> >>
> >
> > No. Don't do this. Just do:
> >
> >     err = of_get_phy_mode(np);
> >     if (err < 0)
> >         goto err_grp_init;
> >
> >     priv->interface = err;
> >
> >>         if (of_find_property(np, "fsl,magic-packet", NULL))
> >>                 priv->device_flags |= FSL_GIANFAR_DEV_HAS_MAGIC_PACKET;
> >> @@ -903,19 +934,21 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
> >>         if (of_get_property(np, "fsl,wake-on-filer", NULL))
> >>                 priv->device_flags |= FSL_GIANFAR_DEV_HAS_WAKE_ON_FILER;
> >>
> >> -       priv->phy_node = of_parse_phandle(np, "phy-handle", 0);
> >> +       priv->device_node = np;
> >> +       priv->speed = SPEED_UNKNOWN;
> >>
> >> -       /* In the case of a fixed PHY, the DT node associated
> >> -        * to the PHY is the Ethernet MAC DT node.
> >> -        */
> >> -       if (!priv->phy_node && of_phy_is_fixed_link(np)) {
> >> -               err = of_phy_register_fixed_link(np);
> >> -               if (err)
> >> -                       goto err_grp_init;
> >> +       priv->phylink_config.dev = &priv->ndev->dev;
> >> +       priv->phylink_config.type = PHYLINK_NETDEV;
> >>
> >> -               priv->phy_node = of_node_get(np);
> >> +       phylink = phylink_create(&priv->phylink_config, of_fwnode_handle(np),
> >> +                                priv->interface, &gfar_phylink_ops);
> >
> > You introduced a bug here.
> > of_phy_connect used to take the PHY interface type (for good or bad)
> > from gfar_get_interface() (which is reconstructing it from the MAC
> > registers).
> > You are now passing the PHY interface type to phylink_create from the
> > "phy-connection-type" DT property.
> > At the very least, you are breaking LS1021A which uses phy-mode
> > instead of phy-connection-type (hence my comment above to use the
> > generic OF helper).
> > Actually I think you just uncovered a latent bug, in that the DT
> > bindings for phy-mode didn't mean much at all to the driver - it would
> > rely on what the bootloader had set up.
> > Actually DT bindings for phy-connection-type were most likely simply
> > bolted on top of gianfar when they figured they couldn't just
> > auto-detect the various species of required RGMII delays.
> > But gfar_get_interface is a piece of history that was introduced in
> > the same commit as the enum phy_interface_t itself: e8a2b6a42073
> > ("[PATCH] PHY: Add support for configuring the PHY connection
> > interface"). Its time has come.
>
> <…>
>
> >>         }
> >>
> >> +       priv->tbi_phy = NULL;
> >> +       interface = gfar_get_interface(dev);
> >
> > Be consistent and just go for priv->interface. Nobody's changing it anyway.
>
> So if I get you right, I'm supposed to drop gfar_get_interface() and rely on DT
> bindings entirely?
>

Oof, I checked arch/powerpc/boot/dts/fsl and the following boards
using the gianfar driver are guilty of populating phy-handle but not
phy-connection-type:
mpc8540ads
mpc8541cds
mpc8548cds_32b
mpc8548cds_36b
mpc8555cds
mpc8560ads
mpc8568mds
ppa8548

I think a sane logic for populating priv->interface would be:
- Attempt of_get_phy_mode
- If phy-mode or phy-connection-type properties are not found, fall back
to gfar_get_interface for the legacy blobs above.
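
A minimal userspace sketch of the two-step logic above, not driver code.
dt_phy_mode() only stands in for of_get_phy_mode(), the mac_reg_mode
argument stands in for whatever gfar_get_interface() would read back from
the MAC registers, and the enum values and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a few phy_interface_t values. */
enum phy_mode { MODE_MII, MODE_RGMII, MODE_RGMII_ID, MODE_SGMII };

/* Stand-in for of_get_phy_mode(): translate the DT string, or return a
 * negative value when the property is absent or not recognized.
 */
static int dt_phy_mode(const char *prop)
{
	static const struct { const char *name; int mode; } map[] = {
		{ "mii", MODE_MII },
		{ "rgmii", MODE_RGMII },
		{ "rgmii-id", MODE_RGMII_ID },
		{ "sgmii", MODE_SGMII },
	};
	size_t i;

	if (!prop)
		return -1;	/* legacy blob: no phy-mode/phy-connection-type */
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (!strcmp(prop, map[i].name))
			return map[i].mode;
	return -1;
}

/* Prefer the device tree; fall back to the register-derived mode
 * (the gfar_get_interface() role) only when the DT gives nothing.
 */
static int pick_interface(const char *dt_prop, int mac_reg_mode)
{
	int mode = dt_phy_mode(dt_prop);

	assert(mac_reg_mode >= 0);
	return mode >= 0 ? mode : mac_reg_mode;
}
```

With this shape, boards that populate the property get exactly what the DT
says, and only the legacy blobs listed above depend on bootloader state.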

>
> >> @@ -3387,23 +3384,6 @@ static irqreturn_t gfar_interrupt(int irq, void *grp_id)
> >>         return IRQ_HANDLED;
> >>  }
> >>
> >> -/* Called every time the controller might need to be made
> >> - * aware of new link state.  The PHY code conveys this
> >> - * information through variables in the phydev structure, and this
> >> - * function converts those variables into the appropriate
> >> - * register values, and can bring down the device if needed.
> >> - */
> >> -static void adjust_link(struct net_device *dev)
> >> -{
> >> -       struct gfar_private *priv = netdev_priv(dev);
> >> -       struct phy_device *phydev = dev->phydev;
> >> -
> >> -       if (unlikely(phydev->link != priv->oldlink ||
> >> -                    (phydev->link && (phydev->duplex != priv->oldduplex ||
> >> -                                      phydev->speed != priv->oldspeed))))
> >> -               gfar_update_link_state(priv);
> >> -}
> >
> > Getting rid of the cruft from this function deserves its own patch.
>
> How am I supposed to remove it without breaking the PHYLIB-based driver? Or do
> you mean making it call gfar_update_link_state() just before the conversion
> which will then remove adjust_link() altogether?
>

I don't know, if you can't refactor without breaking anything then ok.

>
> >>
> >>         if (unlikely(test_bit(GFAR_RESETTING, &priv->state)))
> >>                 return;
> >>
> >> -       if (phydev->link) {
> >> -               u32 tempval1 = gfar_read(&regs->maccfg1);
> >> -               u32 tempval = gfar_read(&regs->maccfg2);
> >> -               u32 ecntrl = gfar_read(&regs->ecntrl);
> >> -               u32 tx_flow_oldval = (tempval1 & MACCFG1_TX_FLOW);
> >> +       if (unlikely(phylink_autoneg_inband(mode)))
> >> +               return;
> >>
> >> -               if (phydev->duplex != priv->oldduplex) {
> >> -                       if (!(phydev->duplex))
> >> -                               tempval &= ~(MACCFG2_FULL_DUPLEX);
> >> -                       else
> >> -                               tempval |= MACCFG2_FULL_DUPLEX;
> >> +       maccfg1 = gfar_read(&regs->maccfg1);
> >> +       maccfg2 = gfar_read(&regs->maccfg2);
> >> +       ecntrl = gfar_read(&regs->ecntrl);
> >>
> >> -                       priv->oldduplex = phydev->duplex;
> >> -               }
> >> +       new_maccfg2 = maccfg2 & ~(MACCFG2_FULL_DUPLEX | MACCFG2_IF);
> >> +       new_ecntrl = ecntrl & ~ECNTRL_R100;
> >>
> >> -               if (phydev->speed != priv->oldspeed) {
> >> -                       switch (phydev->speed) {
> >> -                       case 1000:
> >> -                               tempval =
> >> -                                   ((tempval & ~(MACCFG2_IF)) | MACCFG2_GMII);
> >> +       if (state->duplex)
> >> +               new_maccfg2 |= MACCFG2_FULL_DUPLEX;
> >>
> >> -                               ecntrl &= ~(ECNTRL_R100);
> >> -                               break;
> >> -                       case 100:
> >> -                       case 10:
> >> -                               tempval =
> >> -                                   ((tempval & ~(MACCFG2_IF)) | MACCFG2_MII);
> >> -
> >> -                               /* Reduced mode distinguishes
> >> -                                * between 10 and 100
> >> -                                */
> >> -                               if (phydev->speed == SPEED_100)
> >> -                                       ecntrl |= ECNTRL_R100;
> >> -                               else
> >> -                                       ecntrl &= ~(ECNTRL_R100);
> >> -                               break;
> >> -                       default:
> >> -                               netif_warn(priv, link, priv->ndev,
> >> -                                          "Ack!  Speed (%d) is not 10/100/1000!\n",
> >> -                                          phydev->speed);
> >
> > Please 1. remove "Ack!" 2. treat SPEED_UNKNOWN here by setting the MAC
> > into a low-power state (e.g. 10 Mbps - the power savings are real).
> > Don't print that Speed -1 is not 10/100/1000, we know that.
>
> In my first conversion attempt I see "Ack!" when changing link speed or when
> shutting the link down, so switching to 10 Mbps doesn't seem right to me, hence
> the early return in this case. Maybe I'm doing something wrong here.
>

When mac_config is called with SPEED_UNKNOWN, the link is down, and you
can put the MAC in the lowest energy state it can go to (10 Mbps, in
this case). Or so I've been told. Maybe Russell can chime in. Anyway,
you don't need to print anything, there's lots of prints from PHYLINK
already.
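
A rough sketch of what that could look like, written as a pure function so
it can be checked outside the kernel. The bit values are illustrative
assumptions (check gianfar.h for the real definitions), and struct mac_regs
and mac_regs_for_link() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values; the real ones live in gianfar.h. */
#define MACCFG2_FULL_DUPLEX	0x00000001u
#define MACCFG2_IF		0x00000300u
#define MACCFG2_MII		0x00000100u
#define MACCFG2_GMII		0x00000200u
#define ECNTRL_R100		0x00000008u

#define SPEED_UNKNOWN		(-1)

struct mac_regs {
	uint32_t maccfg2;
	uint32_t ecntrl;
};

/* Compute new register values for a link state. Anything that is not
 * 100 or 1000, including SPEED_UNKNOWN on link down, is parked at
 * 10 Mbps without printing anything.
 */
static struct mac_regs mac_regs_for_link(struct mac_regs cur, int speed,
					 int full_duplex)
{
	struct mac_regs next = cur;

	next.maccfg2 &= ~(MACCFG2_FULL_DUPLEX | MACCFG2_IF);
	next.ecntrl &= ~ECNTRL_R100;

	if (full_duplex)
		next.maccfg2 |= MACCFG2_FULL_DUPLEX;

	switch (speed) {
	case 1000:
		next.maccfg2 |= MACCFG2_GMII;
		break;
	case 100:
		next.maccfg2 |= MACCFG2_MII;
		next.ecntrl |= ECNTRL_R100;
		break;
	default:	/* 10 Mbps and SPEED_UNKNOWN */
		next.maccfg2 |= MACCFG2_MII;
		break;
	}

	/* An interface mode is always selected on return. */
	assert(next.maccfg2 & MACCFG2_IF);
	return next;
}
```

Whether 10 Mbps really is the lowest-power setting on this MAC is the open
question above; the sketch only shows that no warning path is needed.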

>
> >> diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> >> index 3433b46b90c1..146b30d07789 100644
> >> --- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
> >> +++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> >> @@ -35,7 +35,7 @@
> >>  #include <asm/types.h>
> >>  #include <linux/ethtool.h>
> >>  #include <linux/mii.h>
> >> -#include <linux/phy.h>
> >> +#include <linux/phylink.h>
> >>  #include <linux/sort.h>
> >>  #include <linux/if_vlan.h>
> >>  #include <linux/of_platform.h>
> >> @@ -207,12 +207,10 @@ static void gfar_get_regs(struct net_device *dev, struct ethtool_regs *regs,
> >>  static unsigned int gfar_usecs2ticks(struct gfar_private *priv,
> >>                                      unsigned int usecs)
> >>  {
> >> -       struct net_device *ndev = priv->ndev;
> >> -       struct phy_device *phydev = ndev->phydev;
> >
> > Are you sure this still works? You missed a ndev->phydev check from
> > gfar_gcoalesce, where this is called from. Technically you can still
> > check ndev->phydev, it's just that PHYLINK doesn't guarantee you'll
> > have one (e.g. fixed-link interfaces).
>
> It still works for RGMII PHYs, SGMII and 1000Base-X in my testing. I didn't
> check it with fixed-link, though.
>
>
> >> @@ -1519,6 +1472,24 @@ static int gfar_get_ts_info(struct net_device *dev,
> >>         return 0;
> >>  }
> >>
> >> +/* Set link ksettings (phy address, speed) for ethtools */
> >
> > ethtool, not ethtools. Also, I'm not quite sure what you mean by
> > setting the "phy address" with ethtool.
>
> Well, I know where I've copied it from… Thanks for pointing it out.

Regards,
-Vladimir


* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Andrew Lunn @ 2019-08-24 15:24 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

On Sat, Aug 24, 2019 at 04:42:47AM +0200, Marek Behún wrote:
> Hi,
> this is my attempt to solve the multi-CPU port issue for DSA.
> 
> Patch 1 adds code for handling multiple CPU ports in a DSA switch tree.
> If more than one CPU port is found in a tree, the code assigns CPU ports
> to user/DSA ports in a round robin way. So for the simplest case where
> we have one switch with N ports, 2 of them of type CPU connected to eth0
> and eth1, and the other ports labels being lan1, lan2, ..., the code
> assigns them to CPU ports this way:
>   lan1 <-> eth0
>   lan2 <-> eth1
>   lan3 <-> eth0
>   lan4 <-> eth1
>   lan5 <-> eth0

Hi Marek

That is what i've always argued is a good default. So i'm happy with
this.

> Patch 2 adds a new operation to the net device operations structure.
> Currently we use the iflink property of a net device to report to which
> CPU port a given switch port is connected to. The ip link utility from
> iproute2 reports this as "lan1@eth0". We add a new net device operation,
> ndo_set_iflink, which can be used to set this property. We call this
> function from the netlink handlers.

That is a new idea. Interesting.

I would like to look around and see what else uses this "lan1@eth0"
concept. We need to ensure it is not counter intuitive in general,
when you consider all possible users.

> Patch 3 implements this new ndo_set_iflink operation for DSA slave
> device. Thus the userspace can request a change of CPU port of a given
> port.

So this is all about transmit from the host out the switch. What about
receive? How do you tell the switch which CPU interface it should use
for a port?

    Thanks
	Andrew


* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Vladimir Oltean @ 2019-08-24 15:40 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David Ahern, Stephen Hemminger
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

Hi Marek,

On Sat, 24 Aug 2019 at 05:43, Marek Behún <marek.behun@nic.cz> wrote:
>
> Hi,
> this is my attempt to solve the multi-CPU port issue for DSA.
>
> Patch 1 adds code for handling multiple CPU ports in a DSA switch tree.
> If more than one CPU port is found in a tree, the code assigns CPU ports
> to user/DSA ports in a round robin way. So for the simplest case where
> we have one switch with N ports, 2 of them of type CPU connected to eth0
> and eth1, and the other ports labels being lan1, lan2, ..., the code
> assigns them to CPU ports this way:
>   lan1 <-> eth0
>   lan2 <-> eth1
>   lan3 <-> eth0
>   lan4 <-> eth1
>   lan5 <-> eth0
>   ...
>
> Patch 2 adds a new operation to the net device operations structure.
> Currently we use the iflink property of a net device to report to which
> CPU port a given switch port is connected to. The ip link utility from
> iproute2 reports this as "lan1@eth0". We add a new net device operation,
> ndo_set_iflink, which can be used to set this property. We call this
> function from the netlink handlers.
>
> Patch 3 implements this new ndo_set_iflink operation for DSA slave
> device. Thus the userspace can request a change of CPU port of a given
> port.
>
> I am also sending patch for iproute2-next, to add support for setting
> this iflink value.
>
> Marek
>

The topic is interesting.
This changeset leaves the reader wanting to see a driver
implementation of .port_change_cpu_port. (mostly to understand what
your hardware is capable of)
Will DSA assume that all CPU ports are equal in terms of tagging
protocol abilities? There are switches where one of the CPU ports can
do tagging and the other can't.
Is the static assignment between slave and CPU ports going to be the
only use case? What about link aggregation? Flow steering perhaps?
And like Andrew pointed out, how do you handle the receive case? What
happens to flooded frames, will the switch send them to both CPU
interfaces, and get received twice in Linux? How do you prevent that?

> Marek Behún (3):
>   net: dsa: allow for multiple CPU ports
>   net: add ndo for setting the iflink property
>   net: dsa: implement ndo_set_netlink for chaning port's CPU port
>
>  include/linux/netdevice.h |  5 +++
>  include/net/dsa.h         | 11 ++++-
>  net/core/dev.c            | 15 +++++++
>  net/core/rtnetlink.c      |  7 ++++
>  net/dsa/dsa2.c            | 84 +++++++++++++++++++++++++--------------
>  net/dsa/slave.c           | 35 ++++++++++++++++
>  6 files changed, 126 insertions(+), 31 deletions(-)
>
> --
> 2.21.0
>

Regards,
-Vladimir


* Re: [PATCH RFC net-next 1/3] net: dsa: allow for multiple CPU ports
From: Andrew Lunn @ 2019-08-24 15:43 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger
In-Reply-To: <20190824024251.4542-2-marek.behun@nic.cz>

> +static int dsa_tree_setup_default_cpus(struct dsa_switch_tree *dst)
>  {
>  	struct dsa_switch *ds;
>  	struct dsa_port *dp;
> -	int device, port;
> +	int device, port, i;
>  
> -	/* DSA currently only supports a single CPU port */
> -	dst->cpu_dp = dsa_tree_find_first_cpu(dst);
> -	if (!dst->cpu_dp) {
> +	dsa_tree_fill_cpu_ports(dst);
> +	if (!dst->num_cpu_dps) {
>  		pr_warn("Tree has no master device\n");
>  		return -EINVAL;
>  	}
>  
> -	/* Assign the default CPU port to all ports of the fabric */
> +	/* Assign the default CPU port to all ports of the fabric in a round
> +	 * robin way. This should work nicely for all sane switch tree designs.
> +	 */
> +	i = 0;
> +
>  	for (device = 0; device < DSA_MAX_SWITCHES; device++) {
>  		ds = dst->ds[device];
>  		if (!ds)
> @@ -238,18 +249,20 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst)
>  		for (port = 0; port < ds->num_ports; port++) {
>  			dp = &ds->ports[port];
>  
> -			if (dsa_port_is_user(dp) || dsa_port_is_dsa(dp))
> -				dp->cpu_dp = dst->cpu_dp;
> +			if (dsa_port_is_user(dp) || dsa_port_is_dsa(dp)) {
> +				dp->cpu_dp = dst->cpu_dps[i++];
> +				if (i == dst->num_cpu_dps)
> +					i = 0;
> +			}

Hi Marek

For a single switch, i think this is O.K, but when you have a cluster,
maybe a different allocation should be considered? If this switch has
a local CPU port, use it. Only round robin between remote CPU ports
when there is no local CPU port?

For a two switch setup and each switch having its own CPU port, your
allocation will cause half the CPU traffic to go across the DSA link
between the two switches. But we really want to keep the DSA link for
traffic between user ports on different switches.

But I don't know if it is worth the effort. I've never seen a "D in DSA"
setup with multiple CPU ports. I've only ever seen a single switch
with multiple CPU ports.
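
For reference, the "local first, round robin otherwise" allocation
suggested above could be sketched like this. It is a standalone
illustration, not a patch against net/dsa/dsa2.c: struct cpu_port,
assign_cpu_port() and the shared counter are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one CPU port in the tree. */
struct cpu_port {
	int sw;		/* index of the switch the CPU port lives on */
	int port;	/* port number on that switch */
};

/* Pick a CPU port for a user/DSA port on switch 'sw'.
 * CPU ports on the same switch are preferred (round robin among them);
 * only when there is none do we round robin over all CPU ports.
 */
static const struct cpu_port *assign_cpu_port(const struct cpu_port *cpus,
					      size_t num_cpus, int sw,
					      size_t *counter)
{
	size_t local = 0, want, i;

	assert(num_cpus > 0);

	for (i = 0; i < num_cpus; i++)
		if (cpus[i].sw == sw)
			local++;

	if (!local)
		return &cpus[(*counter)++ % num_cpus];

	/* Select the want-th CPU port that sits on this switch. */
	want = (*counter)++ % local;
	for (i = 0; i < num_cpus; i++)
		if (cpus[i].sw == sw && want-- == 0)
			return &cpus[i];

	return NULL;	/* not reached: local > 0 guarantees a match */
}
```

For the two-switch case with one CPU port each, every port then stays on
its own switch's CPU port and the DSA link carries no CPU traffic. For a
single switch with two CPU ports the result matches the plain round robin
in the patch.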


* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Vladimir Oltean @ 2019-08-24 15:44 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David Ahern, Stephen Hemminger
In-Reply-To: <CA+h21hpBKnueT0QrVDL=Hhcp9X0rnaPW8omxiegq4TkcQ18EVQ@mail.gmail.com>

On Sat, 24 Aug 2019 at 18:40, Vladimir Oltean <olteanv@gmail.com> wrote:
>
> Hi Marek,
>
> On Sat, 24 Aug 2019 at 05:43, Marek Behún <marek.behun@nic.cz> wrote:
> >
> > Hi,
> > this is my attempt to solve the multi-CPU port issue for DSA.
> >
> > Patch 1 adds code for handling multiple CPU ports in a DSA switch tree.
> > If more than one CPU port is found in a tree, the code assigns CPU ports
> > to user/DSA ports in a round robin way. So for the simplest case where
> > we have one switch with N ports, 2 of them of type CPU connected to eth0
> > and eth1, and the other ports labels being lan1, lan2, ..., the code
> > assigns them to CPU ports this way:
> >   lan1 <-> eth0
> >   lan2 <-> eth1
> >   lan3 <-> eth0
> >   lan4 <-> eth1
> >   lan5 <-> eth0
> >   ...
> >
> > Patch 2 adds a new operation to the net device operations structure.
> > Currently we use the iflink property of a net device to report to which
> > CPU port a given switch port is connected to. The ip link utility from
> > iproute2 reports this as "lan1@eth0". We add a new net device operation,
> > ndo_set_iflink, which can be used to set this property. We call this
> > function from the netlink handlers.
> >
> > Patch 3 implements this new ndo_set_iflink operation for DSA slave
> > device. Thus the userspace can request a change of CPU port of a given
> > port.
> >
> > I am also sending patch for iproute2-next, to add support for setting
> > this iflink value.
> >
> > Marek
> >
>
> The topic is interesting.
> This changeset leaves the reader wanting to see a driver
> implementation of .port_change_cpu_port. (mostly to understand what
> your hardware is capable of)
> Will DSA assume that all CPU ports are equal in terms of tagging
> protocol abilities? There are switches where one of the CPU ports can
> do tagging and the other can't.

Just to be clear. You can argue that such switches are weird, and
that's ok. Just want to understand the general type of hardware for
which such a patch is intended.

> Is the static assignment between slave and CPU ports going to be the
> only use case? What about link aggregation? Flow steering perhaps?
> And like Andrew pointed out, how do you handle the receive case? What
> happens to flooded frames, will the switch send them to both CPU
> interfaces, and get received twice in Linux? How do you prevent that?
>
> > Marek Behún (3):
> >   net: dsa: allow for multiple CPU ports
> >   net: add ndo for setting the iflink property
> >   net: dsa: implement ndo_set_netlink for chaning port's CPU port
> >
> >  include/linux/netdevice.h |  5 +++
> >  include/net/dsa.h         | 11 ++++-
> >  net/core/dev.c            | 15 +++++++
> >  net/core/rtnetlink.c      |  7 ++++
> >  net/dsa/dsa2.c            | 84 +++++++++++++++++++++++++--------------
> >  net/dsa/slave.c           | 35 ++++++++++++++++
> >  6 files changed, 126 insertions(+), 31 deletions(-)
> >
> > --
> > 2.21.0
> >
>
> Regards,
> -Vladimir


* Re: [PATCH RFC net-next 3/3] net: dsa: implement ndo_set_netlink for chaning port's CPU port
From: Andrew Lunn @ 2019-08-24 15:47 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger
In-Reply-To: <20190824024251.4542-4-marek.behun@nic.cz>

On Sat, Aug 24, 2019 at 04:42:50AM +0200, Marek Behún wrote:
> Implement ndo_set_iflink for DSA slave device. In multi-CPU port setup
> this should be used to change to which CPU destination port a given port
> should be connected.
> 
> This adds a new operation into the DSA switch operations structure,
> port_change_cpu_port. A driver implementing this function has the
> ability to change CPU destination port of a given port.
> 
> Signed-off-by: Marek Behún <marek.behun@nic.cz>
> ---
>  include/net/dsa.h |  6 ++++++
>  net/dsa/slave.c   | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 64bd70608f2f..4f3f0032b886 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -545,6 +545,12 @@ struct dsa_switch_ops {
>  	 */
>  	netdev_tx_t (*port_deferred_xmit)(struct dsa_switch *ds, int port,
>  					  struct sk_buff *skb);
> +
> +	/*
> +	 * Multi-CPU port support
> +	 */
> +	int	(*port_change_cpu_port)(struct dsa_switch *ds, int port,
> +					struct dsa_port *new_cpu_dp);
>  };

Hi Marek

We need to see an actual implementation of this. We don't add new APIs
without having a user.

	Andrew


* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Andrew Lunn @ 2019-08-24 15:56 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Marek Behún, netdev, Vivien Didelot, Florian Fainelli,
	David Ahern, Stephen Hemminger
In-Reply-To: <CA+h21hpBKnueT0QrVDL=Hhcp9X0rnaPW8omxiegq4TkcQ18EVQ@mail.gmail.com>

> Will DSA assume that all CPU ports are equal in terms of tagging
> protocol abilities? There are switches where one of the CPU ports can
> do tagging and the other can't.

Hi Vladimir

Given the current definition of what a CPU port is, we have to assume
the port is using tags. Frames have to be directed out a specific
egress port, otherwise things like BPDU, PTP will break. You cannot
rely on MAC address learning.

> Is the static assignment between slave and CPU ports going to be the
> only use case? What about link aggregation? Flow steering perhaps?
> And like Andrew pointed out, how do you handle the receive case? What
> happens to flooded frames, will the switch send them to both CPU
> interfaces, and get received twice in Linux? How do you prevent that?

I expect bad things will happen if frames are flooded to multiple CPU
ports. For this to work, the whole switch design needs to support
multiple CPU ports. I doubt this will work on any old switch.

Having a host interface connected to a user port of the switch is a
completely different use case, and not what this patchset is about.

	   Andrew


* [PATCH net 1/2] openvswitch: Properly set L4 keys on "later" IP fragments.
From: Justin Pettit @ 2019-08-24 16:58 UTC (permalink / raw)
  To: netdev, Pravin Shelar; +Cc: Joe Stringer

When IP fragments are reassembled before being sent to conntrack, the
key from the last fragment is used.  Unless there are reordering
issues, the last fragment received will not contain the L4 ports, so the
key for the reassembled datagram won't contain them.  This patch updates
the key once we have a reassembled datagram.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
---
 net/openvswitch/conntrack.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 848c6eb55064..f40ad2a42086 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -524,6 +524,10 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
 		return -EPFNOSUPPORT;
 	}

+	/* The key extracted from the fragment that completed this datagram
+	 * likely didn't have an L4 header, so regenerate it. */
+	ovs_flow_key_update(skb, key);
+
 	key->ip.frag = OVS_FRAG_TYPE_NONE;
 	skb_clear_hash(skb);
 	skb->ignore_df = 1;
-- 
2.17.1


* [PATCH net 2/2] openvswitch: Clear the L4 portion of the key for "later" fragments.
From: Justin Pettit @ 2019-08-24 16:58 UTC (permalink / raw)
  To: netdev, Pravin Shelar; +Cc: Joe Stringer
In-Reply-To: <20190824165846.79627-1-jpettit@ovn.org>

Only the first fragment in a datagram contains the L4 headers.  When the
Open vSwitch module parses a packet, it always sets the IP protocol
field in the key, but can only set the L4 fields on the first fragment.
The original behavior would not clear the L4 portion of the key, so
garbage values would be sent in the key for "later" fragments.  This
patch clears the L4 fields in that circumstance to prevent sending those
garbage values as part of the upcall.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
---
 net/openvswitch/flow.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index bc89e16e0505..0fb2cec08523 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -623,6 +623,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		offset = nh->frag_off & htons(IP_OFFSET);
 		if (offset) {
 			key->ip.frag = OVS_FRAG_TYPE_LATER;
+			memset(&key->tp, 0, sizeof(key->tp));
 			return 0;
 		}
 		if (nh->frag_off & htons(IP_MF) ||
@@ -740,8 +741,10 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 			return error;
 		}

-		if (key->ip.frag == OVS_FRAG_TYPE_LATER)
+		if (key->ip.frag == OVS_FRAG_TYPE_LATER) {
+			memset(&key->tp, 0, sizeof(key->tp));
 			return 0;
+		}
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
 			key->ip.frag = OVS_FRAG_TYPE_FIRST;

-- 
2.17.1


* Re: [PATCH RFC net-next 1/3] net: dsa: allow for multiple CPU ports
From: Marek Behun @ 2019-08-24 17:41 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger
In-Reply-To: <20190824154302.GB8251@lunn.ch>

On Sat, 24 Aug 2019 17:43:02 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> But I don't know if it is worth the effort. I've never seen a "D in DSA"
> setup with multiple CPU ports. I've only ever seen a single switch
> with multiple CPU ports.

Yes, exactly that. I was thinking about an optimal algorithm, but it
would also need to consider link speeds. For example, the DSA link
between two switches may run at 1 Gbps while the CPU is connected to
the switch at 2.5 Gbps. Which assignment is best in that case?

I think we should solve such an issue when it arises, if ever.
Such cases are one more reason to add the ability to change the CPU
port of a given port.

Marek


* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Marek Behun @ 2019-08-24 17:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger
In-Reply-To: <20190824152407.GA8251@lunn.ch>

On Sat, 24 Aug 2019 17:24:07 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> So this is all about transmit from the host out the switch. What about
> receive? How do you tell the switch which CPU interface it should use
> for a port?

Andrew, we use the same. The DSA slave implementation of ndo_set_iflink
will also tell the switch driver to change the CPU port for that port.
Patch 3 also adds operation port_change_cpu_port to the DSA switch
operations. This is called from dsa_slave_set_iflink (at least in this
first proposal).

Marek


* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Andrew Lunn @ 2019-08-24 17:54 UTC (permalink / raw)
  To: Marek Behun
  Cc: netdev, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger
In-Reply-To: <20190824194546.5c436bd6@nic.cz>

On Sat, Aug 24, 2019 at 07:45:46PM +0200, Marek Behun wrote:
> On Sat, 24 Aug 2019 17:24:07 +0200
> Andrew Lunn <andrew@lunn.ch> wrote:
> 
> > So this is all about transmit from the host out the switch. What about
> > receive? How do you tell the switch which CPU interface it should use
> > for a port?
> 
> Andrew, we use the same. The DSA slave implementation of ndo_set_iflink
> will also tell the switch driver to change the CPU port for that port.
> Patch 3 also adds operation port_change_cpu_port to the DSA switch
> operations. This is called from dsa_slave_set_iflink (at least in this
> first proposal).

Yes, i noticed this later. The cover letter did not include a change
to a driver, so it was not clear you had considered receive, which is
very much in the heart of the switch driver, not the DSA core.

     Andrew


* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Marek Behun @ 2019-08-24 17:55 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David Ahern, Stephen Hemminger
In-Reply-To: <CA+h21ho=injFxAkm9AByk6An5EzQMOyGVkFA8eKUP-rgGFEW2Q@mail.gmail.com>

On Sat, 24 Aug 2019 18:44:44 +0300
Vladimir Oltean <olteanv@gmail.com> wrote:

> Just to be clear. You can argue that such switches are weird, and
> that's ok. Just want to understand the general type of hardware for
> which such a patch is intended.

Vladimir,

the generic part should solve this for devices like Turris 1.x (qca8k)
and Turris Omnia (mv88e6xxx). In these devices the switch is connected
to the CPU via 2 ports, and 5 ports are connected to RJ-45s.

I answered Andrew's question about the receive path in my previous mail.
I still have to think about your other questions, but the general idea
is that for other types of frames the switch driver should use only one
CPU port, so that no frame reaches the CPU twice.

I shall send a proposed implementation for mv88e6xxx in the next version,
perhaps tonight.

Marek

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Marek Behun @ 2019-08-24 17:58 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vladimir Oltean, netdev, Vivien Didelot, Florian Fainelli,
	David Ahern, Stephen Hemminger
In-Reply-To: <20190824155636.GD8251@lunn.ch>

On Sat, 24 Aug 2019 17:56:36 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> I expect bad things will happen if frames are flooded to multiple CPU
> ports. For this to work, the whole switch design needs to support
> multiple CPU ports. I doubt this will work on any old switch.
> 
> Having a host interface connected to a user port of the switch is a
> completely different use case, and not what this patchset is about.

In the next proposal I shall also add a guard to all DSA drivers, so that
the driver will not probe if more than one CPU port is set.

After that, the next patch will try to add multi-CPU support to
mv88e6xxx (while removing the guard for that driver).
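
A minimal sketch of such a probe-time guard, written as plain C rather
than kernel code. The demo_* names, the port-type enum and the
multi_cpu_ok flag are assumptions made for illustration; they are not
existing DSA structures, and the real guard may look quite different:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port roles; the real DSA core has its own port types. */
enum demo_port_type { DEMO_PORT_UNUSED, DEMO_PORT_USER, DEMO_PORT_CPU };

/* Count how many ports of the switch are marked as CPU ports. */
static size_t demo_count_cpu_ports(const enum demo_port_type *ports, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (ports[i] == DEMO_PORT_CPU)
			count++;
	return count;
}

/* Refuse to probe when several CPU ports are described but the driver
 * has not been taught to handle them (multi_cpu_ok == false).
 */
static int demo_probe_guard(const enum demo_port_type *ports, size_t n,
			    bool multi_cpu_ok)
{
	if (demo_count_cpu_ports(ports, n) > 1 && !multi_cpu_ok)
		return -EINVAL;
	return 0;
}
```

A driver that gains multi-CPU support would then pass multi_cpu_ok as
true, which corresponds to removing the guard for that driver.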

qca8k should also be possible to do, since we used it in such a way in
openwrt. I shall look into that afterwards.

Marek

^ permalink raw reply

* Re: [PATCH net-next v2 3/9] net: dsa: mv88e6xxx: fix port hidden register macros
From: Vivien Didelot @ 2019-08-24 19:32 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190823212603.13456-4-marek.behun@nic.cz>

Hi Marek,

On Fri, 23 Aug 2019 23:25:57 +0200, Marek Behún <marek.behun@nic.cz> wrote:
>  /* Offset 0x1a: Magic undocumented errata register */

   /* Offset 0x1A: Reserved */

(nitpicking here, for consistency with the other definitions, as shown in the docs.)

> -#define PORT_RESERVED_1A			0x1a
> -#define PORT_RESERVED_1A_BUSY			BIT(15)
> -#define PORT_RESERVED_1A_WRITE			BIT(14)
> -#define PORT_RESERVED_1A_READ			0
> -#define PORT_RESERVED_1A_PORT_SHIFT		5
> -#define PORT_RESERVED_1A_BLOCK			(0xf << 10)
> -#define PORT_RESERVED_1A_CTRL_PORT		4
> -#define PORT_RESERVED_1A_DATA_PORT		5
> +#define MV88E6XXX_PORT_RESERVED_1A		0x1a
> +#define MV88E6XXX_PORT_RESERVED_1A_BUSY		0x8000
> +#define MV88E6XXX_PORT_RESERVED_1A_WRITE	0x4000
> +#define MV88E6XXX_PORT_RESERVED_1A_READ		0x0000
> +#define MV88E6XXX_PORT_RESERVED_1A_PORT_SHIFT	5
> +#define MV88E6XXX_PORT_RESERVED_1A_BLOCK	0x3c00
> +#define MV88E6XXX_PORT_RESERVED_1A_CTRL_PORT	0x04
> +#define MV88E6XXX_PORT_RESERVED_1A_DATA_PORT	0x05

You are already using these macros in the previous patch. I guess you meant
to introduce this patch before. But since you are moving and renaming the
same code without functional changes, you may squash them together.
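
That the renamed macros carry the same values as the old ones can be
checked mechanically. The sketch below is standalone C, not part of the
series; DEMO_BIT is a local stand-in for the kernel's BIT() helper and
demo_hidden_macros_match() is a made-up name:

```c
/* Local stand-in for the kernel's BIT() helper. */
#define DEMO_BIT(n)	(1u << (n))

/* Old spellings versus the new literal values from the quoted hunk. */
_Static_assert(DEMO_BIT(15) == 0x8000, "BUSY value is unchanged");
_Static_assert(DEMO_BIT(14) == 0x4000, "WRITE value is unchanged");
_Static_assert((0xf << 10) == 0x3c00, "BLOCK mask is unchanged");

/* Run-time variant of the same comparison, returning 1 when all match. */
static int demo_hidden_macros_match(void)
{
	return DEMO_BIT(15) == 0x8000 &&
	       DEMO_BIT(14) == 0x4000 &&
	       (0xf << 10) == 0x3c00;
}
```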


Thanks,

	Vivien

^ permalink raw reply

* Re: [PATCH net-next v2 4/9] net: dsa: mv88e6xxx: create chip->info->ops->serdes_get_lane method
From: Vivien Didelot @ 2019-08-24 19:45 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190823212603.13456-5-marek.behun@nic.cz>

Hi Marek,

On Fri, 23 Aug 2019 23:25:58 +0200, Marek Behún <marek.behun@nic.cz> wrote:
> +	/* SERDES lane mapping */
> +	int (*serdes_get_lane)(struct mv88e6xxx_chip *chip, int port);

I would prefer to keep the return code strictly for error checking as commonly
used in the driver:

    int (*serdes_get_lane)(struct mv88e6xxx_chip *chip, int port, int *lane);

Also the "lane" seems to be an address, so maybe u8 or u16 is more appropriate?
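
To illustrate the calling convention, here is a standalone sketch with
the lane returned through an out-parameter and the return value kept for
errors only. All demo_* names, the uint8_t lane type and the port 9 to
lane 0x09 mapping are made up for the example; none of this is taken
from the driver:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_chip;

struct demo_ops {
	/* Returns 0 and fills *lane on success, a negative errno otherwise. */
	int (*serdes_get_lane)(struct demo_chip *chip, int port, uint8_t *lane);
};

struct demo_chip {
	const struct demo_ops *ops;
};

/* Made-up mapping: only port 9 has a SERDES lane, at address 0x09. */
static int demo_example_get_lane(struct demo_chip *chip, int port,
				 uint8_t *lane)
{
	(void)chip;

	if (port != 9)
		return -ENODEV;

	*lane = 0x09;
	return 0;
}

static const struct demo_ops demo_example_ops = {
	.serdes_get_lane = demo_example_get_lane,
};

/* Thin wrapper: dispatch to the per-chip op if one is provided. */
static inline int demo_serdes_get_lane(struct demo_chip *chip, int port,
				       uint8_t *lane)
{
	if (!chip->ops || !chip->ops->serdes_get_lane)
		return -EOPNOTSUPP;

	return chip->ops->serdes_get_lane(chip, port, lane);
}
```

With this shape a caller can write "err = ...; if (err) return err;"
without having to tell a negative lane apart from an error code.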


Thanks,

	Vivien

^ permalink raw reply

* Re: [PATCH net-next v2 4/9] net: dsa: mv88e6xxx: create chip->info->ops->serdes_get_lane method
From: Vivien Didelot @ 2019-08-24 19:52 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190824154502.GD32555@t480s.localdomain>

Also can you place the mv88e6xxx_serdes_get_lane() function as static inline
in the serdes.h header? So that it's obvious that it's a wrapper and not a
switch implementation.

Oh, and you can skip the 'chip->info->ops->' from the commit subject line ;-)

^ permalink raw reply

* Re: [PATCH net-next 6/6] net: dsa: clear VLAN flags for CPU port
From: Vladimir Oltean @ 2019-08-24 19:53 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Vivien Didelot, netdev, David S. Miller, Andrew Lunn
In-Reply-To: <3c88db34-464a-1ab7-a525-66791faad698@gmail.com>

Hi Florian,

On Fri, 23 Aug 2019 at 20:00, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> On 8/22/19 4:51 PM, Vladimir Oltean wrote:
> > On 8/22/19 11:13 PM, Vivien Didelot wrote:
> >> When the bridge offloads a VLAN on a slave port, we also need to
> >> program its dedicated CPU port as a member of the VLAN.
> >>
> >> Drivers may handle the CPU port's membership as they want. For example,
> >> Marvell has a special "Unmodified" mode to pass frames as is through
> >> such ports.
> >>
> >> Even though DSA expects the drivers to handle the CPU port membership,
> >> they are unlikely to program such VLANs untagged, and certainly not as
> >> PVID. This patch clears the VLAN flags before programming the CPU port.
> >>
> >> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
> >> Suggested-by: Vladimir Oltean <olteanv@gmail.com>
> >> ---
> >>   net/dsa/slave.c | 6 ++++++
> >>   1 file changed, 6 insertions(+)
> >>
> >> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> >> index 8267c156a51a..48df48f76c67 100644
> >> --- a/net/dsa/slave.c
> >> +++ b/net/dsa/slave.c
> >> @@ -332,6 +332,12 @@ static int dsa_slave_vlan_add(struct net_device
> >> *dev,
> >>       if (err)
> >>           return err;
> >>   +    /* We need the dedicated CPU port to be a member of the VLAN as
> >> well.
> >> +     * Even though drivers often handle CPU membership in special ways,
> >> +     * CPU ports are likely to be tagged, so clear the VLAN flags.
> >> +     */
> >> +    vlan.flags = 0;
> >> +
> >
> > How does this work exactly?
> > If I run 'sudo bridge vlan add vid 1 dev swp4 pvid untagged', then the
> > CPU port starts sending VLAN-tagged traffic. I see this in tcpdump on
> > the DSA master port, but if I tcpdump on swp4, the VLAN tag is removed.
> > Who is doing that?
>
> If vlan.flags = 0, then it does not have either BRIDGE_VLAN_INFO_PVID or
> BRIDGE_VLAN_INFO_UNTAGGED which means the VLAN should be programmed
> tagged on the CPU.
>
> Since swp4 is part of the same VLAN, but has it configured PVID
> untagged, the tag is removed, that sounds about what I would expect to
> see...
> --
> Florian

The VLAN is "egress untagged", and "ingress tagged" (at least so it
becomes with this patch). Of course in tcpdump I was looking for
ingress traffic.
This patch is relying now on __netif_receive_skb_core[1] to remove the
VLAN header from frames as soon as they exit the DSA master and before
they enter the DSA packet_type handler. My point is that even untagged
traffic gets pvid-tagged on ingress, and the net core has to remove
the tag when it previously didn't have to. I'm not sure of other
implications.
Vivien, can't you just unset the PVID flag? Keeping the same
tagged/untagged setting on ingress as on egress does make more sense.
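
The difference between the two options can be sketched in standalone C.
The two flag values are copied from include/uapi/linux/if_bridge.h so
the example builds outside the kernel; the demo_* helpers are
hypothetical and only model what happens to the flags word:

```c
#include <stdint.h>

/* Values mirror include/uapi/linux/if_bridge.h. */
#define BRIDGE_VLAN_INFO_PVID		(1 << 1)
#define BRIDGE_VLAN_INFO_UNTAGGED	(1 << 2)

/* What the patch does: drop every flag, so the CPU port becomes a
 * tagged, non-PVID member of the VLAN.
 */
static uint16_t demo_cpu_flags_clear_all(uint16_t flags)
{
	(void)flags;
	return 0;
}

/* The alternative: drop only PVID and keep the untagged setting. */
static uint16_t demo_cpu_flags_clear_pvid(uint16_t flags)
{
	return flags & (uint16_t)~BRIDGE_VLAN_INFO_PVID;
}
```

For a port configured "pvid untagged", the first helper yields 0 while
the second yields BRIDGE_VLAN_INFO_UNTAGGED.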

Regards,
-Vladimir

[1]: https://elixir.bootlin.com/linux/latest/source/net/core/dev.c#L4898

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Florian Fainelli @ 2019-08-24 20:04 UTC (permalink / raw)
  To: Marek Behún, netdev
  Cc: Andrew Lunn, Vivien Didelot, David Ahern, Stephen Hemminger,
	Chris Healy, Vladimir Oltean
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

On 8/23/2019 7:42 PM, Marek Behún wrote:
> Hi,
> this is my attempt to solve the multi-CPU port issue for DSA.
> 
> Patch 1 adds code for handling multiple CPU ports in a DSA switch tree.
> If more than one CPU port is found in a tree, the code assigns CPU ports
> to user/DSA ports in a round robin way. So for the simplest case where
> we have one switch with N ports, 2 of them of type CPU connected to eth0
> and eth1, and the other ports being labelled lan1, lan2, ..., the code
> assigns them to CPU ports this way:
>   lan1 <-> eth0
>   lan2 <-> eth1
>   lan3 <-> eth0
>   lan4 <-> eth1
>   lan5 <-> eth0
>   ...
> 
> Patch 2 adds a new operation to the net device operations structure.
> Currently we use the iflink property of a net device to report which
> CPU port a given switch port is connected to. The ip link utility from
> iproute2 reports this as "lan1@eth0". We add a new net device operation,
> ndo_set_iflink, which can be used to set this property. We call this
> function from the netlink handlers.
> 
> Patch 3 implements this new ndo_set_iflink operation for DSA slave
> device. Thus the userspace can request a change of CPU port of a given
> port.
> 
> I am also sending patch for iproute2-next, to add support for setting
> this iflink value.
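
The round-robin assignment described for Patch 1 can be sketched as
follows. This is a standalone illustration, not code from the series;
the demo_* names and the fixed two-entry master table are assumptions
matching the eth0/eth1 example above:

```c
#include <stddef.h>

/* The two DSA masters from the example in the cover letter. */
static const char *const demo_masters[] = { "eth0", "eth1" };

/* Pick a CPU port index for the user port at position user_idx
 * (0-based, so lan1 is index 0), cycling through the CPU ports.
 */
static size_t demo_assign_cpu_port(size_t user_idx, size_t n_cpu_ports)
{
	return user_idx % n_cpu_ports;
}

/* Convenience lookup for the two-master case. */
static const char *demo_master_for(size_t user_idx)
{
	return demo_masters[demo_assign_cpu_port(user_idx, 2)];
}
```

With two CPU ports, indices 0, 2 and 4 (lan1, lan3, lan5) map to eth0
and indices 1 and 3 (lan2, lan4) map to eth1.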

This is going to be a long email, broken into several parts; feel free
to skip or reply to whichever parts you like.

- review/comments on your approach here
- history of multiple CPU ports within Broadcom switches
- specific use case for a device that uses upstream drivers and a
Broadcom switch (BCM7278)

1) Your approach is kind of interesting here, not sure if it is the best
but it is not outright wrong. In the past, we had been talking about
different approaches, some of which seemed too simplistic or too narrow
on the use case, and some of which are still up in the air and have not
been worked on.

- John Crispin submitted a patch series for the MTK switch driver a
while back that was picked up by Frank Wunderlich more recently. This
approach uses a Device Tree based configuration in order to statically
assign ports, or groups of ports to a specific DSA master device. This
is IMHO wrong because a) DT is not to impose a policy but strictly
describe HW, and b) there was no way to change that static assignment at
runtime.

- Based on that patch series, Andrew, Vivien, Frank and myself discussed
two possible options:
	- allowing the enslaving of DSA master devices in the bridge, so as to
provide a hint that specific DSA slave network devices should be
"bound"/"linked" to a specific DSA master device. This requires
modifications in the bridge layer to avoid undoing what commit
8db0a2ee2c6302a1dcbcdb93cb731dfc6c0cdb5e ("net: bridge: reject
DSA-enabled master netdevices as bridge members") did. This would also
require a bridge to be set up.

	- enhancing the iproute2 command and backing kernel code in order to
allow defining that a DSA slave device may be enslaved into a specific
DSA master, similarly to how you currently enslave a device into a
bridge, you could "enslave" a DSA slave to a DSA master with something
that could look like this:

	ip link set dev sw0p0 master eth0	# Associate port 0 with eth0
	ip link set dev sw0p1 master eth1	# Associate port 1 with eth1

To date, this may be the best way to do what we want here, rather than
using iflink, which amounts to re-purposing something that is not exactly
meant for that.

2) With Broadcom Ethernet switches there have historically been a few
designs that allowed the following:

- 6 port(s) switches: port 5 was the CPU port, always and supports
tagging (EthType + 4bytes Broadcom tag). This is the 5325, 5365 class of
switches, they are nearly 20 years old now.

- 9 port(s) switches: both port 5 and port 8 could be defined as In-band
Management Port(s) (IMP); port 5 is IMP1 and port 8 is IMP0 by
default, and the preference is to use IMP0. Tagging is only supported on those
two ports. These are the 5395, 53125 and similar switches. Port 5 is
typically meant to be connected to a WAN interface where it might be
necessary to run some specific management protocol like 802.1x and such
that would require a managed switch to assist with those tasks. Port 5
can do most of what port 8 does, except when it comes to classification
and specific remapping rules, that limitation is carried forward with
all switches described below.

- 9 port(s) switches: port 5, 7 or 8 support tagging, with port 5 and 8
being possible management ports. Tagging is permitted on port 7 to
provide in-band information about packets (e.g.: classification ID, QoS,
etc.) to an "accelerator" (whatever it is), or a MoCA interface behind
port 7. This is the BCM5301X class, the NorthStar Plus (58xxx), and the
BCM7445 switches. Tagging can be enabled on RX, or TX, or both.

- 9 port(s) switches: all ports support tagging, with RX only, TX only,
or both directions supporting tagging. Again, port 8 remains the default
and preferred management port. This is the BCM7278 class of switches.

What needs to be considered here is that while multiple CPU ports may be
defined, if say, both port 8 and port 5 are defined, it is preferable to
continue using port 8 as the default management port because it is the
most capable, therefore having a switch driver callback that allows us
to elect the most suitable CPU/management port might be a good idea. If
other switches treat all CPU ports equal, no need to provide that callback.
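
A sketch of what such an election callback could look like, in
standalone C. Every name here is hypothetical (there is no such DSA
operation today), and falling back to the first listed CPU port when the
driver expresses no preference is an assumption of the example:

```c
#include <stddef.h>

/* Hypothetical per-driver hook: return the preferred port out of the
 * candidates, or a negative value to express no preference.
 */
typedef int (*demo_preferred_cpu_port_fn)(const int *cpu_ports, size_t n);

/* Broadcom-style policy from the text: use port 8 whenever available. */
static int demo_bcm_preferred_cpu_port(const int *cpu_ports, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (cpu_ports[i] == 8)
			return 8;
	return -1;
}

/* Core-side selection: ask the driver if it cares, else take the first. */
static int demo_elect_cpu_port(const int *cpu_ports, size_t n,
			       demo_preferred_cpu_port_fn preferred)
{
	int port;

	if (n == 0)
		return -1;

	if (preferred) {
		port = preferred(cpu_ports, n);
		if (port >= 0)
			return port;
	}
	return cpu_ports[0];
}
```

Drivers that treat all CPU ports as equal would simply pass no callback.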

3) On the BCM7278 system we have 3 external ports wired, 1 internal port
to an audio/video streaming accelerator and 2 ports wired to Ethernet
MACs, on port 8 and port 5. An ASCII diagram looks like this:

-------------------	-------------------
|SYSTEMPORT Lite 0|	|SYSTEMPORT Lite 1|
-------------------	-------------------
	|			|
	|			|
-------------------------------------------------
| 	Port 8			Port 5		|
|						|     ----------------
|					Port 7	|-----| A/V streaming|
|						|     ----------------
| Port 0   Port 1   Port 2			|
------------------------------------------------|
   GPHY    RGMII_1  RGMII_2

GPHY is an integrated Gigabit PHY; RGMII_1 and RGMII_2 connect to
external MII/RevMII/GMII/RGMII PHYs.

The Device Tree for that system declares both CPU ports by providing a
phandle to the respective Ethernet MAC controllers. Now, depending on
the kernel version though, you may have different behaviors:

4.9 is our downstream production kernel ATM and on such a system you have:

DSA master:
eth0 (port 8)

DSA slaves:
gphy (port 0)
rgmii_1 (port 1)
rgmii_2 (port 2)
asp (port 7)
wifi (port 5)

Standard interface:
eth1 (port 5)

And here you should be like: hold on Florian, you have two interfaces
that each represent one side of the pipe (wifi, eth1), is not that
counter to the very DSA principles?

On an upstream kernel though, eth0 is still present, but eth1, because
it is connected to port 5 and thus has a lower number, gets chosen as
the DSA master, and then the "wifi" interface is not created at all. All
of that is expected.

Now, the 4.9 kernel behavior actually works just fine because eth1 is
not a special interface, so no tagging is expected, and "wifi", although
it supports DSA tagging, represents another side of the CPU/host network
stack, so you never have to inject frames into the switch, because you
can use eth1 to do that and let MAC learning do its job to forward to
the correct port of the switch.

Likewise, for a frame ingressing port 0 for a MAC address that is behind
port 5 that works too. The "wifi" interface here acts as a control
interface that allows us to have a configuration end-point for port
number 5.

The typical set-up we have involves two bridge devices:

br-lan spans port 0, port 1, port 2, port 7 and port 5 and takes care of
putting all of these ports in the same broadcast domain

br-wifi spans eth1 and another device, e.g: wlan0 and takes care of
bridging LAN to WLAN

Now, let's say your use case involves doubling the bandwidth for
routing/NAT and you have two CPU ports for that purpose. You could use
exactly that same set-up as described, and create a LAN bridge that does
not span the switch port to which your second Ethernet MAC is
connected. Leave that WAN port as a standalone DSA device. Although you do need
a way to indicate somehow that port X connects to Ethernet MAC Y, which
Device Tree already provides.

Giving configuration control such that you can arbitrarily assign DSA
slaves to a given DSA master is fine, but what problems does it
potentially create?
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next v2 8/9] net: dsa: mv88e6xxx: support Block Address setting in hidden registers
From: Vivien Didelot @ 2019-08-24 20:13 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190823212603.13456-9-marek.behun@nic.cz>

Hi Marek,

On Fri, 23 Aug 2019 23:26:02 +0200, Marek Behún <marek.behun@nic.cz> wrote:
> -int mv88e6xxx_port_hidden_write(struct mv88e6xxx_chip *chip, int port, int reg,
> -				u16 val);
> +int mv88e6xxx_port_hidden_write(struct mv88e6xxx_chip *chip, int block, int port,
> +				int reg, u16 val);
>  int mv88e6xxx_port_hidden_wait(struct mv88e6xxx_chip *chip);
> -int mv88e6xxx_port_hidden_read(struct mv88e6xxx_chip *chip, int port, int reg,
> -			       u16 *val);
> +int mv88e6xxx_port_hidden_read(struct mv88e6xxx_chip *chip, int block, int port,
> +			       int reg, u16 *val);

There's something I'm having trouble following here. This series keeps
adding and modifying its own code. Wouldn't it be simpler for everyone
if you directly implement the final mv88e6xxx_port_hidden_{read,write}
functions taking this block argument, and update the code to switch to it?

While at it, I don't really mind the "hidden" name, but is this the name
used in the documentation, if any?

Thanks for your patience,

	Vivien

^ permalink raw reply

* Re: [PATCH] af_unix: utilize skb's fragment list for sending large datagrams
From: Denis Lunev @ 2019-08-24 20:38 UTC (permalink / raw)
  To: David Miller, Jan Dakinevich
  Cc: linux-kernel@vger.kernel.org, Konstantin Khorenko,
	pabeni@redhat.com, viro@zeniv.linux.org.uk, axboe@kernel.dk,
	hare@suse.com, kgraul@linux.ibm.com, kyeongdon.kim@lge.com,
	tglx@linutronix.de, netdev@vger.kernel.org
In-Reply-To: <20190822.120421.71092037400077946.davem@davemloft.net>

On 8/22/19 9:04 PM, David Miller wrote:
> From: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
> Date: Thu, 22 Aug 2019 10:38:39 +0000
>
>> However, paged part can not exceed MAX_SKB_FRAGS * PAGE_SIZE, and large
>> datagram causes increasing skb's data buffer. Thus, if any user-space
>> program sets send buffer (by calling setsockopt(SO_SNDBUF, ...)) to
>> maximum allowed size (wmem_max) it becomes able to cause any amount
>> of uncontrolled high-order kernel allocations.
> So?  You want huge SKBs you get the high order allocations, seems
> rather reasonable to me.
>
> SKBs using fragment lists are the most difficult and cpu intensive
> geometry for an SKB to have and we should avoid using it where
> feasible.
>
> I don't want to apply this, sorry.

Under even mediocre memory pressure this will either take seconds or fail,
which does not look good. We can try to allocate high-order memory,
but not that hard, and switch to fragments when possible.

Please also note that even an ordinary user could trigger really big
allocations and thus force the whole node to dance.

Den

^ permalink raw reply

* Re: pull-request: ieee802154 for net 2019-08-24
From: David Miller @ 2019-08-24 20:47 UTC (permalink / raw)
  To: stefan; +Cc: linux-wpan, alex.aring, netdev
In-Reply-To: <20190824121953.27839-1-stefan@datenfreihafen.org>

From: Stefan Schmidt <stefan@datenfreihafen.org>
Date: Sat, 24 Aug 2019 14:19:53 +0200

> An update from ieee802154 for your *net* tree.
> 
> Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for
> ieee802154 and Colin Ian King cleaned up a redundant variable assignment.
> 
> If there are any problems let me know.

Pulled, thank you.

^ permalink raw reply

* Re: [PATCH net-next v2 8/9] net: dsa: mv88e6xxx: support Block Address setting in hidden registers
From: Marek Behun @ 2019-08-24 20:52 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean
In-Reply-To: <20190824161328.GI32555@t480s.localdomain>

On Sat, 24 Aug 2019 16:13:28 -0400
Vivien Didelot <vivien.didelot@gmail.com> wrote:

> Hi Marek,
> 
> On Fri, 23 Aug 2019 23:26:02 +0200, Marek Behún <marek.behun@nic.cz> wrote:
> > -int mv88e6xxx_port_hidden_write(struct mv88e6xxx_chip *chip, int port, int reg,
> > -				u16 val);
> > +int mv88e6xxx_port_hidden_write(struct mv88e6xxx_chip *chip, int block, int port,
> > +				int reg, u16 val);
> >  int mv88e6xxx_port_hidden_wait(struct mv88e6xxx_chip *chip);
> > -int mv88e6xxx_port_hidden_read(struct mv88e6xxx_chip *chip, int port, int reg,
> > -			       u16 *val);
> > +int mv88e6xxx_port_hidden_read(struct mv88e6xxx_chip *chip, int block, int port,
> > +			       int reg, u16 *val);  
> 
> 
> There's something I'm having trouble following here. This series keeps
> adding and modifying its own code. Wouldn't it be simpler for everyone
> if you directly implement the final mv88e6xxx_port_hidden_{read,write}
> functions taking this block argument, and update the code to switch to it?

I wanted the commits to be atomic, in the sense that one commit does
not do three different things at once. Renaming macros is a cosmetic
change, and moving functions to another file is not a semantic
change, while adding an additional argument to functions is a semantic
change. I can of course do it all in one patch, but I thought it would be
better not to.

> While at it, I don't really mind the "hidden" name, but is this the name
> used in the documentation, if any?

Yes, the registers are indeed named Hidden Registers in the documentation.

^ permalink raw reply

* Re: [PATCH net-next v2 3/9] net: dsa: mv88e6xxx: fix port hidden register macros
From: Marek Behun @ 2019-08-24 20:54 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean
In-Reply-To: <20190824153254.GB32555@t480s.localdomain>

On Sat, 24 Aug 2019 15:32:54 -0400
Vivien Didelot <vivien.didelot@gmail.com> wrote:

> You are already using these macros in the previous patch. I guess you meant
> to introduce this patch before. But since you are moving and renaming the
> same code without functional changes, you may squash them together.

Hm, you are right, I accidentally created a commit which would not
build. :( I thought that I tried to build after each commit, but it
seems I forgot at least one.

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Marek Behun @ 2019-08-24 21:01 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, Andrew Lunn, Vivien Didelot, David Ahern,
	Stephen Hemminger, Chris Healy, Vladimir Oltean
In-Reply-To: <a7fed8ab-60f3-a30c-5634-fd89e4daf44d@gmail.com>

On Sat, 24 Aug 2019 13:04:04 -0700
Florian Fainelli <f.fainelli@gmail.com> wrote:

> 1) Your approach is kind of interesting here, not sure if it is the best
> but it is not outright wrong. In the past, we had been talking about
> different approaches, some of which seemed too simplistic or too narrow
> on the use case, and some of which are still up in the air and have not
> been worked on.
> 
> - John Crispin submitted a patch series for the MTK switch driver a
> while back that was picked up by Frank Wunderlich more recently. This
> approach uses a Device Tree based configuration in order to statically
> assign ports, or groups of ports to a specific DSA master device. This
> is IMHO wrong because a) DT is not to impose a policy but strictly
> describe HW, and b) there was no way to change that static assignment at
> runtime.
> 
> - Based on that patch series, Andrew, Vivien, Frank and myself discussed
> two possible options:
> 	- allowing the enslaving of DSA master devices in the bridge, so as to
> provide a hint that specific DSA slave network devices should be
> "bound"/"linked" to a specific DSA master device. This requires
> modifications in the bridge layer to avoid undoing what commit
> 8db0a2ee2c6302a1dcbcdb93cb731dfc6c0cdb5e ("net: bridge: reject
> DSA-enabled master netdevices as bridge members") did. This would also
> require a bridge to be set up.
> 
> 	- enhancing the iproute2 command and backing kernel code in order to
> allow defining that a DSA slave device may be enslaved into a specific
> DSA master, similarly to how you currently enslave a device into a
> bridge, you could "enslave" a DSA slave to a DSA master with something
> that could look like this:
> 
> 	ip link set dev sw0p0 master eth0	# Associate port 0 with eth0
> 	ip link set dev sw0p1 master eth1	# Associate port 1 with eth1
> 
> To date, this may be the best way to do what we want here, rather than
> using iflink, which amounts to re-purposing something that is not exactly
> meant for that.

We cannot use "set master" to set the CPU port, since that is used for
enslaving interfaces to bridges. There are use cases where these would
conflict with each other. The semantics would become complicated and
the documentation would become confusing to users.

We are *already* using the iflink property to report which CPU device
is used as the CPU destination port for a given switch slave interface. So
why not use that for changing this, too?

If you think that iflink should not be used for this, and others agree,
then we should create a new property, something like dsa-upstream (e.g.
ip link set dev sw0p0 dsa-upstream eth0). Using the "master" property
is not right, IMO.

Marek

^ permalink raw reply
