Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 2/7] net: dsa: mv88e6xxx: add ability to set default queue priorities per port
From: Vivien Didelot @ 2019-09-10 16:43 UTC (permalink / raw)
  To: Robert Beckett
  Cc: netdev, Robert Beckett, Andrew Lunn, Florian Fainelli,
	David S. Miller
In-Reply-To: <20190910154238.9155-3-bob.beckett@collabora.com>

Hi Robert,

On Tue, 10 Sep 2019 16:41:48 +0100, Robert Beckett <bob.beckett@collabora.com> wrote:
> +static int mv88e6xxx_set_port_defqpri(struct mv88e6xxx_chip *chip, int port)
> +{
> +	struct dsa_switch *ds = chip->ds;
> +	struct device_node *dn = ds->ports[port].dn;
> +	int err;
> +	u32 pri;
> +
> +	if (!dn || !chip->info->ops->port_set_defqpri)
> +		return 0;
> +
> +	err = of_property_read_u32(dn, "defqpri", &pri);
> +	if (err < 0)
> +		return 0;
> +
> +	return chip->info->ops->port_set_defqpri(chip, port, (u16)pri);
> +}
> +
>  static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  {
>  	struct dsa_switch *ds = chip->ds;
> @@ -2176,6 +2193,10 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  			return err;
>  	}
>  
> +	err = mv88e6xxx_set_port_defqpri(chip, port);
> +	if (err)
> +		return err;
> +
>  	/* Port Association Vector: when learning source addresses
>  	 * of packets, add the address to the address database using
>  	 * a port bitmap that has only the bit for this port set and
> @@ -3107,6 +3128,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> +	.port_set_defqpri = mv88e6xxx_port_set_defqpri,

Please use a reference model, like mv88e6352_port_set_defqpri to avoid
confusion with a generic wrapper or implementation.

>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3190,6 +3212,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> +	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3407,6 +3430,7 @@ static const struct mv88e6xxx_ops mv88e6240_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> +	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3750,6 +3774,7 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> +	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
> index 4646e46d47f2..2d2c24f5a79d 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.h
> +++ b/drivers/net/dsa/mv88e6xxx/chip.h
> @@ -383,6 +383,7 @@ struct mv88e6xxx_ops {
>  				   u16 etype);
>  	int (*port_set_jumbo_size)(struct mv88e6xxx_chip *chip, int port,
>  				   size_t size);
> +	int (*port_set_defqpri)(struct mv88e6xxx_chip *chip, int port, u16 pri);

The default queue priority seems to be an integer in the [0:3] range, not
a register mask or value per-se. So an unsigned int seems more appropriate.

>  
>  	int (*port_egress_rate_limiting)(struct mv88e6xxx_chip *chip, int port);
>  	int (*port_pause_limit)(struct mv88e6xxx_chip *chip, int port, u8 in,
> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> index 04309ef0a1cc..3a45fcd5cd9c 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.c
> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> @@ -1147,6 +1147,25 @@ int mv88e6165_port_set_jumbo_size(struct mv88e6xxx_chip *chip, int port,
>  	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_CTL2, reg);
>  }
>  
> +int mv88e6xxx_port_set_defqpri(struct mv88e6xxx_chip *chip, int port, u16 pri)
> +{
> +	u16 reg;
> +	int err;
> +
> +	if (pri > 3)
> +		return -EINVAL;
> +
> +	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_CTL2, &reg);
> +	if (err)
> +		return err;
> +
> +	reg &= ~MV88E6XXX_PORT_CTL2_DEFQPRI_MASK;
> +	reg |= pri << MV88E6XXX_PORT_CTL2_DEFQPRI_SHIFT;

                      __bf_shf(MV88E6XXX_PORT_CTL2_DEFQPRI_MASK)

> +	reg |= MV88E6XXX_PORT_CTL2_USE_DEFQPRI;
> +
> +	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_CTL2, reg);
> +}
> +
>  /* Offset 0x09: Port Rate Control */
>  
>  int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port)
> diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
> index 8d5a6cd6fb19..03884bbaa762 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.h
> +++ b/drivers/net/dsa/mv88e6xxx/port.h
> @@ -197,6 +197,9 @@
>  #define MV88E6XXX_PORT_CTL2_DEFAULT_FORWARD		0x0040
>  #define MV88E6XXX_PORT_CTL2_EGRESS_MONITOR		0x0020
>  #define MV88E6XXX_PORT_CTL2_INGRESS_MONITOR		0x0010
> +#define MV88E6XXX_PORT_CTL2_USE_DEFQPRI		0x0008
> +#define MV88E6XXX_PORT_CTL2_DEFQPRI_MASK		0x0006
> +#define MV88E6XXX_PORT_CTL2_DEFQPRI_SHIFT		1

No shift macro needed, MV88E6XXX_PORT_CTL2_DEFQPRI_MASK is enough.

>  #define MV88E6095_PORT_CTL2_CPU_PORT_MASK		0x000f
>  
>  /* Offset 0x09: Egress Rate Control */
> @@ -326,6 +329,7 @@ int mv88e6xxx_port_set_message_port(struct mv88e6xxx_chip *chip, int port,
>  				    bool message_port);
>  int mv88e6165_port_set_jumbo_size(struct mv88e6xxx_chip *chip, int port,
>  				  size_t size);
> +int mv88e6xxx_port_set_defqpri(struct mv88e6xxx_chip *chip, int port, u16 pri);
>  int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
>  int mv88e6097_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
>  int mv88e6097_port_pause_limit(struct mv88e6xxx_chip *chip, int port, u8 in,

Thanks,

	Vivien

^ permalink raw reply

* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Florian Fainelli @ 2019-09-10 16:49 UTC (permalink / raw)
  To: Robert Beckett, netdev
  Cc: Andrew Lunn, Vivien Didelot, David S. Miller, Ido Schimmel,
	Jiri Pirko
In-Reply-To: <20190910154238.9155-1-bob.beckett@collabora.com>

+Ido, Jiri,

On 9/10/19 8:41 AM, Robert Beckett wrote:
> This patch-set adds support for some features of the Marvell switch
> chips that can be used to handle packet storms.
> 
> The rationale for this was a setup that requires the ability to receive
> traffic from one port, while a packet storm is occuring on another port
> (via an external switch with a deliberate loop). This is needed to
> ensure vital data delivery from a specific port, while mitigating any
> loops or DoS that a user may introduce on another port (can't guarantee
> sensible users).

The use case is reasonable, but the implementation is not really. You
are using Device Tree which is meant to describe hardware as a policy
holder for setting up queue priorities and likewise for queue scheduling.

The tool that should be used for that purpose is tc and possibly an
appropriately offloaded queue scheduler in order to map the desired
scheduling class to what the hardware supports.

Jiri, Ido, how do you guys support this with mlxsw?

> 
> [patch 1/7] configures auto negotiation for CPU ports connected with
> phys to enable pause frame propogation.
> 
> [patch 2/7] allows setting of port's default output queue priority for
> any ingressing packets on that port.
> 
> [patch 3/7] dt-bindings for patch 2.
> 
> [patch 4/7] allows setting of a port's queue scheduling so that it can
> prioritise egress of traffic routed from high priority ports.
> 
> [patch 5/7] dt-bindings for patch 4.
> 
> [patch 6/7] allows ports to rate limit their egress. This can be used to
> stop the host CPU from becoming swamped by packet delivery and exhasting
> descriptors.
> 
> [patch 7/7] dt-bindings for patch 6.
> 
> 
> Robert Beckett (7):
>   net/dsa: configure autoneg for CPU port
>   net: dsa: mv88e6xxx: add ability to set default queue priorities per
>     port
>   dt-bindings: mv88e6xxx: add ability to set default queue priorities
>     per port
>   net: dsa: mv88e6xxx: add ability to set queue scheduling
>   dt-bindings: mv88e6xxx: add ability to set queue scheduling
>   net: dsa: mv88e6xxx: add egress rate limiting
>   dt-bindings: mv88e6xxx: add egress rate limiting
> 
>  .../devicetree/bindings/net/dsa/marvell.txt   |  38 +++++
>  drivers/net/dsa/mv88e6xxx/chip.c              | 122 ++++++++++++---
>  drivers/net/dsa/mv88e6xxx/chip.h              |   5 +-
>  drivers/net/dsa/mv88e6xxx/port.c              | 140 +++++++++++++++++-
>  drivers/net/dsa/mv88e6xxx/port.h              |  24 ++-
>  include/dt-bindings/net/dsa-mv88e6xxx.h       |  22 +++
>  net/dsa/port.c                                |  10 ++
>  7 files changed, 327 insertions(+), 34 deletions(-)
>  create mode 100644 include/dt-bindings/net/dsa-mv88e6xxx.h
> 


-- 
Florian

^ permalink raw reply

* Re: [PATCH 3/7] dt-bindings: mv88e6xxx: add ability to set default queue priorities per port
From: Vivien Didelot @ 2019-09-10 16:49 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Robert Beckett, netdev, Andrew Lunn, David S. Miller, Rob Herring,
	Mark Rutland, devicetree, Jiri Pirko, Ido Schimmel
In-Reply-To: <23101286-4da2-2a53-e7cd-71ead263bbaa@gmail.com>

Hi Robert,

On Tue, 10 Sep 2019 09:42:24 -0700, Florian Fainelli <f.fainelli@gmail.com> wrote:
> This is a vendor specific driver/property,
> marvell,default-queue-priority (which be cheapskate on words) would be
> more readable. But still, I have some more fundamental issues with the
> general approach, see my response in the cover letter.

As Florian said, the DT is unlikely to welcome vendor specific nodes for
configuration which may be generic through standard network userspace tools.


Thanks,

	Vivien

^ permalink raw reply

* RE: [PATCH net-next 0/6] net: stmmac: Improvements for -next
From: Jose Abreu @ 2019-09-10 16:50 UTC (permalink / raw)
  To: Jose Abreu, netdev@vger.kernel.org
  Cc: Joao Pinto, Giuseppe Cavallaro, Alexandre Torgue, David S. Miller,
	Maxime Coquelin, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <cover.1568126224.git.joabreu@synopsys.com>

From: Jose Abreu <joabreu@synopsys.com>
Date: Sep/10/2019, 15:41:21 (UTC+00:00)

> Misc patches for -next. It includes:
>  - Two fixes for features in -next only
>  - New features support for GMAC cores (which includes GMAC4 and GMAC5)

BTW, just for reference (and because I forgot to attach it earlier), 
this is the selftests output for GMAC5.10 after this patchset:

# ethtool -t ens4
The test result is PASS
The test extra info:
 1. MAC Loopback         	 0
 2. PHY Loopback         	 0
 3. MMC Counters         	 0
 4. EEE                  	 -95
 5. Hash Filter MC       	 0
 6. Perfect Filter UC    	 0
 7. MC Filter            	 0
 8. UC Filter            	 0
 9. Flow Control         	 0
10. RSS                  	 -95
11. VLAN Filtering       	 0
12. Double VLAN Filtering	 0
13. Flexible RX Parser   	 0
14. SA Insertion (desc)  	 0
15. SA Replacement (desc)	 0
16. SA Insertion (reg)  	 0
17. SA Replacement (reg)	 0
18. VLAN TX Insertion   	 0
19. SVLAN TX Insertion  	 0
20. L3 DA Filtering     	 -95
21. L3 SA Filtering     	 -95
22. L4 DA TCP Filtering 	 -95
23. L4 SA TCP Filtering 	 -95
24. L4 DA UDP Filtering 	 -95
25. L4 SA UDP Filtering 	 -95
26. ARP Offload         	 0
27. Jumbo Frame         	 0
28. Multichannel Jumbo  	 0
29. Split Header        	 -95

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: Is bug 200755 in anyone's queue??
From: Paolo Abeni @ 2019-09-10 16:56 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Steve Zabele, Eric Dumazet, Mark KEATON, Network Development,
	shum@canndrew.org, vladimir116@gmail.com, saifi.khan@strikr.in,
	Daniel Borkmann, on2k16nm@gmail.com, Stephen Hemminger,
	Craig Gallek
In-Reply-To: <CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com>

Hi all,

On Tue, 2019-09-10 at 11:52 -0400, Willem de Bruijn wrote:
> This clearly has some loose ends and is no shorter or simpler. So
> unless anyone has comments or a different solution, I'll finish
> up the first variant.

I'm sorry for the late feedback.

I was wondering if we could use a new UDP-specific setsockopt to remove
the connected socket from the reuseport group at connect() time?

That would not have any behavioral change for existing application
leveraging the current reuseport implementation and requires possibly a
simpler implementation, but would need application changes for UDP
servers doing reuse/connect().

WDYT?

Cheers,

Paolo


^ permalink raw reply

* Re: Default qdisc not correctly initialized with custom MTU
From: Cong Wang @ 2019-09-10 16:56 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: Netdev
In-Reply-To: <dbc359d3-5cac-9b2e-6520-df4a25964bd3@applied-asynchrony.com>

On Tue, Sep 10, 2019 at 2:14 AM Holger Hoffstätte
<holger@applied-asynchrony.com> wrote:
>
> On 9/10/19 12:52 AM, Cong Wang wrote:
> > On Mon, Sep 9, 2019 at 5:44 AM Holger Hoffstätte
> > <holger@applied-asynchrony.com> wrote:
> >> I can't help but feel this is a slight bug in terms of initialization order,
> >> and that the default qdisc should only be created when it's first being
> >> used/attached to a link, not when the sysctls are configured.
> >
> > Yeah, this is because the fq_codel qdisc is initialized once and
> > doesn't get any notification when the netdev's MTU get changed.
>
> My point was that it shouldn't be created or initialized at all when
> the sysctl is configured, only the name should be validated/stored and
> queried when needed. If any interface is brought up before that point,
> no value (yet) would just mean "trod along with the defaults" to whoever
> is doing the work.

It is _not_ created when sysctl is configured, it is either created via tc
command, or implicitly created by kernel when you bring up eth0.
sysctl only tells kernel what to create by default, but never commits it.

>
> > We can "fix" this by adding a NETDEV_CHANGEMTU notifier to
> > qdisc's, but I don't know if it is really worth the effort.
>
> This is essentially the opposite of what I had in mind. The problem is
> that the entity was created, not that it needs to be notified.

Hmm? You did change MTU after adding fq_codel to eth0, right?
So how do you fix this without notification or recreation of fq_codel
in your mind?

I am happy to hear more details.

> Also I don't think that would work for scenarios with multiple links
> using different MTUs.

The fq_codel you created is apparently attached to a netdev,
I don't think this is even a problem. I _guess_ you somehow
believe you create a standalone fq_codel during sysctl setting,
this is just impossible. It must be attached to an interface, no
matter who creates it.

>
> > Is there any reason you can't change that order?
>
> Yes, because that wouldn't solve anything?

Really? You already said it works for you like below, I am confused.


> Like i said I can just kick the root qdisc to update itself in
> a post interface-setup script, and that works fine. Since I need
> that script anyway for setting several other parameters for
> the device it's no big deal - just another workaround.
>
> A brief look at the initialization in sch_mq/sch_generic unfortunately
> didn't really help clear things up for me, hence I guess my real
> question is whether a qdisc *must* be created early for some reason
> (assuming sysctls come before link setup), or whether this is something
> that could be delayed and done on-demand.

The default qdisc is created by kernel when you don't create any.
Again, you can create your own after changing the MTU, this should
solve the problem you see. It is all about ordering.

Thanks.

^ permalink raw reply

* Re: [PATCH 1/7] net/dsa: configure autoneg for CPU port
From: Florian Fainelli @ 2019-09-10 16:56 UTC (permalink / raw)
  To: Robert Beckett, netdev; +Cc: Andrew Lunn, Vivien Didelot, David S. Miller
In-Reply-To: <20190910154238.9155-2-bob.beckett@collabora.com>

On 9/10/19 8:41 AM, Robert Beckett wrote:
> Configure autoneg for phy connected CPU ports.
> This allows us to use autoneg between the CPU port's phy and the link
> partner's phy.
> This enables us to negoatiate pause frame transmission to prioritise
> packet delivery over throughput.

s/autoneg/auto-negotiation/
s/phy/PHY/
s/negoatiate/negotiate/
s/prioritise/prioritize/ (maybe the latter is just my US english
dictionary tripping up)

Also the subject should be net: dsa: Configure auto-negotiation for CPU
port to match previous submissions done to that file.

Fixing up that code path sounds reasonable, but are you not hitting the
PHYLINK code path instead?

> 
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>  net/dsa/port.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index f071acf2842b..1b6832eac2c5 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -538,10 +538,20 @@ static int dsa_port_setup_phy_of(struct dsa_port *dp, bool enable)
>  		return PTR_ERR(phydev);
>  
>  	if (enable) {
> +		phydev->supported = PHY_GBIT_FEATURES | SUPPORTED_MII |
> +				    SUPPORTED_AUI | SUPPORTED_FIBRE |
> +				    SUPPORTED_BNC | SUPPORTED_Pause |
> +				    SUPPORTED_Asym_Pause;
> +		phydev->advertising = phydev->supported;
> +
>  		err = genphy_config_init(phydev);
>  		if (err < 0)
>  			goto err_put_dev;
>  
> +		err = genphy_config_aneg(phydev);
> +		if (err < 0)
> +			goto err_put_dev;
> +
>  		err = genphy_resume(phydev);
>  		if (err < 0)
>  			goto err_put_dev;
> 


-- 
Florian

^ permalink raw reply

* Re: Is bug 200755 in anyone's queue??
From: Willem de Bruijn @ 2019-09-10 17:10 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Willem de Bruijn, Steve Zabele, Eric Dumazet, Mark KEATON,
	Network Development, shum@canndrew.org, vladimir116@gmail.com,
	saifi.khan@strikr.in, Daniel Borkmann, Stephen Hemminger,
	Craig Gallek
In-Reply-To: <e364d94b2d2a2342f192d6e80fec4798578a5d07.camel@redhat.com>

On Tue, Sep 10, 2019 at 12:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi all,
>
> On Tue, 2019-09-10 at 11:52 -0400, Willem de Bruijn wrote:
> > This clearly has some loose ends and is no shorter or simpler. So
> > unless anyone has comments or a different solution, I'll finish
> > up the first variant.
>
> I'm sorry for the late feedback.
>
> I was wondering if we could use a new UDP-specific setsockopt to remove
> the connected socket from the reuseport group at connect() time?
>
> That would not have any behavioral change for existing application
> leveraging the current reuseport implementation and requires possibly a
> simpler implementation, but would need application changes for UDP
> servers doing reuse/connect().
>
> WDYT?

Thanks for taking a look, too, Paolo.

I looked into detaching the sockets from the group at connect time. It
could be done without setsockopt, even. Unfortunately, it brings other
problems.

The reuseport group is still there, so may still match sockets
before the connection. If the connected socket no longer has
sk_reuseport set, binding new sockets will fail on conflict. And so a
few more.

^ permalink raw reply

* Re: [PATCH 6/7] net: dsa: mv88e6xxx: add egress rate limiting
From: Vivien Didelot @ 2019-09-10 17:13 UTC (permalink / raw)
  To: Robert Beckett
  Cc: netdev, Robert Beckett, Andrew Lunn, Florian Fainelli,
	David S. Miller
In-Reply-To: <20190910154238.9155-7-bob.beckett@collabora.com>

Hi Robert,

On Tue, 10 Sep 2019 16:41:52 +0100, Robert Beckett <bob.beckett@collabora.com> wrote:
> Add code for specifying egress rate limiting per port.
> The rate can be specified as ethernet frames or bits per second.
> 
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c |  72 ++++++++++++++-------
>  drivers/net/dsa/mv88e6xxx/chip.h |   3 +-
>  drivers/net/dsa/mv88e6xxx/port.c | 106 ++++++++++++++++++++++++++++---
>  drivers/net/dsa/mv88e6xxx/port.h |  14 +++-
>  4 files changed, 158 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 2bc22c59200c..8c116496ab2f 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -2120,6 +2120,32 @@ static int mv88e6xxx_set_port_sched(struct mv88e6xxx_chip *chip, int port)
>  	return chip->info->ops->port_set_sched(chip, port, (u16)sched);
>  }
>  
> +static int mv88e6xxx_set_port_egress_rate_limiting(struct mv88e6xxx_chip *chip,
> +						   int port)
> +{
> +	struct dsa_switch *ds = chip->ds;
> +	struct device_node *dn = ds->ports[port].dn;
> +	int err;
> +	u32 mode, count;
> +
> +	if (!dn || !chip->info->ops->port_egress_rate_limiting)
> +		return 0;
> +
> +	err = of_property_read_u32(dn, "egress-limit-mode", &mode);
> +	if (err < 0)
> +		goto disable;
> +
> +	err = of_property_read_u32(dn, "egress-limit-count", &count);
> +	if (err < 0)
> +		goto disable;
> +
> +	return chip->info->ops->port_egress_rate_limiting(chip, port, count,
> +							  mode);
> +
> +disable:
> +	return chip->info->ops->port_egress_rate_limiting(chip, port, 0, 0);
> +}
> +
>  static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  {
>  	struct dsa_switch *ds = chip->ds;
> @@ -2263,11 +2289,9 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  			return err;
>  	}
>  
> -	if (chip->info->ops->port_egress_rate_limiting) {
> -		err = chip->info->ops->port_egress_rate_limiting(chip, port);
> -		if (err)
> -			return err;
> -	}
> +	err = mv88e6xxx_set_port_egress_rate_limiting(chip, port);
> +	if (err)
> +		return err;
>  
>  	err = mv88e6xxx_setup_message_port(chip, port);
>  	if (err)
> @@ -2809,7 +2833,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
>  	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -2879,7 +2903,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6095_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -2951,7 +2975,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = {
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_upstream_port = mv88e6095_port_set_upstream_port,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_set_pause = mv88e6185_port_set_pause,
>  	.port_link_state = mv88e6352_port_link_state,
> @@ -2994,7 +3018,7 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3034,7 +3058,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3108,7 +3132,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3150,7 +3174,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3193,7 +3217,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3235,7 +3259,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3275,7 +3299,7 @@ static const struct mv88e6xxx_ops mv88e6185_ops = {
>  	.port_set_speed = mv88e6185_port_set_speed,
>  	.port_set_frame_mode = mv88e6085_port_set_frame_mode,
>  	.port_set_egress_floods = mv88e6185_port_set_egress_floods,
> -	.port_egress_rate_limiting = mv88e6095_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_set_upstream_port = mv88e6095_port_set_upstream_port,
>  	.port_set_pause = mv88e6185_port_set_pause,
>  	.port_link_state = mv88e6185_port_link_state,
> @@ -3454,7 +3478,7 @@ static const struct mv88e6xxx_ops mv88e6240_ops = {
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3587,7 +3611,7 @@ static const struct mv88e6xxx_ops mv88e6320_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3630,7 +3654,7 @@ static const struct mv88e6xxx_ops mv88e6321_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3673,7 +3697,7 @@ static const struct mv88e6xxx_ops mv88e6341_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3716,7 +3740,7 @@ static const struct mv88e6xxx_ops mv88e6350_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3755,7 +3779,7 @@ static const struct mv88e6xxx_ops mv88e6351_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3799,7 +3823,7 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3851,7 +3875,7 @@ static const struct mv88e6xxx_ops mv88e6390_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6390_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3900,7 +3924,7 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6390_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
> index ff3e35eceee0..75fbd5df4aae 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.h
> +++ b/drivers/net/dsa/mv88e6xxx/chip.h
> @@ -385,7 +385,8 @@ struct mv88e6xxx_ops {
>  				   size_t size);
>  	int (*port_set_defqpri)(struct mv88e6xxx_chip *chip, int port, u16 pri);
>  
> -	int (*port_egress_rate_limiting)(struct mv88e6xxx_chip *chip, int port);
> +	int (*port_egress_rate_limiting)(struct mv88e6xxx_chip *chip, int port,
> +					 u32 count, u32 mode);
>  	int (*port_set_sched)(struct mv88e6xxx_chip *chip, int port, u16 sched);
>  	int (*port_pause_limit)(struct mv88e6xxx_chip *chip, int port, u8 in,
>  				u8 out);
> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> index 236732fc598d..41418cfaca56 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.c
> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> @@ -1166,21 +1166,107 @@ int mv88e6xxx_port_set_defqpri(struct mv88e6xxx_chip *chip, int port, u16 pri)
>  	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_CTL2, reg);
>  }
>  
> -/* Offset 0x09: Port Rate Control */
> +/* Offset 0x09: Port Rate Control
> + * Offset 0x0A: Egress Rate Control 2
> + */
>  
> -int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port)
> +#define Kb			1000
> +#define Mb			(1000 * Kb)
> +#define Gb			(1000ull * Mb)
> +#define EGRESS_FRAME_RATE_MIN	7632
> +#define EGRESS_FRAME_RATE_MAX	31250000
> +#define EGRESS_BPS_RATE_MIN	(64 * Kb)
> +#define EGRESS_BPS_RATE_MAX	(1 * Gb)
> +#define EGRESS_RATE_PERIOD	32
> +int mv88e6xxx_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port,
> +					u32 count, u32 mode)
>  {
> -	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
> -				    0x0000);
> -}
> +	u16 reg1, reg2;
> +	int err;
>  
> -int mv88e6097_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port)
> -{
> -	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
> -				    0x0001);
> +	/* quick exit for disabling */
> +	if (count == 0) {
> +		err = mv88e6xxx_port_read(chip, port,
> +					  MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +					  &reg2);
> +		if (err)
> +			return err;
> +		reg2 &= ~MV88E6XXX_PORT_EGRESS_RATE_MASK;
> +		err =  mv88e6xxx_port_write(chip, port,
> +					    MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +					    reg2);
> +		return err;
> +	}
> +
> +	if (mode > MV88E6XXX_PORT_EGRESS_COUNT_MODE_L3)
> +		return -EINVAL;
> +
> +	if (mode == MV88E6XXX_PORT_EGRESS_COUNT_MODE_FRAMES &&
> +	    (count < EGRESS_FRAME_RATE_MIN || count > EGRESS_FRAME_RATE_MAX))
> +		return -EINVAL;
> +
> +	if (mode != MV88E6XXX_PORT_EGRESS_COUNT_MODE_FRAMES &&
> +	    (count < EGRESS_BPS_RATE_MIN || count > EGRESS_BPS_RATE_MAX))
> +		return -EINVAL;
> +
> +	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
> +				  &reg1);
> +	if (err)
> +		return err;
> +
> +	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +				  &reg2);
> +	if (err)
> +		return err;
> +
> +	reg1 &= ~MV88E6XXX_PORT_EGRESS_DEC_MASK;
> +	reg2 &= ~MV88E6XXX_PORT_EGRESS_COUNT_MODE_MASK;
> +
> +	if (mode == MV88E6XXX_PORT_EGRESS_COUNT_MODE_FRAMES) {
> +		u32 val;
> +
> +		/* recommended to use dec of 1 for frame based */
> +		reg1 |= 1 << MV88E6XXX_PORT_EGRESS_DEC_SHIFT;
> +
> +		reg2 |= mode << MV88E6XXX_PORT_EGRESS_COUNT_MODE_SHIFT;
> +		reg2 &= ~MV88E6XXX_PORT_EGRESS_RATE_MASK;
> +
> +		val = NSEC_PER_SEC / (EGRESS_RATE_PERIOD * count);
> +		if (NSEC_PER_SEC % (EGRESS_RATE_PERIOD * count))
> +			val++;
> +		reg2 |= (u16)(val << MV88E6XXX_PORT_EGRESS_RATE_SHIFT);
> +	} else {
> +		u16 egress_dec, egress_rate;
> +		u64 dec_bytes, ns_bits;
> +
> +		if (count < (1 * Mb))
> +			egress_dec = (u16)roundup(count, (64 * Kb));
> +		else if (count < (100 * Mb))
> +			egress_dec = (u16)roundup(count, (1 * Mb));
> +		else
> +			egress_dec = (u16)roundup(count, (10 * Mb));
> +
> +		reg1 |= egress_dec;
> +
> +		dec_bytes = 8ull * NSEC_PER_SEC * egress_dec;
> +		ns_bits = 32ull * count;
> +		egress_rate = (u16)div64_u64(dec_bytes, ns_bits);
> +		reg2 |= egress_rate;
> +	}
> +
> +	err =  mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
> +				    reg1);
> +	if (err)
> +		return err;
> +
> +	err =  mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +				    reg2);
> +	if (err)
> +		return err;
> +
> +	return 0;
>  }
>  
> -/* Offset 0x0A: Egress Rate Control 2 */
>  int mv88e6xxx_port_set_sched(struct mv88e6xxx_chip *chip, int port, u16 sched)
>  {
>  	u16 reg;
> diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
> index 710d6eccafae..724f839c570a 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.h
> +++ b/drivers/net/dsa/mv88e6xxx/port.h
> @@ -205,13 +205,23 @@
>  
>  /* Offset 0x09: Egress Rate Control */
>  #define MV88E6XXX_PORT_EGRESS_RATE_CTL1		0x09
> +#define MV88E6XXX_PORT_EGRESS_DEC_SHIFT		0
> +#define MV88E6XXX_PORT_EGRESS_DEC_MASK		0x7f
>  
>  /* Offset 0x0A: Egress Rate Control 2 */
>  #define MV88E6XXX_PORT_EGRESS_RATE_CTL2		0x0a
> +#define MV88E6XXX_PORT_EGRESS_COUNT_MODE_SHIFT	14
> +#define MV88E6XXX_PORT_EGRESS_COUNT_MODE_MASK	\
> +	(0x3 << MV88E6XXX_PORT_EGRESS_COUNT_MODE_SHIFT)

No shift macros please, only 0x1234 masks and their values named as in the
documentation. This way we see clearly how the 16-bit registers are organized.

> +/* see MV88E6XXX_PORT_EGRESS_COUNT_* in
> + * include/dt-bindings/net/dsa-mv88e6xxx.h
> + */
>  #define MV88E6XXX_PORT_SCHED_SHIFT		12
>  #define MV88E6XXX_PORT_SCHED_MASK \
>  	(0x3 << MV88E6XXX_PORT_SCHED_SHIFT)
>  /* see MV88E6XXX_PORT_SCHED_* in include/dt-bindings/net/dsa-mv88e6xxx.h */
> +#define MV88E6XXX_PORT_EGRESS_RATE_SHIFT	0
> +#define MV88E6XXX_PORT_EGRESS_RATE_MASK		0xfff
>  
>  /* Offset 0x0B: Port Association Vector */
>  #define MV88E6XXX_PORT_ASSOC_VECTOR			0x0b
> @@ -335,8 +345,8 @@ int mv88e6xxx_port_set_message_port(struct mv88e6xxx_chip *chip, int port,
>  int mv88e6165_port_set_jumbo_size(struct mv88e6xxx_chip *chip, int port,
>  				  size_t size);
>  int mv88e6xxx_port_set_defqpri(struct mv88e6xxx_chip *chip, int port, u16 pri);
> -int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
> -int mv88e6097_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
> +int mv88e6xxx_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port,
> +					u32 count, u32 mode);
>  int mv88e6xxx_port_set_sched(struct mv88e6xxx_chip *chip, int port, u16 sched);
>  int mv88e6097_port_pause_limit(struct mv88e6xxx_chip *chip, int port, u8 in,
>  			       u8 out);

This patch does not look good. Implementations in port.c must be simple
functions ordered per Port register, implementing read write operations for
them, and eventually checking unsupported values. No logic in them. You may
abstract some values with an enum defined in chip.h if needed. (some models
don't use the same values for various definitions for example.)


Thanks,

	Vivien

^ permalink raw reply

* Re: [PATCH 4/7] net: dsa: mv88e6xxx: add ability to set queue scheduling
From: Vivien Didelot @ 2019-09-10 17:18 UTC (permalink / raw)
  To: Robert Beckett
  Cc: netdev, Robert Beckett, Andrew Lunn, Florian Fainelli,
	David S. Miller
In-Reply-To: <20190910154238.9155-5-bob.beckett@collabora.com>

Hi Robert,

On Tue, 10 Sep 2019 16:41:50 +0100, Robert Beckett <bob.beckett@collabora.com> wrote:
> Add code to set Schedule for any port that specifies "schedule" in their
> device tree node.
> This allows port prioritization in conjunction with port default queue
> priorities or packet priorities.
> 
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 25 +++++++++++++++++++++++++
>  drivers/net/dsa/mv88e6xxx/chip.h |  1 +
>  drivers/net/dsa/mv88e6xxx/port.c | 21 +++++++++++++++++++++
>  drivers/net/dsa/mv88e6xxx/port.h |  6 ++++++
>  4 files changed, 53 insertions(+)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 5005a35493e3..2bc22c59200c 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -2103,6 +2103,23 @@ static int mv88e6xxx_set_port_defqpri(struct mv88e6xxx_chip *chip, int port)
>  	return chip->info->ops->port_set_defqpri(chip, port, (u16)pri);
>  }
>  
> +static int mv88e6xxx_set_port_sched(struct mv88e6xxx_chip *chip, int port)
> +{
> +	struct dsa_switch *ds = chip->ds;
> +	struct device_node *dn = ds->ports[port].dn;
> +	int err;
> +	u32 sched;
> +
> +	if (!dn || !chip->info->ops->port_set_sched)
> +		return 0;
> +
> +	err = of_property_read_u32(dn, "schedule", &sched);
> +	if (err < 0)
> +		return 0;
> +
> +	return chip->info->ops->port_set_sched(chip, port, (u16)sched);
> +}
> +
>  static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  {
>  	struct dsa_switch *ds = chip->ds;
> @@ -2218,6 +2235,10 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  	if (err)
>  		return err;
>  
> +	err = mv88e6xxx_set_port_sched(chip, port);
> +	if (err)
> +		return err;
> +
>  	if (chip->info->ops->port_pause_limit) {
>  		err = chip->info->ops->port_pause_limit(chip, port, 0, 0);
>  		if (err)
> @@ -3130,6 +3151,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3214,6 +3236,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3432,6 +3455,7 @@ static const struct mv88e6xxx_ops mv88e6240_ops = {
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3776,6 +3800,7 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
> index 2d2c24f5a79d..ff3e35eceee0 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.h
> +++ b/drivers/net/dsa/mv88e6xxx/chip.h
> @@ -386,6 +386,7 @@ struct mv88e6xxx_ops {
>  	int (*port_set_defqpri)(struct mv88e6xxx_chip *chip, int port, u16 pri);
>  
>  	int (*port_egress_rate_limiting)(struct mv88e6xxx_chip *chip, int port);
> +	int (*port_set_sched)(struct mv88e6xxx_chip *chip, int port, u16 sched);
>  	int (*port_pause_limit)(struct mv88e6xxx_chip *chip, int port, u8 in,
>  				u8 out);
>  	int (*port_disable_learn_limit)(struct mv88e6xxx_chip *chip, int port);
> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> index 3a45fcd5cd9c..236732fc598d 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.c
> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> @@ -1180,6 +1180,27 @@ int mv88e6097_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port)
>  				    0x0001);
>  }
>  
> +/* Offset 0x0A: Egress Rate Control 2 */
> +int mv88e6xxx_port_set_sched(struct mv88e6xxx_chip *chip, int port, u16 sched)
> +{
> +	u16 reg;
> +	int err;
> +
> +	if (sched > MV88E6XXX_PORT_SCHED_STRICT_ALL)
> +		return -EINVAL;
> +
> +	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +				  &reg);
> +	if (err)
> +		return err;
> +
> +	reg &= ~MV88E6XXX_PORT_SCHED_MASK;
> +	reg |= sched << MV88E6XXX_PORT_SCHED_SHIFT;
> +
> +	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +				    reg);
> +}
> +
>  /* Offset 0x0C: Port ATU Control */
>  
>  int mv88e6xxx_port_disable_learn_limit(struct mv88e6xxx_chip *chip, int port)
> diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
> index 03884bbaa762..710d6eccafae 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.h
> +++ b/drivers/net/dsa/mv88e6xxx/port.h
> @@ -11,6 +11,7 @@
>  #ifndef _MV88E6XXX_PORT_H
>  #define _MV88E6XXX_PORT_H
>  
> +#include <dt-bindings/net/dsa-mv88e6xxx.h>
>  #include "chip.h"
>  
>  /* Offset 0x00: Port Status Register */
> @@ -207,6 +208,10 @@
>  
>  /* Offset 0x0A: Egress Rate Control 2 */
>  #define MV88E6XXX_PORT_EGRESS_RATE_CTL2		0x0a
> +#define MV88E6XXX_PORT_SCHED_SHIFT		12
> +#define MV88E6XXX_PORT_SCHED_MASK \
> +	(0x3 << MV88E6XXX_PORT_SCHED_SHIFT)
> +/* see MV88E6XXX_PORT_SCHED_* in include/dt-bindings/net/dsa-mv88e6xxx.h */
>  
>  /* Offset 0x0B: Port Association Vector */
>  #define MV88E6XXX_PORT_ASSOC_VECTOR			0x0b
> @@ -332,6 +337,7 @@ int mv88e6165_port_set_jumbo_size(struct mv88e6xxx_chip *chip, int port,
>  int mv88e6xxx_port_set_defqpri(struct mv88e6xxx_chip *chip, int port, u16 pri);
>  int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
>  int mv88e6097_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
> +int mv88e6xxx_port_set_sched(struct mv88e6xxx_chip *chip, int port, u16 sched);
>  int mv88e6097_port_pause_limit(struct mv88e6xxx_chip *chip, int port, u8 in,
>  			       u8 out);
>  int mv88e6390_port_pause_limit(struct mv88e6xxx_chip *chip, int port, u8 in,

Same comments applied as for the other patches adding implementations in port.c.


Thanks,

	Vivien

^ permalink raw reply

* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Vivien Didelot @ 2019-09-10 17:19 UTC (permalink / raw)
  To: Robert Beckett
  Cc: netdev, Robert Beckett, Andrew Lunn, Florian Fainelli,
	David S. Miller
In-Reply-To: <20190910154238.9155-1-bob.beckett@collabora.com>

Hi Robert,

On Tue, 10 Sep 2019 16:41:46 +0100, Robert Beckett <bob.beckett@collabora.com> wrote:
> This patch-set adds support for some features of the Marvell switch
> chips that can be used to handle packet storms.
> 
> The rationale for this was a setup that requires the ability to receive
> traffic from one port, while a packet storm is occuring on another port
> (via an external switch with a deliberate loop). This is needed to
> ensure vital data delivery from a specific port, while mitigating any
> loops or DoS that a user may introduce on another port (can't guarantee
> sensible users).
> 
> [patch 1/7] configures auto negotiation for CPU ports connected with
> phys to enable pause frame propogation.
> 
> [patch 2/7] allows setting of port's default output queue priority for
> any ingressing packets on that port.
> 
> [patch 3/7] dt-bindings for patch 2.
> 
> [patch 4/7] allows setting of a port's queue scheduling so that it can
> prioritise egress of traffic routed from high priority ports.
> 
> [patch 5/7] dt-bindings for patch 4.
> 
> [patch 6/7] allows ports to rate limit their egress. This can be used to
> stop the host CPU from becoming swamped by packet delivery and exhasting
> descriptors.
> 
> [patch 7/7] dt-bindings for patch 6.
> 
> 
> Robert Beckett (7):
>   net/dsa: configure autoneg for CPU port
>   net: dsa: mv88e6xxx: add ability to set default queue priorities per
>     port
>   dt-bindings: mv88e6xxx: add ability to set default queue priorities
>     per port
>   net: dsa: mv88e6xxx: add ability to set queue scheduling
>   dt-bindings: mv88e6xxx: add ability to set queue scheduling
>   net: dsa: mv88e6xxx: add egress rate limiting
>   dt-bindings: mv88e6xxx: add egress rate limiting
> 
>  .../devicetree/bindings/net/dsa/marvell.txt   |  38 +++++
>  drivers/net/dsa/mv88e6xxx/chip.c              | 122 ++++++++++++---
>  drivers/net/dsa/mv88e6xxx/chip.h              |   5 +-
>  drivers/net/dsa/mv88e6xxx/port.c              | 140 +++++++++++++++++-
>  drivers/net/dsa/mv88e6xxx/port.h              |  24 ++-
>  include/dt-bindings/net/dsa-mv88e6xxx.h       |  22 +++
>  net/dsa/port.c                                |  10 ++
>  7 files changed, 327 insertions(+), 34 deletions(-)
>  create mode 100644 include/dt-bindings/net/dsa-mv88e6xxx.h

Feature series targeting netdev must be prefixed "PATCH net-next". As
this approach was a PoC, sending it as "RFC net-next" would be even more
appropriate.


Thank you,

	Vivien

^ permalink raw reply

* Re: [PATCH] bpf: validate bpf_func when BPF_JIT is enabled
From: Sami Tolvanen @ 2019-09-10 17:22 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Daniel Borkmann, Kees Cook, Martin Lau,
	Song Liu, netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <4f4136f5-db54-f541-2843-ccb35be25ab4@fb.com>

On Tue, Sep 10, 2019 at 08:37:19AM +0000, Yonghong Song wrote:
> You did not mention BPF_BINARY_HEADER_MAGIC and added member
> of `magic` in bpf_binary_header. Could you add some details
> on what is the purpose for this `magic` member?

Sure, I'll add a description to the next version.

The magic is a random number used to identify bpf_binary_header in
memory. The purpose of this patch is to limit the possible call
targets for the function pointer and checking for the magic helps
ensure we are jumping to a page that contains a jited function,
instead of allowing calls to arbitrary targets.

This is particularly useful when combined with the compiler-based
Control-Flow Integrity (CFI) mitigation, which Google started shipping
in Pixel kernels last year. The compiler injects checks to all
indirect calls, but cannot obviously validate jumps to dynamically
generated code.

> > +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
> > +{
> > +	const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
> > +
> > +	if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
> > +		return prog->bpf_func(ctx, prog->insnsi);
> > +
> > +	if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> > +		     !arch_bpf_jit_check_func(prog))) {
> > +		WARN(1, "attempt to jump to an invalid address");
> > +		return 0;
> > +	}
> > +
> > +	return prog->bpf_func(ctx, prog->insnsi);
> > +}

> The above can be rewritten as
> 	if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
> 	    hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> 	    !arch_bpf_jit_check_func(prog))) {
> 		WARN(1, "attempt to jump to an invalid address");
> 		return 0;
> 	}

That doesn't look quite equivalent, but yes, this can be rewritten as a
single if statement like this:

	if ((IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) ||
	     prog->jited) &&
	    (hdr->magic != BPF_BINARY_HEADER_MAGIC ||
	     !arch_bpf_jit_check_func(prog)))

I think splitting the interpreter and JIT paths would be more readable,
but I can certainly change this if you prefer.

> BPF_PROG_RUN() will be called during xdp fast path.
> Have you measured how much slowdown the above change could
> cost for the performance?

I have not measured the overhead, but it shouldn't be significant. Is
there a particular benchmark you'd like me to run?

Sami

^ permalink raw reply

* RE: [PATCH net-next v2 1/2] net: stmmac: Only enable enhanced addressing mode when needed
From: Jose Abreu @ 2019-09-10 17:25 UTC (permalink / raw)
  To: Thierry Reding, Jose Abreu
  Cc: David S . Miller, Giuseppe Cavallaro, Alexandre Torgue,
	Jon Hunter, Bitan Biswas, netdev@vger.kernel.org,
	linux-tegra@vger.kernel.org
In-Reply-To: <20190910135427.GB9897@ulmo>

From: Thierry Reding <thierry.reding@gmail.com>
Date: Sep/10/2019, 14:54:27 (UTC+00:00)

> On Tue, Sep 10, 2019 at 08:32:38AM +0000, Jose Abreu wrote:
> > From: Thierry Reding <thierry.reding@gmail.com>
> > Date: Sep/09/2019, 20:11:27 (UTC+00:00)
> > 
> > > On Mon, Sep 09, 2019 at 04:07:04PM +0000, Jose Abreu wrote:
> > > > From: Thierry Reding <thierry.reding@gmail.com>
> > > > Date: Sep/09/2019, 16:25:45 (UTC+00:00)
> > > > 
> > > > > @@ -92,6 +92,7 @@ struct stmmac_dma_cfg {
> > > > >  	int fixed_burst;
> > > > >  	int mixed_burst;
> > > > >  	bool aal;
> > > > > +	bool eame;
> > > > 
> > > > bools should not be used in struct's, please change to int.
> > > 
> > > Huh? Since when? "aal" right above it is also bool. Can you provide a
> > > specific rationale for why we shouldn't use bool in structs?
> > 
> > Please see https://lkml.org/lkml/2017/11/21/384.
> 
> The context is slightly different here. stmmac_dma_cfg exists once for
> each of these ethernet devices in the system, and I would assume that in
> the vast majority of cases there's exactly one such device in the system
> so the potential size increase is very small. On the other hand, there
> are potentially very many struct sched_dl_entity, so the size impact is
> multiplied.
> 
> Anyway, if you insist I'll rewrite this to use an unsigned int bitfield.

For new code I would rather prefer "int" but I guess it's up to David to 
decide this. I'm okay with both options as the check for this usage was 
removed in checkpatch: 
https://lkml.org/lkml/2019/1/10/975

---
Thanks,
Jose Miguel 
Abreu

^ permalink raw reply

* Re: [PATCH net-next 1/7] net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
From: David Miller @ 2019-09-10 17:25 UTC (permalink / raw)
  To: tanhuazhong
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, huangguangbin2
In-Reply-To: <1568105908-60983-2-git-send-email-tanhuazhong@huawei.com>

From: Huazhong Tan <tanhuazhong@huawei.com>
Date: Tue, 10 Sep 2019 16:58:22 +0800

> +	/* Set to user value, no larger than max_rss_size. */
> +	if (kinfo->req_rss_size != kinfo->rss_size && kinfo->req_rss_size &&
> +	    kinfo->req_rss_size <= max_rss_size) {
> +		dev_info(&hdev->pdev->dev, "rss changes from %u to %u\n",
> +			 kinfo->rss_size, kinfo->req_rss_size);
> +		kinfo->rss_size = kinfo->req_rss_size;

Please do not emit kernel log messages for normal operations.

^ permalink raw reply

* Re: [PATCH net-next 5/5] sctp: add spt_pathcpthld in struct sctp_paddrthlds
From: David Miller @ 2019-09-10 17:27 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <604e6ac718c29aa5b1a8c4b164a126b82bc42a2f.1568015756.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon,  9 Sep 2019 15:56:51 +0800

> diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
> index a15cc28..dfd81e1 100644
> --- a/include/uapi/linux/sctp.h
> +++ b/include/uapi/linux/sctp.h
> @@ -1069,6 +1069,7 @@ struct sctp_paddrthlds {
>  	struct sockaddr_storage spt_address;
>  	__u16 spt_pathmaxrxt;
>  	__u16 spt_pathpfthld;
> +	__u16 spt_pathcpthld;
>  };
>  
>  /*

As pointed out you can't add things to this structure without breaking existing
binaries.

^ permalink raw reply

* Re: [Patch net] net_sched: check cops->tcf_block in tc_bind_tclass()
From: David Miller @ 2019-09-10 17:29 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, syzbot+21b29db13c065852f64b, jhs, jiri
In-Reply-To: <20190908191123.31059-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Sun,  8 Sep 2019 12:11:23 -0700

> At least sch_red and sch_tbf don't implement ->tcf_block()
> while still have a non-zero tc "class".
> 
> Instead of adding nop implementations to each of such qdisc's,
> we can just relax the check of cops->tcf_block() in
> tc_bind_tclass(). They don't support TC filter anyway.
> 
> Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied.

^ permalink raw reply

* Re: [Patch net] sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero
From: David Miller @ 2019-09-10 17:31 UTC (permalink / raw)
  To: xiyou.wangcong
  Cc: netdev, syzbot+bc6297c11f19ee807dc2, syzbot+041483004a7f45f1f20a,
	syzbot+55be5f513bed37fc4367, jhs, jiri, vtlam
In-Reply-To: <20190908204051.760-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Sun,  8 Sep 2019 13:40:51 -0700

> In case of TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero,
> it would make no progress inside the loop in hhf_dequeue() thus
> kernel would get stuck.
> 
> Fix this by checking this corner case in hhf_change().
> 
> Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
> Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com
> Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com
> Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Cc: Terry Lam <vtlam@google.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] sctp: fix the missing put_user when dumping transport thresholds
From: David Miller @ 2019-09-10 17:33 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <3fa4f7700c93f06530c80bc666d1696cb7c077de.1568014409.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon,  9 Sep 2019 15:33:29 +0800

> This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump
> a transport thresholds info.
> 
> Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds.
> 
> Fixes: 8add543e369d ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: Default qdisc not correctly initialized with custom MTU
From: Holger Hoffstätte @ 2019-09-10 17:45 UTC (permalink / raw)
  To: Cong Wang; +Cc: Netdev
In-Reply-To: <CAM_iQpUO3vedg+XOcMb8s6hE=+hdvjPJp9DitjHZE6oNtDVkVQ@mail.gmail.com>

On 9/10/19 6:56 PM, Cong Wang wrote:
> It is _not_ created when sysctl is configured, it is either created via tc
> command, or implicitly created by kernel when you bring up eth0.
> sysctl only tells kernel what to create by default, but never commits it.

Ok, thank you - that's good to know, because it means there is something
wrong with how my interface is initially brought up. And indeed I found
the problem: my startup scripts apparently bring up the interface twice -
once to "pre-start" (load/verify modules etc.) and then again after
applying mtu/route/etc. settings. Obviously without MTU bringing up the
interface will pull in the default qdisc in the interface's default config,
and that's what I saw after boot. Weird but what can I say.

Anyway, thanks for trying to help. :)

Holger

^ permalink raw reply

* Re: [PATCH net-next 2/2] mlx5: fix type mismatch
From: Saeed Mahameed @ 2019-09-10 17:56 UTC (permalink / raw)
  To: dledford@redhat.com, davem@davemloft.net, jgg@ziepe.ca,
	arnd@arndb.de, leon@kernel.org
  Cc: linux-rdma@vger.kernel.org, Yishai Hadas, Mark Bloch,
	linux-kernel@vger.kernel.org, Erez Shitrit,
	netdev@vger.kernel.org, Alex Vesker, Ariel Levkovich
In-Reply-To: <20190909195024.3268499-2-arnd@arndb.de>

On Mon, 2019-09-09 at 21:50 +0200, Arnd Bergmann wrote:
> In mlx5, pointers to 'phys_addr_t' and 'u64' are mixed since the
> addition
> of the pool memory allocator, leading to incorrect behavior on 32-bit
> architectures and this compiler warning:
> 
> drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c:121:8:
> error: incompatible pointer types passing 'u64 *' (aka 'unsigned long
> long *') to parameter of type 'phys_addr_t *' (aka 'unsigned int *')
> [-Werror,-Wincompatible-pointer-types]
>                                    &icm_mr->dm.addr, &icm_mr-
> >dm.obj_id);
>                                    ^~~~~~~~~~~~~~~~
> include/linux/mlx5/driver.h:1092:39: note: passing argument to
> parameter 'addr' here
>                          u64 length, u16 uid, phys_addr_t *addr, u32
> *obj_id);
> 
> Change the code to use 'u64' consistently in place of 'phys_addr_t'
> to
> fix this. The alternative of using phys_addr_t more would require a
> larger
> rework.
> 
> Fixes: 29cf8febd185 ("net/mlx5: DR, ICM pool memory allocator")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Hi Arnd,

Nathan Chancellor Already submitted a patch to fix this and it is more
minimal:
https://patchwork.ozlabs.org/patch/1158177/

I would like to use that patch if it is ok with you.. 

> ---
>  drivers/infiniband/hw/mlx5/cmd.c                 | 4 ++--
>  drivers/infiniband/hw/mlx5/cmd.h                 | 4 ++--
>  drivers/infiniband/hw/mlx5/mlx5_ib.h             | 2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c | 4 ++--
>  include/linux/mlx5/driver.h                      | 4 ++--
>  5 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/cmd.c
> b/drivers/infiniband/hw/mlx5/cmd.c
> index 4937947400cd..fbcad70ef6dc 100644
> --- a/drivers/infiniband/hw/mlx5/cmd.c
> +++ b/drivers/infiniband/hw/mlx5/cmd.c
> @@ -82,7 +82,7 @@ int mlx5_cmd_modify_cong_params(struct
> mlx5_core_dev *dev,
>  	return mlx5_cmd_exec(dev, in, in_size, out, sizeof(out));
>  }
>  
> -int mlx5_cmd_alloc_memic(struct mlx5_dm *dm, phys_addr_t *addr,
> +int mlx5_cmd_alloc_memic(struct mlx5_dm *dm, u64 *addr,
>  			 u64 length, u32 alignment)
>  {
>  	struct mlx5_core_dev *dev = dm->dev;
> @@ -157,7 +157,7 @@ int mlx5_cmd_alloc_memic(struct mlx5_dm *dm,
> phys_addr_t *addr,
>  	return -ENOMEM;
>  }
>  
> -int mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, phys_addr_t addr, u64
> length)
> +int mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, u64 addr, u64 length)
>  {
>  	struct mlx5_core_dev *dev = dm->dev;
>  	u64 hw_start_addr = MLX5_CAP64_DEV_MEM(dev,
> memic_bar_start_addr);
> diff --git a/drivers/infiniband/hw/mlx5/cmd.h
> b/drivers/infiniband/hw/mlx5/cmd.h
> index 169cab4915e3..2ea7a45a6abb 100644
> --- a/drivers/infiniband/hw/mlx5/cmd.h
> +++ b/drivers/infiniband/hw/mlx5/cmd.h
> @@ -44,9 +44,9 @@ int mlx5_cmd_query_cong_params(struct mlx5_core_dev
> *dev, int cong_point,
>  int mlx5_cmd_query_ext_ppcnt_counters(struct mlx5_core_dev *dev,
> void *out);
>  int mlx5_cmd_modify_cong_params(struct mlx5_core_dev *mdev,
>  				void *in, int in_size);
> -int mlx5_cmd_alloc_memic(struct mlx5_dm *dm, phys_addr_t *addr,
> +int mlx5_cmd_alloc_memic(struct mlx5_dm *dm, u64 *addr,
>  			 u64 length, u32 alignment);
> -int mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, phys_addr_t addr, u64
> length);
> +int mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, u64 addr, u64
> length);
>  void mlx5_cmd_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn, u16
> uid);
>  void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16
> uid);
>  void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16
> uid);
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index 2ceaef3ea3fb..476d4447f901 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -561,7 +561,7 @@ enum mlx5_ib_mtt_access_flags {
>  
>  struct mlx5_ib_dm {
>  	struct ib_dm		ibdm;
> -	phys_addr_t		dev_addr;
> +	u64			dev_addr;
>  	u32			type;
>  	size_t			size;
>  	union {
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c
> b/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c
> index e065c2f68f5a..ad4d7484fa63 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c
> @@ -90,7 +90,7 @@ void mlx5_dm_cleanup(struct mlx5_core_dev *dev)
>  }
>  
>  int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum
> mlx5_sw_icm_type type,
> -			 u64 length, u16 uid, phys_addr_t *addr, u32
> *obj_id)
> +			 u64 length, u16 uid, u64 *addr, u32 *obj_id)
>  {
>  	u32 num_blocks = DIV_ROUND_UP_ULL(length,
> MLX5_SW_ICM_BLOCK_SIZE(dev));
>  	u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {};
> @@ -175,7 +175,7 @@ int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev
> *dev, enum mlx5_sw_icm_type type,
>  EXPORT_SYMBOL_GPL(mlx5_dm_sw_icm_alloc);
>  
>  int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum
> mlx5_sw_icm_type type,
> -			   u64 length, u16 uid, phys_addr_t addr, u32
> obj_id)
> +			   u64 length, u16 uid, u64 addr, u32 obj_id)
>  {
>  	u32 num_blocks = DIV_ROUND_UP_ULL(length,
> MLX5_SW_ICM_BLOCK_SIZE(dev));
>  	u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {};
> diff --git a/include/linux/mlx5/driver.h
> b/include/linux/mlx5/driver.h
> index 3e80f03a387f..e07f9daf7d42 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -1089,9 +1089,9 @@ int mlx5_lag_query_cong_counters(struct
> mlx5_core_dev *dev,
>  struct mlx5_uars_page *mlx5_get_uars_page(struct mlx5_core_dev
> *mdev);
>  void mlx5_put_uars_page(struct mlx5_core_dev *mdev, struct
> mlx5_uars_page *up);
>  int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum
> mlx5_sw_icm_type type,
> -			 u64 length, u16 uid, phys_addr_t *addr, u32
> *obj_id);
> +			 u64 length, u16 uid, u64 *addr, u32 *obj_id);
>  int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum
> mlx5_sw_icm_type type,
> -			   u64 length, u16 uid, phys_addr_t addr, u32
> obj_id);
> +			   u64 length, u16 uid, u64 addr, u32 obj_id);
>  
>  #ifdef CONFIG_MLX5_CORE_IPOIB
>  struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev
> *mdev,

^ permalink raw reply

* Re: [PATCH] net/mlx5: Declare 'rt' as corresponding enum type
From: Saeed Mahameed @ 2019-09-10 17:59 UTC (permalink / raw)
  To: austindh.kim@gmail.com, leon@kernel.org
  Cc: linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	Mark Bloch, davem@davemloft.net, netdev@vger.kernel.org,
	Alex Vesker, Erez Shitrit
In-Reply-To: <20190910092731.GA173476@LGEARND20B15>

On Tue, 2019-09-10 at 18:27 +0900, Austin Kim wrote:
> When building kernel with clang, we can observe below warning
> message:
> 
> drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1080:9:
> warning: implicit conversion from enumeration type 'enum
> mlx5_reformat_ctx_type'
> to different enumeration type 'enum mlx5dr_action_type' [-   Wenum-
> conversion]
> 	rt = MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL;
>        			  ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1082:9:
> warning: implicit conversion from enumeration type 'enum
> mlx5_reformat_ctx_type'
> to different enumeration type 'enum mlx5dr_action_type' [-   Wenum-
> conversion]
> 	rt = MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL;
>         ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1084:51:
> warning: implicit conversion from enumeration type 'enum
> mlx5dr_action_type'
> to different enumeration type 'enum mlx5_reformat_ctx_type'
> [-  Wenum-conversion]
> 	ret = mlx5dr_cmd_create_reformat_ctx(dmn->mdev, rt, data_sz,
> data,
>          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~            ^~
> 
> Declare 'rt' as corresponding enum mlx5_reformat_ctx_type type.
> 
> Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Hi Austin, Thanks for the patch:

We already have a similar patch queued for submission.
https://patchwork.ozlabs.org/patch/1158175/

> ---
>  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git
> a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
> b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
> index a02f87f..7d81a77 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
> @@ -1074,7 +1074,7 @@ dr_action_create_reformat_action(struct
> mlx5dr_domain *dmn,
>  	case DR_ACTION_TYP_L2_TO_TNL_L2:
>  	case DR_ACTION_TYP_L2_TO_TNL_L3:
>  	{
> -		enum mlx5dr_action_type rt;
> +		enum mlx5_reformat_ctx_type rt;
>  
>  		if (action->action_type == DR_ACTION_TYP_L2_TO_TNL_L2)
>  			rt = MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL;

^ permalink raw reply

* Re: [PATCH 1/7] net/dsa: configure autoneg for CPU port
From: Andrew Lunn @ 2019-09-10 18:26 UTC (permalink / raw)
  To: Robert Beckett; +Cc: netdev, Vivien Didelot, Florian Fainelli, David S. Miller
In-Reply-To: <20190910154238.9155-2-bob.beckett@collabora.com>

On Tue, Sep 10, 2019 at 04:41:47PM +0100, Robert Beckett wrote:
> This enables us to negoatiate pause frame transmission to prioritise
> packet delivery over throughput.

I don't think we can unconditionally enable this. It is a big
behaviour change, and it is likely to break running systems. It has
affects on QoS, packet prioritisation, etc.

I think there needs to be a configuration knob. But unfortunately, i
don't know of a good place to put this knob. The switch CPU port is
not visible in any way.

    Andrew

^ permalink raw reply

* Re: [PATCH 1/7] net/dsa: configure autoneg for CPU port
From: Florian Fainelli @ 2019-09-10 18:29 UTC (permalink / raw)
  To: Andrew Lunn, Robert Beckett; +Cc: netdev, Vivien Didelot, David S. Miller
In-Reply-To: <20190910182635.GA9761@lunn.ch>

On 9/10/19 11:26 AM, Andrew Lunn wrote:
> On Tue, Sep 10, 2019 at 04:41:47PM +0100, Robert Beckett wrote:
>> This enables us to negoatiate pause frame transmission to prioritise
>> packet delivery over throughput.
> 
> I don't think we can unconditionally enable this. It is a big
> behaviour change, and it is likely to break running systems. It has
> affects on QoS, packet prioritisation, etc.
> 
> I think there needs to be a configuration knob. But unfortunately, i
> don't know of a good place to put this knob. The switch CPU port is
> not visible in any way.

Broadcast storm suppression is to be solved at ingress, not on the CPU
port, once this lands on the CPU port, it's game over already.
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next v2 2/3] net: dsa: microchip: add ksz9567 to ksz9477 driver
From: Andrew Lunn @ 2019-09-10 18:30 UTC (permalink / raw)
  To: George McCollister
  Cc: netdev, Woojung Huh, Florian Fainelli, Tristram Ha,
	David S. Miller, Marek Vasut, linux-kernel
In-Reply-To: <20190910131836.114058-3-george.mccollister@gmail.com>

On Tue, Sep 10, 2019 at 08:18:35AM -0500, George McCollister wrote:
> Add support for the KSZ9567 7-Port Gigabit Ethernet Switch to the
> ksz9477 driver. The KSZ9567 supports both SPI and I2C. Oddly the
> ksz9567 is already in the device tree binding documentation.
> 
> Signed-off-by: George McCollister <george.mccollister@gmail.com>
> Reviewed-by: Marek Vasut <marex@denx.de>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] net: dsa: microchip: remove NET_DSA_TAG_KSZ_COMMON
From: Florian Fainelli @ 2019-09-10 18:35 UTC (permalink / raw)
  To: George McCollister, netdev
  Cc: Woojung Huh, Andrew Lunn, Tristram Ha, David S. Miller,
	Marek Vasut, linux-kernel
In-Reply-To: <20190910131836.114058-4-george.mccollister@gmail.com>

On 9/10/19 6:18 AM, George McCollister wrote:
> Remove the superfluous NET_DSA_TAG_KSZ_COMMON and just use the existing
> NET_DSA_TAG_KSZ. Update the description to mention the three switch
> families it supports. No functional change.
> 
> Signed-off-by: George McCollister <george.mccollister@gmail.com>
> Reviewed-by: Marek Vasut <marex@denx.de>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox