Netdev List
 help / color / mirror / Atom feed
* Re: Default qdisc not correctly initialized with custom MTU
From: Holger Hoffstätte @ 2019-09-10 17:45 UTC (permalink / raw)
  To: Cong Wang; +Cc: Netdev
In-Reply-To: <CAM_iQpUO3vedg+XOcMb8s6hE=+hdvjPJp9DitjHZE6oNtDVkVQ@mail.gmail.com>

On 9/10/19 6:56 PM, Cong Wang wrote:
> It is _not_ created when sysctl is configured, it is either created via tc
> command, or implicitly created by kernel when you bring up eth0.
> sysctl only tells kernel what to create by default, but never commits it.

Ok, thank you - that's good to know, because it means there is something
wrong with how my interface is initially brought up. And indeed I found
the problem: my startup scripts apparently bring up the interface twice -
once to "pre-start" (load/verify modules etc.) and then again after
applying mtu/route/etc. settings. Obviously without MTU bringing up the
interface will pull in the default qdisc in the interface's default config,
and that's what I saw after boot. Weird but what can I say.

Anyway, thanks for trying to help. :)

Holger

^ permalink raw reply

* Re: [PATCH net] sctp: fix the missing put_user when dumping transport thresholds
From: David Miller @ 2019-09-10 17:33 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <3fa4f7700c93f06530c80bc666d1696cb7c077de.1568014409.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon,  9 Sep 2019 15:33:29 +0800

> This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump
> a transport thresholds info.
> 
> Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds.
> 
> Fixes: 8add543e369d ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [Patch net] sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero
From: David Miller @ 2019-09-10 17:31 UTC (permalink / raw)
  To: xiyou.wangcong
  Cc: netdev, syzbot+bc6297c11f19ee807dc2, syzbot+041483004a7f45f1f20a,
	syzbot+55be5f513bed37fc4367, jhs, jiri, vtlam
In-Reply-To: <20190908204051.760-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Sun,  8 Sep 2019 13:40:51 -0700

> In case of TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero,
> it would make no progress inside the loop in hhf_dequeue() thus
> kernel would get stuck.
> 
> Fix this by checking this corner case in hhf_change().
> 
> Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
> Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com
> Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com
> Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Cc: Terry Lam <vtlam@google.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [Patch net] net_sched: check cops->tcf_block in tc_bind_tclass()
From: David Miller @ 2019-09-10 17:29 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, syzbot+21b29db13c065852f64b, jhs, jiri
In-Reply-To: <20190908191123.31059-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Sun,  8 Sep 2019 12:11:23 -0700

> At least sch_red and sch_tbf don't implement ->tcf_block()
> while still have a non-zero tc "class".
> 
> Instead of adding nop implementations to each of such qdisc's,
> we can just relax the check of cops->tcf_block() in
> tc_bind_tclass(). They don't support TC filter anyway.
> 
> Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 5/5] sctp: add spt_pathcpthld in struct sctp_paddrthlds
From: David Miller @ 2019-09-10 17:27 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <604e6ac718c29aa5b1a8c4b164a126b82bc42a2f.1568015756.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon,  9 Sep 2019 15:56:51 +0800

> diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
> index a15cc28..dfd81e1 100644
> --- a/include/uapi/linux/sctp.h
> +++ b/include/uapi/linux/sctp.h
> @@ -1069,6 +1069,7 @@ struct sctp_paddrthlds {
>  	struct sockaddr_storage spt_address;
>  	__u16 spt_pathmaxrxt;
>  	__u16 spt_pathpfthld;
> +	__u16 spt_pathcpthld;
>  };
>  
>  /*

As pointed out you can't add things to this structure without breaking existing
binaries.

^ permalink raw reply

* Re: [PATCH net-next 1/7] net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
From: David Miller @ 2019-09-10 17:25 UTC (permalink / raw)
  To: tanhuazhong
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, huangguangbin2
In-Reply-To: <1568105908-60983-2-git-send-email-tanhuazhong@huawei.com>

From: Huazhong Tan <tanhuazhong@huawei.com>
Date: Tue, 10 Sep 2019 16:58:22 +0800

> +	/* Set to user value, no larger than max_rss_size. */
> +	if (kinfo->req_rss_size != kinfo->rss_size && kinfo->req_rss_size &&
> +	    kinfo->req_rss_size <= max_rss_size) {
> +		dev_info(&hdev->pdev->dev, "rss changes from %u to %u\n",
> +			 kinfo->rss_size, kinfo->req_rss_size);
> +		kinfo->rss_size = kinfo->req_rss_size;

Please do not emit kernel log messages for normal operations.

^ permalink raw reply

* RE: [PATCH net-next v2 1/2] net: stmmac: Only enable enhanced addressing mode when needed
From: Jose Abreu @ 2019-09-10 17:25 UTC (permalink / raw)
  To: Thierry Reding, Jose Abreu
  Cc: David S . Miller, Giuseppe Cavallaro, Alexandre Torgue,
	Jon Hunter, Bitan Biswas, netdev@vger.kernel.org,
	linux-tegra@vger.kernel.org
In-Reply-To: <20190910135427.GB9897@ulmo>

From: Thierry Reding <thierry.reding@gmail.com>
Date: Sep/10/2019, 14:54:27 (UTC+00:00)

> On Tue, Sep 10, 2019 at 08:32:38AM +0000, Jose Abreu wrote:
> > From: Thierry Reding <thierry.reding@gmail.com>
> > Date: Sep/09/2019, 20:11:27 (UTC+00:00)
> > 
> > > On Mon, Sep 09, 2019 at 04:07:04PM +0000, Jose Abreu wrote:
> > > > From: Thierry Reding <thierry.reding@gmail.com>
> > > > Date: Sep/09/2019, 16:25:45 (UTC+00:00)
> > > > 
> > > > > @@ -92,6 +92,7 @@ struct stmmac_dma_cfg {
> > > > >  	int fixed_burst;
> > > > >  	int mixed_burst;
> > > > >  	bool aal;
> > > > > +	bool eame;
> > > > 
> > > > bools should not be used in struct's, please change to int.
> > > 
> > > Huh? Since when? "aal" right above it is also bool. Can you provide a
> > > specific rationale for why we shouldn't use bool in structs?
> > 
> > Please see https://lkml.org/lkml/2017/11/21/384.
> 
> The context is slightly different here. stmmac_dma_cfg exists once for
> each of these ethernet devices in the system, and I would assume that in
> the vast majority of cases there's exactly one such device in the system
> so the potential size increase is very small. On the other hand, there
> are potentially very many struct sched_dl_entity, so the size impact is
> multiplied.
> 
> Anyway, if you insist I'll rewrite this to use an unsigned int bitfield.

For new code I would rather prefer "int" but I guess it's up to David to 
decide this. I'm okay with both options as the check for this usage was 
removed in checkpatch: 
https://lkml.org/lkml/2019/1/10/975

---
Thanks,
Jose Miguel 
Abreu

^ permalink raw reply

* Re: [PATCH] bpf: validate bpf_func when BPF_JIT is enabled
From: Sami Tolvanen @ 2019-09-10 17:22 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Daniel Borkmann, Kees Cook, Martin Lau,
	Song Liu, netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <4f4136f5-db54-f541-2843-ccb35be25ab4@fb.com>

On Tue, Sep 10, 2019 at 08:37:19AM +0000, Yonghong Song wrote:
> You did not mention BPF_BINARY_HEADER_MAGIC and added member
> of `magic` in bpf_binary_header. Could you add some details
> on what is the purpose for this `magic` member?

Sure, I'll add a description to the next version.

The magic is a random number used to identify bpf_binary_header in
memory. The purpose of this patch is to limit the possible call
targets for the function pointer and checking for the magic helps
ensure we are jumping to a page that contains a jited function,
instead of allowing calls to arbitrary targets.

This is particularly useful when combined with the compiler-based
Control-Flow Integrity (CFI) mitigation, which Google started shipping
in Pixel kernels last year. The compiler injects checks to all
indirect calls, but cannot obviously validate jumps to dynamically
generated code.

> > +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
> > +{
> > +	const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
> > +
> > +	if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
> > +		return prog->bpf_func(ctx, prog->insnsi);
> > +
> > +	if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> > +		     !arch_bpf_jit_check_func(prog))) {
> > +		WARN(1, "attempt to jump to an invalid address");
> > +		return 0;
> > +	}
> > +
> > +	return prog->bpf_func(ctx, prog->insnsi);
> > +}

> The above can be rewritten as
> 	if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
> 	    hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> 	    !arch_bpf_jit_check_func(prog))) {
> 		WARN(1, "attempt to jump to an invalid address");
> 		return 0;
> 	}

That doesn't look quite equivalent, but yes, this can be rewritten as a
single if statement like this:

	if ((IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) ||
	     prog->jited) &&
	    (hdr->magic != BPF_BINARY_HEADER_MAGIC ||
	     !arch_bpf_jit_check_func(prog)))

I think splitting the interpreter and JIT paths would be more readable,
but I can certainly change this if you prefer.

> BPF_PROG_RUN() will be called during xdp fast path.
> Have you measured how much slowdown the above change could
> cost for the performance?

I have not measured the overhead, but it shouldn't be significant. Is
there a particular benchmark you'd like me to run?

Sami

^ permalink raw reply

* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Vivien Didelot @ 2019-09-10 17:19 UTC (permalink / raw)
  To: Robert Beckett
  Cc: netdev, Robert Beckett, Andrew Lunn, Florian Fainelli,
	David S. Miller
In-Reply-To: <20190910154238.9155-1-bob.beckett@collabora.com>

Hi Robert,

On Tue, 10 Sep 2019 16:41:46 +0100, Robert Beckett <bob.beckett@collabora.com> wrote:
> This patch-set adds support for some features of the Marvell switch
> chips that can be used to handle packet storms.
> 
> The rationale for this was a setup that requires the ability to receive
> traffic from one port, while a packet storm is occuring on another port
> (via an external switch with a deliberate loop). This is needed to
> ensure vital data delivery from a specific port, while mitigating any
> loops or DoS that a user may introduce on another port (can't guarantee
> sensible users).
> 
> [patch 1/7] configures auto negotiation for CPU ports connected with
> phys to enable pause frame propogation.
> 
> [patch 2/7] allows setting of port's default output queue priority for
> any ingressing packets on that port.
> 
> [patch 3/7] dt-bindings for patch 2.
> 
> [patch 4/7] allows setting of a port's queue scheduling so that it can
> prioritise egress of traffic routed from high priority ports.
> 
> [patch 5/7] dt-bindings for patch 4.
> 
> [patch 6/7] allows ports to rate limit their egress. This can be used to
> stop the host CPU from becoming swamped by packet delivery and exhasting
> descriptors.
> 
> [patch 7/7] dt-bindings for patch 6.
> 
> 
> Robert Beckett (7):
>   net/dsa: configure autoneg for CPU port
>   net: dsa: mv88e6xxx: add ability to set default queue priorities per
>     port
>   dt-bindings: mv88e6xxx: add ability to set default queue priorities
>     per port
>   net: dsa: mv88e6xxx: add ability to set queue scheduling
>   dt-bindings: mv88e6xxx: add ability to set queue scheduling
>   net: dsa: mv88e6xxx: add egress rate limiting
>   dt-bindings: mv88e6xxx: add egress rate limiting
> 
>  .../devicetree/bindings/net/dsa/marvell.txt   |  38 +++++
>  drivers/net/dsa/mv88e6xxx/chip.c              | 122 ++++++++++++---
>  drivers/net/dsa/mv88e6xxx/chip.h              |   5 +-
>  drivers/net/dsa/mv88e6xxx/port.c              | 140 +++++++++++++++++-
>  drivers/net/dsa/mv88e6xxx/port.h              |  24 ++-
>  include/dt-bindings/net/dsa-mv88e6xxx.h       |  22 +++
>  net/dsa/port.c                                |  10 ++
>  7 files changed, 327 insertions(+), 34 deletions(-)
>  create mode 100644 include/dt-bindings/net/dsa-mv88e6xxx.h

Feature series targeting netdev must be prefixed "PATCH net-next". As
this approach was a PoC, sending it as "RFC net-next" would be even more
appropriate.


Thank you,

	Vivien

^ permalink raw reply

* Re: [PATCH 4/7] net: dsa: mv88e6xxx: add ability to set queue scheduling
From: Vivien Didelot @ 2019-09-10 17:18 UTC (permalink / raw)
  To: Robert Beckett
  Cc: netdev, Robert Beckett, Andrew Lunn, Florian Fainelli,
	David S. Miller
In-Reply-To: <20190910154238.9155-5-bob.beckett@collabora.com>

Hi Robert,

On Tue, 10 Sep 2019 16:41:50 +0100, Robert Beckett <bob.beckett@collabora.com> wrote:
> Add code to set Schedule for any port that specifies "schedule" in their
> device tree node.
> This allows port prioritization in conjunction with port default queue
> priorities or packet priorities.
> 
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 25 +++++++++++++++++++++++++
>  drivers/net/dsa/mv88e6xxx/chip.h |  1 +
>  drivers/net/dsa/mv88e6xxx/port.c | 21 +++++++++++++++++++++
>  drivers/net/dsa/mv88e6xxx/port.h |  6 ++++++
>  4 files changed, 53 insertions(+)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 5005a35493e3..2bc22c59200c 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -2103,6 +2103,23 @@ static int mv88e6xxx_set_port_defqpri(struct mv88e6xxx_chip *chip, int port)
>  	return chip->info->ops->port_set_defqpri(chip, port, (u16)pri);
>  }
>  
> +static int mv88e6xxx_set_port_sched(struct mv88e6xxx_chip *chip, int port)
> +{
> +	struct dsa_switch *ds = chip->ds;
> +	struct device_node *dn = ds->ports[port].dn;
> +	int err;
> +	u32 sched;
> +
> +	if (!dn || !chip->info->ops->port_set_sched)
> +		return 0;
> +
> +	err = of_property_read_u32(dn, "schedule", &sched);
> +	if (err < 0)
> +		return 0;
> +
> +	return chip->info->ops->port_set_sched(chip, port, (u16)sched);
> +}
> +
>  static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  {
>  	struct dsa_switch *ds = chip->ds;
> @@ -2218,6 +2235,10 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  	if (err)
>  		return err;
>  
> +	err = mv88e6xxx_set_port_sched(chip, port);
> +	if (err)
> +		return err;
> +
>  	if (chip->info->ops->port_pause_limit) {
>  		err = chip->info->ops->port_pause_limit(chip, port, 0, 0);
>  		if (err)
> @@ -3130,6 +3151,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3214,6 +3236,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3432,6 +3455,7 @@ static const struct mv88e6xxx_ops mv88e6240_ops = {
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3776,6 +3800,7 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
> index 2d2c24f5a79d..ff3e35eceee0 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.h
> +++ b/drivers/net/dsa/mv88e6xxx/chip.h
> @@ -386,6 +386,7 @@ struct mv88e6xxx_ops {
>  	int (*port_set_defqpri)(struct mv88e6xxx_chip *chip, int port, u16 pri);
>  
>  	int (*port_egress_rate_limiting)(struct mv88e6xxx_chip *chip, int port);
> +	int (*port_set_sched)(struct mv88e6xxx_chip *chip, int port, u16 sched);
>  	int (*port_pause_limit)(struct mv88e6xxx_chip *chip, int port, u8 in,
>  				u8 out);
>  	int (*port_disable_learn_limit)(struct mv88e6xxx_chip *chip, int port);
> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> index 3a45fcd5cd9c..236732fc598d 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.c
> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> @@ -1180,6 +1180,27 @@ int mv88e6097_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port)
>  				    0x0001);
>  }
>  
> +/* Offset 0x0A: Egress Rate Control 2 */
> +int mv88e6xxx_port_set_sched(struct mv88e6xxx_chip *chip, int port, u16 sched)
> +{
> +	u16 reg;
> +	int err;
> +
> +	if (sched > MV88E6XXX_PORT_SCHED_STRICT_ALL)
> +		return -EINVAL;
> +
> +	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +				  &reg);
> +	if (err)
> +		return err;
> +
> +	reg &= ~MV88E6XXX_PORT_SCHED_MASK;
> +	reg |= sched << MV88E6XXX_PORT_SCHED_SHIFT;
> +
> +	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +				    reg);
> +}
> +
>  /* Offset 0x0C: Port ATU Control */
>  
>  int mv88e6xxx_port_disable_learn_limit(struct mv88e6xxx_chip *chip, int port)
> diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
> index 03884bbaa762..710d6eccafae 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.h
> +++ b/drivers/net/dsa/mv88e6xxx/port.h
> @@ -11,6 +11,7 @@
>  #ifndef _MV88E6XXX_PORT_H
>  #define _MV88E6XXX_PORT_H
>  
> +#include <dt-bindings/net/dsa-mv88e6xxx.h>
>  #include "chip.h"
>  
>  /* Offset 0x00: Port Status Register */
> @@ -207,6 +208,10 @@
>  
>  /* Offset 0x0A: Egress Rate Control 2 */
>  #define MV88E6XXX_PORT_EGRESS_RATE_CTL2		0x0a
> +#define MV88E6XXX_PORT_SCHED_SHIFT		12
> +#define MV88E6XXX_PORT_SCHED_MASK \
> +	(0x3 << MV88E6XXX_PORT_SCHED_SHIFT)
> +/* see MV88E6XXX_PORT_SCHED_* in include/dt-bindings/net/dsa-mv88e6xxx.h */
>  
>  /* Offset 0x0B: Port Association Vector */
>  #define MV88E6XXX_PORT_ASSOC_VECTOR			0x0b
> @@ -332,6 +337,7 @@ int mv88e6165_port_set_jumbo_size(struct mv88e6xxx_chip *chip, int port,
>  int mv88e6xxx_port_set_defqpri(struct mv88e6xxx_chip *chip, int port, u16 pri);
>  int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
>  int mv88e6097_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
> +int mv88e6xxx_port_set_sched(struct mv88e6xxx_chip *chip, int port, u16 sched);
>  int mv88e6097_port_pause_limit(struct mv88e6xxx_chip *chip, int port, u8 in,
>  			       u8 out);
>  int mv88e6390_port_pause_limit(struct mv88e6xxx_chip *chip, int port, u8 in,

Same comments applied as for the other patches adding implementations in port.c.


Thanks,

	Vivien

^ permalink raw reply

* Re: [PATCH 6/7] net: dsa: mv88e6xxx: add egress rate limiting
From: Vivien Didelot @ 2019-09-10 17:13 UTC (permalink / raw)
  To: Robert Beckett
  Cc: netdev, Robert Beckett, Andrew Lunn, Florian Fainelli,
	David S. Miller
In-Reply-To: <20190910154238.9155-7-bob.beckett@collabora.com>

Hi Robert,

On Tue, 10 Sep 2019 16:41:52 +0100, Robert Beckett <bob.beckett@collabora.com> wrote:
> Add code for specifying egress rate limiting per port.
> The rate can be specified as ethernet frames or bits per second.
> 
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c |  72 ++++++++++++++-------
>  drivers/net/dsa/mv88e6xxx/chip.h |   3 +-
>  drivers/net/dsa/mv88e6xxx/port.c | 106 ++++++++++++++++++++++++++++---
>  drivers/net/dsa/mv88e6xxx/port.h |  14 +++-
>  4 files changed, 158 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 2bc22c59200c..8c116496ab2f 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -2120,6 +2120,32 @@ static int mv88e6xxx_set_port_sched(struct mv88e6xxx_chip *chip, int port)
>  	return chip->info->ops->port_set_sched(chip, port, (u16)sched);
>  }
>  
> +static int mv88e6xxx_set_port_egress_rate_limiting(struct mv88e6xxx_chip *chip,
> +						   int port)
> +{
> +	struct dsa_switch *ds = chip->ds;
> +	struct device_node *dn = ds->ports[port].dn;
> +	int err;
> +	u32 mode, count;
> +
> +	if (!dn || !chip->info->ops->port_egress_rate_limiting)
> +		return 0;
> +
> +	err = of_property_read_u32(dn, "egress-limit-mode", &mode);
> +	if (err < 0)
> +		goto disable;
> +
> +	err = of_property_read_u32(dn, "egress-limit-count", &count);
> +	if (err < 0)
> +		goto disable;
> +
> +	return chip->info->ops->port_egress_rate_limiting(chip, port, count,
> +							  mode);
> +
> +disable:
> +	return chip->info->ops->port_egress_rate_limiting(chip, port, 0, 0);
> +}
> +
>  static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  {
>  	struct dsa_switch *ds = chip->ds;
> @@ -2263,11 +2289,9 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  			return err;
>  	}
>  
> -	if (chip->info->ops->port_egress_rate_limiting) {
> -		err = chip->info->ops->port_egress_rate_limiting(chip, port);
> -		if (err)
> -			return err;
> -	}
> +	err = mv88e6xxx_set_port_egress_rate_limiting(chip, port);
> +	if (err)
> +		return err;
>  
>  	err = mv88e6xxx_setup_message_port(chip, port);
>  	if (err)
> @@ -2809,7 +2833,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
>  	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -2879,7 +2903,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6095_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -2951,7 +2975,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = {
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_upstream_port = mv88e6095_port_set_upstream_port,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_set_pause = mv88e6185_port_set_pause,
>  	.port_link_state = mv88e6352_port_link_state,
> @@ -2994,7 +3018,7 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3034,7 +3058,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3108,7 +3132,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3150,7 +3174,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3193,7 +3217,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3235,7 +3259,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3275,7 +3299,7 @@ static const struct mv88e6xxx_ops mv88e6185_ops = {
>  	.port_set_speed = mv88e6185_port_set_speed,
>  	.port_set_frame_mode = mv88e6085_port_set_frame_mode,
>  	.port_set_egress_floods = mv88e6185_port_set_egress_floods,
> -	.port_egress_rate_limiting = mv88e6095_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_set_upstream_port = mv88e6095_port_set_upstream_port,
>  	.port_set_pause = mv88e6185_port_set_pause,
>  	.port_link_state = mv88e6185_port_link_state,
> @@ -3454,7 +3478,7 @@ static const struct mv88e6xxx_ops mv88e6240_ops = {
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3587,7 +3611,7 @@ static const struct mv88e6xxx_ops mv88e6320_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3630,7 +3654,7 @@ static const struct mv88e6xxx_ops mv88e6321_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3673,7 +3697,7 @@ static const struct mv88e6xxx_ops mv88e6341_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3716,7 +3740,7 @@ static const struct mv88e6xxx_ops mv88e6350_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3755,7 +3779,7 @@ static const struct mv88e6xxx_ops mv88e6351_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3799,7 +3823,7 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
>  	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_set_sched = mv88e6xxx_port_set_sched,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3851,7 +3875,7 @@ static const struct mv88e6xxx_ops mv88e6390_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6390_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> @@ -3900,7 +3924,7 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> -	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
> +	.port_egress_rate_limiting = mv88e6xxx_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6390_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
> index ff3e35eceee0..75fbd5df4aae 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.h
> +++ b/drivers/net/dsa/mv88e6xxx/chip.h
> @@ -385,7 +385,8 @@ struct mv88e6xxx_ops {
>  				   size_t size);
>  	int (*port_set_defqpri)(struct mv88e6xxx_chip *chip, int port, u16 pri);
>  
> -	int (*port_egress_rate_limiting)(struct mv88e6xxx_chip *chip, int port);
> +	int (*port_egress_rate_limiting)(struct mv88e6xxx_chip *chip, int port,
> +					 u32 count, u32 mode);
>  	int (*port_set_sched)(struct mv88e6xxx_chip *chip, int port, u16 sched);
>  	int (*port_pause_limit)(struct mv88e6xxx_chip *chip, int port, u8 in,
>  				u8 out);
> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> index 236732fc598d..41418cfaca56 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.c
> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> @@ -1166,21 +1166,107 @@ int mv88e6xxx_port_set_defqpri(struct mv88e6xxx_chip *chip, int port, u16 pri)
>  	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_CTL2, reg);
>  }
>  
> -/* Offset 0x09: Port Rate Control */
> +/* Offset 0x09: Port Rate Control
> + * Offset 0x0A: Egress Rate Control 2
> + */
>  
> -int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port)
> +#define Kb			1000
> +#define Mb			(1000 * Kb)
> +#define Gb			(1000ull * Mb)
> +#define EGRESS_FRAME_RATE_MIN	7632
> +#define EGRESS_FRAME_RATE_MAX	31250000
> +#define EGRESS_BPS_RATE_MIN	(64 * Kb)
> +#define EGRESS_BPS_RATE_MAX	(1 * Gb)
> +#define EGRESS_RATE_PERIOD	32
> +int mv88e6xxx_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port,
> +					u32 count, u32 mode)
>  {
> -	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
> -				    0x0000);
> -}
> +	u16 reg1, reg2;
> +	int err;
>  
> -int mv88e6097_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port)
> -{
> -	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
> -				    0x0001);
> +	/* quick exit for disabling */
> +	if (count == 0) {
> +		err = mv88e6xxx_port_read(chip, port,
> +					  MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +					  &reg2);
> +		if (err)
> +			return err;
> +		reg2 &= ~MV88E6XXX_PORT_EGRESS_RATE_MASK;
> +		err =  mv88e6xxx_port_write(chip, port,
> +					    MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +					    reg2);
> +		return err;
> +	}
> +
> +	if (mode > MV88E6XXX_PORT_EGRESS_COUNT_MODE_L3)
> +		return -EINVAL;
> +
> +	if (mode == MV88E6XXX_PORT_EGRESS_COUNT_MODE_FRAMES &&
> +	    (count < EGRESS_FRAME_RATE_MIN || count > EGRESS_FRAME_RATE_MAX))
> +		return -EINVAL;
> +
> +	if (mode != MV88E6XXX_PORT_EGRESS_COUNT_MODE_FRAMES &&
> +	    (count < EGRESS_BPS_RATE_MIN || count > EGRESS_BPS_RATE_MAX))
> +		return -EINVAL;
> +
> +	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
> +				  &reg1);
> +	if (err)
> +		return err;
> +
> +	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +				  &reg2);
> +	if (err)
> +		return err;
> +
> +	reg1 &= ~MV88E6XXX_PORT_EGRESS_DEC_MASK;
> +	reg2 &= ~MV88E6XXX_PORT_EGRESS_COUNT_MODE_MASK;
> +
> +	if (mode == MV88E6XXX_PORT_EGRESS_COUNT_MODE_FRAMES) {
> +		u32 val;
> +
> +		/* recommended to use dec of 1 for frame based */
> +		reg1 |= 1 << MV88E6XXX_PORT_EGRESS_DEC_SHIFT;
> +
> +		reg2 |= mode << MV88E6XXX_PORT_EGRESS_COUNT_MODE_SHIFT;
> +		reg2 &= ~MV88E6XXX_PORT_EGRESS_RATE_MASK;
> +
> +		val = NSEC_PER_SEC / (EGRESS_RATE_PERIOD * count);
> +		if (NSEC_PER_SEC % (EGRESS_RATE_PERIOD * count))
> +			val++;
> +		reg2 |= (u16)(val << MV88E6XXX_PORT_EGRESS_RATE_SHIFT);
> +	} else {
> +		u16 egress_dec, egress_rate;
> +		u64 dec_bytes, ns_bits;
> +
> +		if (count < (1 * Mb))
> +			egress_dec = (u16)roundup(count, (64 * Kb));
> +		else if (count < (100 * Mb))
> +			egress_dec = (u16)roundup(count, (1 * Mb));
> +		else
> +			egress_dec = (u16)roundup(count, (10 * Mb));
> +
> +		reg1 |= egress_dec;
> +
> +		dec_bytes = 8ull * NSEC_PER_SEC * egress_dec;
> +		ns_bits = 32ull * count;
> +		egress_rate = (u16)div64_u64(dec_bytes, ns_bits);
> +		reg2 |= egress_rate;
> +	}
> +
> +	err =  mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL1,
> +				    reg1);
> +	if (err)
> +		return err;
> +
> +	err =  mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_EGRESS_RATE_CTL2,
> +				    reg2);
> +	if (err)
> +		return err;
> +
> +	return 0;
>  }
>  
> -/* Offset 0x0A: Egress Rate Control 2 */
>  int mv88e6xxx_port_set_sched(struct mv88e6xxx_chip *chip, int port, u16 sched)
>  {
>  	u16 reg;
> diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
> index 710d6eccafae..724f839c570a 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.h
> +++ b/drivers/net/dsa/mv88e6xxx/port.h
> @@ -205,13 +205,23 @@
>  
>  /* Offset 0x09: Egress Rate Control */
>  #define MV88E6XXX_PORT_EGRESS_RATE_CTL1		0x09
> +#define MV88E6XXX_PORT_EGRESS_DEC_SHIFT		0
> +#define MV88E6XXX_PORT_EGRESS_DEC_MASK		0x7f
>  
>  /* Offset 0x0A: Egress Rate Control 2 */
>  #define MV88E6XXX_PORT_EGRESS_RATE_CTL2		0x0a
> +#define MV88E6XXX_PORT_EGRESS_COUNT_MODE_SHIFT	14
> +#define MV88E6XXX_PORT_EGRESS_COUNT_MODE_MASK	\
> +	(0x3 << MV88E6XXX_PORT_EGRESS_COUNT_MODE_SHIFT)

No shift macros please, only 0x1234 masks and their values named as in the
documentation. This way we see clearly how the 16-bit registers are organized.

> +/* see MV88E6XXX_PORT_EGRESS_COUNT_* in
> + * include/dt-bindings/net/dsa-mv88e6xxx.h
> + */
>  #define MV88E6XXX_PORT_SCHED_SHIFT		12
>  #define MV88E6XXX_PORT_SCHED_MASK \
>  	(0x3 << MV88E6XXX_PORT_SCHED_SHIFT)
>  /* see MV88E6XXX_PORT_SCHED_* in include/dt-bindings/net/dsa-mv88e6xxx.h */
> +#define MV88E6XXX_PORT_EGRESS_RATE_SHIFT	0
> +#define MV88E6XXX_PORT_EGRESS_RATE_MASK		0xfff
>  
>  /* Offset 0x0B: Port Association Vector */
>  #define MV88E6XXX_PORT_ASSOC_VECTOR			0x0b
> @@ -335,8 +345,8 @@ int mv88e6xxx_port_set_message_port(struct mv88e6xxx_chip *chip, int port,
>  int mv88e6165_port_set_jumbo_size(struct mv88e6xxx_chip *chip, int port,
>  				  size_t size);
>  int mv88e6xxx_port_set_defqpri(struct mv88e6xxx_chip *chip, int port, u16 pri);
> -int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
> -int mv88e6097_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
> +int mv88e6xxx_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port,
> +					u32 count, u32 mode);
>  int mv88e6xxx_port_set_sched(struct mv88e6xxx_chip *chip, int port, u16 sched);
>  int mv88e6097_port_pause_limit(struct mv88e6xxx_chip *chip, int port, u8 in,
>  			       u8 out);

This patch does not look good. Implementations in port.c must be simple
functions ordered per Port register, implementing read write operations for
them, and eventually checking unsupported values. No logic in them. You may
abstract some values with an enum defined in chip.h if needed. (some models
don't use the same values for various definitions for example.)


Thanks,

	Vivien

^ permalink raw reply

* Re: Is bug 200755 in anyone's queue??
From: Willem de Bruijn @ 2019-09-10 17:10 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Willem de Bruijn, Steve Zabele, Eric Dumazet, Mark KEATON,
	Network Development, shum@canndrew.org, vladimir116@gmail.com,
	saifi.khan@strikr.in, Daniel Borkmann, Stephen Hemminger,
	Craig Gallek
In-Reply-To: <e364d94b2d2a2342f192d6e80fec4798578a5d07.camel@redhat.com>

On Tue, Sep 10, 2019 at 12:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi all,
>
> On Tue, 2019-09-10 at 11:52 -0400, Willem de Bruijn wrote:
> > This clearly has some loose ends and is no shorter or simpler. So
> > unless anyone has comments or a different solution, I'll finish
> > up the first variant.
>
> I'm sorry for the late feedback.
>
> I was wondering if we could use a new UDP-specific setsockopt to remove
> the connected socket from the reuseport group at connect() time?
>
> That would not have any behavioral change for existing application
> leveraging the current reuseport implementation and requires possibly a
> simpler implementation, but would need application changes for UDP
> servers doing reuse/connect().
>
> WDYT?

Thanks for taking a look, too, Paolo.

I looked into detaching the sockets from the group at connect time. It
could be done without setsockopt, even. Unfortunately, it brings other
problems.

The reuseport group is still there, so may still match sockets
before the connection. If the connected socket no longer has
sk_reuseport set, binding new sockets will fail on conflict. And so a
few more.

^ permalink raw reply

* Re: [PATCH 1/7] net/dsa: configure autoneg for CPU port
From: Florian Fainelli @ 2019-09-10 16:56 UTC (permalink / raw)
  To: Robert Beckett, netdev; +Cc: Andrew Lunn, Vivien Didelot, David S. Miller
In-Reply-To: <20190910154238.9155-2-bob.beckett@collabora.com>

On 9/10/19 8:41 AM, Robert Beckett wrote:
> Configure autoneg for phy connected CPU ports.
> This allows us to use autoneg between the CPU port's phy and the link
> partner's phy.
> This enables us to negoatiate pause frame transmission to prioritise
> packet delivery over throughput.

s/autoneg/auto-negotiation/
s/phy/PHY/
s/negoatiate/negotiate/
s/prioritise/prioritize/ (maybe the latter is just my US english
dictionary tripping up)

Also the subject should be net: dsa: Configure auto-negotiation for CPU
port to match previous submissions done to that file.

Fixing up that code path sounds reasonable, but are you not hitting the
PHYLINK code path instead?

> 
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>  net/dsa/port.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index f071acf2842b..1b6832eac2c5 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -538,10 +538,20 @@ static int dsa_port_setup_phy_of(struct dsa_port *dp, bool enable)
>  		return PTR_ERR(phydev);
>  
>  	if (enable) {
> +		phydev->supported = PHY_GBIT_FEATURES | SUPPORTED_MII |
> +				    SUPPORTED_AUI | SUPPORTED_FIBRE |
> +				    SUPPORTED_BNC | SUPPORTED_Pause |
> +				    SUPPORTED_Asym_Pause;
> +		phydev->advertising = phydev->supported;
> +
>  		err = genphy_config_init(phydev);
>  		if (err < 0)
>  			goto err_put_dev;
>  
> +		err = genphy_config_aneg(phydev);
> +		if (err < 0)
> +			goto err_put_dev;
> +
>  		err = genphy_resume(phydev);
>  		if (err < 0)
>  			goto err_put_dev;
> 


-- 
Florian

^ permalink raw reply

* Re: Default qdisc not correctly initialized with custom MTU
From: Cong Wang @ 2019-09-10 16:56 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: Netdev
In-Reply-To: <dbc359d3-5cac-9b2e-6520-df4a25964bd3@applied-asynchrony.com>

On Tue, Sep 10, 2019 at 2:14 AM Holger Hoffstätte
<holger@applied-asynchrony.com> wrote:
>
> On 9/10/19 12:52 AM, Cong Wang wrote:
> > On Mon, Sep 9, 2019 at 5:44 AM Holger Hoffstätte
> > <holger@applied-asynchrony.com> wrote:
> >> I can't help but feel this is a slight bug in terms of initialization order,
> >> and that the default qdisc should only be created when it's first being
> >> used/attached to a link, not when the sysctls are configured.
> >
> > Yeah, this is because the fq_codel qdisc is initialized once and
> > doesn't get any notification when the netdev's MTU get changed.
>
> My point was that it shouldn't be created or initialized at all when
> the sysctl is configured, only the name should be validated/stored and
> queried when needed. If any interface is brought up before that point,
> no value (yet) would just mean "trod along with the defaults" to whoever
> is doing the work.

It is _not_ created when sysctl is configured, it is either created via tc
command, or implicitly created by kernel when you bring up eth0.
sysctl only tells kernel what to create by default, but never commits it.

>
> > We can "fix" this by adding a NETDEV_CHANGEMTU notifier to
> > qdisc's, but I don't know if it is really worth the effort.
>
> This is essentially the opposite of what I had in mind. The problem is
> that the entity was created, not that it needs to be notified.

Hmm? You did change MTU after adding fq_codel to eth0, right?
So how do you fix this without notification or recreation of fq_codel
in your mind?

I am happy to hear more details.

> Also I don't think that would work for scenarios with multiple links
> using different MTUs.

The fq_codel you created is apparently attached to a netdev,
I don't think this is even a problem. I _guess_ you somehow
believe you create a standalone fq_codel during sysctl setting,
this is just impossible. It must be attached to an interface, no
matter who creates it.

>
> > Is there any reason you can't change that order?
>
> Yes, because that wouldn't solve anything?

Really? You already said it works for you like below, I am confused.


> Like i said I can just kick the root qdisc to update itself in
> a post interface-setup script, and that works fine. Since I need
> that script anyway for setting several other parameters for
> the device it's no big deal - just another workaround.
>
> A brief look at the initialization in sch_mq/sch_generic unfortunately
> didn't really help clear things up for me, hence I guess my real
> question is whether a qdisc *must* be created early for some reason
> (assuming sysctls come before link setup), or whether this is something
> that could be delayed and done on-demand.

The default qdisc is created by kernel when you don't create any.
Again, you can create your own after changing the MTU, this should
solve the problem you see. It is all about ordering.

Thanks.

^ permalink raw reply

* Re: Is bug 200755 in anyone's queue??
From: Paolo Abeni @ 2019-09-10 16:56 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Steve Zabele, Eric Dumazet, Mark KEATON, Network Development,
	shum@canndrew.org, vladimir116@gmail.com, saifi.khan@strikr.in,
	Daniel Borkmann, on2k16nm@gmail.com, Stephen Hemminger,
	Craig Gallek
In-Reply-To: <CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com>

Hi all,

On Tue, 2019-09-10 at 11:52 -0400, Willem de Bruijn wrote:
> This clearly has some loose ends and is no shorter or simpler. So
> unless anyone has comments or a different solution, I'll finish
> up the first variant.

I'm sorry for the late feedback.

I was wondering if we could use a new UDP-specific setsockopt to remove
the connected socket from the reuseport group at connect() time?

That would not have any behavioral change for existing application
leveraging the current reuseport implementation and requires possibly a
simpler implementation, but would need application changes for UDP
servers doing reuse/connect().

WDYT?

Cheers,

Paolo


^ permalink raw reply

* RE: [PATCH net-next 0/6] net: stmmac: Improvements for -next
From: Jose Abreu @ 2019-09-10 16:50 UTC (permalink / raw)
  To: Jose Abreu, netdev@vger.kernel.org
  Cc: Joao Pinto, Giuseppe Cavallaro, Alexandre Torgue, David S. Miller,
	Maxime Coquelin, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <cover.1568126224.git.joabreu@synopsys.com>

From: Jose Abreu <joabreu@synopsys.com>
Date: Sep/10/2019, 15:41:21 (UTC+00:00)

> Misc patches for -next. It includes:
>  - Two fixes for features in -next only
>  - New features support for GMAC cores (which includes GMAC4 and GMAC5)

BTW, just for reference (and because I forgot to attach it earlier), 
this is the selftests output for GMAC5.10 after this patchset:

# ethtool -t ens4
The test result is PASS
The test extra info:
 1. MAC Loopback         	 0
 2. PHY Loopback         	 0
 3. MMC Counters         	 0
 4. EEE                  	 -95
 5. Hash Filter MC       	 0
 6. Perfect Filter UC    	 0
 7. MC Filter            	 0
 8. UC Filter            	 0
 9. Flow Control         	 0
10. RSS                  	 -95
11. VLAN Filtering       	 0
12. Double VLAN Filtering	 0
13. Flexible RX Parser   	 0
14. SA Insertion (desc)  	 0
15. SA Replacement (desc)	 0
16. SA Insertion (reg)  	 0
17. SA Replacement (reg)	 0
18. VLAN TX Insertion   	 0
19. SVLAN TX Insertion  	 0
20. L3 DA Filtering     	 -95
21. L3 SA Filtering     	 -95
22. L4 DA TCP Filtering 	 -95
23. L4 SA TCP Filtering 	 -95
24. L4 DA UDP Filtering 	 -95
25. L4 SA UDP Filtering 	 -95
26. ARP Offload         	 0
27. Jumbo Frame         	 0
28. Multichannel Jumbo  	 0
29. Split Header        	 -95

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH 3/7] dt-bindings: mv88e6xxx: add ability to set default queue priorities per port
From: Vivien Didelot @ 2019-09-10 16:49 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Robert Beckett, netdev, Andrew Lunn, David S. Miller, Rob Herring,
	Mark Rutland, devicetree, Jiri Pirko, Ido Schimmel
In-Reply-To: <23101286-4da2-2a53-e7cd-71ead263bbaa@gmail.com>

Hi Robert,

On Tue, 10 Sep 2019 09:42:24 -0700, Florian Fainelli <f.fainelli@gmail.com> wrote:
> This is a vendor specific driver/property,
> marvell,default-queue-priority (which be cheapskate on words) would be
> more readable. But still, I have some more fundamental issues with the
> general approach, see my response in the cover letter.

As Florian said, the DT is unlikely to welcome vendor specific nodes for
configuration which may be generic through standard network userspace tools.


Thanks,

	Vivien

^ permalink raw reply

* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Florian Fainelli @ 2019-09-10 16:49 UTC (permalink / raw)
  To: Robert Beckett, netdev
  Cc: Andrew Lunn, Vivien Didelot, David S. Miller, Ido Schimmel,
	Jiri Pirko
In-Reply-To: <20190910154238.9155-1-bob.beckett@collabora.com>

+Ido, Jiri,

On 9/10/19 8:41 AM, Robert Beckett wrote:
> This patch-set adds support for some features of the Marvell switch
> chips that can be used to handle packet storms.
> 
> The rationale for this was a setup that requires the ability to receive
> traffic from one port, while a packet storm is occuring on another port
> (via an external switch with a deliberate loop). This is needed to
> ensure vital data delivery from a specific port, while mitigating any
> loops or DoS that a user may introduce on another port (can't guarantee
> sensible users).

The use case is reasonable, but the implementation is not really. You
are using Device Tree which is meant to describe hardware as a policy
holder for setting up queue priorities and likewise for queue scheduling.

The tool that should be used for that purpose is tc and possibly an
appropriately offloaded queue scheduler in order to map the desired
scheduling class to what the hardware supports.

Jiri, Ido, how do you guys support this with mlxsw?

> 
> [patch 1/7] configures auto negotiation for CPU ports connected with
> phys to enable pause frame propogation.
> 
> [patch 2/7] allows setting of port's default output queue priority for
> any ingressing packets on that port.
> 
> [patch 3/7] dt-bindings for patch 2.
> 
> [patch 4/7] allows setting of a port's queue scheduling so that it can
> prioritise egress of traffic routed from high priority ports.
> 
> [patch 5/7] dt-bindings for patch 4.
> 
> [patch 6/7] allows ports to rate limit their egress. This can be used to
> stop the host CPU from becoming swamped by packet delivery and exhasting
> descriptors.
> 
> [patch 7/7] dt-bindings for patch 6.
> 
> 
> Robert Beckett (7):
>   net/dsa: configure autoneg for CPU port
>   net: dsa: mv88e6xxx: add ability to set default queue priorities per
>     port
>   dt-bindings: mv88e6xxx: add ability to set default queue priorities
>     per port
>   net: dsa: mv88e6xxx: add ability to set queue scheduling
>   dt-bindings: mv88e6xxx: add ability to set queue scheduling
>   net: dsa: mv88e6xxx: add egress rate limiting
>   dt-bindings: mv88e6xxx: add egress rate limiting
> 
>  .../devicetree/bindings/net/dsa/marvell.txt   |  38 +++++
>  drivers/net/dsa/mv88e6xxx/chip.c              | 122 ++++++++++++---
>  drivers/net/dsa/mv88e6xxx/chip.h              |   5 +-
>  drivers/net/dsa/mv88e6xxx/port.c              | 140 +++++++++++++++++-
>  drivers/net/dsa/mv88e6xxx/port.h              |  24 ++-
>  include/dt-bindings/net/dsa-mv88e6xxx.h       |  22 +++
>  net/dsa/port.c                                |  10 ++
>  7 files changed, 327 insertions(+), 34 deletions(-)
>  create mode 100644 include/dt-bindings/net/dsa-mv88e6xxx.h
> 


-- 
Florian

^ permalink raw reply

* Re: [PATCH 2/7] net: dsa: mv88e6xxx: add ability to set default queue priorities per port
From: Vivien Didelot @ 2019-09-10 16:43 UTC (permalink / raw)
  To: Robert Beckett
  Cc: netdev, Robert Beckett, Andrew Lunn, Florian Fainelli,
	David S. Miller
In-Reply-To: <20190910154238.9155-3-bob.beckett@collabora.com>

Hi Robert,

On Tue, 10 Sep 2019 16:41:48 +0100, Robert Beckett <bob.beckett@collabora.com> wrote:
> +static int mv88e6xxx_set_port_defqpri(struct mv88e6xxx_chip *chip, int port)
> +{
> +	struct dsa_switch *ds = chip->ds;
> +	struct device_node *dn = ds->ports[port].dn;
> +	int err;
> +	u32 pri;
> +
> +	if (!dn || !chip->info->ops->port_set_defqpri)
> +		return 0;
> +
> +	err = of_property_read_u32(dn, "defqpri", &pri);
> +	if (err < 0)
> +		return 0;
> +
> +	return chip->info->ops->port_set_defqpri(chip, port, (u16)pri);
> +}
> +
>  static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  {
>  	struct dsa_switch *ds = chip->ds;
> @@ -2176,6 +2193,10 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  			return err;
>  	}
>  
> +	err = mv88e6xxx_set_port_defqpri(chip, port);
> +	if (err)
> +		return err;
> +
>  	/* Port Association Vector: when learning source addresses
>  	 * of packets, add the address to the address database using
>  	 * a port bitmap that has only the bit for this port set and
> @@ -3107,6 +3128,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> +	.port_set_defqpri = mv88e6xxx_port_set_defqpri,

Please use a reference model, like mv88e6352_port_set_defqpri to avoid
confusion with a generic wrapper or implementation.

>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3190,6 +3212,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> +	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3407,6 +3430,7 @@ static const struct mv88e6xxx_ops mv88e6240_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> +	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> @@ -3750,6 +3774,7 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
>  	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
>  	.port_set_ether_type = mv88e6351_port_set_ether_type,
>  	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
> +	.port_set_defqpri = mv88e6xxx_port_set_defqpri,
>  	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
>  	.port_pause_limit = mv88e6097_port_pause_limit,
>  	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
> index 4646e46d47f2..2d2c24f5a79d 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.h
> +++ b/drivers/net/dsa/mv88e6xxx/chip.h
> @@ -383,6 +383,7 @@ struct mv88e6xxx_ops {
>  				   u16 etype);
>  	int (*port_set_jumbo_size)(struct mv88e6xxx_chip *chip, int port,
>  				   size_t size);
> +	int (*port_set_defqpri)(struct mv88e6xxx_chip *chip, int port, u16 pri);

The default queue priority seems to be an integer in the [0:3] range, not
a register mask or value per-se. So an unsigned int seems more appropriate.

>  
>  	int (*port_egress_rate_limiting)(struct mv88e6xxx_chip *chip, int port);
>  	int (*port_pause_limit)(struct mv88e6xxx_chip *chip, int port, u8 in,
> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> index 04309ef0a1cc..3a45fcd5cd9c 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.c
> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> @@ -1147,6 +1147,25 @@ int mv88e6165_port_set_jumbo_size(struct mv88e6xxx_chip *chip, int port,
>  	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_CTL2, reg);
>  }
>  
> +int mv88e6xxx_port_set_defqpri(struct mv88e6xxx_chip *chip, int port, u16 pri)
> +{
> +	u16 reg;
> +	int err;
> +
> +	if (pri > 3)
> +		return -EINVAL;
> +
> +	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_CTL2, &reg);
> +	if (err)
> +		return err;
> +
> +	reg &= ~MV88E6XXX_PORT_CTL2_DEFQPRI_MASK;
> +	reg |= pri << MV88E6XXX_PORT_CTL2_DEFQPRI_SHIFT;

                      __bf_shf(MV88E6XXX_PORT_CTL2_DEFQPRI_MASK)

> +	reg |= MV88E6XXX_PORT_CTL2_USE_DEFQPRI;
> +
> +	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_CTL2, reg);
> +}
> +
>  /* Offset 0x09: Port Rate Control */
>  
>  int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port)
> diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
> index 8d5a6cd6fb19..03884bbaa762 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.h
> +++ b/drivers/net/dsa/mv88e6xxx/port.h
> @@ -197,6 +197,9 @@
>  #define MV88E6XXX_PORT_CTL2_DEFAULT_FORWARD		0x0040
>  #define MV88E6XXX_PORT_CTL2_EGRESS_MONITOR		0x0020
>  #define MV88E6XXX_PORT_CTL2_INGRESS_MONITOR		0x0010
> +#define MV88E6XXX_PORT_CTL2_USE_DEFQPRI		0x0008
> +#define MV88E6XXX_PORT_CTL2_DEFQPRI_MASK		0x0006
> +#define MV88E6XXX_PORT_CTL2_DEFQPRI_SHIFT		1

No shift macro needed, MV88E6XXX_PORT_CTL2_DEFQPRI_MASK is enough.

>  #define MV88E6095_PORT_CTL2_CPU_PORT_MASK		0x000f
>  
>  /* Offset 0x09: Egress Rate Control */
> @@ -326,6 +329,7 @@ int mv88e6xxx_port_set_message_port(struct mv88e6xxx_chip *chip, int port,
>  				    bool message_port);
>  int mv88e6165_port_set_jumbo_size(struct mv88e6xxx_chip *chip, int port,
>  				  size_t size);
> +int mv88e6xxx_port_set_defqpri(struct mv88e6xxx_chip *chip, int port, u16 pri);
>  int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
>  int mv88e6097_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
>  int mv88e6097_port_pause_limit(struct mv88e6xxx_chip *chip, int port, u8 in,

Thanks,

	Vivien

^ permalink raw reply

* Re: [PATCH 3/7] dt-bindings: mv88e6xxx: add ability to set default queue priorities per port
From: Florian Fainelli @ 2019-09-10 16:42 UTC (permalink / raw)
  To: Robert Beckett, netdev
  Cc: Andrew Lunn, Vivien Didelot, David S. Miller, Rob Herring,
	Mark Rutland, devicetree, Jiri Pirko, Ido Schimmel
In-Reply-To: <20190910154238.9155-4-bob.beckett@collabora.com>

On 9/10/19 8:41 AM, Robert Beckett wrote:
> Document a new setting for Marvell switch chips to set the default queue
> priorities per port.
> 
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>  Documentation/devicetree/bindings/net/dsa/marvell.txt | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/dsa/marvell.txt b/Documentation/devicetree/bindings/net/dsa/marvell.txt
> index 6f9538974bb9..e097c3c52eac 100644
> --- a/Documentation/devicetree/bindings/net/dsa/marvell.txt
> +++ b/Documentation/devicetree/bindings/net/dsa/marvell.txt
> @@ -47,6 +47,10 @@ Optional properties:
>  			  bus. The node must contains a compatible string of
>  			  "marvell,mv88e6xxx-mdio-external"
>  
> +Optional properties for ports:
> +- defqpri=<n>		: Enforced default queue priority for the given port.
> +			  Valid range is 0..3

This is a vendor specific driver/property,
marvell,default-queue-priority (which be cheapskate on words) would be
more readable. But still, I have some more fundamental issues with the
general approach, see my response in the cover letter.
-- 
Florian

^ permalink raw reply

* Re: VRF Issue Since kernel 5
From: David Ahern @ 2019-09-10 16:39 UTC (permalink / raw)
  To: Gowen, netdev@vger.kernel.org
In-Reply-To: <CWLP265MB1554308A1373D9ECE68CB854FDB70@CWLP265MB1554.GBRP265.PROD.OUTLOOK.COM>

On 9/9/19 8:46 AM, Gowen wrote:
> 
> I can run:
> 
> 
> Admin@NETM06:~$ host www.google.co.uk
> www.google.co.uk has address 172.217.169.3
> www.google.co.uk has IPv6 address 2a00:1450:4009:80d::2003
> 
> 
> but I get a timeout for:
> 
> 
> sudo ip vrf  exec mgmt-vrf host www.google.co.uk

sudo perf record -e fib:*  ip vrf  exec mgmt-vrf host www.google.co.uk
sudo perf script

I am guessing the table is wrong for your setup, but maybe the output
(or ordering of events) sheds some light on the problem.

^ permalink raw reply

* Re: VRF Issue Since kernel 5
From: David Ahern @ 2019-09-10 16:36 UTC (permalink / raw)
  To: Alexis Bauvin, Gowen; +Cc: netdev@vger.kernel.org
In-Reply-To: <7CAF2F23-5D88-4BE7-B703-06B71D1EDD11@online.net>

On 9/9/19 1:01 PM, Alexis Bauvin wrote:
> Could you try swapping the local and l3mdev rules?
> 
> `ip rule del pref 0; ip rule add from all lookup local pref 1001`

yes, the rules should be re-ordered so that local rule is after l3mdev
rule (VRF is implemented as policy routing). In general, I would reverse
the order of those commands to ensure no breakage.

Also, 5.0 I think it was (too many kernel versions) added a new l3mdev
sysctl (raw_l3mdev_accept). Check all 3 of them and nmake sure they are
set properly for your use case.

These slides do not cover 5.0 changes but are still the best collection
of notes on VRF:
http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf

^ permalink raw reply

* [net-next 00/14][pull request] Intel Wired LAN Driver Updates 2019-09-10
From: Jeff Kirsher @ 2019-09-10 16:34 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann

This series contains updates to i40e, ixgbe/vf and iavf.

Wenwen Wang fixes a potential memory leak where 3 allocated variables
are not properly cleaned up on failure for ixgbe.

Stefan Assmann fixes a potential kernel panic found when repeatedly
spawning and destroying VFs in i40e when a NULL pointer is dereferenced
due to a race condition.  Fixed up the i40e driver to clear the
__I40E_VIRTCHNL_OP_PENDING bit before returning after an invalid
minimum transmit rate is requested.  Updates the iavf driver to only
apply the MAC address change when the PF ACK's the requested change.

Tonghao Zhang updates ixgbe to use the skb_get_queue_mapping() API call
instead of the driver accessing the queue mapping directly.

Jake updates i40e to use ktime_get_real_ts64() instead of
ktime_to_timespec64().  Removes the define for bit 0x0001 for cloud
filters, since it is a reserved bit and not a valid type.  Also added
code comments to clearly state which bits are reserved and should not be
used or defined for cloud filter adminq command.  Clarify the macros
used to specify the cloud filter fields are individual bits, so use the
BIT() macro.

Aleksandr fixes up the print_link_message() to include the "negotiated"
FEC status for i40e.

Czeslaw also adds additional log message for devices without FEC in the
print_link_message() for i40e.

Alex fixes up the adaptive ITR scheme for ixgbe which could result in a
value that was either 0 or something less than 10 which was causing
issues with hardware features, like RSC, that do not function well with
ITR values that low.

Colin Ian King reduces the object code size by making the array API
static constant.

Magnus fixes a potential receive buffer starvation issue for AF_XDP by
kicking the NAPI context of any queue with an attached AF_XDP zero-copy
socket.

The following are changes since commit c21815f1c199a2ffb77aa862206b0f8d94263d14:
  net/mlx4_en: ethtool: make array modes static const, makes object smaller
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Aleksandr Loktionov (1):
  i40e: fix missed "Negotiated" string in i40e_print_link_message()

Alexander Duyck (1):
  ixgbe: Prevent u8 wrapping of ITR value to something less than 10us

Colin Ian King (1):
  net/ixgbevf: make array api static const, makes object smaller

Czeslaw Zagorski (1):
  i40e: Fix message for other card without FEC.

Jacob Keller (4):
  i40e: use ktime_get_real_ts64 instead of ktime_to_timespec64
  i40e: remove I40E_AQC_ADD_CLOUD_FILTER_OIP
  i40e: mark additional missing bits as reserved
  i40e: use BIT macro to specify the cloud filter field flags

Magnus Karlsson (1):
  i40e: fix potential RX buffer starvation for AF_XDP

Stefan Assmann (3):
  i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
  i40e: clear __I40E_VIRTCHNL_OP_PENDING on invalid min Tx rate
  iavf: fix MAC address setting for VFs when filter is rejected

Tonghao Zhang (1):
  ixgbe: use skb_get_queue_mapping in tx path

Wenwen Wang (1):
  ixgbe: fix memory leaks

 drivers/net/ethernet/intel/i40e/i40e.h        | 10 +++----
 .../net/ethernet/intel/i40e/i40e_adminq_cmd.h |  5 +++-
 drivers/net/ethernet/intel/i40e/i40e_main.c   | 30 ++++++++++++-------
 drivers/net/ethernet/intel/i40e/i40e_ptp.c    |  2 +-
 .../ethernet/intel/i40e/i40e_virtchnl_pf.c    |  3 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    |  5 ++++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  1 -
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   |  7 +++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 10 +++++--
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 14 +++++----
 10 files changed, 60 insertions(+), 27 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [net-next 04/14] i40e: use ktime_get_real_ts64 instead of ktime_to_timespec64
From: Jeff Kirsher @ 2019-09-10 16:34 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190910163434.2449-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Remove a call to ktime_to_timespec64 by calling ktime_get_real_ts64
directly.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ptp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 11394a52e21c..9bf1ad4319f5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -725,7 +725,7 @@ static long i40e_ptp_create_clock(struct i40e_pf *pf)
 	pf->tstamp_config.tx_type = HWTSTAMP_TX_OFF;
 
 	/* Set the previous "reset" time to the current Kernel clock time */
-	pf->ptp_prev_hw_time = ktime_to_timespec64(ktime_get_real());
+	ktime_get_real_ts64(&pf->ptp_prev_hw_time);
 	pf->ptp_reset_start = ktime_get();
 
 	return 0;
-- 
2.21.0


^ permalink raw reply related

* [net-next 02/14] i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
From: Jeff Kirsher @ 2019-09-10 16:34 UTC (permalink / raw)
  To: davem
  Cc: Stefan Assmann, netdev, nhorman, sassmann, stable, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190910163434.2449-1-jeffrey.t.kirsher@intel.com>

From: Stefan Assmann <sassmann@kpanic.de>

While testing VF spawn/destroy the following panic occurred.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000029
[...]
Workqueue: i40e i40e_service_task [i40e]
RIP: 0010:i40e_sync_vsi_filters+0x6fd/0xc60 [i40e]
[...]
Call Trace:
 ? __switch_to_asm+0x35/0x70
 ? __switch_to_asm+0x41/0x70
 ? __switch_to_asm+0x35/0x70
 ? _cond_resched+0x15/0x30
 i40e_sync_filters_subtask+0x56/0x70 [i40e]
 i40e_service_task+0x382/0x11b0 [i40e]
 ? __switch_to_asm+0x41/0x70
 ? __switch_to_asm+0x41/0x70
 process_one_work+0x1a7/0x3b0
 worker_thread+0x30/0x390
 ? create_worker+0x1a0/0x1a0
 kthread+0x112/0x130
 ? kthread_bind+0x30/0x30
 ret_from_fork+0x35/0x40

Investigation revealed a race where pf->vf[vsi->vf_id].trusted may get
accessed by the watchdog via i40e_sync_filters_subtask() although
i40e_free_vfs() already free'd pf->vf.
To avoid this the call to i40e_sync_vsi_filters() in
i40e_sync_filters_subtask() needs to be guarded by __I40E_VF_DISABLE,
which is also used by i40e_free_vfs().

Note: put the __I40E_VF_DISABLE check after the
__I40E_MACVLAN_SYNC_PENDING check as the latter is more likely to
trigger.

CC: stable@vger.kernel.org
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e9f2f276bf27..3e2e465f43f9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2592,6 +2592,10 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf)
 		return;
 	if (!test_and_clear_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state))
 		return;
+	if (test_and_set_bit(__I40E_VF_DISABLE, pf->state)) {
+		set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
+		return;
+	}
 
 	for (v = 0; v < pf->num_alloc_vsi; v++) {
 		if (pf->vsi[v] &&
@@ -2606,6 +2610,7 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf)
 			}
 		}
 	}
+	clear_bit(__I40E_VF_DISABLE, pf->state);
 }
 
 /**
-- 
2.21.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox