Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net 2/2] nfp: reject binding to shared blocks
From: Jakub Kicinski @ 2018-06-25 20:50 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev, John Hurley
In-Reply-To: <20180625204021.GE2161@nanopsycho>

On Mon, 25 Jun 2018 22:40:21 +0200, Jiri Pirko wrote:
> Mon, Jun 25, 2018 at 10:22:46PM CEST, jakub.kicinski@netronome.com wrote:
> >From: John Hurley <john.hurley@netronome.com>
> >
> >TC shared blocks allow multiple qdiscs to be grouped together and filters
> >shared between them. Currently the chains of filters attached to a block
> >are only flushed when the block is removed. If a qdisc is removed from a
> >block but the block still exists, flow del messages are not passed to the
> >callback registered for that qdisc. For the NFP, this presents the
> >possibility of rules still existing in hw when they should be removed.
> >
> >Prevent binding to shared blocks until the kernel can send per qdisc del
> >messages when block unbinds occur.  
> 
> This is not nfp-specific problem. Should be handled differently. The
> driver has information about offloaded filters. On unbind, it have
> enough info to do the flush, doesn't it?

Certainly.  But this fix is simpler and sufficient.  We need to
backport it back to 4.16.  If we have to go through driver tables 
and flush filters we may as well merge the reoffload series to net.

^ permalink raw reply

* Re: [PATCH net-next 3/7] net: sched: cls_flower: implement offload tcf_proto_op
From: Jiri Pirko @ 2018-06-25 20:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, xiyou.wangcong, jhs, gerlitz.or, netdev, oss-drivers,
	John Hurley
In-Reply-To: <20180625043431.13413-4-jakub.kicinski@netronome.com>

Mon, Jun 25, 2018 at 06:34:27AM CEST, jakub.kicinski@netronome.com wrote:
>From: John Hurley <john.hurley@netronome.com>
>
>Add the reoffload tcf_proto_op in flower to generate an offload message
>for each filter in the given tcf_proto. Call the specified callback with
>this new offload message. The function only returns an error if the
>callback rejects adding a 'hardware only' rule.
>
>A filter contains a flag to indicate if it is in hardware or not. To
>ensure the reoffload function properly maintains this flag, keep a
>reference counter for the number of instances of the filter that are in
>hardware. Only update the flag when this counter changes from or to 0. Add
>a generic helper function to implement this behaviour.
>
>Signed-off-by: John Hurley <john.hurley@netronome.com>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> include/net/sch_generic.h | 15 +++++++++++++
> net/sched/cls_flower.c    | 44 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 59 insertions(+)
>
>diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>index 88ed64f60056..c0bd11a928ed 100644
>--- a/include/net/sch_generic.h
>+++ b/include/net/sch_generic.h
>@@ -336,6 +336,21 @@ static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
> 	block->offloadcnt--;
> }
> 
>+static inline void
>+tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt, u32 *flags,

Why u32? This is not uapi or hw facing. Just use "unsigned int".

[...]

^ permalink raw reply

* Re: [PATCH v3,net-next] vlan: implement vlan id and protocol changes
From: David Ahern @ 2018-06-25 20:45 UTC (permalink / raw)
  To: Chas Williams, davem; +Cc: netdev, Roopa Prabhu, Ido Schimmel
In-Reply-To: <20180625103043.25384-1-3chas3@gmail.com>

On 6/25/18 4:30 AM, Chas Williams wrote:
> vlan_changelink silently ignores attempts to change the vlan id
> or protocol id of an existing vlan interface.  Implement by adding
> the new vlan id and protocol to the interface's vlan group and then
> removing the old vlan id and protocol from the vlan group.
> 
> Signed-off-by: Chas Williams <3chas3@gmail.com>
> ---
>  include/linux/netdevice.h |  1 +
>  net/8021q/vlan.c          |  4 ++--
>  net/8021q/vlan.h          |  2 ++
>  net/8021q/vlan_netlink.c  | 38 ++++++++++++++++++++++++++++++++++++++
>  net/core/dev.c            |  1 +
>  5 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3ec9850c7936..a95ae238addf 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2409,6 +2409,7 @@ enum netdev_cmd {
>  	NETDEV_CVLAN_FILTER_DROP_INFO,
>  	NETDEV_SVLAN_FILTER_PUSH_INFO,
>  	NETDEV_SVLAN_FILTER_DROP_INFO,
> +	NETDEV_CHANGEVLAN,
>  };
>  const char *netdev_cmd_to_name(enum netdev_cmd cmd);
>  

you add the new notifier, but do not add any hooks to catch and process it.

Personally, I think it is a bit sketchy to change the vlan id on an
existing device and I suspect it will cause latent errors.

What's your use case for trying to implement the change versus causing
it to generate an unsupported error?

If this patch does get accepted, I believe the mlxsw switchdev driver
will be impacted.


> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 73a65789271b..b5e0ad1a581a 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -51,8 +51,8 @@ const char vlan_version[] = DRV_VERSION;
>  
>  /* End of global variables definitions. */
>  
> -static int vlan_group_prealloc_vid(struct vlan_group *vg,
> -				   __be16 vlan_proto, u16 vlan_id)
> +int vlan_group_prealloc_vid(struct vlan_group *vg,
> +			    __be16 vlan_proto, u16 vlan_id)
>  {
>  	struct net_device **array;
>  	unsigned int pidx, vidx;
> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
> index 44df1c3df02d..c734dd21d70d 100644
> --- a/net/8021q/vlan.h
> +++ b/net/8021q/vlan.h
> @@ -116,6 +116,8 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack);
>  void unregister_vlan_dev(struct net_device *dev, struct list_head *head);
>  bool vlan_dev_inherit_address(struct net_device *dev,
>  			      struct net_device *real_dev);
> +int vlan_group_prealloc_vid(struct vlan_group *vg,
> +			    __be16 vlan_proto, u16 vlan_id);
>  
>  static inline u32 vlan_get_ingress_priority(struct net_device *dev,
>  					    u16 vlan_tci)
> diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
> index 9b60c1e399e2..ee27781939e0 100644
> --- a/net/8021q/vlan_netlink.c
> +++ b/net/8021q/vlan_netlink.c
> @@ -107,10 +107,48 @@ static int vlan_changelink(struct net_device *dev, struct nlattr *tb[],
>  			   struct nlattr *data[],
>  			   struct netlink_ext_ack *extack)
>  {
> +	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
>  	struct ifla_vlan_flags *flags;
>  	struct ifla_vlan_qos_mapping *m;
>  	struct nlattr *attr;
>  	int rem;
> +	__be16 vlan_proto = vlan->vlan_proto;
> +	u16 vlan_id = vlan->vlan_id;
> +
> +	if (data[IFLA_VLAN_ID])
> +		vlan_id = nla_get_u16(data[IFLA_VLAN_ID]);
> +
> +	if (data[IFLA_VLAN_PROTOCOL])
> +		vlan_proto = nla_get_be16(data[IFLA_VLAN_PROTOCOL]);
> +
> +	if (vlan->vlan_id != vlan_id || vlan->vlan_proto != vlan_proto) {
> +		struct net_device *real_dev = vlan->real_dev;
> +		struct vlan_info *vlan_info;
> +		struct vlan_group *grp;
> +		__be16 old_vlan_proto = vlan->vlan_proto;
> +		u16 old_vlan_id = vlan->vlan_id;
> +		int err;
> +
> +		err = vlan_vid_add(real_dev, vlan_proto, vlan_id);
> +		if (err)
> +			return err;
> +		vlan_info = rtnl_dereference(real_dev->vlan_info);
> +		grp = &vlan_info->grp;
> +		err = vlan_group_prealloc_vid(grp, vlan_proto, vlan_id);
> +		if (err < 0) {
> +			vlan_vid_del(real_dev, vlan_proto, vlan_id);
> +			return err;
> +		}
> +		vlan_group_set_device(grp, vlan_proto, vlan_id, dev);
> +		vlan->vlan_proto = vlan_proto;
> +		vlan->vlan_id = vlan_id;
> +
> +		vlan_group_set_device(grp, old_vlan_proto, old_vlan_id, NULL);
> +		vlan_vid_del(real_dev, old_vlan_proto, old_vlan_id);
> +
> +		err = call_netdevice_notifiers(NETDEV_CHANGEVLAN, dev);
> +		notifier_to_errno(err);
> +	}
>  
>  	if (data[IFLA_VLAN_FLAGS]) {
>  		flags = nla_data(data[IFLA_VLAN_FLAGS]);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a5aa1c7444e6..4bda5d2e3c85 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1587,6 +1587,7 @@ const char *netdev_cmd_to_name(enum netdev_cmd cmd)
>  	N(UDP_TUNNEL_DROP_INFO) N(CHANGE_TX_QUEUE_LEN)
>  	N(CVLAN_FILTER_PUSH_INFO) N(CVLAN_FILTER_DROP_INFO)
>  	N(SVLAN_FILTER_PUSH_INFO) N(SVLAN_FILTER_DROP_INFO)
> +	N(CHANGEVLAN)
>  	}
>  #undef N
>  	return "UNKNOWN_NETDEV_EVENT";
> 

^ permalink raw reply

* Re: [PATCH net-next 2/7] net: sched: add tcf_proto_op to offload a rule
From: Jiri Pirko @ 2018-06-25 20:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, xiyou.wangcong, jhs, gerlitz.or, netdev, oss-drivers,
	John Hurley
In-Reply-To: <20180625043431.13413-3-jakub.kicinski@netronome.com>

Mon, Jun 25, 2018 at 06:34:26AM CEST, jakub.kicinski@netronome.com wrote:
>From: John Hurley <john.hurley@netronome.com>
>
>Create a new tcf_proto_op called 'reoffload' that generates a new offload
>message for each node in a tcf_proto. Pointers to the tcf_proto and
>whether the offload request is to add or delete the node are included.
>Also included is a callback function to send the offload message to and
>the option of priv data to go with the cb.
>
>Signed-off-by: John Hurley <john.hurley@netronome.com>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> include/net/act_api.h     | 3 ---
> include/net/sch_generic.h | 6 ++++++
> 2 files changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/include/net/act_api.h b/include/net/act_api.h
>index 9e59ebfded62..5ff11adbe2a6 100644
>--- a/include/net/act_api.h
>+++ b/include/net/act_api.h
>@@ -190,9 +190,6 @@ static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
> #endif
> }
> 
>-typedef int tc_setup_cb_t(enum tc_setup_type type,
>-			  void *type_data, void *cb_priv);
>-
> #ifdef CONFIG_NET_CLS_ACT
> int tc_setup_cb_egdev_register(const struct net_device *dev,
> 			       tc_setup_cb_t *cb, void *cb_priv);
>diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>index 6488daa32f82..88ed64f60056 100644
>--- a/include/net/sch_generic.h
>+++ b/include/net/sch_generic.h
>@@ -20,6 +20,9 @@ struct qdisc_walker;
> struct tcf_walker;
> struct module;
> 
>+typedef int tc_setup_cb_t(enum tc_setup_type type,
>+			  void *type_data, void *cb_priv);
>+
> struct qdisc_rate_table {
> 	struct tc_ratespec rate;
> 	u32		data[256];
>@@ -256,6 +259,9 @@ struct tcf_proto_ops {
> 					  bool *last,
> 					  struct netlink_ext_ack *);
> 	void			(*walk)(struct tcf_proto*, struct tcf_walker *arg);
>+	int			(*reoffload)(struct tcf_proto *, bool,
>+					     tc_setup_cb_t *, void *,
>+					     struct netlink_ext_ack *);

Please, name the args. Unnamed args are annoying...


> 	void			(*bind_class)(void *, u32, unsigned long);
> 
> 	/* rtnetlink specific */
>-- 
>2.17.1
>

^ permalink raw reply

* Re: [PATCH net 2/2] nfp: reject binding to shared blocks
From: Jiri Pirko @ 2018-06-25 20:40 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev, John Hurley
In-Reply-To: <20180625202246.28871-3-jakub.kicinski@netronome.com>

Mon, Jun 25, 2018 at 10:22:46PM CEST, jakub.kicinski@netronome.com wrote:
>From: John Hurley <john.hurley@netronome.com>
>
>TC shared blocks allow multiple qdiscs to be grouped together and filters
>shared between them. Currently the chains of filters attached to a block
>are only flushed when the block is removed. If a qdisc is removed from a
>block but the block still exists, flow del messages are not passed to the
>callback registered for that qdisc. For the NFP, this presents the
>possibility of rules still existing in hw when they should be removed.
>
>Prevent binding to shared blocks until the kernel can send per qdisc del
>messages when block unbinds occur.

This is not nfp-specific problem. Should be handled differently. The
driver has information about offloaded filters. On unbind, it have
enough info to do the flush, doesn't it?


>
>Fixes: 4861738775d7 ("net: sched: introduce shared filter blocks infrastructure")
>Signed-off-by: John Hurley <john.hurley@netronome.com>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>Reviewed-by: Simon Horman <simon.horman@netronome.com>
>---
> drivers/net/ethernet/netronome/nfp/bpf/main.c       | 3 +++
> drivers/net/ethernet/netronome/nfp/flower/offload.c | 3 +++
> 2 files changed, 6 insertions(+)
>
>diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
>index fcdfb8e7fdea..6b15e3b11956 100644
>--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
>+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
>@@ -202,6 +202,9 @@ static int nfp_bpf_setup_tc_block(struct net_device *netdev,
> 	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
> 		return -EOPNOTSUPP;
> 
>+	if (tcf_block_shared(f->block))
>+		return -EOPNOTSUPP;
>+
> 	switch (f->command) {
> 	case TC_BLOCK_BIND:
> 		return tcf_block_cb_register(f->block,
>diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>index 477f584f6d28..525057bee0ed 100644
>--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
>+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>@@ -631,6 +631,9 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
> 	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
> 		return -EOPNOTSUPP;
> 
>+	if (tcf_block_shared(f->block))
>+		return -EOPNOTSUPP;
>+
> 	switch (f->command) {
> 	case TC_BLOCK_BIND:
> 		return tcf_block_cb_register(f->block,
>-- 
>2.17.1
>

^ permalink raw reply

* [PATCH 2/2] sh_eth: remove sh_eth_cpu_data::rpadir_value
From: Sergei Shtylyov @ 2018-06-25 20:37 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: linux-renesas-soc
In-Reply-To: <2809eba8-4c9a-1d5f-a47d-8125777e365b@cogentembedded.com>

If RPADIR exists, the value written to it is always the same for all SoCs
(and derived from NET_IP_ALIGN), so there has not  been any need to store
it in the *struct* sh_eth_cpu_data...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
 drivers/net/ethernet/renesas/sh_eth.c |    8 +-------
 drivers/net/ethernet/renesas/sh_eth.h |    1 -
 2 files changed, 1 insertion(+), 8 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -622,7 +622,6 @@ static struct sh_eth_cpu_data r7s72100_d
 	.tpauser	= 1,
 	.hw_swap	= 1,
 	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.xdfar_rw	= 1,
@@ -672,7 +671,6 @@ static struct sh_eth_cpu_data r8a7740_da
 	.bculr		= 1,
 	.hw_swap	= 1,
 	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.xdfar_rw	= 1,
@@ -798,7 +796,6 @@ static struct sh_eth_cpu_data r8a77980_d
 	.hw_swap	= 1,
 	.nbst		= 1,
 	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.xdfar_rw	= 1,
@@ -851,7 +848,6 @@ static struct sh_eth_cpu_data sh7724_dat
 	.tpauser	= 1,
 	.hw_swap	= 1,
 	.rpadir		= 1,
-	.rpadir_value	= 0x00020000, /* NET_IP_ALIGN assumed to be 2 */
 };
 
 static void sh_eth_set_rate_sh7757(struct net_device *ndev)
@@ -898,7 +894,6 @@ static struct sh_eth_cpu_data sh7757_dat
 	.hw_swap	= 1,
 	.no_ade		= 1,
 	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.rtrate		= 1,
 	.dual_port	= 1,
 };
@@ -978,7 +973,6 @@ static struct sh_eth_cpu_data sh7757_dat
 	.bculr		= 1,
 	.hw_swap	= 1,
 	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.xdfar_rw	= 1,
@@ -1467,7 +1461,7 @@ static int sh_eth_dev_init(struct net_de
 	/* Descriptor format */
 	sh_eth_ring_format(ndev);
 	if (mdp->cd->rpadir)
-		sh_eth_write(ndev, mdp->cd->rpadir_value, RPADIR);
+		sh_eth_write(ndev, NET_IP_ALIGN << 16, RPADIR);
 
 	/* all sh_eth int mask */
 	sh_eth_write(ndev, 0, EESIPR);
Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
+++ net-next/drivers/net/ethernet/renesas/sh_eth.h
@@ -487,7 +487,6 @@ struct sh_eth_cpu_data {
 	u32 ecsipr_value;
 	u32 fdr_value;
 	u32 fcftr_value;
-	u32 rpadir_value;
 
 	/* interrupt checking mask */
 	u32 tx_check;

^ permalink raw reply

* [PATCH 1/2] sh_eth: fix *enum* RPADIR_BIT
From: Sergei Shtylyov @ 2018-06-25 20:36 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: linux-renesas-soc
In-Reply-To: <2809eba8-4c9a-1d5f-a47d-8125777e365b@cogentembedded.com>

The *enum*  RPADIR_BIT  was declared in the commit 86a74ff21a7a ("net:
sh_eth: add support for Renesas SuperH Ethernet") adding SH771x support,
however the SH771x manual doesn't have the RPADIR register described and,
moreover, tells why the padding insertion must not be used. The newer SoC
manuals do have RPADIR documented, though with somewhat different layout --
update the *enum* according to these manuals...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
 drivers/net/ethernet/renesas/sh_eth.h |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
+++ net-next/drivers/net/ethernet/renesas/sh_eth.h
@@ -403,8 +403,7 @@ enum DESC_I_BIT {

 /* RPADIR */
 enum RPADIR_BIT {
-	RPADIR_PADS1 = 0x20000, RPADIR_PADS0 = 0x10000,
-	RPADIR_PADR = 0x0003f,
+	RPADIR_PADS = 0x1f0000, RPADIR_PADR = 0xffff,
 };

 /* FDR */

^ permalink raw reply

* [PATCH 0/2] sh_eth: RPADIR related clean-ups
From: Sergei Shtylyov @ 2018-06-25 20:34 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: linux-renesas-soc

Hello!

Here's a set of 2 patches against DaveM's 'net-next.git' repo. They are
clean-ups related to RPADIR (DMA padding to NET_IP_ALIGN)...

[1/2] sh_eth: fix *enum* RPADIR_BIT
[2/2] sh_eth: remove sh_eth_cpu_data::rpadir_value

MBR, Sergei

^ permalink raw reply

* [PATCH net 2/2] nfp: reject binding to shared blocks
From: Jakub Kicinski @ 2018-06-25 20:22 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, John Hurley, Jakub Kicinski
In-Reply-To: <20180625202246.28871-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

TC shared blocks allow multiple qdiscs to be grouped together and filters
shared between them. Currently the chains of filters attached to a block
are only flushed when the block is removed. If a qdisc is removed from a
block but the block still exists, flow del messages are not passed to the
callback registered for that qdisc. For the NFP, this presents the
possibility of rules still existing in hw when they should be removed.

Prevent binding to shared blocks until the kernel can send per qdisc del
messages when block unbinds occur.

Fixes: 4861738775d7 ("net: sched: introduce shared filter blocks infrastructure")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c       | 3 +++
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index fcdfb8e7fdea..6b15e3b11956 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -202,6 +202,9 @@ static int nfp_bpf_setup_tc_block(struct net_device *netdev,
 	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
+	if (tcf_block_shared(f->block))
+		return -EOPNOTSUPP;
+
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 477f584f6d28..525057bee0ed 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -631,6 +631,9 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
 	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
+	if (tcf_block_shared(f->block))
+		return -EOPNOTSUPP;
+
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block,
-- 
2.17.1

^ permalink raw reply related

* [PATCH net 1/2] nfp: flower: fix mpls ether type detection
From: Jakub Kicinski @ 2018-06-25 20:22 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Pieter Jansen van Vuuren
In-Reply-To: <20180625202246.28871-1-jakub.kicinski@netronome.com>

From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>

Previously it was not possible to distinguish between mpls ether types and
other ether types. This leads to incorrect classification of offloaded
filters that match on mpls ether type. For example the following two
filters overlap:

 # tc filter add dev eth0 parent ffff: \
    protocol 0x8847 flower \
    action mirred egress redirect dev eth1

 # tc filter add dev eth0 parent ffff: \
    protocol 0x0800 flower \
    action mirred egress redirect dev eth2

The driver now correctly includes the mac_mpls layer where HW stores mpls
fields, when it detects an mpls ether type. It also sets the MPLS_Q bit to
indicate that the filter should match mpls packets.

Fixes: bb055c198d9b ("nfp: add mpls match offloading support")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 14 ++++++++++++++
 .../net/ethernet/netronome/nfp/flower/offload.c    |  8 ++++++++
 2 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index 91935405f586..84f7a5dbea9d 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -123,6 +123,20 @@ nfp_flower_compile_mac(struct nfp_flower_mac_mpls *frame,
 			 NFP_FLOWER_MASK_MPLS_Q;
 
 		frame->mpls_lse = cpu_to_be32(t_mpls);
+	} else if (dissector_uses_key(flow->dissector,
+				      FLOW_DISSECTOR_KEY_BASIC)) {
+		/* Check for mpls ether type and set NFP_FLOWER_MASK_MPLS_Q
+		 * bit, which indicates an mpls ether type but without any
+		 * mpls fields.
+		 */
+		struct flow_dissector_key_basic *key_basic;
+
+		key_basic = skb_flow_dissector_target(flow->dissector,
+						      FLOW_DISSECTOR_KEY_BASIC,
+						      flow->key);
+		if (key_basic->n_proto == cpu_to_be16(ETH_P_MPLS_UC) ||
+		    key_basic->n_proto == cpu_to_be16(ETH_P_MPLS_MC))
+			frame->mpls_lse = cpu_to_be32(NFP_FLOWER_MASK_MPLS_Q);
 	}
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index c42e64f32333..477f584f6d28 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -264,6 +264,14 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
 		case cpu_to_be16(ETH_P_ARP):
 			return -EOPNOTSUPP;
 
+		case cpu_to_be16(ETH_P_MPLS_UC):
+		case cpu_to_be16(ETH_P_MPLS_MC):
+			if (!(key_layer & NFP_FLOWER_LAYER_MAC)) {
+				key_layer |= NFP_FLOWER_LAYER_MAC;
+				key_size += sizeof(struct nfp_flower_mac_mpls);
+			}
+			break;
+
 		/* Will be included in layer 2. */
 		case cpu_to_be16(ETH_P_8021Q):
 			break;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net 0/2] nfp: MPLS and shared blocks TC offload fixes
From: Jakub Kicinski @ 2018-06-25 20:22 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Jakub Kicinski

Hi!

This series brings two fixes to TC filter/action offload code.
Pieter fixes matching MPLS packets when the match is purely on
the MPLS ethertype and none of the MPLS fields are used.
John provides a fix for offload of shared blocks.  Unfortunately,
with shared blocks there is currently no guarantee that filters
which were added by the core will be removed before block unbind.
Our simple fix is to not support offload of rules on shared blocks
at all, a revert of this fix will be send for -next once the
reoffload infrastructure lands.  The shared blocks became important
as we are trying to use them for bonding offload (managed from user
space) and lack of remove calls leads to resource leaks.

John Hurley (1):
  nfp: reject binding to shared blocks

Pieter Jansen van Vuuren (1):
  nfp: flower: fix mpls ether type detection

 drivers/net/ethernet/netronome/nfp/bpf/main.c      |  3 +++
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 14 ++++++++++++++
 .../net/ethernet/netronome/nfp/flower/offload.c    | 12 ++++++++++++
 3 files changed, 29 insertions(+)

-- 
2.17.1

^ permalink raw reply

* Re: Route fallback issue
From: Grant Taylor @ 2018-06-25 20:07 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald
In-Reply-To: <alpine.LFD.2.20.1806252058210.2593@ja.home.ssi.bg>

[-- Attachment #1: Type: text/plain, Size: 3308 bytes --]

On 06/25/2018 12:50 PM, Julian Anastasov wrote:
> Hello,

Hi Julian,

> Yes, ARP state for unreachable GWs may be updated slowly, there is 
> in-time feedback only for reachable state.

Fair.

Most of the installations where I needed D.G.D. to work would be okay 
with a < 5 minute timeout.  Obviously they would like faster, but 
automation is a LOT better than waiting on manual intervention.

IMHO < 30 seconds is great.  < 90 seconds is acceptable.  < 300 seconds 
leaves some room for improvement.

> You can create the two routes, of course. But only the default routes 
> are alternative.

Are you saying that the functionality I'm describing only works for 
default gateways or that the term "alternative route" only applies to 
default gateways?

The testing that I did indicated that alternative routes worked for 
specific prefixes too.

I tested multiple NetNSs with only directly attached routes and appended 
routes to a destination prefix, no default gateway / route of last resort.

The behavior seemed to be different when ignore_routes_with_linkdown was 
set verses unset.  Specifically, ignore_routes_with_linkdown seemed to 
help considerably.

Hence why I question the requirement for the "default" route verses a 
route to a specific prefix.

Can you explain why I saw the behavior difference with 
ignore_routes_with_linkdown if it only applies to the default route?

> The alternative routes work in this way:
> 
> - on lookup, routes are walked in order - as listed in table
> 
> - as long as route contains reachable gateway (ARP state), only this 
> route is used
> 
> - if some gateway becomes unreachable (ARP state), next alternative 
> routes are tried
> 
> - if ARP entry is expired (missing), this gateway can be probed if the 
> route is before the currently used route. This is what happens initially 
> when no ARP state is present for the GWs. It is bad luck if the probed 
> GW is actually unreachable.
> 
> - active probing by user space (ping GWs) can only help to keep the ARP 
> state present for the used gateways. By this way, if ARP entry for GW 
> is missing, the kernel will not risk to select unavailable route with 
> the goal to probe the GW.

This all makes sense.

Please confirm if "gateway" in this context is the "/default/ gateway" 
or not.  I ask because arguably "gateway" can be used as a term to 
describe the next hop for a route, or gateway, to a prefix.  Further, 
the "/default/ (gateway,router)" is the gateway or route of last resort. 
  Which to me means that "gateway" can be any route in this context.

> nexthop is the GW in the route

Thank you for confirming.

> Yes, the kernel avoids alternative routes with unreachable GWs

Fair enough.

> The multipath route uses all its alive nexthops at the same time... But 
> you may need in the same way active probing by user space, otherwise 
> unavailable GW can be selected.

I assume that the dead ECMP NEXTHOP is also subject to similar timeouts 
as alternative routes.  Correct?

> Yes, if you prefer, you may run PING every second to avoid such delays...

Agreed.

I'm trying to make sure I understand basic functionality before I do 
things to modify it.

-- 
Grant. . . .
unix || die

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3982 bytes --]

^ permalink raw reply

* Re: Route fallback issue
From: Julian Anastasov @ 2018-06-25 19:02 UTC (permalink / raw)
  To: Erik Auerswald; +Cc: Grant Taylor, Akshat Kakkar, netdev, cronolog+lartc, lartc
In-Reply-To: <20180624134524.GA9925@unix-ag.uni-kl.de>


	Hello,

On Sun, 24 Jun 2018, Erik Auerswald wrote:

> Hello Julien,
> 
> > http://ja.ssi.bg/dgd-usage.txt
> 
> Thanks for that info!
> 
> Can you tell us what parts from the above text is actually implemented
> in the upstream Linux kernel, and starting with which version(s)
> (approximately)? The text describes ideas and patches from nearly two
> decades ago, is more recent documentation available somewhere?

	Nothing is included in kernel. The idea is that user space
has more control. It is best done with CONNMARK: stick NATed
connection to some path (via alive ISP), use route lookup just
to select alive path for the first packet in connection. So, what
we balance are connections, not packets (which does not work with
different ISPs). Probe GWs to keep only alive routes in the table.

Regards

^ permalink raw reply

* Re: Route fallback issue
From: Julian Anastasov @ 2018-06-25 18:50 UTC (permalink / raw)
  To: Grant Taylor; +Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald
In-Reply-To: <544875a1-68e6-a8f3-4eb7-44f053605c3e@spamtrap.tnetconsulting.net>


	Hello,

On Thu, 21 Jun 2018, Grant Taylor wrote:

> On 06/21/2018 01:57 PM, Julian Anastasov wrote:
> > Hello,
> 
> > http://ja.ssi.bg/dgd-usage.txt
> 
> "DGD" or "Dead Gateway Detection" sounds very familiar.  I referenced it in an
> earlier reply.
> 
> I distinctly remember DGD not behaving satisfactorily years ago.  Where
> unsatisfactorily was something like 90 seconds (or more) to recover. Which
> actually matches what I was getting without the ignore_routes_with_linkdown=1
> setting that David A. mentioned.

	Yes, ARP state for unreachable GWs may be updated slowly,
there is in-time feedback only for reachable state.

> With ignore_routes_with_linkdown=1 things behaved much better.
> 
> > Not true. net/ipv4/fib_semantics.c:fib_select_path() calls
> > fib_select_default() only when prefixlen = 0 (default route).
> 
> Okay....  My testing last night disagrees with you.  Specifically, I was able
> to add a alternate routes to the same prefix, 192.0.2.128/26. There was not
> any default gateway configured on any of the NetNSs.  So everything was using
> routes for locally attacked or the two added via "ip route append".
> 
> What am I misinterpreting?  Or where are we otherwise talking past each other?

	You can create the two routes, of course. But only the
default routes are alternative.

> 
> > Otherwise, only the first route will be considered.
> 
> "only the first route" almost sounds like something akin to Equal Cost Multi
> Path.
> 
> I was not expecting "alternative routes" to use more than one route at a time,
> equally or otherwise.  I was wanting for the kernel to fall back to an
> alternate route / gateway / path in the event that the one that was being used
> became unusable / unreachable.
> 
> So what should "Alternative Routes" do?  How does this compare / contract to
> E.C.M.P. or D.G.D.

	The alternative routes work in this way:

- on lookup, routes are walked in order - as listed in table

- as long as route contains reachable gateway (ARP state), only this route 
is used

- if some gateway becomes unreachable (ARP state), next alternative routes 
are tried

- if ARP entry is expired (missing), this gateway can be probed if the 
route is before the currently used route. This is what happens initially
when no ARP state is present for the GWs. It is bad luck if the probed
GW is actually unreachable.

- active probing by user space (ping GWs) can only help to keep the
ARP state present for the used gateways. By this way, if ARP entry 
for GW is missing, the kernel will not risk to select unavailable route 
with the goal to probe the GW.

> > fib_select_default() is the function that decides which nexthop is reachable
> > and whether to contact it. It uses the ARP state via fib_detect_death().
> > That is all code that is behind this feature called "alternative routes":
> > the kernel selects one based on nexthop's ARP state.
> 
> Please confirm that you aren't entering / referring to E.C.M.P. territory when
> you say "nexthop".  I think that you are not, but I want to ask and be sure,
> particularly seeing as how things are very closely related.

	nexthop is the GW in the route

> It sounds like you're referring to literally the router that is the next hop
> in the path.  I.e. the device on the other end of the wire.

	Yes, the kernel avoids alternative routes with unreachable GWs

> I want to do some testing to see if fib_multipath_use_neigh alters this
> behavior at all.  I'm hoping that it will invalidate an alternate route if the
> MAC is not resolvable even if the physical link stays up.

	The multipath route uses all its alive nexthops at the same 
time... But you may need in the same way active probing by user space,
otherwise unavailable GW can be selected.

> Sure, the ARP cache may have a 30 ~ 120 second timeout before triggering this
> behavior.  But having that timeout and starting to use an alternative route is
> considerably better than not using an alternative route.

	Yes, if you prefer, you may run PING every second to avoid such 
delays...

Regards

^ permalink raw reply

* Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
From: Santosh Shilimkar @ 2018-06-25 18:44 UTC (permalink / raw)
  To: Sowmini Varadhan, Ka-Cheong Poon; +Cc: netdev, davem, rds-devel
In-Reply-To: <20180625175006.GI14823@oracle.com>

On 6/25/2018 10:50 AM, Sowmini Varadhan wrote:
> On (06/26/18 01:43), Ka-Cheong Poon wrote:
>>
>> Yes, I think if the socket is bound, it should check the scope_id
>> in msg_name (if not NULL) to make sure that they match.  A bound
>> RDS socket can send to multiple peers.  But if the bound local
>> address is link local, it should only be allowed to send to peers
>> on the same link.
> 
> agree.
Yep. Its inline with RDS bind behavior.

> 
> 
>> If a socket is bound, I guess the scope_id should be used.  So
>> if a socket is not bound to a link local address and the socket
>> is used to sent to a link local peer, it should fail.
> 
> PF_RDS sockets *MUST* alwasy be bound.  See
> Documentation/networking/rds.txt:
> "   Sockets must be bound before you can send or receive data.
>      This is needed because binding also selects a transport and
>      attaches it to the socket. Once bound, the transport assignment
>      does not change."
> 
In any case link local or not, the socket needs to be bound before
any data can be sent as documented. Send path already enforces
it.

>>> Also, why is there no IPv6 support in rds_connect?
>>
>>
>> Oops, I missed this when I ported the internal version to the
>> net-next version.  Will add it back.
> 
So the net-next wasn't tested? IPv6 connections
itself wouldn't be formed with this missing. As mentioned
already, please test v2 before posting on list.

Regards,
Santosh

^ permalink raw reply

* Re: [net regression] "fib_rules: move common handling of newrule delrule msgs into fib_nl2rule" breaks suppress_prefixlength
From: Roopa Prabhu @ 2018-06-25 18:36 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Netdev
In-Reply-To: <CAJieiUiDOQF0NTUGAYmQEtwnrsXoqTAt-cW+dueeRXVP8rhaAw@mail.gmail.com>

On Mon, Jun 25, 2018 at 8:23 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> On Sat, Jun 23, 2018 at 8:46 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>> Hey Roopa,
>>
>> On a kernel with a minimal networking config,
>> CONFIG_IP_MULTIPLE_TABLES appears to be broken for certain rules after
>> f9d4b0c1e9695e3de7af3768205bacc27312320c.
>>
>> Try, for example, running:
>>
>> $ ip -4 rule add table main suppress_prefixlength 0
>>
>> It returns with EEXIST.
>>
>> Perhaps the reason is that the new rule_find function does not match
>> on suppress_prefixlength? However, rule_exist from before didn't do
>> that either. I'll keep playing and see if I can track it down myself,
>> but thought I should let you know first.
>
> I am surprised at that also. I cannot find prior rule_exist looking at
> suppress_prefixlength.
> I will dig deeper also today. But your patch LGTM with a small change
> I commented on it.
>

So the previous rule_exists code did not check for attribute matches correctly.
It would ignore a rule at the first non-existent attribute mis-match.
eg in your case, it would
return at pref mismatch.


$ip -4 rule add table main suppress_prefixlength 0
$ip -4 rule add table main suppress_prefixlength 0
$ip -4 rule add table main suppress_prefixlength 0

$ip rule show
0:      from all lookup local

32763:  from all lookup main suppress_prefixlength 0

32764:  from all lookup main suppress_prefixlength 0

32765:  from all lookup main suppress_prefixlength 0

32766:  from all lookup main

32767:  from all lookup default


With your patch (with my proposed fixes), you should get proper EXISTS check

$ git diff HEAD

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c

index 126ffc5..de4c171 100644

--- a/net/core/fib_rules.c

+++ b/net/core/fib_rules.c

@@ -428,6 +428,14 @@ static struct fib_rule *rule_find(struct
fib_rules_ops *ops,

                if (rule->l3mdev && r->l3mdev != rule->l3mdev)

                        continue;



+               if (rule->suppress_ifgroup != -1 &&

+                   (r->suppress_ifgroup != rule->suppress_ifgroup))

+                       continue;

+

+               if (rule->suppress_prefixlen != -1 &&

+                   (r->suppress_prefixlen != rule->suppress_prefixlen))

+                       continue;

+

                if (uid_range_set(&rule->uid_range) &&

                    (!uid_eq(r->uid_range.start, rule->uid_range.start) ||

                    !uid_eq(r->uid_range.end, rule->uid_range.end)))



$ ip -4 rule add table main suppress_prefixlength 0

$ ip -4 rule add table main suppress_prefixlength 0

RTNETLINK answers: File exists

^ permalink raw reply

* [PATCH net-next] r8169: reject unsupported WoL options
From: Heiner Kallweit @ 2018-06-25 18:34 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org

So far unsupported WoL options are silently ignored. Change this and
reject attempts to set unsupported options. This prevents situations
where a user tries to set an unsupported WoL option and is under the
impression it was successful because ethtool doesn't complain.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 1d33672c..70c13cc2 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1712,11 +1712,14 @@ static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
 	struct rtl8169_private *tp = netdev_priv(dev);
 	struct device *d = tp_to_dev(tp);
 
+	if (wol->wolopts & ~WAKE_ANY)
+		return -EINVAL;
+
 	pm_runtime_get_noresume(d);
 
 	rtl_lock_work(tp);
 
-	tp->saved_wolopts = wol->wolopts & WAKE_ANY;
+	tp->saved_wolopts = wol->wolopts;
 
 	if (pm_runtime_active(d))
 		__rtl8169_set_wol(tp, tp->saved_wolopts);
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCH] net/nfp: cast sizeof() to int when comparing with error code
From: Jakub Kicinski @ 2018-06-25 18:11 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: davem, oss-drivers, netdev
In-Reply-To: <20180625073318.6828-1-cgxu519@gmx.com>

On Mon, 25 Jun 2018 15:33:18 +0800, Chengguang Xu wrote:
> Negative error code will be larger than sizeof().
> 
> Signed-off-by: Chengguang Xu <cgxu519@gmx.com>

You should include the name of the tree in the tag, e.g. [PATCH net] or
[PATCH net-next] and if it's a fix, i.e. directed at the net tree, you
should add a fixes tag, in this case it would most likely be:

Fixes: a0d8e02c35ff ("nfp: add support for reading nffw info")

You can find this and more information about the development process of
the networking subsystem in:

https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Thank you!

^ permalink raw reply

* Re: [PATCH 00/14] ARM: davinci: step towards removing at24_platform_data
From: Andrew Lunn @ 2018-06-25 18:02 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman, Linux ARM, Linux Kernel Mailing List,
	linux-omap, netdev, Bartosz Golaszewski
In-Reply-To: <CAMRc=MdEn74G5SJAaGLAfNSS7Zy7wqH04aN5CMZ45UXqhu_58g@mail.gmail.com>

> With my patch 1/14 you'll get -EPROBE_DEFER from nvmem_cell_get() if
> the nvmem provider is not yet registered. Will that help in your case?

I don't think so. My driver instantiates the AT24 device. So if i get
-EPROBE_DEFER, i need to cleanup the probe, and return -EPROBDE_DEFER
to the code. Which means i need to remove the AT24 device...

       Andrew

^ permalink raw reply

* [PATCH 0/4] lan78xx minor fixes
From: Dave Stevenson @ 2018-06-25 14:07 UTC (permalink / raw)
  To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson

This is a small set of patches for the Microchip LAN78xx chip,
as used in the Raspberry Pi 3B+.
The main debug/discussion was on
https://github.com/raspberrypi/linux/issues/2458

Initial symptoms were that VLANs were very unreliable.
A couple of things were found:
- firstly that the hardware timeout value set failed to 
  take into account the VLAN tag, so a full MTU packet
  would be timed out.
- second was that regular checksum failures were being
  reported. Disabling checksum offload confirmed that
  the checksums were valid, and further experimentation
  identified that it was only if the VLAN tags were being
  passed through to the kernel that there were issues.
  The hardware supports VLAN filtering and tag stripping,
  therefore those have been implemented (much of the work
  was already done), and the driver drops back to s/w
  checksums should the choice be made not to use the h/w
  VLAN stripping.

Dave Stevenson (4):
  net: lan78xx: Allow for VLAN headers in timeout calcs
  net: lan78xx: Add support for VLAN filtering.
  net: lan78xx: Add support for VLAN tag stripping.
  net: lan78xx: Use s/w csum check on VLANs without tag stripping

 drivers/net/usb/lan78xx.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH 1/4] net: lan78xx: Allow for VLAN headers in timeout calcs
From: Dave Stevenson @ 2018-06-25 14:07 UTC (permalink / raw)
  To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson
In-Reply-To: <cover.1529935234.git.dave.stevenson@raspberrypi.org>

The frame abort timeout being set by lan78xx_set_rx_max_frame_length
didn't account for any VLAN headers, resulting in very low
throughput if used with tagged VLANs.

Use VLAN_ETH_HLEN instead of ETH_HLEN to correct for this.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
---
 drivers/net/usb/lan78xx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index a89570f..2f793d4 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2298,7 +2298,7 @@ static int lan78xx_change_mtu(struct net_device *netdev, int new_mtu)
 	if ((ll_mtu % dev->maxpacket) == 0)
 		return -EDOM;
 
-	ret = lan78xx_set_rx_max_frame_length(dev, new_mtu + ETH_HLEN);
+	ret = lan78xx_set_rx_max_frame_length(dev, new_mtu + VLAN_ETH_HLEN);
 
 	netdev->mtu = new_mtu;
 
@@ -2587,7 +2587,8 @@ static int lan78xx_reset(struct lan78xx_net *dev)
 	buf |= FCT_TX_CTL_EN_;
 	ret = lan78xx_write_reg(dev, FCT_TX_CTL, buf);
 
-	ret = lan78xx_set_rx_max_frame_length(dev, dev->net->mtu + ETH_HLEN);
+	ret = lan78xx_set_rx_max_frame_length(dev,
+					      dev->net->mtu + VLAN_ETH_HLEN);
 
 	ret = lan78xx_read_reg(dev, MAC_RX, &buf);
 	buf |= MAC_RX_RXEN_;
-- 
2.7.4

^ permalink raw reply related

* Re: Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-25 17:54 UTC (permalink / raw)
  To: Siwei Liu, Michael S. Tsirkin
  Cc: Cornelia Huck, Alexander Duyck, virtio-dev, aaron.f.brown,
	Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel, virtualization,
	konrad.wilk, boris.ostrovsky, Joao Martins, Venu Busireddy,
	vijay.balakrishna
In-Reply-To: <CADGSJ22DX8=rNPY+gNS_q=LCYLR5ieoE7oD8p4+8AnpzQsWSCg@mail.gmail.com>

On 6/22/2018 5:17 PM, Siwei Liu wrote:
> On Fri, Jun 22, 2018 at 4:40 PM, Siwei Liu <loseweigh@gmail.com> wrote:
>> On Fri, Jun 22, 2018 at 3:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Fri, Jun 22, 2018 at 02:51:11PM -0700, Siwei Liu wrote:
>>>> On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote:
>>>>>> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote:
>>>>>>> On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
>>>>>>>>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>>>>>>>>>> On Wed, 20 Jun 2018 22:48:58 +0300
>>>>>>>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
>>>>>>>>>>>> In any case, I'm not sure anymore why we'd want the extra uuid.
>>>>>>>>>>> It's mostly so we can have e.g. multiple devices with same MAC
>>>>>>>>>>> (which some people seem to want in order to then use
>>>>>>>>>>> then with different containers).
>>>>>>>>>>>
>>>>>>>>>>> But it is also handy for when you assign a PF, since then you
>>>>>>>>>>> can't set the MAC.
>>>>>>>>>>>
>>>>>>>>>> OK, so what about the following:
>>>>>>>>>>
>>>>>>>>>> - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>>>>>>>>>>    that we have a new uuid field in the virtio-net config space
>>>>>>>>>> - in QEMU, add a property for virtio-net that allows to specify a uuid,
>>>>>>>>>>    offer VIRTIO_NET_F_STANDBY_UUID if set
>>>>>>>>>> - when configuring, set the property to the group UUID of the vfio-pci
>>>>>>>>>>    device
>>>>>>>>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
>>>>>>>>> to still expose UUID in the config space on virtio-pci?
>>>>>>>>
>>>>>>>> Yes but guest is not supposed to read it.
>>>>>>>>
>>>>>>>>> I'm not even sure if it's sane to expose group UUID on the PCI bridge
>>>>>>>>> where the corresponding vfio-pci device attached to for a guest which
>>>>>>>>> doesn't support the feature (legacy).
>>>>>>>>>
>>>>>>>>> -Siwei
>>>>>>>> Yes but you won't add the primary behind such a bridge.
>>>>>>> I assume the UUID feature is a new one besides the exiting
>>>>>>> VIRTIO_NET_F_STANDBY feature, where I think the current proposal is,
>>>>>>> if UUID feature is present and supported by guest, we'll do pairing
>>>>>>> based on UUID; if UUID feature present
>>>>>>                                   ^^^^^^^  is NOT present
>>>>> Let's go back for a bit.
>>>>>
>>>>> What is wrong with matching by mac?
>>>>>
>>>>> 1. Does not allow multiple NICs with same mac
>>>>> 2. Requires MAC to be programmed by host in the PT device
>>>>>     (which is often possible with VFs but isn't possible with all PT
>>>>>     devices)
>>>> Might not neccessarily be something wrong, but it's very limited to
>>>> prohibit the MAC of VF from changing when enslaved by failover.
>>> You mean guest changing MAC? I'm not sure why we prohibit that.
>> I think Sridhar and Jiri might be better person to answer it. My
>> impression was that sync'ing the MAC address change between all 3
>> devices is challenging, as the failover driver uses MAC address to
>> match net_device internally.

Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement
of the MAC between the PF and VF.  Allowing the guest to change the MAC will require
synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers
don't allow changing guest MAC unless it is a trusted VF.

^ permalink raw reply

* Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
From: Sowmini Varadhan @ 2018-06-25 17:50 UTC (permalink / raw)
  To: Ka-Cheong Poon; +Cc: netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <25e1afda-7497-7f08-815a-286cf775bc09@oracle.com>

On (06/26/18 01:43), Ka-Cheong Poon wrote:
> 
> Yes, I think if the socket is bound, it should check the scope_id
> in msg_name (if not NULL) to make sure that they match.  A bound
> RDS socket can send to multiple peers.  But if the bound local
> address is link local, it should only be allowed to send to peers
> on the same link.

agree.


> If a socket is bound, I guess the scope_id should be used.  So
> if a socket is not bound to a link local address and the socket
> is used to sent to a link local peer, it should fail.

PF_RDS sockets *MUST* alwasy be bound.  See
Documentation/networking/rds.txt:
"   Sockets must be bound before you can send or receive data.
    This is needed because binding also selects a transport and
    attaches it to the socket. Once bound, the transport assignment
    does not change."

Also, rds_sendmsg checks this (from net-next, your version
has the equivalent ipv6_addr_any etc check):

        if (daddr == 0 || rs->rs_bound_addr == 0) {
                release_sock(sk);
                ret = -ENOTCONN; /* XXX not a great errno */
                goto out;
        }

> 
> >Also, why is there no IPv6 support in rds_connect?
> 
> 
> Oops, I missed this when I ported the internal version to the
> net-next version.  Will add it back.

Ok

--Sowmini

^ permalink raw reply

* Re: [PATCH 00/14] ARM: davinci: step towards removing at24_platform_data
From: Bartosz Golaszewski @ 2018-06-25 17:46 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman, Linux ARM, Linux Kernel Mailing List,
	linux-omap, netdev, Bartosz Golaszewski
In-Reply-To: <20180625174024.GB17417@lunn.ch>

2018-06-25 19:40 GMT+02:00 Andrew Lunn <andrew@lunn.ch>:
> On Mon, Jun 25, 2018 at 05:50:11PM +0200, Bartosz Golaszewski wrote:
>> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
>>
>> Since I took over maintainership of the at24 driver I've been working
>> towards removing at24_platform_data in favor for device properties.
>>
>> DaVinci is the only platform that's still using it - all other users
>> have already been converted.
>>
>> One of the obstacles in case of DaVinci is removing the setup() callback
>> from the pdata struct, the only user of which are some davinci boards.
>
> Hi Bartosz
>
> Nice code.
>
> I've got a platform i want to add sometime soon using the at24. I
> noticed you doing the cleanup, so i avoided the setup() call. But
> using it would of helped...
>
> My platform is x86 based, so no device tree. I instantiate a number of
> AT24 devices from a platform driver, and then add nvmem cells so i can
> access data in these eeproms. This new code will make this simpler.
>
>> The only board that's still using this callback is now mityomapl138.
>> Unfortunately it stores more info in EEPROM than just the MAC address
>> and will require some more work. Unfortunately I don't have access
>> to this board so I can't test any actual solutions on a live hardware.
>
> Depending on what i find in the EEPROM, i need to instantiate other
> i2c devices. So i have the problem of knowing when the EEPROM has
> actually probed and i can use the nvmem API to retrieve the contents.
>
> What i have done so far, is registers a bus notifier on i2c_bus_type,
> and look for BUS_NOTIFY_BOUND_DRIVER. I can then check if the i2c
> client in the notifier is the at24 client. But when i then add more
> i2c clients from inside the notifier i get lockdep splats. They are
> false positives, but it does suggest it is not a good idea to do this.
>
> So it would be good to have some sort of recommended alternative to
> the setup() callback. Ideally it would be specific to a particular
> at24, and safe to call other i2c functions from.
>
> Do you have any ideas?
>
>    Andrew

With my patch 1/14 you'll get -EPROBE_DEFER from nvmem_cell_get() if
the nvmem provider is not yet registered. Will that help in your case?

Bart

^ permalink raw reply

* Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
From: Ka-Cheong Poon @ 2018-06-25 17:43 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <20180625170317.GA28578@oracle.com>

On 06/26/2018 01:03 AM, Sowmini Varadhan wrote:
> On (06/25/18 03:38), Ka-Cheong Poon wrote:
>> @@ -1105,8 +1105,27 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
>>   			break;
>>   
>>   		case sizeof(*sin6): {
>> -			ret = -EPROTONOSUPPORT;
>> -			goto out;
>> +			int addr_type;
>                           :
>                           :
>> +			daddr = sin6->sin6_addr;
>> +			dport = sin6->sin6_port;
>> +			scope_id = sin6->sin6_scope_id;
>> +			break;
>>   		}
> 
> In rds_sendmsg, the scopeid passed to rds_conn_create_outgoing
> may come from the msg_name (if msg_name is a link-local) or
> may come from the rs_bound_scope_id (for connected socket, change
> made in Patch 1 of the series).
> 
> This sounds inconsistent.
> 
> If I bind to scopeid if1 and then send to fe80::1%if2 (without connect()),
> we'd create an rds_connection with dev_if set to if2.
> (first off, its a bit unexpected to be sending to fe80::1%if2 when you
> are bound to a link-local on if1!)
> 
> But then, if we got back a response from fe80::1%if2, I think we would
> not find a matching conn in rds_recv_incoming?


Yes, I think if the socket is bound, it should check the scope_id
in msg_name (if not NULL) to make sure that they match.  A bound
RDS socket can send to multiple peers.  But if the bound local
address is link local, it should only be allowed to send to peers
on the same link.


> And this is even more confusing because the fastpath in rds_sendmsg
> does not take the bound_scope_id into consideration at all:
> 1213         if (rs->rs_conn && ipv6_addr_equal(&rs->rs_conn->c_faddr, &daddr))
> 1214                 conn = rs->rs_conn;
> 1215         else {
> 1216                 conn = rds_conn_create_outgoing( /* .. */, scope_id)
> so if I erroneously passed a msg_name on a connected rds socket, what
> would happen? (see also question about rds_connect() itself, below)


The check added above takes care of this.  The scope_id should
match.


> Should we always use rs_bound_scope_id for creating the outgoing
> rds_connection? (you may need something deterministic for this,
> like "if bound addr is linklocal, return error if daddr has a different
> scopeid, else use the bound addr's scopeid", plus, "if bound addr is
> not global, and daddr is link-local, we need a conn with the daddr's
> scopeid")


If a socket is bound, I guess the scope_id should be used.  So
if a socket is not bound to a link local address and the socket
is used to sent to a link local peer, it should fail.


> Also, why is there no IPv6 support in rds_connect?


Oops, I missed this when I ported the internal version to the
net-next version.  Will add it back.



-- 
K. Poon
ka-cheong.poon@oracle.com

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox