Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH][next] Bluetooth: hci_event: Use struct_size() helper
From: Gustavo A. R. Silva @ 2019-02-08  0:40 UTC (permalink / raw)
  To: Marcel Holtmann, Johan Hedberg, David S. Miller
  Cc: linux-bluetooth, netdev, linux-kernel, Gustavo A. R. Silva

Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes, in particular in the
context in which this code is being used.

So, change the following form:

sizeof(*ev) + ev->num_hndl * sizeof(struct hci_comp_pkts_info)

 to :

struct_size(ev, handles, ev->num_hndl)

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 net/bluetooth/hci_event.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index ac2826ce162b..609fd6871c5a 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -3556,8 +3556,8 @@ static void hci_num_comp_pkts_evt(struct hci_dev *hdev, struct sk_buff *skb)
 		return;
 	}
 
-	if (skb->len < sizeof(*ev) || skb->len < sizeof(*ev) +
-	    ev->num_hndl * sizeof(struct hci_comp_pkts_info)) {
+	if (skb->len < sizeof(*ev) ||
+	    skb->len < struct_size(ev, handles, ev->num_hndl)) {
 		BT_DBG("%s bad parameters", hdev->name);
 		return;
 	}
@@ -3644,8 +3644,8 @@ static void hci_num_comp_blocks_evt(struct hci_dev *hdev, struct sk_buff *skb)
 		return;
 	}
 
-	if (skb->len < sizeof(*ev) || skb->len < sizeof(*ev) +
-	    ev->num_hndl * sizeof(struct hci_comp_blocks_info)) {
+	if (skb->len < sizeof(*ev) ||
+	    skb->len < struct_size(ev, handles, ev->num_hndl)) {
 		BT_DBG("%s bad parameters", hdev->name);
 		return;
 	}
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next] bridge: use struct_size() helper
From: Gustavo A. R. Silva @ 2019-02-08  0:58 UTC (permalink / raw)
  To: Roopa Prabhu, Nikolay Aleksandrov, David S. Miller
  Cc: bridge, netdev, linux-kernel, Gustavo A. R. Silva

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = alloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

size = struct_size(instance, entry, count);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 net/bridge/br_multicast.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 1fb885a33c66..4a048fd1cbea 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1018,8 +1018,7 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
 		if (!nsrcs)
 			return -EINVAL;

-		grec_len = sizeof(*grec) +
-			   sizeof(struct in6_addr) * ntohs(*nsrcs);
+		grec_len = struct_size(grec, grec_src, ntohs(*nsrcs));

 		if (!ipv6_mc_may_pull(skb, len + grec_len))
 			return -EINVAL;
-- 
2.20.1

^ permalink raw reply related

* [PATCH net-next] netfilter: xt_recent: Use struct_size() in kvzalloc()
From: Gustavo A. R. Silva @ 2019-02-08  0:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller
  Cc: netfilter-devel, coreteam, netdev, linux-kernel,
	Gustavo A. R. Silva

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

size = sizeof(struct foo) + count * sizeof(void *);
instance = alloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

size = struct_size(instance, entry, count);
instance = alloc(size, GFP_KERNEL)

Notice that, in this case, variable sz is not necessary, hence it is
removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 net/netfilter/xt_recent.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index f44de4bc2100..1664d2ec8b2f 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -337,7 +337,6 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 	unsigned int nstamp_mask;
 	unsigned int i;
 	int ret = -EINVAL;
-	size_t sz;
 
 	net_get_random_once(&hash_rnd, sizeof(hash_rnd));
 
@@ -387,8 +386,7 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 		goto out;
 	}
 
-	sz = sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size;
-	t = kvzalloc(sz, GFP_KERNEL);
+	t = kvzalloc(struct_size(t, iphash, ip_list_hash_size), GFP_KERNEL);
 	if (t == NULL) {
 		ret = -ENOMEM;
 		goto out;
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next] ipvs: Use struct_size() helper
From: Gustavo A. R. Silva @ 2019-02-08  0:44 UTC (permalink / raw)
  To: Wensong Zhang, Simon Horman, Julian Anastasov, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller
  Cc: netdev, lvs-devel, netfilter-devel, coreteam, linux-kernel,
	Gustavo A. R. Silva

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = alloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

size = struct_size(instance, entry, count);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 7d6318664eb2..bcd9112f47d9 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2734,8 +2734,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 		int size;
 
 		get = (struct ip_vs_get_services *)arg;
-		size = sizeof(*get) +
-			sizeof(struct ip_vs_service_entry) * get->num_services;
+		size = struct_size(get, entrytable, get->num_services);
 		if (*len != size) {
 			pr_err("length: %u != %u\n", *len, size);
 			ret = -EINVAL;
@@ -2776,8 +2775,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 		int size;
 
 		get = (struct ip_vs_get_dests *)arg;
-		size = sizeof(*get) +
-			sizeof(struct ip_vs_dest_entry) * get->num_dests;
+		size = struct_size(get, entrytable, get->num_dests);
 		if (*len != size) {
 			pr_err("length: %u != %u\n", *len, size);
 			ret = -EINVAL;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next 1/2] mlxsw: spectrum_router: Offload blackhole routes
From: David Ahern @ 2019-02-08  0:40 UTC (permalink / raw)
  To: Ido Schimmel, netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Alexander Petrovskiy, mlxsw
In-Reply-To: <20190206194140.18606-2-idosch@mellanox.com>

On 2/6/19 11:42 AM, Ido Schimmel wrote:
> Create a new FIB entry type for blackhole routes and set it in case the
> type of the notified route is 'RTN_BLACKHOLE'.
> 
> Program such routes with a discard action and mark them as offloaded
> since the device is dropping the packets instead of the kernel.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  .../ethernet/mellanox/mlxsw/spectrum_router.c | 27 +++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 

One of the feature requests from the FRR team (and a feature I have
implemented) is a blackhole nexthop. The idea is that prefixes are
installed pointing to nexthop id N. That nexthop definition can be
atomically updated to go between a device / gateway and a blackhole.


 [ prefix ] --> [ nhid 1 ] --> [ dev1 / gateway1 ]


 [ prefix ] --> [ nhid 1 ] --> [ blackhole ]


 [ prefix ] --> [ nhid 1 ] --> [ dev2 / gateway2 ]

Do you see this working ok with mlxsw without having to update the
prefix entries (which can be numerous) directly?

^ permalink raw reply

* [PATCH][next] Bluetooth: a2mp: Use struct_size() helper
From: Gustavo A. R. Silva @ 2019-02-08  0:28 UTC (permalink / raw)
  To: Marcel Holtmann, Johan Hedberg, David S. Miller
  Cc: linux-bluetooth, netdev, linux-kernel, Gustavo A. R. Silva

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = alloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

size = struct_size(instance, entry, count);
instance = alloc(size, GFP_KERNEL)

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 net/bluetooth/a2mp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bluetooth/a2mp.c b/net/bluetooth/a2mp.c
index 58fc6333d412..5f918ea18b5a 100644
--- a/net/bluetooth/a2mp.c
+++ b/net/bluetooth/a2mp.c
@@ -174,7 +174,7 @@ static int a2mp_discover_req(struct amp_mgr *mgr, struct sk_buff *skb,
 			num_ctrl++;
 	}

-	len = num_ctrl * sizeof(struct a2mp_cl) + sizeof(*rsp);
+	len = struct_size(rsp, cl, num_ctrl);
 	rsp = kmalloc(len, GFP_ATOMIC);
 	if (!rsp) {
 		read_unlock(&hci_dev_list_lock);
-- 
2.20.1

^ permalink raw reply related

* Re: [iproute PATCH] ip-link: Fix listing of alias interfaces
From: Stephen Hemminger @ 2019-02-08  0:24 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netdev, Roopa Prabhu
In-Reply-To: <20190207130527.9439-1-phil@nwl.cc>

On Thu,  7 Feb 2019 14:05:27 +0100
Phil Sutter <phil@nwl.cc> wrote:

> Commit 50b9950dd9011 ("link dump filter") accidentally broke listing of
> links in the old alias interface notation:
> 
> | % ip link show eth0:1
> | RTNETLINK answers: No such device
> | Cannot send link get request: No such device
> 
> Prior to the above commit, link lookup was performed via ifindex
> returned by if_nametoindex(). The latter uses SIOCGIFINDEX ioctl call
> which on kernel side causes the colon-suffix to be dropped before doing
> the interface lookup. Netlink API though doesn't care about that at all.
> To keep things backward compatible, mimick ioctl API behaviour and drop
> the colon-suffix prior to sending the RTM_GETLINK request.
> 
> Fixes: 50b9950dd9011 ("link dump filter")
> Signed-off-by: Phil Sutter <phil@nwl.cc>

What about mistaken usage where the text after the colon is not a number,
or has additional colon?

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2019-02-08  0:24 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List, Tonghao Zhang,
	Pablo Neira Ayuso

[-- Attachment #1: Type: text/plain, Size: 8878 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c

between commit:

  218d05ce326f ("net/mlx5e: Don't overwrite pedit action when multiple pedit used")

from the net tree and commit:

  c500c86b0c75 ("net/mlx5e: support for two independent packet edit actions")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index b5c1b039375a,85c5dd7fc2c7..000000000000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@@ -1310,15 -1308,12 +1309,12 @@@ static int parse_tunnel_attr(struct mlx
  				       outer_headers);
  	void *headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value,
  				       outer_headers);
- 
- 	struct flow_dissector_key_control *enc_control =
- 		skb_flow_dissector_target(f->dissector,
- 					  FLOW_DISSECTOR_KEY_ENC_CONTROL,
- 					  f->key);
- 	int err = 0;
+ 	struct flow_rule *rule = tc_cls_flower_offload_flow_rule(f);
+ 	struct flow_match_control enc_control;
+ 	int err;
  
  	err = mlx5e_tc_tun_parse(filter_dev, priv, spec, f,
 -				 headers_c, headers_v);
 +				 headers_c, headers_v, match_level);
  	if (err) {
  		NL_SET_ERR_MSG_MOD(extack,
  				   "failed to parse tunnel attributes");
@@@ -1466,19 -1454,17 +1455,17 @@@ static int __parse_cls_flower(struct ml
  		return -EOPNOTSUPP;
  	}
  
- 	if ((dissector_uses_key(f->dissector,
- 				FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) ||
- 	     dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ENC_KEYID) ||
- 	     dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ENC_PORTS)) &&
- 	    dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ENC_CONTROL)) {
- 		struct flow_dissector_key_control *key =
- 			skb_flow_dissector_target(f->dissector,
- 						  FLOW_DISSECTOR_KEY_ENC_CONTROL,
- 						  f->key);
- 		switch (key->addr_type) {
+ 	if ((flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) ||
+ 	     flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_KEYID) ||
+ 	     flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_PORTS)) &&
+ 	    flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_CONTROL)) {
+ 		struct flow_match_control match;
+ 
+ 		flow_rule_match_enc_control(rule, &match);
+ 		switch (match.key->addr_type) {
  		case FLOW_DISSECTOR_KEY_IPV4_ADDRS:
  		case FLOW_DISSECTOR_KEY_IPV6_ADDRS:
 -			if (parse_tunnel_attr(priv, spec, f, filter_dev))
 +			if (parse_tunnel_attr(priv, spec, f, filter_dev, tunnel_match_level))
  				return -EOPNOTSUPP;
  			break;
  		default:
@@@ -1937,12 -1880,11 +1883,11 @@@ static struct mlx5_fields fields[] = 
  	OFFLOAD(UDP_DPORT, 2, udp.dest,   0),
  };
  
 -/* On input attr->num_mod_hdr_actions tells how many HW actions can be parsed at
 - * max from the SW pedit action. On success, it says how many HW actions were
 - * actually parsed.
 +/* On input attr->max_mod_hdr_actions tells how many HW actions can be parsed at
 + * max from the SW pedit action. On success, attr->num_mod_hdr_actions
 + * says how many HW actions were actually parsed.
   */
- static int offload_pedit_fields(struct pedit_headers *masks,
- 				struct pedit_headers *vals,
+ static int offload_pedit_fields(struct pedit_headers_action *hdrs,
  				struct mlx5e_tc_flow_parse_attr *parse_attr,
  				struct netlink_ext_ack *extack)
  {
@@@ -1957,17 -1899,15 +1902,17 @@@
  	__be16 mask_be16;
  	void *action;
  
- 	set_masks = &masks[TCA_PEDIT_KEY_EX_CMD_SET];
- 	add_masks = &masks[TCA_PEDIT_KEY_EX_CMD_ADD];
- 	set_vals = &vals[TCA_PEDIT_KEY_EX_CMD_SET];
- 	add_vals = &vals[TCA_PEDIT_KEY_EX_CMD_ADD];
+ 	set_masks = &hdrs[0].masks;
+ 	add_masks = &hdrs[1].masks;
+ 	set_vals = &hdrs[0].vals;
+ 	add_vals = &hdrs[1].vals;
  
  	action_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto);
 -	action = parse_attr->mod_hdr_actions;
 -	max_actions = parse_attr->num_mod_hdr_actions;
 -	nactions = 0;
 +	action = parse_attr->mod_hdr_actions +
 +		 parse_attr->num_mod_hdr_actions * action_size;
 +
 +	max_actions = parse_attr->max_mod_hdr_actions;
 +	nactions = parse_attr->num_mod_hdr_actions;
  
  	for (i = 0; i < ARRAY_SIZE(fields); i++) {
  		f = &fields[i];
@@@ -2085,52 -2027,53 +2032,55 @@@ static int alloc_mod_hdr_actions(struc
  static const struct pedit_headers zero_masks = {};
  
  static int parse_tc_pedit_action(struct mlx5e_priv *priv,
- 				 const struct tc_action *a, int namespace,
+ 				 const struct flow_action_entry *act, int namespace,
  				 struct mlx5e_tc_flow_parse_attr *parse_attr,
+ 				 struct pedit_headers_action *hdrs,
  				 struct netlink_ext_ack *extack)
  {
- 	struct pedit_headers masks[__PEDIT_CMD_MAX], vals[__PEDIT_CMD_MAX], *cmd_masks;
- 	int nkeys, i, err = -EOPNOTSUPP;
+ 	u8 cmd = (act->id == FLOW_ACTION_MANGLE) ? 0 : 1;
+ 	int err = -EOPNOTSUPP;
  	u32 mask, val, offset;
- 	u8 cmd, htype;
+ 	u8 htype;
  
- 	nkeys = tcf_pedit_nkeys(a);
+ 	htype = act->mangle.htype;
+ 	err = -EOPNOTSUPP; /* can't be all optimistic */
  
- 	memset(masks, 0, sizeof(struct pedit_headers) * __PEDIT_CMD_MAX);
- 	memset(vals,  0, sizeof(struct pedit_headers) * __PEDIT_CMD_MAX);
+ 	if (htype == FLOW_ACT_MANGLE_UNSPEC) {
+ 		NL_SET_ERR_MSG_MOD(extack, "legacy pedit isn't offloaded");
+ 		goto out_err;
+ 	}
  
- 	for (i = 0; i < nkeys; i++) {
- 		htype = tcf_pedit_htype(a, i);
- 		cmd = tcf_pedit_cmd(a, i);
- 		err = -EOPNOTSUPP; /* can't be all optimistic */
+ 	mask = act->mangle.mask;
+ 	val = act->mangle.val;
+ 	offset = act->mangle.offset;
  
- 		if (htype == TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK) {
- 			NL_SET_ERR_MSG_MOD(extack,
- 					   "legacy pedit isn't offloaded");
- 			goto out_err;
- 		}
+ 	err = set_pedit_val(htype, ~mask, val, offset, &hdrs[cmd]);
+ 	if (err)
+ 		goto out_err;
  
- 		if (cmd != TCA_PEDIT_KEY_EX_CMD_SET && cmd != TCA_PEDIT_KEY_EX_CMD_ADD) {
- 			NL_SET_ERR_MSG_MOD(extack, "pedit cmd isn't offloaded");
- 			goto out_err;
- 		}
+ 	hdrs[cmd].pedits++;
  
- 		mask = tcf_pedit_mask(a, i);
- 		val = tcf_pedit_val(a, i);
- 		offset = tcf_pedit_offset(a, i);
+ 	return 0;
+ out_err:
+ 	return err;
+ }
  
- 		err = set_pedit_val(htype, ~mask, val, offset, &masks[cmd], &vals[cmd]);
- 		if (err)
- 			goto out_err;
- 	}
+ static int alloc_tc_pedit_action(struct mlx5e_priv *priv, int namespace,
+ 				 struct mlx5e_tc_flow_parse_attr *parse_attr,
+ 				 struct pedit_headers_action *hdrs,
+ 				 struct netlink_ext_ack *extack)
+ {
+ 	struct pedit_headers *cmd_masks;
+ 	int err;
+ 	u8 cmd;
  
 -	err = alloc_mod_hdr_actions(priv, hdrs, namespace, parse_attr);
 -	if (err)
 -		goto out_err;
 +	if (!parse_attr->mod_hdr_actions) {
- 		err = alloc_mod_hdr_actions(priv, a, namespace, parse_attr);
++		err = alloc_mod_hdr_actions(priv, hdrs, namespace, parse_attr);
 +		if (err)
 +			goto out_err;
 +	}
  
- 	err = offload_pedit_fields(masks, vals, parse_attr, extack);
+ 	err = offload_pedit_fields(hdrs, parse_attr, extack);
  	if (err < 0)
  		goto out_dealloc_parsed_actions;
  
@@@ -2185,22 -2128,17 +2135,22 @@@ static bool csum_offload_supported(stru
  }
  
  static bool modify_header_match_supported(struct mlx5_flow_spec *spec,
- 					  struct tcf_exts *exts,
+ 					  struct flow_action *flow_action,
 +					  u32 actions,
  					  struct netlink_ext_ack *extack)
  {
- 	const struct tc_action *a;
+ 	const struct flow_action_entry *act;
  	bool modify_ip_header;
  	u8 htype, ip_proto;
  	void *headers_v;
  	u16 ethertype;
- 	int nkeys, i;
+ 	int i;
  
 -	headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, outer_headers);
 +	if (actions & MLX5_FLOW_CONTEXT_ACTION_DECAP)
 +		headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, inner_headers);
 +	else
 +		headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, outer_headers);
 +
  	ethertype = MLX5_GET(fte_match_set_lyr_2_4, headers_v, ethertype);
  
  	/* for non-IP we only re-write MACs, so we're okay */
@@@ -2256,8 -2190,9 +2202,9 @@@ static bool actions_match_supported(str
  		return false;
  
  	if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
- 		return modify_header_match_supported(&parse_attr->spec, exts,
+ 		return modify_header_match_supported(&parse_attr->spec,
+ 						     flow_action,
 -						     extack);
 +						     actions, extack);
  
  	return true;
  }

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH 1/1] net: dsa: b53: Fix for failure when irq is not defined in dt
From: Florian Fainelli @ 2019-02-08  0:07 UTC (permalink / raw)
  To: Arun Parameswaran, Andrew Lunn, Vivien Didelot, David S . Miller
  Cc: bcm-kernel-feedback-list, netdev, linux-kernel
In-Reply-To: <20190208000118.9268-1-arun.parameswaran@broadcom.com>

On 2/7/19 4:01 PM, Arun Parameswaran wrote:
> Fixes the issues with non BCM58XX chips in the b53 driver
> failing, when the irq is not specified in the device tree.
> 
> Removed the check for BCM58XX in b53_srab_prepare_irq(),
> so the 'port->irq' will be set to '-EXIO' if the irq is not
> specified in the device tree.
> 
> Fixes: 16994374a6fc ("net: dsa: b53: Make SRAB driver manage port interrupts")
> Fixes: b2ddc48a81b5 ("net: dsa: b53: Do not fail when IRQ are not initialized")
> Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

Thanks Arun!

> ---
>  drivers/net/dsa/b53/b53_srab.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c
> index 90f514252987..d9c56a779c08 100644
> --- a/drivers/net/dsa/b53/b53_srab.c
> +++ b/drivers/net/dsa/b53/b53_srab.c
> @@ -511,9 +511,6 @@ static void b53_srab_prepare_irq(struct platform_device *pdev)
>  	/* Clear all pending interrupts */
>  	writel(0xffffffff, priv->regs + B53_SRAB_INTR);
>  
> -	if (dev->pdata && dev->pdata->chip_id != BCM58XX_DEVICE_ID)
> -		return;
> -
>  	for (i = 0; i < B53_N_PORTS; i++) {
>  		port = &priv->port_intrs[i];
>  
> 


-- 
Florian

^ permalink raw reply

* [PATCH 1/1] net: dsa: b53: Fix for failure when irq is not defined in dt
From: Arun Parameswaran @ 2019-02-08  0:01 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn, Vivien Didelot, David S . Miller
  Cc: bcm-kernel-feedback-list, netdev, linux-kernel, Arun Parameswaran

Fixes the issues with non BCM58XX chips in the b53 driver
failing, when the irq is not specified in the device tree.

Removed the check for BCM58XX in b53_srab_prepare_irq(),
so the 'port->irq' will be set to '-EXIO' if the irq is not
specified in the device tree.

Fixes: 16994374a6fc ("net: dsa: b53: Make SRAB driver manage port interrupts")
Fixes: b2ddc48a81b5 ("net: dsa: b53: Do not fail when IRQ are not initialized")
Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
---
 drivers/net/dsa/b53/b53_srab.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c
index 90f514252987..d9c56a779c08 100644
--- a/drivers/net/dsa/b53/b53_srab.c
+++ b/drivers/net/dsa/b53/b53_srab.c
@@ -511,9 +511,6 @@ static void b53_srab_prepare_irq(struct platform_device *pdev)
 	/* Clear all pending interrupts */
 	writel(0xffffffff, priv->regs + B53_SRAB_INTR);

-	if (dev->pdata && dev->pdata->chip_id != BCM58XX_DEVICE_ID)
-		return;
-
 	for (i = 0; i < B53_N_PORTS; i++) {
 		port = &priv->port_intrs[i];

-- 
2.17.1

^ permalink raw reply related

* RE: [PATCH v2 4/6] ethtool: support per-queue sub command --show-coalesce
From: Nunley, Nicholas D @ 2019-02-07 23:53 UTC (permalink / raw)
  To: Michal Kubecek, netdev@vger.kernel.org
  Cc: Kirsher, Jeffrey T, linville@tuxdriver.com, nhorman@redhat.com,
	sassmann@redhat.com
In-Reply-To: <20190206132228.GB18410@unicorn.suse.cz>

> -----Original Message-----
> From: Michal Kubecek [mailto:mkubecek@suse.cz]
> Sent: Wednesday, February 6, 2019 5:22 AM
> To: netdev@vger.kernel.org
> Cc: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; linville@tuxdriver.com;
> Nunley, Nicholas D <nicholas.d.nunley@intel.com>; nhorman@redhat.com;
> sassmann@redhat.com
> Subject: Re: [PATCH v2 4/6] ethtool: support per-queue sub command --
> show-coalesce
> 
> On Tue, Feb 05, 2019 at 04:01:04PM -0800, Jeff Kirsher wrote:
> > diff --git a/ethtool.c b/ethtool.c
> > index 4dc725c..9a1b83b 100644
> > --- a/ethtool.c
> > +++ b/ethtool.c
> > @@ -1409,6 +1409,29 @@ static int dump_coalesce(const struct
> ethtool_coalesce *ecoal)
> >  	return 0;
> >  }
> >
> > +void dump_per_queue_coalesce(struct ethtool_per_queue_op
> *per_queue_opt,
> > +			     __u32 *queue_mask)
> > +{
> > +	char *addr;
> > +	int i;
> > +
> > +	addr = (char *)per_queue_opt + sizeof(*per_queue_opt);
> > +	for (i = 0; i < __KERNEL_DIV_ROUND_UP(MAX_NUM_QUEUE, 32);
> i++) {
> > +		int queue = i * 32;
> > +		__u32 mask = queue_mask[i];
> > +
> > +		while (mask > 0) {
> > +			if (mask & 0x1) {
> > +				fprintf(stdout, "Queue: %d\n", queue);
> > +				dump_coalesce((struct ethtool_coalesce
> *)addr);
> > +				addr += sizeof(struct ethtool_coalesce);
> > +			}
> > +			mask = mask >> 1;
> > +			queue++;
> > +		}
> > +	}
> > +}
> > +
> >  struct feature_state {
> >  	u32 off_flags;
> >  	struct ethtool_gfeatures features;
> 
> The casts and pointer arithmetic are a bit complicated. How about this?
> 
> 	struct ethtool_coalesce *addr;
> ...
> 	addr = (struct ethtool_coalesce *)(per_queue_opt + 1); ...
> 	dump_coalesce(addr);
> 	addr++;
> 
> Also passing n_queue down from do_perqueue() would prevent having to
> parse the whole bitmap even if we already know the NIC has only few
> queues.

Okay, that does make the pointer arithmetic a little more straightforward, so I'll change this. I'll also add the n_queue parameter to set_per_queue_coalesce() and dump_per_queue_coalesce() so we can exit early once we've found all the queues.

> > @@ -5245,7 +5268,8 @@ static const struct option {
> >  	{ "--show-fec", 1, do_gfec, "Show FEC settings"},
> >  	{ "--set-fec", 1, do_sfec, "Set FEC settings",
> >  	  "		[ encoding auto|off|rs|baser [...]]\n"},
> > -	{ "--set-perqueue-command", 1, do_perqueue, "Set per queue
> command",
> > +	{ "--set-perqueue-command", 1, do_perqueue, "Set per queue
> command. "
> > +	  "The supported sub commands include --show-coalesce",
> >  	  "             [queue_mask %x] SUB_COMMAND\n"},
> >  	{ "-h|--help", 0, show_usage, "Show this help" },
> >  	{ "--version", 0, do_version, "Show version number" }, @@ -5350,8
> > +5374,32 @@ static int find_max_num_queues(struct cmd_context *ctx)
> >  		   echannels.combined_count);
> >  }
> >
> > +static struct ethtool_per_queue_op *
> > +get_per_queue_coalesce(struct cmd_context *ctx, __u32 *queue_mask,
> > +int n_queues) {
> > +	struct ethtool_per_queue_op *per_queue_opt;
> > +
> > +	per_queue_opt = malloc(sizeof(*per_queue_opt) + n_queues *
> > +			sizeof(struct ethtool_coalesce));
> > +	if (!per_queue_opt)
> > +		return NULL;
> > +
> > +	memcpy(per_queue_opt->queue_mask, queue_mask,
> > +	       __KERNEL_DIV_ROUND_UP(MAX_NUM_QUEUE, 32) *
> sizeof(__u32));
> > +	per_queue_opt->cmd = ETHTOOL_PERQUEUE;
> > +	per_queue_opt->sub_command = ETHTOOL_GCOALESCE;
> > +	if (send_ioctl(ctx, per_queue_opt)) {
> > +		free(per_queue_opt);
> > +		perror("Cannot get device per queue parameters");
> > +		return NULL;
> > +	}
> > +
> > +	return per_queue_opt;
> > +}
> > +
> >  static int do_perqueue(struct cmd_context *ctx)  {
> > +	struct ethtool_per_queue_op *per_queue_opt;
> >  	__u32
> queue_mask[__KERNEL_DIV_ROUND_UP(MAX_NUM_QUEUE, 32)] = {0};
> >  	int i, n_queues = 0;
> >
> > @@ -5390,7 +5438,19 @@ static int do_perqueue(struct cmd_context *ctx)
> >  	if (i < 0)
> >  		exit_bad_args();
> >
> > -	/* no sub_command support yet */
> > +	if (strstr(args[i].opts, "--show-coalesce") != NULL) {
> 
> Comparing args[i].func to do_gcoalesce might be easier.

This is the one comment where I think it's better to leave the code as it is. To me is seems more confusing to match on a function pointer that we're never going to call. Unless there are more objections I'd rather keep it the way it is.

Otherwise, thanks for the review and all the comments. I'll work on making these changes and get an updated version submitted.

Nick

> 
> > +		per_queue_opt = get_per_queue_coalesce(ctx,
> queue_mask,
> > +						       n_queues);
> > +		if (per_queue_opt == NULL) {
> > +			perror("Cannot get device per queue parameters");
> > +			return -EFAULT;
> > +		}
> > +		dump_per_queue_coalesce(per_queue_opt, queue_mask);
> > +		free(per_queue_opt);
> > +	} else {
> > +		perror("The subcommand is not supported yet");
> > +		return -EOPNOTSUPP;
> > +	}
> >
> >  	return 0;
> >  }
> 
> Michal Kubecek

^ permalink raw reply

* Re: [PATCH net-next v2 07/10] net: phy: marvell10g: Add support for 2.5GBASET
From: Russell King - ARM Linux admin @ 2019-02-07 23:48 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, netdev, linux-kernel, Andrew Lunn, Florian Fainelli,
	Heiner Kallweit, linux-arm-kernel, Antoine Tenart,
	thomas.petazzoni, gregory.clement, miquel.raynal, nadavh, stefanc,
	mw
In-Reply-To: <20190207094939.27369-8-maxime.chevallier@bootlin.com>

On Thu, Feb 07, 2019 at 10:49:36AM +0100, Maxime Chevallier wrote:
> The Marvell Alaska family of PHYs supports 2.5GBaseT and 5GBaseT modes,
> as defined in the 802.3bz specification.
> 
> When the link partner requests a 2.5GBASET link, the PHY will
> reconfigure it's MII interface to 2500BASEX.
> 
> At 5G, the PHY will reconfigure it's interface to 5GBASE-R, but this
> mode isn't supported by any MAC for now.
> 
> This was tested with :
>  - The 88X3310, which is on the MacchiatoBin

Hi Maxime,

Looking deeper at this, I think we actually need an additional patch at
the beginning of your series.

The default AN advertisement in 7.32 is 0x1181 - which includes the
2.5G and 5G modes.  We need to clear these bits, so that when the 10G
mode disabled via ethtool, we do not switch to 2.5G or 5G speed (both
of which are not currently reported as supported.)  Such a patch needs
backporting to stable kernels.

>  - The 88E2010, an Alaska PHY that has no fiber interfaces, and is
>    limited to 5G maximum speed.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> ---
> V1 -> V2: Use a #define for the 88X3310 PHY ID, since it's reused in
> 	  various places in the code. Rebased on Heiner Kallweit's patch
> 	  introducing the phy_modify_mmd accessor.
> 
>  drivers/net/phy/marvell10g.c | 30 ++++++++++++++++++++++--------
>  include/linux/marvell_phy.h  |  1 +
>  2 files changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c
> index 07df87b81369..581b4b6e31e9 100644
> --- a/drivers/net/phy/marvell10g.c
> +++ b/drivers/net/phy/marvell10g.c
> @@ -238,6 +238,7 @@ static int mv3310_config_init(struct phy_device *phydev)
>  
>  	/* Check that the PHY interface type is compatible */
>  	if (phydev->interface != PHY_INTERFACE_MODE_SGMII &&
> +	    phydev->interface != PHY_INTERFACE_MODE_2500BASEX &&
>  	    phydev->interface != PHY_INTERFACE_MODE_XAUI &&
>  	    phydev->interface != PHY_INTERFACE_MODE_RXAUI &&
>  	    phydev->interface != PHY_INTERFACE_MODE_10GKR)
> @@ -307,8 +308,18 @@ static int mv3310_config_aneg(struct phy_device *phydev)
>  	else
>  		reg = 0;
>  
> +	if (linkmode_test_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
> +			      phydev->advertising))
> +		reg |= MDIO_AN_10GBT_CTRL_ADV2_5G;
> +	if (linkmode_test_bit(ETHTOOL_LINK_MODE_5000baseT_Full_BIT,
> +			      phydev->advertising))
> +		reg |= MDIO_AN_10GBT_CTRL_ADV5G;
> +
>  	ret = phy_modify_mmd(phydev, MDIO_MMD_AN, MDIO_AN_10GBT_CTRL,
> -			     MDIO_AN_10GBT_CTRL_ADV10G, reg);
> +			     MDIO_AN_10GBT_CTRL_ADV10G |
> +			     MDIO_AN_10GBT_CTRL_ADV5G |
> +			     MDIO_AN_10GBT_CTRL_ADV2_5G, reg);
> +
>  	if (ret < 0)
>  		return ret;
>  	if (ret > 0)
> @@ -337,17 +348,20 @@ static int mv3310_aneg_done(struct phy_device *phydev)
>  static void mv3310_update_interface(struct phy_device *phydev)
>  {
>  	if ((phydev->interface == PHY_INTERFACE_MODE_SGMII ||
> +	     phydev->interface == PHY_INTERFACE_MODE_2500BASEX ||
>  	     phydev->interface == PHY_INTERFACE_MODE_10GKR) && phydev->link) {
>  		/* The PHY automatically switches its serdes interface (and
> -		 * active PHYXS instance) between Cisco SGMII and 10GBase-KR
> -		 * modes according to the speed.  Florian suggests setting
> -		 * phydev->interface to communicate this to the MAC. Only do
> -		 * this if we are already in either SGMII or 10GBase-KR mode.
> +		 * active PHYXS instance) between Cisco SGMII, 10GBase-KR and
> +		 * 2500BaseX modes according to the speed.  Florian suggests
> +		 * setting phydev->interface to communicate this to the MAC.
> +		 * Only do this if we are already in one of the above modes.
>  		 */
>  		if (phydev->speed == SPEED_10000)
>  			phydev->interface = PHY_INTERFACE_MODE_10GKR;
> +		else if (phydev->speed == SPEED_2500)
> +			phydev->interface = PHY_INTERFACE_MODE_2500BASEX;
>  		else if (phydev->speed >= SPEED_10 &&
> -			 phydev->speed < SPEED_10000)
> +			 phydev->speed < SPEED_2500)
>  			phydev->interface = PHY_INTERFACE_MODE_SGMII;
>  	}
>  }
> @@ -450,7 +464,7 @@ static int mv3310_read_status(struct phy_device *phydev)
>  
>  static struct phy_driver mv3310_drivers[] = {
>  	{
> -		.phy_id		= 0x002b09aa,
> +		.phy_id		= MARVELL_PHY_ID_88X3310,
>  		.phy_id_mask	= MARVELL_PHY_ID_MASK,
>  		.name		= "mv88x3310",
>  		.features	= PHY_10GBIT_FEATURES,
> @@ -468,7 +482,7 @@ static struct phy_driver mv3310_drivers[] = {
>  module_phy_driver(mv3310_drivers);
>  
>  static struct mdio_device_id __maybe_unused mv3310_tbl[] = {
> -	{ 0x002b09aa, MARVELL_PHY_ID_MASK },
> +	{ MARVELL_PHY_ID_88X3310, MARVELL_PHY_ID_MASK },
>  	{ },
>  };
>  MODULE_DEVICE_TABLE(mdio, mv3310_tbl);
> diff --git a/include/linux/marvell_phy.h b/include/linux/marvell_phy.h
> index 1eb6f244588d..5851d68d828a 100644
> --- a/include/linux/marvell_phy.h
> +++ b/include/linux/marvell_phy.h
> @@ -20,6 +20,7 @@
>  #define MARVELL_PHY_ID_88E1540		0x01410eb0
>  #define MARVELL_PHY_ID_88E1545		0x01410ea0
>  #define MARVELL_PHY_ID_88E3016		0x01410e60
> +#define MARVELL_PHY_ID_88X3310		0x002b09aa
>  
>  /* The MV88e6390 Ethernet switch contains embedded PHYs. These PHYs do
>   * not have a model ID. So the switch driver traps reads to the ID2
> -- 
> 2.19.2
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* RE: [PATCH v2 3/6] ethtool: introduce new ioctl for per-queue settings
From: Nunley, Nicholas D @ 2019-02-07 23:43 UTC (permalink / raw)
  To: Michal Kubecek, netdev@vger.kernel.org
  Cc: Kirsher, Jeffrey T, linville@tuxdriver.com, nhorman@redhat.com,
	sassmann@redhat.com
In-Reply-To: <20190206124307.GA18410@unicorn.suse.cz>

> -----Original Message-----
> From: Michal Kubecek [mailto:mkubecek@suse.cz]
> Sent: Wednesday, February 6, 2019 4:43 AM
> To: netdev@vger.kernel.org
> Cc: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; linville@tuxdriver.com;
> Nunley, Nicholas D <nicholas.d.nunley@intel.com>; nhorman@redhat.com;
> sassmann@redhat.com
> Subject: Re: [PATCH v2 3/6] ethtool: introduce new ioctl for per-queue
> settings
> 
> On Tue, Feb 05, 2019 at 04:01:03PM -0800, Jeff Kirsher wrote:
> > +static int do_perqueue(struct cmd_context *ctx) {
> > +	__u32
> queue_mask[__KERNEL_DIV_ROUND_UP(MAX_NUM_QUEUE, 32)] = {0};
> > +	int i, n_queues = 0;
> > +
> > +	if (ctx->argc == 0)
> > +		exit_bad_args();
> > +
> > +	/*
> > +	 * The sub commands will be applied to
> > +	 * all queues if no queue_mask set
> > +	 */
> > +	if (strncmp(*ctx->argp, "queue_mask", 10)) {
> 
> This would match any string starting with "queue_mask", is it intended?

No, I'll fix this. I don't know that there are any use cases where this distinction would matter, but then again there's no reason to have it match this way.

> 
> > +		n_queues = find_max_num_queues(ctx);
> > +		if (n_queues < 0) {
> > +			perror("Cannot get number of queues");
> > +			return -EFAULT;
> > +		}
> > +		for (i = 0; i < n_queues / 32; i++)
> > +			queue_mask[i] = ~0;
> > +		queue_mask[i] = (1 << (n_queues - i * 32)) - 1;
> 
> It's unlikely today, I guess, but in theory, this would overflow if n_queues ==
> MAX_NUM_QUEUE

Nice catch, I'll add a check to avoid this.

> > +		fprintf(stdout,
> > +			"The sub commands will be applied to all %d
> queues\n",
> > +			n_queues);
> > +	} else {
> > +		ctx->argc--;
> > +		ctx->argp++;
> > +		n_queues = set_queue_mask(queue_mask, *ctx->argp);
> > +		if (n_queues < 0) {
> > +			perror("Invalid queue mask");
> > +			return n_queues;
> > +		}
> > +		ctx->argc--;
> > +		ctx->argp++;
> > +	}
> > +
> > +	i = find_option(ctx->argc, ctx->argp);
> > +	if (i < 0)
> > +		exit_bad_args();
> > +
> > +	/* no sub_command support yet */
> > +
> > +	return 0;
> > +}
> 
> Michal Kubecek

^ permalink raw reply

* RE: [PATCH v2 3/6] ethtool: introduce new ioctl for per-queue settings
From: Nunley, Nicholas D @ 2019-02-07 23:41 UTC (permalink / raw)
  To: Michal Kubecek, netdev@vger.kernel.org
  Cc: Kirsher, Jeffrey T, linville@tuxdriver.com, nhorman@redhat.com,
	sassmann@redhat.com
In-Reply-To: <20190206103258.GK21401@unicorn.suse.cz>

> -----Original Message-----
> From: Michal Kubecek [mailto:mkubecek@suse.cz]
> Sent: Wednesday, February 6, 2019 2:33 AM
> To: netdev@vger.kernel.org
> Cc: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; linville@tuxdriver.com;
> Nunley, Nicholas D <nicholas.d.nunley@intel.com>; nhorman@redhat.com;
> sassmann@redhat.com
> Subject: Re: [PATCH v2 3/6] ethtool: introduce new ioctl for per-queue
> settings
> 
> On Tue, Feb 05, 2019 at 04:01:03PM -0800, Jeff Kirsher wrote:
> > +static int find_max_num_queues(struct cmd_context *ctx) {
> > +	struct ethtool_channels echannels;
> > +
> > +	echannels.cmd = ETHTOOL_GCHANNELS;
> > +	if (send_ioctl(ctx, &echannels))
> > +		return -1;
> > +
> > +	return MAX(MAX(echannels.rx_count, echannels.tx_count),
> > +		   echannels.combined_count);
> > +}
> 
> Is the outer MAX() correct here? From the documentation to -L option, it
> rather seems we might want
> 
> 	return MAX(echannels.rx_count, echannels.tx_count) +
> 	       echannels.combined_count;
> 
> But I can't find any NIC around which would have non-zero rx_count or
> tx_count so that I cannot check.

I think the original assumption here must have been that drivers either use separate Tx/Rx channels or support the combined approach, but never both at the same time. All Intel drivers only support the combined method so I didn't think to second guess this detail of the original implementation, however, I've since looked through the uses of get/set_channels elsewhere in the kernel and it looks like there are a few drivers that support both methods simultaneously, so that clearly needs to be supported too. Your suggestion above looks like the right thing to do.

The device queue-index-to-channel mapping could be a little ambiguous in the mixed mode (and is left as an implementation detail of the individual driver), but I suppose the most sensible approach would be to index through the combined channels first, then move on to the individual Tx/Rx channels, so there shouldn't be any future issues here if these drivers want to add support for per-queue commands.

^ permalink raw reply

* RE: [PATCH v2 3/6] ethtool: introduce new ioctl for per-queue settings
From: Nunley, Nicholas D @ 2019-02-07 23:37 UTC (permalink / raw)
  To: Michal Kubecek, netdev@vger.kernel.org
  Cc: Kirsher, Jeffrey T, linville@tuxdriver.com, nhorman@redhat.com,
	sassmann@redhat.com
In-Reply-To: <20190206091832.GI21401@unicorn.suse.cz>


> -----Original Message-----
> From: Michal Kubecek [mailto:mkubecek@suse.cz]
> Sent: Wednesday, February 6, 2019 1:19 AM
> To: netdev@vger.kernel.org
> Cc: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; linville@tuxdriver.com;
> Nunley, Nicholas D <nicholas.d.nunley@intel.com>; nhorman@redhat.com;
> sassmann@redhat.com
> Subject: Re: [PATCH v2 3/6] ethtool: introduce new ioctl for per-queue
> settings
> 
> On Tue, Feb 05, 2019 at 04:01:03PM -0800, Jeff Kirsher wrote:
> > Introduce a new ioctl for setting per-queue parameters.
> > Users can apply commands to specific queues by setting SUB_COMMAND
> and
> > queue_mask with the following ethtool command:
> >
> >  ethtool --set-perqueue-command DEVNAME [queue_mask %x]
> SUB_COMMAND
> 
> The "set" part is IMHO a bit confusing in combination with "show" type
> subcommands.

I'm not a fan of the "set" either. This patch set had already gone through some review before it was passed on to me, and the command naming wasn't a previous point of contention and I didn't want to disturb the parts that weren't "broken", but since you've brought it up I agree that this may not be the best name. I think "--perqueue-command" is more appropriate. Does this seem reasonable to you?

> > +static int set_queue_mask(u32 *queue_mask, char *str) {
> > +	int len = strlen(str);
> > +	int index = __KERNEL_DIV_ROUND_UP(len * 4, 32);
> > +	char tmp[9];
> > +	char *end = str + len;
> > +	int i, num;
> > +	__u32 mask;
> > +	int n_queues = 0;
> > +
> > +	if (len > MAX_NUM_QUEUE)
> > +		return -EINVAL;
> > +
> > +	for (i = 0; i < index; i++) {
> > +		num = end - str;
> > +
> > +		if (num >= 8) {
> > +			end -= 8;
> > +			num = 8;
> > +		} else {
> > +			end = str;
> > +		}
> > +		strncpy(tmp, end, num);
> > +		tmp[num] = '\0';
> > +
> > +		queue_mask[i] = strtoul(tmp, NULL, 16);
> > +
> > +		mask = queue_mask[i];
> > +		while (mask > 0) {
> > +			if (mask & 0x1)
> > +				n_queues++;
> > +			mask = mask >> 1;
> > +		}
> > +	}
> > +
> > +	return n_queues;
> > +}
> 
> Could you allow optional prefix "0x" as we do for link modes? Maybe
> parse_hex_u32_bitmap() could be used for both.

It's supposed to already work like this through to the eventual call to strtoul() (any "0x" prefix is ignored), but after closer inspection the current implementation won't work for a very specific set of inputs. Depending on the alignment the bitmap substring passed into strtoul() can end up being unrecognizable. For instance, "0xFFFFFFF" ends up getting divided into two calls to strtoul(), "xFFFFFFF" and "0", the former of which ends up evaluating to 0x0 rather than 0xFFFFFFF. Per your suggestion I'll replace the set_queue_mask() functionality with a call to parse_hex_u32_bitmap() to generate the bitmap and avoid this mess, plus some additional code to count up the number of queues.

> 
> > +static int do_perqueue(struct cmd_context *ctx) {
> > +	__u32
> queue_mask[__KERNEL_DIV_ROUND_UP(MAX_NUM_QUEUE, 32)] = {0};
> > +	int i, n_queues = 0;
> > +
> > +	if (ctx->argc == 0)
> > +		exit_bad_args();
> > +
> > +	/*
> > +	 * The sub commands will be applied to
> > +	 * all queues if no queue_mask set
> > +	 */
> > +	if (strncmp(*ctx->argp, "queue_mask", 10)) {
> > +		n_queues = find_max_num_queues(ctx);
> > +		if (n_queues < 0) {
> > +			perror("Cannot get number of queues");
> > +			return -EFAULT;
> > +		}
> > +		for (i = 0; i < n_queues / 32; i++)
> > +			queue_mask[i] = ~0;
> > +		queue_mask[i] = (1 << (n_queues - i * 32)) - 1;
> > +		fprintf(stdout,
> > +			"The sub commands will be applied to all %d
> queues\n",
> > +			n_queues);
> > +	} else {
> > +		ctx->argc--;
> > +		ctx->argp++;
> > +		n_queues = set_queue_mask(queue_mask, *ctx->argp);
> > +		if (n_queues < 0) {
> > +			perror("Invalid queue mask");
> > +			return n_queues;
> > +		}
> > +		ctx->argc--;
> > +		ctx->argp++;
> > +	}
> > +
> > +	i = find_option(ctx->argc, ctx->argp);
> > +	if (i < 0)
> > +		exit_bad_args();
> > +
> > +	/* no sub_command support yet */
> > +
> > +	return 0;
> > +}
> 
> Technically the return value should be -EOPNOTSUPP here but as the next
> patch fixes that, it doesn't really matter.

I'll fix this anyway since I'll be submitting a new revision.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next 5/7] net: phy: marvell10g: Force reading of 2.5/5G PMA extended abilities
From: Russell King - ARM Linux admin @ 2019-02-07 23:37 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: Andrew Lunn, davem, netdev, linux-kernel, Florian Fainelli,
	Heiner Kallweit, linux-arm-kernel, Antoine Tenart,
	thomas.petazzoni, gregory.clement, miquel.raynal, nadavh, stefanc,
	mw
In-Reply-To: <20190128152621.2aec96c1@bootlin.com>

On Mon, Jan 28, 2019 at 03:26:21PM +0100, Maxime Chevallier wrote:
> Hello Russell,
> 
> On Mon, 21 Jan 2019 13:00:30 +0000
> Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
> 
> >On Mon, Jan 21, 2019 at 01:29:45PM +0100, Maxime Chevallier wrote:
> >> Hello Russell,
> >> 
> >> On Mon, 21 Jan 2019 10:52:06 +0000
> >> Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:  
> >> >It's entirely possible that the 3310 switches to different hardware
> >> >blocks for 2.5G and 5G speeds, and reading _just_ the 1.4 register
> >> >is not sufficient.  
> >> 
> >> I agree with you but in that particular case, I think we are reading
> >> from the correct device. The datasheet itself says that we should be
> >> reading 1.4 and 1.11 as we expect, with 2.5G/5G support being set (these
> >> registers are read-only, and the datasheet's values aren't what we
> >> actually read).  
> >
> >No, you missed what I was saying.
> >
> >The 88x3310 is a hybrid device.  It contains multiple instances of
> >each individual device at different offsets in each MMD address space.
> 
> Ah I see, I indeed thought you refered to the MMDs. 
> 
> [...]
> 
> >The exception seems to be the PMA/PMD MMD which I've only discovered
> >a single instance.
> 
> Yes there only seems to be one. There are some other registers in the
> 1.0xCxxx range, but those who are documented don't help a lot with
> determing wether or not these modes are supported.
> 
> I wonder if these values are correctly reported in newer PHY firmware
> revisions.
> 
> I've checked other PCS instances, but it seems the one at 3.0x0xxx is
> the one used in 2.5/5GBASET.

In the following, I use "subdevice N" to mean instance "N + 1".  So
the instance at address 0 is the main device and the following instance
is subdevice 1.

Looking at the PCS, none of them have the 2.5G/5G bits set in the 3310
which is rather interesting, except the one at 3.0 has the EEE bits
set in 3.21.  You are quite correct that the first instance is used
for 2.5G copper, but PCS subdevice 2 also changes as a result of
switching down to 2.5G (its transmit fault status clears.)

It looks like PhyXS subdevice 3 is used (which is a SGMII/1000base-X
MAC facing PHY), is switched to fixed speed mode.  It doesn't have
what looks like a sensible advertisement either, so my guess is that
this will need the MAC side to be configured _not_ to expect the
in-band control word in this mode.  In other words, it may be SGMII
or 1000base-X upclocked to 2.5G speeds - which it is doesn't matter
because the difference is in the control word which looks like it's
omitted, or if it isn't, doesn't contain useful information.

Note that mvpp2 is slightly buggy in this area, and I have a patch
set that fixes it up - patches can be found in my "mvpp2" branch,
which I'm waiting for my problem reporter to report back after his
testing.  If you use this patch set, as long as you don't use
in-band mode, it should be fine.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH 01/18] MIPS: lantiq: pass struct device to DMA API functions
From: Paul Burton @ 2019-02-07 23:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: John Crispin, Vinod Koul, Dmitry Tarnyagin, Nicolas Ferre,
	Sudip Mukherjee, Felipe Balbi, linux-mips@vger.kernel.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	netdev@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-fbdev@vger.kernel.org, alsa-devel@alsa-project.org,
	iommu@lists.linux-foundation.org
In-Reply-To: <20190201084801.10983-2-hch@lst.de>

Hi Christoph,

On Fri, Feb 01, 2019 at 09:47:44AM +0100, Christoph Hellwig wrote:
> The DMA API generally relies on a struct device to work properly, and
> only barely works without one for legacy reasons.  Pass the easily
> available struct device from the platform_device to remedy this.
> 
> Also use GFP_KERNEL instead of GFP_ATOMIC as the gfp_t for the memory
> allocation, as we aren't in interrupt context or under a lock.
> 
> Note that this whole function looks somewhat bogus given that we never
> even look at the returned dma address, and the CPHYSADDR magic on
> a returned noncached mapping looks "interesting".  But I'll leave
> that to people more familiar with the code to sort out.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  arch/mips/lantiq/xway/vmmc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Would you like this to go through the MIPS tree or elsewhere? If the
latter:

    Acked-by: Paul Burton <paul.burton@mips.com>

Thanks,
    Paul

^ permalink raw reply

* Re: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Li Yang @ 2019-02-07 23:24 UTC (permalink / raw)
  To: Pankaj Bansal
  Cc: Andrew Lunn, Shawn Guo, Florian Fainelli, netdev@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Rob Herring
In-Reply-To: <VI1PR0401MB24966218B333880762D931B5F1680@VI1PR0401MB2496.eurprd04.prod.outlook.com>

On Wed, Feb 6, 2019 at 10:44 PM Pankaj Bansal <pankaj.bansal@nxp.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Li Yang [mailto:leoyang.li@nxp.com]
> > Sent: Thursday, 7 February, 2019 05:09 AM
> > To: Andrew Lunn <andrew@lunn.ch>
> > Cc: Pankaj Bansal <pankaj.bansal@nxp.com>; Shawn Guo
> > <shawnguo@kernel.org>; Florian Fainelli <f.fainelli@gmail.com>;
> > netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Rob Herring
> > <robh+dt@kernel.org>
> > Subject: Re: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
> >
> > On Wed, Feb 6, 2019 at 3:46 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > > > >  &i2c0 {
> > > > > > >         status = "okay";
> > > > > > >
> > > > > > > +       fpga@66 {
> > > > > > > +               compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
> > > > > > > +               reg = <0x66>;
> > > > > > > +               #address-cells = <1>;
> > > > > > > +               #size-cells = <0>;
> > > > > > > +
> > > > > > > +               mdio-mux-1@54 {
> > > > > >
> > > > > > Still no compatible string defined for the node.  Probably
> > > > > > should be
> > > > > > "mdio-mux- mmioreg", "mdio-mux"
> > > > >
> > > > > it is not a specific device. MDIO mux is meant to be controlled by
> > > > > some registers of parent device (FPGA).
> > > > > Therefore, IMO this should not be a device and there should not be
> > > > > any "compatible" property for it.
> > >
> > > > If it is not a device why we are defining a device node for it?  It
> > > > is probably not a physical device per se, but it can be considered a
> > > > virtual device provided by FPGA.
> > >
> > > It is a physical device. But it happens to be embedded inside another
> > > device. And that embedded is not performed as a bus with devices on
> > > it, so the device tree concepts don't fit directly.
> >
> > Whether or not it is populated as a bus(which probably should as the FPGA does
> > contain many different functions and these functions like the mdio-mux we are
> > discussing about could have separate drivers), the node should have a new
> > binding documentation similar to the mdio_mux_mmioreg binding or even
> > covers the mmioreg too.  And the best way to match the node with the binding
> > is through compatible strings IMO.  This is why I'm asking the node to have a
> > compatible string.
>
> The mdio_mux is NOT a device. FPGA is a device that provides the mdio mux functionality
> (among other functions). The mdio mux is controlled via some bits In one of the FPGA register.

With modern chips, it is likely to have multiple functions in a single
physical device that are covered by multiple subsystems.  We shouldn't
limit the concept of device to only physical devices.

>
> In my previous approach, I also used a compatible field for mdio_mux node in FPGA.
> https://www.mail-archive.com/netdev@vger.kernel.org/msg252744.html
> The FPGA driver would create as many platform devices for each subnode, and those devices
> Would attach to mdio_mux_regmap driver based on compatible field.
>
> BUT, this platform device creation is the problem. Since mdio_mux is not an actual device, how can
> We create a platform device for it?

Like I said platform device doesn't have to be a physically separate
device.  In this specific case, I think a multi-function device(MFD)
will be the best fit for this FPGA device.  The framework will help to
create sub-devices and help to share the regmap to all the
sub-function drivers.  Please check existing MFD drivers for more
details.

>
> Which is why I removed the compatible field from mdio_mux nodes. The FPGA driver detects the mdio_mux
> Using their name i.e. " mdio-mux-1@54". Like this:
>         for_each_child_of_node(dev->of_node, child) {
>                 if (!of_node_name_prefix(child, "mdio-mux"))
>
> Refer : https://patchwork.kernel.org/patch/10797227/
>
> >
> > >
> > > > This also bring up another question that why this device cannot
> > > > reuse the existing drivers/net/phy/mdio-mux-mmioreg.c driver?
> > >
> > > Because it is on an i2c bus, not an mmio bus.
> >
> > Oops, I missed that.
> >
> > >
> > > > If we think regmap is a better solution, shall we replace the
> > > > mmioreg driver with the regmap driver?
> > >
> > > regmap can be used with mmio. But for a single MMIO register it is a
> > > huge framework. So it makes sense to keep mdio-mux-mmioreg simple.
> > >
> > > If however the device is already using regmap, adding one more
> > > register is very little overhead. And it might be possible to use this
> > > new mux with an mmio regmap, or an spi regmap, etc. So we seem to be
> > > covering the best of both worlds.
> >
> > Ya.  It would be ideal if the new driver can cover the legacy mdio-mux-mmioreg
> > case too.
>
> Refer : https://patchwork.kernel.org/patch/10797227/
> The mdio_mux_regmap can be used by any FPGA be it i2c connected or MMIO based.
>
> >
> > Regards,
> > Leo

^ permalink raw reply

* Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt
From: Sander Eikelenboom @ 2019-02-07 22:15 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, David S. Miller, netdev, linux-kernel

L.S.,

While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression with NAT.
(using an nftables firewall with NAT and connection tracking).

Unfortunately it isn't too obvious since no errors are logged, but on clients it
causes symptoms like firefox intermittently not being able to load pages with:
    Network Protocol Error
    An error occurred during a connection to www.example.com
    The page you are trying to view cannot be shown because an error in the network protocol was detected.
    Please contact the website owners to inform them of this problem.

But it's only intermittently, so i can still visit some webpages with clients, 
could be that packet size and or fragments are at play ?

So I tried testing with git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with 
e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have the latest netdev has to offer,
but to no avail. 

After that I tried to git bisect and ended up with:

faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
commit faec18dbb0405c7d4dda025054511dc3a6696918
Author: Florian Westphal <fw@strlen.de>
Date:   Thu Dec 13 16:01:33 2018 +0100

    netfilter: nat: remove l4proto->manip_pkt

    This removes the last l4proto indirection, the two callers, the l3proto
    packet mangling helpers for ipv4 and ipv6, now call the
    nf_nat_l4proto_manip_pkt() helper.

    nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though
    they contain no functionality anymore to not clutter this patch.

    Next patch will remove the empty files and the nf_nat_l4proto
    struct.

    nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the
    other nat manip functionality as well, not just udp and udplite.

    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

:040000 040000 22d8706921e03cbd6d78a6ebcc5f253ccfd2bf0c b6f8ab2779215b4495dfe641f50e798da73859ac M  include
:040000 040000 af212a756f1acf00cbe45c3be5b71f38f01f1d34 165c440f9e6f2e05738628a19b51f7603f95752a M  net

Any ideas or debugging hints ?

--
Sander

^ permalink raw reply

* Re: [PATCH bpf-next 1/6] bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper
From: Daniel Borkmann @ 2019-02-07 22:21 UTC (permalink / raw)
  To: Martin Lau
  Cc: netdev@vger.kernel.org, Alexei Starovoitov, Kernel Team,
	Lawrence Brakmo
In-Reply-To: <20190207072716.qat44fx2t6wqvu4l@kafai-mbp.dhcp.thefacebook.com>

On 02/07/2019 08:27 AM, Martin Lau wrote:
[...]
> Following up the discussion in the iovisor conf call.
> 
> One of discussion was about:
> other than tw, can __sk_buff->sk always return a
> fullsock (PTR_TO_SOCKET_OR_NULL).  In request_sock case,
> it is doable because it can trace back to the listener sock.
> 
> However, that will go back to the sock_common accessing question.
> In particular, how to access the sock_common's fields of the
> request_sock itself?  Those fields in the request_sock are different
> from its listener sock.  e.g. the skc_daddr and skc_dport.
> 
> Also, if the sock_common fields of tw is needed, it will become weird
> because likely a new "struct bpf_tw_sock" is needed which is OK
> but all sock_common fields need to be copied from bpf_sock
> to bpf_tw_sock.
> 
> I think reading a sk from a ctx should return the
> most basic type PTR_TO_SOCK_COMMON_OR_NULL (unless the running
> ctx can guarantee that it always has a fullsock).
> Currently, it is __sk_buff->sk.  Later, sock_ops->sk...etc.
> One single 'struct bpf_sock' and limit fullsock field access
> at verification time.  The bpf_prog then moves down the chain
> based on what it needs.  It could be fullsock, tcp_sock...etc.
> 
> I think that will be the most flexible way to write bpf_prog
> while also avoid having duplicate fields in different
> bpf struct in uapi.

Ok, thanks for following up and sorry for late reply, lets go with
sock_common then. What's the plan to moving forward with accessing
full sk in case of req sk? Separate helper or backed into the newly
added bpf_sk_fullsock() one? Presumably latter?

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH net-next] net: phy: let genphy_c45_read_link manage the devices to check
From: Russell King - ARM Linux admin @ 2019-02-07 22:19 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Andrew Lunn, Florian Fainelli, David Miller,
	netdev@vger.kernel.org
In-Reply-To: <974f6bca-c938-aac9-28b4-291cca0734db@gmail.com>

On Thu, Feb 07, 2019 at 10:54:51PM +0100, Heiner Kallweit wrote:
> On 07.02.2019 21:50, Andrew Lunn wrote:
> >> Thanks, Andrew. Right, the Aquantia PHY doesn't seem to have the C22EXT
> >> MMD. Because the Aquantia PHY has no device 29 in its package the code
> >> should work.
> > 
> > It lists device 29 in its devices in package. So the current code does
> > look there.
> > 
> I just looked for a description of a device 29. Strange that the device
> list states it's there and then it's not there.
> 
> When checking that I was scratching my head because of the following code
> in genphy_c45_read_link:
> 
> devad = __ffs(mmd_mask);
> mmd_mask &= ~BIT(devad);
> 
> AFAIK __ffs() returns the posix bit number, means it returns 1 for bit 0.

I think you're wrong.  Look at include/asm-generic/bitops/__ffs.h.  If
'word' is 0x00000001, then the function returns zero.  If 'word' is
0x00000002, it returns 1, etc.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH net-next] net: phy: let genphy_c45_read_link manage the devices to check
From: Heiner Kallweit @ 2019-02-07 22:06 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, Russell King,
	netdev@vger.kernel.org
In-Reply-To: <974f6bca-c938-aac9-28b4-291cca0734db@gmail.com>

On 07.02.2019 22:54, Heiner Kallweit wrote:
> On 07.02.2019 21:50, Andrew Lunn wrote:
>>> Thanks, Andrew. Right, the Aquantia PHY doesn't seem to have the C22EXT
>>> MMD. Because the Aquantia PHY has no device 29 in its package the code
>>> should work.
>>
>> It lists device 29 in its devices in package. So the current code does
>> look there.
>>
> I just looked for a description of a device 29. Strange that the device
> list states it's there and then it's not there.
> 
> When checking that I was scratching my head because of the following code
> in genphy_c45_read_link:
> 
> devad = __ffs(mmd_mask);
> mmd_mask &= ~BIT(devad);
> 
> AFAIK __ffs() returns the posix bit number, means it returns 1 for bit 0.
> Then this code piece seems to be wrong because I think the intention is
> to clear the bit we just found. Instead we clear the next bit.
> 
Nope, it doesn't seem to be the posix version .. Just checked the code of __ffs().

> And device 0 isn't really a device but a flag "Clause 22 registers present".
> So far we may have been lucky because to supported 10G PHY has this flag set.
> But if this flag is set and we try to access a register 0.1 then we may be
> in trouble again. Therefore I think we need to exclude also device 0.
> Or do I miss something?
> 
>>> So it seems we have to exclude device C22EXT in general. I'll add that
>>> and submit a v2.
>>
>> Yes.
>>
>> 	Andrew
>>
> Heiner
> 


^ permalink raw reply

* Re: [PATCH v2] rhashtable: make walk safe from softirq context
From: Herbert Xu @ 2019-02-07 22:02 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless, netdev, Jouni Malinen, Thomas Graf
In-Reply-To: <20190207214834.awrpkxunnvfdorbe@gondor.apana.org.au>

On Fri, Feb 08, 2019 at 05:48:34AM +0800, Herbert Xu wrote:
>
> > > Could you please show me who is doing this so I can review that
> > > to see whether it's a legitimate use of this API?
> > 
> > I'm sure you'll say it's not legitimate, but it still exists ;-)
> > 
> > mesh_plink_broken() gets called from the TX status path, via
> > ieee80211s_update_metric().
> 
> Let me take a look.

OK I think it is wrong but you already knew that :)

First of all our current walk interface does not guarantee that you
will see all elements because a parallel remove can cause you to miss
elements.  Although Neil Brown was trying to fix that it is still
not in the tree yet.

But more fundamentally, iterating over an unbounded list in a sotirq
context is *broken*!

Either you don't really need a hash table at all because your list
is short enough to start with, or you need to add more hash tables
(or other data structures) to efficiently look things up using these
alternative keys so that you don't end up hogging the softirq.

So this needs to be fixed in mesh.

Once that is gone hopefully we can remove the long obsolete init
API that carries the GFP flag.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [ISSUE][4.20.6] mlx5 and checksum failures
From: Ian Kumlien @ 2019-02-07 22:01 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Cong Wang, David Miller, Saeed Mahameed,
	Linux Kernel Network Developers
In-Reply-To: <CALzJLG8wPpiS+aU674B=jRK3ugbu8EfxY5Wrcs1_RwsjDtLhUg@mail.gmail.com>

On Thu, Feb 7, 2019 at 7:43 PM Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
>
> On Thu, Feb 7, 2019 at 2:17 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >
> > On Thu, Feb 7, 2019 at 2:01 AM Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
> > > On Wed, Feb 6, 2019 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > It changes directly after the first hw checksum failure, I don't know why =/
> > >
> > > weird, Maybe a real check-summing issue/corruption on the PCI ?!
> >
> > Actually, it seems to have been introduced in 4.20.6 - 4.20.5 works just fine
> >
>
> Great, the difference is only 120 patches.
> that is bisect-able, it will only take 5 iterations to find the
> offending commit.

I just wish it wasn't a server that takes, what feels like 5 minutes to boot...

All of these seas of sensors 2d and 3d... =P

But, yep, that's the plan

> > Just FYI, my dmesg testcase:
> > time ssh <server> "dmesg && exit
> > real    3m5.845s
> > user    0m0.035s
> > sys     0m0.041s
> >
> > > can you try turning off checksum offloads
> > > ethtool -K ethX  rx off
> >
> > same test:
> > real    0m3.408s
> > user    0m0.022s
> > sys     0m0.032s
> >
> > So yes, something in 4.20.6 goes wrong on the receiving part :/

^ permalink raw reply

* Re: [PATCH v2] rhashtable: make walk safe from softirq context
From: Johannes Berg @ 2019-02-07 21:56 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-wireless, netdev, Jouni Malinen, Thomas Graf
In-Reply-To: <20190207214834.awrpkxunnvfdorbe@gondor.apana.org.au>

On Fri, 2019-02-08 at 05:48 +0800, Herbert Xu wrote:
> On Thu, Feb 07, 2019 at 02:50:54PM +0100, Johannes Berg wrote:
> > 
> > > This interface wasn't designed for use in softirq contexts.
> > 
> > Well, it clearly was used there. You even gave it a gfp_t argument in
> > rhashtable_walk_init(), so you can't really claim it wasn't designed for
> > this. I see now that it's ignored, but still?
> 
> I see.  This was added behind my back so I wasn't aware of it.

It's not used and actually I was wrong anyway: this would have also
allowed doing the walk while holding a spinlock or with softirqs
disabled, rather than from IRQ/softirq context.

In any case, it's clearly working to iterate from this context, and
doing a spinlock_bh vs. a spinlock in the rhashtable code isn't really
that big a deal? Not sure I really understand your objection.

johannes

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox