Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 00/24] sctp: support SCTP_FUTURE/CURRENT/ALL_ASSOC
From: David Miller @ 2019-01-30  8:44 UTC (permalink / raw)
  To: nhorman; +Cc: lucien.xin, netdev, linux-sctp, marcelo.leitner
In-Reply-To: <20190130074605.GB6120@neilslaptop.think-freely.org>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 30 Jan 2019 02:46:05 -0500

> Ok, Dave, thank you for waiting on me for this, I've looked at this series, and
> after Xins explination on my question, I've no issue with this change:
> 
> Acked-by: Neil Horman <nhorman@tuxdriver.com>

Awesome, series applied.

^ permalink raw reply

* Re: WoL broken in r8169.c since kernel 4.19
From: Marc Haber @ 2019-01-30  8:46 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: netdev@vger.kernel.org
In-Reply-To: <1f40f95a-f595-45f5-5641-9eb4837da71e@gmail.com>

On Tue, Jan 29, 2019 at 08:01:14PM +0100, Heiner Kallweit wrote:
> the change to replace __rtl8169_set_wol(tp, 0) doesn't seem to be the right commit
> because it was included in 4.18 already. And if you read the commit description you'll
> see that it was replaced because it caused issues with certain boards. Having said that
> it's not an option for us.

:-(

> Can you in addition apply the following (again it may not apply cleanly) and provide
> the results for 4.18 and 4.19?

In addition to the debug output? Can I do that together with the other
patch you sent yesterday or does that need to be two different tests?

> And from today's run, can you provide the full dmesg output? I'd like to check
> why the message was written on resume only.

I guess that it just didn't find its way to syslog before resume and got
stuck in some buffer during suspend.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

^ permalink raw reply

* [PATCH net-next 0/5] mlxsw: spectrum_acl: Include delta bits into hashtable key
From: Ido Schimmel @ 2019-01-30  8:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, mlxsw, Ido Schimmel

The Spectrum-2 ASIC allows multiple rules to use the same mask provided
that the difference between their masks is small enough (up to 8
consecutive delta bits). A more detailed explanation is provided in
merge commit 756cd36626f7 ("Merge branch
'mlxsw-Introduce-algorithmic-TCAM-support'").

These delta bits are part of the rule's key and therefore rules that
only differ in their delta bits can be inserted with the same A-TCAM
mask. In case two rules share the same key and only differ in their
priority, then the second will spill to the C-TCAM.

Current code does not take the delta bits into account when checking for
duplicate rules, which leads to unnecessary spillage to the C-TCAM.
This may result in reduced scale and performance.

Patch #1 includes the delta bits in the rule's key to avoid the above
mentioned problem.

Patch #2 adds a tracepoint when a rule is inserted into the C-TCAM.

Patches #3-#5 add test cases to make sure unnecessary spillage into the
C-TCAM does not occur.

Jiri Pirko (5):
  mlxsw: spectrum_acl: Include delta bits into hashtable key
  mlxsw: spectrum_acl: Add C-TCAM spill tracepoint
  selftests: spectrum-2: Extend and move trace helpers
  selftests: spectrum-2: Fix multiple_masks_test
  selftests: spectrum-2: Add delta two masks one key test

 .../mellanox/mlxsw/spectrum_acl_atcam.c       |  24 +--
 .../mlxsw/spectrum_acl_bloom_filter.c         |   2 +-
 .../mellanox/mlxsw/spectrum_acl_tcam.h        |  10 +-
 include/trace/events/mlxsw.h                  |  38 +++++
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 143 ++++++++++++++----
 5 files changed, 174 insertions(+), 43 deletions(-)
 create mode 100644 include/trace/events/mlxsw.h

-- 
2.20.1

^ permalink raw reply

* [PATCH net-next 1/5] mlxsw: spectrum_acl: Include delta bits into hashtable key
From: Ido Schimmel @ 2019-01-30  8:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, mlxsw, Ido Schimmel
In-Reply-To: <20190130085813.32161-1-idosch@mellanox.com>

From: Jiri Pirko <jiri@mellanox.com>

Currently only ERP mask masked bits in key are considered for
the hashtable key. That leads to false negative collisions
and fallbacks to C-TCAM in case two keys differ only in delta bits.

Fix this by taking full encoded key as a hashtable key,
including delta bits.

Reported-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../mellanox/mlxsw/spectrum_acl_atcam.c       | 21 +++++++++----------
 .../mlxsw/spectrum_acl_bloom_filter.c         |  2 +-
 .../mellanox/mlxsw/spectrum_acl_tcam.h        | 10 +++++----
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
index 40dc76a5c412..cda0a7170c34 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
@@ -390,8 +390,7 @@ mlxsw_sp_acl_atcam_region_entry_insert(struct mlxsw_sp *mlxsw_sp,
 	if (err)
 		return err;
 
-	lkey_id = aregion->ops->lkey_id_get(aregion, aentry->ht_key.enc_key,
-					    erp_id);
+	lkey_id = aregion->ops->lkey_id_get(aregion, aentry->enc_key, erp_id);
 	if (IS_ERR(lkey_id))
 		return PTR_ERR(lkey_id);
 	aentry->lkey_id = lkey_id;
@@ -399,7 +398,7 @@ mlxsw_sp_acl_atcam_region_entry_insert(struct mlxsw_sp *mlxsw_sp,
 	kvdl_index = mlxsw_afa_block_first_kvdl_index(rulei->act_block);
 	mlxsw_reg_ptce3_pack(ptce3_pl, true, MLXSW_REG_PTCE3_OP_WRITE_WRITE,
 			     priority, region->tcam_region_info,
-			     aentry->ht_key.enc_key, erp_id,
+			     aentry->enc_key, erp_id,
 			     aentry->delta_info.start,
 			     aentry->delta_info.mask,
 			     aentry->delta_info.value,
@@ -424,12 +423,11 @@ mlxsw_sp_acl_atcam_region_entry_remove(struct mlxsw_sp *mlxsw_sp,
 	struct mlxsw_sp_acl_atcam_lkey_id *lkey_id = aentry->lkey_id;
 	struct mlxsw_sp_acl_tcam_region *region = aregion->region;
 	u8 erp_id = mlxsw_sp_acl_erp_mask_erp_id(aentry->erp_mask);
-	char *enc_key = aentry->ht_key.enc_key;
 	char ptce3_pl[MLXSW_REG_PTCE3_LEN];
 
 	mlxsw_reg_ptce3_pack(ptce3_pl, false, MLXSW_REG_PTCE3_OP_WRITE_WRITE, 0,
 			     region->tcam_region_info,
-			     enc_key, erp_id,
+			     aentry->enc_key, erp_id,
 			     aentry->delta_info.start,
 			     aentry->delta_info.mask,
 			     aentry->delta_info.value,
@@ -458,7 +456,7 @@ mlxsw_sp_acl_atcam_region_entry_action_replace(struct mlxsw_sp *mlxsw_sp,
 	kvdl_index = mlxsw_afa_block_first_kvdl_index(rulei->act_block);
 	mlxsw_reg_ptce3_pack(ptce3_pl, true, MLXSW_REG_PTCE3_OP_WRITE_UPDATE,
 			     priority, region->tcam_region_info,
-			     aentry->ht_key.enc_key, erp_id,
+			     aentry->enc_key, erp_id,
 			     aentry->delta_info.start,
 			     aentry->delta_info.mask,
 			     aentry->delta_info.value,
@@ -481,15 +479,15 @@ __mlxsw_sp_acl_atcam_entry_add(struct mlxsw_sp *mlxsw_sp,
 	int err;
 
 	mlxsw_afk_encode(afk, region->key_info, &rulei->values,
-			 aentry->full_enc_key, mask);
+			 aentry->ht_key.full_enc_key, mask);
 
 	erp_mask = mlxsw_sp_acl_erp_mask_get(aregion, mask, false);
 	if (IS_ERR(erp_mask))
 		return PTR_ERR(erp_mask);
 	aentry->erp_mask = erp_mask;
 	aentry->ht_key.erp_id = mlxsw_sp_acl_erp_mask_erp_id(erp_mask);
-	memcpy(aentry->ht_key.enc_key, aentry->full_enc_key,
-	       sizeof(aentry->ht_key.enc_key));
+	memcpy(aentry->enc_key, aentry->ht_key.full_enc_key,
+	       sizeof(aentry->enc_key));
 
 	/* Compute all needed delta information and clear the delta bits
 	 * from the encrypted key.
@@ -498,8 +496,9 @@ __mlxsw_sp_acl_atcam_entry_add(struct mlxsw_sp *mlxsw_sp,
 	aentry->delta_info.start = mlxsw_sp_acl_erp_delta_start(delta);
 	aentry->delta_info.mask = mlxsw_sp_acl_erp_delta_mask(delta);
 	aentry->delta_info.value =
-		mlxsw_sp_acl_erp_delta_value(delta, aentry->full_enc_key);
-	mlxsw_sp_acl_erp_delta_clear(delta, aentry->ht_key.enc_key);
+		mlxsw_sp_acl_erp_delta_value(delta,
+					     aentry->ht_key.full_enc_key);
+	mlxsw_sp_acl_erp_delta_clear(delta, aentry->enc_key);
 
 	/* Add rule to the list of A-TCAM rules, assuming this
 	 * rule is intended to A-TCAM. In case this rule does
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c
index f5c381dcb015..9545b572747e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c
@@ -133,7 +133,7 @@ mlxsw_sp_acl_bf_key_encode(struct mlxsw_sp_acl_atcam_region *aregion,
 		memcpy(chunk + MLXSW_BLOOM_CHUNK_PAD_BYTES, &erp_region_id,
 		       sizeof(erp_region_id));
 		memcpy(chunk + MLXSW_BLOOM_CHUNK_KEY_OFFSET,
-		       &aentry->ht_key.enc_key[chunk_key_offsets[chunk_index]],
+		       &aentry->enc_key[chunk_key_offsets[chunk_index]],
 		       MLXSW_BLOOM_CHUNK_KEY_BYTES);
 		chunk += MLXSW_BLOOM_KEY_CHUNK_BYTES;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
index 10512b7c6d50..0858d5b06353 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
@@ -161,9 +161,9 @@ struct mlxsw_sp_acl_atcam_region {
 };
 
 struct mlxsw_sp_acl_atcam_entry_ht_key {
-	char enc_key[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN]; /* Encoded key,
-							    * minus delta bits.
-							    */
+	char full_enc_key[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN]; /* Encoded
+								 * key.
+								 */
 	u8 erp_id;
 };
 
@@ -175,7 +175,9 @@ struct mlxsw_sp_acl_atcam_entry {
 	struct rhash_head ht_node;
 	struct list_head list; /* Member in entries_list */
 	struct mlxsw_sp_acl_atcam_entry_ht_key ht_key;
-	char full_enc_key[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN]; /* Encoded key */
+	char enc_key[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN]; /* Encoded key,
+							    * minus delta bits.
+							    */
 	struct {
 		u16 start;
 		u8 mask;
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next 2/5] mlxsw: spectrum_acl: Add C-TCAM spill tracepoint
From: Ido Schimmel @ 2019-01-30  8:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, mlxsw, Ido Schimmel
In-Reply-To: <20190130085813.32161-1-idosch@mellanox.com>

From: Jiri Pirko <jiri@mellanox.com>

Add some visibility to the rule addition process and trace whenever rule
spilled into C-TCAM.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../mellanox/mlxsw/spectrum_acl_atcam.c       |  3 ++
 include/trace/events/mlxsw.h                  | 38 +++++++++++++++++++
 2 files changed, 41 insertions(+)
 create mode 100644 include/trace/events/mlxsw.h

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
index cda0a7170c34..a74a390901ac 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
@@ -7,6 +7,8 @@
 #include <linux/gfp.h>
 #include <linux/refcount.h>
 #include <linux/rhashtable.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/mlxsw.h>
 
 #include "reg.h"
 #include "core.h"
@@ -578,6 +580,7 @@ int mlxsw_sp_acl_atcam_entry_add(struct mlxsw_sp *mlxsw_sp,
 	/* It is possible we failed to add the rule to the A-TCAM due to
 	 * exceeded number of masks. Try to spill into C-TCAM.
 	 */
+	trace_mlxsw_sp_acl_atcam_entry_add_ctcam_spill(mlxsw_sp, aregion);
 	err = mlxsw_sp_acl_ctcam_entry_add(mlxsw_sp, &aregion->cregion,
 					   &achunk->cchunk, &aentry->centry,
 					   rulei, true);
diff --git a/include/trace/events/mlxsw.h b/include/trace/events/mlxsw.h
new file mode 100644
index 000000000000..6c2bafcade18
--- /dev/null
+++ b/include/trace/events/mlxsw.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0 */
+/* Copyright (c) 2019 Mellanox Technologies. All rights reserved */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mlxsw
+
+#if !defined(_MLXSW_TRACEPOINT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _MLXSW_TRACEPOINT_H
+
+#include <linux/tracepoint.h>
+
+struct mlxsw_sp;
+struct mlxsw_sp_acl_atcam_region;
+
+TRACE_EVENT(mlxsw_sp_acl_atcam_entry_add_ctcam_spill,
+	TP_PROTO(const struct mlxsw_sp *mlxsw_sp,
+		 const struct mlxsw_sp_acl_atcam_region *aregion),
+
+	TP_ARGS(mlxsw_sp, aregion),
+
+	TP_STRUCT__entry(
+		__field(const void *, mlxsw_sp)
+		__field(const void *, aregion)
+	),
+
+	TP_fast_assign(
+		__entry->mlxsw_sp = mlxsw_sp;
+		__entry->aregion = aregion;
+	),
+
+	TP_printk("mlxsw_sp %p, aregion %p",
+		  __entry->mlxsw_sp, __entry->aregion)
+);
+
+#endif /* _MLXSW_TRACEPOINT_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next 3/5] selftests: spectrum-2: Extend and move trace helpers
From: Ido Schimmel @ 2019-01-30  8:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, mlxsw, Ido Schimmel
In-Reply-To: <20190130085813.32161-1-idosch@mellanox.com>

From: Jiri Pirko <jiri@mellanox.com>

Allow to specify number of trace hits and move helpers
to the beginning of the file.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 71 +++++++++++++------
 1 file changed, 49 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
index b41d6256b2d0..ed1ae902f2af 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
@@ -38,6 +38,55 @@ h2_destroy()
 	simple_if_fini $h2 192.0.2.2/24 198.51.100.2/24
 }
 
+tp_record()
+{
+	local tracepoint=$1
+	local cmd=$2
+
+	perf record -q -e $tracepoint $cmd
+	return $?
+}
+
+tp_record_all()
+{
+	local tracepoint=$1
+	local seconds=$2
+
+	perf record -a -q -e $tracepoint sleep $seconds
+	return $?
+}
+
+__tp_hit_count()
+{
+	local tracepoint=$1
+
+	local perf_output=`perf script -F trace:event,trace`
+	return `echo $perf_output | grep "$tracepoint:" | wc -l`
+}
+
+tp_check_hits()
+{
+	local tracepoint=$1
+	local count=$2
+
+	__tp_hit_count $tracepoint
+	if [[ "$?" -ne "$count" ]]; then
+		return 1
+	fi
+	return 0
+}
+
+tp_check_hits_any()
+{
+	local tracepoint=$1
+
+	__tp_hit_count $tracepoint
+	if [[ "$?" -eq "0" ]]; then
+		return 1
+	fi
+	return 0
+}
+
 single_mask_test()
 {
 	# When only a single mask is required, the device uses the master
@@ -325,28 +374,6 @@ ctcam_edge_cases_test()
 	ctcam_no_atcam_masks_test
 }
 
-tp_record()
-{
-	local tracepoint=$1
-	local cmd=$2
-
-	perf record -q -e $tracepoint $cmd
-	return $?
-}
-
-tp_check_hits()
-{
-	local tracepoint=$1
-	local count=$2
-
-	perf_output=`perf script -F trace:event,trace`
-	hits=`echo $perf_output | grep "$tracepoint:" | wc -l`
-	if [[ "$count" -ne "$hits" ]]; then
-		return 1
-	fi
-	return 0
-}
-
 delta_simple_test()
 {
 	# The first filter will create eRP, the second filter will fit into
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next 4/5] selftests: spectrum-2: Fix multiple_masks_test
From: Ido Schimmel @ 2019-01-30  8:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, mlxsw, Ido Schimmel
In-Reply-To: <20190130085813.32161-1-idosch@mellanox.com>

From: Jiri Pirko <jiri@mellanox.com>

With recent fix in C-TCAM spillage for delta masks, the test stops to be
falsely positive. So fix it not to use delta by adding src_ip bits to the
masks. Alongside with that, use C-TCAM spill trace to see when the
spillage actually happens.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 26 ++++++++++++++++---
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
index ed1ae902f2af..73a35ca827ac 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
@@ -231,20 +231,38 @@ multiple_masks_test()
 	# spillage is performed correctly and that the right filter is
 	# matched
 
+	if [[ "$tcflags" != "skip_sw" ]]; then
+		return 0;
+	fi
+
 	local index
 
 	RET=0
 
 	NUM_MASKS=32
+	NUM_ERPS=16
 	BASE_INDEX=100
 
 	for i in $(eval echo {1..$NUM_MASKS}); do
 		index=$((BASE_INDEX - i))
 
-		tc filter add dev $h2 ingress protocol ip pref $index \
-			handle $index \
-			flower $tcflags dst_ip 192.0.2.2/${i} src_ip 192.0.2.1 \
-			action drop
+		if ((i > NUM_ERPS)); then
+			exp_hits=1
+			err_msg="$i filters - C-TCAM spill did not happen when it was expected"
+		else
+			exp_hits=0
+			err_msg="$i filters - C-TCAM spill happened when it should not"
+		fi
+
+		tp_record "mlxsw:mlxsw_sp_acl_atcam_entry_add_ctcam_spill" \
+			"tc filter add dev $h2 ingress protocol ip pref $index \
+				handle $index \
+				flower $tcflags \
+				dst_ip 192.0.2.2/${i} src_ip 192.0.2.1/${i} \
+				action drop"
+		tp_check_hits "mlxsw:mlxsw_sp_acl_atcam_entry_add_ctcam_spill" \
+				$exp_hits
+		check_err $? "$err_msg"
 
 		$MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 \
 			-B 192.0.2.2 -t ip -q
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next 5/5] selftests: spectrum-2: Add delta two masks one key test
From: Ido Schimmel @ 2019-01-30  8:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, mlxsw, Ido Schimmel
In-Reply-To: <20190130085813.32161-1-idosch@mellanox.com>

From: Jiri Pirko <jiri@mellanox.com>

Ensure that the bug is fixed and we no longer have C-TCAM spill for two
keys that differ only in delta.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 46 ++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
index 73a35ca827ac..f1922bf597b0 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
@@ -9,7 +9,8 @@ lib_dir=$(dirname $0)/../../../../net/forwarding
 
 ALL_TESTS="single_mask_test identical_filters_test two_masks_test \
 	multiple_masks_test ctcam_edge_cases_test delta_simple_test \
-	bloom_simple_test bloom_complex_test bloom_delta_test"
+	delta_two_masks_one_key_test bloom_simple_test \
+	bloom_complex_test bloom_delta_test"
 NUM_NETIFS=2
 source $lib_dir/tc_common.sh
 source $lib_dir/lib.sh
@@ -450,6 +451,49 @@ delta_simple_test()
 	log_test "delta simple test ($tcflags)"
 }
 
+delta_two_masks_one_key_test()
+{
+	# If 2 keys are the same and only differ in mask in a way that
+	# they belong under the same ERP (second is delta of the first),
+	# there should be no C-TCAM spill.
+
+	RET=0
+
+	if [[ "$tcflags" != "skip_sw" ]]; then
+		return 0;
+	fi
+
+	tp_record "mlxsw:*" "tc filter add dev $h2 ingress protocol ip \
+		   pref 1 handle 101 flower $tcflags dst_ip 192.0.2.0/24 \
+		   action drop"
+	tp_check_hits "mlxsw:mlxsw_sp_acl_atcam_entry_add_ctcam_spill" 0
+	check_err $? "incorrect C-TCAM spill while inserting the first rule"
+
+	tp_record "mlxsw:*" "tc filter add dev $h2 ingress protocol ip \
+		   pref 2 handle 102 flower $tcflags dst_ip 192.0.2.2 \
+		   action drop"
+	tp_check_hits "mlxsw:mlxsw_sp_acl_atcam_entry_add_ctcam_spill" 0
+	check_err $? "incorrect C-TCAM spill while inserting the second rule"
+
+	$MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
+		-t ip -q
+
+	tc_check_packets "dev $h2 ingress" 101 1
+	check_err $? "Did not match on correct filter"
+
+	tc filter del dev $h2 ingress protocol ip pref 1 handle 101 flower
+
+	$MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
+		-t ip -q
+
+	tc_check_packets "dev $h2 ingress" 102 1
+	check_err $? "Did not match on correct filter"
+
+	tc filter del dev $h2 ingress protocol ip pref 2 handle 102 flower
+
+	log_test "delta two masks one key test ($tcflags)"
+}
+
 bloom_simple_test()
 {
 	# Bloom filter requires that the eRP table is used. This test
-- 
2.20.1


^ permalink raw reply related

* Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Peter Zijlstra @ 2019-01-30  8:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, davem, daniel, jakub.kicinski, netdev,
	kernel-team, mingo, will.deacon, Paul McKenney, jannh
In-Reply-To: <20190130023212.zs4d6hws5tsfl5uc@ast-mbp.dhcp.thefacebook.com>

On Tue, Jan 29, 2019 at 06:32:13PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 29, 2019 at 10:16:54AM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> > > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > 
> > > > Ah, but the loop won't be in the BPF program itself. The BPF program
> > > > would only have had the BPF_SPIN_LOCK instruction, the JIT them emits
> > > > code similar to queued_spin_lock()/queued_spin_unlock() (or calls to
> > > > out-of-line versions of them).
> > > 
> > > As I said we considered exactly that and such approach has a lot of downsides
> > > comparing with the helper approach.
> > > Pretty much every time new feature is added we're evaluating whether it
> > > should be new instruction or new helper. 99% of the time we go with new helper.
> > 
> > Ah; it seems I'm confused on helper vs instruction. As in, I've no idea
> > what a helper is.
> 
> bpf helper is a normal kernel function that can be called from bpf program.
> In assembler it's a direct function call.

Ah, it is what is otherwise known as a standard library,

> > > > There isn't anything that mandates the JIT uses the exact same locking
> > > > routines the interpreter does, is there?
> > > 
> > > sure. This bpf_spin_lock() helper can be optimized whichever way the kernel wants.
> > > Like bpf_map_lookup_elem() call is _inlined_ by the verifier for certain map types.
> > > JITs don't even need to do anything. It looks like function call from bpf prog
> > > point of view, but in JITed code it is a sequence of native instructions.
> > > 
> > > Say tomorrow we find out that bpf_prog->bpf_spin_lock()->queued_spin_lock()
> > > takes too much time then we can inline fast path of queued_spin_lock
> > > directly into bpf prog and save function call cost.
> > 
> > OK, so then the JIT can optimize helpers. Would it not make sense to
> > have the simple test-and-set spinlock in the generic code and have the
> > JITs use arch_spinlock_t where appropriate?
> 
> I think that pretty much the same as what I have with qspinlock.
> Instead of taking a risk how JIT writers implement bpf_spin_lock optimization
> I'm using qspinlock on architectures that are known to support it.

I see the argument for it...

> So instead of starting with dumb test-and-set there will be faster
> qspinlock from the start on x86, arm64 and few others archs.
> Those are the archs we care about the most anyway. Other archs can take
> time to optimize it (if optimizations are necessary at all).
> In general hacking JITs is much harder and more error prone than
> changing core and adding helpers. Hence we avoid touching JITs
> as much as possible.

So archs/JITs are not trivially able to override those helper functions?
Because for example ARM (32bit) doesn't do qspinlock but it's
arch_spinlock_t is _much_ better than a TAS lock.

> Like map_lookup inlining optimization we do only when JIT is on.
> And we do it purely in the generic core. See array_map_gen_lookup().
> We generate bpf instructions only to feed them into JITs so they
> can replace them with native asm. That is much easier to implement
> correctly than if we were doing inlining in every JIT independently.

That seems prudent and fairly painful at the same time. I see why you
would want to share as much of it as possible, but being restricted to
BPF instructions seems very limiting.

Anyway, I'll not press the issue; but I do think that arch specific
helpers would make sense here.

^ permalink raw reply

* Re: Stack sends oversize UDP packet to the driver
From: Michael Chan @ 2019-01-30  9:07 UTC (permalink / raw)
  To: Mahesh Bandewar (महेश बंडेवार)
  Cc: Daniel Axtens, Netdev, David Miller, Eric Dumazet,
	Willem de Bruijn
In-Reply-To: <CAF2d9jhL_uzV_Er1gJ1eiUUUbJsPH4OHm5Lsbi0HnvnBdNekFQ@mail.gmail.com>

On Tue, Jan 22, 2019 at 10:29 AM Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:

>
> The idea behind the fix is very simple and it is to create a dst-only
> (unregistered) device with a very low MTU and use it instead of 'lo'
> while invalidating the dst. This would make it *not* forward packets
> to driver which might need fragmentation.
>

We tested the 2 patches many times and including an overnight test.  I
can confirm that the oversize UDP packets are no longer seen with the
patches applied.  However, I don't see the blackhole xmit function
getting called to free the SKBs though.

Thanks.

^ permalink raw reply

* [PATCH] bpf: selftests: handle sparse CPU allocations
From: Martynas Pumputis @ 2019-01-30  9:19 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, m

Previously, bpf_num_possible_cpus() had a bug when calculating a
number of possible CPUs in the case of sparse CPU allocations, as
it was considering only the first range or element of
/sys/devices/system/cpu/possible.

E.g. in the case of "0,2-3" (CPU 1 is not available), the function
returned 1 instead of 3.

This patch fixes the function by making it parse all CPU ranges and
elements.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
---
 tools/testing/selftests/bpf/bpf_util.h | 29 +++++++++++++++++---------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_util.h b/tools/testing/selftests/bpf/bpf_util.h
index 315a44fa32af..8cab50408204 100644
--- a/tools/testing/selftests/bpf/bpf_util.h
+++ b/tools/testing/selftests/bpf/bpf_util.h
@@ -13,7 +13,7 @@ static inline unsigned int bpf_num_possible_cpus(void)
 	unsigned int start, end, possible_cpus = 0;
 	char buff[128];
 	FILE *fp;
-	int n;
+	int n, i, j = 0;
 
 	fp = fopen(fcpu, "r");
 	if (!fp) {
@@ -21,17 +21,26 @@ static inline unsigned int bpf_num_possible_cpus(void)
 		exit(1);
 	}
 
-	while (fgets(buff, sizeof(buff), fp)) {
-		n = sscanf(buff, "%u-%u", &start, &end);
-		if (n == 0) {
-			printf("Failed to retrieve # possible CPUs!\n");
-			exit(1);
-		} else if (n == 1) {
-			end = start;
+	if (!fgets(buff, sizeof(buff), fp)) {
+		printf("Failed to read %s!\n", fcpu);
+		exit(1);
+	}
+
+	for (i = 0; i <= strlen(buff); i++) {
+		if (buff[i] == ',' || buff[i] == '\0') {
+			buff[i] = '\0';
+			n = sscanf(&buff[j], "%u-%u", &start, &end);
+			if (n <= 0) {
+				printf("Failed to retrieve # possible CPUs!\n");
+				exit(1);
+			} else if (n == 1) {
+				end = start;
+			}
+			possible_cpus += end - start + 1;
+			j = i + 1;
 		}
-		possible_cpus = start == 0 ? end + 1 : 0;
-		break;
 	}
+
 	fclose(fp);
 
 	return possible_cpus;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH RFC RFT net-next 03/10] net: dsa: mv88e6060: Replace REG_WRITE macro
From: Pavel Machek @ 2019-01-30  9:24 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Vivien Didelot, Florian Fainelli
In-Reply-To: <20190130003758.23852-4-andrew@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 5780 bytes --]

On Wed 2019-01-30 01:37:51, Andrew Lunn wrote:
> The REG_WRITE macro contains a return statement, making it not very
> safe. Remove it by inlining the code.

Not bad, but maybe there should be dev_err() or something in case of
reg_write() returns an error?

Because no errors are expected in this case... AFAICT.
								Pavel

> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  drivers/net/dsa/mv88e6060.c | 73 +++++++++++++++++++++----------------
>  1 file changed, 41 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
> index 631358bf3d6b..da88c56e092c 100644
> --- a/drivers/net/dsa/mv88e6060.c
> +++ b/drivers/net/dsa/mv88e6060.c
> @@ -39,15 +39,6 @@ static int reg_write(struct mv88e6060_priv *priv, int addr, int reg, u16 val)
>  	return mdiobus_write_nested(priv->bus, priv->sw_addr + addr, reg, val);
>  }
>  
> -#define REG_WRITE(addr, reg, val)				\
> -	({							\
> -		int __ret;					\
> -								\
> -		__ret = reg_write(priv, addr, reg, val);		\
> -		if (__ret < 0)					\
> -			return __ret;				\
> -	})
> -
>  static const char *mv88e6060_get_name(struct mii_bus *bus, int sw_addr)
>  {
>  	int ret;
> @@ -102,17 +93,21 @@ static int mv88e6060_switch_reset(struct mv88e6060_priv *priv)
>  	/* Set all ports to the disabled state. */
>  	for (i = 0; i < MV88E6060_PORTS; i++) {
>  		ret = REG_READ(REG_PORT(i), PORT_CONTROL);
> -		REG_WRITE(REG_PORT(i), PORT_CONTROL,
> -			  ret & ~PORT_CONTROL_STATE_MASK);
> +		ret = reg_write(priv, REG_PORT(i), PORT_CONTROL,
> +				ret & ~PORT_CONTROL_STATE_MASK);
> +		if (ret)
> +			return ret;
>  	}
>  
>  	/* Wait for transmit queues to drain. */
>  	usleep_range(2000, 4000);
>  
>  	/* Reset the switch. */
> -	REG_WRITE(REG_GLOBAL, GLOBAL_ATU_CONTROL,
> -		  GLOBAL_ATU_CONTROL_SWRESET |
> -		  GLOBAL_ATU_CONTROL_LEARNDIS);
> +	ret = reg_write(priv, REG_GLOBAL, GLOBAL_ATU_CONTROL,
> +			GLOBAL_ATU_CONTROL_SWRESET |
> +			GLOBAL_ATU_CONTROL_LEARNDIS);
> +	if (ret)
> +		return ret;
>  
>  	/* Wait up to one second for reset to complete. */
>  	timeout = jiffies + 1 * HZ;
> @@ -131,59 +126,67 @@ static int mv88e6060_switch_reset(struct mv88e6060_priv *priv)
>  
>  static int mv88e6060_setup_global(struct mv88e6060_priv *priv)
>  {
> +	int ret;
> +
>  	/* Disable discarding of frames with excessive collisions,
>  	 * set the maximum frame size to 1536 bytes, and mask all
>  	 * interrupt sources.
>  	 */
> -	REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL, GLOBAL_CONTROL_MAX_FRAME_1536);
> +	ret = reg_write(priv, REG_GLOBAL, GLOBAL_CONTROL,
> +			GLOBAL_CONTROL_MAX_FRAME_1536);
> +	if (ret)
> +		return ret;
>  
>  	/* Disable automatic address learning.
>  	 */
> -	REG_WRITE(REG_GLOBAL, GLOBAL_ATU_CONTROL,
> -		  GLOBAL_ATU_CONTROL_LEARNDIS);
> -
> -	return 0;
> +	return reg_write(priv, REG_GLOBAL, GLOBAL_ATU_CONTROL,
> +			 GLOBAL_ATU_CONTROL_LEARNDIS);
>  }
>  
>  static int mv88e6060_setup_port(struct mv88e6060_priv *priv, int p)
>  {
>  	int addr = REG_PORT(p);
> +	int ret;
>  
>  	/* Do not force flow control, disable Ingress and Egress
>  	 * Header tagging, disable VLAN tunneling, and set the port
>  	 * state to Forwarding.  Additionally, if this is the CPU
>  	 * port, enable Ingress and Egress Trailer tagging mode.
>  	 */
> -	REG_WRITE(addr, PORT_CONTROL,
> -		  dsa_is_cpu_port(priv->ds, p) ?
> +	ret = reg_write(priv, addr, PORT_CONTROL,
> +			dsa_is_cpu_port(priv->ds, p) ?
>  			PORT_CONTROL_TRAILER |
>  			PORT_CONTROL_INGRESS_MODE |
>  			PORT_CONTROL_STATE_FORWARDING :
>  			PORT_CONTROL_STATE_FORWARDING);
> +	if (ret)
> +		return ret;
>  
>  	/* Port based VLAN map: give each port its own address
>  	 * database, allow the CPU port to talk to each of the 'real'
>  	 * ports, and allow each of the 'real' ports to only talk to
>  	 * the CPU port.
>  	 */
> -	REG_WRITE(addr, PORT_VLAN_MAP,
> -		  ((p & 0xf) << PORT_VLAN_MAP_DBNUM_SHIFT) |
> -		   (dsa_is_cpu_port(priv->ds, p) ? dsa_user_ports(priv->ds) :
> -		    BIT(dsa_to_port(priv->ds, p)->cpu_dp->index)));
> +	ret = reg_write(priv, addr, PORT_VLAN_MAP,
> +			((p & 0xf) << PORT_VLAN_MAP_DBNUM_SHIFT) |
> +			(dsa_is_cpu_port(priv->ds, p) ?
> +			 dsa_user_ports(priv->ds) :
> +			 BIT(dsa_to_port(priv->ds, p)->cpu_dp->index)));
> +	if (ret)
> +		return ret;
>  
>  	/* Port Association Vector: when learning source addresses
>  	 * of packets, add the address to the address database using
>  	 * a port bitmap that has only the bit for this port set and
>  	 * the other bits clear.
>  	 */
> -	REG_WRITE(addr, PORT_ASSOC_VECTOR, BIT(p));
> -
> -	return 0;
> +	return reg_write(priv, addr, PORT_ASSOC_VECTOR, BIT(p));
>  }
>  
>  static int mv88e6060_setup_addr(struct mv88e6060_priv *priv)
>  {
>  	u8 addr[ETH_ALEN];
> +	int ret;
>  	u16 val;
>  
>  	eth_random_addr(addr);
> @@ -195,11 +198,17 @@ static int mv88e6060_setup_addr(struct mv88e6060_priv *priv)
>  	 */
>  	val &= 0xfeff;
>  
> -	REG_WRITE(REG_GLOBAL, GLOBAL_MAC_01, val);
> -	REG_WRITE(REG_GLOBAL, GLOBAL_MAC_23, (addr[2] << 8) | addr[3]);
> -	REG_WRITE(REG_GLOBAL, GLOBAL_MAC_45, (addr[4] << 8) | addr[5]);
> +	ret = reg_write(priv, REG_GLOBAL, GLOBAL_MAC_01, val);
> +	if (ret)
> +		return ret;
> +
> +	ret = reg_write(priv, REG_GLOBAL, GLOBAL_MAC_23,
> +			(addr[2] << 8) | addr[3]);
> +	if (ret)
> +		return ret;
>  
> -	return 0;
> +	return reg_write(priv, REG_GLOBAL, GLOBAL_MAC_45,
> +			 (addr[4] << 8) | addr[5]);
>  }
>  
>  static int mv88e6060_setup(struct dsa_switch *ds)

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: [PATCH RFC RFT net-next 00/10] Modernize mv88e6060 and remove legacy probe
From: Pavel Machek @ 2019-01-30  9:27 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Vivien Didelot, Florian Fainelli
In-Reply-To: <20190130003758.23852-1-andrew@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 724 bytes --]

On Wed 2019-01-30 01:37:48, Andrew Lunn wrote:
> The mv88e6060 is the last device using the legacy method of probing an
> DSA Ethernet switch. This patchset applies some cleanups to the
> driver, and then adds support for probing the device as an MDIO bus
> device. The legacy probe is then removed from the driver, and then
> from DSA as a whole.
> 
> This is compile tested only. Comment and testing welcome.

Series looks good to me. But I have out of tree board 6065, not 6060,
so testing is not too easy.

								Pavel

Acked-by: Pavel Machek <pavel@ucw.cz>

		
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: [PATCH net] ixgbe: fix potential RX buffer starvation for AF_XDP
From: Magnus Karlsson @ 2019-01-30  9:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Magnus Karlsson, Björn Töpel, intel-wired-lan,
	Network Development
In-Reply-To: <20190129111411.3a4c3599@cakuba.hsd1.ca.comcast.net>

On Tue, Jan 29, 2019 at 8:15 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Tue, 29 Jan 2019 15:03:50 +0100, Magnus Karlsson wrote:
> > When the RX rings are created they are also populated with buffers so
> > that packets can be received. Usually these are kernel buffers, but
> > for AF_XDP in zero-copy mode, these are user-space buffers and in this
> > case the application might not have sent down any buffers to the
> > driver at this point. And if no buffers are allocated at ring creation
> > time, no packets can be received and no interupts will be generated so
> > the napi poll function that allocates buffers to the rings will never
> > get executed.
> >
> > To recitfy this, we kick the NAPI context of any queue with an
> > attached AF_XDP zero-copy socket in two places in the code. Once after
> > an XDP program has loaded and once after the umem is registered.  This
> > take care of both cases: XDP program gets loaded first then AF_XDP
> > socket is created, and the reverse, AF_XDP socket is created first,
> > then XDP program is loaded.
> >
> > Fixes: d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support")
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
>
> I may not understand the problem fully, but isn't it kind of normal
> that if you create a ring empty you'll never receive packets?  And it
> should be reasonably easy to catch while writing an app from scratch
> (i.e. it behaves deterministically).

Agree that this should be the normal behavior for a NIC. The question
is how to get out of this situation.There are two options: punt this
to the application writer or fix this in the driver. I chose to fix
the driver since this removes complexity in the application.

> Plus user space can already do the kick manually: create the ring empty,
> attach prog, put packets on the ring, and kick xmit - that would work,
> no?

Unfortunately there is no way such a kick could be guaranteed to work
since it only guarantees to wake up Tx. If the Rx is in another NAPI
context, then this will not work. On i40e and ixgbe, the Rx and Tx
processing is in the same NAPI so this would indeed work. But can I
guarantee that this is true for all current and future NICs?
Personally I would like to have Tx in a separate NAPI as I could get
better performance that way, at least in the i40e driver. You also do
not need to have any Tx at all in your socket, as it could be Rx only.
These arguments are why I went with the approach to fix this in the
driver, instead of a hard-to-explain extra step in the application
that might or might not work depending on your driver.

I believe the crux of the problem is that we need to tell the driver
to take buffers from the fill ring then put references to them in the
HW Rx ring so that packets can be received from the wire. In this
patch I went for installing an XDP program and calling bind() as "the"
two places. But there could be a scenario where I do bind then install
the XDP program and after that populate my fill queue and not call any
other syscall ever! This would not be covered by this patch :-(.

So you might be correct that the only way to fix this is from the
application. But how to do it in a simple and elegant way? I do not
think sendto() is the right answer, but what about poll()? Can we wire
that up to kick Rx through the busy poll mechanism? I would like the
solution to be simple and not complicate things for the user, if
possible. Another idea would be to use a timer in the driver to
periodically check the fill queue in the case it is empty. This would
be easy for application writers but add complexity for people
implementing AF_XDP ZC support in their drivers. Any ideas? What do
you prefer?

Thanks: Magnus

> Putting the kick in ixgbe_xdp_setup() seems a tiny bit random to me,
> but perhaps I'm not seeing the rationale clearly.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] net: dsa: mv88e6xxx: Save switch rules
From: Miquel Raynal @ 2019-01-30  9:46 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller, netdev,
	linux-kernel, Thomas Petazzoni, Gregory Clement, Antoine Tenart,
	Maxime Chevallier, Nadav Haklai
In-Reply-To: <20190129104613.GC20920@t480s.localdomain>

Hi Vivien & Andrew,

Vivien Didelot <vivien.didelot@gmail.com> wrote on Tue, 29 Jan 2019
10:46:13 -0500:

> Hi Miquèl,
> 
> On Tue, 29 Jan 2019 15:51:57 +0100, Andrew Lunn <andrew@lunn.ch> wrote:
> 
> > > Today, there is no S2RAM support for switches. First, I proposed to add
> > > suspend/resume callbacks to the mv88e6xxx driver - just enough to avoid
> > > crashing the kernel.  
> > 
> > Then i would suggest the mv88e6xxx refuses the suspend. Actually that
> > probably is the first correct step. We don't have suspend support, so
> > stop the suspend happening, so preventing the kernel crash.  
>  
> I am not confortable with adding support for S2RAM in mv88e6xxx yet because
> as it was explained, we are aware of much complicated scenarios out there
> to pretend that DSA /partly/ supports suspend-resume. The prefered approach
> for the moment is to keep things simple and not supporting this feature yet,
> especially at the mv88e6xxx driver level.
> 
> However crashing is unacceptable so I'm backing Andrew's point here, please
> submit a fix to prevent the suspend (and crash) for the moment.
> 
> Sorry if you felt that your work is being delayed, it is much appreciated!

Thanks for the more detailed explanation, I got your point and I better
understand your reluctance.

So your proposal is to refuse suspending when using a mv88e6xxx switch.
What about the current situation where suspending is allowed, but all
the configuration gone? As long as all the ports are disabled during
suspend, it should not hurt anything, right? Plus, this is what the
bcm_sf2 and qca8k drivers are doing. I can even add an error message in
the resume path to warn about this drawback.


Thanks,
Miquèl

^ permalink raw reply

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: Peter Ceiley @ 2019-01-30  9:59 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev
In-Reply-To: <172787aa-9ef5-091d-f70f-baf89fe0b1ee@gmail.com>

Hi Heiner,

I tried disabling the ASPM using the pcie_aspm=off kernel parameter
and this made no difference.

I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
subsequently loaded the module in the running 4.19.18 kernel. I can
confirm that this immediately resolved the issue and access to the NFS
shares operated as expected.

I presume this means it is an issue with the r8169 driver included in
4.19 onwards?

To answer your last questions:

Base Board Information
    Manufacturer: Alienware
    Product Name: 0PGRP5
    Version: A02

... and yes, the RTL8168 is the onboard network chip.

Regards,

Peter.

On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> Hi Peter,
>
> I think the vendor driver doesn't enable ASPM per default.
> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> Few older systems seem to have issues with ASPM, what kind of
> system / mainboard are you using? The RTL8168 is the onboard
> network chip?
>
> Rgds, Heiner
>
>
> On 29.01.2019 07:20, Peter Ceiley wrote:
> > Hi Heiner,
> >
> > Thanks, I'll do some more testing. It might not be the driver - I
> > assumed it was due to the fact that using the r8168 driver 'resolves'
> > the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> > a good idea.
> >
> > Cheers,
> >
> > Peter.
> >
> > On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> at a first glance it doesn't look like a typical driver issue.
> >> What you could do:
> >>
> >> - Test the r8169.c from 4.18 on top of 4.19.
> >>
> >> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>
> >> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>
> >> Any specific reason why you think root cause is in the driver and not
> >> elsewhere in the network subsystem?
> >>
> >> Heiner
> >>
> >>
> >> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>> Hi Heiner,
> >>>
> >>> Thanks for getting back to me.
> >>>
> >>> No, I don't use jumbo packets.
> >>>
> >>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>> establishing a connection and is most notable, for example, on my
> >>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>> larger directories) to list the contents of each directory. Once a
> >>> transfer begins on a file, I appear to get good bandwidth.
> >>>
> >>> I'm unsure of the best scientific data to provide you in order to
> >>> troubleshoot this issue. Running the following
> >>>
> >>>     netstat -s |grep retransmitted
> >>>
> >>> shows a steady increase in retransmitted segments each time I list the
> >>> contents of a remote directory, for example, running 'ls' on a
> >>> directory containing 345 media files did the following using kernel
> >>> 4.19.18:
> >>>
> >>> increased retransmitted segments by 21 and the 'time' command showed
> >>> the following:
> >>>     real    0m19.867s
> >>>     user    0m0.012s
> >>>     sys    0m0.036s
> >>>
> >>> The same command shows no retransmitted segments running kernel
> >>> 4.18.16 and 'time' showed:
> >>>     real    0m0.300s
> >>>     user    0m0.004s
> >>>     sys    0m0.007s
> >>>
> >>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>
> >>> dmesg XID:
> >>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>
> >>> # lspci -vv
> >>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>     Latency: 0, Cache Line Size: 64 bytes
> >>>     Interrupt: pin A routed to IRQ 19
> >>>     Region 0: I/O ports at d000 [size=256]
> >>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>     Capabilities: [40] Power Management version 3
> >>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>         Address: 0000000000000000  Data: 0000
> >>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>> <512ns, L1 <64us
> >>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>> SlotPowerLimit 10.000W
> >>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>> Latency L0s unlimited, L1 <64us
> >>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> >>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>> OBFF Via message/WAKE#
> >>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>> OBFF Disabled
> >>>              AtomicOpsCtl: ReqEn-
> >>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>              Transmit Margin: Normal Operating Range,
> >>> EnterModifiedCompliance- ComplianceSOS-
> >>>              Compliance De-emphasis: -6dB
> >>>         LnkSta2: Current De-emphasis Level: -6dB,
> >>> EqualizationComplete-, EqualizationPhase1-
> >>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>         Vector table: BAR=4 offset=00000000
> >>>         PBA: BAR=4 offset=00000800
> >>>     Capabilities: [d0] Vital Product Data
> >>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>         Not readable
> >>>     Capabilities: [100 v1] Advanced Error Reporting
> >>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>> ECRCChkCap+ ECRCChkEn-
> >>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>         HeaderLog: 00000000 00000000 00000000 00000000
> >>>     Capabilities: [140 v1] Virtual Channel
> >>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> >>>         Ctrl:    ArbSelect=Fixed
> >>>         Status:    InProgress-
> >>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>             Status:    NegoPending- InProgress-
> >>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>     Capabilities: [170 v1] Latency Tolerance Reporting
> >>>         Max snoop latency: 71680ns
> >>>         Max no snoop latency: 71680ns
> >>>     Kernel driver in use: r8169
> >>>     Kernel modules: r8169
> >>>
> >>> Please let me know if you have any other ideas in terms of testing.
> >>>
> >>> Thanks!
> >>>
> >>> Peter.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I have been experiencing very poor network performance since Kernel
> >>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>
> >>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>> 4.20.4 & 4.19.18).
> >>>>>
> >>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>> differ in that I still have a network connection. I have attempted to
> >>>>> reload the driver on a running system, but this does not improve the
> >>>>> situation.
> >>>>>
> >>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>
> >>>>> lshw shows:
> >>>>>        description: Ethernet interface
> >>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>        vendor: Realtek Semiconductor Co., Ltd.
> >>>>>        physical id: 0
> >>>>>        bus info: pci@0000:03:00.0
> >>>>>        logical name: enp3s0
> >>>>>        version: 0c
> >>>>>        serial:
> >>>>>        size: 1Gbit/s
> >>>>>        capacity: 1Gbit/s
> >>>>>        width: 64 bits
> >>>>>        clock: 33MHz
> >>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>> 1000bt-fd autonegotiation
> >>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>        resources: irq:19 ioport:d000(size=256)
> >>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>
> >>>>> Kind Regards,
> >>>>>
> >>>>> Peter.
> >>>>>
> >>>> Hi Peter,
> >>>>
> >>>> the description "poor network performance" is quite vague, therefore:
> >>>>
> >>>> - Can you provide any measurements?
> >>>> - iperf results before and after
> >>>> - statistics about dropped packets (rx and/or tx)
> >>>> - Do you use jumbo packets?
> >>>>
> >>>> Also help would be a "lspci -vv" output for the network card and
> >>>> the dmesg output line with the chip XID.
> >>>>
> >>>> Heiner
> >>>
> >>
> >
>

^ permalink raw reply

* Re: [PATCH 2/7] sh_eth: RX checksum offload support
From: Simon Horman @ 2019-01-30 10:06 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, David S. Miller, linux-renesas-soc, linux-sh
In-Reply-To: <319141ef-caa5-2140-2920-c471dac086ea@cogentembedded.com>

On Tue, Jan 29, 2019 at 06:43:45PM +0300, Sergei Shtylyov wrote:
> On 01/29/2019 10:58 AM, Simon Horman wrote:
> 
> >>>> Add support for the RX checksum offload. This is enabled by default and
> >>>> may be disabled and re-enabled using 'ethtool':
> >>>>
> >>>> # ethtool -K eth0 rx {on|off}
> >>>>
> >>>> Some Ether MACs provide a simple checksumming scheme which appears to be
> >>>> completely compatible with CHECKSUM_COMPLETE: sum of all packet data after
> >>>> the L2 header is appended to packet data; this may be trivially read by
> >>>> the driver and used to update the skb accordingly. The same checksumming
> >>>> scheme is implemented in the EtherAVB MACs and now supported by tha 'ravb'
> >>>> driver.
> >>>>
> >>>> In terms of performance, throughput is close to gigabit line rate with the
> >>>> RX checksum offload both enabled and disabled.  The 'perf' output, however,
> >>>> appears to indicate that significantly less time is spent in do_csum() --
> >>>> this is as expected.
> >>>
> >>> Nice.
> >>>
> >>> FYI, this seems similar to what I observed for RAVB, perhaps on H3 I don't
> >>> exactly recall. On E3, which has less CPU power, I recently observed that
> >>> with rx-csum enabled I can achieve gigabit line rate, but with rx-csum
> >>> disabled throughput is significantly lower. I.e. on that system throughput
> >>> is CPU bound with 1500 byte packets unless rx-csum enabled.
> >>
> >>    Unfortunately, we can't teset these patches on the other gen3 boards. ISTR
> >> you have RZ/A1H board... if it's still with you, I'd appreciate testing.
> > 
> > Unfortunately, as of a few weeks ago, I no longer have that board.
> > 
> >>> Next point:
> >>>
> >>> 2da64300fbc ("ravb: expand rx descriptor data to accommodate hw checksum")
> >>> is fresh in my mind and I wonder if mdp->rx_buf_sz needs to grow to ensure
> >>> that there is always enough space for the csum.
> >>
> >>    Well, if you look at sh_eth_ring_init(), you'll see that the driver reserves
> >> plenty of space at the end the RX buffers.
> > 
> > Yes, I see that. And I assume that was enough space before this patch.
> > But is it still enough space now that 2 bytes are needed for the hardware csum?
> 
>   To quote the source:
> 
> 	/* +26 gets the maximum ethernet encapsulation, +7 & ~7 because the
> 	 * card needs room to do 8 byte alignment, +2 so we can reserve
> 	 * the first 2 bytes, and +16 gets room for the status word from the
> 	 * card.
> 	 */
> 	mdp->rx_buf_sz = (ndev->mtu <= 1492 ? PKT_BUF_SZ :
> 			  (((ndev->mtu + 26 + 7) & ~7) + 2 + 16));
> 
>    I have no idea what they mean by status word and why it takes 16 bytes (and I even
> have the R8A771x manual!) but I think these 16 bytes are where our checksum goes...
> that's why I said there's plenty of space. :-)

Ok. FWIIW, I don't know either.

> > 2 bytes that might have previously been used as packet data in some
> > circumstances.
> > 
> >>> In particular, have you
> >>> tested this with MTU-size frames with VLANs. (My test is to run iperf3 over
> >>> a VLAN netdev, netperf over a VLAN netdev would likely work just as well.)
> >>
> >>    Could you refresh me on how to bring up a VLAN on a given interface?
> > 
> > You will need a kernel with CONFIG_VLAN_8021Q enabled.
> > 
> > Then you can do something like this:
> > 
> > 	ip link add link eth0 name eth0.1 type vlan id 1
> > 	ip addr add 10.1.1.100/24 dev eth0.1
> > 	ip link set dev eth0.1 up
> 
>   Thank you! I'm not familiar with 'ip' at all, thought 'ifconfig' could do the same
> thing easier but couldn't remember all the needed incantations... :-)
>    Anyway, it worked!
> 
> >> [...]
> >>>> The above results collected on the R-Car V3H Starter Kit board.
> >>>>
> >>>> Based on the commit 4d86d3818627 ("ravb: RX checksum offload")...
> >>>>
> >>>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> >> [...]
> 
> MBR, Sergei
> 

^ permalink raw reply

* Re: [PATCH 3/7] sh_eth: offload RX checksum on R7S72100
From: Simon Horman @ 2019-01-30 10:08 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, David S. Miller, linux-renesas-soc, linux-sh
In-Reply-To: <e5b05616-d9b3-3c0a-189b-4fc02190c8a5@cogentembedded.com>

On Tue, Jan 29, 2019 at 01:37:38PM +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 01/29/2019 11:00 AM, Simon Horman wrote:
> 
> >>>> The RZ/A1H (R7S721000) SoC manual describes the Ether MAC's RX checksum
> >>>> offload the same way as it's implemented in the EtherAVB MACs...
> >>>>
> >>>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> >>>
> >>> Regarding this and the remaining patches in this series,
> >>> which add rx-csum offload support in the sh_eth driver for
> >>> various SoCs: has this been tested?
> >>
> >>    As I said, I've only tested it on R8A77980.
> 
>    And still hoping Geert would be able to test on R8A7740.
> 
> > 
> > Thanks, I missed that.
> > 
> > As you may have guessed the implication of my question is that
> > IMHO it would be best only to add this feature to SoCs where
> > it has been tested.
> 
>    You don't trust the manuals? :-)

As a rule I do not.
But sometimes I have to anyway.

^ permalink raw reply

* Re: [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
From: Peter Zijlstra @ 2019-01-30 10:10 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130040458.2544340-4-ast@kernel.org>

On Tue, Jan 29, 2019 at 08:04:57PM -0800, Alexei Starovoitov wrote:
> Lockdep warns about false positive:

The report reads like:

	tracepoint_probe_register()
#0	  mutex_lock(&tracepoint_mutex)
	  tracepoint_add_func()
	    static_key_slow_inc()
#1	      cpus_read_lock();


	_cpu_up()
#1	  cpus_write_lock();
	  ...
	  perf_event_init_cpu()
#2	    mutex_lock(&pmus_lock);
#3	    mutex_lock(&ctx->mutex);


	perf_ioctl()
#4	  perf_event_ctx_lock();
	  _perf_ioctl(IOC_QUERY_BPF)
	    perf_event_query_prog_array()
#5	      mutex_lock(&bpf_event_mutex);


	bpf_probe_register()
#5	  mutex_lock(&bpf_event_mutex);
	  __bpf_probe_register()
	    tracepoint_probe_register()
#0	      mutex_lock(&tracepoint_mutex);

Which to me reads like an entirely valid deadlock scenario.

And note that the first and last can be combined to give:

	bpf_probe_register()
#5	  mutex_lock(&bpf_event_mutex);
	  __bpf_probe_register()
	    tracepoint_probe_register()
#0	      mutex_lock(&tracepoint_mutex);
	      tracepoint_add_func()
	        static_key_slow_inc()
#1		  cpus_read_lock();


Which generates a deadlock even without #0.

Why do you say this is not possible? All you need is 3 CPUs, one doing a
CPU online, one doing a perf ioctl() and one doing that
bpf_probe_register().



^ permalink raw reply

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
From: Peter Zijlstra @ 2019-01-30 10:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, edumazet, jannh, netdev, kernel-team, Waiman Long
In-Reply-To: <20190130040458.2544340-3-ast@kernel.org>

On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
> Lockdep warns about false positive:

This is not a false positive, and you probably also need to use
down_read_non_owner() to match this up_read_non_owner().

{up,down}_read() and {up,down}_read_non_owner() are not only different
in the lockdep annotation; there is also optimistic spin stuff that
relies on 'owner' tracking.

> [   11.211460] ------------[ cut here ]------------
> [   11.211936] DEBUG_LOCKS_WARN_ON(depth <= 0)
> [   11.211985] WARNING: CPU: 0 PID: 141 at ../kernel/locking/lockdep.c:3592 lock_release+0x1ad/0x280
> [   11.213134] Modules linked in:
> [   11.213413] CPU: 0 PID: 141 Comm: systemd-journal Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #476
> [   11.214191] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
> [   11.214954] RIP: 0010:lock_release+0x1ad/0x280
> [   11.217036] RSP: 0018:ffff88813ba03f50 EFLAGS: 00010086
> [   11.217516] RAX: 000000000000001f RBX: ffff8881378d8000 RCX: 0000000000000000
> [   11.218179] RDX: ffffffff810d3e9e RSI: 0000000000000001 RDI: ffffffff810d3eb3
> [   11.218851] RBP: ffff8881393e2b08 R08: 0000000000000002 R09: 0000000000000000
> [   11.219504] R10: 0000000000000000 R11: ffff88813ba03d9d R12: ffffffff8118dfa2
> [   11.220162] R13: 0000000000000086 R14: 0000000000000000 R15: 0000000000000000
> [   11.220717] FS:  00007f3c8cf35780(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000
> [   11.221348] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   11.221822] CR2: 00007f5825d92080 CR3: 00000001378c8005 CR4: 00000000003606f0
> [   11.222381] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   11.222951] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   11.223508] Call Trace:
> [   11.223705]  <IRQ>
> [   11.223874]  ? __local_bh_enable+0x7a/0x80
> [   11.224199]  up_read+0x1c/0xa0
> [   11.224446]  do_up_read+0x12/0x20
> [   11.224713]  irq_work_run_list+0x43/0x70
> [   11.225030]  irq_work_run+0x26/0x50
> [   11.225310]  smp_irq_work_interrupt+0x57/0x1f0
> [   11.225662]  irq_work_interrupt+0xf/0x20
> 
> since rw_semaphore is released in a different task vs task that locked the sema.
> It is expected behavior.
> Silence the warning by using up_read_non_owner().
> 
> Fixes: bae77c5eb5b2 ("bpf: enable stackmap with build_id in nmi context")
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  kernel/bpf/stackmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index d43b14535827..4b79e7c251e5 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -44,7 +44,7 @@ static void do_up_read(struct irq_work *entry)
>  	struct stack_map_irq_work *work;
>  
>  	work = container_of(entry, struct stack_map_irq_work, irq_work);
> -	up_read(work->sem);
> +	up_read_non_owner(work->sem);
>  	work->sem = NULL;
>  }
>  
> -- 
> 2.20.0
> 

^ permalink raw reply

* [PATCH -next] mISDN: hfcsusb: Fix potential NULL pointer dereference
From: YueHaibing @ 2019-01-30 10:19 UTC (permalink / raw)
  To: isdn, davem, gustavo, bigeasy; +Cc: linux-kernel, netdev, YueHaibing

There is a potential NULL pointer dereference in case
kzalloc() fails and returns NULL.

Fixes: 69f52adb2d53 ("mISDN: Add HFC USB driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/isdn/hardware/mISDN/hfcsusb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c
index 124ff53..5660d5a 100644
--- a/drivers/isdn/hardware/mISDN/hfcsusb.c
+++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
@@ -263,6 +263,8 @@ hfcsusb_ph_info(struct hfcsusb *hw)
 	int i;
 
 	phi = kzalloc(struct_size(phi, bch, dch->dev.nrbchan), GFP_ATOMIC);
+	if (!phi)
+		return;
 	phi->dch.ch.protocol = hw->protocol;
 	phi->dch.ch.Flags = dch->Flags;
 	phi->dch.state = dch->state;
-- 
2.7.0



^ permalink raw reply related

* Re: [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist
From: Peter Zijlstra @ 2019-01-30 10:21 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130040458.2544340-2-ast@kernel.org>

On Tue, Jan 29, 2019 at 08:04:55PM -0800, Alexei Starovoitov wrote:
> 
> It has been explained that is a false positive here:
> https://lkml.org/lkml/2018/7/25/756

Please, no external references like that. The best option is to fully
include the explanation here so that a future software archeology
student (note, this might be yourself in a years time) can find all
relevant information in the git history.

Or, if you _really_ _really_ have to refer to external sources, use:

  https://lkml.kernel.org/r/${msg_id}

which is a controlled URL and can be made to point to any archive
(currently, and sadly, lore.kernel.org).

Also, since it includes the whole msg_id, people can trivially find the
email in their local archive.

While I agree that lkml.org is a _MUCH_ saner interface than lore (which
causes eye and brain cancer for just looking at it), it is a _really_
flaky website and the url contains no clues for finding it in the local
archive.

> Recap:
> - stackmap uses pcpu_freelist
> - The lock in pcpu_freelist is a percpu lock
> - stackmap is only used by tracing bpf_prog
> - A tracing bpf_prog cannot be run if another bpf_prog
>   has already been running (ensured by the percpu bpf_prog_active counter).

^ permalink raw reply

* Re: [PATCH] ath10k: snoc: remove set but not used variable 'ar_snoc'
From: Kalle Valo @ 2019-01-30 10:32 UTC (permalink / raw)
  To: Brian Norris; +Cc: YueHaibing, ath10k, netdev, linux-wireless, kernel-janitors
In-Reply-To: <CA+ASDXMQto_dqPOuUEX9vqqbWmor8qvpwxBVTg6tSnHkfsCzpw@mail.gmail.com>

Brian Norris <briannorris@chromium.org> writes:

> On Mon, Jan 28, 2019 at 9:53 PM YueHaibing <yuehaibing@huawei.com> wrote:
>>
>> ping...
>
> For some reason, your patch shows up as Deferred in patchwork:
>
> https://patchwork.kernel.org/patch/10589789/
>
> So the maintainers have accidentally (?) ignored it.

Actually I put it deliberately to deferred as I wanted to apply Govind's
QMI patches first and only then this. But unfortunately the deferred
patches have been piled up by other patches so I haven't taken this
patch yet.

> I'm not what the official suggestion is for that, but you might just
> resend.

No need to resend, I can just change the state from Deferred to New. But
I see that the patch was already resent so I'll drop this version now.

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH] tty: Fix WARNING in tty_set_termios
From: Johan Hovold @ 2019-01-30 10:32 UTC (permalink / raw)
  To: shuah
  Cc: Al Viro, linux-wireless, chris, devel, robh, sameo, linux-serial,
	jslaby, santhameena13, kirk, johan.hedberg, arnd, marcel,
	samuel.thibault, m.maya.nakamura, zhongjiang, gregkh, speakup,
	linux-kernel, linux-bluetooth, netdev, nishka.dasgupta_ug18,
	davem
In-Reply-To: <fe4018a7-19e4-cdd1-3095-bbe88a82788e@kernel.org>

On Mon, Jan 28, 2019 at 02:29:22PM -0700, shuah wrote:
> On 1/25/19 9:14 PM, Al Viro wrote:
> > On Fri, Jan 25, 2019 at 04:29:05PM -0700, Shuah Khan wrote:
> >> tty_set_termios() has the following WARMN_ON which can be triggered with a
> >> syscall to invoke TIOCGETD __NR_ioctl.

You meant TIOCSETD here, and in fact its the call which sets the uart
protocol that triggers the warning.

> >> WARN_ON(tty->driver->type == TTY_DRIVER_TYPE_PTY &&
> >>                  tty->driver->subtype == PTY_TYPE_MASTER);
> >> Reference: https://syzkaller.appspot.com/bug?id=2410d22f1d8e5984217329dd0884b01d99e3e48d
> >>
> >> A simple change would have been to print error message instead of WARN_ON.
> >> However, the callers assume that tty_set_termios() always returns 0 and
> >> don't check return value. The complete solution is fixing all the callers
> >> to check error and bail out to fix the WARN_ON.
> >>
> >> This fix changes tty_set_termios() to return error and all the callers
> >> to check error and bail out. The reproducer is used to reproduce the
> >> problem and verify the fix.
> > 
> >> --- a/drivers/bluetooth/hci_ldisc.c
> >> +++ b/drivers/bluetooth/hci_ldisc.c
> >> @@ -321,6 +321,8 @@ void hci_uart_set_flow_control(struct hci_uart *hu, bool enable)
> >>   		status = tty_set_termios(tty, &ktermios);
> >>   		BT_DBG("Disabling hardware flow control: %s",
> >>   		       status ? "failed" : "success");
> >> +		if (status)
> >> +			return;
> > 
> > Can that ldisc end up set on pty master?  And does it make any sense there?
> 
> The initial objective of the patch is to prevent the WARN_ON by making
> the change to return error instead of WARN_ON. However, without changes
> to places that don't check the return and keep making progress, there
> will be secondary problems.
> 
> Without this change to return here, instead of WARN_ON, it will fail
> with the following NULL pointer dereference at the next thing 
> hci_uart_set_flow_control() attempts.
> 
> status = tty->driver->ops->tiocmget(tty);
> 
> kernel: [10140.649783] BUG: unable to handle kernel NULL pointer 

That's a separate issue, which is being fixed:

	https://lkml.kernel.org/r/20190130095938.GP3691@localhost

> > IOW, I don't believe that this patch makes any sense.  If anything,
> > we need to prevent unconditional tty_set_termios() on the path
> > that *does* lead to calling it for pty.
> 
> I don't think preventing unconditional tty_set_termios() is enough to
> prevent secondary problems such as the one above.
> 
> For example, the following call chain leads to the WARN_ON that was
> reported. Even if void hci_uart_set_baudrate() prevents the very first
> tty_set_termios() call, its caller hci_uart_setup() continues with
> more tty setup. It goes ahead to call driver setup callback. The
> driver callback goes on to do more setup calling tty_set_termios().
> 
> WARN_ON call path:
>   hci_uart_set_baudrate+0x1cc/0x250 drivers/bluetooth/hci_ldisc.c:378
>   hci_uart_setup+0xa2/0x490 drivers/bluetooth/hci_ldisc.c:401
>   hci_dev_do_open+0x6b1/0x1920 net/bluetooth/hci_core.c:1423
> 
> Once this WARN_ON is changed to return error, the following
> happens, when hci_uart_setup() does driver setup callback.
> 
> kernel: [10140.649836]  mrvl_setup+0x17/0x80 [hci_uart]
> kernel: [10140.649840]  hci_uart_setup+0x56/0x160 [hci_uart]
> kernel: [10140.649850]  hci_dev_do_open+0xe6/0x630 [bluetooth]
> kernel: [10140.649860]  hci_power_on+0x52/0x220 [bluetooth]
> 
> I think continuing to catch the invalid condition in tty_set_termios()
> and preventing progress by checking return value is a straight forward
> change to avoid secondary problems, and it might be difficult to catch
> all the cases where it could fail.

I agree with Al that this change doesn't make much sense. The WARN_ON
is there to catch any bugs leading to the termios being changed for a
master side pty. Those should bugs should be fixed, and not worked
around in order to silence a WARN_ON.

The problem started with 7721383f4199 ("Bluetooth: hci_uart: Support
operational speed during setup") which introduced a new way for how
tty_set_termios() could end up being called for a master pty.

As Al hinted at, setting these ldiscs for a master pty really makes no
sense and perhaps that is what we should prevent unless simply making
sure they do not call tty_set_termios() is sufficient for the time
being.

Finally, note that serdev never operates on a pty, and that this is only
an issue for (the three) line disciplines.

Johan

^ permalink raw reply

* RE: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Pankaj Bansal @ 2019-01-30 10:37 UTC (permalink / raw)
  To: Pankaj Bansal, Shawn Guo, Leo Li, Andrew Lunn, Florian Fainelli
  Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <20181115173752.22482-1-pankaj.bansal@nxp.com>

HI Shawn/Leo,

Can you please review this patch and include it in your tree ?

Regards,
Pankaj Bansal

-----Original Message-----
From: Pankaj Bansal [mailto:pankaj.bansal@nxp.com] 
Sent: Thursday, 15 November, 2018 05:42 PM
To: Shawn Guo <shawnguo@kernel.org>; Leo Li <leoyang.li@nxp.com>; Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>
Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Pankaj Bansal <pankaj.bansal@nxp.com>
Subject: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes

The two external MDIO buses used to communicate with phy devices that are external to SOC are muxed in LX2160AQDS board.

These buses can be routed to any one of the eight IO slots on LX2160AQDS board depending on value in fpga register 0x54.

Additionally the external MDIO1 is used to communicate to the onboard RGMII phy devices.

The mdio1 is controlled by bits 4-7 of fpga register and mdio2 is controlled by bits 0-3 of fpga register.

Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
---

Notes:
    This patch depends on following patches:
    [1]https://patchwork.kernel.org/cover/10658863/
    [2]https://patchwork.codeaurora.org/patch/637861/

 .../boot/dts/freescale/fsl-lx2160a-qds.dts   | 116 +++++++++++++++++
 .../boot/dts/freescale/fsl-lx2160a.dtsi      |  23 ++++
 2 files changed, 139 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
index 8a0305a2b778..39aa2731ddfa 100644
--- a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
@@ -54,6 +54,121 @@
 &i2c0 {
 	status = "okay";
 
+	fpga@66 {
+		compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
+		reg = <0x66>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		mdio-mux-1@54 {
+			mdio-parent-bus = <&emdio1>;
+			reg = <0x54>;		 /* BRDCFG4 */
+			mux-mask = <0xf8>;      /* EMI1_MDIO */
+			#address-cells=<1>;
+			#size-cells = <0>;
+
+			mdio@0 {
+				reg = <0x00>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@40 {
+				reg = <0x40>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@c0 {
+				reg = <0xc0>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@c8 {
+				reg = <0xc8>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@d0 {
+				reg = <0xd0>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@d8 {
+				reg = <0xd8>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@e0 {
+				reg = <0xe0>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@e8 {
+				reg = <0xe8>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@f0 {
+				reg = <0xf0>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@f8 {
+				reg = <0xf8>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+		};
+
+		mdio-mux-2@54 {
+			mdio-parent-bus = <&emdio2>;
+			reg = <0x54>;		 /* BRDCFG4 */
+			mux-mask = <0x07>;      /* EMI2_MDIO */
+			#address-cells=<1>;
+			#size-cells = <0>;
+
+			mdio@0 {
+				reg = <0x00>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@1 {
+				reg = <0x01>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@2 {
+				reg = <0x02>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@3 {
+				reg = <0x03>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@4 {
+				reg = <0x04>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@5 {
+				reg = <0x05>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@6 {
+				reg = <0x06>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+			mdio@7 {
+				reg = <0x07>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+			};
+		};
+	};
+
 	i2c-mux@77 {
 		compatible = "nxp,pca9547";
 		reg = <0x77>;
@@ -118,3 +233,4 @@
 &usb1 {
 	status = "okay";
 };
+
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
index 6ce0677c3096..518882b05f03 100644
--- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
@@ -780,5 +780,28 @@
 				     <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
 			dma-coherent;
 		};
+		/* TODO: WRIOP (CCSR?) */
+		/* WRIOP0: 0x8B8_0000, E-MDIO1: 0x1_6000 */
+		emdio1: mdio@0x8B96000 {
+			compatible = "fsl,fman-memac-mdio";
+			reg = <0x0 0x8B96000 0x0 0x1000>;
+			device_type = "mdio";	/* TODO: is this necessary? */
+			little-endian;	/* force the driver in LE mode */
+
+			/* Not necessary on the QDS, but needed on the RDB*/
+			#address-cells = <1>;
+			#size-cells = <0>;
+		};
+		/* WRIOP0: 0x8B8_0000, E-MDIO2: 0x1_7000 */
+		emdio2: mdio@0x8B97000 {
+			compatible = "fsl,fman-memac-mdio";
+			reg = <0x0 0x8B97000 0x0 0x1000>;
+			device_type = "mdio";	/* TODO: is this necessary? */
+			little-endian;	/* force the driver in LE mode */
+
+			#address-cells = <1>;
+			#size-cells = <0>;
+		};
 	};
 };
+
--
2.17.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox