Netdev List
 help / color / mirror / Atom feed
* RESPONSE
From: Ms. Ella Golan @ 2018-04-30 13:44 UTC (permalink / raw)
  To: Recipients

I am Ms.Ella Golan, I am the Executive Vice President Banking Division with FIRST INTERNATIONAL BANK OF ISRAEL LTD (FIBI). I am getting in touch with you regarding an extremely important and urgent matter. If you would oblige me the opportunity, I shall provide you with details upon your response.

Faithfully,
Ms.Ella Golan 

^ permalink raw reply

* Re: [bug report] net/mlx5e: TLS, Add Innova TLS TX offload data path
From: Leon Romanovsky @ 2018-05-03 12:07 UTC (permalink / raw)
  To: Dan Carpenter, Saeed Mahameed, Boris Pismenny; +Cc: linux-rdma, netdev
In-Reply-To: <20180503112300.GA6309@mwanda>

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

+ Saeed, Boris and netdev

On Thu, May 03, 2018 at 02:23:00PM +0300, Dan Carpenter wrote:
> Hello Ilya Lesokhin,
>
> The patch bf23974104fa: "net/mlx5e: TLS, Add Innova TLS TX offload
> data path" from Apr 30, 2018, leads to the following static checker
> warning:
>
> 	drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:63 mlx5e_tls_add_metadata()
> 	warn: struct type mismatch 'ethhdr vs mlx5e_tls_metadata'
>
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
>     55  static int mlx5e_tls_add_metadata(struct sk_buff *skb, __be32 swid)
>     56  {
>     57          struct mlx5e_tls_metadata *pet;
>     58          struct ethhdr *eth;
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>     59
>     60          if (skb_cow_head(skb, sizeof(struct mlx5e_tls_metadata)))
>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>     61                  return -ENOMEM;
>     62
>     63          eth = (struct ethhdr *)skb_push(skb, sizeof(struct mlx5e_tls_metadata));
>                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This feels like it should be sizeof(*eth) + sizeof(*pet)?
>
>     64          skb->mac_header -= sizeof(struct mlx5e_tls_metadata);
>     65          pet = (struct mlx5e_tls_metadata *)(eth + 1);
>     66
>     67          memmove(skb->data, skb->data + sizeof(struct mlx5e_tls_metadata),
>     68                  2 * ETH_ALEN);
>     69
>     70          eth->h_proto = cpu_to_be16(MLX5E_METADATA_ETHER_TYPE);
>     71          pet->syndrome_swid = htonl(SYNDROME_OFFLOAD_REQUIRED << 24) | swid;
>     72
>     73          return 0;
>     74  }
>
> regards,
> dan carpenter
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] sctp: fix a potential missing-check bug
From: Wenwen Wang @ 2018-05-03 12:01 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Kangjie Lu, Vlad Yasevich, Neil Horman, David S. Miller,
	open list:SCTP PROTOCOL, open list:NETWORKING [GENERAL],
	open list, Wenwen Wang
In-Reply-To: <20180503014838.GL5105@localhost.localdomain>

On Wed, May 2, 2018 at 8:48 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Wed, May 02, 2018 at 08:27:05PM -0500, Wenwen Wang wrote:
>> On Wed, May 2, 2018 at 8:24 PM, Marcelo Ricardo Leitner
>> <marcelo.leitner@gmail.com> wrote:
>> > On Wed, May 02, 2018 at 08:15:45PM -0500, Wenwen Wang wrote:
>> >> In sctp_setsockopt_maxseg(), the integer 'val' is compared against min_len
>> >> and max_len to check whether it is in the appropriate range. If it is not,
>> >> an error code -EINVAL will be returned. This is enforced by a security
>> >> check. But, this check is only executed when 'val' is not 0. In fact, if
>> >> 'val' is 0, it will be assigned with a new value (if the return value of
>> >> the function sctp_id2assoc() is not 0) in the following execution. However,
>> >> this new value of 'val' is not checked before it is used to assigned to
>> >> asoc->user_frag. That means it is possible that the new value of 'val'
>> >> could be out of the expected range. This can cause security issues
>> >> such as buffer overflows, e.g., the new value of 'val' is used as an index
>> >> to access a buffer.
>> >>
>> >> This patch inserts a check for the new value of 'val' to see if it is in
>> >> the expected range. If it is not, an error code -EINVAL will be returned.
>> >>
>> >> Signed-off-by: Wenwen Wang <wang6495@umn.edu>
>> >> ---
>> >>  net/sctp/socket.c | 22 +++++++++++-----------
>> >>  1 file changed, 11 insertions(+), 11 deletions(-)
>> >
>> > ?
>> > This patch is the same as previous one. git send-email <old file>
>> > maybe?
>> >
>> >   Marcelo
>>
>> Thanks for your suggestion, Marcelo. I can send the old file. But, I
>> have added a line of comment in this patch.
>
> I meant if you had sent the old patch again by accident, because you
> said you worked on an old version of the tree, but then posted a patch
> that also doesn't use the new MTU function I mentioned.
>
>   Marcelo

I worked on the latest kernel. But, I didn't find the MTU function
sctp_mtu_payload().

The problematic function that I found is sctp_setsockopt_maxseg()
located in the file net/sctp/socket.c.

Thanks,

Wenwen

^ permalink raw reply

* [PATCH net-next 4/4] mlxsw: pci: Check number of CQEs for CQE version 2
From: Ido Schimmel @ 2018-05-03 11:59 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20180503115942.8279-1-idosch@mellanox.com>

From: Jiri Pirko <jiri@mellanox.com>

Check number of CQEs for CQE version 2 reported by QUERY_AQ_CAP command.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/cmd.h | 7 ++++++-
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 4 ++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/cmd.h b/drivers/net/ethernet/mellanox/mlxsw/cmd.h
index 669226b0759c..8da91b023b13 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/cmd.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/cmd.h
@@ -424,10 +424,15 @@ MLXSW_ITEM32(cmd_mbox, query_aq_cap, log_max_rdq_sz, 0x04, 24, 8);
 MLXSW_ITEM32(cmd_mbox, query_aq_cap, max_num_rdqs, 0x04, 0, 8);
 
 /* cmd_mbox_query_aq_cap_log_max_cq_sz
- * Log (base 2) of max CQEs allowed on CQ.
+ * Log (base 2) of the Maximum CQEs allowed in a CQ for CQEv0 and CQEv1.
  */
 MLXSW_ITEM32(cmd_mbox, query_aq_cap, log_max_cq_sz, 0x08, 24, 8);
 
+/* cmd_mbox_query_aq_cap_log_max_cqv2_sz
+ * Log (base 2) of the Maximum CQEs allowed in a CQ for CQEv2.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, log_max_cqv2_sz, 0x08, 16, 8);
+
 /* cmd_mbox_query_aq_cap_max_num_cqs
  * Maximum number of CQs.
  */
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index e9ce0e27aa9c..db794a1a3a7e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -959,6 +959,7 @@ static int mlxsw_pci_aqs_init(struct mlxsw_pci *mlxsw_pci, char *mbox)
 	u8 rdq_log2sz;
 	u8 num_cqs;
 	u8 cq_log2sz;
+	u8 cqv2_log2sz;
 	u8 num_eqs;
 	u8 eq_log2sz;
 	int err;
@@ -974,6 +975,7 @@ static int mlxsw_pci_aqs_init(struct mlxsw_pci *mlxsw_pci, char *mbox)
 	rdq_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_rdq_sz_get(mbox);
 	num_cqs = mlxsw_cmd_mbox_query_aq_cap_max_num_cqs_get(mbox);
 	cq_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_cq_sz_get(mbox);
+	cqv2_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_cqv2_sz_get(mbox);
 	num_eqs = mlxsw_cmd_mbox_query_aq_cap_max_num_eqs_get(mbox);
 	eq_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_eq_sz_get(mbox);
 
@@ -986,6 +988,8 @@ static int mlxsw_pci_aqs_init(struct mlxsw_pci *mlxsw_pci, char *mbox)
 	if ((1 << sdq_log2sz != MLXSW_PCI_WQE_COUNT) ||
 	    (1 << rdq_log2sz != MLXSW_PCI_WQE_COUNT) ||
 	    (1 << cq_log2sz != MLXSW_PCI_CQE01_COUNT) ||
+	    (mlxsw_pci->max_cqe_ver == MLXSW_PCI_CQE_V2 &&
+	     (1 << cqv2_log2sz != MLXSW_PCI_CQE2_COUNT)) ||
 	    (1 << eq_log2sz != MLXSW_PCI_EQE_COUNT)) {
 		dev_err(&pdev->dev, "Unsupported number of async queue descriptors\n");
 		return -EINVAL;
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 3/4] mlxsw: pci: Allow to use CQEs of version 1 and version 2
From: Ido Schimmel @ 2018-05-03 11:59 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20180503115942.8279-1-idosch@mellanox.com>

From: Jiri Pirko <jiri@mellanox.com>

Use previously added resources to query FW support for multiple versions
of CQEs. Use the biggest version supported. For SDQs, it has no sense to
use version 2 as it does not introduce any new features, but it is
twice the size of CQE version 1.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/cmd.h | 24 ++++++++--
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 78 +++++++++++++++++++++++++++----
 2 files changed, 91 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/cmd.h b/drivers/net/ethernet/mellanox/mlxsw/cmd.h
index 479511cf79bc..669226b0759c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/cmd.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/cmd.h
@@ -662,6 +662,12 @@ MLXSW_ITEM32(cmd_mbox, config_profile, set_kvd_hash_single_size, 0x0C, 25, 1);
  */
 MLXSW_ITEM32(cmd_mbox, config_profile, set_kvd_hash_double_size, 0x0C, 26, 1);
 
+/* cmd_mbox_config_set_cqe_version
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_cqe_version, 0x08, 0, 1);
+
 /* cmd_mbox_config_profile_max_vepa_channels
  * Maximum number of VEPA channels per port (0 through 16)
  * 0 - multi-channel VEPA is disabled
@@ -841,6 +847,14 @@ MLXSW_ITEM32_INDEXED(cmd_mbox, config_profile, swid_config_type,
 MLXSW_ITEM32_INDEXED(cmd_mbox, config_profile, swid_config_properties,
 		     0x60, 0, 8, 0x08, 0x00, false);
 
+/* cmd_mbox_config_profile_cqe_version
+ * CQE version:
+ * 0: CQE version is 0
+ * 1: CQE version is either 1 or 2
+ * CQE ver 1 or 2 is configured by Completion Queue Context field cqe_ver.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, cqe_version, 0xB0, 0, 8);
+
 /* ACCESS_REG - Access EMAD Supported Register
  * ----------------------------------
  * OpMod == 0 (N/A), INMmod == 0 (N/A)
@@ -1032,11 +1046,15 @@ static inline int mlxsw_cmd_sw2hw_cq(struct mlxsw_core *mlxsw_core,
 				 0, cq_number, in_mbox, MLXSW_CMD_MBOX_SIZE);
 }
 
-/* cmd_mbox_sw2hw_cq_cv
+enum mlxsw_cmd_mbox_sw2hw_cq_cqe_ver {
+	MLXSW_CMD_MBOX_SW2HW_CQ_CQE_VER_1,
+	MLXSW_CMD_MBOX_SW2HW_CQ_CQE_VER_2,
+};
+
+/* cmd_mbox_sw2hw_cq_cqe_ver
  * CQE Version.
- * 0 - CQE Version 0, 1 - CQE Version 1
  */
-MLXSW_ITEM32(cmd_mbox, sw2hw_cq, cv, 0x00, 28, 4);
+MLXSW_ITEM32(cmd_mbox, sw2hw_cq, cqe_ver, 0x00, 28, 4);
 
 /* cmd_mbox_sw2hw_cq_c_eqn
  * Event Queue this CQ reports completion events to.
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 24686ba45729..e9ce0e27aa9c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -156,6 +156,8 @@ struct mlxsw_pci {
 	} cmd;
 	struct mlxsw_bus_info bus_info;
 	const struct pci_device_id *id;
+	enum mlxsw_pci_cqe_v max_cqe_ver; /* Maximal supported CQE version */
+	u8 num_sdq_cqs; /* Number of CQs used for SDQs */
 };
 
 static void mlxsw_pci_queue_tasklet_schedule(struct mlxsw_pci_queue *q)
@@ -477,6 +479,17 @@ static void mlxsw_pci_rdq_fini(struct mlxsw_pci *mlxsw_pci,
 	}
 }
 
+static void mlxsw_pci_cq_pre_init(struct mlxsw_pci *mlxsw_pci,
+				  struct mlxsw_pci_queue *q)
+{
+	q->u.cq.v = mlxsw_pci->max_cqe_ver;
+
+	/* For SDQ it is pointless to use CQEv2, so use CQEv1 instead */
+	if (q->u.cq.v == MLXSW_PCI_CQE_V2 &&
+	    q->num < mlxsw_pci->num_sdq_cqs)
+		q->u.cq.v = MLXSW_PCI_CQE_V1;
+}
+
 static int mlxsw_pci_cq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
 			     struct mlxsw_pci_queue *q)
 {
@@ -491,7 +504,13 @@ static int mlxsw_pci_cq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
 		mlxsw_pci_cqe_owner_set(q->u.cq.v, elem, 1);
 	}
 
-	mlxsw_cmd_mbox_sw2hw_cq_cv_set(mbox, 0); /* CQE ver 0 */
+	if (q->u.cq.v == MLXSW_PCI_CQE_V1)
+		mlxsw_cmd_mbox_sw2hw_cq_cqe_ver_set(mbox,
+				MLXSW_CMD_MBOX_SW2HW_CQ_CQE_VER_1);
+	else if (q->u.cq.v == MLXSW_PCI_CQE_V2)
+		mlxsw_cmd_mbox_sw2hw_cq_cqe_ver_set(mbox,
+				MLXSW_CMD_MBOX_SW2HW_CQ_CQE_VER_2);
+
 	mlxsw_cmd_mbox_sw2hw_cq_c_eqn_set(mbox, MLXSW_PCI_EQ_COMP_NUM);
 	mlxsw_cmd_mbox_sw2hw_cq_st_set(mbox, 0);
 	mlxsw_cmd_mbox_sw2hw_cq_log_cq_size_set(mbox, ilog2(q->count));
@@ -643,6 +662,18 @@ static void mlxsw_pci_cq_tasklet(unsigned long data)
 	}
 }
 
+static u16 mlxsw_pci_cq_elem_count(const struct mlxsw_pci_queue *q)
+{
+	return q->u.cq.v == MLXSW_PCI_CQE_V2 ? MLXSW_PCI_CQE2_COUNT :
+					       MLXSW_PCI_CQE01_COUNT;
+}
+
+static u8 mlxsw_pci_cq_elem_size(const struct mlxsw_pci_queue *q)
+{
+	return q->u.cq.v == MLXSW_PCI_CQE_V2 ? MLXSW_PCI_CQE2_SIZE :
+					       MLXSW_PCI_CQE01_SIZE;
+}
+
 static int mlxsw_pci_eq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
 			     struct mlxsw_pci_queue *q)
 {
@@ -755,11 +786,15 @@ static void mlxsw_pci_eq_tasklet(unsigned long data)
 struct mlxsw_pci_queue_ops {
 	const char *name;
 	enum mlxsw_pci_queue_type type;
+	void (*pre_init)(struct mlxsw_pci *mlxsw_pci,
+			 struct mlxsw_pci_queue *q);
 	int (*init)(struct mlxsw_pci *mlxsw_pci, char *mbox,
 		    struct mlxsw_pci_queue *q);
 	void (*fini)(struct mlxsw_pci *mlxsw_pci,
 		     struct mlxsw_pci_queue *q);
 	void (*tasklet)(unsigned long data);
+	u16 (*elem_count_f)(const struct mlxsw_pci_queue *q);
+	u8 (*elem_size_f)(const struct mlxsw_pci_queue *q);
 	u16 elem_count;
 	u8 elem_size;
 };
@@ -782,11 +817,12 @@ static const struct mlxsw_pci_queue_ops mlxsw_pci_rdq_ops = {
 
 static const struct mlxsw_pci_queue_ops mlxsw_pci_cq_ops = {
 	.type		= MLXSW_PCI_QUEUE_TYPE_CQ,
+	.pre_init	= mlxsw_pci_cq_pre_init,
 	.init		= mlxsw_pci_cq_init,
 	.fini		= mlxsw_pci_cq_fini,
 	.tasklet	= mlxsw_pci_cq_tasklet,
-	.elem_count	= MLXSW_PCI_CQE01_COUNT,
-	.elem_size	= MLXSW_PCI_CQE01_SIZE
+	.elem_count_f	= mlxsw_pci_cq_elem_count,
+	.elem_size_f	= mlxsw_pci_cq_elem_size
 };
 
 static const struct mlxsw_pci_queue_ops mlxsw_pci_eq_ops = {
@@ -806,12 +842,15 @@ static int mlxsw_pci_queue_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
 	int i;
 	int err;
 
-	q->u.cq.v = MLXSW_PCI_CQE_V0;
+	q->num = q_num;
+	if (q_ops->pre_init)
+		q_ops->pre_init(mlxsw_pci, q);
 
 	spin_lock_init(&q->lock);
-	q->num = q_num;
-	q->count = q_ops->elem_count;
-	q->elem_size = q_ops->elem_size;
+	q->count = q_ops->elem_count_f ? q_ops->elem_count_f(q) :
+					 q_ops->elem_count;
+	q->elem_size = q_ops->elem_size_f ? q_ops->elem_size_f(q) :
+					    q_ops->elem_size;
 	q->type = q_ops->type;
 	q->pci = mlxsw_pci;
 
@@ -840,7 +879,7 @@ static int mlxsw_pci_queue_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
 
 		elem_info = mlxsw_pci_queue_elem_info_get(q, i);
 		elem_info->elem =
-			__mlxsw_pci_queue_elem_get(q, q_ops->elem_size, i);
+			__mlxsw_pci_queue_elem_get(q, q->elem_size, i);
 	}
 
 	mlxsw_cmd_mbox_zero(mbox);
@@ -952,6 +991,8 @@ static int mlxsw_pci_aqs_init(struct mlxsw_pci *mlxsw_pci, char *mbox)
 		return -EINVAL;
 	}
 
+	mlxsw_pci->num_sdq_cqs = num_sdqs;
+
 	err = mlxsw_pci_queue_group_init(mlxsw_pci, mbox, &mlxsw_pci_eq_ops,
 					 num_eqs);
 	if (err) {
@@ -1192,6 +1233,11 @@ static int mlxsw_pci_config_profile(struct mlxsw_pci *mlxsw_pci, char *mbox,
 		mlxsw_pci_config_profile_swid_config(mlxsw_pci, mbox, i,
 						     &profile->swid_config[i]);
 
+	if (mlxsw_pci->max_cqe_ver > MLXSW_PCI_CQE_V0) {
+		mlxsw_cmd_mbox_config_profile_set_cqe_version_set(mbox, 1);
+		mlxsw_cmd_mbox_config_profile_cqe_version_set(mbox, 1);
+	}
+
 	return mlxsw_cmd_config_profile_set(mlxsw_pci->core, mbox);
 }
 
@@ -1386,6 +1432,21 @@ static int mlxsw_pci_init(void *bus_priv, struct mlxsw_core *mlxsw_core,
 	if (err)
 		goto err_query_resources;
 
+	if (MLXSW_CORE_RES_VALID(mlxsw_core, CQE_V2) &&
+	    MLXSW_CORE_RES_GET(mlxsw_core, CQE_V2))
+		mlxsw_pci->max_cqe_ver = MLXSW_PCI_CQE_V2;
+	else if (MLXSW_CORE_RES_VALID(mlxsw_core, CQE_V1) &&
+		 MLXSW_CORE_RES_GET(mlxsw_core, CQE_V1))
+		mlxsw_pci->max_cqe_ver = MLXSW_PCI_CQE_V1;
+	else if ((MLXSW_CORE_RES_VALID(mlxsw_core, CQE_V0) &&
+		  MLXSW_CORE_RES_GET(mlxsw_core, CQE_V0)) ||
+		 !MLXSW_CORE_RES_VALID(mlxsw_core, CQE_V0)) {
+		mlxsw_pci->max_cqe_ver = MLXSW_PCI_CQE_V0;
+	} else {
+		dev_err(&pdev->dev, "Invalid supported CQE version combination reported\n");
+		goto err_cqe_v_check;
+	}
+
 	err = mlxsw_pci_config_profile(mlxsw_pci, mbox, profile, res);
 	if (err)
 		goto err_config_profile;
@@ -1408,6 +1469,7 @@ static int mlxsw_pci_init(void *bus_priv, struct mlxsw_core *mlxsw_core,
 	mlxsw_pci_aqs_fini(mlxsw_pci);
 err_aqs_init:
 err_config_profile:
+err_cqe_v_check:
 err_query_resources:
 err_boardinfo:
 	mlxsw_pci_fw_area_fini(mlxsw_pci);
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 2/4] mlxsw: pci: Introduce helpers to work with multiple CQE versions
From: Ido Schimmel @ 2018-05-03 11:59 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20180503115942.8279-1-idosch@mellanox.com>

From: Jiri Pirko <jiri@mellanox.com>

Introduce definitions of fields in CQE version 1 and 2. Also, introduce
common helpers that would call appropriate version-specific helpers
according to the version enum passed.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/pci.c    | 72 +++++++++++++++------------
 drivers/net/ethernet/mellanox/mlxsw/pci_hw.h | 74 ++++++++++++++++++++++++----
 2 files changed, 104 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 3a9381977d6d..24686ba45729 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -117,6 +117,7 @@ struct mlxsw_pci_queue {
 		struct {
 			u32 comp_sdq_count;
 			u32 comp_rdq_count;
+			enum mlxsw_pci_cqe_v v;
 		} cq;
 		struct {
 			u32 ev_cmd_count;
@@ -202,24 +203,6 @@ static bool mlxsw_pci_elem_hw_owned(struct mlxsw_pci_queue *q, bool owner_bit)
 	return owner_bit != !!(q->consumer_counter & q->count);
 }
 
-static char *
-mlxsw_pci_queue_sw_elem_get(struct mlxsw_pci_queue *q,
-			    u32 (*get_elem_owner_func)(const char *))
-{
-	struct mlxsw_pci_queue_elem_info *elem_info;
-	char *elem;
-	bool owner_bit;
-
-	elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
-	elem = elem_info->elem;
-	owner_bit = get_elem_owner_func(elem);
-	if (mlxsw_pci_elem_hw_owned(q, owner_bit))
-		return NULL;
-	q->consumer_counter++;
-	rmb(); /* make sure we read owned bit before the rest of elem */
-	return elem;
-}
-
 static struct mlxsw_pci_queue_type_group *
 mlxsw_pci_queue_type_group_get(struct mlxsw_pci *mlxsw_pci,
 			       enum mlxsw_pci_queue_type q_type)
@@ -505,7 +488,7 @@ static int mlxsw_pci_cq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
 	for (i = 0; i < q->count; i++) {
 		char *elem = mlxsw_pci_queue_elem_get(q, i);
 
-		mlxsw_pci_cqe_owner_set(elem, 1);
+		mlxsw_pci_cqe_owner_set(q->u.cq.v, elem, 1);
 	}
 
 	mlxsw_cmd_mbox_sw2hw_cq_cv_set(mbox, 0); /* CQE ver 0 */
@@ -559,7 +542,7 @@ static void mlxsw_pci_cqe_sdq_handle(struct mlxsw_pci *mlxsw_pci,
 static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
 				     struct mlxsw_pci_queue *q,
 				     u16 consumer_counter_limit,
-				     char *cqe)
+				     enum mlxsw_pci_cqe_v cqe_v, char *cqe)
 {
 	struct pci_dev *pdev = mlxsw_pci->pdev;
 	struct mlxsw_pci_queue_elem_info *elem_info;
@@ -579,10 +562,11 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
 	if (q->consumer_counter++ != consumer_counter_limit)
 		dev_dbg_ratelimited(&pdev->dev, "Consumer counter does not match limit in RDQ\n");
 
-	if (mlxsw_pci_cqe_lag_get(cqe)) {
+	if (mlxsw_pci_cqe_lag_get(cqe_v, cqe)) {
 		rx_info.is_lag = true;
-		rx_info.u.lag_id = mlxsw_pci_cqe_lag_id_get(cqe);
-		rx_info.lag_port_index = mlxsw_pci_cqe_lag_port_index_get(cqe);
+		rx_info.u.lag_id = mlxsw_pci_cqe_lag_id_get(cqe_v, cqe);
+		rx_info.lag_port_index =
+			mlxsw_pci_cqe_lag_subport_get(cqe_v, cqe);
 	} else {
 		rx_info.is_lag = false;
 		rx_info.u.sys_port = mlxsw_pci_cqe_system_port_get(cqe);
@@ -591,7 +575,7 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
 	rx_info.trap_id = mlxsw_pci_cqe_trap_id_get(cqe);
 
 	byte_count = mlxsw_pci_cqe_byte_count_get(cqe);
-	if (mlxsw_pci_cqe_crc_get(cqe))
+	if (mlxsw_pci_cqe_crc_get(cqe_v, cqe))
 		byte_count -= ETH_FCS_LEN;
 	skb_put(skb, byte_count);
 	mlxsw_core_skb_receive(mlxsw_pci->core, skb, &rx_info);
@@ -608,7 +592,18 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
 
 static char *mlxsw_pci_cq_sw_cqe_get(struct mlxsw_pci_queue *q)
 {
-	return mlxsw_pci_queue_sw_elem_get(q, mlxsw_pci_cqe_owner_get);
+	struct mlxsw_pci_queue_elem_info *elem_info;
+	char *elem;
+	bool owner_bit;
+
+	elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
+	elem = elem_info->elem;
+	owner_bit = mlxsw_pci_cqe_owner_get(q->u.cq.v, elem);
+	if (mlxsw_pci_elem_hw_owned(q, owner_bit))
+		return NULL;
+	q->consumer_counter++;
+	rmb(); /* make sure we read owned bit before the rest of elem */
+	return elem;
 }
 
 static void mlxsw_pci_cq_tasklet(unsigned long data)
@@ -621,8 +616,8 @@ static void mlxsw_pci_cq_tasklet(unsigned long data)
 
 	while ((cqe = mlxsw_pci_cq_sw_cqe_get(q))) {
 		u16 wqe_counter = mlxsw_pci_cqe_wqe_counter_get(cqe);
-		u8 sendq = mlxsw_pci_cqe_sr_get(cqe);
-		u8 dqn = mlxsw_pci_cqe_dqn_get(cqe);
+		u8 sendq = mlxsw_pci_cqe_sr_get(q->u.cq.v, cqe);
+		u8 dqn = mlxsw_pci_cqe_dqn_get(q->u.cq.v, cqe);
 
 		if (sendq) {
 			struct mlxsw_pci_queue *sdq;
@@ -636,7 +631,7 @@ static void mlxsw_pci_cq_tasklet(unsigned long data)
 
 			rdq = mlxsw_pci_rdq_get(mlxsw_pci, dqn);
 			mlxsw_pci_cqe_rdq_handle(mlxsw_pci, rdq,
-						 wqe_counter, cqe);
+						 wqe_counter, q->u.cq.v, cqe);
 			q->u.cq.comp_rdq_count++;
 		}
 		if (++items == credits)
@@ -696,7 +691,18 @@ static void mlxsw_pci_eq_cmd_event(struct mlxsw_pci *mlxsw_pci, char *eqe)
 
 static char *mlxsw_pci_eq_sw_eqe_get(struct mlxsw_pci_queue *q)
 {
-	return mlxsw_pci_queue_sw_elem_get(q, mlxsw_pci_eqe_owner_get);
+	struct mlxsw_pci_queue_elem_info *elem_info;
+	char *elem;
+	bool owner_bit;
+
+	elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
+	elem = elem_info->elem;
+	owner_bit = mlxsw_pci_eqe_owner_get(elem);
+	if (mlxsw_pci_elem_hw_owned(q, owner_bit))
+		return NULL;
+	q->consumer_counter++;
+	rmb(); /* make sure we read owned bit before the rest of elem */
+	return elem;
 }
 
 static void mlxsw_pci_eq_tasklet(unsigned long data)
@@ -779,8 +785,8 @@ static const struct mlxsw_pci_queue_ops mlxsw_pci_cq_ops = {
 	.init		= mlxsw_pci_cq_init,
 	.fini		= mlxsw_pci_cq_fini,
 	.tasklet	= mlxsw_pci_cq_tasklet,
-	.elem_count	= MLXSW_PCI_CQE_COUNT,
-	.elem_size	= MLXSW_PCI_CQE_SIZE
+	.elem_count	= MLXSW_PCI_CQE01_COUNT,
+	.elem_size	= MLXSW_PCI_CQE01_SIZE
 };
 
 static const struct mlxsw_pci_queue_ops mlxsw_pci_eq_ops = {
@@ -800,6 +806,8 @@ static int mlxsw_pci_queue_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
 	int i;
 	int err;
 
+	q->u.cq.v = MLXSW_PCI_CQE_V0;
+
 	spin_lock_init(&q->lock);
 	q->num = q_num;
 	q->count = q_ops->elem_count;
@@ -938,7 +946,7 @@ static int mlxsw_pci_aqs_init(struct mlxsw_pci *mlxsw_pci, char *mbox)
 
 	if ((1 << sdq_log2sz != MLXSW_PCI_WQE_COUNT) ||
 	    (1 << rdq_log2sz != MLXSW_PCI_WQE_COUNT) ||
-	    (1 << cq_log2sz != MLXSW_PCI_CQE_COUNT) ||
+	    (1 << cq_log2sz != MLXSW_PCI_CQE01_COUNT) ||
 	    (1 << eq_log2sz != MLXSW_PCI_EQE_COUNT)) {
 		dev_err(&pdev->dev, "Unsupported number of async queue descriptors\n");
 		return -EINVAL;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
index fb082ad21b00..963155f6a17a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
@@ -82,10 +82,12 @@
 #define MLXSW_PCI_AQ_PAGES	8
 #define MLXSW_PCI_AQ_SIZE	(MLXSW_PCI_PAGE_SIZE * MLXSW_PCI_AQ_PAGES)
 #define MLXSW_PCI_WQE_SIZE	32 /* 32 bytes per element */
-#define MLXSW_PCI_CQE_SIZE	16 /* 16 bytes per element */
+#define MLXSW_PCI_CQE01_SIZE	16 /* 16 bytes per element */
+#define MLXSW_PCI_CQE2_SIZE	32 /* 32 bytes per element */
 #define MLXSW_PCI_EQE_SIZE	16 /* 16 bytes per element */
 #define MLXSW_PCI_WQE_COUNT	(MLXSW_PCI_AQ_SIZE / MLXSW_PCI_WQE_SIZE)
-#define MLXSW_PCI_CQE_COUNT	(MLXSW_PCI_AQ_SIZE / MLXSW_PCI_CQE_SIZE)
+#define MLXSW_PCI_CQE01_COUNT	(MLXSW_PCI_AQ_SIZE / MLXSW_PCI_CQE01_SIZE)
+#define MLXSW_PCI_CQE2_COUNT	(MLXSW_PCI_AQ_SIZE / MLXSW_PCI_CQE2_SIZE)
 #define MLXSW_PCI_EQE_COUNT	(MLXSW_PCI_AQ_SIZE / MLXSW_PCI_EQE_SIZE)
 #define MLXSW_PCI_EQE_UPDATE_COUNT	0x80
 
@@ -126,10 +128,48 @@ MLXSW_ITEM16_INDEXED(pci, wqe, byte_count, 0x02, 0, 14, 0x02, 0x00, false);
  */
 MLXSW_ITEM64_INDEXED(pci, wqe, address, 0x08, 0, 64, 0x8, 0x0, false);
 
+enum mlxsw_pci_cqe_v {
+	MLXSW_PCI_CQE_V0,
+	MLXSW_PCI_CQE_V1,
+	MLXSW_PCI_CQE_V2,
+};
+
+#define mlxsw_pci_cqe_item_helpers(name, v0, v1, v2)				\
+static inline u32 mlxsw_pci_cqe_##name##_get(enum mlxsw_pci_cqe_v v, char *cqe)	\
+{										\
+	switch (v) {								\
+	default:								\
+	case MLXSW_PCI_CQE_V0:							\
+		return mlxsw_pci_cqe##v0##_##name##_get(cqe);			\
+	case MLXSW_PCI_CQE_V1:							\
+		return mlxsw_pci_cqe##v1##_##name##_get(cqe);			\
+	case MLXSW_PCI_CQE_V2:							\
+		return mlxsw_pci_cqe##v2##_##name##_get(cqe);			\
+	}									\
+}										\
+static inline void mlxsw_pci_cqe_##name##_set(enum mlxsw_pci_cqe_v v,		\
+					      char *cqe, u32 val)		\
+{										\
+	switch (v) {								\
+	default:								\
+	case MLXSW_PCI_CQE_V0:							\
+		mlxsw_pci_cqe##v0##_##name##_set(cqe, val);			\
+		break;								\
+	case MLXSW_PCI_CQE_V1:							\
+		mlxsw_pci_cqe##v1##_##name##_set(cqe, val);			\
+		break;								\
+	case MLXSW_PCI_CQE_V2:							\
+		mlxsw_pci_cqe##v2##_##name##_set(cqe, val);			\
+		break;								\
+	}									\
+}
+
 /* pci_cqe_lag
  * Packet arrives from a port which is a LAG
  */
-MLXSW_ITEM32(pci, cqe, lag, 0x00, 23, 1);
+MLXSW_ITEM32(pci, cqe0, lag, 0x00, 23, 1);
+MLXSW_ITEM32(pci, cqe12, lag, 0x00, 24, 1);
+mlxsw_pci_cqe_item_helpers(lag, 0, 12, 12);
 
 /* pci_cqe_system_port/lag_id
  * When lag=0: System port on which the packet was received
@@ -138,8 +178,12 @@ MLXSW_ITEM32(pci, cqe, lag, 0x00, 23, 1);
  * bits [3:0] sub_port on which the packet was received
  */
 MLXSW_ITEM32(pci, cqe, system_port, 0x00, 0, 16);
-MLXSW_ITEM32(pci, cqe, lag_id, 0x00, 4, 12);
-MLXSW_ITEM32(pci, cqe, lag_port_index, 0x00, 0, 4);
+MLXSW_ITEM32(pci, cqe0, lag_id, 0x00, 4, 12);
+MLXSW_ITEM32(pci, cqe12, lag_id, 0x00, 0, 16);
+mlxsw_pci_cqe_item_helpers(lag_id, 0, 12, 12);
+MLXSW_ITEM32(pci, cqe0, lag_subport, 0x00, 0, 4);
+MLXSW_ITEM32(pci, cqe12, lag_subport, 0x00, 16, 8);
+mlxsw_pci_cqe_item_helpers(lag_subport, 0, 12, 12);
 
 /* pci_cqe_wqe_counter
  * WQE count of the WQEs completed on the associated dqn
@@ -162,28 +206,38 @@ MLXSW_ITEM32(pci, cqe, trap_id, 0x08, 0, 9);
  * Length include CRC. Indicates the length field includes
  * the packet's CRC.
  */
-MLXSW_ITEM32(pci, cqe, crc, 0x0C, 8, 1);
+MLXSW_ITEM32(pci, cqe0, crc, 0x0C, 8, 1);
+MLXSW_ITEM32(pci, cqe12, crc, 0x0C, 9, 1);
+mlxsw_pci_cqe_item_helpers(crc, 0, 12, 12);
 
 /* pci_cqe_e
  * CQE with Error.
  */
-MLXSW_ITEM32(pci, cqe, e, 0x0C, 7, 1);
+MLXSW_ITEM32(pci, cqe0, e, 0x0C, 7, 1);
+MLXSW_ITEM32(pci, cqe12, e, 0x00, 27, 1);
+mlxsw_pci_cqe_item_helpers(e, 0, 12, 12);
 
 /* pci_cqe_sr
  * 1 - Send Queue
  * 0 - Receive Queue
  */
-MLXSW_ITEM32(pci, cqe, sr, 0x0C, 6, 1);
+MLXSW_ITEM32(pci, cqe0, sr, 0x0C, 6, 1);
+MLXSW_ITEM32(pci, cqe12, sr, 0x00, 26, 1);
+mlxsw_pci_cqe_item_helpers(sr, 0, 12, 12);
 
 /* pci_cqe_dqn
  * Descriptor Queue (DQ) Number.
  */
-MLXSW_ITEM32(pci, cqe, dqn, 0x0C, 1, 5);
+MLXSW_ITEM32(pci, cqe0, dqn, 0x0C, 1, 5);
+MLXSW_ITEM32(pci, cqe12, dqn, 0x0C, 1, 6);
+mlxsw_pci_cqe_item_helpers(dqn, 0, 12, 12);
 
 /* pci_cqe_owner
  * Ownership bit.
  */
-MLXSW_ITEM32(pci, cqe, owner, 0x0C, 0, 1);
+MLXSW_ITEM32(pci, cqe01, owner, 0x0C, 0, 1);
+MLXSW_ITEM32(pci, cqe2, owner, 0x1C, 0, 1);
+mlxsw_pci_cqe_item_helpers(owner, 01, 01, 2);
 
 /* pci_eqe_event_type
  * Event type.
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 1/4] mlxsw: resources: Add CQE versions resources
From: Ido Schimmel @ 2018-05-03 11:59 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20180503115942.8279-1-idosch@mellanox.com>

From: Jiri Pirko <jiri@mellanox.com>

Add resources that FW uses to report supported CQE versions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/resources.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/resources.h b/drivers/net/ethernet/mellanox/mlxsw/resources.h
index 087aad52c195..fd9299ccec72 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/resources.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/resources.h
@@ -43,6 +43,9 @@ enum mlxsw_res_id {
 	MLXSW_RES_ID_KVD_SINGLE_MIN_SIZE,
 	MLXSW_RES_ID_KVD_DOUBLE_MIN_SIZE,
 	MLXSW_RES_ID_MAX_TRAP_GROUPS,
+	MLXSW_RES_ID_CQE_V0,
+	MLXSW_RES_ID_CQE_V1,
+	MLXSW_RES_ID_CQE_V2,
 	MLXSW_RES_ID_COUNTER_POOL_SIZE,
 	MLXSW_RES_ID_MAX_SPAN,
 	MLXSW_RES_ID_COUNTER_SIZE_PACKETS_BYTES,
@@ -81,6 +84,9 @@ static u16 mlxsw_res_ids[] = {
 	[MLXSW_RES_ID_KVD_SINGLE_MIN_SIZE] = 0x1002,
 	[MLXSW_RES_ID_KVD_DOUBLE_MIN_SIZE] = 0x1003,
 	[MLXSW_RES_ID_MAX_TRAP_GROUPS] = 0x2201,
+	[MLXSW_RES_ID_CQE_V0] = 0x2210,
+	[MLXSW_RES_ID_CQE_V1] = 0x2211,
+	[MLXSW_RES_ID_CQE_V2] = 0x2212,
 	[MLXSW_RES_ID_COUNTER_POOL_SIZE] = 0x2410,
 	[MLXSW_RES_ID_MAX_SPAN] = 0x2420,
 	[MLXSW_RES_ID_COUNTER_SIZE_PACKETS_BYTES] = 0x2443,
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 0/4] mlxsw: Introduce support for CQEv1/2
From: Ido Schimmel @ 2018-05-03 11:59 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, mlxsw, Ido Schimmel

Jiri says:

Current SwitchX2 and Spectrum FWs support CQEv0 and that is what we
implement in mlxsw. Spectrum FW also supports CQE v1 and v2.
However, Spectrum-2 won't support CQEv0. Prepare for it and setup the
CQE versions to use according to what is queried from FW.

Jiri Pirko (4):
  mlxsw: resources: Add CQE versions resources
  mlxsw: pci: Introduce helpers to work with multiple CQE versions
  mlxsw: pci: Allow to use CQEs of version 1 and version 2
  mlxsw: pci: Check number of CQEs for CQE version 2

 drivers/net/ethernet/mellanox/mlxsw/cmd.h       |  31 ++++-
 drivers/net/ethernet/mellanox/mlxsw/pci.c       | 148 ++++++++++++++++++------
 drivers/net/ethernet/mellanox/mlxsw/pci_hw.h    |  74 ++++++++++--
 drivers/net/ethernet/mellanox/mlxsw/resources.h |   6 +
 4 files changed, 208 insertions(+), 51 deletions(-)

-- 
2.14.3

^ permalink raw reply

* Re: [PATCH] sctp: fix a potential missing-check bug
From: Neil Horman @ 2018-05-03 11:23 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: Marcelo Ricardo Leitner, Kangjie Lu, Vlad Yasevich,
	David S. Miller, open list:SCTP PROTOCOL,
	open list:NETWORKING [GENERAL], open list
In-Reply-To: <CAAa=b7d_dHFUudPdJr3YuDaYRzNvAgHd0Lcubw3Ks94O-2kb3w@mail.gmail.com>

On Wed, May 02, 2018 at 08:07:17PM -0500, Wenwen Wang wrote:
> Hi Marcelo,
> 
> I guess I worked on an old version of the kernel. I will re-submit the
> patch. Sorry :(
> 
You don't have to resubmit the patch, this isn't broken.  As marcelo points out,
a value of zero in this socket option is special, meaning set the fragmentation
to whatever the pmtu is, which will always rest between the min and max segment
lengths.

Neil

> Wenwen
> 
> On Wed, May 2, 2018 at 6:23 PM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > Hi Wenwen,
> >
> > On Wed, May 02, 2018 at 05:12:45PM -0500, Wenwen Wang wrote:
> >> In sctp_setsockopt_maxseg(), the integer 'val' is compared against min_len
> >> and max_len to check whether it is in the appropriate range. If it is not,
> >> an error code -EINVAL will be returned. This is enforced by a security
> >> check. But, this check is only executed when 'val' is not 0. In fact, if
> >
> > Which makes sense, no? Especially if considering that 0 should be an
> > allowed value as it turns off the user limit.
> >
> >> 'val' is 0, it will be assigned with a new value (if the return value of
> >> the function sctp_id2assoc() is not 0) in the following execution. However,
> >> this new value of 'val' is not checked before it is used to assigned to
> >
> > Which 'new value'? val is not set to something new during the
> > function. It always contains the user supplied value.
> >
> >> asoc->user_frag. That means it is possible that the new value of 'val'
> >> could be out of the expected range. This can cause security issues
> >> such as buffer overflows, e.g., the new value of 'val' is used as an index
> >> to access a buffer.
> >>
> >> This patch inserts a check for the new value of 'val' to see if it is in
> >> the expected range. If it is not, an error code -EINVAL will be returned.
> >>
> >> Signed-off-by: Wenwen Wang <wang6495@umn.edu>
> >> ---
> >>  net/sctp/socket.c | 21 ++++++++++-----------
> >>  1 file changed, 10 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> >> index 80835ac..2beb601 100644
> >> --- a/net/sctp/socket.c
> >> +++ b/net/sctp/socket.c
> >> @@ -3212,6 +3212,7 @@ static int sctp_setsockopt_maxseg(struct sock *sk, char __user *optval, unsigned
> >>       struct sctp_af *af = sp->pf->af;
> >>       struct sctp_assoc_value params;
> >>       struct sctp_association *asoc;
> >> +     int min_len, max_len;
> >>       int val;
> >>
> >>       if (optlen == sizeof(int)) {
> >> @@ -3231,19 +3232,15 @@ static int sctp_setsockopt_maxseg(struct sock *sk, char __user *optval, unsigned
> >>               return -EINVAL;
> >>       }
> >>
> >> -     if (val) {
> >> -             int min_len, max_len;
> >> +     min_len = SCTP_DEFAULT_MINSEGMENT - af->net_header_len;
> >> +     min_len -= af->ip_options_len(sk);
> >> +     min_len -= sizeof(struct sctphdr) +
> >> +                sizeof(struct sctp_data_chunk);
> >
> > On which tree did you base your patch on? Your patch lacks a tag so it
> > defaults to net-next, and I reworked this section on current net-next
> > and these MTU calculcations are now handled by sctp_mtu_payload().
> >
> > But even for net tree, I don't understand which issue you're fixing
> > here. Actually it seems to me that both codes seems to do the same
> > thing.
> >
> >>
> >> -             min_len = SCTP_DEFAULT_MINSEGMENT - af->net_header_len;
> >> -             min_len -= af->ip_options_len(sk);
> >> -             min_len -= sizeof(struct sctphdr) +
> >> -                        sizeof(struct sctp_data_chunk);
> >> +     max_len = SCTP_MAX_CHUNK_LEN - sizeof(struct sctp_data_chunk);
> >>
> >> -             max_len = SCTP_MAX_CHUNK_LEN - sizeof(struct sctp_data_chunk);
> >> -
> >> -             if (val < min_len || val > max_len)
> >> -                     return -EINVAL;
> >> -     }
> >> +     if (val && (val < min_len || val > max_len))
> >> +             return -EINVAL;
> >>
> >>       asoc = sctp_id2assoc(sk, params.assoc_id);
> >>       if (asoc) {
> >> @@ -3253,6 +3250,8 @@ static int sctp_setsockopt_maxseg(struct sock *sk, char __user *optval, unsigned
> >>                       val -= sizeof(struct sctphdr) +
> >>                              sctp_datachk_len(&asoc->stream);
> >>               }
> >> +             if (val < min_len || val > max_len)
> >> +                     return -EINVAL;
> >>               asoc->user_frag = val;
> >>               asoc->frag_point = sctp_frag_point(asoc, asoc->pathmtu);
> >>       } else {
> >> --
> >> 2.7.4
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> 

^ permalink raw reply

* Re: [RFC PATCH 4/5] net: macb: Add support for suspend/resume with full power down
From: Harini Katakam @ 2018-05-03 11:20 UTC (permalink / raw)
  To: Claudiu Beznea
  Cc: Nicolas Ferre, David Miller, netdev, linux-kernel, michals,
	appanad
In-Reply-To: <35d980af-b9d5-495e-88af-c4fc911b8429@microchip.com>

Hi Claudiu,

On Thu, May 3, 2018 at 3:39 PM, Claudiu Beznea
<Claudiu.Beznea@microchip.com> wrote:
>
>
> On 22.03.2018 15:51, harinikatakamlinux@gmail.com wrote:
>> From: Harini Katakam <harinik@xilinx.com>
>>
>> When macb device is suspended and system is powered down, the clocks
>> are removed and hence macb should be closed gracefully and restored
>> upon resume.
>
> Is this a power saving mode which shut down the core?

The Ethernet IP is suspended and a majority of the SoC is shut down, yes.

<snip>
>> +             netif_device_detach(netdev);
>> +     } else {
>> +             netif_device_detach(netdev);
>> +             for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
>> +                     napi_disable(&queue->napi);
>> +             phy_stop(netdev->phydev);
>> +             phy_suspend(netdev->phydev);
>> +             spin_lock_irqsave(&bp->lock, flags);
>> +             macb_reset_hw(bp);
>> +             spin_unlock_irqrestore(&bp->lock, flags);
>
> Wouldn't be simple to just call macb_close() here?

<snip>
>
> Wouln't be simpler to call macb_open() here?

No, I think that would be excessive for suspend. This does just
enough to put the IP into suspend and cut off clocks.
For ex., the RX and TX buffers are not freed and allocated again
in this cycle, just the buffer descriptors.

Regards,
Harini

^ permalink raw reply

* Re: Silently dropped UDP packets on kernel 4.14
From: Kristian Evensen @ 2018-05-03 11:19 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: Florian Westphal, Netfilter Development Mailing list,
	Network Development
In-Reply-To: <20180503094241.mfos3scewpct3dnu@unicorn.suse.cz>

Hi Michal,

Thanks for providing a nice summary of your experience when dealing
with this problem. Always nice to know that I am not alone :)

On Thu, May 3, 2018 at 11:42 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> One of the ideas I had was this:
>
>   - keep also unconfirmed conntracks in some data structure
>   - check new packets also against unconfirmed conntracks
>   - if it matches an unconfirmed conntrack, defer its processing
>     until that conntrack is either inserted or discarded

I was thinking about something along the same lines and came to the
same conclusion, it is a lot of hassle and work for a very special
case. I think that replacing the conntrack entry is a good compromise,
it improves on the current situation, and allows for the creation of
"perfect" solutions in user-space. For example, a user can keep track
of seen UDP flows, and then only release new packets belonging to the
same flow when the conntrack entry is created.

BR,
Kristian

^ permalink raw reply

* Re: [RFC PATCH 3/5] net: macb: Add pm runtime support
From: Harini Katakam @ 2018-05-03 11:13 UTC (permalink / raw)
  To: Claudiu Beznea
  Cc: Nicolas Ferre, David Miller, netdev, linux-kernel, michals,
	appanad, Shubhrajyoti Datta
In-Reply-To: <48b8ea24-52be-c314-f674-1a6cae4d95f4@microchip.com>

Hi Claudiu,

On Thu, May 3, 2018 at 3:39 PM, Claudiu Beznea
<Claudiu.Beznea@microchip.com> wrote:
>
>
> On 22.03.2018 15:51, harinikatakamlinux@gmail.com wrote:
>> From: Harini Katakam <harinik@xilinx.com>
<snip>
> I would use a "goto" instruction, e.g.:
>                 value = -ETIMEDOUT;
>                 goto out;
>

Will do

>
> Below, in macb_open() you have a return err; case:
>         err = macb_alloc_consistent(bp);
>         if (err) {
>                 netdev_err(dev, "Unable to allocate DMA memory (error %d)\n",
>                            err);
>                 return err;
>         }
>
> You have to undo pm_runtime_get_sync() with pm_runtime_put_sync() or something
> similar to decrement dev->power.usage_count.

Will do

<snip>
>> @@ -4104,11 +4145,16 @@ static int macb_remove(struct platform_device *pdev)
>>               mdiobus_free(bp->mii_bus);
>>
>>               unregister_netdev(dev);
>> -             clk_disable_unprepare(bp->tx_clk);
>> -             clk_disable_unprepare(bp->hclk);
>> -             clk_disable_unprepare(bp->pclk);
>> -             clk_disable_unprepare(bp->rx_clk);
>> -             clk_disable_unprepare(bp->tsu_clk);
>> +             pm_runtime_disable(&pdev->dev);
>> +             pm_runtime_dont_use_autosuspend(&pdev->dev);
>> +             if (!pm_runtime_suspended(&pdev->dev)) {
>> +                     clk_disable_unprepare(bp->tx_clk);
>> +                     clk_disable_unprepare(bp->hclk);
>> +                     clk_disable_unprepare(bp->pclk);
>> +                     clk_disable_unprepare(bp->rx_clk);
>> +                     clk_disable_unprepare(bp->tsu_clk);
>> +                     pm_runtime_set_suspended(&pdev->dev);
>
> This is driver remove function. Shouldn't clocks be removed?

clk_disable_unprepare IS being done here.
The check for !pm_runtime_suspended is just to make sure the
clocks are not already removed (in runtime_suspend).

Regards,
Harini

^ permalink raw reply

* Re: [PATCH net] tcp: restore autocorking
From: Tariq Toukan @ 2018-05-03 11:06 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Tal Gilboa
  Cc: netdev, Michael Wenig, Eric Dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>



On 03/05/2018 6:25 AM, Eric Dumazet wrote:
> When adding rb-tree for TCP retransmit queue, we inadvertently broke
> TCP autocorking.
> 
> tcp_should_autocork() should really check if the rtx queue is not empty.
> 

Hi Eric,

We are glad to see that the issue that Tal investigated and reported [1] 
is now addressed.
Thanks for doing that!

Tal, let’s perf test to see the effect of this fix.

Best,
Tariq

[1] https://patchwork.ozlabs.org/cover/822218/

^ permalink raw reply

* Re: [RFC PATCH 1/5] net: macb: Check MDIO state before read/write and use timeouts
From: Harini Katakam @ 2018-05-03 10:58 UTC (permalink / raw)
  To: Claudiu Beznea
  Cc: Nicolas Ferre, David Miller, netdev, linux-kernel, michals,
	appanad, Shubhrajyoti Datta
In-Reply-To: <81b5b276-e59c-79d8-1616-79ff0e9c5f17@microchip.com>

Hi Claudiu,

On Thu, May 3, 2018 at 3:38 PM, Claudiu Beznea
<Claudiu.Beznea@microchip.com> wrote:
>
>
> On 22.03.2018 15:51, harinikatakamlinux@gmail.com wrote:
>> From: Harini Katakam <harinik@xilinx.com>
>>
<snip>
>> +     ulong timeout;
>> +
>> +     timeout = jiffies + msecs_to_jiffies(1000);
>> +     /* wait for end of transfer */
>> +     do {
>> +             if (MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
>> +                     break;
>> +
>> +             cpu_relax();
>> +     } while (!time_after_eq(jiffies, timeout));
>> +
>> +     if (time_after_eq(jiffies, timeout)) {
>> +             netdev_err(bp->dev, "wait for end of transfer timed out\n");
>> +             return -ETIMEDOUT;
>> +     }
>
> Wouldn't be cleaner to keep it in this way:
>
>         while (!MACB_BFEXT(IDLE, macb_readl(bp, NSR))) {
>                 if (time_after_eq(jiffies, timeout) {
>                         netdev_err(bp->dev, "wait for end of transfer timed out\n");
>                         return -ETIMEDOUT;
>                 }
>                 cpu_relax();
>         }
>

Thanks for the review.
Sure, will update in next version.

Regards,
Harini

^ permalink raw reply

* Re: [PATCH V2] net/netlink: optimize seq_puts and seq_printf in af_netlink.c
From: YU Bo @ 2018-05-03 10:57 UTC (permalink / raw)
  To: Julia Lawall; +Cc: davem, xiyou.wangcong, yuzibode, netdev, kernel-janitors
In-Reply-To: <alpine.DEB.2.20.1805031142420.3385@hadrien>

Hello,
On Thu, May 03, 2018 at 11:44:33AM +0200, Julia Lawall wrote:
>
>
>On Thu, 3 May 2018, YU Bo wrote:
>
>> Before the patch, the command `cat /proc/net/netlink` will output like:
>>
>> https://clbin.com/BojZv
>>
>> After the patch:
>>
>> https://clbin.com/lnu4L
>>
>> The optimization will make convenience for using `cat /proc/net/netlink`
>> But,The checkpatch will give a warning:
>>
>> WARNING: quoted string split across lines
>
>The interest of the checkpatch warning is that someone may want to grep
>for something that has actually been split over two lines.  If this is not
>an issue in your case and if there are good reasons for splitting the
>string, then you can ignore checkpatch.
Yes, the warning will be generated in original af_netlink.c and i dom't
think to split it is better.

Thank you!
>
>julia
>
>>
>> Signed-off-by: Bo YU <tsu.yubo@gmail.com>
>> ---
>> Changes in v2:
>> Do not break the indentation of the code line
>> ---
>> net/netlink/af_netlink.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
>> index 55342c4d5cec..2e2dd88fc79f 100644
>> --- a/net/netlink/af_netlink.c
>> +++ b/net/netlink/af_netlink.c
>> @@ -2606,13 +2606,13 @@ static int netlink_seq_show(struct seq_file *seq, void
>> *v)
>> {
>> 	if (v == SEQ_START_TOKEN) {
>> 		seq_puts(seq,
>> -			 "sk       Eth Pid    Groups   "
>> -			 "Rmem     Wmem     Dump     Locks     Drops
>> Inode\n");
>> +			 "sk               Eth Pid        Groups   "
>> +			 "Rmem     Wmem     Dump  Locks    Drops    Inode\n");
>> 	} else {
>> 		struct sock *s = v;
>> 		struct netlink_sock *nlk = nlk_sk(s);
>>
>> -		seq_printf(seq, "%pK %-3d %-6u %08x %-8d %-8d %d %-8d %-8d
>> %-8lu\n",
>> +		seq_printf(seq, "%pK %-3d %-10u %08x %-8d %-8d %-5d %-8d %-8d
>> %-8lu\n",
>> 			   s,
>> 			   s->sk_protocol,
>> 			   nlk->portid,
>> --
>> 2.11.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply

* [PATCH net-next] net: bridge: avoid duplicate notification on up/down/change netdev events
From: Nikolay Aleksandrov @ 2018-05-03 10:47 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, stephen, bridge, Nikolay Aleksandrov

While handling netdevice events, br_device_event() sometimes uses
br_stp_(disable|enable)_port which unconditionally send a notification,
but then a second notification for the same event is sent at the end of
the br_device_event() function. To avoid sending duplicate notifications
in such cases, check if one has already been sent (i.e.
br_stp_enable/disable_port have been called).
The patch is based on a change by Satish Ashok.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
We've been running with a similar patch for over an year, it's been
thoroughly tested. Sending for net-next since it's an improvement and
not really a bug fix.

 net/bridge/br.c         | 12 ++++++++----
 net/bridge/br_if.c      | 11 ++++++++---
 net/bridge/br_private.h |  2 +-
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index 671d13c10f6f..2ca035054664 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -34,6 +34,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 	struct net_bridge_port *p;
 	struct net_bridge *br;
+	bool notified = false;
 	bool changed_addr;
 	int err;
 
@@ -67,7 +68,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 		break;
 
 	case NETDEV_CHANGE:
-		br_port_carrier_check(p);
+		br_port_carrier_check(p, &notified);
 		break;
 
 	case NETDEV_FEAT_CHANGE:
@@ -76,8 +77,10 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 
 	case NETDEV_DOWN:
 		spin_lock_bh(&br->lock);
-		if (br->dev->flags & IFF_UP)
+		if (br->dev->flags & IFF_UP) {
 			br_stp_disable_port(p);
+			notified = true;
+		}
 		spin_unlock_bh(&br->lock);
 		break;
 
@@ -85,6 +88,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 		if (netif_running(br->dev) && netif_oper_up(dev)) {
 			spin_lock_bh(&br->lock);
 			br_stp_enable_port(p);
+			notified = true;
 			spin_unlock_bh(&br->lock);
 		}
 		break;
@@ -110,8 +114,8 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 	}
 
 	/* Events that may cause spanning tree to refresh */
-	if (event == NETDEV_CHANGEADDR || event == NETDEV_UP ||
-	    event == NETDEV_CHANGE || event == NETDEV_DOWN)
+	if (!notified && (event == NETDEV_CHANGEADDR || event == NETDEV_UP ||
+			  event == NETDEV_CHANGE || event == NETDEV_DOWN))
 		br_ifinfo_notify(RTM_NEWLINK, NULL, p);
 
 	return NOTIFY_DONE;
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 82c1a6f430b3..e3a8ea1bcbe2 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -64,7 +64,7 @@ static int port_cost(struct net_device *dev)
 
 
 /* Check for port carrier transitions. */
-void br_port_carrier_check(struct net_bridge_port *p)
+void br_port_carrier_check(struct net_bridge_port *p, bool *notified)
 {
 	struct net_device *dev = p->dev;
 	struct net_bridge *br = p->br;
@@ -73,16 +73,21 @@ void br_port_carrier_check(struct net_bridge_port *p)
 	    netif_running(dev) && netif_oper_up(dev))
 		p->path_cost = port_cost(dev);
 
+	*notified = false;
 	if (!netif_running(br->dev))
 		return;
 
 	spin_lock_bh(&br->lock);
 	if (netif_running(dev) && netif_oper_up(dev)) {
-		if (p->state == BR_STATE_DISABLED)
+		if (p->state == BR_STATE_DISABLED) {
 			br_stp_enable_port(p);
+			*notified = true;
+		}
 	} else {
-		if (p->state != BR_STATE_DISABLED)
+		if (p->state != BR_STATE_DISABLED) {
 			br_stp_disable_port(p);
+			*notified = true;
+		}
 	}
 	spin_unlock_bh(&br->lock);
 }
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 1a5093115534..0ddeeea2c6a7 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -573,7 +573,7 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
 	      enum br_pkt_type pkt_type, bool local_rcv, bool local_orig);
 
 /* br_if.c */
-void br_port_carrier_check(struct net_bridge_port *p);
+void br_port_carrier_check(struct net_bridge_port *p, bool *notified);
 int br_add_bridge(struct net *net, const char *name);
 int br_del_bridge(struct net *net, const char *name);
 int br_add_if(struct net_bridge *br, struct net_device *dev,
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next mlxsw 3/3] selftests: forwarding: mirror_gre_nh: Unset RP filter
From: Petr Machata @ 2018-05-03 10:37 UTC (permalink / raw)
  To: netdev, linux-kselftest; +Cc: davem, shuah
In-Reply-To: <cover.1525343276.git.petrm@mellanox.com>

The test fails to work if reverse-path filtering is in effect on the
mirrored-to host interface, or for all interfaces.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 tools/testing/selftests/net/forwarding/mirror_gre_nh.sh | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh b/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
index a0d1ad4..8fa681e 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
@@ -29,6 +29,9 @@ setup_prepare()
 	swp3=${NETIFS[p5]}
 	h3=${NETIFS[p6]}
 
+	sysctl_set net.ipv4.conf.all.rp_filter 0
+	sysctl_set net.ipv4.conf.$h3.rp_filter 0
+
 	vrf_prepare
 	mirror_gre_topo_create
 
@@ -60,6 +63,9 @@ cleanup()
 
 	mirror_gre_topo_destroy
 	vrf_cleanup
+
+	sysctl_restore net.ipv4.conf.$h3.rp_filter
+	sysctl_restore net.ipv4.conf.all.rp_filter
 }
 
 test_gretap()
-- 
2.4.11

^ permalink raw reply related

* [PATCH net-next mlxsw 2/3] selftests: forwarding: Use sysctl_set(), sysctl_restore()
From: Petr Machata @ 2018-05-03 10:37 UTC (permalink / raw)
  To: netdev, linux-kselftest; +Cc: davem, shuah
In-Reply-To: <cover.1525343276.git.petrm@mellanox.com>

Instead of hand-managing the sysctl set and restore, use the wrappers
sysctl_set() and sysctl_restore() to do the bookkeeping automatically.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 tools/testing/selftests/net/forwarding/lib.sh                | 11 ++++-------
 tools/testing/selftests/net/forwarding/mirror_gre_changes.sh |  7 ++-----
 tools/testing/selftests/net/forwarding/router_multipath.sh   | 12 ++++--------
 3 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 426b294..a7a6750 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -420,17 +420,14 @@ sysctl_restore()
 
 forwarding_enable()
 {
-       ipv4_fwd=$(sysctl -n net.ipv4.conf.all.forwarding)
-       ipv6_fwd=$(sysctl -n net.ipv6.conf.all.forwarding)
-
-       sysctl -q -w net.ipv4.conf.all.forwarding=1
-       sysctl -q -w net.ipv6.conf.all.forwarding=1
+	sysctl_set net.ipv4.conf.all.forwarding 1
+	sysctl_set net.ipv6.conf.all.forwarding 1
 }
 
 forwarding_restore()
 {
-       sysctl -q -w net.ipv6.conf.all.forwarding=$ipv6_fwd
-       sysctl -q -w net.ipv4.conf.all.forwarding=$ipv4_fwd
+	sysctl_restore net.ipv6.conf.all.forwarding
+	sysctl_restore net.ipv4.conf.all.forwarding
 }
 
 tc_offload_check()
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh b/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
index fdb612f..50ab346 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
@@ -36,9 +36,7 @@ setup_prepare()
 
 	# This test downs $swp3, which deletes the configured IPv6 address
 	# unless this sysctl is set.
-	local key=net.ipv6.conf.$swp3.keep_addr_on_down
-	SWP3_KEEP_ADDR_ON_DOWN=$(sysctl -n $key)
-	sysctl -qw $key=1
+	sysctl_set net.ipv6.conf.$swp3.keep_addr_on_down 1
 
 	ip address add dev $swp3 192.0.2.129/28
 	ip address add dev $h3 192.0.2.130/28
@@ -57,8 +55,7 @@ cleanup()
 	ip address del dev $h3 192.0.2.130/28
 	ip address del dev $swp3 192.0.2.129/28
 
-	local key=net.ipv6.conf.$swp3.keep_addr_on_down
-	sysctl -qw $key=$SWP3_KEEP_ADDR_ON_DOWN
+	sysctl_restore net.ipv6.conf.$swp3.keep_addr_on_down
 
 	mirror_gre_topo_destroy
 	vrf_cleanup
diff --git a/tools/testing/selftests/net/forwarding/router_multipath.sh b/tools/testing/selftests/net/forwarding/router_multipath.sh
index 6c43762..8b6d0fb 100755
--- a/tools/testing/selftests/net/forwarding/router_multipath.sh
+++ b/tools/testing/selftests/net/forwarding/router_multipath.sh
@@ -205,13 +205,11 @@ multipath4_test()
        local weight_rp13=$3
        local t0_rp12 t0_rp13 t1_rp12 t1_rp13
        local packets_rp12 packets_rp13
-       local hash_policy
 
        # Transmit multiple flows from h1 to h2 and make sure they are
        # distributed between both multipath links (rp12 and rp13)
        # according to the configured weights.
-       hash_policy=$(sysctl -n net.ipv4.fib_multipath_hash_policy)
-       sysctl -q -w net.ipv4.fib_multipath_hash_policy=1
+       sysctl_set net.ipv4.fib_multipath_hash_policy 1
        ip route replace 198.51.100.0/24 vrf vrf-r1 \
                nexthop via 169.254.2.22 dev $rp12 weight $weight_rp12 \
                nexthop via 169.254.3.23 dev $rp13 weight $weight_rp13
@@ -233,7 +231,7 @@ multipath4_test()
        ip route replace 198.51.100.0/24 vrf vrf-r1 \
                nexthop via 169.254.2.22 dev $rp12 \
                nexthop via 169.254.3.23 dev $rp13
-       sysctl -q -w net.ipv4.fib_multipath_hash_policy=$hash_policy
+       sysctl_restore net.ipv4.fib_multipath_hash_policy
 }
 
 multipath6_l4_test()
@@ -243,13 +241,11 @@ multipath6_l4_test()
        local weight_rp13=$3
        local t0_rp12 t0_rp13 t1_rp12 t1_rp13
        local packets_rp12 packets_rp13
-       local hash_policy
 
        # Transmit multiple flows from h1 to h2 and make sure they are
        # distributed between both multipath links (rp12 and rp13)
        # according to the configured weights.
-       hash_policy=$(sysctl -n net.ipv6.fib_multipath_hash_policy)
-       sysctl -q -w net.ipv6.fib_multipath_hash_policy=1
+       sysctl_set net.ipv6.fib_multipath_hash_policy 1
 
        ip route replace 2001:db8:2::/64 vrf vrf-r1 \
 	       nexthop via fe80:2::22 dev $rp12 weight $weight_rp12 \
@@ -272,7 +268,7 @@ multipath6_l4_test()
 	       nexthop via fe80:2::22 dev $rp12 \
 	       nexthop via fe80:3::23 dev $rp13
 
-       sysctl -q -w net.ipv6.fib_multipath_hash_policy=$hash_policy
+       sysctl_restore net.ipv6.fib_multipath_hash_policy
 }
 
 multipath6_test()
-- 
2.4.11

^ permalink raw reply related

* [PATCH net-next mlxsw 1/3] selftests: forwarding: lib: Add sysctl_set(), sysctl_restore()
From: Petr Machata @ 2018-05-03 10:36 UTC (permalink / raw)
  To: netdev, linux-kselftest; +Cc: davem, shuah
In-Reply-To: <cover.1525343276.git.petrm@mellanox.com>

Add two helper functions: sysctl_set() to change the value of a given
sysctl setting, and sysctl_restore() to change it back to what it was.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 tools/testing/selftests/net/forwarding/lib.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 7fe6d27..426b294 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -401,6 +401,23 @@ bridge_ageing_time_get()
 	echo $((ageing_time / 100))
 }
 
+declare -A SYSCTL_ORIG
+sysctl_set()
+{
+	local key=$1; shift
+	local value=$1; shift
+
+	SYSCTL_ORIG[$key]=$(sysctl -n $key)
+	sysctl -qw $key=$value
+}
+
+sysctl_restore()
+{
+	local key=$1; shift
+
+	sysctl -qw $key=${SYSCTL_ORIG["$key"]}
+}
+
 forwarding_enable()
 {
        ipv4_fwd=$(sysctl -n net.ipv4.conf.all.forwarding)
-- 
2.4.11

^ permalink raw reply related

* [PATCH net-next mlxsw 0/3] selftests: forwarding: Updates to sysctl handling
From: Petr Machata @ 2018-05-03 10:36 UTC (permalink / raw)
  To: netdev, linux-kselftest; +Cc: davem, shuah

Some selftests need to adjust sysctl settings. In order to be neutral to
the system that the test is run on, it is a good practice to change back
to the original setting after the test ends. That involves some
boilerplate that can be abstracted away.

In patch #1, introduce two functions, sysctl_set() and sysctl_restore().
The former stores the current value of a given setting, and sets a new
value. The latter restores the setting to the previously-stored value.

In patch #2, use these wrappers in a number of tests.

Additionally in patch #3, fix a problem in mirror_gre_nh.sh, which
neglected to set a sysctl that's crucial for the test to work.

Petr Machata (3):
  selftests: forwarding: lib: Add sysctl_set(), sysctl_restore()
  selftests: forwarding: Use sysctl_set(), sysctl_restore()
  selftests: forwarding: mirror_gre_nh: Unset RP filter

 tools/testing/selftests/net/forwarding/lib.sh      | 28 ++++++++++++++++------
 .../selftests/net/forwarding/mirror_gre_changes.sh |  7 ++----
 .../selftests/net/forwarding/mirror_gre_nh.sh      |  6 +++++
 .../selftests/net/forwarding/router_multipath.sh   | 12 ++++------
 4 files changed, 33 insertions(+), 20 deletions(-)

-- 
2.4.11

^ permalink raw reply

* Re: [RFC PATCH 4/5] net: macb: Add support for suspend/resume with full power down
From: Claudiu Beznea @ 2018-05-03 10:09 UTC (permalink / raw)
  To: harinikatakamlinux, nicolas.ferre, davem
  Cc: netdev, linux-kernel, harinik, michals, appanad
In-Reply-To: <1521726700-22634-5-git-send-email-harinikatakamlinux@gmail.com>



On 22.03.2018 15:51, harinikatakamlinux@gmail.com wrote:
> From: Harini Katakam <harinik@xilinx.com>
> 
> When macb device is suspended and system is powered down, the clocks
> are removed and hence macb should be closed gracefully and restored
> upon resume. 

Is this a power saving mode which shut down the core?

This patch does the same by switching off the net device,
> suspending phy and performing necessary cleanup of interrupts and BDs.
> Upon resume, all these are reinitialized again.
> 
> Reset of macb device is done only when GEM is not a wake device.
> Even when gem is a wake device, tx queues can be stopped and ptp device
> can be closed (tsu clock will be disabled in pm_runtime_suspend) as
> wake event detection has no dependency on this.
> 
> Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
> Signed-off-by: Harini Katakam <harinik@xilinx.com>
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 38 ++++++++++++++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index ce75088..bca91bd 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -4167,16 +4167,33 @@ static int __maybe_unused macb_suspend(struct device *dev)
>  	struct platform_device *pdev = to_platform_device(dev);
>  	struct net_device *netdev = platform_get_drvdata(pdev);
>  	struct macb *bp = netdev_priv(netdev);
> +	struct macb_queue *queue = bp->queues;
> +	unsigned long flags;
> +	unsigned int q;
> +
> +	if (!netif_running(netdev))
> +		return 0;
>  
> -	netif_carrier_off(netdev);
> -	netif_device_detach(netdev);
>  
>  	if (bp->wol & MACB_WOL_ENABLED) {
>  		macb_writel(bp, IER, MACB_BIT(WOL));
>  		macb_writel(bp, WOL, MACB_BIT(MAG));
>  		enable_irq_wake(bp->queues[0].irq);
> +		netif_device_detach(netdev);
> +	} else {
> +		netif_device_detach(netdev);
> +		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
> +			napi_disable(&queue->napi);
> +		phy_stop(netdev->phydev);
> +		phy_suspend(netdev->phydev);
> +		spin_lock_irqsave(&bp->lock, flags);
> +		macb_reset_hw(bp);
> +		spin_unlock_irqrestore(&bp->lock, flags);

Wouldn't be simple to just call macb_close() here?

>  	}
>  
> +	netif_carrier_off(netdev);
> +	if (bp->ptp_info)
> +		bp->ptp_info->ptp_remove(netdev);
>  	pm_runtime_force_suspend(dev);
>  
>  	return 0;
> @@ -4187,6 +4204,11 @@ static int __maybe_unused macb_resume(struct device *dev)
>  	struct platform_device *pdev = to_platform_device(dev);
>  	struct net_device *netdev = platform_get_drvdata(pdev);
>  	struct macb *bp = netdev_priv(netdev);
> +	struct macb_queue *queue = bp->queues;
> +	unsigned int q;
> +
> +	if (!netif_running(netdev))
> +		return 0;
>  
>  	pm_runtime_force_resume(dev);
>  
> @@ -4194,9 +4216,21 @@ static int __maybe_unused macb_resume(struct device *dev)
>  		macb_writel(bp, IDR, MACB_BIT(WOL));
>  		macb_writel(bp, WOL, 0);
>  		disable_irq_wake(bp->queues[0].irq);
> +	} else {
> +		macb_writel(bp, NCR, MACB_BIT(MPE));
> +		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
> +			napi_enable(&queue->napi);
> +		netif_carrier_on(netdev);
> +		phy_resume(netdev->phydev);
> +		phy_start(netdev->phydev);
>  	}
>  
> +	bp->macbgem_ops.mog_init_rings(bp);
> +	macb_init_hw(bp);
> +	macb_set_rx_mode(netdev);
>  	netif_device_attach(netdev);
> +	if (bp->ptp_info)
> +		bp->ptp_info->ptp_init(netdev);

Wouln't be simpler to call macb_open() here?

>  
>  	return 0;
>  }
> 

^ permalink raw reply

* Re: [RFC PATCH 3/5] net: macb: Add pm runtime support
From: Claudiu Beznea @ 2018-05-03 10:09 UTC (permalink / raw)
  To: harinikatakamlinux, nicolas.ferre, davem
  Cc: netdev, linux-kernel, harinik, michals, appanad,
	Shubhrajyoti Datta
In-Reply-To: <1521726700-22634-4-git-send-email-harinikatakamlinux@gmail.com>



On 22.03.2018 15:51, harinikatakamlinux@gmail.com wrote:
> From: Harini Katakam <harinik@xilinx.com>
> 
> Add runtime pm functions and move clock handling there.
> Enable clocks in mdio read/write functions.
> 
> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
> Signed-off-by: Harini Katakam <harinik@xilinx.com>
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 105 ++++++++++++++++++++++++++-----
>  1 file changed, 90 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index ae61927..ce75088 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -35,6 +35,7 @@
>  #include <linux/ip.h>
>  #include <linux/udp.h>
>  #include <linux/tcp.h>
> +#include <linux/pm_runtime.h>
>  #include "macb.h"
>  
>  #define MACB_RX_BUFFER_SIZE	128
> @@ -77,6 +78,7 @@
>   * 1 frame time (10 Mbits/s, full-duplex, ignoring collisions)
>   */
>  #define MACB_HALT_TIMEOUT	1230
> +#define MACB_PM_TIMEOUT  100 /* ms */
>  
>  /* DMA buffer descriptor might be different size
>   * depends on hardware configuration:
> @@ -321,8 +323,13 @@ static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>  {
>  	struct macb *bp = bus->priv;
>  	int value;
> +	int err;
>  	ulong timeout;
>  
> +	err = pm_runtime_get_sync(&bp->pdev->dev);
> +	if (err < 0)

You have to call pm_runtime_put_noidle() or something similar to decrement the
dev->power.usage_count.

> +		return err;
> +
>  	timeout = jiffies + msecs_to_jiffies(1000);
>  	/* wait for end of transfer */
>  	do {
> @@ -334,6 +341,8 @@ static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>  
>  	if (time_after_eq(jiffies, timeout)) {
>  		netdev_err(bp->dev, "wait for end of transfer timed out\n");

For this:

> +		pm_runtime_mark_last_busy(&bp->pdev->dev);
> +		pm_runtime_put_autosuspend(&bp->pdev->dev);>  		return -ETIMEDOUT;
>  	}
>  
> @@ -354,11 +363,15 @@ static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>  
>  	if (time_after_eq(jiffies, timeout)) {
>  		netdev_err(bp->dev, "wait for end of transfer timed out\n");

And this:

> +		pm_runtime_mark_last_busy(&bp->pdev->dev);
> +		pm_runtime_put_autosuspend(&bp->pdev->dev);
>  		return -ETIMEDOUT;

I would use a "goto" instruction, e.g.:
  		value = -ETIMEDOUT;
		goto out;

>  	}
>  
>  	value = MACB_BFEXT(DATA, macb_readl(bp, MAN));
>  

out:
> +	pm_runtime_mark_last_busy(&bp->pdev->dev);
> +	pm_runtime_put_autosuspend(&bp->pdev->dev);
>  	return value;
>  }
>  
> @@ -366,8 +379,13 @@ static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>  			   u16 value)
>  {
>  	struct macb *bp = bus->priv;
> +	int err;
>  	ulong timeout;
>  
> +	err = pm_runtime_get_sync(&bp->pdev->dev);> +	if (err < 0)

Ditto

> +		return err;
> +
>  	timeout = jiffies + msecs_to_jiffies(1000);
>  	/* wait for end of transfer */
>  	do {
> @@ -379,6 +397,8 @@ static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>  
>  	if (time_after_eq(jiffies, timeout)) {
>  		netdev_err(bp->dev, "wait for end of transfer timed out\n");

Ditto

> +		pm_runtime_mark_last_busy(&bp->pdev->dev);
> +		pm_runtime_put_autosuspend(&bp->pdev->dev);
>  		return -ETIMEDOUT;
>  	}
>  
> @@ -400,9 +420,13 @@ static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>  
>  	if (time_after_eq(jiffies, timeout)) {
>  		netdev_err(bp->dev, "wait for end of transfer timed out\n");
> +		pm_runtime_mark_last_busy(&bp->pdev->dev);
> +		pm_runtime_put_autosuspend(&bp->pdev->dev);

Ditto

>  		return -ETIMEDOUT;
>  	}
>  
> +	pm_runtime_mark_last_busy(&bp->pdev->dev);
> +	pm_runtime_put_autosuspend(&bp->pdev->dev);
>  	return 0;
>  }
>  
> @@ -2338,6 +2362,10 @@ static int macb_open(struct net_device *dev)
>  
>  	netdev_dbg(bp->dev, "open\n");
>  
> +	err = pm_runtime_get_sync(&bp->pdev->dev);
> +	if (err < 0)

Ditto

> +		return err;
> +

Below, in macb_open() you have a return err; case:
	err = macb_alloc_consistent(bp);
	if (err) {
		netdev_err(dev, "Unable to allocate DMA memory (error %d)\n",
			   err);
		return err;
	}

You have to undo pm_runtime_get_sync() with pm_runtime_put_sync() or something
similar to decrement dev->power.usage_count.
	
>  	/* carrier starts down */
>  	netif_carrier_off(dev);
>  
> @@ -2397,6 +2425,8 @@ static int macb_close(struct net_device *dev)
>  	if (bp->ptp_info)
>  		bp->ptp_info->ptp_remove(dev);
>  
> +	pm_runtime_put(&bp->pdev->dev);
> +
>  	return 0;
>  }
>  
> @@ -3949,6 +3979,11 @@ static int macb_probe(struct platform_device *pdev)
>  	if (err)
>  		return err;
>  
> +	pm_runtime_set_autosuspend_delay(&pdev->dev, MACB_PM_TIMEOUT);
> +	pm_runtime_use_autosuspend(&pdev->dev);
> +	pm_runtime_get_noresume(&pdev->dev);
> +	pm_runtime_set_active(&pdev->dev);
> +	pm_runtime_enable(&pdev->dev);
>  	native_io = hw_is_native_io(mem);
>  
>  	macb_probe_queues(mem, native_io, &queue_mask, &num_queues);
> @@ -4062,6 +4097,9 @@ static int macb_probe(struct platform_device *pdev)
>  		    macb_is_gem(bp) ? "GEM" : "MACB", macb_readl(bp, MID),
>  		    dev->base_addr, dev->irq, dev->dev_addr);
>  
> +	pm_runtime_mark_last_busy(&bp->pdev->dev);
> +	pm_runtime_put_autosuspend(&bp->pdev->dev);
> +
>  	return 0;
>  
>  err_out_unregister_mdio:
> @@ -4081,6 +4119,9 @@ static int macb_probe(struct platform_device *pdev)
>  	clk_disable_unprepare(pclk);
>  	clk_disable_unprepare(rx_clk);
>  	clk_disable_unprepare(tsu_clk);
> +	pm_runtime_disable(&pdev->dev);
> +	pm_runtime_set_suspended(&pdev->dev);
> +	pm_runtime_dont_use_autosuspend(&pdev->dev);
>  
>  	return err;
>  }
> @@ -4104,11 +4145,16 @@ static int macb_remove(struct platform_device *pdev)
>  		mdiobus_free(bp->mii_bus);
>  
>  		unregister_netdev(dev);
> -		clk_disable_unprepare(bp->tx_clk);
> -		clk_disable_unprepare(bp->hclk);
> -		clk_disable_unprepare(bp->pclk);
> -		clk_disable_unprepare(bp->rx_clk);
> -		clk_disable_unprepare(bp->tsu_clk);
> +		pm_runtime_disable(&pdev->dev);
> +		pm_runtime_dont_use_autosuspend(&pdev->dev);
> +		if (!pm_runtime_suspended(&pdev->dev)) {
> +			clk_disable_unprepare(bp->tx_clk);
> +			clk_disable_unprepare(bp->hclk);
> +			clk_disable_unprepare(bp->pclk);
> +			clk_disable_unprepare(bp->rx_clk);
> +			clk_disable_unprepare(bp->tsu_clk);
> +			pm_runtime_set_suspended(&pdev->dev);

This is driver remove function. Shouldn't clocks be removed?

> +		}>  		of_node_put(bp->phy_node);
>  		free_netdev(dev);
>  	}
> @@ -4129,13 +4175,9 @@ static int __maybe_unused macb_suspend(struct device *dev)
>  		macb_writel(bp, IER, MACB_BIT(WOL));
>  		macb_writel(bp, WOL, MACB_BIT(MAG));
>  		enable_irq_wake(bp->queues[0].irq);
> -	} else {
> -		clk_disable_unprepare(bp->tx_clk);
> -		clk_disable_unprepare(bp->hclk);
> -		clk_disable_unprepare(bp->pclk);
> -		clk_disable_unprepare(bp->rx_clk);
>  	}
> -	clk_disable_unprepare(bp->tsu_clk);
> +
> +	pm_runtime_force_suspend(dev);
>  
>  	return 0;
>  }
> @@ -4146,11 +4188,43 @@ static int __maybe_unused macb_resume(struct device *dev)
>  	struct net_device *netdev = platform_get_drvdata(pdev);
>  	struct macb *bp = netdev_priv(netdev);
>  
> +	pm_runtime_force_resume(dev);
> +
>  	if (bp->wol & MACB_WOL_ENABLED) {
>  		macb_writel(bp, IDR, MACB_BIT(WOL));
>  		macb_writel(bp, WOL, 0);
>  		disable_irq_wake(bp->queues[0].irq);
> -	} else {
> +	}
> +
> +	netif_device_attach(netdev);
> +
> +	return 0;
> +}
> +
> +static int __maybe_unused macb_runtime_suspend(struct device *dev)
> +{
> +	struct platform_device *pdev = to_platform_device(dev);
> +	struct net_device *netdev = platform_get_drvdata(pdev);
> +	struct macb *bp = netdev_priv(netdev);
> +
> +	if (!(device_may_wakeup(&bp->dev->dev))) {
> +		clk_disable_unprepare(bp->tx_clk);
> +		clk_disable_unprepare(bp->hclk);
> +		clk_disable_unprepare(bp->pclk);
> +		clk_disable_unprepare(bp->rx_clk);
> +	}
> +	clk_disable_unprepare(bp->tsu_clk);
> +
> +	return 0;
> +}
> +
> +static int __maybe_unused macb_runtime_resume(struct device *dev)
> +{
> +	struct platform_device *pdev = to_platform_device(dev);
> +	struct net_device *netdev = platform_get_drvdata(pdev);
> +	struct macb *bp = netdev_priv(netdev);
> +
> +	if (!(device_may_wakeup(&bp->dev->dev))) {
>  		clk_prepare_enable(bp->pclk);
>  		clk_prepare_enable(bp->hclk);
>  		clk_prepare_enable(bp->tx_clk);
> @@ -4158,12 +4232,13 @@ static int __maybe_unused macb_resume(struct device *dev)
>  	}
>  	clk_prepare_enable(bp->tsu_clk);
>  
> -	netif_device_attach(netdev);
> -
>  	return 0;
>  }
>  
> -static SIMPLE_DEV_PM_OPS(macb_pm_ops, macb_suspend, macb_resume);
> +static const struct dev_pm_ops macb_pm_ops = {
> +	SET_SYSTEM_SLEEP_PM_OPS(macb_suspend, macb_resume)
> +	SET_RUNTIME_PM_OPS(macb_runtime_suspend, macb_runtime_resume, NULL)
> +};
>  
>  static struct platform_driver macb_driver = {
>  	.probe		= macb_probe,
> 

^ permalink raw reply

* Re: [RFC PATCH 1/5] net: macb: Check MDIO state before read/write and use timeouts
From: Claudiu Beznea @ 2018-05-03 10:08 UTC (permalink / raw)
  To: harinikatakamlinux, nicolas.ferre, davem
  Cc: netdev, linux-kernel, harinik, michals, appanad,
	Shubhrajyoti Datta
In-Reply-To: <1521726700-22634-2-git-send-email-harinikatakamlinux@gmail.com>



On 22.03.2018 15:51, harinikatakamlinux@gmail.com wrote:
> From: Harini Katakam <harinik@xilinx.com>
> 
> Replace the while loop in MDIO read/write functions with a timeout.
> In addition, add a check for MDIO bus busy before initiating a new
> operation as well to make sure there is no ongoing MDIO operation.
> 
> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
> Signed-off-by: Harini Katakam <harinik@xilinx.com>
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 54 ++++++++++++++++++++++++++++++--
>  1 file changed, 52 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index d09bd43..f4030c1 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -321,6 +321,21 @@ static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>  {
>  	struct macb *bp = bus->priv;
>  	int value;
> +	ulong timeout;
> +
> +	timeout = jiffies + msecs_to_jiffies(1000);
> +	/* wait for end of transfer */
> +	do {
> +		if (MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
> +			break;
> +
> +		cpu_relax();
> +	} while (!time_after_eq(jiffies, timeout));
> +
> +	if (time_after_eq(jiffies, timeout)) {
> +		netdev_err(bp->dev, "wait for end of transfer timed out\n");
> +		return -ETIMEDOUT;
> +	}

Wouldn't be cleaner to keep it in this way:

	while (!MACB_BFEXT(IDLE, macb_readl(bp, NSR))) {
		if (time_after_eq(jiffies, timeout) {
			netdev_err(bp->dev, "wait for end of transfer timed out\n");
			return -ETIMEDOUT;
		}
		cpu_relax();
	}

>  
>  	macb_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF)
>  			      | MACB_BF(RW, MACB_MAN_READ)
> @@ -328,9 +343,19 @@ static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>  			      | MACB_BF(REGA, regnum)
>  			      | MACB_BF(CODE, MACB_MAN_CODE)));
>  
> +	timeout = jiffies + msecs_to_jiffies(1000);
>  	/* wait for end of transfer */
> -	while (!MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
> +	do {
> +		if (MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
> +			break;
> +
>  		cpu_relax();
> +	} while (!time_after_eq(jiffies, timeout));
> +
> +	if (time_after_eq(jiffies, timeout)) {
> +		netdev_err(bp->dev, "wait for end of transfer timed out\n");
> +		return -ETIMEDOUT;
> +	}
>  
>  	value = MACB_BFEXT(DATA, macb_readl(bp, MAN));
>  
> @@ -341,6 +366,21 @@ static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>  			   u16 value)
>  {
>  	struct macb *bp = bus->priv;
> +	ulong timeout;
> +
> +	timeout = jiffies + msecs_to_jiffies(1000);
> +	/* wait for end of transfer */
> +	do {
> +		if (MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
> +			break;
> +
> +		cpu_relax();
> +	} while (!time_after_eq(jiffies, timeout));
> +
> +	if (time_after_eq(jiffies, timeout)) {
> +		netdev_err(bp->dev, "wait for end of transfer timed out\n");
> +		return -ETIMEDOUT;
> +	}
>  
>  	macb_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF)
>  			      | MACB_BF(RW, MACB_MAN_WRITE)
> @@ -349,9 +389,19 @@ static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>  			      | MACB_BF(CODE, MACB_MAN_CODE)
>  			      | MACB_BF(DATA, value)));
>  
> +	timeout = jiffies + msecs_to_jiffies(1000);
>  	/* wait for end of transfer */
> -	while (!MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
> +	do {
> +		if (MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
> +			break;
> +
>  		cpu_relax();
> +	} while (!time_after_eq(jiffies, timeout));
> +
> +	if (time_after_eq(jiffies, timeout)) {
> +		netdev_err(bp->dev, "wait for end of transfer timed out\n");
> +		return -ETIMEDOUT;
> +	}
>  
>  	return 0;
>  }
> 

^ permalink raw reply

* Re: [PATCH V2] net/netlink: optimize seq_puts and seq_printf in af_netlink.c
From: Julia Lawall @ 2018-05-03  9:44 UTC (permalink / raw)
  To: YU Bo; +Cc: davem, xiyou.wangcong, yuzibode, netdev, kernel-janitors
In-Reply-To: <20180503090901.35bxgzs2tjsl7bqr@debian>



On Thu, 3 May 2018, YU Bo wrote:

> Before the patch, the command `cat /proc/net/netlink` will output like:
>
> https://clbin.com/BojZv
>
> After the patch:
>
> https://clbin.com/lnu4L
>
> The optimization will make convenience for using `cat /proc/net/netlink`
> But,The checkpatch will give a warning:
>
> WARNING: quoted string split across lines

The interest of the checkpatch warning is that someone may want to grep
for something that has actually been split over two lines.  If this is not
an issue in your case and if there are good reasons for splitting the
string, then you can ignore checkpatch.

julia

>
> Signed-off-by: Bo YU <tsu.yubo@gmail.com>
> ---
> Changes in v2:
> Do not break the indentation of the code line
> ---
> net/netlink/af_netlink.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 55342c4d5cec..2e2dd88fc79f 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2606,13 +2606,13 @@ static int netlink_seq_show(struct seq_file *seq, void
> *v)
> {
> 	if (v == SEQ_START_TOKEN) {
> 		seq_puts(seq,
> -			 "sk       Eth Pid    Groups   "
> -			 "Rmem     Wmem     Dump     Locks     Drops
> Inode\n");
> +			 "sk               Eth Pid        Groups   "
> +			 "Rmem     Wmem     Dump  Locks    Drops    Inode\n");
> 	} else {
> 		struct sock *s = v;
> 		struct netlink_sock *nlk = nlk_sk(s);
>
> -		seq_printf(seq, "%pK %-3d %-6u %08x %-8d %-8d %d %-8d %-8d
> %-8lu\n",
> +		seq_printf(seq, "%pK %-3d %-10u %08x %-8d %-8d %-5d %-8d %-8d
> %-8lu\n",
> 			   s,
> 			   s->sk_protocol,
> 			   nlk->portid,
> --
> 2.11.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: Silently dropped UDP packets on kernel 4.14
From: Michal Kubecek @ 2018-05-03  9:42 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Kristian Evensen, Netfilter Development Mailing list,
	Network Development
In-Reply-To: <20180503050345.iyasach2ogf25dt3@breakpoint.cc>

On Thu, May 03, 2018 at 07:03:45AM +0200, Florian Westphal wrote:
> Kristian Evensen <kristian.evensen@gmail.com> wrote:
> > I went for the early-insert approached and have patched
> 
> I'm sorry for suggesting that.
> 
> It doesn't work, because of NAT.
> NAT rewrites packet content and changes the reply tuple, but the tuples
> determine the hash insertion location.
> 
> I don't know how to solve this problem.

It's an old problem which surfaces from time to time when some special
conditions make it more visible. When I was facing it in 2015, I found
this thread from as early as 2009:

  https://www.spinics.net/lists/linux-net/msg16712.html

In our case, the customer was using IPVS in "one packet scheduling" mode
(it drops the conntrack entry after each packet) which increased the
probability of insert collisions significantly. Using NFQUEUE 

We were lucky, though, as it turned out the only reason why customer
needed connection tracking was to make sure fragments of long UDP
datagrams are not sent to different real servers. For newer kernels
after commit 6aafeef03b9d ("netfilter: push reasm skb through instead of
original frag skbs"), this was no longer necessary so that they could
disable connection tracking for these packets.

For older kernels without this change, I tried several ideas, each of
which didn't work for some reason. We ended up with rather hacky
workaround, not dropping the packet on collision (so that its conntrack
wasn't inserted into the table and was dropped once the packet was
sent). It worked fine for our customer but like the early insert
approach, it wouldn't work with NAT.

One of the ideas I had was this:

  - keep also unconfirmed conntracks in some data structure
  - check new packets also against unconfirmed conntracks
  - if it matches an unconfirmed conntrack, defer its processing
    until that conntrack is either inserted or discarded

But as it would be rather complicated to implement without races and
harming performance, I didn't want to actually try it until I would
run out of other ideas. With NAT coming to the play, there doesn't seem
to be many other options.

Michal Kubecek

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox