Netdev List
 help / color / mirror / Atom feed
* Optimizing kernel compilation / alignments for network performance
From: Rafał Miłecki @ 2022-04-27 12:04 UTC (permalink / raw)
  To: Network Development, linux-arm-kernel, Russell King, Andrew Lunn,
	Felix Fietkau
  Cc: openwrt-devel@lists.openwrt.org, Florian Fainelli

Hi,

I noticed years ago that kernel changes touching code - that I don't use
at all - can affect network performance for me.

I work with home routers based on Broadcom Northstar platform. Those
are SoCs with not-so-powerful 2 x ARM Cortex-A9 CPU cores. Main task of
those devices is NAT masquerade and that is what I test with iperf
running on two x86 machines.

***

Example of such unused code change:
ce5013ff3bec ("mtd: spi-nor: Add support for XM25QH64A and XM25QH128A").
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce5013ff3bec05cf2a8a05c75fcd520d9914d92b
It lowered my NAT speed from 381 Mb/s to 367 Mb/s (-3,5%).

I first reported that issue it in the e-mail thread:
ARM router NAT performance affected by random/unrelated commits
https://lkml.org/lkml/2019/5/21/349
https://www.spinics.net/lists/linux-block/msg40624.html

Back then it was commit 5b0890a97204 ("flow_dissector: Parse batman-adv
unicast headers")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9316a9ed6895c4ad2f0cde171d486f80c55d8283
that increased my NAT speed from 741 Mb/s to 773 Mb/s (+4,3%).

***

It appears Northstar CPUs have little cache size and so any change in
location of kernel symbols can affect NAT performance. That explains why
changing unrelated code affects anything & it has been partially proven
aligning some of cache-v7.S code.

My question is: is there a way to find out & force an optimal symbols
locations?

Adding .align 5 to the cache-v7.S is a partial success. I'd like to find
out what other functions are worth optimizing (aligning) and force that
(I guess  __attribute__((aligned(32))) could be used).

I can't really draw any conclusions from comparing System.map before and
after above commits as they relocate thousands of symbols in one go.

Optimizing is pretty important for me for two reasons:
1. I want to reach maximum possible NAT masquerade performance
2. I need stable performance across random commits to detect regressions

^ permalink raw reply

* [PATCH net-next] net: prestera: add police action support
From: Volodymyr Mytnyk @ 2022-04-27 12:05 UTC (permalink / raw)
  To: netdev
  Cc: Taras Chornyi, Serhiy Pshyk, Volodymyr Mytnyk, Taras Chornyi,
	David S. Miller, Jakub Kicinski, Paolo Abeni, linux-kernel

- Add HW api to configure policer:
  - SR TCM policer mode is only supported for now.
  - Policer ingress/egress direction support.
- Add police action support into flower

Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
---
 .../net/ethernet/marvell/prestera/prestera_acl.c   | 35 +++++++++-
 .../net/ethernet/marvell/prestera/prestera_acl.h   | 12 ++++
 .../ethernet/marvell/prestera/prestera_flower.c    | 10 +++
 .../net/ethernet/marvell/prestera/prestera_hw.c    | 81 ++++++++++++++++++++++
 .../net/ethernet/marvell/prestera/prestera_hw.h    | 13 ++++
 5 files changed, 149 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/prestera/prestera_acl.c b/drivers/net/ethernet/marvell/prestera/prestera_acl.c
index e5627782fac6..3a141f2db812 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_acl.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_acl.c
@@ -35,6 +35,10 @@ struct prestera_acl_rule_entry {
 			u8 valid:1;
 		} accept, drop, trap;
 		struct {
+			u8 valid:1;
+			struct prestera_acl_action_police i;
+		} police;
+		struct {
 			struct prestera_acl_action_jump i;
 			u8 valid:1;
 		} jump;
@@ -533,6 +537,12 @@ static int __prestera_acl_rule_entry2hw_add(struct prestera_switch *sw,
 		act_hw[act_num].id = PRESTERA_ACL_RULE_ACTION_TRAP;
 		act_num++;
 	}
+	/* police */
+	if (e->police.valid) {
+		act_hw[act_num].id = PRESTERA_ACL_RULE_ACTION_POLICE;
+		act_hw[act_num].police = e->police.i;
+		act_num++;
+	}
 	/* jump */
 	if (e->jump.valid) {
 		act_hw[act_num].id = PRESTERA_ACL_RULE_ACTION_JUMP;
@@ -557,6 +567,9 @@ __prestera_acl_rule_entry_act_destruct(struct prestera_switch *sw,
 {
 	/* counter */
 	prestera_counter_put(sw->counter, e->counter.block, e->counter.id);
+	/* police */
+	if (e->police.valid)
+		prestera_hw_policer_release(sw, e->police.i.id);
 }
 
 void prestera_acl_rule_entry_destroy(struct prestera_acl *acl,
@@ -579,6 +592,8 @@ __prestera_acl_rule_entry_act_construct(struct prestera_switch *sw,
 					struct prestera_acl_rule_entry *e,
 					struct prestera_acl_rule_entry_arg *arg)
 {
+	int err;
+
 	/* accept */
 	e->accept.valid = arg->accept.valid;
 	/* drop */
@@ -588,10 +603,26 @@ __prestera_acl_rule_entry_act_construct(struct prestera_switch *sw,
 	/* jump */
 	e->jump.valid = arg->jump.valid;
 	e->jump.i = arg->jump.i;
+	/* police */
+	if (arg->police.valid) {
+		u8 type = arg->police.ingress ? PRESTERA_POLICER_TYPE_INGRESS :
+						PRESTERA_POLICER_TYPE_EGRESS;
+
+		err = prestera_hw_policer_create(sw, type, &e->police.i.id);
+		if (err)
+			goto err_out;
+
+		err = prestera_hw_policer_sr_tcm_set(sw, e->police.i.id,
+						     arg->police.rate,
+						     arg->police.burst);
+		if (err) {
+			prestera_hw_policer_release(sw, e->police.i.id);
+			goto err_out;
+		}
+		e->police.valid = arg->police.valid;
+	}
 	/* counter */
 	if (arg->count.valid) {
-		int err;
-
 		err = prestera_counter_get(sw->counter, arg->count.client,
 					   &e->counter.block,
 					   &e->counter.id);
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_acl.h b/drivers/net/ethernet/marvell/prestera/prestera_acl.h
index 6d2ad27682d1..f963e1e0c0f0 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_acl.h
+++ b/drivers/net/ethernet/marvell/prestera/prestera_acl.h
@@ -56,6 +56,7 @@ enum prestera_acl_rule_action {
 	PRESTERA_ACL_RULE_ACTION_TRAP = 2,
 	PRESTERA_ACL_RULE_ACTION_JUMP = 5,
 	PRESTERA_ACL_RULE_ACTION_COUNT = 7,
+	PRESTERA_ACL_RULE_ACTION_POLICE = 8,
 
 	PRESTERA_ACL_RULE_ACTION_MAX
 };
@@ -74,6 +75,10 @@ struct prestera_acl_action_jump {
 	u32 index;
 };
 
+struct prestera_acl_action_police {
+	u32 id;
+};
+
 struct prestera_acl_action_count {
 	u32 id;
 };
@@ -86,6 +91,7 @@ struct prestera_acl_rule_entry_key {
 struct prestera_acl_hw_action_info {
 	enum prestera_acl_rule_action id;
 	union {
+		struct prestera_acl_action_police police;
 		struct prestera_acl_action_count count;
 		struct prestera_acl_action_jump jump;
 	};
@@ -107,6 +113,12 @@ struct prestera_acl_rule_entry_arg {
 		} jump;
 		struct {
 			u8 valid:1;
+			u64 rate;
+			u64 burst;
+			bool ingress;
+		} police;
+		struct {
+			u8 valid:1;
 			u32 client;
 		} count;
 	};
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_flower.c b/drivers/net/ethernet/marvell/prestera/prestera_flower.c
index c12b09ac6559..d43e503c644f 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_flower.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_flower.c
@@ -108,6 +108,16 @@ static int prestera_flower_parse_actions(struct prestera_flow_block *block,
 
 			rule->re_arg.trap.valid = 1;
 			break;
+		case FLOW_ACTION_POLICE:
+			if (rule->re_arg.police.valid)
+				return -EEXIST;
+
+			rule->re_arg.police.valid = 1;
+			rule->re_arg.police.rate =
+				act->police.rate_bytes_ps;
+			rule->re_arg.police.burst = act->police.burst;
+			rule->re_arg.police.ingress = true;
+			break;
 		case FLOW_ACTION_GOTO:
 			err = prestera_flower_parse_goto_action(block, rule,
 								chain_index,
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_hw.c b/drivers/net/ethernet/marvell/prestera/prestera_hw.c
index c66cc929c820..79fd3cac539d 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_hw.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_hw.c
@@ -74,6 +74,10 @@ enum prestera_cmd_type_t {
 	PRESTERA_CMD_TYPE_SPAN_UNBIND = 0x1102,
 	PRESTERA_CMD_TYPE_SPAN_RELEASE = 0x1103,
 
+	PRESTERA_CMD_TYPE_POLICER_CREATE = 0x1500,
+	PRESTERA_CMD_TYPE_POLICER_RELEASE = 0x1501,
+	PRESTERA_CMD_TYPE_POLICER_SET = 0x1502,
+
 	PRESTERA_CMD_TYPE_CPU_CODE_COUNTERS_GET = 0x2000,
 
 	PRESTERA_CMD_TYPE_ACK = 0x10000,
@@ -164,6 +168,10 @@ enum {
 };
 
 enum {
+	PRESTERA_POLICER_MODE_SR_TCM
+};
+
+enum {
 	PRESTERA_HW_FDB_ENTRY_TYPE_REG_PORT = 0,
 	PRESTERA_HW_FDB_ENTRY_TYPE_LAG = 1,
 	PRESTERA_HW_FDB_ENTRY_TYPE_MAX = 2,
@@ -430,6 +438,9 @@ struct prestera_msg_acl_action {
 		} jump;
 		struct {
 			__le32 id;
+		} police;
+		struct {
+			__le32 id;
 		} count;
 		__le32 reserved[6];
 	};
@@ -570,6 +581,26 @@ struct mvsw_msg_cpu_code_counter_ret {
 	__le64 packet_count;
 };
 
+struct prestera_msg_policer_req {
+	struct prestera_msg_cmd cmd;
+	__le32 id;
+	union {
+		struct {
+			__le64 cir;
+			__le32 cbs;
+		} __packed sr_tcm; /* make sure always 12 bytes size */
+		__le32 reserved[6];
+	};
+	u8 mode;
+	u8 type;
+	u8 pad[2];
+};
+
+struct prestera_msg_policer_resp {
+	struct prestera_msg_ret ret;
+	__le32 id;
+};
+
 struct prestera_msg_event {
 	__le16 type;
 	__le16 id;
@@ -622,6 +653,7 @@ static void prestera_hw_build_tests(void)
 	BUILD_BUG_ON(sizeof(struct prestera_msg_rif_req) != 36);
 	BUILD_BUG_ON(sizeof(struct prestera_msg_vr_req) != 8);
 	BUILD_BUG_ON(sizeof(struct prestera_msg_lpm_req) != 36);
+	BUILD_BUG_ON(sizeof(struct prestera_msg_policer_req) != 36);
 
 	/*  structure that are part of req/resp fw messages */
 	BUILD_BUG_ON(sizeof(struct prestera_msg_iface) != 16);
@@ -640,6 +672,7 @@ static void prestera_hw_build_tests(void)
 	BUILD_BUG_ON(sizeof(struct prestera_msg_counter_resp) != 24);
 	BUILD_BUG_ON(sizeof(struct prestera_msg_rif_resp) != 12);
 	BUILD_BUG_ON(sizeof(struct prestera_msg_vr_resp) != 12);
+	BUILD_BUG_ON(sizeof(struct prestera_msg_policer_resp) != 12);
 
 	/* check events */
 	BUILD_BUG_ON(sizeof(struct prestera_msg_event_port) != 20);
@@ -1192,6 +1225,9 @@ prestera_acl_rule_add_put_action(struct prestera_msg_acl_action *action,
 	case PRESTERA_ACL_RULE_ACTION_JUMP:
 		action->jump.index = __cpu_to_le32(info->jump.index);
 		break;
+	case PRESTERA_ACL_RULE_ACTION_POLICE:
+		action->police.id = __cpu_to_le32(info->police.id);
+		break;
 	case PRESTERA_ACL_RULE_ACTION_COUNT:
 		action->count.id = __cpu_to_le32(info->count.id);
 		break;
@@ -2163,3 +2199,48 @@ int prestera_hw_counter_clear(struct prestera_switch *sw, u32 block_id,
 	return prestera_cmd(sw, PRESTERA_CMD_TYPE_COUNTER_CLEAR,
 			    &req.cmd, sizeof(req));
 }
+
+int prestera_hw_policer_create(struct prestera_switch *sw, u8 type,
+			       u32 *policer_id)
+{
+	struct prestera_msg_policer_resp resp;
+	struct prestera_msg_policer_req req = {
+		.type = type
+	};
+	int err;
+
+	err = prestera_cmd_ret(sw, PRESTERA_CMD_TYPE_POLICER_CREATE,
+			       &req.cmd, sizeof(req), &resp.ret, sizeof(resp));
+	if (err)
+		return err;
+
+	*policer_id = __le32_to_cpu(resp.id);
+	return 0;
+}
+
+int prestera_hw_policer_release(struct prestera_switch *sw,
+				u32 policer_id)
+{
+	struct prestera_msg_policer_req req = {
+		.id = __cpu_to_le32(policer_id)
+	};
+
+	return prestera_cmd(sw, PRESTERA_CMD_TYPE_POLICER_RELEASE,
+			    &req.cmd, sizeof(req));
+}
+
+int prestera_hw_policer_sr_tcm_set(struct prestera_switch *sw,
+				   u32 policer_id, u64 cir, u32 cbs)
+{
+	struct prestera_msg_policer_req req = {
+		.mode = PRESTERA_POLICER_MODE_SR_TCM,
+		.id = __cpu_to_le32(policer_id),
+		.sr_tcm = {
+			.cir = __cpu_to_le64(cir),
+			.cbs = __cpu_to_le32(cbs)
+		}
+	};
+
+	return prestera_cmd(sw, PRESTERA_CMD_TYPE_POLICER_SET,
+			    &req.cmd, sizeof(req));
+}
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_hw.h b/drivers/net/ethernet/marvell/prestera/prestera_hw.h
index fd896a8838bb..579d9ba23ffc 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_hw.h
+++ b/drivers/net/ethernet/marvell/prestera/prestera_hw.h
@@ -107,6 +107,11 @@ enum {
 	PRESTERA_STP_FORWARD,
 };
 
+enum {
+	PRESTERA_POLICER_TYPE_INGRESS,
+	PRESTERA_POLICER_TYPE_EGRESS
+};
+
 enum prestera_hw_cpu_code_cnt_t {
 	PRESTERA_HW_CPU_CODE_CNT_TYPE_DROP = 0,
 	PRESTERA_HW_CPU_CODE_CNT_TYPE_TRAP = 1,
@@ -288,4 +293,12 @@ prestera_hw_cpu_code_counters_get(struct prestera_switch *sw, u8 code,
 				  enum prestera_hw_cpu_code_cnt_t counter_type,
 				  u64 *packet_count);
 
+/* Policer API */
+int prestera_hw_policer_create(struct prestera_switch *sw, u8 type,
+			       u32 *policer_id);
+int prestera_hw_policer_release(struct prestera_switch *sw,
+				u32 policer_id);
+int prestera_hw_policer_sr_tcm_set(struct prestera_switch *sw,
+				   u32 policer_id, u64 cir, u32 cbs);
+
 #endif /* _PRESTERA_HW_H_ */
-- 
2.7.4


^ permalink raw reply related

* RE: [PATCH net-next 0/2] net: phy: add LAN8742 phy support
From: Woojung.Huh @ 2022-04-27 12:06 UTC (permalink / raw)
  To: Yuiko.Oshino, Yuiko.Oshino, davem, netdev, andrew, Ravi.Hegde,
	UNGLinuxDriver
In-Reply-To: <20220427115142.12642-1-yuiko.oshino@microchip.com>

HI Yuiko,

Thanks for update. I think it's ready to submit.

Woojung

> -----Original Message-----
> From: Yuiko Oshino <yuiko.oshino@microchip.com>
> Sent: Wednesday, April 27, 2022 7:52 AM
> To: Woojung Huh - C21699 <Woojung.Huh@microchip.com>; Yuiko Oshino -
> C18177 <Yuiko.Oshino@microchip.com>; davem@davemloft.net;
> netdev@vger.kernel.org; andrew@lunn.ch; Ravi Hegde - C21689
> <Ravi.Hegde@microchip.com>; UNGLinuxDriver
> <UNGLinuxDriver@microchip.com>
> Subject: [PATCH net-next 0/2] net: phy: add LAN8742 phy support
> 
> add LAN8742 phy support
> update LAN88xx phy ID and phy ID mask so that it can coexist with LAN8742
> 
> The current phy IDs on the available hardware.
>     LAN8742 0x0007C130, 0x0007C131
>     LAN88xx 0x0007C132
> 
> Yuiko Oshino (2):
>   net: phy: microchip: update LAN88xx phy ID and phy ID mask.
>   net: phy: smsc: add LAN8742 phy support.
> 
>  drivers/net/phy/microchip.c |  6 +++---
>  drivers/net/phy/smsc.c      | 27 +++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+), 3 deletions(-)
> 
> --
> 2.25.1


^ permalink raw reply

* Re: [PATCH net-next 6/7] net: phy: smsc: Cache interrupt mask
From: Andrew Lunn @ 2022-04-27 12:14 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Steve Glendinning, UNGLinuxDriver, Oliver Neukum, David S. Miller,
	Jakub Kicinski, Paolo Abeni, netdev, linux-usb, Andre Edich,
	Oleksij Rempel, Martyn Welch, Gabriel Hojda, Christoph Fritz,
	Lino Sanfilippo, Philipp Rosenberger, Heiner Kallweit,
	Russell King
In-Reply-To: <7603f74ddc32bbaa55e9f1bea5d4024b6e376035.1651037513.git.lukas@wunner.de>

On Wed, Apr 27, 2022 at 07:48:06AM +0200, Lukas Wunner wrote:
> Cache the interrupt mask to avoid re-reading it from the PHY upon every
> interrupt.  The PHY may be located on a USB device, so the additional
> read may unnecessarily increase interrupt overhead and latency.

I don't think your justification is valid. The MDIO bus is clocked at
2.5MHz. So even if you are using USB 1.1 at 12MHz, the USB overheads
are not particularly large. At 480Mbps they are pretty insignificant.

In general, we consider PHYs as slow devices, they take over 1 second
to negotiate a link and declare it up. So we don't do this sort of
micro optimization.

What i think is relevant here is that you could have an interrupt
storm going on because you don't mask interrupts? It is not a true
storm, due to the way USB works, more of a light shower. Do you have
any statistics to show this code actually reduces the amount of rain
in a significant way?

	Andrew

^ permalink raw reply

* [PATCH net-next v2 0/2] net: phy: add LAN8742 phy support
From: Yuiko Oshino @ 2022-04-27 12:19 UTC (permalink / raw)
  To: woojung.huh, yuiko.oshino, davem, netdev, andrew, ravi.hegde,
	UNGLinuxDriver

add LAN8742 phy support
update LAN88xx phy ID and phy ID mask so that it can coexist with LAN8742

The current phy IDs on the available hardware.
    LAN8742 0x0007C130, 0x0007C131
    LAN88xx 0x0007C132

v1->v2:
-removed "REVIEW REQUEST3" from the PATCH 1/2.

Yuiko Oshino (2):
  net: phy: microchip: update LAN88xx phy ID and phy ID mask.
  net: phy: smsc: add LAN8742 phy support.

 drivers/net/phy/microchip.c |  6 +++---
 drivers/net/phy/smsc.c      | 27 +++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 3 deletions(-)

-- 
2.25.1


^ permalink raw reply

* [PATCH net-next v2 1/2] net: phy: microchip: update LAN88xx phy ID and phy ID mask.
From: Yuiko Oshino @ 2022-04-27 12:19 UTC (permalink / raw)
  To: woojung.huh, yuiko.oshino, davem, netdev, andrew, ravi.hegde,
	UNGLinuxDriver
In-Reply-To: <20220427121957.13099-1-yuiko.oshino@microchip.com>

update LAN88xx phy ID and phy ID mask because the existing code conflicts with the LAN8742 phy.

The current phy IDs on the available hardware.
        LAN8742 0x0007C130, 0x0007C131
        LAN88xx 0x0007C132

Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
---
 drivers/net/phy/microchip.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c
index 9f1f2b6c97d4..131caf659ed2 100644
--- a/drivers/net/phy/microchip.c
+++ b/drivers/net/phy/microchip.c
@@ -344,8 +344,8 @@ static int lan88xx_config_aneg(struct phy_device *phydev)
 
 static struct phy_driver microchip_phy_driver[] = {
 {
-	.phy_id		= 0x0007c130,
-	.phy_id_mask	= 0xfffffff0,
+	.phy_id		= 0x0007c132,
+	.phy_id_mask	= 0xfffffff2,
 	.name		= "Microchip LAN88xx",
 
 	/* PHY_GBIT_FEATURES */
@@ -369,7 +369,7 @@ static struct phy_driver microchip_phy_driver[] = {
 module_phy_driver(microchip_phy_driver);
 
 static struct mdio_device_id __maybe_unused microchip_tbl[] = {
-	{ 0x0007c130, 0xfffffff0 },
+	{ 0x0007c132, 0xfffffff2 },
 	{ }
 };
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH net-next v2 2/2] net: phy: smsc: add LAN8742 phy support.
From: Yuiko Oshino @ 2022-04-27 12:19 UTC (permalink / raw)
  To: woojung.huh, yuiko.oshino, davem, netdev, andrew, ravi.hegde,
	UNGLinuxDriver
In-Reply-To: <20220427121957.13099-1-yuiko.oshino@microchip.com>

The current phy IDs on the available hardware.
        LAN8742 0x0007C130, 0x0007C131

Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
---
 drivers/net/phy/smsc.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index d8cac02a79b9..2b3b5d0ad6f3 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -483,6 +483,32 @@ static struct phy_driver smsc_phy_driver[] = {
 
 	.suspend	= genphy_suspend,
 	.resume		= genphy_resume,
+}, {
+	.phy_id	= 0x0007c130,	/* 0x0007c130 and 0x0007c131 */
+	.phy_id_mask	= 0xfffffff2,
+	.name		= "Microchip LAN8742",
+
+	/* PHY_BASIC_FEATURES */
+	.flags		= PHY_RST_AFTER_CLK_EN,
+
+	.probe		= smsc_phy_probe,
+
+	/* basic functions */
+	.read_status	= lan87xx_read_status,
+	.config_init	= smsc_phy_config_init,
+	.soft_reset	= smsc_phy_reset,
+
+	/* IRQ related */
+	.config_intr	= smsc_phy_config_intr,
+	.handle_interrupt = smsc_phy_handle_interrupt,
+
+	/* Statistics */
+	.get_sset_count = smsc_get_sset_count,
+	.get_strings	= smsc_get_strings,
+	.get_stats	= smsc_get_stats,
+
+	.suspend	= genphy_suspend,
+	.resume	= genphy_resume,
 } };
 
 module_phy_driver(smsc_phy_driver);
@@ -498,6 +524,7 @@ static struct mdio_device_id __maybe_unused smsc_tbl[] = {
 	{ 0x0007c0d0, 0xfffffff0 },
 	{ 0x0007c0f0, 0xfffffff0 },
 	{ 0x0007c110, 0xfffffff0 },
+	{ 0x0007c130, 0xfffffff2 },
 	{ }
 };
 
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH net-next 05/12] dt-bindings: net: dsa: add bindings for Renesas RZ/N1 Advanced 5 port switch
From: Geert Uytterhoeven @ 2022-04-27 12:20 UTC (permalink / raw)
  To: Clément Léger
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Vladimir Oltean,
	David S . Miller, Jakub Kicinski, Paolo Abeni, Rob Herring,
	Krzysztof Kozlowski, Magnus Damm, Heiner Kallweit, Russell King,
	Thomas Petazzoni, Herve Codina, Miquèl Raynal,
	Milan Stevanovic, Jimmy Lalande, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Linux-Renesas, netdev
In-Reply-To: <20220414122250.158113-6-clement.leger@bootlin.com>

Hi Clément,

On Thu, Apr 14, 2022 at 2:24 PM Clément Léger <clement.leger@bootlin.com> wrote:
> Add bindings for Renesas RZ/N1 Advanced 5 port switch. This switch is
> present on Renesas RZ/N1 SoC and was probably provided by MoreThanIP.
> This company does not exists anymore and has been bought by Synopsys.
> Since this IP can't be find anymore in the Synospsy portfolio, lets use
> Renesas as the vendor compatible for this IP.
>
> Signed-off-by: Clément Léger <clement.leger@bootlin.com>

Thanks for your patch!

> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml
> @@ -0,0 +1,128 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/net/dsa/renesas,rzn1-a5psw.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Renesas RZ/N1 Advanced 5 ports ethernet switch
> +
> +maintainers:
> +  - Clément Léger <clement.leger@bootlin.com>
> +
> +description: |
> +  The advanced 5 ports switch is present on the Renesas RZ/N1 SoC family and
> +  handles 4 ports + 1 CPU management port.
> +
> +allOf:
> +  - $ref: dsa.yaml#
> +
> +properties:
> +  compatible:
> +    const: renesas,rzn1-a5psw

Please document an SoC-specific compatible value
"renesas,r9a06g032-a5psw", too, so we can easily handle differences
between members within the RZ/N1 family, if ever needed.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH memcg v4] net: set proper memcg for net_init hooks allocations
From: Michal Koutný @ 2022-04-27 12:22 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Vasily Averin, Vlastimil Babka, Roman Gushchin, kernel,
	Florian Westphal, LKML, Michal Hocko, Cgroups, netdev,
	David S. Miller, Jakub Kicinski, Paolo Abeni
In-Reply-To: <CALvZod4oaj9MpBDVUp9KGmnqu4F3UxjXgOLkrkvmRfFjA7F1dw@mail.gmail.com>

On Tue, Apr 26, 2022 at 10:23:32PM -0700, Shakeel Butt <shakeelb@google.com> wrote:
> [...]
> >
> > +static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p)
> > +{
> > +       struct mem_cgroup *memcg;
> > +
> 
> Do we need memcg_kmem_enabled() check here or maybe
> mem_cgroup_from_obj() should be doing memcg_kmem_enabled() instead of
> mem_cgroup_disabled() as we can have "cgroup.memory=nokmem" boot
> param.

I reckon such a guard is on the charge side and readers should treat
NULL and root_mem_group equally. Or is there a case when these two are
different? 

(I can see it's different semantics when stored in current->active_memcg
(and active_memcg() getter) but for such "outer" callers like here it
seems equal.)

Regards,
Michal


^ permalink raw reply

* Re: [PATCH net-next 5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
From: Andrew Lunn @ 2022-04-27 12:26 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Steve Glendinning, UNGLinuxDriver, Oliver Neukum, David S. Miller,
	Jakub Kicinski, Paolo Abeni, netdev, linux-usb, Andre Edich,
	Oleksij Rempel, Martyn Welch, Gabriel Hojda, Christoph Fritz,
	Lino Sanfilippo, Philipp Rosenberger, Heiner Kallweit,
	Russell King
In-Reply-To: <276a1b50cf9fcca5168ca2770a863cb56069a277.1651037513.git.lukas@wunner.de>

On Wed, Apr 27, 2022 at 07:48:05AM +0200, Lukas Wunner wrote:
> Link status of SMSC LAN95xx chips is polled once per second, even though
> they're capable of signaling PHY interrupts through the MAC layer.
> 
> Forward those interrupts to the PHY driver to avoid polling.  Benefits
> are reduced bus traffic, reduced CPU overhead and quicker interface
> bringup.
> 
> Polling was introduced in 2016 by commit d69d16949346 ("usbnet:
> smsc95xx: fix link detection for disabled autonegotiation").
> Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt,
> hence couldn't detect link-up events when auto-negotiation was disabled.
> The proper solution would have been to enable the ENERGYON interrupt
> instead of polling.
> 
> Since then, PHY handling was moved from the LAN95xx driver to the SMSC
> PHY driver with commit 05b35e7eb9a1 ("smsc95xx: add phylib support").
> That PHY driver is capable of link detection with auto-negotiation
> disabled because it enables the ENERGYON interrupt.
> 
> Note that signaling interrupts through the MAC layer not only works with
> the integrated PHY, but also with an external PHY, provided its
> interrupt pin is attached to LAN95xx's nPHY_INT pin.
> 
> In the unlikely event that the interrupt pin of an external PHY is
> attached to a GPIO of the SoC (or not connected at all), the driver can
> be amended to retrieve the irq from the PHY's of_node.
> 
> To forward PHY interrupts to phylib, it is not sufficient to call
> phy_mac_interrupt().  Instead, the PHY's interrupt handler needs to run
> so that PHY interrupts are cleared.  That's because according to page
> 119 of the LAN950x datasheet, "The source of this interrupt is a level.
> The interrupt persists until it is cleared in the PHY."
> 
> https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf
> 
> Therefore, create an IRQ domain with a single IRQ for the PHY.  In the
> future, the IRQ domain may be extended to support the 11 GPIOs on the
> LAN95xx.
> 
> Normally the PHY interrupt should be masked until the PHY driver has
> cleared it.  However masking requires a (sleeping) USB transaction and
> interrupts are received in (non-sleepable) softirq context.  I decided
> not to mask the interrupt at all (by using the dummy_irq_chip's noop
> ->irq_mask() callback):  The USB interrupt endpoint is polled in 1 msec
> intervals and normally that's sufficient to wake the PHY driver's IRQ
> thread and have it clear the interrupt.  If it does take longer, worst
> thing that can happen is the IRQ thread is woken again.  No big deal.
> 
> Because PHY interrupts are now perpetually enabled, there's no need to
> selectively enable them on suspend.  So remove all invocations of
> smsc95xx_enable_phy_wakeup_interrupts().
> 
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Andre Edich <andre.edich@microchip.com>

This looks reasonable from a PHY perspective. I cannot say much about
USB though.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v5 1/3] selftests: bpf: add test for bpf_skb_change_proto
From: Daniel Borkmann @ 2022-04-27 12:34 UTC (permalink / raw)
  To: Lina.Wang, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Shuah Khan, Matthias Brugger
  Cc: Alexei Starovoitov, Andrii Nakryiko, Jesper Dangaard Brouer,
	Eric Dumazet, linux-kernel, Maciej enczykowski, linux-kselftest,
	netdev, bpf
In-Reply-To: <20220418015230.4481-1-Lina.Wang@mediatek.com>

On 4/18/22 3:52 AM, Lina.Wang@mediatek.com wrote:
[...]
>> OT: In Cilium we run similar NAT46/64 translation for XDP and tc/BPF
>> for our LB services [4] (that is,
>> v4 VIP with v6 backends, and v6 VIP with v4 backends).
>>
>>     [4]
>> https://github.com/cilium/cilium/blob/master/bpf/lib/nat_46x64.h
>>         
>> https://github.com/cilium/cilium/blob/master/test/nat46x64/test.sh
> 
> It is complicated for me, my case doesnot use XDP driver.I use xdp_dummy
> just to enable veth NAPI GRO, not real XDP driver code. My test case is
> simple and enough for my patch, I think. I have covered tcp and udp,
> normal and SO_SEGMENT.

Ok, fair enough, then lets resend with the minor fixups as discussed
earlier and worst case we can do the test_progs integration at a later
point to avoid blocking the fix. At least there is something runnable
under net/bpf selftests.. even if not yet in BPF CI but we can follow-up
on it later. Thanks Lina!

^ permalink raw reply

* Re: [PATCH net-next v6 02/13] net: wwan: t7xx: Add control DMA interface
From: Loic Poulain @ 2022-04-27 12:34 UTC (permalink / raw)
  To: Sergey Ryazanov
  Cc: Ricardo Martinez, netdev, linux-wireless, Jakub Kicinski,
	David Miller, Johannes Berg, M Chetan Kumar,
	chandrashekar.devegowda, Intel Corporation, chiranjeevi.rapolu,
	Haijun Liu (刘海军), amir.hanania,
	Andy Shevchenko, dinesh.sharma, eliot.lee, ilpo.johannes.jarvinen,
	moises.veleta, pierre-louis.bossart, muralidharan.sethuraman,
	Soumya.Prakash.Mishra, sreehari.kancharla, madhusmita.sahu
In-Reply-To: <CAHNKnsRt=H_tkqG7CNf15DBYJmmunYy6vsm4HjneN47EQB_uug@mail.gmail.com>

On Tue, 26 Apr 2022 at 02:19, Sergey Ryazanov <ryazanov.s.a@gmail.com> wrote:
>
> Hello Ricardo, Loic, Ilpo,
>
> On Fri, Apr 8, 2022 at 1:37 AM Ricardo Martinez
> <ricardo.martinez@linux.intel.com> wrote:
> > ...
> > Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
> > Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
> >
> > From a WWAN framework perspective:
> > Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
> >
> > Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>
> This line with "From a WWAN framework perspective" looks confusing to
> me. Anyone not familiar with all of the iterations will be in doubt as
> to whether it belongs only to Loic's review or to both of them.
>
> How about to format this block like this:
>
> > Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
> > Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
> > Reviewed-by: Loic Poulain <loic.poulain@linaro.org> (WWAN framework)
> > Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>
> or like this:
>
> > Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
> > Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
> > Reviewed-by: Loic Poulain <loic.poulain@linaro.org> # WWAN framework
> > Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>
> Parentheses vs. comment sign. I saw people use both of these formats,
> I just do not know which is better. What do you think?

My initial comment was to highlight that someone else should double
check the network code, but it wasn't expected to end up in the commit
message. Maybe simply drop this extra comment?

Regards,
Loic

^ permalink raw reply

* Re: [PATCH net-next 0/7] Polling be gone on LAN95xx
From: Oleksij Rempel @ 2022-04-27 12:37 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Steve Glendinning, UNGLinuxDriver, Oliver Neukum, David S. Miller,
	Jakub Kicinski, Paolo Abeni, netdev, linux-usb, Andre Edich,
	Martyn Welch, Gabriel Hojda, Christoph Fritz, Lino Sanfilippo,
	Philipp Rosenberger, Heiner Kallweit, Andrew Lunn, Russell King
In-Reply-To: <cover.1651037513.git.lukas@wunner.de>

On Wed, Apr 27, 2022 at 07:48:00AM +0200, Lukas Wunner wrote:
> Do away with link status polling on LAN95XX USB Ethernet
> and rely on interrupts instead, thereby reducing bus traffic,
> CPU overhead and improving interface bringup latency.
> 
> The meat of the series is in patch [5/7].  The preceding and
> following patches are various cleanups to prepare for and
> adjust to interrupt-driven link state detection.
> 
> Please review and test.  Thanks!

Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>

Tested on:
- LAN9514 (RPi2)
- LAN9512 (EVB)
- LAN9500 (EVB)

On USB unplug i get some not usable kernel message, but it is not the
show stopper for me:
smsc95xx 1-1.4.1:1.0 eth1: Error updating MAC full duplex mode
smsc95xx 1-1.4.1:1.0 eth1: hardware isn't capable of remote wakeup

Thank you!

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH net] drivers: net: can: Fix deadlock in grcan_close()
From: Andreas Larsson @ 2022-04-27 12:47 UTC (permalink / raw)
  To: Oliver Hartkopp, Duoming Zhou, linux-kernel
  Cc: wg, mkl, davem, kuba, pabeni, linux-can, netdev
In-Reply-To: <caaa6059-6172-e562-e48e-5987884052b9@hartkopp.net>

On 2022-04-26 21:12, Oliver Hartkopp wrote:
> On 25.04.22 06:24, Duoming Zhou wrote:
>> There are deadlocks caused by del_timer_sync(&priv->hang_timer)
>> and del_timer_sync(&priv->rr_timer) in grcan_close(), one of
>> the deadlocks are shown below:
>>
>>     (Thread 1)              |      (Thread 2)
>>                             | grcan_reset_timer()
>> grcan_close()              |  mod_timer()
>>   spin_lock_irqsave() //(1) |  (wait a time)
>>   ...                       | grcan_initiate_running_reset()
>>   del_timer_sync()          |  spin_lock_irqsave() //(2)
>>   (wait timer to stop)      |  ...
>>
>> We hold priv->lock in position (1) of thread 1 and use
>> del_timer_sync() to wait timer to stop, but timer handler
>> also need priv->lock in position (2) of thread 2.
>> As a result, grcan_close() will block forever.
>>
>> This patch extracts del_timer_sync() from the protection of
>> spin_lock_irqsave(), which could let timer handler to obtain
>> the needed lock.
>>
>> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
>> ---
>>   drivers/net/can/grcan.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/can/grcan.c b/drivers/net/can/grcan.c
>> index d0c5a7a60da..1189057b5d6 100644
>> --- a/drivers/net/can/grcan.c
>> +++ b/drivers/net/can/grcan.c
>> @@ -1102,8 +1102,10 @@ static int grcan_close(struct net_device *dev)
>>       priv->closing = true;
>>       if (priv->need_txbug_workaround) {
>> +        spin_unlock_irqrestore(&priv->lock, flags);
>>           del_timer_sync(&priv->hang_timer);
>>           del_timer_sync(&priv->rr_timer);
>> +        spin_lock_irqsave(&priv->lock, flags);
> 
> It looks weird to unlock and re-lock the operations like this. This 
> breaks the intended locking for the closing process.
> 
> Isn't there any possibility to e.g. move that entire if-section before 
> the lock?

All functions wishing to start the timers both check priv->closing and
then, if false, start the timer within the priv->lock spinlock. Given
that, it should be ok that del_timer_sync is not done within the
spinlock as therefore no one can restart any timers after priv->closing
has been set to true.

It looks a bit weird, but setting priv->closing to true needs to happen
within the priv->lock spinlock protection and needs to happen before
del_timer_sync to avoid a race between grcan_close and someone starting
the timer.

Reviewed-by: Andreas Larsson <andreas@gaisler.com>

-- 
Andreas Larsson

^ permalink raw reply

* Re: [PATCH net-next 6/7] net: phy: smsc: Cache interrupt mask
From: Lukas Wunner @ 2022-04-27 12:51 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Steve Glendinning, UNGLinuxDriver, Oliver Neukum, David S. Miller,
	Jakub Kicinski, Paolo Abeni, netdev, linux-usb, Andre Edich,
	Oleksij Rempel, Martyn Welch, Gabriel Hojda, Christoph Fritz,
	Lino Sanfilippo, Philipp Rosenberger, Heiner Kallweit,
	Russell King
In-Reply-To: <Ymkzno2IbyNbFrEL@lunn.ch>

On Wed, Apr 27, 2022 at 02:14:22PM +0200, Andrew Lunn wrote:
> On Wed, Apr 27, 2022 at 07:48:06AM +0200, Lukas Wunner wrote:
> > Cache the interrupt mask to avoid re-reading it from the PHY upon every
> > interrupt.  The PHY may be located on a USB device, so the additional
> > read may unnecessarily increase interrupt overhead and latency.
> 
> I don't think your justification is valid. The MDIO bus is clocked at
> 2.5MHz. So even if you are using USB 1.1 at 12MHz, the USB overheads
> are not particularly large. At 480Mbps they are pretty insignificant.
> 
> In general, we consider PHYs as slow devices, they take over 1 second
> to negotiate a link and declare it up. So we don't do this sort of
> micro optimization.
> 
> What i think is relevant here is that you could have an interrupt
> storm going on because you don't mask interrupts? It is not a true
> storm, due to the way USB works, more of a light shower. Do you have
> any statistics to show this code actually reduces the amount of rain
> in a significant way?

TBH the primary motivation for this change is that it simplifies the
succeeding commit ("Cope with hot-removal in interrupt handler").

Additionally it seemed silly to me to re-read the interrupt mask
every time for no reason at all.  To test and debug this series
I logged every MDIO read/write and these nonsensical transactions
are very visible and very annoying in the log output.

So yeah, maybe the latency argument isn't very strong, but there
are other arguments which I didn't deem necessary mentioning in
the commit message as they seemed somewhat egotistical. :)

Thanks,

Lukas

^ permalink raw reply

* Re: [PATCH net-next 05/12] dt-bindings: net: dsa: add bindings for Renesas RZ/N1 Advanced 5 port switch
From: Clément Léger @ 2022-04-27 12:56 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Vladimir Oltean,
	David S . Miller, Jakub Kicinski, Paolo Abeni, Rob Herring,
	Krzysztof Kozlowski, Magnus Damm, Heiner Kallweit, Russell King,
	Thomas Petazzoni, Herve Codina, Miquèl Raynal,
	Milan Stevanovic, Jimmy Lalande, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Linux-Renesas, netdev
In-Reply-To: <CAMuHMdU+kosUPavthyPcWVAC_WhdwXiFKt61oSmgdV6Qxk_0xg@mail.gmail.com>

Le Wed, 27 Apr 2022 14:20:33 +0200,
Geert Uytterhoeven <geert@linux-m68k.org> a écrit :

> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml
> > @@ -0,0 +1,128 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/net/dsa/renesas,rzn1-a5psw.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Renesas RZ/N1 Advanced 5 ports ethernet switch
> > +
> > +maintainers:
> > +  - Clément Léger <clement.leger@bootlin.com>
> > +
> > +description: |
> > +  The advanced 5 ports switch is present on the Renesas RZ/N1 SoC family and
> > +  handles 4 ports + 1 CPU management port.
> > +
> > +allOf:
> > +  - $ref: dsa.yaml#
> > +
> > +properties:
> > +  compatible:
> > +    const: renesas,rzn1-a5psw  
> 
> Please document an SoC-specific compatible value
> "renesas,r9a06g032-a5psw", too, so we can easily handle differences
> between members within the RZ/N1 family, if ever needed.

Hi Geert,

Thanks, I already did that for the V2 after your first comment on the
MII converter bindings ! I'll sent a V2 soon.

Clément

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds



-- 
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

^ permalink raw reply

* Re: Optimizing kernel compilation / alignments for network performance
From: Alexander Lobakin @ 2022-04-27 12:56 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Alexander Lobakin, Network Development, linux-arm-kernel,
	Russell King, Andrew Lunn, Felix Fietkau,
	openwrt-devel@lists.openwrt.org, Florian Fainelli
In-Reply-To: <84f25f73-1fab-fe43-70eb-45d25b614b4c@gmail.com>

From: Rafał Miłecki <zajec5@gmail.com>
Date: Wed, 27 Apr 2022 14:04:54 +0200

> Hi,

Hej,

> 
> I noticed years ago that kernel changes touching code - that I don't use
> at all - can affect network performance for me.
> 
> I work with home routers based on Broadcom Northstar platform. Those
> are SoCs with not-so-powerful 2 x ARM Cortex-A9 CPU cores. Main task of
> those devices is NAT masquerade and that is what I test with iperf
> running on two x86 machines.
> 
> ***
> 
> Example of such unused code change:
> ce5013ff3bec ("mtd: spi-nor: Add support for XM25QH64A and XM25QH128A").
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce5013ff3bec05cf2a8a05c75fcd520d9914d92b
> It lowered my NAT speed from 381 Mb/s to 367 Mb/s (-3,5%).
> 
> I first reported that issue it in the e-mail thread:
> ARM router NAT performance affected by random/unrelated commits
> https://lkml.org/lkml/2019/5/21/349
> https://www.spinics.net/lists/linux-block/msg40624.html
> 
> Back then it was commit 5b0890a97204 ("flow_dissector: Parse batman-adv
> unicast headers")
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9316a9ed6895c4ad2f0cde171d486f80c55d8283
> that increased my NAT speed from 741 Mb/s to 773 Mb/s (+4,3%).
> 
> ***
> 
> It appears Northstar CPUs have little cache size and so any change in
> location of kernel symbols can affect NAT performance. That explains why
> changing unrelated code affects anything & it has been partially proven
> aligning some of cache-v7.S code.
> 
> My question is: is there a way to find out & force an optimal symbols
> locations?

Take a look at CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B[0]. I've been
fighting with the same issue on some Realtek MIPS boards: random
code changes in random kernel core parts were affecting NAT /
network performance. This option resolved this I'd say, for the cost
of slightly increased vmlinux size (almost no change in vmlinuz
size).
The only thing is that it was recently restricted to a set of
architectures and MIPS and ARM32 are not included now lol. So it's
either a matter of expanding the list (since it was restricted only
because `-falign-functions=` is not supported on some architectures)
or you can just do:

make KCFLAGS=-falign-functions=64 # replace 64 with your I-cache size

The actual alignment is something to play with, I stopped on the
cacheline size, 32 in my case.
Also, this does not provide any guarantees that you won't suffer
from random data cacheline changes. There were some initiatives to
introduce debug alignment of data as well, but since function are
often bigger than 32, while variables are usually much smaller, it
was increasing the vmlinux size by a ton (imagine each u32 variable
occupying 32-64 bytes instead of 4). But the chance of catching this
is much lower than to suffer from I-cache function misplacement.

> 
> Adding .align 5 to the cache-v7.S is a partial success. I'd like to find
> out what other functions are worth optimizing (aligning) and force that
> (I guess  __attribute__((aligned(32))) could be used).
> 
> I can't really draw any conclusions from comparing System.map before and
> after above commits as they relocate thousands of symbols in one go.
> 
> Optimizing is pretty important for me for two reasons:
> 1. I want to reach maximum possible NAT masquerade performance
> 2. I need stable performance across random commits to detect regressions

[0] https://elixir.bootlin.com/linux/v5.18-rc4/K/ident/CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B

Thanks,
Al

^ permalink raw reply

* [PATCH v2] net: dsa: mv88e6xxx: Single chip mode detection for MV88E6*41
From: Nathan Rossi @ 2022-04-27 13:09 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Nathan Rossi, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski, Paolo Abeni

The mv88e6xxx driver expects switches that are configured in single chip
addressing mode to have the MDIO address configured as 0. This is due to
the switch ADDR pins representing the single chip addressing mode as 0.
However depending on the device (e.g. MV88E6*41) the switch does not
respond on address 0 or any other address below 16 (the first port
address) in single chip addressing mode. This allows for other devices
to be on the same shared MDIO bus despite the switch being in single
chip addressing mode.

When using a switch that works this way it is not possible to configure
switch driver as single chip addressing via device tree, along with
another MDIO device on the same bus with address 0, as both devices
would have the same address of 0 resulting in mdiobus_register_device
-EBUSY errors for one of the devices with address 0.

In order to support this configuration the switch node can have its MDIO
address configured as 16 (the first address that the device responds
to). During initialization the driver will treat this address similar to
how address 0 is, however because this address is also a valid
multi-chip address (in certain switch models, but not all) the driver
will configure the SMI in single chip addressing mode and attempt to
detect the switch model. If the device is configured in single chip
addressing mode this will succeed and the initialization process can
continue. If it fails to detect a valid model this is because the switch
model register is not a valid register when in multi-chip mode, it will
then fall back to the existing SMI initialization process using the MDIO
address as the multi-chip mode address.

This detection method is safe if the device is in either mode because
the single chip addressing mode read is a direct SMI/MDIO read operation
and has no side effects compared to the SMI writes required for the
multi-chip addressing mode.

In order to implement this change, the reset gpio configuration is moved
to occur before any SMI initialization. This ensures that the device has
the same/correct reset gpio state for both mv88e6xxx_smi_init calls.

Signed-off-by: Nathan Rossi <nathan@nathanrossi.com>
---
Changes in v2:
- Return result of mv88e6xxx_single_chip_detect into err variable to
  clarify the result is in error value and not a true/false result
---
 drivers/net/dsa/mv88e6xxx/chip.c | 46 ++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 64f4fdd029..b8f956eece 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -6276,6 +6276,32 @@ static int mv88e6xxx_detect(struct mv88e6xxx_chip *chip)
 	return 0;
 }
 
+static int mv88e6xxx_single_chip_detect(struct mv88e6xxx_chip *chip,
+					struct mdio_device *mdiodev)
+{
+	int err;
+
+	/* dual_chip takes precedence over single/multi-chip modes */
+	if (chip->info->dual_chip)
+		return -EINVAL;
+
+	/* If the mdio addr is 16 indicating the first port address of a switch
+	 * (e.g. mv88e6*41) in single chip addressing mode the device may be
+	 * configured in single chip addressing mode. Setup the smi access as
+	 * single chip addressing mode and attempt to detect the model of the
+	 * switch, if this fails the device is not configured in single chip
+	 * addressing mode.
+	 */
+	if (mdiodev->addr != 16)
+		return -EINVAL;
+
+	err = mv88e6xxx_smi_init(chip, mdiodev->bus, 0);
+	if (err)
+		return err;
+
+	return mv88e6xxx_detect(chip);
+}
+
 static struct mv88e6xxx_chip *mv88e6xxx_alloc_chip(struct device *dev)
 {
 	struct mv88e6xxx_chip *chip;
@@ -6959,10 +6985,6 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 
 	chip->info = compat_info;
 
-	err = mv88e6xxx_smi_init(chip, mdiodev->bus, mdiodev->addr);
-	if (err)
-		goto out;
-
 	chip->reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW);
 	if (IS_ERR(chip->reset)) {
 		err = PTR_ERR(chip->reset);
@@ -6971,9 +6993,19 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 	if (chip->reset)
 		usleep_range(1000, 2000);
 
-	err = mv88e6xxx_detect(chip);
-	if (err)
-		goto out;
+	/* Detect if the device is configured in single chip addressing mode,
+	 * otherwise continue with address specific smi init/detection.
+	 */
+	err = mv88e6xxx_single_chip_detect(chip, mdiodev);
+	if (err) {
+		err = mv88e6xxx_smi_init(chip, mdiodev->bus, mdiodev->addr);
+		if (err)
+			goto out;
+
+		err = mv88e6xxx_detect(chip);
+		if (err)
+			goto out;
+	}
 
 	if (chip->info->edsa_support == MV88E6XXX_EDSA_SUPPORTED)
 		chip->tag_protocol = DSA_TAG_PROTO_EDSA;
---
2.36.0

^ permalink raw reply related

* [PATCH v2 0/5] hv_sock: Hardening changes
From: Andrea Parri (Microsoft) @ 2022-04-27 13:12 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)

Changes since v1[1]:
  - Expand commit message of patch #2
  - Coding style changes

Changes since RFC[2]:
  - Massage changelogs, fix typo
  - Drop "hv_sock: Initialize send_buf in hvs_stream_enqueue()"
  - Remove style/newline change
  - Remove/"inline" hv_pkt_iter_first_raw()

Applies to v5.18-rc4.

Thanks,
  Andrea

[1] https://lkml.kernel.org/r/20220420200720.434717-1-parri.andrea@gmail.com
[2] https://lkml.kernel.org/r/20220413204742.5539-1-parri.andrea@gmail.com

Andrea Parri (Microsoft) (5):
  hv_sock: Check hv_pkt_iter_first_raw()'s return value
  hv_sock: Copy packets sent by Hyper-V out of the ring buffer
  hv_sock: Add validation for untrusted Hyper-V values
  Drivers: hv: vmbus: Accept hv_sock offers in isolated guests
  Drivers: hv: vmbus: Refactor the ring-buffer iterator functions

 drivers/hv/channel_mgmt.c        |  8 ++++--
 drivers/hv/ring_buffer.c         | 32 ++++++---------------
 include/linux/hyperv.h           | 48 ++++++++++----------------------
 net/vmw_vsock/hyperv_transport.c | 21 +++++++++++---
 4 files changed, 47 insertions(+), 62 deletions(-)

-- 
2.25.1


^ permalink raw reply

* [PATCH v2 1/5] hv_sock: Check hv_pkt_iter_first_raw()'s return value
From: Andrea Parri (Microsoft) @ 2022-04-27 13:12 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)
In-Reply-To: <20220427131225.3785-1-parri.andrea@gmail.com>

The function returns NULL if the ring buffer doesn't contain enough
readable bytes to constitute a packet descriptor.  The ring buffer's
write_index is in memory which is shared with the Hyper-V host, an
erroneous or malicious host could thus change its value and overturn
the result of hvs_stream_has_data().

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/hyperv_transport.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index e111e13b66604..943352530936e 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -603,6 +603,8 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
 
 	if (need_refill) {
 		hvs->recv_desc = hv_pkt_iter_first_raw(hvs->chan);
+		if (!hvs->recv_desc)
+			return -ENOBUFS;
 		ret = hvs_update_recv_data(hvs);
 		if (ret)
 			return ret;
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 2/5] hv_sock: Copy packets sent by Hyper-V out of the ring buffer
From: Andrea Parri (Microsoft) @ 2022-04-27 13:12 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)
In-Reply-To: <20220427131225.3785-1-parri.andrea@gmail.com>

Pointers to VMbus packets sent by Hyper-V are used by the hv_sock driver
within the guest VM.  Hyper-V can send packets with erroneous values or
modify packet fields after they are processed by the guest.  To defend
against these scenarios, copy the incoming packet after validating its
length and offset fields using hv_pkt_iter_{first,next}().  Use
HVS_PKT_LEN(HVS_MTU_SIZE) to initialize the buffer which holds the
copies of the incoming packets.  In this way, the packet can no longer
be modified by the host.

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 net/vmw_vsock/hyperv_transport.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 943352530936e..8c37d07017fc4 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -78,6 +78,9 @@ struct hvs_send_buf {
 					 ALIGN((payload_len), 8) + \
 					 VMBUS_PKT_TRAILER_SIZE)
 
+/* Upper bound on the size of a VMbus packet for hv_sock */
+#define HVS_MAX_PKT_SIZE	HVS_PKT_LEN(HVS_MTU_SIZE)
+
 union hvs_service_id {
 	guid_t	srv_id;
 
@@ -378,6 +381,8 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 		rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE);
 	}
 
+	chan->max_pkt_size = HVS_MAX_PKT_SIZE;
+
 	ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb,
 			 conn_from_host ? new : sk);
 	if (ret != 0) {
@@ -602,7 +607,7 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
 		return -EOPNOTSUPP;
 
 	if (need_refill) {
-		hvs->recv_desc = hv_pkt_iter_first_raw(hvs->chan);
+		hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
 		if (!hvs->recv_desc)
 			return -ENOBUFS;
 		ret = hvs_update_recv_data(hvs);
@@ -618,7 +623,7 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
 
 	hvs->recv_data_len -= to_read;
 	if (hvs->recv_data_len == 0) {
-		hvs->recv_desc = hv_pkt_iter_next_raw(hvs->chan, hvs->recv_desc);
+		hvs->recv_desc = hv_pkt_iter_next(hvs->chan, hvs->recv_desc);
 		if (hvs->recv_desc) {
 			ret = hvs_update_recv_data(hvs);
 			if (ret)
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 4/5] Drivers: hv: vmbus: Accept hv_sock offers in isolated guests
From: Andrea Parri (Microsoft) @ 2022-04-27 13:12 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)
In-Reply-To: <20220427131225.3785-1-parri.andrea@gmail.com>

So that isolated guests can communicate with the host via hv_sock
channels.

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 drivers/hv/channel_mgmt.c | 8 ++++++--
 include/linux/hyperv.h    | 8 ++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 67be81208a2d9..d800220ee54f4 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -976,13 +976,17 @@ find_primary_channel_by_offer(const struct vmbus_channel_offer_channel *offer)
 	return channel;
 }
 
-static bool vmbus_is_valid_device(const guid_t *guid)
+static bool vmbus_is_valid_offer(const struct vmbus_channel_offer_channel *offer)
 {
+	const guid_t *guid = &offer->offer.if_type;
 	u16 i;
 
 	if (!hv_is_isolation_supported())
 		return true;
 
+	if (is_hvsock_offer(offer))
+		return true;
+
 	for (i = 0; i < ARRAY_SIZE(vmbus_devs); i++) {
 		if (guid_equal(guid, &vmbus_devs[i].guid))
 			return vmbus_devs[i].allowed_in_isolated;
@@ -1004,7 +1008,7 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 
 	trace_vmbus_onoffer(offer);
 
-	if (!vmbus_is_valid_device(&offer->offer.if_type)) {
+	if (!vmbus_is_valid_offer(offer)) {
 		pr_err_ratelimited("Invalid offer %d from the host supporting isolation\n",
 				   offer->child_relid);
 		atomic_dec(&vmbus_connection.offer_in_progress);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 55478a6810b60..1112c5cf894e6 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1044,10 +1044,14 @@ struct vmbus_channel {
 u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr);
 u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id);
 
+static inline bool is_hvsock_offer(const struct vmbus_channel_offer_channel *o)
+{
+	return !!(o->offer.chn_flags & VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
+}
+
 static inline bool is_hvsock_channel(const struct vmbus_channel *c)
 {
-	return !!(c->offermsg.offer.chn_flags &
-		  VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
+	return is_hvsock_offer(&c->offermsg);
 }
 
 static inline bool is_sub_channel(const struct vmbus_channel *c)
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 5/5] Drivers: hv: vmbus: Refactor the ring-buffer iterator functions
From: Andrea Parri (Microsoft) @ 2022-04-27 13:12 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)
In-Reply-To: <20220427131225.3785-1-parri.andrea@gmail.com>

With no users of hv_pkt_iter_next_raw() and no "external" users of
hv_pkt_iter_first_raw(), the iterator functions can be refactored
and simplified to remove some indirection/code.

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 drivers/hv/ring_buffer.c | 32 +++++++++-----------------------
 include/linux/hyperv.h   | 35 ++++-------------------------------
 2 files changed, 13 insertions(+), 54 deletions(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 3d215d9dec433..fa98b3a91206a 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -421,7 +421,7 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
 	memcpy(buffer, (const char *)desc + offset, packetlen);
 
 	/* Advance ring index to next packet descriptor */
-	__hv_pkt_iter_next(channel, desc, true);
+	__hv_pkt_iter_next(channel, desc);
 
 	/* Notify host of update */
 	hv_pkt_iter_close(channel);
@@ -456,22 +456,6 @@ static u32 hv_pkt_iter_avail(const struct hv_ring_buffer_info *rbi)
 		return (rbi->ring_datasize - priv_read_loc) + write_loc;
 }
 
-/*
- * Get first vmbus packet without copying it out of the ring buffer
- */
-struct vmpacket_descriptor *hv_pkt_iter_first_raw(struct vmbus_channel *channel)
-{
-	struct hv_ring_buffer_info *rbi = &channel->inbound;
-
-	hv_debug_delay_test(channel, MESSAGE_DELAY);
-
-	if (hv_pkt_iter_avail(rbi) < sizeof(struct vmpacket_descriptor))
-		return NULL;
-
-	return (struct vmpacket_descriptor *)(hv_get_ring_buffer(rbi) + rbi->priv_read_index);
-}
-EXPORT_SYMBOL_GPL(hv_pkt_iter_first_raw);
-
 /*
  * Get first vmbus packet from ring buffer after read_index
  *
@@ -483,11 +467,14 @@ struct vmpacket_descriptor *hv_pkt_iter_first(struct vmbus_channel *channel)
 	struct vmpacket_descriptor *desc, *desc_copy;
 	u32 bytes_avail, pkt_len, pkt_offset;
 
-	desc = hv_pkt_iter_first_raw(channel);
-	if (!desc)
+	hv_debug_delay_test(channel, MESSAGE_DELAY);
+
+	bytes_avail = hv_pkt_iter_avail(rbi);
+	if (bytes_avail < sizeof(struct vmpacket_descriptor))
 		return NULL;
+	bytes_avail = min(rbi->pkt_buffer_size, bytes_avail);
 
-	bytes_avail = min(rbi->pkt_buffer_size, hv_pkt_iter_avail(rbi));
+	desc = (struct vmpacket_descriptor *)(hv_get_ring_buffer(rbi) + rbi->priv_read_index);
 
 	/*
 	 * Ensure the compiler does not use references to incoming Hyper-V values (which
@@ -534,8 +521,7 @@ EXPORT_SYMBOL_GPL(hv_pkt_iter_first);
  */
 struct vmpacket_descriptor *
 __hv_pkt_iter_next(struct vmbus_channel *channel,
-		   const struct vmpacket_descriptor *desc,
-		   bool copy)
+		   const struct vmpacket_descriptor *desc)
 {
 	struct hv_ring_buffer_info *rbi = &channel->inbound;
 	u32 packetlen = desc->len8 << 3;
@@ -548,7 +534,7 @@ __hv_pkt_iter_next(struct vmbus_channel *channel,
 		rbi->priv_read_index -= dsize;
 
 	/* more data? */
-	return copy ? hv_pkt_iter_first(channel) : hv_pkt_iter_first_raw(channel);
+	return hv_pkt_iter_first(channel);
 }
 EXPORT_SYMBOL_GPL(__hv_pkt_iter_next);
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 1112c5cf894e6..370adc9971d3e 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1673,55 +1673,28 @@ static inline u32 hv_pkt_len(const struct vmpacket_descriptor *desc)
 	return desc->len8 << 3;
 }
 
-struct vmpacket_descriptor *
-hv_pkt_iter_first_raw(struct vmbus_channel *channel);
-
 struct vmpacket_descriptor *
 hv_pkt_iter_first(struct vmbus_channel *channel);
 
 struct vmpacket_descriptor *
 __hv_pkt_iter_next(struct vmbus_channel *channel,
-		   const struct vmpacket_descriptor *pkt,
-		   bool copy);
+		   const struct vmpacket_descriptor *pkt);
 
 void hv_pkt_iter_close(struct vmbus_channel *channel);
 
 static inline struct vmpacket_descriptor *
-hv_pkt_iter_next_pkt(struct vmbus_channel *channel,
-		     const struct vmpacket_descriptor *pkt,
-		     bool copy)
+hv_pkt_iter_next(struct vmbus_channel *channel,
+		 const struct vmpacket_descriptor *pkt)
 {
 	struct vmpacket_descriptor *nxt;
 
-	nxt = __hv_pkt_iter_next(channel, pkt, copy);
+	nxt = __hv_pkt_iter_next(channel, pkt);
 	if (!nxt)
 		hv_pkt_iter_close(channel);
 
 	return nxt;
 }
 
-/*
- * Get next packet descriptor without copying it out of the ring buffer
- * If at end of list, return NULL and update host.
- */
-static inline struct vmpacket_descriptor *
-hv_pkt_iter_next_raw(struct vmbus_channel *channel,
-		     const struct vmpacket_descriptor *pkt)
-{
-	return hv_pkt_iter_next_pkt(channel, pkt, false);
-}
-
-/*
- * Get next packet descriptor from iterator
- * If at end of list, return NULL and update host.
- */
-static inline struct vmpacket_descriptor *
-hv_pkt_iter_next(struct vmbus_channel *channel,
-		 const struct vmpacket_descriptor *pkt)
-{
-	return hv_pkt_iter_next_pkt(channel, pkt, true);
-}
-
 #define foreach_vmbus_pkt(pkt, channel) \
 	for (pkt = hv_pkt_iter_first(channel); pkt; \
 	    pkt = hv_pkt_iter_next(channel, pkt))
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 3/5] hv_sock: Add validation for untrusted Hyper-V values
From: Andrea Parri (Microsoft) @ 2022-04-27 13:12 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)
In-Reply-To: <20220427131225.3785-1-parri.andrea@gmail.com>

For additional robustness in the face of Hyper-V errors or malicious
behavior, validate all values that originate from packets that Hyper-V
has sent to the guest in the host-to-guest ring buffer.  Ensure that
invalid values cannot cause data being copied out of the bounds of the
source buffer in hvs_stream_dequeue().

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 include/linux/hyperv.h           |  5 +++++
 net/vmw_vsock/hyperv_transport.c | 10 ++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index fe2e0179ed51e..55478a6810b60 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1663,6 +1663,11 @@ static inline u32 hv_pkt_datalen(const struct vmpacket_descriptor *desc)
 	return (desc->len8 << 3) - (desc->offset8 << 3);
 }
 
+/* Get packet length associated with descriptor */
+static inline u32 hv_pkt_len(const struct vmpacket_descriptor *desc)
+{
+	return desc->len8 << 3;
+}
 
 struct vmpacket_descriptor *
 hv_pkt_iter_first_raw(struct vmbus_channel *channel);
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 8c37d07017fc4..fd98229e3db30 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -577,12 +577,18 @@ static bool hvs_dgram_allow(u32 cid, u32 port)
 static int hvs_update_recv_data(struct hvsock *hvs)
 {
 	struct hvs_recv_buf *recv_buf;
-	u32 payload_len;
+	u32 pkt_len, payload_len;
+
+	pkt_len = hv_pkt_len(hvs->recv_desc);
+
+	if (pkt_len < HVS_HEADER_LEN)
+		return -EIO;
 
 	recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
 	payload_len = recv_buf->hdr.data_size;
 
-	if (payload_len > HVS_MTU_SIZE)
+	if (payload_len > pkt_len - HVS_HEADER_LEN ||
+	    payload_len > HVS_MTU_SIZE)
 		return -EIO;
 
 	if (payload_len == 0)
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH net-next v6 02/13] net: wwan: t7xx: Add control DMA interface
From: Sergey Ryazanov @ 2022-04-27 13:17 UTC (permalink / raw)
  To: Loic Poulain
  Cc: Ricardo Martinez, netdev, linux-wireless, Jakub Kicinski,
	David Miller, Johannes Berg, M Chetan Kumar,
	Devegowda, Chandrashekar, Intel Corporation, chiranjeevi.rapolu,
	Haijun Liu (刘海军), Hanania, Amir,
	Andy Shevchenko, Sharma, Dinesh, Lee, Eliot,
	Jarvinen, Ilpo Johannes, Veleta, Moises, Bossart, Pierre-louis,
	Sethuraman, Muralidharan, Mishra, Soumya Prakash,
	Kancharla, Sreehari, Sahu, Madhusmita
In-Reply-To: <CAMZdPi90Joo8+_44ceqS3k8ez08W_AX-eWs42F0ztDN67WR2Pw@mail.gmail.com>

On Wed, Apr 27, 2022 at 3:35 PM Loic Poulain <loic.poulain@linaro.org> wrote:
> On Tue, 26 Apr 2022 at 02:19, Sergey Ryazanov <ryazanov.s.a@gmail.com> wrote:
>> On Fri, Apr 8, 2022 at 1:37 AM Ricardo Martinez
>> <ricardo.martinez@linux.intel.com> wrote:
>>> ...
>>> Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
>>> Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
>>>
>>> From a WWAN framework perspective:
>>> Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
>>>
>>> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>
>> This line with "From a WWAN framework perspective" looks confusing to
>> me. Anyone not familiar with all of the iterations will be in doubt as
>> to whether it belongs only to Loic's review or to both of them.
>>
>> How about to format this block like this:
>>
>>> Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
>>> Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
>>> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> (WWAN framework)
>>> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>
>> or like this:
>>
>>> Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
>>> Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
>>> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> # WWAN framework
>>> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>
>> Parentheses vs. comment sign. I saw people use both of these formats,
>> I just do not know which is better. What do you think?
>
> My initial comment was to highlight that someone else should double
> check the network code, but it wasn't expected to end up in the commit
> message. Maybe simply drop this extra comment?

Yep, this drastically solves the problem with comment format :) I do not mind.

-- 
Sergey

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox