netdev.vger.kernel.org archive mirror
* [PATCH v3 net-next 0/2] net: dsa: mv88e6xxx: Improve indirect addressing performance
@ 2022-01-28 16:26 Tobias Waldekranz
From: Tobias Waldekranz @ 2022-01-28 16:26 UTC (permalink / raw)
  To: davem, kuba; +Cc: netdev, andrew, David.Laight

The individual patches have all the details. This series was prompted
by recent work on a platform that took 16s (sic) to load the mv88e6xxx
module.

The first patch gets rid of most of that time by replacing a very long
delay with a tighter poll loop to wait for the busy bit to clear.

The second patch shaves off some more time by avoiding redundant
busy-bit checks, saving 1 out of 4 MDIO operations for every register
read/write in the optimal case.

v1 -> v2:
- Make sure that we always poll the busy bit at least twice, in the
  unlikely event that the first poll queries the hardware quickly but
  the task is then scheduled out for a long time before the timeout is
  checked.

v2 -> v3:
- Fall back to the longer sleeps after the initial two poll attempts.

Tobias Waldekranz (2):
  net: dsa: mv88e6xxx: Improve performance of busy bit polling
  net: dsa: mv88e6xxx: Improve indirect addressing performance

 drivers/net/dsa/mv88e6xxx/chip.c | 13 +++++++++---
 drivers/net/dsa/mv88e6xxx/chip.h |  1 +
 drivers/net/dsa/mv88e6xxx/smi.c  | 35 +++++++++++++++++++++-----------
 3 files changed, 34 insertions(+), 15 deletions(-)

-- 
2.25.1



* [PATCH v3 net-next 1/2] net: dsa: mv88e6xxx: Improve performance of busy bit polling
From: Tobias Waldekranz @ 2022-01-28 16:26 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, andrew, David.Laight, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, linux-kernel

Avoid a long delay when a busy bit is still set and has to be polled
again.

Measurements on a system with two Opals (6097F) and one Agate (6352)
show that, even with this much tighter loop, there is about a 50%
chance of the bit being cleared on the first poll; all other accesses
see the bit cleared on the second poll.

On a standard MDIO bus running MDC at 2.5MHz, a single access with 32
bits of preamble plus 32 bits of data takes 64*(1/2.5MHz) = 25.6us.
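
The 25.6us figure follows from MDIO shifting one bit per MDC cycle. A
minimal sketch of the arithmetic, assuming an IEEE 802.3 Clause 22
frame (32-bit preamble followed by a 32-bit frame); the helper name
mdio_access_us() is hypothetical:

```c
#include <assert.h>

/* IEEE 802.3 Clause 22 MDIO access, in MDC cycles (one bit per cycle). */
enum {
	MDIO_PREAMBLE_BITS = 32,
	MDIO_FRAME_BITS = 2 /* ST */ + 2 /* OP */ + 5 /* PHYAD */ +
			  5 /* REGAD */ + 2 /* TA */ + 16 /* DATA */,
};

/* Duration of one access of `bits` MDC cycles at `mdc_hz`, in microseconds. */
static double mdio_access_us(unsigned int bits, double mdc_hz)
{
	assert(mdc_hz > 0.0);
	return bits * 1e6 / mdc_hz;
}
```

mdio_access_us(MDIO_PREAMBLE_BITS + MDIO_FRAME_BITS, 2.5e6) evaluates
to 25.6, matching the figure above.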

This means that mv88e6xxx_smi_direct_wait took 26us + CPU overhead in
the fast case, but 26us + 1500us + 26us + CPU overhead in the slow
case (taking 1500us as the midpoint of usleep_range(1000, 2000)),
bringing the average close to 1ms.

With this change in place, the slow case is closer to 2*26us + CPU
overhead, with the average well below 100us - a 10x improvement.
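
The averages above can be reproduced with a simple two-outcome model.
This is a sketch under stated assumptions: CPU overhead is ignored,
each access costs 25.6us, the first poll succeeds with probability 0.5,
and the pre-patch sleep is taken as the 1500us midpoint of
usleep_range(1000, 2000), although a real sleep may overshoot. The
function name expected_wait_us() is hypothetical.

```c
#include <assert.h>

/* Expected busy-bit wait in microseconds when the bit is clear on the
 * first poll with probability p_first and otherwise on the second poll,
 * with gap_us spent between the two polls. CPU overhead is ignored. */
static double expected_wait_us(double access_us, double p_first, double gap_us)
{
	assert(p_first >= 0.0 && p_first <= 1.0);
	const double fast = access_us;                      /* one poll */
	const double slow = access_us + gap_us + access_us; /* two polls */
	return p_first * fast + (1.0 - p_first) * slow;
}
```

With these inputs, expected_wait_us(25.6, 0.5, 1500.0) gives about
788us for the old loop, and expected_wait_us(25.6, 0.5, 0.0), treating
cpu_relax() as adding essentially no delay, gives about 38us for the
new one; adding CPU overhead brings these in line with the figures
quoted above.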

This translates into real-world gains. On a 3-chip, 20-port system,
the modprobe time drops by about 86%, from 15.99s to 2.21s:

Before:

root@coronet:~# time modprobe mv88e6xxx
real    0m 15.99s
user    0m 0.00s
sys     0m 1.52s

After:

root@coronet:~# time modprobe mv88e6xxx
real    0m 2.21s
user    0m 0.00s
sys     0m 1.54s

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 13 ++++++++++---
 drivers/net/dsa/mv88e6xxx/smi.c  | 11 +++++++++--
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 58ca684d73f7..1023e4549359 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -86,12 +86,16 @@ int mv88e6xxx_write(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val)
 int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg,
 			u16 mask, u16 val)
 {
+	const unsigned long timeout = jiffies + msecs_to_jiffies(50);
 	u16 data;
 	int err;
 	int i;
 
-	/* There's no bus specific operation to wait for a mask */
-	for (i = 0; i < 16; i++) {
+	/* There's no bus specific operation to wait for a mask. Even
+	 * if the initial poll takes longer than 50ms, always do at
+	 * least one more attempt.
+	 */
+	for (i = 0; time_before(jiffies, timeout) || (i < 2); i++) {
 		err = mv88e6xxx_read(chip, addr, reg, &data);
 		if (err)
 			return err;
@@ -99,7 +103,10 @@ int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg,
 		if ((data & mask) == val)
 			return 0;
 
-		usleep_range(1000, 2000);
+		if (i < 2)
+			cpu_relax();
+		else
+			usleep_range(1000, 2000);
 	}
 
 	dev_err(chip->dev, "Timeout while waiting for switch\n");
diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index 282fe08db050..728ef3f54ec5 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -55,11 +55,15 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
 static int mv88e6xxx_smi_direct_wait(struct mv88e6xxx_chip *chip,
 				     int dev, int reg, int bit, int val)
 {
+	const unsigned long timeout = jiffies + msecs_to_jiffies(50);
 	u16 data;
 	int err;
 	int i;
 
-	for (i = 0; i < 16; i++) {
+	/* Even if the initial poll takes longer than 50ms, always do
+	 * at least one more attempt.
+	 */
+	for (i = 0; time_before(jiffies, timeout) || (i < 2); i++) {
 		err = mv88e6xxx_smi_direct_read(chip, dev, reg, &data);
 		if (err)
 			return err;
@@ -67,7 +71,10 @@ static int mv88e6xxx_smi_direct_wait(struct mv88e6xxx_chip *chip,
 		if (!!(data & BIT(bit)) == !!val)
 			return 0;
 
-		usleep_range(1000, 2000);
+		if (i < 2)
+			cpu_relax();
+		else
+			usleep_range(1000, 2000);
 	}
 
 	return -ETIMEDOUT;
-- 
2.25.1



* [PATCH v3 net-next 2/2] net: dsa: mv88e6xxx: Improve indirect addressing performance
From: Tobias Waldekranz @ 2022-01-28 16:26 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, andrew, David.Laight, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, linux-kernel

Before this change, both the read and write callbacks would start out
by checking that the chip's busy flag was cleared. However, both
callbacks also waited for the busy bit to clear before returning,
making the initial check superfluous. The only case in which it would
ever have had an effect was if the busy bit was initially set for some
reason.

With that in mind, check the busy bit once at initialization, after
which both read and write can rely on the previous operation having
waited for the bit to clear.

This cuts the number of operations on the underlying MDIO bus by 25%
in the optimal case, from 4 to 3 per indirect register access.
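
The access sequences involved can be illustrated with a small
simulation that counts direct MDIO operations. Everything here is
hypothetical: the register numbers, the write opcode bit and the
address/register packing are invented for the model, and the simulated
chip completes every command instantly (the optimal case). Only the use
of bit 15 of the command register as the busy bit mirrors the patch.
The wait_first flag selects the pre-patch behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of a simulated indirect SMI window. */
#define SIM_SMI_CMD   0
#define SIM_SMI_DATA  1
#define SIM_CMD_BUSY  (1u << 15) /* busy bit, as in the patch */
#define SIM_CMD_WRITE (1u << 10) /* hypothetical write opcode bit */

struct sim_chip {
	uint16_t cmd, data;
	uint16_t regs[32][32]; /* internal registers, [addr][reg] */
	int mdio_ops;          /* direct MDIO accesses performed */
};

static uint16_t sim_cmd(bool write, int addr, int reg)
{
	assert(addr >= 0 && addr < 32 && reg >= 0 && reg < 32);
	return (uint16_t)(SIM_CMD_BUSY | (write ? SIM_CMD_WRITE : 0u) |
			  (unsigned)addr << 5 | (unsigned)reg);
}

static uint16_t direct_read(struct sim_chip *c, int reg)
{
	c->mdio_ops++;
	return reg == SIM_SMI_CMD ? c->cmd : c->data;
}

static void direct_write(struct sim_chip *c, int reg, uint16_t val)
{
	c->mdio_ops++;
	if (reg == SIM_SMI_DATA) {
		c->data = val;
		return;
	}
	/* Command: executed instantly, so busy is already clear when polled. */
	int addr = (val >> 5) & 0x1f, r = val & 0x1f;
	if (val & SIM_CMD_WRITE)
		c->regs[addr][r] = c->data;
	else
		c->data = c->regs[addr][r];
	c->cmd = val & ~SIM_CMD_BUSY;
}

/* The simulation always clears busy, so no timeout is modelled here. */
static void wait_not_busy(struct sim_chip *c)
{
	while (direct_read(c, SIM_SMI_CMD) & SIM_CMD_BUSY)
		;
}

/* Post-patch: the only up-front wait happens once, at init. */
static void sim_init(struct sim_chip *c)
{
	wait_not_busy(c);
}

static uint16_t indirect_read(struct sim_chip *c, int addr, int reg,
			      bool wait_first)
{
	if (wait_first) /* pre-patch behaviour */
		wait_not_busy(c);
	direct_write(c, SIM_SMI_CMD, sim_cmd(false, addr, reg));
	wait_not_busy(c);
	return direct_read(c, SIM_SMI_DATA);
}

static void indirect_write(struct sim_chip *c, int addr, int reg,
			   uint16_t val, bool wait_first)
{
	if (wait_first) /* pre-patch behaviour */
		wait_not_busy(c);
	direct_write(c, SIM_SMI_DATA, val);
	direct_write(c, SIM_SMI_CMD, sim_cmd(true, addr, reg));
	wait_not_busy(c);
}
```

In this model, after sim_init() an indirect read or write costs 3 MDIO
operations, versus 4 with wait_first set, which is the 25% reduction
described above.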

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6xxx/chip.h |  1 +
 drivers/net/dsa/mv88e6xxx/smi.c  | 24 ++++++++++++++----------
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 8271b8aa7b71..438cee853d07 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -392,6 +392,7 @@ struct mv88e6xxx_chip {
 struct mv88e6xxx_bus_ops {
 	int (*read)(struct mv88e6xxx_chip *chip, int addr, int reg, u16 *val);
 	int (*write)(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val);
+	int (*init)(struct mv88e6xxx_chip *chip);
 };
 
 struct mv88e6xxx_mdio_bus {
diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index 728ef3f54ec5..a990271b7482 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -111,11 +111,6 @@ static int mv88e6xxx_smi_indirect_read(struct mv88e6xxx_chip *chip,
 {
 	int err;
 
-	err = mv88e6xxx_smi_direct_wait(chip, chip->sw_addr,
-					MV88E6XXX_SMI_CMD, 15, 0);
-	if (err)
-		return err;
-
 	err = mv88e6xxx_smi_direct_write(chip, chip->sw_addr,
 					 MV88E6XXX_SMI_CMD,
 					 MV88E6XXX_SMI_CMD_BUSY |
@@ -139,11 +134,6 @@ static int mv88e6xxx_smi_indirect_write(struct mv88e6xxx_chip *chip,
 {
 	int err;
 
-	err = mv88e6xxx_smi_direct_wait(chip, chip->sw_addr,
-					MV88E6XXX_SMI_CMD, 15, 0);
-	if (err)
-		return err;
-
 	err = mv88e6xxx_smi_direct_write(chip, chip->sw_addr,
 					 MV88E6XXX_SMI_DATA, data);
 	if (err)
@@ -162,9 +152,20 @@ static int mv88e6xxx_smi_indirect_write(struct mv88e6xxx_chip *chip,
 					 MV88E6XXX_SMI_CMD, 15, 0);
 }
 
+static int mv88e6xxx_smi_indirect_init(struct mv88e6xxx_chip *chip)
+{
+	/* Ensure that the chip starts out in the ready state. As both
+	 * reads and writes always ensure this on return, they can
+	 * safely depend on the chip not being busy on entry.
+	 */
+	return mv88e6xxx_smi_direct_wait(chip, chip->sw_addr,
+					 MV88E6XXX_SMI_CMD, 15, 0);
+}
+
 static const struct mv88e6xxx_bus_ops mv88e6xxx_smi_indirect_ops = {
 	.read = mv88e6xxx_smi_indirect_read,
 	.write = mv88e6xxx_smi_indirect_write,
+	.init = mv88e6xxx_smi_indirect_init,
 };
 
 int mv88e6xxx_smi_init(struct mv88e6xxx_chip *chip,
@@ -182,5 +183,8 @@ int mv88e6xxx_smi_init(struct mv88e6xxx_chip *chip,
 	chip->bus = bus;
 	chip->sw_addr = sw_addr;
 
+	if (chip->smi_ops->init)
+		return chip->smi_ops->init(chip);
+
 	return 0;
 }
-- 
2.25.1



* Re: [PATCH v3 net-next 1/2] net: dsa: mv88e6xxx: Improve performance of busy bit polling
From: Andrew Lunn @ 2022-01-28 17:17 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: davem, kuba, netdev, David.Laight, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, linux-kernel

On Fri, Jan 28, 2022 at 05:26:49PM +0100, Tobias Waldekranz wrote:
> Avoid a long delay when a busy bit is still set and has to be polled
> again.
> 
> [...]
> 
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew


* Re: [PATCH v3 net-next 0/2] net: dsa: mv88e6xxx: Improve indirect addressing performance
From: patchwork-bot+netdevbpf @ 2022-01-31 11:40 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: davem, kuba, netdev, andrew, David.Laight

Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 28 Jan 2022 17:26:48 +0100 you wrote:
> The individual patches have all the details. This work was triggered
> by recent work on a platform that took 16s (sic) to load the mv88e6xxx
> module.
> 
> The first patch gets rid of most of that time by replacing a very long
> delay with a tighter poll loop to wait for the busy bit to clear.
> 
> [...]

Here is the summary with links:
  - [v3,net-next,1/2] net: dsa: mv88e6xxx: Improve performance of busy bit polling
    https://git.kernel.org/netdev/net-next/c/35da1dfd9484
  - [v3,net-next,2/2] net: dsa: mv88e6xxx: Improve indirect addressing performance
    https://git.kernel.org/netdev/net-next/c/7bca16b22e6a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


