* [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers to reduce spinlock contention
@ 2023-10-29 18:35 Mirsad Goran Todorovac
2023-10-29 18:35 ` [PATCH v5 2/7] r8169: Coalesce RTL8411b PHY power-down recovery calls " Mirsad Goran Todorovac
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Mirsad Goran Todorovac @ 2023-10-29 18:35 UTC (permalink / raw)
To: Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel,
netdev
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit,
nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Mirsad Goran Todorovac, Marco Elver
A pair of new helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq()
are introduced.
The motivation for these helpers was the locking overhead of 130 consecutive
r8168_mac_ocp_write() calls in the RTL8411b reset after the NIC gets confused
if the PHY is powered-down.
To quote Heiner:
On RTL8411b the RX unit gets confused if the PHY is powered-down.
This was reported in [0] and confirmed by Realtek. Realtek provided
a sequence to fix the RX unit after PHY wakeup.
A series of about 130 r8168_mac_ocp_write() calls is performed to program the
RTL registers for recovery, each doing an expensive spin_lock_irqsave() and
spin_unlock_irqrestore().
Each mac ocp write is made of:
static void __r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg,
u32 data)
{
if (rtl_ocp_reg_failure(reg))
return;
RTL_W32(tp, OCPDR, OCPAR_FLAG | (reg << 15) | data);
}
static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg,
u32 data)
{
unsigned long flags;
raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags);
__r8168_mac_ocp_write(tp, reg, data);
raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
}
Register programming is done through RTL_W32() macro which expands into
#define RTL_W32(tp, reg, val32) writel((val32), tp->mmio_addr + (reg))
which is further (on Alpha):
extern inline void writel(u32 b, volatile void __iomem *addr)
{
mb();
__raw_writel(b, addr);
}
or on i386/x86_64:
#define build_mmio_write(name, size, type, reg, barrier) \
static inline void name(type val, volatile void __iomem *addr) \
{ asm volatile("mov" size " %0,%1": :reg (val), \
"m" (*(volatile type __force *)addr) barrier); }
build_mmio_write(writel, "l", unsigned int, "r", :"memory")
This obviously involves iat least a compiler barrier.
mb() expands into something like this i.e. on x86_64:
#define mb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
This means a whole lot of memory bus stalls: for spin_lock_irqsave(),
memory barrier, writel(), and spin_unlock_irqrestore().
With about 130 of these sequential calls to r8168_mac_ocp_write() this looks like
a lock storm that will stall all of the cores and CPUs on the same memory controller
for certain time I/O takes to finish.
In a sequential case of RTL register programming, the writes to RTL registers
can be coalesced under a same raw spinlock. This can dramatically decrease the
number of bus stalls in a multicore or multi-CPU system.
Macro helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq() are
provided to reduce lock contention:
static void rtl_hw_start_8411_2(struct rtl8169_private *tp)
{
...
/* The following Realtek-provided magic fixes an issue with the RX unit
* getting confused after the PHY having been powered-down.
*/
static const struct recover_8411b_info init_zero_seq[] = {
{ 0xFC28, 0x0000 }, { 0xFC2A, 0x0000 }, { 0xFC2C, 0x0000 },
...
};
...
r8168_mac_ocp_write_seq(tp, init_zero_seq);
...
}
The hex data is preserved intact through s/r8168_mac_ocp_write[(]tp,/{ / and s/[)];/ },/
functions that only changed the function names and the ending of the line, so the actual
hex data is unchanged.
To repeat, the reason for the introduction of the original commit
was to enable recovery of the RX unit on the RTL8411b which was confused by the
powered-down PHY. This sequence of r8168_mac_ocp_write() calls amplifies the problem
into a series of about 500+ memory bus locks, most waiting for the main memory read,
modify and write under a LOCK. The memory barrier in RTL_W32 should suffice for
the programming sequence to reach RTL NIC registers.
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075
Fixes: fe4e8db0392a6 ("r8169: fix issue with confused RX unit after PHY power-down on RTL8411b")
Fixes: 91c8643578a21 ("r8169: use spinlock to protect mac ocp register access")
Fixes: d6c36cbc5e533 ("r8169: Use a raw_spinlock_t for the register locks.")
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: nic_swsd@realtek.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/
Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
---
v5:
added unlocked primitives to allow mac ocs modify grouping
applied coalescing of mac ocp writes/modifies for 8168ep and 8117
some formatting fixes to please checkpatch.pl
v4:
fixed complaints as advised by Heiner and checkpatch.pl
split the patch into five sections to be more easily manipulated and reviewed
introduced r8168_mac_ocp_write_seq()
applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B
v3:
removed register/mask pair array sentinels, so using ARRAY_SIZE().
avoided duplication of RTL_W32() call code as advised by Heiner.
drivers/net/ethernet/realtek/r8169_main.c | 71 +++++++++++++++++++++--
1 file changed, 66 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 361b90007148..da1f5d1b4fd5 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -888,7 +888,7 @@ static int r8168_phy_ocp_read(struct rtl8169_private *tp, u32 reg)
(RTL_R32(tp, GPHY_OCP) & 0xffff) : -ETIMEDOUT;
}
-static void __r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data)
+static inline void __r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data)
{
if (rtl_ocp_reg_failure(reg))
return;
@@ -905,7 +905,7 @@ static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data)
raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
}
-static u16 __r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg)
+static inline u16 __r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg)
{
if (rtl_ocp_reg_failure(reg))
return 0;
@@ -927,18 +927,79 @@ static u16 r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg)
return val;
}
+static inline void __r8168_mac_ocp_modify(struct rtl8169_private *tp, u32 reg, u16 mask,
+ u16 set)
+{
+ u16 data;
+
+ data = __r8168_mac_ocp_read(tp, reg);
+ __r8168_mac_ocp_write(tp, reg, (data & ~mask) | set);
+}
+
static void r8168_mac_ocp_modify(struct rtl8169_private *tp, u32 reg, u16 mask,
u16 set)
{
unsigned long flags;
- u16 data;
raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags);
- data = __r8168_mac_ocp_read(tp, reg);
- __r8168_mac_ocp_write(tp, reg, (data & ~mask) | set);
+ __r8168_mac_ocp_modify(tp, reg, mask, set);
raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
}
+struct e_info_regdata {
+ u32 reg;
+ u32 data;
+};
+
+struct e_info_regmaskset {
+ u32 reg;
+ u16 mask;
+ u16 set;
+};
+
+static void __r8168_mac_ocp_write_seqlen(struct rtl8169_private *tp,
+ const struct e_info_regdata *array, int len)
+{
+ struct e_info_regdata const *p;
+
+ for (p = array; len--; p++)
+ __r8168_mac_ocp_write(tp, p->reg, p->data);
+}
+
+static void r8168_mac_ocp_write_seqlen(struct rtl8169_private *tp,
+ const struct e_info_regdata *array, int len)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags);
+ __r8168_mac_ocp_write_seqlen(tp, array, len);
+ raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
+}
+
+static void __r8168_mac_ocp_modify_seqlen(struct rtl8169_private *tp,
+ const struct e_info_regmaskset *array, int len)
+{
+ struct e_info_regmaskset const *p;
+
+ for (p = array; len--; p++)
+ __r8168_mac_ocp_modify(tp, p->reg, p->mask, p->set);
+}
+
+static void r8168_mac_ocp_modify_seqlen(struct rtl8169_private *tp,
+ const struct e_info_regmaskset *array, int len)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags);
+ __r8168_mac_ocp_modify_seqlen(tp, array, len);
+ raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);
+}
+
+#define r8168_mac_ocp_write_seq(tp, a) r8168_mac_ocp_write_seqlen(tp, a, ARRAY_SIZE(a))
+#define r8168_mac_ocp_modify_seq(tp, a) r8168_mac_ocp_modify_seqlen(tp, a, ARRAY_SIZE(a))
+#define __r8168_mac_ocp_write_seq(tp, a) __r8168_mac_ocp_write_seqlen(tp, a, ARRAY_SIZE(a))
+#define __r8168_mac_ocp_modify_seq(tp, a) __r8168_mac_ocp_modify_seqlen(tp, a, ARRAY_SIZE(a))
+
/* Work around a hw issue with RTL8168g PHY, the quirk disables
* PHY MCU interrupts before PHY power-down.
*/
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH v5 2/7] r8169: Coalesce RTL8411b PHY power-down recovery calls to reduce spinlock contention 2023-10-29 18:35 [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers to reduce spinlock contention Mirsad Goran Todorovac @ 2023-10-29 18:35 ` Mirsad Goran Todorovac 2023-10-30 13:48 ` Heiner Kallweit 2023-10-29 18:35 ` [PATCH v5 3/7] r8169: Coalesce mac ocp write and modify for 8168H start " Mirsad Goran Todorovac ` (5 subsequent siblings) 6 siblings, 1 reply; 16+ messages in thread From: Mirsad Goran Todorovac @ 2023-10-29 18:35 UTC (permalink / raw) To: Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mirsad Goran Todorovac, Marco Elver On RTL8411b the RX unit gets confused if the PHY is powered-down. This was reported in [0] and confirmed by Realtek. Realtek provided a sequence to fix the RX unit after PHY wakeup. A series of about 130 r8168_mac_ocp_write() calls is performed to program the RTL registers for recovery. With about 130 of these sequential calls to r8168_mac_ocp_write() this looks like a lock storm that will stall all of the cores and CPUs on the same memory controller for certain time I/O takes to finish. In a sequential case of RTL register programming, a sequence of writes to the RTL registers can be coalesced under a same raw spinlock. This can dramatically decrease the number of bus stalls in a multicore or multi-CPU system: static void rtl_hw_start_8411_2(struct rtl8169_private *tp) { ... /* The following Realtek-provided magic fixes an issue with the RX unit * getting confused after the PHY having been powered-down. */ static const struct recover_8411b_info init_zero_seq[] = { { 0xFC28, 0x0000 }, { 0xFC2A, 0x0000 }, { 0xFC2C, 0x0000 }, ... }; static const struct recover_8411b_info recover_seq[] = { { 0xF800, 0xE008 }, { 0xF802, 0xE00A }, { 0xF804, 0xE00C }, ... }; static const struct recover_8411b_info final_seq[] = { { 0xFC2A, 0x0743 }, { 0xFC2C, 0x0801 }, { 0xFC2E, 0x0BE9 }, ... }; r8168_mac_ocp_write_seq(tp, init_zero_seq); mdelay(3); r8168_mac_ocp_write(tp, 0xFC26, 0x0000); r8168_mac_ocp_write_seq(tp, recover_seq); r8168_mac_ocp_write(tp, 0xFC26, 0x8000); r8168_mac_ocp_write_seq(tp, final_seq); } The hex data is preserved intact through s/r8168_mac_ocp_write[(]tp,/{ / and s/[)];/ },/ functions that only changed the function names and the ending of the line, so the actual hex data is unchanged. Note that the reason for the introduction of the original commit was to enable recovery of the RX unit on the RTL8411b which was confused by the powered-down PHY. This sequence of r8168_mac_ocp_write() calls amplifies the problem into a series of about 500+ memory bus locks, most waiting for the main MMIO memory read-modify-write under a LOCK. The memory barrier in RTL_W32 should suffice for the programming sequence to reach RTL NIC registers. [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075 Fixes: fe4e8db0392a6 ("r8169: fix issue with confused RX unit after PHY power-down on RTL8411b") Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Marco Elver <elver@google.com> Cc: nic_swsd@realtek.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> --- v5: added unlocked primitives to allow mac ocs modify grouping applied coalescing of mac ocp writes/modifies for 8168ep and 8117 some formatting fixes to please checkpatch.pl v4: fixed complaints as advised by Heiner and checkpatch.pl split the patch into five sections to be more easily manipulated and reviewed introduced r8168_mac_ocp_write_seq() applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B v3: removed register/mask pair array sentinels, so using ARRAY_SIZE(). avoided duplication of RTL_W32() call code as advised by Heiner. drivers/net/ethernet/realtek/r8169_main.c | 173 ++++++---------------- 1 file changed, 46 insertions(+), 127 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index da1f5d1b4fd5..f4a1d1a74b8b 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -3161,145 +3161,64 @@ static void rtl_hw_start_8411_2(struct rtl8169_private *tp) { 0x1d, 0x0000, 0x4000 }, }; - rtl_hw_start_8168g(tp); + static const struct e_info_regdata init_zero_seq[] = { + { 0xFC28, 0x0000 }, { 0xFC2A, 0x0000 }, { 0xFC2C, 0x0000 }, { 0xFC2E, 0x0000 }, + { 0xFC30, 0x0000 }, { 0xFC32, 0x0000 }, { 0xFC34, 0x0000 }, { 0xFC36, 0x0000 }, + }; + static const struct e_info_regdata recover_seq[] = { + { 0xF800, 0xE008 }, { 0xF802, 0xE00A }, { 0xF804, 0xE00C }, { 0xF806, 0xE00E }, + { 0xF808, 0xE027 }, { 0xF80A, 0xE04F }, { 0xF80C, 0xE05E }, { 0xF80E, 0xE065 }, + { 0xF810, 0xC602 }, { 0xF812, 0xBE00 }, { 0xF814, 0x0000 }, { 0xF816, 0xC502 }, + { 0xF818, 0xBD00 }, { 0xF81A, 0x074C }, { 0xF81C, 0xC302 }, { 0xF81E, 0xBB00 }, + { 0xF820, 0x080A }, { 0xF822, 0x6420 }, { 0xF824, 0x48C2 }, { 0xF826, 0x8C20 }, + { 0xF828, 0xC516 }, { 0xF82A, 0x64A4 }, { 0xF82C, 0x49C0 }, { 0xF82E, 0xF009 }, + { 0xF830, 0x74A2 }, { 0xF832, 0x8CA5 }, { 0xF834, 0x74A0 }, { 0xF836, 0xC50E }, + { 0xF838, 0x9CA2 }, { 0xF83A, 0x1C11 }, { 0xF83C, 0x9CA0 }, { 0xF83E, 0xE006 }, + { 0xF840, 0x74F8 }, { 0xF842, 0x48C4 }, { 0xF844, 0x8CF8 }, { 0xF846, 0xC404 }, + { 0xF848, 0xBC00 }, { 0xF84A, 0xC403 }, { 0xF84C, 0xBC00 }, { 0xF84E, 0x0BF2 }, + { 0xF850, 0x0C0A }, { 0xF852, 0xE434 }, { 0xF854, 0xD3C0 }, { 0xF856, 0x49D9 }, + { 0xF858, 0xF01F }, { 0xF85A, 0xC526 }, { 0xF85C, 0x64A5 }, { 0xF85E, 0x1400 }, + { 0xF860, 0xF007 }, { 0xF862, 0x0C01 }, { 0xF864, 0x8CA5 }, { 0xF866, 0x1C15 }, + { 0xF868, 0xC51B }, { 0xF86A, 0x9CA0 }, { 0xF86C, 0xE013 }, { 0xF86E, 0xC519 }, + { 0xF870, 0x74A0 }, { 0xF872, 0x48C4 }, { 0xF874, 0x8CA0 }, { 0xF876, 0xC516 }, + { 0xF878, 0x74A4 }, { 0xF87A, 0x48C8 }, { 0xF87C, 0x48CA }, { 0xF87E, 0x9CA4 }, + { 0xF880, 0xC512 }, { 0xF882, 0x1B00 }, { 0xF884, 0x9BA0 }, { 0xF886, 0x1B1C }, + { 0xF888, 0x483F }, { 0xF88A, 0x9BA2 }, { 0xF88C, 0x1B04 }, { 0xF88E, 0xC508 }, + { 0xF890, 0x9BA0 }, { 0xF892, 0xC505 }, { 0xF894, 0xBD00 }, { 0xF896, 0xC502 }, + { 0xF898, 0xBD00 }, { 0xF89A, 0x0300 }, { 0xF89C, 0x051E }, { 0xF89E, 0xE434 }, + { 0xF8A0, 0xE018 }, { 0xF8A2, 0xE092 }, { 0xF8A4, 0xDE20 }, { 0xF8A6, 0xD3C0 }, + { 0xF8A8, 0xC50F }, { 0xF8AA, 0x76A4 }, { 0xF8AC, 0x49E3 }, { 0xF8AE, 0xF007 }, + { 0xF8B0, 0x49C0 }, { 0xF8B2, 0xF103 }, { 0xF8B4, 0xC607 }, { 0xF8B6, 0xBE00 }, + { 0xF8B8, 0xC606 }, { 0xF8BA, 0xBE00 }, { 0xF8BC, 0xC602 }, { 0xF8BE, 0xBE00 }, + { 0xF8C0, 0x0C4C }, { 0xF8C2, 0x0C28 }, { 0xF8C4, 0x0C2C }, { 0xF8C6, 0xDC00 }, + { 0xF8C8, 0xC707 }, { 0xF8CA, 0x1D00 }, { 0xF8CC, 0x8DE2 }, { 0xF8CE, 0x48C1 }, + { 0xF8D0, 0xC502 }, { 0xF8D2, 0xBD00 }, { 0xF8D4, 0x00AA }, { 0xF8D6, 0xE0C0 }, + { 0xF8D8, 0xC502 }, { 0xF8DA, 0xBD00 }, { 0xF8DC, 0x0132 }, + }; + + static const struct e_info_regdata final_seq[] = { + { 0xFC2A, 0x0743 }, { 0xFC2C, 0x0801 }, { 0xFC2E, 0x0BE9 }, { 0xFC30, 0x02FD }, + { 0xFC32, 0x0C25 }, { 0xFC34, 0x00A9 }, { 0xFC36, 0x012D }, + }; + + rtl_hw_start_8168g(tp); rtl_ephy_init(tp, e_info_8411_2); /* The following Realtek-provided magic fixes an issue with the RX unit * getting confused after the PHY having been powered-down. */ - r8168_mac_ocp_write(tp, 0xFC28, 0x0000); - r8168_mac_ocp_write(tp, 0xFC2A, 0x0000); - r8168_mac_ocp_write(tp, 0xFC2C, 0x0000); - r8168_mac_ocp_write(tp, 0xFC2E, 0x0000); - r8168_mac_ocp_write(tp, 0xFC30, 0x0000); - r8168_mac_ocp_write(tp, 0xFC32, 0x0000); - r8168_mac_ocp_write(tp, 0xFC34, 0x0000); - r8168_mac_ocp_write(tp, 0xFC36, 0x0000); + + r8168_mac_ocp_write_seq(tp, init_zero_seq); mdelay(3); r8168_mac_ocp_write(tp, 0xFC26, 0x0000); - r8168_mac_ocp_write(tp, 0xF800, 0xE008); - r8168_mac_ocp_write(tp, 0xF802, 0xE00A); - r8168_mac_ocp_write(tp, 0xF804, 0xE00C); - r8168_mac_ocp_write(tp, 0xF806, 0xE00E); - r8168_mac_ocp_write(tp, 0xF808, 0xE027); - r8168_mac_ocp_write(tp, 0xF80A, 0xE04F); - r8168_mac_ocp_write(tp, 0xF80C, 0xE05E); - r8168_mac_ocp_write(tp, 0xF80E, 0xE065); - r8168_mac_ocp_write(tp, 0xF810, 0xC602); - r8168_mac_ocp_write(tp, 0xF812, 0xBE00); - r8168_mac_ocp_write(tp, 0xF814, 0x0000); - r8168_mac_ocp_write(tp, 0xF816, 0xC502); - r8168_mac_ocp_write(tp, 0xF818, 0xBD00); - r8168_mac_ocp_write(tp, 0xF81A, 0x074C); - r8168_mac_ocp_write(tp, 0xF81C, 0xC302); - r8168_mac_ocp_write(tp, 0xF81E, 0xBB00); - r8168_mac_ocp_write(tp, 0xF820, 0x080A); - r8168_mac_ocp_write(tp, 0xF822, 0x6420); - r8168_mac_ocp_write(tp, 0xF824, 0x48C2); - r8168_mac_ocp_write(tp, 0xF826, 0x8C20); - r8168_mac_ocp_write(tp, 0xF828, 0xC516); - r8168_mac_ocp_write(tp, 0xF82A, 0x64A4); - r8168_mac_ocp_write(tp, 0xF82C, 0x49C0); - r8168_mac_ocp_write(tp, 0xF82E, 0xF009); - r8168_mac_ocp_write(tp, 0xF830, 0x74A2); - r8168_mac_ocp_write(tp, 0xF832, 0x8CA5); - r8168_mac_ocp_write(tp, 0xF834, 0x74A0); - r8168_mac_ocp_write(tp, 0xF836, 0xC50E); - r8168_mac_ocp_write(tp, 0xF838, 0x9CA2); - r8168_mac_ocp_write(tp, 0xF83A, 0x1C11); - r8168_mac_ocp_write(tp, 0xF83C, 0x9CA0); - r8168_mac_ocp_write(tp, 0xF83E, 0xE006); - r8168_mac_ocp_write(tp, 0xF840, 0x74F8); - r8168_mac_ocp_write(tp, 0xF842, 0x48C4); - r8168_mac_ocp_write(tp, 0xF844, 0x8CF8); - r8168_mac_ocp_write(tp, 0xF846, 0xC404); - r8168_mac_ocp_write(tp, 0xF848, 0xBC00); - r8168_mac_ocp_write(tp, 0xF84A, 0xC403); - r8168_mac_ocp_write(tp, 0xF84C, 0xBC00); - r8168_mac_ocp_write(tp, 0xF84E, 0x0BF2); - r8168_mac_ocp_write(tp, 0xF850, 0x0C0A); - r8168_mac_ocp_write(tp, 0xF852, 0xE434); - r8168_mac_ocp_write(tp, 0xF854, 0xD3C0); - r8168_mac_ocp_write(tp, 0xF856, 0x49D9); - r8168_mac_ocp_write(tp, 0xF858, 0xF01F); - r8168_mac_ocp_write(tp, 0xF85A, 0xC526); - r8168_mac_ocp_write(tp, 0xF85C, 0x64A5); - r8168_mac_ocp_write(tp, 0xF85E, 0x1400); - r8168_mac_ocp_write(tp, 0xF860, 0xF007); - r8168_mac_ocp_write(tp, 0xF862, 0x0C01); - r8168_mac_ocp_write(tp, 0xF864, 0x8CA5); - r8168_mac_ocp_write(tp, 0xF866, 0x1C15); - r8168_mac_ocp_write(tp, 0xF868, 0xC51B); - r8168_mac_ocp_write(tp, 0xF86A, 0x9CA0); - r8168_mac_ocp_write(tp, 0xF86C, 0xE013); - r8168_mac_ocp_write(tp, 0xF86E, 0xC519); - r8168_mac_ocp_write(tp, 0xF870, 0x74A0); - r8168_mac_ocp_write(tp, 0xF872, 0x48C4); - r8168_mac_ocp_write(tp, 0xF874, 0x8CA0); - r8168_mac_ocp_write(tp, 0xF876, 0xC516); - r8168_mac_ocp_write(tp, 0xF878, 0x74A4); - r8168_mac_ocp_write(tp, 0xF87A, 0x48C8); - r8168_mac_ocp_write(tp, 0xF87C, 0x48CA); - r8168_mac_ocp_write(tp, 0xF87E, 0x9CA4); - r8168_mac_ocp_write(tp, 0xF880, 0xC512); - r8168_mac_ocp_write(tp, 0xF882, 0x1B00); - r8168_mac_ocp_write(tp, 0xF884, 0x9BA0); - r8168_mac_ocp_write(tp, 0xF886, 0x1B1C); - r8168_mac_ocp_write(tp, 0xF888, 0x483F); - r8168_mac_ocp_write(tp, 0xF88A, 0x9BA2); - r8168_mac_ocp_write(tp, 0xF88C, 0x1B04); - r8168_mac_ocp_write(tp, 0xF88E, 0xC508); - r8168_mac_ocp_write(tp, 0xF890, 0x9BA0); - r8168_mac_ocp_write(tp, 0xF892, 0xC505); - r8168_mac_ocp_write(tp, 0xF894, 0xBD00); - r8168_mac_ocp_write(tp, 0xF896, 0xC502); - r8168_mac_ocp_write(tp, 0xF898, 0xBD00); - r8168_mac_ocp_write(tp, 0xF89A, 0x0300); - r8168_mac_ocp_write(tp, 0xF89C, 0x051E); - r8168_mac_ocp_write(tp, 0xF89E, 0xE434); - r8168_mac_ocp_write(tp, 0xF8A0, 0xE018); - r8168_mac_ocp_write(tp, 0xF8A2, 0xE092); - r8168_mac_ocp_write(tp, 0xF8A4, 0xDE20); - r8168_mac_ocp_write(tp, 0xF8A6, 0xD3C0); - r8168_mac_ocp_write(tp, 0xF8A8, 0xC50F); - r8168_mac_ocp_write(tp, 0xF8AA, 0x76A4); - r8168_mac_ocp_write(tp, 0xF8AC, 0x49E3); - r8168_mac_ocp_write(tp, 0xF8AE, 0xF007); - r8168_mac_ocp_write(tp, 0xF8B0, 0x49C0); - r8168_mac_ocp_write(tp, 0xF8B2, 0xF103); - r8168_mac_ocp_write(tp, 0xF8B4, 0xC607); - r8168_mac_ocp_write(tp, 0xF8B6, 0xBE00); - r8168_mac_ocp_write(tp, 0xF8B8, 0xC606); - r8168_mac_ocp_write(tp, 0xF8BA, 0xBE00); - r8168_mac_ocp_write(tp, 0xF8BC, 0xC602); - r8168_mac_ocp_write(tp, 0xF8BE, 0xBE00); - r8168_mac_ocp_write(tp, 0xF8C0, 0x0C4C); - r8168_mac_ocp_write(tp, 0xF8C2, 0x0C28); - r8168_mac_ocp_write(tp, 0xF8C4, 0x0C2C); - r8168_mac_ocp_write(tp, 0xF8C6, 0xDC00); - r8168_mac_ocp_write(tp, 0xF8C8, 0xC707); - r8168_mac_ocp_write(tp, 0xF8CA, 0x1D00); - r8168_mac_ocp_write(tp, 0xF8CC, 0x8DE2); - r8168_mac_ocp_write(tp, 0xF8CE, 0x48C1); - r8168_mac_ocp_write(tp, 0xF8D0, 0xC502); - r8168_mac_ocp_write(tp, 0xF8D2, 0xBD00); - r8168_mac_ocp_write(tp, 0xF8D4, 0x00AA); - r8168_mac_ocp_write(tp, 0xF8D6, 0xE0C0); - r8168_mac_ocp_write(tp, 0xF8D8, 0xC502); - r8168_mac_ocp_write(tp, 0xF8DA, 0xBD00); - r8168_mac_ocp_write(tp, 0xF8DC, 0x0132); + r8168_mac_ocp_write_seq(tp, recover_seq); r8168_mac_ocp_write(tp, 0xFC26, 0x8000); - r8168_mac_ocp_write(tp, 0xFC2A, 0x0743); - r8168_mac_ocp_write(tp, 0xFC2C, 0x0801); - r8168_mac_ocp_write(tp, 0xFC2E, 0x0BE9); - r8168_mac_ocp_write(tp, 0xFC30, 0x02FD); - r8168_mac_ocp_write(tp, 0xFC32, 0x0C25); - r8168_mac_ocp_write(tp, 0xFC34, 0x00A9); - r8168_mac_ocp_write(tp, 0xFC36, 0x012D); + r8168_mac_ocp_write_seq(tp, final_seq); + } static void rtl_hw_start_8168h_1(struct rtl8169_private *tp) -- 2.34.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/7] r8169: Coalesce RTL8411b PHY power-down recovery calls to reduce spinlock contention 2023-10-29 18:35 ` [PATCH v5 2/7] r8169: Coalesce RTL8411b PHY power-down recovery calls " Mirsad Goran Todorovac @ 2023-10-30 13:48 ` Heiner Kallweit 0 siblings, 0 replies; 16+ messages in thread From: Heiner Kallweit @ 2023-10-30 13:48 UTC (permalink / raw) To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Marco Elver On 29.10.2023 19:35, Mirsad Goran Todorovac wrote: > On RTL8411b the RX unit gets confused if the PHY is powered-down. > This was reported in [0] and confirmed by Realtek. Realtek provided > a sequence to fix the RX unit after PHY wakeup. > > A series of about 130 r8168_mac_ocp_write() calls is performed to > program the RTL registers for recovery. > > With about 130 of these sequential calls to r8168_mac_ocp_write() this looks like > a lock storm that will stall all of the cores and CPUs on the same memory controller > for certain time I/O takes to finish. > > In a sequential case of RTL register programming, a sequence of writes to the RTL > registers can be coalesced under a same raw spinlock. This can dramatically decrease > the number of bus stalls in a multicore or multi-CPU system: > > static void rtl_hw_start_8411_2(struct rtl8169_private *tp) > { > > ... > > /* The following Realtek-provided magic fixes an issue with the RX unit > * getting confused after the PHY having been powered-down. > */ > > static const struct recover_8411b_info init_zero_seq[] = { > { 0xFC28, 0x0000 }, { 0xFC2A, 0x0000 }, { 0xFC2C, 0x0000 }, > ... > }; > > static const struct recover_8411b_info recover_seq[] = { > { 0xF800, 0xE008 }, { 0xF802, 0xE00A }, { 0xF804, 0xE00C }, > ... > }; > > static const struct recover_8411b_info final_seq[] = { > { 0xFC2A, 0x0743 }, { 0xFC2C, 0x0801 }, { 0xFC2E, 0x0BE9 }, > ... > }; > > r8168_mac_ocp_write_seq(tp, init_zero_seq); > mdelay(3); > r8168_mac_ocp_write(tp, 0xFC26, 0x0000); > r8168_mac_ocp_write_seq(tp, recover_seq); > r8168_mac_ocp_write(tp, 0xFC26, 0x8000); > r8168_mac_ocp_write_seq(tp, final_seq); > } > > The hex data is preserved intact through s/r8168_mac_ocp_write[(]tp,/{ / and s/[)];/ },/ > functions that only changed the function names and the ending of the line, so the actual > hex data is unchanged. > > Note that the reason for the introduction of the original commit > was to enable recovery of the RX unit on the RTL8411b which was confused by the > powered-down PHY. This sequence of r8168_mac_ocp_write() calls amplifies the problem > into a series of about 500+ memory bus locks, most waiting for the main MMIO memory > read-modify-write under a LOCK. The memory barrier in RTL_W32 should suffice for > the programming sequence to reach RTL NIC registers. > > [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075 > > Fixes: fe4e8db0392a6 ("r8169: fix issue with confused RX unit after PHY power-down on RTL8411b") > Cc: Heiner Kallweit <hkallweit1@gmail.com> > Cc: Marco Elver <elver@google.com> > Cc: nic_swsd@realtek.com > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: netdev@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ > Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ > Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> > --- > v5: > added unlocked primitives to allow mac ocs modify grouping > applied coalescing of mac ocp writes/modifies for 8168ep and 8117 > some formatting fixes to please checkpatch.pl > > v4: > fixed complaints as advised by Heiner and checkpatch.pl > split the patch into five sections to be more easily manipulated and reviewed > introduced r8168_mac_ocp_write_seq() > applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B > > v3: > removed register/mask pair array sentinels, so using ARRAY_SIZE(). > avoided duplication of RTL_W32() call code as advised by Heiner. > > drivers/net/ethernet/realtek/r8169_main.c | 173 ++++++---------------- > 1 file changed, 46 insertions(+), 127 deletions(-) > Patch it self looks good to me, just consider the comments regarding commit message and Fixes tag for patch 1. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v5 3/7] r8169: Coalesce mac ocp write and modify for 8168H start to reduce spinlock contention 2023-10-29 18:35 [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers to reduce spinlock contention Mirsad Goran Todorovac 2023-10-29 18:35 ` [PATCH v5 2/7] r8169: Coalesce RTL8411b PHY power-down recovery calls " Mirsad Goran Todorovac @ 2023-10-29 18:35 ` Mirsad Goran Todorovac 2023-10-30 13:50 ` Heiner Kallweit 2023-10-29 18:36 ` [PATCH v5 4/7] r8169: Coalesce mac ocp write and modify for 8168ep " Mirsad Goran Todorovac ` (4 subsequent siblings) 6 siblings, 1 reply; 16+ messages in thread From: Mirsad Goran Todorovac @ 2023-10-29 18:35 UTC (permalink / raw) To: Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mirsad Goran Todorovac, Marco Elver Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in the startup of 8168H involve implicit spin_lock_irqsave() and spin_unlock_irqrestore() on each invocation. Coalesced with the corresponding helpers into r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq() with a sinqle paired spin_lock_irqsave() and spin_unlock_irqrestore(), these calls help reduce overall spinlock contention. Fixes: ef712ede3541d ("r8169: add helper r8168_mac_ocp_modify") Fixes: 6e1d0b8988188 ("r8169:add support for RTL8168H and RTL8107E") Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Marco Elver <elver@google.com> Cc: nic_swsd@realtek.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> --- v5: added unlocked primitives to allow mac ocs modify grouping applied coalescing of mac ocp writes/modifies for 8168ep and 8117 some formatting fixes to please checkpatch.pl v4: fixed complaints as advised by Heiner and checkpatch.pl split the patch into five sections to be more easily manipulated and reviewed introduced r8168_mac_ocp_write_seq() applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B v3: removed register/mask pair array sentinels, so using ARRAY_SIZE(). avoided duplication of RTL_W32() call code as advised by Heiner. drivers/net/ethernet/realtek/r8169_main.c | 26 +++++++++++++++-------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index f4a1d1a74b8b..29ee93b8b702 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -3231,6 +3231,21 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp) { 0x04, 0xffff, 0x854a }, { 0x01, 0xffff, 0x068b } }; + + static const struct e_info_regmaskset e_info_regmaskset_8168h_1[] = { + { 0xe056, 0x00f0, 0x0070 }, + { 0xe052, 0x6000, 0x8008 }, + { 0xe0d6, 0x01ff, 0x017f }, + { 0xd420, 0x0fff, 0x047f }, + }; + + static const struct e_info_regdata e_info_regdata_8168h_1[] = { + { 0xe63e, 0x0001 }, + { 0xe63e, 0x0000 }, + { 0xc094, 0x0000 }, + { 0xc09e, 0x0000 }, + }; + int rg_saw_cnt; rtl_ephy_init(tp, e_info_8168h_1); @@ -3271,15 +3286,8 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp) r8168_mac_ocp_modify(tp, 0xd412, 0x0fff, sw_cnt_1ms_ini); } - r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0070); - r8168_mac_ocp_modify(tp, 0xe052, 0x6000, 0x8008); - r8168_mac_ocp_modify(tp, 0xe0d6, 0x01ff, 0x017f); - r8168_mac_ocp_modify(tp, 0xd420, 0x0fff, 0x047f); - - r8168_mac_ocp_write(tp, 0xe63e, 0x0001); - r8168_mac_ocp_write(tp, 0xe63e, 0x0000); - r8168_mac_ocp_write(tp, 0xc094, 0x0000); - r8168_mac_ocp_write(tp, 0xc09e, 0x0000); + r8168_mac_ocp_modify_seq(tp, e_info_regmaskset_8168h_1); + r8168_mac_ocp_write_seq(tp, e_info_regdata_8168h_1); } static void rtl_hw_start_8168ep(struct rtl8169_private *tp) -- 2.34.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v5 3/7] r8169: Coalesce mac ocp write and modify for 8168H start to reduce spinlock contention 2023-10-29 18:35 ` [PATCH v5 3/7] r8169: Coalesce mac ocp write and modify for 8168H start " Mirsad Goran Todorovac @ 2023-10-30 13:50 ` Heiner Kallweit 0 siblings, 0 replies; 16+ messages in thread From: Heiner Kallweit @ 2023-10-30 13:50 UTC (permalink / raw) To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Marco Elver On 29.10.2023 19:35, Mirsad Goran Todorovac wrote: > Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in > the startup of 8168H involve implicit spin_lock_irqsave() and spin_unlock_irqrestore() > on each invocation. > > Coalesced with the corresponding helpers into r8168_mac_ocp_write_seq() and > r8168_mac_ocp_modify_seq() with a sinqle paired spin_lock_irqsave() and > spin_unlock_irqrestore(), these calls help reduce overall spinlock contention. > > Fixes: ef712ede3541d ("r8169: add helper r8168_mac_ocp_modify") > Fixes: 6e1d0b8988188 ("r8169:add support for RTL8168H and RTL8107E") > Cc: Heiner Kallweit <hkallweit1@gmail.com> > Cc: Marco Elver <elver@google.com> > Cc: nic_swsd@realtek.com > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: netdev@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ > Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ > Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> > --- > v5: > added unlocked primitives to allow mac ocs modify grouping > applied coalescing of mac ocp writes/modifies for 8168ep and 8117 > some formatting fixes to please checkpatch.pl > > v4: > fixed complaints as advised by Heiner and checkpatch.pl > split the patch into five sections to be more easily manipulated and reviewed > introduced r8168_mac_ocp_write_seq() > applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B > > v3: > removed register/mask pair array sentinels, so using ARRAY_SIZE(). > avoided duplication of RTL_W32() call code as advised by Heiner. > > drivers/net/ethernet/realtek/r8169_main.c | 26 +++++++++++++++-------- > 1 file changed, 17 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c > index f4a1d1a74b8b..29ee93b8b702 100644 > --- a/drivers/net/ethernet/realtek/r8169_main.c > +++ b/drivers/net/ethernet/realtek/r8169_main.c > @@ -3231,6 +3231,21 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp) > { 0x04, 0xffff, 0x854a }, > { 0x01, 0xffff, 0x068b } > }; > + > + static const struct e_info_regmaskset e_info_regmaskset_8168h_1[] = { > + { 0xe056, 0x00f0, 0x0070 }, > + { 0xe052, 0x6000, 0x8008 }, > + { 0xe0d6, 0x01ff, 0x017f }, > + { 0xd420, 0x0fff, 0x047f }, > + }; > + > + static const struct e_info_regdata e_info_regdata_8168h_1[] = { > + { 0xe63e, 0x0001 }, > + { 0xe63e, 0x0000 }, > + { 0xc094, 0x0000 }, > + { 0xc09e, 0x0000 }, > + }; > + > int rg_saw_cnt; > > rtl_ephy_init(tp, e_info_8168h_1); > @@ -3271,15 +3286,8 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp) > r8168_mac_ocp_modify(tp, 0xd412, 0x0fff, sw_cnt_1ms_ini); > } > > - r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0070); > - r8168_mac_ocp_modify(tp, 0xe052, 0x6000, 0x8008); > - r8168_mac_ocp_modify(tp, 0xe0d6, 0x01ff, 0x017f); > - r8168_mac_ocp_modify(tp, 0xd420, 0x0fff, 0x047f); > - > - r8168_mac_ocp_write(tp, 0xe63e, 0x0001); > - r8168_mac_ocp_write(tp, 0xe63e, 0x0000); > - r8168_mac_ocp_write(tp, 0xc094, 0x0000); > - r8168_mac_ocp_write(tp, 0xc09e, 0x0000); > + r8168_mac_ocp_modify_seq(tp, e_info_regmaskset_8168h_1); > + r8168_mac_ocp_write_seq(tp, e_info_regdata_8168h_1); > } > > static void rtl_hw_start_8168ep(struct rtl8169_private *tp) For these few calls I don't really see a benefit in the change. Instead it makes the code harder to read. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v5 4/7] r8169: Coalesce mac ocp write and modify for 8168ep start to reduce spinlock contention 2023-10-29 18:35 [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers to reduce spinlock contention Mirsad Goran Todorovac 2023-10-29 18:35 ` [PATCH v5 2/7] r8169: Coalesce RTL8411b PHY power-down recovery calls " Mirsad Goran Todorovac 2023-10-29 18:35 ` [PATCH v5 3/7] r8169: Coalesce mac ocp write and modify for 8168H start " Mirsad Goran Todorovac @ 2023-10-29 18:36 ` Mirsad Goran Todorovac 2023-10-30 13:51 ` Heiner Kallweit 2023-10-29 18:36 ` [PATCH v5 5/7] r8169: Reduce spinlock contention for the start of RTL8117 Mirsad Goran Todorovac ` (3 subsequent siblings) 6 siblings, 1 reply; 16+ messages in thread From: Mirsad Goran Todorovac @ 2023-10-29 18:36 UTC (permalink / raw) To: Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mirsad Goran Todorovac, Marco Elver Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in the startup of RTL8168ep involve implicit spin_lock_irqsave() and spin_unlock_irqrestore() on each invocation. Coalesced with the corresponding helpers into r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq() with a sinqle paired spin_lock_irqsave() and spin_unlock_irqrestore(), these calls help reduce overall spinlock contention. Fixes: ef712ede3541d ("r8169: add helper r8168_mac_ocp_modify") Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Marco Elver <elver@google.com> Cc: nic_swsd@realtek.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> --- v5: added unlocked primitives to allow mac ocs modify grouping applied coalescing of mac ocp writes/modifies for 8168ep and 8117 some formatting fixes to please checkpatch.pl v4: fixed complaints as advised by Heiner and checkpatch.pl split the patch into five sections to be more easily manipulated and reviewed introduced r8168_mac_ocp_write_seq() applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B v3: removed register/mask pair array sentinels, so using ARRAY_SIZE(). avoided duplication of RTL_W32() call code as advised by Heiner. drivers/net/ethernet/realtek/r8169_main.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 29ee93b8b702..27016aaeb6a0 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -3326,6 +3326,12 @@ static void rtl_hw_start_8168ep_3(struct rtl8169_private *tp) { 0x1e, 0x0000, 0x2000 }, }; + static const struct e_info_regmaskset e_info_8168ep_3_mod_1[] = { + { 0xd3e2, 0x0fff, 0x0271 }, + { 0xd3e4, 0x00ff, 0x0000 }, + { 0xe860, 0x0000, 0x0080 }, + }; + rtl_ephy_init(tp, e_info_8168ep_3); rtl_hw_start_8168ep(tp); @@ -3333,9 +3339,7 @@ static void rtl_hw_start_8168ep_3(struct rtl8169_private *tp) RTL_W8(tp, DLLPR, RTL_R8(tp, DLLPR) & ~PFM_EN); RTL_W8(tp, MISC_1, RTL_R8(tp, MISC_1) & ~PFM_D3COLD_EN); - r8168_mac_ocp_modify(tp, 0xd3e2, 0x0fff, 0x0271); - r8168_mac_ocp_modify(tp, 0xd3e4, 0x00ff, 0x0000); - r8168_mac_ocp_modify(tp, 0xe860, 0x0000, 0x0080); + r8168_mac_ocp_modify_seq(tp, e_info_8168ep_3_mod_1); } static void rtl_hw_start_8117(struct rtl8169_private *tp) -- 2.34.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v5 4/7] r8169: Coalesce mac ocp write and modify for 8168ep start to reduce spinlock contention 2023-10-29 18:36 ` [PATCH v5 4/7] r8169: Coalesce mac ocp write and modify for 8168ep " Mirsad Goran Todorovac @ 2023-10-30 13:51 ` Heiner Kallweit 0 siblings, 0 replies; 16+ messages in thread From: Heiner Kallweit @ 2023-10-30 13:51 UTC (permalink / raw) To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Marco Elver On 29.10.2023 19:36, Mirsad Goran Todorovac wrote: > Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in > the startup of RTL8168ep involve implicit spin_lock_irqsave() and > spin_unlock_irqrestore() on each invocation. > > Coalesced with the corresponding helpers into r8168_mac_ocp_write_seq() and > r8168_mac_ocp_modify_seq() with a sinqle paired spin_lock_irqsave() and > spin_unlock_irqrestore(), these calls help reduce overall spinlock contention. > > Fixes: ef712ede3541d ("r8169: add helper r8168_mac_ocp_modify") > Cc: Heiner Kallweit <hkallweit1@gmail.com> > Cc: Marco Elver <elver@google.com> > Cc: nic_swsd@realtek.com > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: netdev@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ > Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ > Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> > --- > v5: > added unlocked primitives to allow mac ocs modify grouping > applied coalescing of mac ocp writes/modifies for 8168ep and 8117 > some formatting fixes to please checkpatch.pl > > v4: > fixed complaints as advised by Heiner and checkpatch.pl > split the patch into five sections to be more easily manipulated and reviewed > introduced r8168_mac_ocp_write_seq() > applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B > > v3: > removed register/mask pair array sentinels, so using ARRAY_SIZE(). > avoided duplication of RTL_W32() call code as advised by Heiner. > > drivers/net/ethernet/realtek/r8169_main.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c > index 29ee93b8b702..27016aaeb6a0 100644 > --- a/drivers/net/ethernet/realtek/r8169_main.c > +++ b/drivers/net/ethernet/realtek/r8169_main.c > @@ -3326,6 +3326,12 @@ static void rtl_hw_start_8168ep_3(struct rtl8169_private *tp) > { 0x1e, 0x0000, 0x2000 }, > }; > > + static const struct e_info_regmaskset e_info_8168ep_3_mod_1[] = { > + { 0xd3e2, 0x0fff, 0x0271 }, > + { 0xd3e4, 0x00ff, 0x0000 }, > + { 0xe860, 0x0000, 0x0080 }, > + }; > + > rtl_ephy_init(tp, e_info_8168ep_3); > > rtl_hw_start_8168ep(tp); > @@ -3333,9 +3339,7 @@ static void rtl_hw_start_8168ep_3(struct rtl8169_private *tp) > RTL_W8(tp, DLLPR, RTL_R8(tp, DLLPR) & ~PFM_EN); > RTL_W8(tp, MISC_1, RTL_R8(tp, MISC_1) & ~PFM_D3COLD_EN); > > - r8168_mac_ocp_modify(tp, 0xd3e2, 0x0fff, 0x0271); > - r8168_mac_ocp_modify(tp, 0xd3e4, 0x00ff, 0x0000); > - r8168_mac_ocp_modify(tp, 0xe860, 0x0000, 0x0080); > + r8168_mac_ocp_modify_seq(tp, e_info_8168ep_3_mod_1); > } > > static void rtl_hw_start_8117(struct rtl8169_private *tp) Same here, it' not worth it and makes the code harder to read. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v5 5/7] r8169: Reduce spinlock contention for the start of RTL8117 2023-10-29 18:35 [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers to reduce spinlock contention Mirsad Goran Todorovac ` (2 preceding siblings ...) 2023-10-29 18:36 ` [PATCH v5 4/7] r8169: Coalesce mac ocp write and modify for 8168ep " Mirsad Goran Todorovac @ 2023-10-29 18:36 ` Mirsad Goran Todorovac 2023-10-30 13:51 ` Heiner Kallweit 2023-10-29 18:36 ` [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125 and 8125B start to reduce spinlocks Mirsad Goran Todorovac ` (2 subsequent siblings) 6 siblings, 1 reply; 16+ messages in thread From: Mirsad Goran Todorovac @ 2023-10-29 18:36 UTC (permalink / raw) To: Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mirsad Goran Todorovac, Marco Elver Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in the startup of RTL8168ep involve implicit spin_lock_irqsave() and spin_unlock_irqrestore() on each invocation. This is avoided by grouping unlocked __r8168_mac_ocp_write() and __r8168_mac_ocp_modify() primitives enclosed in the explicit spin_lock_irqsave()/spin_unlock_irqrestore() pair. Even if the lock is not contended, the check requires a LOCK CMPXCHG or a similar instruction, which prevents all cores from accessing the memory bus. Fixes: 1287723aa139b ("r8169: add support for RTL8117") Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Marco Elver <elver@google.com> Cc: nic_swsd@realtek.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> --- v5: added unlocked primitives to allow mac ocs modify grouping applied coalescing of mac ocp writes/modifies for 8168ep and 8117 some formatting fixes to please checkpatch.pl v4: fixed complaints as advised by Heiner and checkpatch.pl split the patch into five sections to be more easily manipulated and reviewed introduced r8168_mac_ocp_write_seq() applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B v3: removed register/mask pair array sentinels, so using ARRAY_SIZE(). avoided duplication of RTL_W32() call code as advised by Heiner. drivers/net/ethernet/realtek/r8169_main.c | 24 ++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 27016aaeb6a0..50fbacb05953 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -3348,7 +3348,15 @@ static void rtl_hw_start_8117(struct rtl8169_private *tp) { 0x19, 0x0040, 0x1100 }, { 0x59, 0x0040, 0x1100 }, }; + + static const struct e_info_regdata e_info_8117_wr_1[] = { + { 0xe63e, 0x0001 }, + { 0xe63e, 0x0000 }, + { 0xc094, 0x0000 }, + { 0xc09e, 0x0000 }, + }; int rg_saw_cnt; + unsigned long flags; rtl8168ep_stop_cmac(tp); rtl_ephy_init(tp, e_info_8117); @@ -3388,15 +3396,13 @@ static void rtl_hw_start_8117(struct rtl8169_private *tp) r8168_mac_ocp_modify(tp, 0xd412, 0x0fff, sw_cnt_1ms_ini); } - r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0070); - r8168_mac_ocp_write(tp, 0xea80, 0x0003); - r8168_mac_ocp_modify(tp, 0xe052, 0x0000, 0x0009); - r8168_mac_ocp_modify(tp, 0xd420, 0x0fff, 0x047f); - - r8168_mac_ocp_write(tp, 0xe63e, 0x0001); - r8168_mac_ocp_write(tp, 0xe63e, 0x0000); - r8168_mac_ocp_write(tp, 0xc094, 0x0000); - r8168_mac_ocp_write(tp, 0xc09e, 0x0000); + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); + __r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0070); + __r8168_mac_ocp_write(tp, 0xea80, 0x0003); + __r8168_mac_ocp_modify(tp, 0xe052, 0x0000, 0x0009); + __r8168_mac_ocp_modify(tp, 0xd420, 0x0fff, 0x047f); + __r8168_mac_ocp_write_seq(tp, e_info_8117_wr_1); + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); /* firmware is for MAC only */ r8169_apply_firmware(tp); -- 2.34.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v5 5/7] r8169: Reduce spinlock contention for the start of RTL8117 2023-10-29 18:36 ` [PATCH v5 5/7] r8169: Reduce spinlock contention for the start of RTL8117 Mirsad Goran Todorovac @ 2023-10-30 13:51 ` Heiner Kallweit 0 siblings, 0 replies; 16+ messages in thread From: Heiner Kallweit @ 2023-10-30 13:51 UTC (permalink / raw) To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Marco Elver On 29.10.2023 19:36, Mirsad Goran Todorovac wrote: > Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() > in the startup of RTL8168ep involve implicit spin_lock_irqsave() and > spin_unlock_irqrestore() on each invocation. > > This is avoided by grouping unlocked __r8168_mac_ocp_write() and > __r8168_mac_ocp_modify() primitives enclosed in the explicit > spin_lock_irqsave()/spin_unlock_irqrestore() pair. > > Even if the lock is not contended, the check requires a LOCK CMPXCHG > or a similar instruction, which prevents all cores from accessing the > memory bus. > > Fixes: 1287723aa139b ("r8169: add support for RTL8117") > Cc: Heiner Kallweit <hkallweit1@gmail.com> > Cc: Marco Elver <elver@google.com> > Cc: nic_swsd@realtek.com > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: netdev@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ > Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ > Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> > --- > v5: > added unlocked primitives to allow mac ocs modify grouping > applied coalescing of mac ocp writes/modifies for 8168ep and 8117 > some formatting fixes to please checkpatch.pl > > v4: > fixed complaints as advised by Heiner and checkpatch.pl > split the patch into five sections to be more easily manipulated and reviewed > introduced r8168_mac_ocp_write_seq() > applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B > > v3: > removed register/mask pair array sentinels, so using ARRAY_SIZE(). > avoided duplication of RTL_W32() call code as advised by Heiner. > > drivers/net/ethernet/realtek/r8169_main.c | 24 ++++++++++++++--------- > 1 file changed, 15 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c > index 27016aaeb6a0..50fbacb05953 100644 > --- a/drivers/net/ethernet/realtek/r8169_main.c > +++ b/drivers/net/ethernet/realtek/r8169_main.c > @@ -3348,7 +3348,15 @@ static void rtl_hw_start_8117(struct rtl8169_private *tp) > { 0x19, 0x0040, 0x1100 }, > { 0x59, 0x0040, 0x1100 }, > }; > + > + static const struct e_info_regdata e_info_8117_wr_1[] = { > + { 0xe63e, 0x0001 }, > + { 0xe63e, 0x0000 }, > + { 0xc094, 0x0000 }, > + { 0xc09e, 0x0000 }, > + }; > int rg_saw_cnt; > + unsigned long flags; > > rtl8168ep_stop_cmac(tp); > rtl_ephy_init(tp, e_info_8117); > @@ -3388,15 +3396,13 @@ static void rtl_hw_start_8117(struct rtl8169_private *tp) > r8168_mac_ocp_modify(tp, 0xd412, 0x0fff, sw_cnt_1ms_ini); > } > > - r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0070); > - r8168_mac_ocp_write(tp, 0xea80, 0x0003); > - r8168_mac_ocp_modify(tp, 0xe052, 0x0000, 0x0009); > - r8168_mac_ocp_modify(tp, 0xd420, 0x0fff, 0x047f); > - > - r8168_mac_ocp_write(tp, 0xe63e, 0x0001); > - r8168_mac_ocp_write(tp, 0xe63e, 0x0000); > - r8168_mac_ocp_write(tp, 0xc094, 0x0000); > - r8168_mac_ocp_write(tp, 0xc09e, 0x0000); > + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); > + __r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0070); > + __r8168_mac_ocp_write(tp, 0xea80, 0x0003); > + __r8168_mac_ocp_modify(tp, 0xe052, 0x0000, 0x0009); > + __r8168_mac_ocp_modify(tp, 0xd420, 0x0fff, 0x047f); > + __r8168_mac_ocp_write_seq(tp, e_info_8117_wr_1); > + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); > > /* firmware is for MAC only */ > r8169_apply_firmware(tp); Same ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125 and 8125B start to reduce spinlocks 2023-10-29 18:35 [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers to reduce spinlock contention Mirsad Goran Todorovac ` (3 preceding siblings ...) 2023-10-29 18:36 ` [PATCH v5 5/7] r8169: Reduce spinlock contention for the start of RTL8117 Mirsad Goran Todorovac @ 2023-10-29 18:36 ` Mirsad Goran Todorovac 2023-10-30 14:02 ` Heiner Kallweit 2023-10-29 18:36 ` [PATCH v5 7/7] r8169: Coalesce mac ocp write and modify for rtl_hw_init_8125 to reduce spinlock contention Mirsad Goran Todorovac 2023-10-30 13:39 ` [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers " Heiner Kallweit 6 siblings, 1 reply; 16+ messages in thread From: Mirsad Goran Todorovac @ 2023-10-29 18:36 UTC (permalink / raw) To: Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mirsad Goran Todorovac, Marco Elver Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in the startup of 8125 and 8125B involve implicit spin_lock_irqsave() and spin_unlock_irqrestore() on each invocation. Coalesced with the corresponding helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq() into sequential write or modidy with a sinqle pair of spin_lock_irqsave() and spin_unlock_irqrestore(), these calls reduce overall lock contention. Fixes: f1bce4ad2f1ce ("r8169: add support for RTL8125") Fixes: 0439297be9511 ("r8169: add support for RTL8125B") Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Marco Elver <elver@google.com> Cc: nic_swsd@realtek.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> --- v5: added unlocked primitives to allow mac ocs modify grouping applied coalescing of mac ocp writes/modifies for 8168ep and 8117 some formatting fixes to please checkpatch.pl v4: fixed complaints as advised by Heiner and checkpatch.pl split the patch into five sections to be more easily manipulated and reviewed introduced r8168_mac_ocp_write_seq() applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B v3: removed register/mask pair array sentinels, so using ARRAY_SIZE(). avoided duplication of RTL_W32() call code as advised by Heiner. drivers/net/ethernet/realtek/r8169_main.c | 75 +++++++++++++---------- 1 file changed, 44 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 50fbacb05953..0778cd0ba2e0 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -3553,6 +3553,28 @@ DECLARE_RTL_COND(rtl_mac_ocp_e00e_cond) static void rtl_hw_start_8125_common(struct rtl8169_private *tp) { + static const struct e_info_regmaskset e_info_8125_common_1[] = { + { 0xd3e2, 0x0fff, 0x03a9 }, + { 0xd3e4, 0x00ff, 0x0000 }, + { 0xe860, 0x0000, 0x0080 }, + }; + + static const struct e_info_regmaskset e_info_8125_common_2[] = { + { 0xc0b4, 0x0000, 0x000c }, + { 0xeb6a, 0x00ff, 0x0033 }, + { 0xeb50, 0x03e0, 0x0040 }, + { 0xe056, 0x00f0, 0x0030 }, + { 0xe040, 0x1000, 0x0000 }, + { 0xea1c, 0x0003, 0x0001 }, + { 0xe0c0, 0x4f0f, 0x4403 }, + { 0xe052, 0x0080, 0x0068 }, + { 0xd430, 0x0fff, 0x047f }, + { 0xea1c, 0x0004, 0x0000 }, + { 0xeb54, 0x0000, 0x0001 }, + }; + + unsigned long flags; + rtl_pcie_state_l2l3_disable(tp); RTL_W16(tp, 0x382, 0x221b); @@ -3560,47 +3582,38 @@ static void rtl_hw_start_8125_common(struct rtl8169_private *tp) RTL_W16(tp, 0x4800, 0); /* disable UPS */ - r8168_mac_ocp_modify(tp, 0xd40a, 0x0010, 0x0000); + + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); + __r8168_mac_ocp_modify(tp, 0xd40a, 0x0010, 0x0000); RTL_W8(tp, Config1, RTL_R8(tp, Config1) & ~0x10); - r8168_mac_ocp_write(tp, 0xc140, 0xffff); - r8168_mac_ocp_write(tp, 0xc142, 0xffff); + __r8168_mac_ocp_write(tp, 0xc140, 0xffff); + __r8168_mac_ocp_write(tp, 0xc142, 0xffff); - r8168_mac_ocp_modify(tp, 0xd3e2, 0x0fff, 0x03a9); - r8168_mac_ocp_modify(tp, 0xd3e4, 0x00ff, 0x0000); - r8168_mac_ocp_modify(tp, 0xe860, 0x0000, 0x0080); + __r8168_mac_ocp_modify_seq(tp, e_info_8125_common_1); /* disable new tx descriptor format */ - r8168_mac_ocp_modify(tp, 0xeb58, 0x0001, 0x0000); + __r8168_mac_ocp_modify(tp, 0xeb58, 0x0001, 0x0000); - if (tp->mac_version == RTL_GIGA_MAC_VER_63) - r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0200); - else - r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0400); + if (tp->mac_version == RTL_GIGA_MAC_VER_63) { + __r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0200); + __r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0000); + } else { + __r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0400); + __r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0020); + } + + __r8168_mac_ocp_modify_seq(tp, e_info_8125_common_2); + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); - if (tp->mac_version == RTL_GIGA_MAC_VER_63) - r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0000); - else - r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0020); - - r8168_mac_ocp_modify(tp, 0xc0b4, 0x0000, 0x000c); - r8168_mac_ocp_modify(tp, 0xeb6a, 0x00ff, 0x0033); - r8168_mac_ocp_modify(tp, 0xeb50, 0x03e0, 0x0040); - r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0030); - r8168_mac_ocp_modify(tp, 0xe040, 0x1000, 0x0000); - r8168_mac_ocp_modify(tp, 0xea1c, 0x0003, 0x0001); - r8168_mac_ocp_modify(tp, 0xe0c0, 0x4f0f, 0x4403); - r8168_mac_ocp_modify(tp, 0xe052, 0x0080, 0x0068); - r8168_mac_ocp_modify(tp, 0xd430, 0x0fff, 0x047f); - - r8168_mac_ocp_modify(tp, 0xea1c, 0x0004, 0x0000); - r8168_mac_ocp_modify(tp, 0xeb54, 0x0000, 0x0001); udelay(1); - r8168_mac_ocp_modify(tp, 0xeb54, 0x0001, 0x0000); - RTL_W16(tp, 0x1880, RTL_R16(tp, 0x1880) & ~0x0030); - r8168_mac_ocp_write(tp, 0xe098, 0xc302); + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); + __r8168_mac_ocp_modify(tp, 0xeb54, 0x0001, 0x0000); + RTL_W16(tp, 0x1880, RTL_R16(tp, 0x1880) & ~0x0030); + __r8168_mac_ocp_write(tp, 0xe098, 0xc302); + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); rtl_loop_wait_low(tp, &rtl_mac_ocp_e00e_cond, 1000, 10); -- 2.34.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125 and 8125B start to reduce spinlocks 2023-10-29 18:36 ` [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125 and 8125B start to reduce spinlocks Mirsad Goran Todorovac @ 2023-10-30 14:02 ` Heiner Kallweit 2023-10-30 15:02 ` Mirsad Todorovac 0 siblings, 1 reply; 16+ messages in thread From: Heiner Kallweit @ 2023-10-30 14:02 UTC (permalink / raw) To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Marco Elver On 29.10.2023 19:36, Mirsad Goran Todorovac wrote: > Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in > the startup of 8125 and 8125B involve implicit spin_lock_irqsave() and > spin_unlock_irqrestore() on each invocation. > > Coalesced with the corresponding helpers r8168_mac_ocp_write_seq() and > r8168_mac_ocp_modify_seq() into sequential write or modidy with a sinqle > pair of spin_lock_irqsave() and spin_unlock_irqrestore(), these calls > reduce overall lock contention. > > Fixes: f1bce4ad2f1ce ("r8169: add support for RTL8125") > Fixes: 0439297be9511 ("r8169: add support for RTL8125B") > Cc: Heiner Kallweit <hkallweit1@gmail.com> > Cc: Marco Elver <elver@google.com> > Cc: nic_swsd@realtek.com > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: netdev@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ > Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ > Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> > --- > v5: > added unlocked primitives to allow mac ocs modify grouping > applied coalescing of mac ocp writes/modifies for 8168ep and 8117 > some formatting fixes to please checkpatch.pl > > v4: > fixed complaints as advised by Heiner and checkpatch.pl > split the patch into five sections to be more easily manipulated and reviewed > introduced r8168_mac_ocp_write_seq() > applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B > > v3: > removed register/mask pair array sentinels, so using ARRAY_SIZE(). > avoided duplication of RTL_W32() call code as advised by Heiner. > > drivers/net/ethernet/realtek/r8169_main.c | 75 +++++++++++++---------- > 1 file changed, 44 insertions(+), 31 deletions(-) > > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c > index 50fbacb05953..0778cd0ba2e0 100644 > --- a/drivers/net/ethernet/realtek/r8169_main.c > +++ b/drivers/net/ethernet/realtek/r8169_main.c > @@ -3553,6 +3553,28 @@ DECLARE_RTL_COND(rtl_mac_ocp_e00e_cond) > > static void rtl_hw_start_8125_common(struct rtl8169_private *tp) > { > + static const struct e_info_regmaskset e_info_8125_common_1[] = { > + { 0xd3e2, 0x0fff, 0x03a9 }, > + { 0xd3e4, 0x00ff, 0x0000 }, > + { 0xe860, 0x0000, 0x0080 }, > + }; > + > + static const struct e_info_regmaskset e_info_8125_common_2[] = { > + { 0xc0b4, 0x0000, 0x000c }, > + { 0xeb6a, 0x00ff, 0x0033 }, > + { 0xeb50, 0x03e0, 0x0040 }, > + { 0xe056, 0x00f0, 0x0030 }, > + { 0xe040, 0x1000, 0x0000 }, > + { 0xea1c, 0x0003, 0x0001 }, > + { 0xe0c0, 0x4f0f, 0x4403 }, > + { 0xe052, 0x0080, 0x0068 }, > + { 0xd430, 0x0fff, 0x047f }, > + { 0xea1c, 0x0004, 0x0000 }, > + { 0xeb54, 0x0000, 0x0001 }, > + }; > + > + unsigned long flags; > + > rtl_pcie_state_l2l3_disable(tp); > > RTL_W16(tp, 0x382, 0x221b); > @@ -3560,47 +3582,38 @@ static void rtl_hw_start_8125_common(struct rtl8169_private *tp) > RTL_W16(tp, 0x4800, 0); > > /* disable UPS */ > - r8168_mac_ocp_modify(tp, 0xd40a, 0x0010, 0x0000); > + > + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); > + __r8168_mac_ocp_modify(tp, 0xd40a, 0x0010, 0x0000); > > RTL_W8(tp, Config1, RTL_R8(tp, Config1) & ~0x10); > > - r8168_mac_ocp_write(tp, 0xc140, 0xffff); > - r8168_mac_ocp_write(tp, 0xc142, 0xffff); > + __r8168_mac_ocp_write(tp, 0xc140, 0xffff); > + __r8168_mac_ocp_write(tp, 0xc142, 0xffff); > > - r8168_mac_ocp_modify(tp, 0xd3e2, 0x0fff, 0x03a9); > - r8168_mac_ocp_modify(tp, 0xd3e4, 0x00ff, 0x0000); > - r8168_mac_ocp_modify(tp, 0xe860, 0x0000, 0x0080); > + __r8168_mac_ocp_modify_seq(tp, e_info_8125_common_1); > > /* disable new tx descriptor format */ > - r8168_mac_ocp_modify(tp, 0xeb58, 0x0001, 0x0000); > + __r8168_mac_ocp_modify(tp, 0xeb58, 0x0001, 0x0000); > > - if (tp->mac_version == RTL_GIGA_MAC_VER_63) > - r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0200); > - else > - r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0400); > + if (tp->mac_version == RTL_GIGA_MAC_VER_63) { > + __r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0200); > + __r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0000); > + } else { > + __r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0400); > + __r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0020); > + } > + > + __r8168_mac_ocp_modify_seq(tp, e_info_8125_common_2); > + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); > > - if (tp->mac_version == RTL_GIGA_MAC_VER_63) > - r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0000); > - else > - r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0020); > - > - r8168_mac_ocp_modify(tp, 0xc0b4, 0x0000, 0x000c); > - r8168_mac_ocp_modify(tp, 0xeb6a, 0x00ff, 0x0033); > - r8168_mac_ocp_modify(tp, 0xeb50, 0x03e0, 0x0040); > - r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0030); > - r8168_mac_ocp_modify(tp, 0xe040, 0x1000, 0x0000); > - r8168_mac_ocp_modify(tp, 0xea1c, 0x0003, 0x0001); > - r8168_mac_ocp_modify(tp, 0xe0c0, 0x4f0f, 0x4403); > - r8168_mac_ocp_modify(tp, 0xe052, 0x0080, 0x0068); > - r8168_mac_ocp_modify(tp, 0xd430, 0x0fff, 0x047f); > - > - r8168_mac_ocp_modify(tp, 0xea1c, 0x0004, 0x0000); > - r8168_mac_ocp_modify(tp, 0xeb54, 0x0000, 0x0001); > udelay(1); > - r8168_mac_ocp_modify(tp, 0xeb54, 0x0001, 0x0000); > - RTL_W16(tp, 0x1880, RTL_R16(tp, 0x1880) & ~0x0030); > > - r8168_mac_ocp_write(tp, 0xe098, 0xc302); > + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); > + __r8168_mac_ocp_modify(tp, 0xeb54, 0x0001, 0x0000); > + RTL_W16(tp, 0x1880, RTL_R16(tp, 0x1880) & ~0x0030); > + __r8168_mac_ocp_write(tp, 0xe098, 0xc302); > + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); > > rtl_loop_wait_low(tp, &rtl_mac_ocp_e00e_cond, 1000, 10); > All this manual locking and unlocking makes the code harder to read and more error-prone. Maybe, as a rule of thumb: If you can replace a block with more than 10 mac ocp ops, then fine with me. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125 and 8125B start to reduce spinlocks 2023-10-30 14:02 ` Heiner Kallweit @ 2023-10-30 15:02 ` Mirsad Todorovac 2023-10-30 15:53 ` Heiner Kallweit 0 siblings, 1 reply; 16+ messages in thread From: Mirsad Todorovac @ 2023-10-30 15:02 UTC (permalink / raw) To: Heiner Kallweit, Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Marco Elver On 10/30/23 15:02, Heiner Kallweit wrote: > On 29.10.2023 19:36, Mirsad Goran Todorovac wrote: >> Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in >> the startup of 8125 and 8125B involve implicit spin_lock_irqsave() and >> spin_unlock_irqrestore() on each invocation. >> >> Coalesced with the corresponding helpers r8168_mac_ocp_write_seq() and >> r8168_mac_ocp_modify_seq() into sequential write or modidy with a sinqle >> pair of spin_lock_irqsave() and spin_unlock_irqrestore(), these calls >> reduce overall lock contention. >> >> Fixes: f1bce4ad2f1ce ("r8169: add support for RTL8125") >> Fixes: 0439297be9511 ("r8169: add support for RTL8125B") >> Cc: Heiner Kallweit <hkallweit1@gmail.com> >> Cc: Marco Elver <elver@google.com> >> Cc: nic_swsd@realtek.com >> Cc: "David S. Miller" <davem@davemloft.net> >> Cc: Eric Dumazet <edumazet@google.com> >> Cc: Jakub Kicinski <kuba@kernel.org> >> Cc: Paolo Abeni <pabeni@redhat.com> >> Cc: netdev@vger.kernel.org >> Cc: linux-kernel@vger.kernel.org >> Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ >> Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ >> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> >> --- >> v5: >> added unlocked primitives to allow mac ocs modify grouping >> applied coalescing of mac ocp writes/modifies for 8168ep and 8117 >> some formatting fixes to please checkpatch.pl >> >> v4: >> fixed complaints as advised by Heiner and checkpatch.pl >> split the patch into five sections to be more easily manipulated and reviewed >> introduced r8168_mac_ocp_write_seq() >> applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B >> >> v3: >> removed register/mask pair array sentinels, so using ARRAY_SIZE(). >> avoided duplication of RTL_W32() call code as advised by Heiner. >> >> drivers/net/ethernet/realtek/r8169_main.c | 75 +++++++++++++---------- >> 1 file changed, 44 insertions(+), 31 deletions(-) >> >> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c >> index 50fbacb05953..0778cd0ba2e0 100644 >> --- a/drivers/net/ethernet/realtek/r8169_main.c >> +++ b/drivers/net/ethernet/realtek/r8169_main.c >> @@ -3553,6 +3553,28 @@ DECLARE_RTL_COND(rtl_mac_ocp_e00e_cond) >> >> static void rtl_hw_start_8125_common(struct rtl8169_private *tp) >> { >> + static const struct e_info_regmaskset e_info_8125_common_1[] = { >> + { 0xd3e2, 0x0fff, 0x03a9 }, >> + { 0xd3e4, 0x00ff, 0x0000 }, >> + { 0xe860, 0x0000, 0x0080 }, >> + }; >> + >> + static const struct e_info_regmaskset e_info_8125_common_2[] = { >> + { 0xc0b4, 0x0000, 0x000c }, >> + { 0xeb6a, 0x00ff, 0x0033 }, >> + { 0xeb50, 0x03e0, 0x0040 }, >> + { 0xe056, 0x00f0, 0x0030 }, >> + { 0xe040, 0x1000, 0x0000 }, >> + { 0xea1c, 0x0003, 0x0001 }, >> + { 0xe0c0, 0x4f0f, 0x4403 }, >> + { 0xe052, 0x0080, 0x0068 }, >> + { 0xd430, 0x0fff, 0x047f }, >> + { 0xea1c, 0x0004, 0x0000 }, >> + { 0xeb54, 0x0000, 0x0001 }, >> + }; >> + >> + unsigned long flags; >> + >> rtl_pcie_state_l2l3_disable(tp); >> >> RTL_W16(tp, 0x382, 0x221b); >> @@ -3560,47 +3582,38 @@ static void rtl_hw_start_8125_common(struct rtl8169_private *tp) >> RTL_W16(tp, 0x4800, 0); >> >> /* disable UPS */ >> - r8168_mac_ocp_modify(tp, 0xd40a, 0x0010, 0x0000); >> + >> + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); >> + __r8168_mac_ocp_modify(tp, 0xd40a, 0x0010, 0x0000); >> >> RTL_W8(tp, Config1, RTL_R8(tp, Config1) & ~0x10); >> >> - r8168_mac_ocp_write(tp, 0xc140, 0xffff); >> - r8168_mac_ocp_write(tp, 0xc142, 0xffff); >> + __r8168_mac_ocp_write(tp, 0xc140, 0xffff); >> + __r8168_mac_ocp_write(tp, 0xc142, 0xffff); >> >> - r8168_mac_ocp_modify(tp, 0xd3e2, 0x0fff, 0x03a9); >> - r8168_mac_ocp_modify(tp, 0xd3e4, 0x00ff, 0x0000); >> - r8168_mac_ocp_modify(tp, 0xe860, 0x0000, 0x0080); >> + __r8168_mac_ocp_modify_seq(tp, e_info_8125_common_1); >> >> /* disable new tx descriptor format */ >> - r8168_mac_ocp_modify(tp, 0xeb58, 0x0001, 0x0000); >> + __r8168_mac_ocp_modify(tp, 0xeb58, 0x0001, 0x0000); >> >> - if (tp->mac_version == RTL_GIGA_MAC_VER_63) >> - r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0200); >> - else >> - r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0400); >> + if (tp->mac_version == RTL_GIGA_MAC_VER_63) { >> + __r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0200); >> + __r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0000); >> + } else { >> + __r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0400); >> + __r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0020); >> + } >> + >> + __r8168_mac_ocp_modify_seq(tp, e_info_8125_common_2); >> + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); >> >> - if (tp->mac_version == RTL_GIGA_MAC_VER_63) >> - r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0000); >> - else >> - r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0020); >> - >> - r8168_mac_ocp_modify(tp, 0xc0b4, 0x0000, 0x000c); >> - r8168_mac_ocp_modify(tp, 0xeb6a, 0x00ff, 0x0033); >> - r8168_mac_ocp_modify(tp, 0xeb50, 0x03e0, 0x0040); >> - r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0030); >> - r8168_mac_ocp_modify(tp, 0xe040, 0x1000, 0x0000); >> - r8168_mac_ocp_modify(tp, 0xea1c, 0x0003, 0x0001); >> - r8168_mac_ocp_modify(tp, 0xe0c0, 0x4f0f, 0x4403); >> - r8168_mac_ocp_modify(tp, 0xe052, 0x0080, 0x0068); >> - r8168_mac_ocp_modify(tp, 0xd430, 0x0fff, 0x047f); >> - >> - r8168_mac_ocp_modify(tp, 0xea1c, 0x0004, 0x0000); >> - r8168_mac_ocp_modify(tp, 0xeb54, 0x0000, 0x0001); [1] I think we have a candidate for a squeeze here in this driver. >> udelay(1); >> - r8168_mac_ocp_modify(tp, 0xeb54, 0x0001, 0x0000); >> - RTL_W16(tp, 0x1880, RTL_R16(tp, 0x1880) & ~0x0030); >> >> - r8168_mac_ocp_write(tp, 0xe098, 0xc302); >> + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); >> + __r8168_mac_ocp_modify(tp, 0xeb54, 0x0001, 0x0000); >> + RTL_W16(tp, 0x1880, RTL_R16(tp, 0x1880) & ~0x0030); >> + __r8168_mac_ocp_write(tp, 0xe098, 0xc302); >> + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); >> >> rtl_loop_wait_low(tp, &rtl_mac_ocp_e00e_cond, 1000, 10); >> > > All this manual locking and unlocking makes the code harder > to read and more error-prone. Maybe, as a rule of thumb: > If you can replace a block with more than 10 mac ocp ops, > then fine with me As I worked with another German developer, Mr. Frank Heckenbach from the GNU Pascal project, I know that Germans are pedantic and reliable :-) If this rtl_hw_start_8125_common() is called only once, then maybe every memory bus cycle isn't worth saving, and then maybe the additional complexity isn't worth adding (but it was fun doing, and it works with my NIC). AFAIK, a spin_lock_irqsave()/spin_unlock_irqrestore() isn't a free lunch as you know, and I read from the manuals that on modern CPUs a locked ADD $0, -128(%esp) or something takes about 50 clock cycles, in which all cores have to wait. Doing that in storm of 10 lock/unlock pairs amounts to 500 cycles or 125 ns in the best case on a 4 GHz CPU. But I trust that you as the maintainer have the big picture and greater insight in the actual hw. Elon Musk said that the greatest error in engineering is optimising stuff that never should have been written. As I like my RTL 8125b NIC to be fast (TM), I am not giving up on it. Just recalled that you mentioned that it supports RSS. Is it feasible to add it to the r8169 driver? It is OK for me that we have a formal and less formal mode of communication and switch between them. Thank you for all your quick reviews, much obliged, thank you for partially accepting the patches, but I think I need to sleep over it before submitting a new version. Regards, Mirsad Todorovac ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125 and 8125B start to reduce spinlocks 2023-10-30 15:02 ` Mirsad Todorovac @ 2023-10-30 15:53 ` Heiner Kallweit 0 siblings, 0 replies; 16+ messages in thread From: Heiner Kallweit @ 2023-10-30 15:53 UTC (permalink / raw) To: Mirsad Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Marco Elver On 30.10.2023 16:02, Mirsad Todorovac wrote: > > > On 10/30/23 15:02, Heiner Kallweit wrote: >> On 29.10.2023 19:36, Mirsad Goran Todorovac wrote: >>> Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in >>> the startup of 8125 and 8125B involve implicit spin_lock_irqsave() and >>> spin_unlock_irqrestore() on each invocation. >>> >>> Coalesced with the corresponding helpers r8168_mac_ocp_write_seq() and >>> r8168_mac_ocp_modify_seq() into sequential write or modidy with a sinqle >>> pair of spin_lock_irqsave() and spin_unlock_irqrestore(), these calls >>> reduce overall lock contention. >>> >>> Fixes: f1bce4ad2f1ce ("r8169: add support for RTL8125") >>> Fixes: 0439297be9511 ("r8169: add support for RTL8125B") >>> Cc: Heiner Kallweit <hkallweit1@gmail.com> >>> Cc: Marco Elver <elver@google.com> >>> Cc: nic_swsd@realtek.com >>> Cc: "David S. Miller" <davem@davemloft.net> >>> Cc: Eric Dumazet <edumazet@google.com> >>> Cc: Jakub Kicinski <kuba@kernel.org> >>> Cc: Paolo Abeni <pabeni@redhat.com> >>> Cc: netdev@vger.kernel.org >>> Cc: linux-kernel@vger.kernel.org >>> Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ >>> Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ >>> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> >>> --- >>> v5: >>> added unlocked primitives to allow mac ocs modify grouping >>> applied coalescing of mac ocp writes/modifies for 8168ep and 8117 >>> some formatting fixes to please checkpatch.pl >>> >>> v4: >>> fixed complaints as advised by Heiner and checkpatch.pl >>> split the patch into five sections to be more easily manipulated and reviewed >>> introduced r8168_mac_ocp_write_seq() >>> applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B >>> >>> v3: >>> removed register/mask pair array sentinels, so using ARRAY_SIZE(). >>> avoided duplication of RTL_W32() call code as advised by Heiner. >>> >>> drivers/net/ethernet/realtek/r8169_main.c | 75 +++++++++++++---------- >>> 1 file changed, 44 insertions(+), 31 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c >>> index 50fbacb05953..0778cd0ba2e0 100644 >>> --- a/drivers/net/ethernet/realtek/r8169_main.c >>> +++ b/drivers/net/ethernet/realtek/r8169_main.c >>> @@ -3553,6 +3553,28 @@ DECLARE_RTL_COND(rtl_mac_ocp_e00e_cond) >>> static void rtl_hw_start_8125_common(struct rtl8169_private *tp) >>> { >>> + static const struct e_info_regmaskset e_info_8125_common_1[] = { >>> + { 0xd3e2, 0x0fff, 0x03a9 }, >>> + { 0xd3e4, 0x00ff, 0x0000 }, >>> + { 0xe860, 0x0000, 0x0080 }, >>> + }; >>> + >>> + static const struct e_info_regmaskset e_info_8125_common_2[] = { >>> + { 0xc0b4, 0x0000, 0x000c }, >>> + { 0xeb6a, 0x00ff, 0x0033 }, >>> + { 0xeb50, 0x03e0, 0x0040 }, >>> + { 0xe056, 0x00f0, 0x0030 }, >>> + { 0xe040, 0x1000, 0x0000 }, >>> + { 0xea1c, 0x0003, 0x0001 }, >>> + { 0xe0c0, 0x4f0f, 0x4403 }, >>> + { 0xe052, 0x0080, 0x0068 }, >>> + { 0xd430, 0x0fff, 0x047f }, >>> + { 0xea1c, 0x0004, 0x0000 }, >>> + { 0xeb54, 0x0000, 0x0001 }, >>> + }; >>> + >>> + unsigned long flags; >>> + >>> rtl_pcie_state_l2l3_disable(tp); >>> RTL_W16(tp, 0x382, 0x221b); >>> @@ -3560,47 +3582,38 @@ static void rtl_hw_start_8125_common(struct rtl8169_private *tp) >>> RTL_W16(tp, 0x4800, 0); >>> /* disable UPS */ >>> - r8168_mac_ocp_modify(tp, 0xd40a, 0x0010, 0x0000); >>> + >>> + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); >>> + __r8168_mac_ocp_modify(tp, 0xd40a, 0x0010, 0x0000); >>> RTL_W8(tp, Config1, RTL_R8(tp, Config1) & ~0x10); >>> - r8168_mac_ocp_write(tp, 0xc140, 0xffff); >>> - r8168_mac_ocp_write(tp, 0xc142, 0xffff); >>> + __r8168_mac_ocp_write(tp, 0xc140, 0xffff); >>> + __r8168_mac_ocp_write(tp, 0xc142, 0xffff); >>> - r8168_mac_ocp_modify(tp, 0xd3e2, 0x0fff, 0x03a9); >>> - r8168_mac_ocp_modify(tp, 0xd3e4, 0x00ff, 0x0000); >>> - r8168_mac_ocp_modify(tp, 0xe860, 0x0000, 0x0080); >>> + __r8168_mac_ocp_modify_seq(tp, e_info_8125_common_1); >>> /* disable new tx descriptor format */ >>> - r8168_mac_ocp_modify(tp, 0xeb58, 0x0001, 0x0000); >>> + __r8168_mac_ocp_modify(tp, 0xeb58, 0x0001, 0x0000); >>> - if (tp->mac_version == RTL_GIGA_MAC_VER_63) >>> - r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0200); >>> - else >>> - r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0400); >>> + if (tp->mac_version == RTL_GIGA_MAC_VER_63) { >>> + __r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0200); >>> + __r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0000); >>> + } else { >>> + __r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0400); >>> + __r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0020); >>> + } >>> + >>> + __r8168_mac_ocp_modify_seq(tp, e_info_8125_common_2); >>> + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); >>> - if (tp->mac_version == RTL_GIGA_MAC_VER_63) >>> - r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0000); >>> - else >>> - r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0020); >>> - >>> - r8168_mac_ocp_modify(tp, 0xc0b4, 0x0000, 0x000c); >>> - r8168_mac_ocp_modify(tp, 0xeb6a, 0x00ff, 0x0033); >>> - r8168_mac_ocp_modify(tp, 0xeb50, 0x03e0, 0x0040); >>> - r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0030); >>> - r8168_mac_ocp_modify(tp, 0xe040, 0x1000, 0x0000); >>> - r8168_mac_ocp_modify(tp, 0xea1c, 0x0003, 0x0001); >>> - r8168_mac_ocp_modify(tp, 0xe0c0, 0x4f0f, 0x4403); >>> - r8168_mac_ocp_modify(tp, 0xe052, 0x0080, 0x0068); >>> - r8168_mac_ocp_modify(tp, 0xd430, 0x0fff, 0x047f); >>> - >>> - r8168_mac_ocp_modify(tp, 0xea1c, 0x0004, 0x0000); >>> - r8168_mac_ocp_modify(tp, 0xeb54, 0x0000, 0x0001); > > [1] I think we have a candidate for a squeeze here in this driver. > >>> udelay(1); >>> - r8168_mac_ocp_modify(tp, 0xeb54, 0x0001, 0x0000); >>> - RTL_W16(tp, 0x1880, RTL_R16(tp, 0x1880) & ~0x0030); >>> - r8168_mac_ocp_write(tp, 0xe098, 0xc302); >>> + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); >>> + __r8168_mac_ocp_modify(tp, 0xeb54, 0x0001, 0x0000); >>> + RTL_W16(tp, 0x1880, RTL_R16(tp, 0x1880) & ~0x0030); >>> + __r8168_mac_ocp_write(tp, 0xe098, 0xc302); >>> + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); >>> rtl_loop_wait_low(tp, &rtl_mac_ocp_e00e_cond, 1000, 10); >>> >> >> All this manual locking and unlocking makes the code harder >> to read and more error-prone. Maybe, as a rule of thumb: >> If you can replace a block with more than 10 mac ocp ops, >> then fine with me > As I worked with another German developer, Mr. Frank Heckenbach from the GNU Pascal project, > I know that Germans are pedantic and reliable :-) > > If this rtl_hw_start_8125_common() is called only once, then maybe every memory bus cycle > isn't worth saving, and then maybe the additional complexity isn't worth adding (but it > was fun doing, and it works with my NIC). > > AFAIK, a spin_lock_irqsave()/spin_unlock_irqrestore() isn't a free lunch as you know, and I read > from the manuals that on modern CPUs a locked ADD $0, -128(%esp) or something takes about 50 > clock cycles, in which all cores have to wait. > > Doing that in storm of 10 lock/unlock pairs amounts to 500 cycles or 125 ns in the best case > on a 4 GHz CPU. > > But I trust that you as the maintainer have the big picture and greater insight in the actual hw. > Big picture: maybe, as I've been working on r8169 for about 5 yrs Greater insight in the actual hw: not really, because Realtek doesn't publish datasheets and errata information. I just know the history of a lot of needed quirks in r8169, and I can recall what we tried to improve and had to roll back because users had problems. Most famous example is ASPM L1 + sub-state support. ASPM with r8169-driven NIC versions has a long history of users facing slow network connectivity / tx timeouts / complete NIC stalls. And it's still not solved with the newest NIC versions. But maybe not only Realtek is to blame. Basically every consumer mainboard comes with a Realtek NIC, chipset issues and BIOS bugs contribute to the mess. > Elon Musk said that the greatest error in engineering is optimising stuff that never should > have been written. > > As I like my RTL 8125b NIC to be fast (TM), I am not giving up on it. > Does this mean that you're not achieving the 2500Mbps line rate with r8169 as-is? > Just recalled that you mentioned that it supports RSS. Is it feasible to add it to the r8169 > driver? > Feasible: definitely, as you "just" have to take the RSS-related code from r8125 and adjust it to meet mainline code quality criteria Question is which actual benefit it brings, and whether it's worth the additional complexity. As a starting point you could compare performance KPI's when using r8169 vs. r8125. > It is OK for me that we have a formal and less formal mode of communication and switch between > them. > > Thank you for all your quick reviews, much obliged, thank you for partially accepting the patches, > but I think I need to sleep over it before submitting a new version. > > Regards, > Mirsad Todorovac ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v5 7/7] r8169: Coalesce mac ocp write and modify for rtl_hw_init_8125 to reduce spinlock contention 2023-10-29 18:35 [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers to reduce spinlock contention Mirsad Goran Todorovac ` (4 preceding siblings ...) 2023-10-29 18:36 ` [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125 and 8125B start to reduce spinlocks Mirsad Goran Todorovac @ 2023-10-29 18:36 ` Mirsad Goran Todorovac 2023-10-30 14:03 ` Heiner Kallweit 2023-10-30 13:39 ` [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers " Heiner Kallweit 6 siblings, 1 reply; 16+ messages in thread From: Mirsad Goran Todorovac @ 2023-10-29 18:36 UTC (permalink / raw) To: Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, Heiner Kallweit, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mirsad Goran Todorovac, Marco Elver Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in the init sequence of the 8125 involve implicit spin_lock_irqsave() and spin_unlock_irqrestore() on each invocation. Coalesced with the corresponding helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq() into sequential write or modidy with a sinqle pair of spin_lock_irqsave() and spin_unlock_irqrestore(), these calls reduce overall lock contention. Fixes: f1bce4ad2f1ce ("r8169: add support for RTL8125") Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Marco Elver <elver@google.com> Cc: nic_swsd@realtek.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> --- v5: added unlocked primitives to allow mac ocs modify grouping applied coalescing of mac ocp writes/modifies for 8168ep and 8117 some formatting fixes to please checkpatch.pl v4: fixed complaints as advised by Heiner and checkpatch.pl split the patch into five sections to be more easily manipulated and reviewed introduced r8168_mac_ocp_write_seq() applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B v3: removed register/mask pair array sentinels, so using ARRAY_SIZE(). avoided duplication of RTL_W32() call code as advised by Heiner. drivers/net/ethernet/realtek/r8169_main.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 0778cd0ba2e0..76f0f1e13909 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -5089,6 +5089,12 @@ static void rtl_hw_init_8168g(struct rtl8169_private *tp) static void rtl_hw_init_8125(struct rtl8169_private *tp) { + static const struct e_info_regdata hw_init_8125_1[] = { + { 0xc0aa, 0x07d0 }, + { 0xc0a6, 0x0150 }, + { 0xc01e, 0x5555 }, + }; + rtl_enable_rxdvgate(tp); RTL_W8(tp, ChipCmd, RTL_R8(tp, ChipCmd) & ~(CmdTxEnb | CmdRxEnb)); @@ -5098,9 +5104,7 @@ static void rtl_hw_init_8125(struct rtl8169_private *tp) r8168_mac_ocp_modify(tp, 0xe8de, BIT(14), 0); r8168g_wait_ll_share_fifo_ready(tp); - r8168_mac_ocp_write(tp, 0xc0aa, 0x07d0); - r8168_mac_ocp_write(tp, 0xc0a6, 0x0150); - r8168_mac_ocp_write(tp, 0xc01e, 0x5555); + r8168_mac_ocp_write_seq(tp, hw_init_8125_1); r8168g_wait_ll_share_fifo_ready(tp); } -- 2.34.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v5 7/7] r8169: Coalesce mac ocp write and modify for rtl_hw_init_8125 to reduce spinlock contention 2023-10-29 18:36 ` [PATCH v5 7/7] r8169: Coalesce mac ocp write and modify for rtl_hw_init_8125 to reduce spinlock contention Mirsad Goran Todorovac @ 2023-10-30 14:03 ` Heiner Kallweit 0 siblings, 0 replies; 16+ messages in thread From: Heiner Kallweit @ 2023-10-30 14:03 UTC (permalink / raw) To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Marco Elver On 29.10.2023 19:36, Mirsad Goran Todorovac wrote: > Repeated calls to r8168_mac_ocp_write() and r8168_mac_ocp_modify() in > the init sequence of the 8125 involve implicit spin_lock_irqsave() and > spin_unlock_irqrestore() on each invocation. > > Coalesced with the corresponding helpers r8168_mac_ocp_write_seq() and > r8168_mac_ocp_modify_seq() into sequential write or modidy with a sinqle > pair of spin_lock_irqsave() and spin_unlock_irqrestore(), these calls > reduce overall lock contention. > > Fixes: f1bce4ad2f1ce ("r8169: add support for RTL8125") > Cc: Heiner Kallweit <hkallweit1@gmail.com> > Cc: Marco Elver <elver@google.com> > Cc: nic_swsd@realtek.com > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: netdev@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ > Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ > Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> > --- > v5: > added unlocked primitives to allow mac ocs modify grouping > applied coalescing of mac ocp writes/modifies for 8168ep and 8117 > some formatting fixes to please checkpatch.pl > > v4: > fixed complaints as advised by Heiner and checkpatch.pl > split the patch into five sections to be more easily manipulated and reviewed > introduced r8168_mac_ocp_write_seq() > applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B > > v3: > removed register/mask pair array sentinels, so using ARRAY_SIZE(). > avoided duplication of RTL_W32() call code as advised by Heiner. > > drivers/net/ethernet/realtek/r8169_main.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c > index 0778cd0ba2e0..76f0f1e13909 100644 > --- a/drivers/net/ethernet/realtek/r8169_main.c > +++ b/drivers/net/ethernet/realtek/r8169_main.c > @@ -5089,6 +5089,12 @@ static void rtl_hw_init_8168g(struct rtl8169_private *tp) > > static void rtl_hw_init_8125(struct rtl8169_private *tp) > { > + static const struct e_info_regdata hw_init_8125_1[] = { > + { 0xc0aa, 0x07d0 }, > + { 0xc0a6, 0x0150 }, > + { 0xc01e, 0x5555 }, > + }; > + > rtl_enable_rxdvgate(tp); > > RTL_W8(tp, ChipCmd, RTL_R8(tp, ChipCmd) & ~(CmdTxEnb | CmdRxEnb)); > @@ -5098,9 +5104,7 @@ static void rtl_hw_init_8125(struct rtl8169_private *tp) > r8168_mac_ocp_modify(tp, 0xe8de, BIT(14), 0); > r8168g_wait_ll_share_fifo_ready(tp); > > - r8168_mac_ocp_write(tp, 0xc0aa, 0x07d0); > - r8168_mac_ocp_write(tp, 0xc0a6, 0x0150); > - r8168_mac_ocp_write(tp, 0xc01e, 0x5555); > + r8168_mac_ocp_write_seq(tp, hw_init_8125_1); > r8168g_wait_ll_share_fifo_ready(tp); > } > As for few other patches in the series, not really worth it. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers to reduce spinlock contention 2023-10-29 18:35 [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers to reduce spinlock contention Mirsad Goran Todorovac ` (5 preceding siblings ...) 2023-10-29 18:36 ` [PATCH v5 7/7] r8169: Coalesce mac ocp write and modify for rtl_hw_init_8125 to reduce spinlock contention Mirsad Goran Todorovac @ 2023-10-30 13:39 ` Heiner Kallweit 6 siblings, 0 replies; 16+ messages in thread From: Heiner Kallweit @ 2023-10-30 13:39 UTC (permalink / raw) To: Mirsad Goran Todorovac, Jason Gunthorpe, Joerg Roedel, Lu Baolu, iommu, linux-kernel, netdev Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Marco Elver On 29.10.2023 19:35, Mirsad Goran Todorovac wrote: > A pair of new helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq() > are introduced. > At first one more formal remark: series is missing a cover letter > The motivation for these helpers was the locking overhead of 130 consecutive > r8168_mac_ocp_write() calls in the RTL8411b reset after the NIC gets confused > if the PHY is powered-down. > > To quote Heiner: > > On RTL8411b the RX unit gets confused if the PHY is powered-down. > This was reported in [0] and confirmed by Realtek. Realtek provided > a sequence to fix the RX unit after PHY wakeup. > > A series of about 130 r8168_mac_ocp_write() calls is performed to program the > RTL registers for recovery, each doing an expensive spin_lock_irqsave() and > spin_unlock_irqrestore(). > > Each mac ocp write is made of: > > static void __r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, > u32 data) > { > if (rtl_ocp_reg_failure(reg)) > return; > > RTL_W32(tp, OCPDR, OCPAR_FLAG | (reg << 15) | data); > } > > static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, > u32 data) > { > unsigned long flags; > > raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); > __r8168_mac_ocp_write(tp, reg, data); > raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); > } > > Register programming is done through RTL_W32() macro which expands into > > #define RTL_W32(tp, reg, val32) writel((val32), tp->mmio_addr + (reg)) > > which is further (on Alpha): > > extern inline void writel(u32 b, volatile void __iomem *addr) > { > mb(); > __raw_writel(b, addr); > } > > or on i386/x86_64: > > #define build_mmio_write(name, size, type, reg, barrier) \ > static inline void name(type val, volatile void __iomem *addr) \ > { asm volatile("mov" size " %0,%1": :reg (val), \ > "m" (*(volatile type __force *)addr) barrier); } > > build_mmio_write(writel, "l", unsigned int, "r", :"memory") > > This obviously involves iat least a compiler barrier. > > mb() expands into something like this i.e. on x86_64: > > #define mb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory") > > This means a whole lot of memory bus stalls: for spin_lock_irqsave(), > memory barrier, writel(), and spin_unlock_irqrestore(). > > With about 130 of these sequential calls to r8168_mac_ocp_write() this looks like > a lock storm that will stall all of the cores and CPUs on the same memory controller > for certain time I/O takes to finish. > > In a sequential case of RTL register programming, the writes to RTL registers > can be coalesced under a same raw spinlock. This can dramatically decrease the > number of bus stalls in a multicore or multi-CPU system. > > Macro helpers r8168_mac_ocp_write_seq() and r8168_mac_ocp_modify_seq() are > provided to reduce lock contention: > > static void rtl_hw_start_8411_2(struct rtl8169_private *tp) > { > > ... > > /* The following Realtek-provided magic fixes an issue with the RX unit > * getting confused after the PHY having been powered-down. > */ > > static const struct recover_8411b_info init_zero_seq[] = { > { 0xFC28, 0x0000 }, { 0xFC2A, 0x0000 }, { 0xFC2C, 0x0000 }, > ... > }; > > ... > > r8168_mac_ocp_write_seq(tp, init_zero_seq); > > ... > > } > > The hex data is preserved intact through s/r8168_mac_ocp_write[(]tp,/{ / and s/[)];/ },/ > functions that only changed the function names and the ending of the line, so the actual > hex data is unchanged. > > To repeat, the reason for the introduction of the original commit > was to enable recovery of the RX unit on the RTL8411b which was confused by the > powered-down PHY. This sequence of r8168_mac_ocp_write() calls amplifies the problem > into a series of about 500+ memory bus locks, most waiting for the main memory read, > modify and write under a LOCK. The memory barrier in RTL_W32 should suffice for > the programming sequence to reach RTL NIC registers. > > [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075 > The motivation for the original change isn't relevant here. In general I'd prefer a shorter commit message that comes to the point immediately. > Fixes: fe4e8db0392a6 ("r8169: fix issue with confused RX unit after PHY power-down on RTL8411b") > Fixes: 91c8643578a21 ("r8169: use spinlock to protect mac ocp register access") > Fixes: d6c36cbc5e533 ("r8169: Use a raw_spinlock_t for the register locks.") Please submit the patches as net-next material and remove the Fixes tags as there's no evidence of any actual issue. See stable submission criteria: https://www.kernel.org/doc/html/next/process/stable-kernel-rules.html#:~:text=It%20or%20an%20equivalent%20fix,%2Fprocess%2Fsubmitting%2Dpatches. No "This could be a problem..." type of things like a "theoretical race condition", unless an explanation of how the bug can be exploited is also provided. See > Cc: Heiner Kallweit <hkallweit1@gmail.com> > Cc: Marco Elver <elver@google.com> > Cc: nic_swsd@realtek.com > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: netdev@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Link: https://lore.kernel.org/lkml/20231028005153.2180411-1-mirsad.todorovac@alu.unizg.hr/ > Link: https://lore.kernel.org/lkml/20231028110459.2644926-1-mirsad.todorovac@alu.unizg.hr/ > Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> > --- > v5: > added unlocked primitives to allow mac ocs modify grouping > applied coalescing of mac ocp writes/modifies for 8168ep and 8117 > some formatting fixes to please checkpatch.pl > > v4: > fixed complaints as advised by Heiner and checkpatch.pl > split the patch into five sections to be more easily manipulated and reviewed > introduced r8168_mac_ocp_write_seq() > applied coalescing of mac ocp writes/modifies for 8168H, 8125 and 8125B > > v3: > removed register/mask pair array sentinels, so using ARRAY_SIZE(). > avoided duplication of RTL_W32() call code as advised by Heiner. > > drivers/net/ethernet/realtek/r8169_main.c | 71 +++++++++++++++++++++-- > 1 file changed, 66 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c > index 361b90007148..da1f5d1b4fd5 100644 > --- a/drivers/net/ethernet/realtek/r8169_main.c > +++ b/drivers/net/ethernet/realtek/r8169_main.c > @@ -888,7 +888,7 @@ static int r8168_phy_ocp_read(struct rtl8169_private *tp, u32 reg) > (RTL_R32(tp, GPHY_OCP) & 0xffff) : -ETIMEDOUT; > } > > -static void __r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data) > +static inline void __r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data) No inline declarations please, let the compiler decide. > { > if (rtl_ocp_reg_failure(reg)) > return; > @@ -905,7 +905,7 @@ static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data) > raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); > } > > -static u16 __r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg) > +static inline u16 __r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg) > { > if (rtl_ocp_reg_failure(reg)) > return 0; > @@ -927,18 +927,79 @@ static u16 r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg) > return val; > } > > +static inline void __r8168_mac_ocp_modify(struct rtl8169_private *tp, u32 reg, u16 mask, > + u16 set) > +{ > + u16 data; > + > + data = __r8168_mac_ocp_read(tp, reg); > + __r8168_mac_ocp_write(tp, reg, (data & ~mask) | set); > +} > + > static void r8168_mac_ocp_modify(struct rtl8169_private *tp, u32 reg, u16 mask, > u16 set) > { > unsigned long flags; > - u16 data; > > raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); > - data = __r8168_mac_ocp_read(tp, reg); > - __r8168_mac_ocp_write(tp, reg, (data & ~mask) | set); > + __r8168_mac_ocp_modify(tp, reg, mask, set); > raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); > } > > +struct e_info_regdata { What does the e prefix stand for? Maybe macocp_info_write and macocp_info_modify would be better names and more in line with e.g. struct ephy_info. > + u32 reg; > + u32 data; > +}; > + > +struct e_info_regmaskset { > + u32 reg; > + u16 mask; > + u16 set; > +}; > + > +static void __r8168_mac_ocp_write_seqlen(struct rtl8169_private *tp, > + const struct e_info_regdata *array, int len) > +{ > + struct e_info_regdata const *p; > + > + for (p = array; len--; p++) > + __r8168_mac_ocp_write(tp, p->reg, p->data); > +} > + > +static void r8168_mac_ocp_write_seqlen(struct rtl8169_private *tp, > + const struct e_info_regdata *array, int len) > +{ > + unsigned long flags; > + > + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); > + __r8168_mac_ocp_write_seqlen(tp, array, len); > + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); > +} > + > +static void __r8168_mac_ocp_modify_seqlen(struct rtl8169_private *tp, > + const struct e_info_regmaskset *array, int len) > +{ > + struct e_info_regmaskset const *p; > + > + for (p = array; len--; p++) > + __r8168_mac_ocp_modify(tp, p->reg, p->mask, p->set); > +} > + > +static void r8168_mac_ocp_modify_seqlen(struct rtl8169_private *tp, > + const struct e_info_regmaskset *array, int len) > +{ > + unsigned long flags; > + > + raw_spin_lock_irqsave(&tp->mac_ocp_lock, flags); > + __r8168_mac_ocp_modify_seqlen(tp, array, len); > + raw_spin_unlock_irqrestore(&tp->mac_ocp_lock, flags); > +} > + > +#define r8168_mac_ocp_write_seq(tp, a) r8168_mac_ocp_write_seqlen(tp, a, ARRAY_SIZE(a)) > +#define r8168_mac_ocp_modify_seq(tp, a) r8168_mac_ocp_modify_seqlen(tp, a, ARRAY_SIZE(a)) > +#define __r8168_mac_ocp_write_seq(tp, a) __r8168_mac_ocp_write_seqlen(tp, a, ARRAY_SIZE(a)) > +#define __r8168_mac_ocp_modify_seq(tp, a) __r8168_mac_ocp_modify_seqlen(tp, a, ARRAY_SIZE(a)) > + > /* Work around a hw issue with RTL8168g PHY, the quirk disables > * PHY MCU interrupts before PHY power-down. > */ ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-10-30 15:54 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-10-29 18:35 [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers to reduce spinlock contention Mirsad Goran Todorovac 2023-10-29 18:35 ` [PATCH v5 2/7] r8169: Coalesce RTL8411b PHY power-down recovery calls " Mirsad Goran Todorovac 2023-10-30 13:48 ` Heiner Kallweit 2023-10-29 18:35 ` [PATCH v5 3/7] r8169: Coalesce mac ocp write and modify for 8168H start " Mirsad Goran Todorovac 2023-10-30 13:50 ` Heiner Kallweit 2023-10-29 18:36 ` [PATCH v5 4/7] r8169: Coalesce mac ocp write and modify for 8168ep " Mirsad Goran Todorovac 2023-10-30 13:51 ` Heiner Kallweit 2023-10-29 18:36 ` [PATCH v5 5/7] r8169: Reduce spinlock contention for the start of RTL8117 Mirsad Goran Todorovac 2023-10-30 13:51 ` Heiner Kallweit 2023-10-29 18:36 ` [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125 and 8125B start to reduce spinlocks Mirsad Goran Todorovac 2023-10-30 14:02 ` Heiner Kallweit 2023-10-30 15:02 ` Mirsad Todorovac 2023-10-30 15:53 ` Heiner Kallweit 2023-10-29 18:36 ` [PATCH v5 7/7] r8169: Coalesce mac ocp write and modify for rtl_hw_init_8125 to reduce spinlock contention Mirsad Goran Todorovac 2023-10-30 14:03 ` Heiner Kallweit 2023-10-30 13:39 ` [PATCH v5 1/7] r8169: Add r8169_mac_ocp_(write|modify)_seq helpers " Heiner Kallweit
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox