Netdev List


* [PATCH v2 17/22] dt-bindings: clock: Add StarFive JHB100 Peripheral-2 clock and reset generator
From: Changhuang Liang @ 2026-05-08  5:36 UTC
  To: Michael Turquette, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Stephen Boyd, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Philipp Zabel, Emil Renner Berthing, Kees Cook,
	Gustavo A . R . Silva, Richard Cochran
  Cc: linux-clk, linux-kernel, devicetree, linux-riscv, linux-hardening,
	netdev, Sia Jee Heng, Hal Feng, Ley Foon Tan, Changhuang Liang
In-Reply-To: <20260508053632.818548-1-changhuang.liang@starfivetech.com>

Add bindings for the Peripheral-2 clock and reset generator (PER2CRG)
on the JHB100 RISC-V SoC by StarFive Ltd.

Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
---
 .../clock/starfive,jhb100-per2crg.yaml        | 76 +++++++++++++++++++
 .../dt-bindings/clock/starfive,jhb100-crg.h   | 57 ++++++++++++++
 .../dt-bindings/reset/starfive,jhb100-crg.h   | 17 +++++
 3 files changed, 150 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/starfive,jhb100-per2crg.yaml

diff --git a/Documentation/devicetree/bindings/clock/starfive,jhb100-per2crg.yaml b/Documentation/devicetree/bindings/clock/starfive,jhb100-per2crg.yaml
new file mode 100644
index 000000000000..3c266bc2eac2
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/starfive,jhb100-per2crg.yaml
@@ -0,0 +1,76 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/clock/starfive,jhb100-per2crg.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: StarFive JHB100 Peripheral-2 Clock and Reset Generator
+
+maintainers:
+  - Changhuang Liang <changhuang.liang@starfivetech.com>
+
+properties:
+  compatible:
+    const: starfive,jhb100-per2crg
+
+  reg:
+    maxItems: 1
+
+  clocks:
+    items:
+      - description: Non Coherent NOC Initiator
+      - description: Configure 400MHz
+      - description: Configure 125MHz
+      - description: GMAC2 RGMII RX
+      - description: GMAC2 RMII Reference
+      - description: GMAC3 SGMII TX
+      - description: GMAC3 SGMII RX
+      - description: Main Oscillator (25 MHz)
+
+  clock-names:
+    items:
+      - const: ncnoc_init
+      - const: cfg_400
+      - const: cfg_125
+      - const: gmac2_rgmii_rx
+      - const: gmac2_rmii_ref
+      - const: gmac3_sgmii_tx
+      - const: gmac3_sgmii_rx
+      - const: osc
+
+  '#clock-cells':
+    const: 1
+    description:
+      See <dt-bindings/clock/starfive,jhb100-crg.h> for valid indices.
+
+  '#reset-cells':
+    const: 1
+    description:
+      See <dt-bindings/reset/starfive,jhb100-crg.h> for valid indices.
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - clock-names
+  - '#clock-cells'
+  - '#reset-cells'
+
+additionalProperties: false
+
+examples:
+  - |
+    clock-controller@11bc0000 {
+      compatible = "starfive,jhb100-per2crg";
+      reg = <0x11bc0000 0x1000>;
+      clocks = <&sys0crg 52>, <&sys0crg 54>, <&sys0crg 55>,
+               <&per2_gmac2_rgmii_rx>, <&per2_gmac2_rmii_ref>,
+               <&per2_gmac3_sgmii_tx>, <&per2_gmac3_sgmii_rx>,
+               <&osc>;
+      clock-names = "ncnoc_init", "cfg_400", "cfg_125",
+                    "gmac2_rgmii_rx", "gmac2_rmii_ref",
+                    "gmac3_sgmii_tx", "gmac3_sgmii_rx",
+                    "osc";
+      #clock-cells = <1>;
+      #reset-cells = <1>;
+    };
diff --git a/include/dt-bindings/clock/starfive,jhb100-crg.h b/include/dt-bindings/clock/starfive,jhb100-crg.h
index 7f508574177c..2b2e148ce5ce 100644
--- a/include/dt-bindings/clock/starfive,jhb100-crg.h
+++ b/include/dt-bindings/clock/starfive,jhb100-crg.h
@@ -447,4 +447,61 @@
 #define JHB100_PER1CLK_MAIN_ICG_EN_RAS			75
 #define JHB100_PER1CLK_MAIN_ICG_EN_UFS			76
 
+/* PER2CRG clocks */
+#define JHB100_PER2CLK_300				0
+#define JHB100_PER2CLK_100				1
+#define JHB100_PER2CLK_50				2
+#define JHB100_PER2CLK_GMAC2_RMII_50			3
+#define JHB100_PER2CLK_CAN0_CORE_DIV			4
+#define JHB100_PER2CLK_CAN1_CORE_DIV			5
+#define JHB100_PER2CLK_CAN0_TIMER			6
+#define JHB100_PER2CLK_CAN1_TIMER			7
+
+#define JHB100_PER2CLK_RTC_CORE_DIV			11
+#define JHB100_PER2CLK_GMAC2_RMII_MUX_DLY		12
+#define JHB100_PER2CLK_GMAC2_RMII_DIV			13
+
+#define JHB100_PER2CLK_GMAC2_RGMII_125_MUX		15
+#define JHB100_PER2CLK_GMAC2_RGMII_DIV			16
+#define JHB100_PER2CLK_GMAC2_TX_MUX			17
+#define JHB100_PER2CLK_GMAC2_TX_180_BUF			18
+#define JHB100_PER2CLK_GMAC2_RX_MUX_DLY			19
+#define JHB100_PER2CLK_GMAC2_RX_180_BUF			20
+#define JHB100_PER2CLK_GMAC2_TXCK_MUX_DLY		21
+#define JHB100_PER2CLK_GMAC3_TX_125_MUX			22
+#define JHB100_PER2CLK_GMAC3_RX_125_MUX			23
+#define JHB100_PER2CLK_GMAC3_TX_DIV			24
+#define JHB100_PER2CLK_GMAC3_RX_DIV			25
+#define JHB100_PER2CLK_SENSORS_PERIPH2			26
+
+#define JHB100_PER2CLK_FAN_TACH_PCLK			33
+
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_TX_I		44
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_RX_I		45
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_TX_180_I	46
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_RX_180_I	47
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_PTP_REF_I	48
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_RMII_I	49
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_CSR_I	50
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_ACLK_I	51
+#define JHB100_PER2CLK_RMIIANDRGMII_IOMUX_GMAC2_TXCK	52
+#define JHB100_PER2CLK_ETHER1_SGMII_TX_I		53
+#define JHB100_PER2CLK_ETHER1_SGMII_RX_I		54
+#define JHB100_PER2CLK_ETHER1_SGMII_TX_125_I		55
+#define JHB100_PER2CLK_ETHER1_SGMII_RX_125_I		56
+#define JHB100_PER2CLK_ETHER1_SGMII_PTP_REF_I		57
+#define JHB100_PER2CLK_ETHER1_SGMII_CSR_I		58
+#define JHB100_PER2CLK_ETHER1_SGMII_ACLK_I		59
+#define JHB100_PER2CLK_ETHER1_SGMII_PHY_PCLK_I		60
+#define JHB100_PER2CLK_ETHER1_SGMII_REF_25_I		61
+#define JHB100_PER2CLK_MAIN_ICG_EN_CAN0			62
+#define JHB100_PER2CLK_MAIN_ICG_EN_CAN1			63
+
+#define JHB100_PER2CLK_MAIN_ICG_EN_DMAC_8CH		65
+#define JHB100_PER2CLK_MAIN_ICG_EN_RTC_SCAN		66
+#define JHB100_PER2CLK_MAIN_ICG_EN_ADC0			67
+#define JHB100_PER2CLK_MAIN_ICG_EN_ADC1			68
+#define JHB100_PER2CLK_MAIN_ICG_EN_GMAC2		69
+#define JHB100_PER2CLK_MAIN_ICG_EN_GMAC3		70
+
 #endif /* __DT_BINDINGS_CLOCK_STARFIVE_JHB100_H__ */
diff --git a/include/dt-bindings/reset/starfive,jhb100-crg.h b/include/dt-bindings/reset/starfive,jhb100-crg.h
index cf933a1befbb..0965f3798397 100644
--- a/include/dt-bindings/reset/starfive,jhb100-crg.h
+++ b/include/dt-bindings/reset/starfive,jhb100-crg.h
@@ -157,4 +157,21 @@
 #define JHB100_PER1RST_MAIN_RSTN_DMAC_SPI0				15
 #define JHB100_PER1RST_MAIN_RSTN_PERIPH1_RAS				16
 
+/* PER2CRG resets */
+#define JHB100_PER2RST_IOMUX_PRESETN					0
+#define JHB100_PER2RST_POK_IOMUX_PRESETN				1
+#define JHB100_PER2RST_SYSREG_RSTN					2
+#define JHB100_PER2RST_MAIN_RSTN_CAN0					3
+#define JHB100_PER2RST_MAIN_RSTN_CAN1					4
+#define JHB100_PER2RST_FAN_TACH_PRESETN					5
+#define JHB100_PER2RST_MAIN_RSTN_GMAC2					6
+#define JHB100_PER2RST_MAIN_RSTN_GMAC3					7
+#define JHB100_PER2RST_MAIN_RSTN_DMAC_8CH				8
+#define JHB100_PER2RST_MAIN_RSTN_RTC					9
+#define JHB100_PER2RST_ADC0_PRESETN					10
+#define JHB100_PER2RST_ADC0_IOMUX_PRESETN				11
+#define JHB100_PER2RST_ADC1_PRESETN					12
+#define JHB100_PER2RST_ADC1_IOMUX_PRESETN				13
+#define JHB100_PER2RST_MAIN_RSTN_PERIPH2_SENSORS			14
+
 #endif /* __DT_BINDINGS_RESET_STARFIVE_JHB100_CRG_H__ */
-- 
2.25.1



* Re: [v3 PATCH] xfrm: ipcomp: Free destination pages on acomp errors
From: Yilin Zhu @ 2026-05-08  5:43 UTC
  To: Herbert Xu
  Cc: n05ec, netdev, steffen.klassert, davem, edumazet, kuba, pabeni,
	horms, yuantan098, yifanwucs, tomapufckgml, bird, ronbogo
In-Reply-To: <aftA0Iwi2aX_4ORo@gondor.apana.org.au>

On Wed, 6 May 2026 at 06:24, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Orion Zhu <zylzyl2333@gmail.com> wrote:
> >
> > Thanks, but I think moving the label is not safe for all paths reaching
> > ipcomp_post_acomp().
>
> Thanks for checking!
>
> We could fix this by checking whether req is NULL and whether
> sg_page(dsg) is NULL.
>
> ---8<---
> Move the out_free_req label up by a couple of lines so that the
> allocated dst SG list gets freed on error as well as success.
>
> Fixes: eb2953d26971 ("xfrm: ipcomp: Use crypto_acomp interface")
> Cc: stable@kernel.org
> Reported-by: Yuan Tan <yuantan098@gmail.com>
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
> index 5f38dff16177..671d48f8c937 100644
> --- a/net/xfrm/xfrm_ipcomp.c
> +++ b/net/xfrm/xfrm_ipcomp.c
> @@ -51,11 +51,15 @@ static int ipcomp_post_acomp(struct sk_buff *skb, int err, int hlen)
>         struct scatterlist *dsg;
>         int len, dlen;
>
> -       if (unlikely(err))
> -               goto out_free_req;
> +       if (unlikely(!req))
> +               return err;
>
>         extra = acomp_request_extra(req);
>         dsg = extra->sg;
> +
> +       if (unlikely(err))
> +               goto out_free_req;
> +
>         dlen = req->dlen;
>
>         pskb_trim_unique(skb, 0);
> @@ -84,10 +88,10 @@ static int ipcomp_post_acomp(struct sk_buff *skb, int err, int hlen)
>                 skb_shinfo(skb)->nr_frags++;
>         } while ((dlen -= len));
>
> -       for (; dsg; dsg = sg_next(dsg))
> +out_free_req:
> +       for (; dsg && sg_page(dsg); dsg = sg_next(dsg))
>                 __free_page(sg_page(dsg));
>
> -out_free_req:
>         acomp_request_free(req);
>         return err;
>  }
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Hi Herbert,

Thanks for the update.

This version looks reasonable to me. Checking req before
dereferencing it and stopping the SG walk at a NULL sg_page() should
cover the setup failure paths I was concerned about, while still freeing
the allocated destination pages on the acomp error path.
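
For readers following the control flow, here is a minimal userspace sketch
of the cleanup pattern described above. It is not the kernel code:
struct demo_sg, struct demo_req, demo_req_alloc() and demo_post() are
hypothetical stand-ins for the scatterlist, the acomp request and
ipcomp_post_acomp(), and malloc()/free() stand in for page allocation. It
only illustrates the two checks: return early when there is no request,
and stop the free loop at the first slot that has no page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel scatterlist and acomp request. */
#define DEMO_MAX_SG 4

struct demo_sg {
	void *page;	/* NULL means no page was allocated for this slot */
};

struct demo_req {
	struct demo_sg sg[DEMO_MAX_SG];
};

/* Allocate a request and fill the first npages slots, as setup code might. */
static struct demo_req *demo_req_alloc(size_t npages)
{
	struct demo_req *req = calloc(1, sizeof(*req));
	size_t i;

	if (!req)
		return NULL;
	for (i = 0; i < npages && i < DEMO_MAX_SG; i++) {
		req->sg[i].page = malloc(64);
		if (!req->sg[i].page)
			break;	/* later slots stay NULL */
	}
	return req;
}

/*
 * Same shape as the flow discussed in the thread: bail out when there is
 * no request, otherwise always walk the destination list and free pages,
 * stopping at the first empty slot. *freed reports how many were freed.
 */
static int demo_post(struct demo_req *req, int err, size_t *freed)
{
	size_t i;

	assert(freed != NULL);
	*freed = 0;
	if (!req)
		return err;

	/* success-path work would run here when err == 0 */

	for (i = 0; i < DEMO_MAX_SG && req->sg[i].page; i++) {
		free(req->sg[i].page);
		(*freed)++;
	}
	free(req);
	return err;
}
```

Calling demo_post(NULL, err, &freed) returns err without touching
anything, and a request with two populated slots has exactly those two
slots freed whether err is zero or not.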

Could you please also add my Reported-by tag?

Reported-by: Yilin Zhu <zylzyl2333@gmail.com>

Thanks,
Yilin


* Re: Re: [PATCH net v1 1/2] dt-bindings: ethernet: eswin: refine delay model and HSP register description
From: 李志 @ 2026-05-08  5:43 UTC
  To: Conor Dooley
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, robh, krzk+dt,
	conor+dt, netdev, devicetree, linux-kernel, mcoquelin.stm32,
	alexandre.torgue, rmk+kernel, maxime.chevallier, linux-stm32,
	linux-arm-kernel, ningyu, linmin, pinkesh.vaghela, pritesh.patel,
	weishangjuan
In-Reply-To: <20260507-mural-moocher-ad6e07ef8ae0@spud>




> -----Original Message-----
> From: "Conor Dooley" <conor@kernel.org>
> Sent: 2026-05-08 01:24:02 (Friday)
> To: lizhi2@eswincomputing.com
> Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org, netdev@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, mcoquelin.stm32@gmail.com, alexandre.torgue@foss.st.com, rmk+kernel@armlinux.org.uk, maxime.chevallier@bootlin.com, linux-stm32@st-md-mailman.stormreply.com, linux-arm-kernel@lists.infradead.org, ningyu@eswincomputing.com, linmin@eswincomputing.com, pinkesh.vaghela@einfochips.com, pritesh.patel@einfochips.com, weishangjuan@eswincomputing.com
> Subject: Re: [PATCH net v1 1/2] dt-bindings: ethernet: eswin: refine delay model and HSP register description
> 
> On Thu, May 07, 2026 at 04:31:36PM +0800, lizhi2@eswincomputing.com wrote:
> > From: Zhi Li <lizhi2@eswincomputing.com>
> > 
> > Refine the EIC7700 Ethernet dt-binding based on observed hardware behavior
> > and clarify the original delay model for eth0.
> > 
> > The previous binding used an enum-based definition for
> > rx-internal-delay-ps and tx-internal-delay-ps. Replace it with a
> > range-based model using:
> > 
> >   - minimum: 0
> >   - maximum: 2540
> >   - multipleOf: 20
> > 
> > This better reflects the actual hardware implementation, which
> > supports 20ps granularity delay steps in the MAC RGMII interface.
> > 
> > The tx/rx internal delay values are clarified as MAC-side programmable
> > delay components applied on the RGMII clock/data path, representing
> > the effective delay seen at the MAC interface.
> > 
> > This does not change the intended hardware semantics, but aligns the
> > binding with the actual hardware implementation.
> > 
> > These properties are optional and only required when MAC-side fine
> > tuning is needed; otherwise delay alignment is provided by PHY or
> > board design.
> > 
> > Depending on the selected RGMII timing mode, delay alignment may be
> > provided by the PHY (e.g. rgmii-id) or by board/MAC-side configuration.
> > When PHY or board design already provides the required delay, these
> > MAC-side properties may be omitted. When MAC-side fine tuning is
> > required, they should be provided to describe the internal RGMII
> > timing adjustment.
> > 
> > Additionally, extend the description of the HSP subsystem register
> > layout used by the MAC glue logic. This includes explicit TXD and RXD
> > delay control registers to ensure deterministic initialization and
> > to override any residual configuration potentially left by bootloaders.
> > 
> > Add reference to the EIC7700X SoC Technical Reference Manual,
> > Chapter 10 ("High-Speed Interface"), Part 4 for background of the
> > HSP CSR block:
> > https://github.com/eswincomputing/EIC7700X-SoC-Technical-Reference-Manual/releases
> > 
> > There are no in-tree users of this binding, so no ABI impact is
> > expected.
> > 
> > Fixes: 888bd0eca93c ("dt-bindings: ethernet: eswin: Document for EIC7700 SoC")
> > Signed-off-by: Zhi Li <lizhi2@eswincomputing.com>
> > ---
> 
> While this is v1, it's really v8 and there should therefore be a
> changelog that explains where my ack and the new compatible went.
> 

Thanks for the review.

Based on Jakub's feedback on the previous v7 series, I plan to split the
changes into two separate series:

- a smaller fix series intended for net,
- and a separate eth1 feature series intended for net-next.

After the split, the scope and target trees of the two series will differ
from the original combined series, so I plan to restart the revision
numbering from v1 for both series.

The additional compatible string and the eth1-specific DT binding
extensions will be moved into the separate feature series, and I will
reflect this in the v2 cover letter.

The DT binding changes in this fix series v1 are simply extracted from the
previous v7 series as part of the split.

Since the series has been restructured, I will drop the previous
Acked-by tags.

I will also document the reason for doing so and the impact of the split
in the v2 cover letter.

If you think the binding changes are still effectively unchanged and the
previous Acked-by can still apply, I am happy to retain them or re-apply
them as appropriate. Otherwise I will assume a fresh review is preferred.

Please let me know your preference.

Thanks,
Zhi
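
As an illustration of the range-based delay model quoted earlier in this
message (minimum 0, maximum 2540, multipleOf 20), the following is a small
sketch of the check the schema expresses. The names demo_delay_ps_valid()
and demo_delay_ps_to_steps() are hypothetical, and the step-count
conversion is an assumption made for illustration; the actual register
encoding is not described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the quoted commit message. */
#define DEMO_DELAY_MAX_PS	2540u
#define DEMO_DELAY_STEP_PS	20u

/* The maximum must itself be a whole number of steps. */
static_assert(DEMO_DELAY_MAX_PS % DEMO_DELAY_STEP_PS == 0,
	      "maximum delay is not a multiple of the step");

/*
 * True when ps satisfies minimum/maximum/multipleOf. The minimum of 0 is
 * implied by the unsigned type.
 */
static bool demo_delay_ps_valid(uint32_t ps)
{
	return ps <= DEMO_DELAY_MAX_PS && (ps % DEMO_DELAY_STEP_PS) == 0;
}

/*
 * Hypothetical conversion to a step count (assumption, not from the
 * thread). Returns -1 for values the schema would reject.
 */
static int demo_delay_ps_to_steps(uint32_t ps)
{
	if (!demo_delay_ps_valid(ps))
		return -1;
	return (int)(ps / DEMO_DELAY_STEP_PS);
}
```

Under these assumptions the value 200 used in the earlier example maps to
10 steps, while 210 or 2560 would be rejected.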


* Re: Re: [PATCH net v1 1/2] dt-bindings: ethernet: eswin: refine delay model and HSP register description
From: 李志 @ 2026-05-08  5:47 UTC
  To: Andrew Lunn
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, robh, krzk+dt,
	conor+dt, netdev, devicetree, linux-kernel, mcoquelin.stm32,
	alexandre.torgue, rmk+kernel, maxime.chevallier, linux-stm32,
	linux-arm-kernel, ningyu, linmin, pinkesh.vaghela, pritesh.patel,
	weishangjuan
In-Reply-To: <2436c6e9-4aad-4ffd-9fef-0cbbe38dc66d@lunn.ch>




> -----Original Message-----
> From: "Andrew Lunn" <andrew@lunn.ch>
> Sent: 2026-05-07 20:29:10 (Thursday)
> To: lizhi2@eswincomputing.com
> Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org, netdev@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, mcoquelin.stm32@gmail.com, alexandre.torgue@foss.st.com, rmk+kernel@armlinux.org.uk, maxime.chevallier@bootlin.com, linux-stm32@st-md-mailman.stormreply.com, linux-arm-kernel@lists.infradead.org, ningyu@eswincomputing.com, linmin@eswincomputing.com, pinkesh.vaghela@einfochips.com, pritesh.patel@einfochips.com, weishangjuan@eswincomputing.com
> Subject: Re: [PATCH net v1 1/2] dt-bindings: ethernet: eswin: refine delay model and HSP register description
> 
> >      ethernet@50400000 {
> >          compatible = "eswin,eic7700-qos-eth", "snps,dwmac-5.20";
> >          reg = <0x50400000 0x10000>;
> > -        clocks = <&d0_clock 186>, <&d0_clock 171>, <&d0_clock 40>,
> > -                <&d0_clock 193>;
> > -        clock-names = "axi", "cfg", "stmmaceth", "tx";
> >          interrupt-parent = <&plic>;
> >          interrupts = <61>;
> >          interrupt-names = "macirq";
> > -        phy-mode = "rgmii-id";
> > -        phy-handle = <&phy0>;
> > +        clocks = <&d0_clock 186>, <&d0_clock 171>, <&d0_clock 40>,
> > +                <&d0_clock 193>;
> > +        clock-names = "axi", "cfg", "stmmaceth", "tx";
> 
> Please don't move the clocks around, since they have nothing to do
> with RGMII delays.
> 
> 
> >          resets = <&reset 95>;
> >          reset-names = "stmmaceth";
> > -        rx-internal-delay-ps = <200>;
> > -        tx-internal-delay-ps = <200>;
> > -        eswin,hsp-sp-csr = <&hsp_sp_csr 0x100 0x108 0x118>;
> > -        snps,axi-config = <&stmmac_axi_setup>;
> > +        eswin,hsp-sp-csr = <&hsp_sp_csr 0x100 0x108 0x118 0x114 0x11c>;
> > +        phy-handle = <&phy0>;
> > +        phy-mode = "rgmii-id";
> >          snps,aal;
> >          snps,fixed-burst;
> >          snps,tso;
> > -        stmmac_axi_setup: stmmac-axi-config {
> > +        snps,axi-config = <&stmmac_axi_setup_gmac0>;
> > +
> > +        stmmac_axi_setup_gmac0: stmmac-axi-config {
> 
> And what do these changes have to do with RGMII delays?
> 

You're right, those unrelated example changes should not be mixed into the
fix-related binding update.

I will limit the binding changes to only what is required for the fixes,
such as the additional HSP CSR offsets needed for explicit TXD/RXD delay
register initialization, and drop the unrelated DTS example reordering or
cleanup changes from this series.


* Re: [PATCH net v2] net: mana: Optimize irq affinity for low vcpu configs
From: Shradha Gupta @ 2026-05-08  5:51 UTC
  To: Yury Norov
  Cc: Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Konstantin Taranov, Simon Horman, Erni Sri Satya Vennela,
	Dipayaan Roy, Shiraz Saleem, Michael Kelley, Long Li, Yury Norov,
	linux-hyperv, linux-kernel, netdev, Paul Rosswurm, Shradha Gupta,
	Saurabh Singh Sengar, stable
In-Reply-To: <afoQHm28qj8JnKww@yury>

On Tue, May 05, 2026 at 11:43:26AM -0400, Yury Norov wrote:
> On Mon, May 04, 2026 at 11:15:03PM -0700, Shradha Gupta wrote:
> > On Sat, May 02, 2026 at 01:15:36PM -0400, Yury Norov wrote:
> > > On Sat, May 02, 2026 at 07:37:43AM -0700, Shradha Gupta wrote:
> > > > On Fri, May 01, 2026 at 12:22:20PM -0400, Yury Norov wrote:
> > > > > On Wed, Apr 29, 2026 at 02:06:37AM -0700, Shradha Gupta wrote:
> > > > > > > In the mana driver, the number of IRQs allocated is capped by
> > > > > > > min(num_cpu + 1, queue count). In cases where the IRQ count is greater
> > > > > > > than the vCPU count, we want to utilize all the vCPUs, irrespective of
> > > > > > > their NUMA/core bindings.
> > > > > > > 
> > > > > > > This is important especially in environments where the number of vCPUs
> > > > > > > is so small that the softIRQ handling overhead of two IRQs on the same
> > > > > > > vCPU is much higher than if they were spread across sibling vCPUs.
> > > > > > 
> > > > > > This behaviour is more evident with dynamic IRQ allocation. Since MANA
> > > > > > IRQs are assigned at a later stage compared to static allocation, other
> > > > > > device IRQs may already be affinitized to the vCPUs. As a result, IRQ
> > > > > > weights become imbalanced, causing multiple MANA IRQs to land on the
> > > > > > same vCPU, while some vCPUs have none.
> > > > > > 
> > > > > > In such cases when many parallel TCP connections are tested, the
> > > > > > throughput drops significantly.
> > > > > > 
> > > > > > Test envs:
> > > > > > =======================================================
> > > > > > Case 1: without this patch
> > > > > > =======================================================
> > > > > > 4 vcpu(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)
> > > > > > 
> > > > > > 	TYPE		effective vCPU aff
> > > > > > =======================================================
> > > > > > IRQ0:	HWC		0
> > > > > > IRQ1:	mana_q1		0
> > > > > > IRQ2:	mana_q2		2
> > > > > > IRQ3:	mana_q3		0
> > > > > > IRQ4:	mana_q4		3
> > > > > > 
> > > > > > %soft on each vCPU(mpstat -P ALL 1) on receiver
> > > > > > vCPU		0	1	2	3
> > > > > > =======================================================
> > > > > > pass 1:		38.85	0.03	24.89	24.65
> > > > > > pass 2:		39.15	0.03	24.57	25.28
> > > > > > pass 3:		40.36	0.03	23.20	23.17
> > > > > > 
> > > > > > =======================================================
> > > > > > Case 2: with this patch
> > > > > > =======================================================
> > > > > > 4 vcpu(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)
> > > > > > 
> > > > > >         TYPE            effective vCPU aff
> > > > > > =======================================================
> > > > > > IRQ0:   HWC             0
> > > > > > IRQ1:   mana_q1         0
> > > > > > IRQ2:   mana_q2         1
> > > > > > IRQ3:   mana_q3         2
> > > > > > IRQ4:   mana_q4         3
> > > > > > 
> > > > > > %soft on each vCPU(mpstat -P ALL 1) on receiver
> > > > > > vCPU            0       1       2       3
> > > > > > =======================================================
> > > > > > pass 1:         15.42	15.85	14.99	14.51
> > > > > > pass 2:         15.53	15.94	15.81	15.93
> > > > > > pass 3:         16.41	16.35	16.40	16.36
> > > > > > 
> > > > > > =======================================================
> > > > > > Throughput Impact(in Gbps, same env)
> > > > > > =======================================================
> > > > > > TCP conn	with patch	w/o patch
> > > > > > 20480		15.65		7.73
> > > > > > 10240		15.63		8.93
> > > > > > 8192		15.64		9.69
> > > > > > 6144		15.64		13.16
> > > > > > 4096		15.69		15.75
> > > > > > 2048		15.69		15.83
> > > > > > 1024		15.71		15.28
> > > > > > 
> > > > > > Fixes: 755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Co-developed-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> > > > > > Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> > > > > > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> > > > > > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> > > > > > ---
> > > > > > Changes in v2
> > > > > >  * Removed the unused skip_first_cpu variable
> > > > > >  * fixed exit condition in irq_setup_linear() with len == 0
> > > > > >  * changed return type of irq_setup_linear() as it will always be 0
> > > > > >  * removed the unnecessary rcu_read_lock() in irq_setup_linear()
> > > > > >  * added appropriate comments to indicate expected behaviour when
> > > > > >    IRQs are more than or equal to num_online_cpus()
> > > > > > ---
> > > > > >  .../net/ethernet/microsoft/mana/gdma_main.c   | 47 ++++++++++++++++---
> > > > > >  1 file changed, 40 insertions(+), 7 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > > > index 098fbda0d128..d740d1dc43da 100644
> > > > > > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > > > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > > > @@ -167,6 +167,8 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> > > > > >  	} else {
> > > > > >  		/* If dynamic allocation is enabled we have already allocated
> > > > > >  		 * hwc msi
> > > > > > +		 * Also, we make sure in this case the following is always true
> > > > > > +		 * (num_msix_usable - 1 HWC) <= num_online_cpus()
> > > > > >  		 */
> > > > > >  		gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1);
> > > > > >  	}
> > > > > > @@ -1672,11 +1674,24 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node,
> > > > > >  	return 0;
> > > > > >  }
> > > > > >  
> > > > > > +/* should be called with cpus_read_lock() held */
> > > > > > +static void irq_setup_linear(unsigned int *irqs, unsigned int len)
> > > > > > +{
> > > > > > +	int cpu;
> > > > > > +
> > > > > > +	for_each_online_cpu(cpu) {
> > > > > > +		if (len == 0)
> > > > > > +			break;
> > > > > > +
> > > > > > +		irq_set_affinity_and_hint(*irqs++, cpumask_of(cpu));
> > > > > > +		len--;
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > >  static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
> > > > > >  {
> > > > > >  	struct gdma_context *gc = pci_get_drvdata(pdev);
> > > > > >  	struct gdma_irq_context *gic;
> > > > > > -	bool skip_first_cpu = false;
> > > > > >  	int *irqs, irq, err, i;
> > > > > >  
> > > > > >  	irqs = kmalloc_objs(int, nvec);
> > > > > 
> > > > > So what about WARN_ON() and nvec adjustment before kmalloc?
> > > > Hey Yury,
> > > > 
> > > > I am still a bit unsure about the WARN_ON() before kmalloc: even after
> > > > that check, num_online_cpus() can still change (or shrink) in the same
> > > > function until we take cpus_read_lock(). That's why I introduced the
> > > > dev_dbg() to capture the hot-remove edge case.
> > > 
> > > OK.
> > >  
> > > > Do you still think it adds more value?
> > > 
> > > It's your driver, so you know better. I just wonder because you said
> > > it's good to add WARN_ON(), and then didn't do that.
> > > 
> > > > > 
> > > > > > @@ -1722,13 +1737,31 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
> > > > > >  	 * first CPU sibling group since they are already affinitized to HWC IRQ
> > > > > >  	 */
> > > > > >  	cpus_read_lock();
> > > > > > -	if (gc->num_msix_usable <= num_online_cpus())
> > > > > > -		skip_first_cpu = true;
> > > > > > +	if (gc->num_msix_usable <= num_online_cpus()) {
> > > > > > +		err = irq_setup(irqs, nvec, gc->numa_node, true);
> > > > > > +		if (err) {
> > > > > > +			cpus_read_unlock();
> > > > > > +			goto free_irq;
> > > > > 
> > > > > One thing puzzles me: if you skip first CPU with this 'true', and the
> > > > > gc->num_msix_usable == num_online_cpus(), it's one more than you can
> > > > > distribute. What do I miss?
> > > > > 
> > > > 
> > > > Let me explain this case a bit better then,
> > > > 
> > > > - num_msix_usable = HWC IRQ + Queue IRQ
> > > > - nvec in this functions is only Queue IRQ (HWC already setup)
> > > > 
> > > > When num_online_cpus == num_msix_usable:
> > > > - nvec = num_online_cpus - 1
> > > > - first CPU is already assigned to HWC IRQ, so skip it
> > > > - Queue IRQs fit in the remaining CPUs
> > > > 
> > > > please let me know if I did not get your question right
> > > 
> > > Can you put that in a comment?
> > 
> > Sure I will. thanks
> > 
> > > 
> > > > > > +		}
> > > > > > +	} else {
> > > > > > +		/*
> > > > > > +		 * When num_msix_usable are more than num_online_cpus, we try to
> > > > > > +		 * make sure we are using all vcpus. In such a case NUMA or
> > > > > > +		 * CPU core affinity does not matter.
> > > > > 
> > > > > If it doesn't matter, why don't you assign each IRQ to all CPUs then?
> > > > > In theory, the system would have most of flexibility to balance them.
> > > > > 
> > > > 
> > > > Okay, let me fix the comment and elaborate on this. It doesn't matter
> > > > because in such a case we want to anyway exhaust and distribute the
> > > > Queue IRQs to all vCPUs.
> > > > We don't want to rely on the system's balancer in this case as it could
> > > > be skewed by other devices' IRQ weights
> > > 
> > > I don't understand this. If I want to reserve some CPUs to solely
> > > handle IRQs from my high-priority hardware, then I configure my system
> > > accordingly. For example, assign all non-networking IRQs on CPU0, and
> > > all networking IRQs to all CPUs.
> > > 
> > > In your case, you distribute IRQs evenly, which means you've no
> > > preferred CPUs. So, assuming the system is only running your IRQ
> > > driver, it's at most as good as all-CPU distribution. If some
> > > particular CPU is heavily loaded, your scheme could cause the
> > > corresponding IRQs to starve.
> > > 
> > > I recall, when we were working on irq_setup(), the original idea was to
> > > distribute IRQs one-to-one, but then I suggested the
> > > 
> > >         irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
> > > 
> > > and after experiments, you agreed on that.
> > > 
> > > Can you please run your throughput test for my suggested distribution
> > > too? Would be also nice to see how each distribution works when some
> > > CPUs are under stress.
> > > 
> > > Thanks,
> > > Yury
> > 
> > The design of irq_setup() works exactly how we want it for our IRQs for
> > almost all of our usecases, so we want to keep that as is. The only
> > scenarios where this is an issue in terms of significant throughput drop
> > is when we are working with low vCPU VMs (vCPU <= 4 with high TCP
> > connection counts) and where there are additional NVMe devices attached
> > to the VM.
> > 
> > The current patch about utilizing all the vCPUs helps in that case and
> > doesn't cause any regression for other cases.
> > 
> > This linear path is only taken when num_msix_usable > num_online_cpus(),
> > which is limited to low-vCPU VMs. Larger VMs continue using irq_setup()
> > as before.
> > 
> > We can definitely get our throughput run results on other suggestions
> > you have. And about that, I just needed a bit more clarity on what to
> > test against. Are you suggesting, with irq_setup() intact and in use, we
> > configure the non-mana IRQs to say CPU0 and capture the numbers?
> 
> Can you try this:
> 
>        while(len--)
>                // Or cpu_online_mask or cpu_all_mask?
>                irq_set_affinity_and_hint(*irqs++, NULL);
> 
> And compare it to the linear version under your vCPU scenario?
> 
> Can you run your throughput test alone and in parallel with some
> IRQ torture test?
> 
>         stress-ng --timer 4 --timeout 60s
> 
> And maybe pin the stress test to the default CPU. Assuming it's 0:
> 
>         taskset -c 0 stress-ng --timer 4 --timeout 60s
> 
> Unless the 'linear' version is significantly faster, I'd stick to the
> above.
> 
> Thanks,
> Yury

Hey Yury,

We tried a few tests with your suggestion, and throughput seems to be
the same compared to the linear distribution approach. We stressed out
CPU0 in both the cases and the results were similar. No IRQ migration
was observed in either case and no throughput drop.
 
One observation, though: "irq_set_affinity_and_hint(*irqs++, NULL);" is
essentially a no-op, so we end up relying on the initial placement from
pci_alloc_irq_vectors(). We were not able to reproduce it in these tests,
but with this distribution there is a chance that the mana queue IRQs get
clustered while other vCPUs carry no network load, because the placement
depends on system-wide IRQ state at allocation time.
 
The linear approach, however, guarantees that each queue IRQ lands on a
distinct vCPU regardless of system state. Even after stressing the CPUs
using stress-ng, we did not observe any significant throughput drop.
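
To make the linear placement concrete, here is a small userspace model of
what irq_setup_linear() does in the quoted patch, together with the
num_msix_usable cap from mana_gd_query_max_resources(). It is only a
sketch: demo_num_msix_usable() and demo_assign_linear() are hypothetical
names, a plain array of CPU ids replaces for_each_online_cpu(), and
recording the chosen CPU replaces the irq_set_affinity_and_hint() call.

```c
#include <assert.h>
#include <stddef.h>

/* num_msix_usable as in the quoted hunk: min(max_msix, online_cpus + 1). */
static unsigned int demo_num_msix_usable(unsigned int max_msix,
					 unsigned int online_cpus)
{
	unsigned int cap = online_cpus + 1;	/* one extra vector for the HWC */

	return max_msix < cap ? max_msix : cap;
}

/*
 * Model of the linear walk: queue IRQ i goes to the i-th online CPU, and
 * the walk stops when either the CPUs or the IRQs run out. cpu_of_irq[i]
 * receives the chosen CPU; the return value is the number of IRQs placed.
 */
static size_t demo_assign_linear(const int *online_cpus, size_t ncpus,
				 int *cpu_of_irq, size_t nirqs)
{
	size_t i;

	assert(online_cpus != NULL && cpu_of_irq != NULL);
	for (i = 0; i < ncpus && i < nirqs; i++)
		cpu_of_irq[i] = online_cpus[i];
	return i;
}
```

With 4 online vCPUs and a large max_msix, the cap gives 5 usable vectors
(1 HWC + 4 queue IRQs), and the 4 queue IRQs map to vCPUs 0, 1, 2 and 3,
which matches the "Case 2" table in the commit message.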


regards,
Shradha.


* [PATCH v2 11/22] dt-bindings: clock: Add StarFive JHB100 Peripheral-0 clock and reset generator
From: Changhuang Liang @ 2026-05-08  5:36 UTC (permalink / raw)
  To: Michael Turquette, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Stephen Boyd, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Philipp Zabel, Emil Renner Berthing, Kees Cook,
	Gustavo A . R . Silva, Richard Cochran
  Cc: linux-clk, linux-kernel, devicetree, linux-riscv, linux-hardening,
	netdev, Sia Jee Heng, Hal Feng, Ley Foon Tan, Changhuang Liang
In-Reply-To: <20260508053632.818548-1-changhuang.liang@starfivetech.com>

Add bindings for the Peripheral-0 clock and reset generator (PER0CRG)
on the JHB100 RISC-V SoC by StarFive Ltd.

Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
---
 .../clock/starfive,jhb100-per0crg.yaml        |  70 +++++
 .../dt-bindings/clock/starfive,jhb100-crg.h   | 281 ++++++++++++++++++
 .../dt-bindings/reset/starfive,jhb100-crg.h   |  77 +++++
 3 files changed, 428 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/starfive,jhb100-per0crg.yaml

diff --git a/Documentation/devicetree/bindings/clock/starfive,jhb100-per0crg.yaml b/Documentation/devicetree/bindings/clock/starfive,jhb100-per0crg.yaml
new file mode 100644
index 000000000000..d3d426a741cc
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/starfive,jhb100-per0crg.yaml
@@ -0,0 +1,70 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/clock/starfive,jhb100-per0crg.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: StarFive JHB100 Peripheral-0 Clock and Reset Generator
+
+maintainers:
+  - Changhuang Liang <changhuang.liang@starfivetech.com>
+
+properties:
+  compatible:
+    const: starfive,jhb100-per0crg
+
+  reg:
+    maxItems: 1
+
+  clocks:
+    items:
+      - description: Main Oscillator (25 MHz)
+      - description: PLL6
+      - description: Configure 400MHz
+      - description: Configure 800MHz
+      - description: Non Coherent NOC Initiator
+      - description: Non Coherent NOC Target
+
+  clock-names:
+    items:
+      - const: osc
+      - const: pll6
+      - const: cfg_400
+      - const: cfg_800
+      - const: ncnoc_init
+      - const: ncnoc_targ
+
+  '#clock-cells':
+    const: 1
+    description:
+      See <dt-bindings/clock/starfive,jhb100-crg.h> for valid indices.
+
+  '#reset-cells':
+    const: 1
+    description:
+      See <dt-bindings/reset/starfive,jhb100-crg.h> for valid indices.
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - clock-names
+  - '#clock-cells'
+  - '#reset-cells'
+
+additionalProperties: false
+
+examples:
+  - |
+    clock-controller@11a08000 {
+      compatible = "starfive,jhb100-per0crg";
+      reg = <0x11a08000 0x1000>;
+      clocks = <&osc>, <&pll6>, <&sys0crg 71>,
+               <&sys0crg 72>, <&sys0crg 70>,
+               <&sys2crg 23>;
+      clock-names = "osc", "pll6", "cfg_400",
+                    "cfg_800", "ncnoc_init",
+                    "ncnoc_targ";
+      #clock-cells = <1>;
+      #reset-cells = <1>;
+    };
diff --git a/include/dt-bindings/clock/starfive,jhb100-crg.h b/include/dt-bindings/clock/starfive,jhb100-crg.h
index d19618e2a846..add2cd093dbd 100644
--- a/include/dt-bindings/clock/starfive,jhb100-crg.h
+++ b/include/dt-bindings/clock/starfive,jhb100-crg.h
@@ -106,4 +106,285 @@
 #define JHB100_SYS2CLK_MAIN_ICG_EN_JTAG0		32
 #define JHB100_SYS2CLK_MAIN_ICG_EN_JTAG1		33
 
+/* PER0CRG clocks */
+#define JHB100_PER0CLK_CDR_I3C0				0
+#define JHB100_PER0CLK_CDR_I3C1				1
+#define JHB100_PER0CLK_CDR_I3C2				2
+#define JHB100_PER0CLK_CDR_I3C3				3
+#define JHB100_PER0CLK_CDR_I3C4				4
+#define JHB100_PER0CLK_CDR_I3C5				5
+#define JHB100_PER0CLK_CDR_I3C6				6
+#define JHB100_PER0CLK_CDR_I3C7				7
+#define JHB100_PER0CLK_CDR_I3C8				8
+#define JHB100_PER0CLK_CDR_I3C9				9
+#define JHB100_PER0CLK_CDR_I3C10			10
+#define JHB100_PER0CLK_CDR_I3C11			11
+#define JHB100_PER0CLK_CDR_I3C12			12
+#define JHB100_PER0CLK_CDR_I3C13			13
+#define JHB100_PER0CLK_CDR_I3C14			14
+#define JHB100_PER0CLK_CDR_I3C15			15
+#define JHB100_PER0CLK_200				16
+#define JHB100_PER0CLK_600_DIV6				17
+#define JHB100_PER0CLK_600_DIV6_DIV5			18
+#define JHB100_PER0CLK_TIMER0_DUALTIMER0		19
+#define JHB100_PER0CLK_TIMER1_DUALTIMER0		20
+#define JHB100_PER0CLK_TIMER0_DUALTIMER1		21
+#define JHB100_PER0CLK_TIMER1_DUALTIMER1		22
+#define JHB100_PER0CLK_TIMER0_DUALTIMER2		23
+#define JHB100_PER0CLK_TIMER1_DUALTIMER2		24
+#define JHB100_PER0CLK_1200_PH0_LVDS0			25
+#define JHB100_PER0CLK_1200_PH0_LVDS1			26
+#define JHB100_PER0CLK_1200_CORE0			27
+#define JHB100_PER0CLK_1200_CORE1			28
+#define JHB100_PER0CLK_1200_SHIFT90_LVDS0		29
+#define JHB100_PER0CLK_1200_SHIFT90_LVDS1		30
+#define JHB100_PER0CLK_1200_DIV5_CORE0			31
+#define JHB100_PER0CLK_1200_DIV5_CORE1			32
+#define JHB100_PER0CLK_PH0_LTPI0			33
+
+#define JHB100_PER0CLK_PH0_LTPI1			35
+
+#define JHB100_PER0CLK_PH90_LTPI0			37
+
+#define JHB100_PER0CLK_PH90_LTPI1			39
+
+#define JHB100_PER0CLK_240_CORE_LTPI0			41
+
+#define JHB100_PER0CLK_240_CORE_LTPI1			43
+
+#define JHB100_PER0CLK_AXI_DMA_I2C_INIT			45
+#define JHB100_PER0CLK_AXI_DMA_I3C_INIT			46
+#define JHB100_PER0CLK_AXI_DMA_UART_INIT		47
+#define JHB100_PER0CLK_CORE_DMAC0			48
+#define JHB100_PER0CLK_CORE_DMAC1			49
+#define JHB100_PER0CLK_CORE_DMAC2			50
+
+#define JHB100_PER0CLK_HDR_TX_I3C0			78
+#define JHB100_PER0CLK_HDR_TX_I3C1			79
+#define JHB100_PER0CLK_HDR_TX_I3C2			80
+#define JHB100_PER0CLK_HDR_TX_I3C3			81
+#define JHB100_PER0CLK_HDR_TX_I3C4			82
+#define JHB100_PER0CLK_HDR_TX_I3C5			83
+#define JHB100_PER0CLK_HDR_TX_I3C6			84
+#define JHB100_PER0CLK_HDR_TX_I3C7			85
+#define JHB100_PER0CLK_HDR_TX_I3C8			86
+#define JHB100_PER0CLK_HDR_TX_I3C9			87
+#define JHB100_PER0CLK_HDR_TX_I3C10			88
+#define JHB100_PER0CLK_HDR_TX_I3C11			89
+#define JHB100_PER0CLK_HDR_TX_I3C12			90
+#define JHB100_PER0CLK_HDR_TX_I3C13			91
+#define JHB100_PER0CLK_HDR_TX_I3C14			92
+#define JHB100_PER0CLK_HDR_TX_I3C15			93
+#define JHB100_PER0CLK_CORE_I2C0			94
+#define JHB100_PER0CLK_CORE_I2C1			95
+#define JHB100_PER0CLK_CORE_I2C2			96
+#define JHB100_PER0CLK_CORE_I2C3			97
+#define JHB100_PER0CLK_CORE_I2C4			98
+#define JHB100_PER0CLK_CORE_I2C5			99
+#define JHB100_PER0CLK_CORE_I2C6			100
+#define JHB100_PER0CLK_CORE_I2C7			101
+#define JHB100_PER0CLK_CORE_I2C8			102
+#define JHB100_PER0CLK_CORE_I2C9			103
+#define JHB100_PER0CLK_CORE_I2C10			104
+#define JHB100_PER0CLK_CORE_I2C11			105
+#define JHB100_PER0CLK_CORE_I2C12			106
+#define JHB100_PER0CLK_CORE_I2C13			107
+#define JHB100_PER0CLK_CORE_I2C14			108
+#define JHB100_PER0CLK_CORE_I2C15			109
+
+#define JHB100_PER0CLK_WDOGCLK_WDT0			126
+#define JHB100_PER0CLK_WDOGCLK_WDT1			127
+#define JHB100_PER0CLK_WDOGCLK_WDT2			128
+#define JHB100_PER0CLK_WDOGCLK_WDT3			129
+#define JHB100_PER0CLK_WDOGCLK_WDT_EXTERNAL		130
+#define JHB100_PER0CLK_SCLK_UART4			131
+#define JHB100_PER0CLK_SCLK_UART5			132
+#define JHB100_PER0CLK_SCLK_UART6			133
+#define JHB100_PER0CLK_SCLK_UART7			134
+#define JHB100_PER0CLK_SCLK_UART8			135
+#define JHB100_PER0CLK_SCLK_UART9			136
+#define JHB100_PER0CLK_SCLK_UART10			137
+#define JHB100_PER0CLK_SCLK_UART11			138
+#define JHB100_PER0CLK_SCLK_UART12			139
+#define JHB100_PER0CLK_SCLK_UART13			140
+#define JHB100_PER0CLK_SCLK_UART14			141
+
+#define JHB100_PER0CLK_PCLK_DMA_UART_CFG		148
+#define JHB100_PER0CLK_PCLK_DMA_I2C_CFG			149
+#define JHB100_PER0CLK_PCLK_DMA_I3C_CFG			150
+#define JHB100_PER0CLK_PCLK_DUALTIMER0			151
+#define JHB100_PER0CLK_PCLK_DUALTIMER1			152
+#define JHB100_PER0CLK_PCLK_DUALTIMER2			153
+
+#define JHB100_PER0CLK_HCLK_TRNG			156
+#define JHB100_PER0CLK_APB_I2C0				157
+#define JHB100_PER0CLK_APB_I2C1				158
+#define JHB100_PER0CLK_APB_I2C2				159
+#define JHB100_PER0CLK_APB_I2C3				160
+#define JHB100_PER0CLK_APB_I2C4				161
+#define JHB100_PER0CLK_APB_I2C5				162
+#define JHB100_PER0CLK_APB_I2C6				163
+#define JHB100_PER0CLK_APB_I2C7				164
+#define JHB100_PER0CLK_APB_I2C8				165
+#define JHB100_PER0CLK_APB_I2C9				166
+#define JHB100_PER0CLK_APB_I2C10			167
+#define JHB100_PER0CLK_APB_I2C11			168
+#define JHB100_PER0CLK_APB_I2C12			169
+#define JHB100_PER0CLK_APB_I2C13			170
+#define JHB100_PER0CLK_APB_I2C14			171
+#define JHB100_PER0CLK_APB_I2C15			172
+#define JHB100_PER0CLK_APB_I2CF0			173
+#define JHB100_PER0CLK_APB_I2CF1			174
+#define JHB100_PER0CLK_APB_I2CF2			175
+#define JHB100_PER0CLK_APB_I2CF3			176
+#define JHB100_PER0CLK_APB_I2CF4			177
+#define JHB100_PER0CLK_APB_I2CF5			178
+#define JHB100_PER0CLK_APB_I2CF6			179
+#define JHB100_PER0CLK_APB_I2CF7			180
+#define JHB100_PER0CLK_APB_I2CF8			181
+#define JHB100_PER0CLK_APB_I2CF9			182
+#define JHB100_PER0CLK_APB_I2CF10			183
+#define JHB100_PER0CLK_APB_I2CF11			184
+#define JHB100_PER0CLK_APB_I2CF12			185
+#define JHB100_PER0CLK_APB_I2CF13			186
+#define JHB100_PER0CLK_APB_I2CF14			187
+#define JHB100_PER0CLK_APB_I2CF15			188
+#define JHB100_PER0CLK_APB_I3C0				189
+#define JHB100_PER0CLK_APB_I3C1				190
+#define JHB100_PER0CLK_APB_I3C2				191
+#define JHB100_PER0CLK_APB_I3C3				192
+#define JHB100_PER0CLK_APB_I3C4				193
+#define JHB100_PER0CLK_APB_I3C5				194
+#define JHB100_PER0CLK_APB_I3C6				195
+#define JHB100_PER0CLK_APB_I3C7				196
+#define JHB100_PER0CLK_APB_I3C8				197
+#define JHB100_PER0CLK_APB_I3C9				198
+#define JHB100_PER0CLK_APB_I3C10			199
+#define JHB100_PER0CLK_APB_I3C11			200
+#define JHB100_PER0CLK_APB_I3C12			201
+#define JHB100_PER0CLK_APB_I3C13			202
+#define JHB100_PER0CLK_APB_I3C14			203
+#define JHB100_PER0CLK_APB_I3C15			204
+#define JHB100_PER0CLK_APB_UART0			205
+#define JHB100_PER0CLK_APB_UART1			206
+#define JHB100_PER0CLK_APB_UART2			207
+#define JHB100_PER0CLK_APB_UART3			208
+#define JHB100_PER0CLK_APB_UART4			209
+#define JHB100_PER0CLK_APB_UART5			210
+#define JHB100_PER0CLK_APB_UART6			211
+#define JHB100_PER0CLK_APB_UART7			212
+#define JHB100_PER0CLK_APB_UART8			213
+#define JHB100_PER0CLK_APB_UART9			214
+#define JHB100_PER0CLK_APB_UART10			215
+#define JHB100_PER0CLK_APB_UART11			216
+#define JHB100_PER0CLK_APB_UART12			217
+#define JHB100_PER0CLK_APB_UART13			218
+#define JHB100_PER0CLK_APB_UART14			219
+#define JHB100_PER0CLK_DMA_I3C0				220
+#define JHB100_PER0CLK_DMA_I3C1				221
+#define JHB100_PER0CLK_DMA_I3C2				222
+#define JHB100_PER0CLK_DMA_I3C3				223
+#define JHB100_PER0CLK_DMA_I3C4				224
+#define JHB100_PER0CLK_DMA_I3C5				225
+#define JHB100_PER0CLK_DMA_I3C6				226
+#define JHB100_PER0CLK_DMA_I3C7				227
+#define JHB100_PER0CLK_DMA_I3C8				228
+#define JHB100_PER0CLK_DMA_I3C9				229
+#define JHB100_PER0CLK_DMA_I3C10			230
+#define JHB100_PER0CLK_DMA_I3C11			231
+#define JHB100_PER0CLK_DMA_I3C12			232
+#define JHB100_PER0CLK_DMA_I3C13			233
+#define JHB100_PER0CLK_DMA_I3C14			234
+#define JHB100_PER0CLK_DMA_I3C15			235
+#define JHB100_PER0CLK_CORE_I3C0			236
+#define JHB100_PER0CLK_CORE_I3C1			237
+#define JHB100_PER0CLK_CORE_I3C2			238
+#define JHB100_PER0CLK_CORE_I3C3			239
+#define JHB100_PER0CLK_CORE_I3C4			240
+#define JHB100_PER0CLK_CORE_I3C5			241
+#define JHB100_PER0CLK_CORE_I3C6			242
+#define JHB100_PER0CLK_CORE_I3C7			243
+#define JHB100_PER0CLK_CORE_I3C8			244
+#define JHB100_PER0CLK_CORE_I3C9			245
+#define JHB100_PER0CLK_CORE_I3C10			246
+#define JHB100_PER0CLK_CORE_I3C11			247
+#define JHB100_PER0CLK_CORE_I3C12			248
+#define JHB100_PER0CLK_CORE_I3C13			249
+#define JHB100_PER0CLK_CORE_I3C14			250
+#define JHB100_PER0CLK_CORE_I3C15			251
+#define JHB100_PER0CLK_DMAC_AXI_PERIPH0_HS_CLK_I2C	252
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C0			253
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C1			254
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C2			255
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C3			256
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C4			257
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C5			258
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C6			259
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C7			260
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C8			261
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C9			262
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C10		263
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C11		264
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C12		265
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C13		266
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C14		267
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C15		268
+#define JHB100_PER0CLK_MAIN_ICG_EN_DUALTIMER0		269
+#define JHB100_PER0CLK_MAIN_ICG_EN_DUALTIMER1		270
+#define JHB100_PER0CLK_MAIN_ICG_EN_DUALTIMER2		271
+#define JHB100_PER0CLK_MAIN_ICG_EN_LTPI0		272
+#define JHB100_PER0CLK_MAIN_ICG_EN_LTPI1		273
+#define JHB100_PER0CLK_MAIN_ICG_EN_DMAC_I2C		274
+#define JHB100_PER0CLK_MAIN_ICG_EN_DMAC_I3C		275
+#define JHB100_PER0CLK_MAIN_ICG_EN_DMAC_UART		276
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL4			277
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL5			278
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL6			279
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL7			280
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL8			281
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL9			282
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL10		283
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL11		284
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL12		285
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL13		286
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL14		287
+
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C0			304
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C1			305
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C2			306
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C3			307
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C4			308
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C5			309
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C6			310
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C7			311
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C8			312
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C9			313
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C10		314
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C11		315
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C12		316
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C13		317
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C14		318
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C15		319
+#define JHB100_PER0CLK_MAIN_ICG_EN_WDT0			320
+#define JHB100_PER0CLK_MAIN_ICG_EN_WDT1			321
+#define JHB100_PER0CLK_MAIN_ICG_EN_WDT2			322
+#define JHB100_PER0CLK_MAIN_ICG_EN_WDT3			323
+#define JHB100_PER0CLK_MAIN_ICG_EN_WDT_EXTERNAL		324
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART4		325
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART5		326
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART6		327
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART7		328
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART8		329
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART9		330
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART10		331
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART11		332
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART12		333
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART13		334
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART14		335
+#define JHB100_PER0CLK_MAIN_ICG_EN_LDO0			336
+#define JHB100_PER0CLK_MAIN_ICG_EN_LDO1			337
+#define JHB100_PER0CLK_MAIN_ICG_EN_SENSORS_PERIPH0	338
+#define JHB100_PER0CLK_MAIN_ICG_EN_SENSORS_DMAC		339
+#define JHB100_PER0CLK_MAIN_ICG_EN_TRNG			340
+
 #endif /* __DT_BINDINGS_CLOCK_STARFIVE_JHB100_H__ */
diff --git a/include/dt-bindings/reset/starfive,jhb100-crg.h b/include/dt-bindings/reset/starfive,jhb100-crg.h
index fbc55f95e76c..ccfb7616e1a7 100644
--- a/include/dt-bindings/reset/starfive,jhb100-crg.h
+++ b/include/dt-bindings/reset/starfive,jhb100-crg.h
@@ -61,4 +61,81 @@
 #define JHB100_SYS2RST_GPU1_RSTN_BUS					21
 #define JHB100_SYS2RST_GPU1_HOST_PCIE_RST_N				22
 
+/* PER0CRG resets */
+#define JHB100_PER0RST_MAIN_RSTN_UART4					0
+#define JHB100_PER0RST_MAIN_RSTN_UART5					1
+#define JHB100_PER0RST_MAIN_RSTN_UART6					2
+#define JHB100_PER0RST_MAIN_RSTN_UART7					3
+#define JHB100_PER0RST_MAIN_RSTN_UART8					4
+#define JHB100_PER0RST_MAIN_RSTN_UART9					5
+#define JHB100_PER0RST_MAIN_RSTN_UART10					6
+#define JHB100_PER0RST_MAIN_RSTN_UART11					7
+#define JHB100_PER0RST_MAIN_RSTN_UART12					8
+#define JHB100_PER0RST_MAIN_RSTN_UART13					9
+#define JHB100_PER0RST_MAIN_RSTN_UART14					10
+#define JHB100_PER0RST_MAIN_RSTN_I2C0					11
+#define JHB100_PER0RST_MAIN_RSTN_I2C1					12
+#define JHB100_PER0RST_MAIN_RSTN_I2C2					13
+#define JHB100_PER0RST_MAIN_RSTN_I2C3					14
+#define JHB100_PER0RST_MAIN_RSTN_I2C4					15
+#define JHB100_PER0RST_MAIN_RSTN_I2C5					16
+#define JHB100_PER0RST_MAIN_RSTN_I2C6					17
+#define JHB100_PER0RST_MAIN_RSTN_I2C7					18
+#define JHB100_PER0RST_MAIN_RSTN_I2C8					19
+#define JHB100_PER0RST_MAIN_RSTN_I2C9					20
+#define JHB100_PER0RST_MAIN_RSTN_I2C10					21
+#define JHB100_PER0RST_MAIN_RSTN_I2C11					22
+#define JHB100_PER0RST_MAIN_RSTN_I2C12					23
+#define JHB100_PER0RST_MAIN_RSTN_I2C13					24
+#define JHB100_PER0RST_MAIN_RSTN_I2C14					25
+#define JHB100_PER0RST_MAIN_RSTN_I2C15					26
+#define JHB100_PER0RST_MAIN_RSTN_I3C0					27
+#define JHB100_PER0RST_MAIN_RSTN_I3C1					28
+#define JHB100_PER0RST_MAIN_RSTN_I3C2					29
+#define JHB100_PER0RST_MAIN_RSTN_I3C3					30
+#define JHB100_PER0RST_MAIN_RSTN_I3C4					31
+#define JHB100_PER0RST_MAIN_RSTN_I3C5					32
+#define JHB100_PER0RST_MAIN_RSTN_I3C6					33
+#define JHB100_PER0RST_MAIN_RSTN_I3C7					34
+#define JHB100_PER0RST_MAIN_RSTN_I3C8					35
+#define JHB100_PER0RST_MAIN_RSTN_I3C9					36
+#define JHB100_PER0RST_MAIN_RSTN_I3C10					37
+#define JHB100_PER0RST_MAIN_RSTN_I3C11					38
+#define JHB100_PER0RST_MAIN_RSTN_I3C12					39
+#define JHB100_PER0RST_MAIN_RSTN_I3C13					40
+#define JHB100_PER0RST_MAIN_RSTN_I3C14					41
+#define JHB100_PER0RST_MAIN_RSTN_I3C15					42
+#define JHB100_PER0RST_MAIN_RSTN_WDT0					43
+#define JHB100_PER0RST_MAIN_RSTN_WDT1					44
+#define JHB100_PER0RST_MAIN_RSTN_WDT2					45
+#define JHB100_PER0RST_MAIN_RSTN_WDT3					46
+#define JHB100_PER0RST_MAIN_RSTN_WDT4					47
+#define JHB100_PER0RST_MAIN_RSTN_DUALTIMER0				48
+#define JHB100_PER0RST_MAIN_RSTN_DUALTIMER1				49
+#define JHB100_PER0RST_MAIN_RSTN_DUALTIMER2				50
+#define JHB100_PER0RST_MAIN_RSTN_TRNG					51
+#define JHB100_PER0RST_MAIN_RSTN_DMAC0					52
+#define JHB100_PER0RST_MAIN_RSTN_DMAC1					53
+#define JHB100_PER0RST_MAIN_RSTN_DMAC2					54
+#define JHB100_PER0RST_MAIN_RSTN_LTPI0					55
+#define JHB100_PER0RST_MAIN_RSTN_LTPI1					56
+#define JHB100_PER0RST_MAIN_RSTN_SOL4					57
+#define JHB100_PER0RST_MAIN_RSTN_SOL5					58
+#define JHB100_PER0RST_MAIN_RSTN_SOL6					59
+#define JHB100_PER0RST_MAIN_RSTN_SOL7					60
+#define JHB100_PER0RST_MAIN_RSTN_SOL8					61
+#define JHB100_PER0RST_MAIN_RSTN_SOL9					62
+#define JHB100_PER0RST_MAIN_RSTN_SOL10					63
+#define JHB100_PER0RST_MAIN_RSTN_SOL11					64
+#define JHB100_PER0RST_MAIN_RSTN_SOL12					65
+#define JHB100_PER0RST_MAIN_RSTN_SOL13					66
+#define JHB100_PER0RST_MAIN_RSTN_SOL14					67
+#define JHB100_PER0RST_MAIN_RSTN_LDO0					68
+#define JHB100_PER0RST_MAIN_RSTN_LDO1					69
+#define JHB100_PER0RST_MAIN_RSTN_PERIPH0_SENSORS			70
+#define JHB100_PER0RST_MAIN_RSTN_DMAC0_SENSORS				71
+#define JHB100_PER0RST_SYSCON_PRESETN					72
+#define JHB100_PER0RST_GPIO_IOMUX_PRESETN				73
+#define JHB100_PER0RST_UART_MUX_REG_WRAP				74
+
 #endif /* __DT_BINDINGS_RESET_STARFIVE_JHB100_CRG_H__ */
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 10/22] clk: starfive: Add JHB100 System-2 clock generator driver
From: Changhuang Liang @ 2026-05-08  5:36 UTC (permalink / raw)
  To: Michael Turquette, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Stephen Boyd, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Philipp Zabel, Emil Renner Berthing, Kees Cook,
	Gustavo A . R . Silva, Richard Cochran
  Cc: linux-clk, linux-kernel, devicetree, linux-riscv, linux-hardening,
	netdev, Sia Jee Heng, Hal Feng, Ley Foon Tan, Changhuang Liang
In-Reply-To: <20260508053632.818548-1-changhuang.liang@starfivetech.com>

Add support for JHB100 System-2 clock generator (SYS2CRG).

Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
---
 drivers/clk/starfive/Kconfig                  |   8 ++
 drivers/clk/starfive/Makefile                 |   1 +
 .../clk/starfive/clk-starfive-jhb100-sys2.c   | 128 ++++++++++++++++++
 3 files changed, 137 insertions(+)
 create mode 100644 drivers/clk/starfive/clk-starfive-jhb100-sys2.c

diff --git a/drivers/clk/starfive/Kconfig b/drivers/clk/starfive/Kconfig
index b6042bcb5992..729bdfce7b8a 100644
--- a/drivers/clk/starfive/Kconfig
+++ b/drivers/clk/starfive/Kconfig
@@ -91,3 +91,11 @@ config CLK_STARFIVE_JHB100_SYS1
 	help
 	  Say yes here to support the system-1 clock controller on the
 	  StarFive JHB100 SoC.
+
+config CLK_STARFIVE_JHB100_SYS2
+	bool "StarFive JHB100 system-2 clock support"
+	depends on CLK_STARFIVE_JHB100_SYS0
+	default ARCH_STARFIVE
+	help
+	  Say yes here to support the system-2 clock controller on the
+	  StarFive JHB100 SoC.
diff --git a/drivers/clk/starfive/Makefile b/drivers/clk/starfive/Makefile
index b3571e2f0555..90b6390296bd 100644
--- a/drivers/clk/starfive/Makefile
+++ b/drivers/clk/starfive/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_CLK_STARFIVE_JH7110_VOUT)	+= clk-starfive-jh7110-vout.o
 
 obj-$(CONFIG_CLK_STARFIVE_JHB100_SYS0)		+= clk-starfive-jhb100-sys0.o
 obj-$(CONFIG_CLK_STARFIVE_JHB100_SYS1)		+= clk-starfive-jhb100-sys1.o
+obj-$(CONFIG_CLK_STARFIVE_JHB100_SYS2)		+= clk-starfive-jhb100-sys2.o
diff --git a/drivers/clk/starfive/clk-starfive-jhb100-sys2.c b/drivers/clk/starfive/clk-starfive-jhb100-sys2.c
new file mode 100644
index 000000000000..20ea5acf31ca
--- /dev/null
+++ b/drivers/clk/starfive/clk-starfive-jhb100-sys2.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * StarFive JHB100 System-2 Clock Driver
+ *
+ * Copyright (C) 2024 StarFive Technology Co., Ltd.
+ *
+ * Author: Changhuang Liang <changhuang.liang@starfivetech.com>
+ *
+ */
+
+#include <dt-bindings/clock/starfive,jhb100-crg.h>
+#include <linux/init.h>
+#include <linux/platform_device.h>
+
+#include "clk-starfive-common.h"
+
+#define JHB100_SYS2CLK_NUM_CLKS			(JHB100_SYS2CLK_MAIN_ICG_EN_JTAG1 + 1)
+
+/* external clocks */
+#define JHB100_SYS2CLK_OSC			(JHB100_SYS2CLK_NUM_CLKS + 0)
+#define JHB100_SYS2CLK_PLL1			(JHB100_SYS2CLK_NUM_CLKS + 1)
+#define JHB100_SYS2CLK_GPU0_NCNOC_INIT		(JHB100_SYS2CLK_NUM_CLKS + 2)
+#define JHB100_SYS2CLK_GPU1_NCNOC_INIT		(JHB100_SYS2CLK_NUM_CLKS + 3)
+
+char *jhb100_sys2_ext_clk[] = {
+	"osc",
+	"pll1",
+	"gpu0_ncnoc_init",
+	"gpu1_ncnoc_init",
+};
+
+static const struct starfive_clk_data jhb100_sys2crg_clk_data[] __initconst = {
+	/* jtag mst*/
+	STARFIVE__DIV(JHB100_SYS2CLK_JTAGM0_HCLK, "jtagm0_hclk", 6,
+		      JHB100_SYS2CLK_PLL1),
+	STARFIVE__DIV(JHB100_SYS2CLK_JTAGM1_HCLK, "jtagm1_hclk", 6,
+		      JHB100_SYS2CLK_PLL1),
+	STARFIVE__DIV(JHB100_SYS2CLK_JTAGM0_ATPG, "jtagm0_ATPG", 12,
+		      JHB100_SYS2CLK_PLL1),
+	STARFIVE__DIV(JHB100_SYS2CLK_JTAGM1_ATPG, "jtagm1_ATPG", 12,
+		      JHB100_SYS2CLK_PLL1),
+	STARFIVE__DIV(JHB100_SYS2CLK_JTAGM0_ATPG_TCLOCK, "jtagm0_atpg_tclock", 2,
+		      JHB100_SYS2CLK_JTAGM0_ATPG),
+	STARFIVE__DIV(JHB100_SYS2CLK_JTAGM1_ATPG_TCLOCK, "jtagm1_atpg_tclock", 2,
+		      JHB100_SYS2CLK_JTAGM1_ATPG),
+	STARFIVE_GATE(JHB100_SYS2CLK_JTAG0_MST_WRAP_HCLK, "jtag0_mst_wrap_hclk",
+		      CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM0_HCLK),
+	STARFIVE_GATE(JHB100_SYS2CLK_JTAG0_MST_WRAP_CLK_JTAG, "jtag0_mst_wrap_clk_jtag",
+		      CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM0_HCLK),
+	STARFIVE_GATE(JHB100_SYS2CLK_JTAG0_MST_WRAP_APB_PCLK, "jtag0_mst_wrap_apb_pclk",
+		      CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM0_ATPG),
+	STARFIVE_GATE(JHB100_SYS2CLK_JTAG0_MST_WRAP_ATPG_TCLOCK, "jtag0_mst_wrap_atpg_tclock",
+		      CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM0_ATPG),
+	STARFIVE_GATE(JHB100_SYS2CLK_JTAG1_MST_WRAP_HCLK, "jtag1_mst_wrap_hclk",
+		      CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM1_HCLK),
+	STARFIVE_GATE(JHB100_SYS2CLK_JTAG1_MST_WRAP_CLK_JTAG, "jtag1_mst_wrap_clk_jtag",
+		      CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM1_HCLK),
+	STARFIVE_GATE(JHB100_SYS2CLK_JTAG1_MST_WRAP_APB_PCLK, "jtag1_mst_wrap_apb_pclk",
+		      CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM1_ATPG),
+	STARFIVE_GATE(JHB100_SYS2CLK_JTAG1_MST_WRAP_ATPG_TCLOCK, "jtag1_mst_wrap_atpg_tclock",
+		      CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM1_ATPG),
+	/* hostusbcmn */
+	STARFIVE__DIV(JHB100_SYS2CLK_HOSTUSB_NCNOC_TARG, "hostusb_ncnoc_targ", 12,
+		      JHB100_SYS2CLK_PLL1),
+	STARFIVE__DIV(JHB100_SYS2CLK_HOSTUSBCMN_CFG_500, "hostusbcmn_cfg_500", 4,
+		      JHB100_SYS2CLK_PLL1),
+	/* bmcperiph1 */
+	STARFIVE__DIV(JHB100_SYS2CLK_BMCPER1_NCNOC_TARG, "bmcper1_ncnoc_targ", 6,
+		      JHB100_SYS2CLK_PLL1),
+	STARFIVE__DIV(JHB100_SYS2CLK_BMCPER1_CFG_250, "bmcper1_cfg_250", 5,
+		      JHB100_SYS2CLK_PLL1),
+	STARFIVE__DIV(JHB100_SYS2CLK_BMCPER1_CFG_143_DFT, "bmcper1_cfg_143_dft", 8,
+		      JHB100_SYS2CLK_PLL1),
+	STARFIVE_GATE(JHB100_SYS2CLK_BMCPER1_CFG_143, "bmcper1_cfg_143", CLK_IS_CRITICAL,
+		      JHB100_SYS2CLK_BMCPER1_CFG_143_DFT),
+	/* bmcperiph0 */
+	STARFIVE__DIV(JHB100_SYS2CLK_BMCPER0_NCNOC_TARG, "bmcper0_ncnoc_targ", 6,
+		      JHB100_SYS2CLK_PLL1),
+	/* gpu0 */
+	STARFIVE__DIV(JHB100_SYS2CLK_GPU0_NCNOC_TARG, "gpu0_ncnoc_targ", 12,
+		      JHB100_SYS2CLK_PLL1),
+	STARFIVE_GATE(JHB100_SYS2CLK_GPU0_BUS_CLK, "gpu0_bus_clk", CLK_IS_CRITICAL,
+		      JHB100_SYS2CLK_GPU0_NCNOC_INIT),
+	STARFIVE_GATE(JHB100_SYS2CLK_GPU0_APB_CLK, "gpu0_apb_clk", CLK_IS_CRITICAL,
+		      JHB100_SYS2CLK_GPU0_NCNOC_TARG),
+	STARFIVE_GATE(JHB100_SYS2CLK_GPU0_OSC_CLK, "gpu0_osc_clk", CLK_IS_CRITICAL,
+		      JHB100_SYS2CLK_OSC),
+	/* gpu1 */
+	STARFIVE__DIV(JHB100_SYS2CLK_GPU1_NCNOC_TARG, "gpu1_ncnoc_targ", 12,
+		      JHB100_SYS2CLK_PLL1),
+	STARFIVE_GATE(JHB100_SYS2CLK_GPU1_BUS_CLK, "gpu1_bus_clk", CLK_IS_CRITICAL,
+		      JHB100_SYS2CLK_GPU1_NCNOC_INIT),
+	STARFIVE_GATE(JHB100_SYS2CLK_GPU1_APB_CLK, "gpu1_apb_clk", CLK_IS_CRITICAL,
+		      JHB100_SYS2CLK_GPU1_NCNOC_TARG),
+	STARFIVE_GATE(JHB100_SYS2CLK_GPU1_OSC_CLK, "gpu1_osc_clk", CLK_IS_CRITICAL,
+		      JHB100_SYS2CLK_OSC),
+	/* main icg */
+	STARFIVE_GATE(JHB100_SYS2CLK_MAIN_ICG_EN_JTAG0, "main_icg_en_jtag0", 0,
+		      JHB100_SYS2CLK_JTAGM0_HCLK),
+	STARFIVE_GATE(JHB100_SYS2CLK_MAIN_ICG_EN_JTAG1, "main_icg_en_jtag1", 0,
+		      JHB100_SYS2CLK_JTAGM1_HCLK),
+};
+
+const struct jhb100_crg_domain_info jhb100_sys2crg_info = {
+	.clk_data	= jhb100_sys2crg_clk_data,
+	.num_clk	= ARRAY_SIZE(jhb100_sys2crg_clk_data),
+	.ext_clk	= jhb100_sys2_ext_clk,
+	.num_ext_clk	= ARRAY_SIZE(jhb100_sys2_ext_clk),
+	.rst_name	= "jhb100-r-sys2",
+	.power_domain	= false,
+};
+
+static const struct of_device_id jhb100_sys2crg_match[] = {
+	{
+		.compatible = "starfive,jhb100-sys2crg",
+		.data = &jhb100_sys2crg_info,
+	},
+	{ /* sentinel */ }
+};
+
+static struct platform_driver jhb100_sys2crg_driver = {
+	.driver = {
+		.name = "clk-starfive-jhb100-sys2",
+		.of_match_table = jhb100_sys2crg_match,
+		.suppress_bind_attrs = true,
+	},
+};
+builtin_platform_driver_probe(jhb100_sys2crg_driver, starfive_crg_probe);
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH net] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present
From: Qingfang Deng @ 2026-05-08  5:57 UTC (permalink / raw)
  To: Hyunwoo Kim
  Cc: dhowells, marc.dionne, davem, edumazet, kuba, pabeni, horms,
	linux-afs, netdev
In-Reply-To: <afKV2zGR6rrelPC7@v4bel>

On Thu, 30 Apr 2026 08:35:55 +0900, Hyunwoo Kim wrote:
> 
> The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
> handler in rxrpc_verify_response() copy the skb to a linear one before
> calling into the security ops only when skb_cloned() is true.  An skb
> that is not cloned but still carries paged fragments (skb->data_len != 0)
> falls through to the in-place decryption path, which binds the frag
> pages directly into the AEAD/skcipher SGL via skb_to_sgvec().
> 
> Extend the gate so that any skb with non-linear data is also copied,
> ensuring the security handler always operates on a fully linear skb.
> The OOM/trace handling already in place is reused.
> 
> Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> ---
>  net/rxrpc/call_event.c | 2 +-
>  net/rxrpc/conn_event.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
> index fdd683261226..6c924ef55208 100644
> --- a/net/rxrpc/call_event.c
> +++ b/net/rxrpc/call_event.c
> @@ -334,7 +334,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
>  
>  			if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
>  			    sp->hdr.securityIndex != 0 &&
> -			    skb_cloned(skb)) {
> +			    (skb_cloned(skb) || skb->data_len)) {

It's recommended to use skb_is_nonlinear() instead of open-coding
skb->data_len.
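
For reference, a small userspace sketch of why the two spellings are
interchangeable: as far as I know, skb_is_nonlinear() in
include/linux/skbuff.h simply reports whether skb->data_len is non-zero.
The struct and function names below (mock_skb, mock_skb_is_nonlinear,
needs_unshare) are stand-ins invented for this example, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for struct sk_buff holding only the fields used here. */
struct mock_skb {
	unsigned int len;	/* total bytes: linear part plus frags */
	unsigned int data_len;	/* bytes held in paged/non-linear frags */
	bool cloned;		/* models the result of skb_cloned() */
};

/* Models skb_is_nonlinear(): true when any data lives outside the head. */
bool mock_skb_is_nonlinear(const struct mock_skb *skb)
{
	return skb->data_len != 0;
}

/* The gate from the patch, written with the helper instead of data_len. */
bool needs_unshare(const struct mock_skb *skb)
{
	assert(skb->data_len <= skb->len);

	return skb->cloned || mock_skb_is_nonlinear(skb);
}
```

The helper does not change behaviour; it only names the intent, so the
condition reads as "cloned or non-linear".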

>  				/* Unshare the packet so that it can be
>  				 * modified by in-place decryption.
>  				 */
> diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
> index a2130d25aaa9..eab7c5f2517a 100644
> --- a/net/rxrpc/conn_event.c
> +++ b/net/rxrpc/conn_event.c
> @@ -245,7 +245,7 @@ static int rxrpc_verify_response(struct rxrpc_connection *conn,
>  {
>  	int ret;
>  
> -	if (skb_cloned(skb)) {
> +	if (skb_cloned(skb) || skb->data_len) {

Ditto.

>  		/* Copy the packet if shared so that we can do in-place
>  		 * decryption.
>  		 */

Regards,
Qingfang

^ permalink raw reply

* Re: [PATCH net] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present
From: Hyunwoo Kim @ 2026-05-08  6:07 UTC (permalink / raw)
  To: Qingfang Deng
  Cc: dhowells, marc.dionne, davem, edumazet, kuba, pabeni, horms,
	linux-afs, netdev, imv4bel
In-Reply-To: <20260508055716.89380-1-qingfang.deng@linux.dev>

On Fri, May 08, 2026 at 01:57:15PM +0800, Qingfang Deng wrote:
> On Thu, 30 Apr 2026 08:35:55 +0900, Hyunwoo Kim wrote:
> > 
> > The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
> > handler in rxrpc_verify_response() copy the skb to a linear one before
> > calling into the security ops only when skb_cloned() is true.  An skb
> > that is not cloned but still carries paged fragments (skb->data_len != 0)
> > falls through to the in-place decryption path, which binds the frag
> > pages directly into the AEAD/skcipher SGL via skb_to_sgvec().
> > 
> > Extend the gate so that any skb with non-linear data is also copied,
> > ensuring the security handler always operates on a fully linear skb.
> > The OOM/trace handling already in place is reused.
> > 
> > Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
> > Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> > ---
> >  net/rxrpc/call_event.c | 2 +-
> >  net/rxrpc/conn_event.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
> > index fdd683261226..6c924ef55208 100644
> > --- a/net/rxrpc/call_event.c
> > +++ b/net/rxrpc/call_event.c
> > @@ -334,7 +334,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
> >  
> >  			if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
> >  			    sp->hdr.securityIndex != 0 &&
> > -			    skb_cloned(skb)) {
> > +			    (skb_cloned(skb) || skb->data_len)) {
> 
> It's recommended to use skb_is_nonlinear() instead of open-coding
> skb->data_len.
> 
> >  				/* Unshare the packet so that it can be
> >  				 * modified by in-place decryption.
> >  				 */
> > diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
> > index a2130d25aaa9..eab7c5f2517a 100644
> > --- a/net/rxrpc/conn_event.c
> > +++ b/net/rxrpc/conn_event.c
> > @@ -245,7 +245,7 @@ static int rxrpc_verify_response(struct rxrpc_connection *conn,
> >  {
> >  	int ret;
> >  
> > -	if (skb_cloned(skb)) {
> > +	if (skb_cloned(skb) || skb->data_len) {
> 
> Ditto.

Thank you for the review.

I will submit a v2 patch.


Best regards,
Hyunwoo Kim

> 
> >  		/* Copy the packet if shared so that we can do in-place
> >  		 * decryption.
> >  		 */
> 
> Regards,
> Qingfang

^ permalink raw reply

* Re: [mst-vhost:balloon 6/30] Warning: mm/mempolicy.c:2444 function parameter 'user_addr' not described in '__alloc_pages_mpol'
From: Michael S. Tsirkin @ 2026-05-08  6:09 UTC (permalink / raw)
  To: kernel test robot; +Cc: oe-kbuild-all, kvm, virtualization, netdev
In-Reply-To: <202605080515.6jRN5wN7-lkp@intel.com>

On Fri, May 08, 2026 at 05:26:30AM +0800, kernel test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git balloon
> head:   9f56ee36fbf6a6d336dc6a9eaeb4f8a67cb42a31
> commit: c4289f5a4e563611a468b4b5379025a4aa4a7c12 [6/30] mm: thread user_addr through page allocator for cache-friendly zeroing
> config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20260508/202605080515.6jRN5wN7-lkp@intel.com/config)
> compiler: powerpc64-linux-gcc (GCC) 15.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260508/202605080515.6jRN5wN7-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202605080515.6jRN5wN7-lkp@intel.com/
> 
> All warnings (new ones prefixed by >>):
> 
> >> Warning: mm/mempolicy.c:2444 function parameter 'user_addr' not described in '__alloc_pages_mpol'
> >> Warning: mm/mempolicy.c:2444 expecting prototype for alloc_pages_mpol(). Prototype was for __alloc_pages_mpol() instead
>    Warning: mm/mempolicy.c:2547 expecting prototype for vma_alloc_folio(). Prototype was for alloc_frozen_pages() instead
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki


The kerneldoc got messed up by a rebase. Will fix in v6.


^ permalink raw reply

* Re: [mst-vhost:balloon 4/30] Warning: mm/mempolicy.c:2527 expecting prototype for vma_alloc_folio(). Prototype was for alloc_frozen_pages() instead
From: Michael S. Tsirkin @ 2026-05-08  6:16 UTC (permalink / raw)
  To: kernel test robot; +Cc: oe-kbuild-all, kvm, virtualization, netdev
In-Reply-To: <202605080331.y1eIdVUC-lkp@intel.com>

On Fri, May 08, 2026 at 03:56:56AM +0800, kernel test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git balloon
> head:   9f56ee36fbf6a6d336dc6a9eaeb4f8a67cb42a31
> commit: 95744e0e9c4df79c6bc8ec96306b29c7a8e8984e [4/30] mm: move vma_alloc_folio to page_alloc.c
> config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20260508/202605080331.y1eIdVUC-lkp@intel.com/config)
> compiler: powerpc64-linux-gcc (GCC) 15.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260508/202605080331.y1eIdVUC-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202605080331.y1eIdVUC-lkp@intel.com/
> 
> All warnings (new ones prefixed by >>):
> 
> >> Warning: mm/mempolicy.c:2527 expecting prototype for vma_alloc_folio(). Prototype was for alloc_frozen_pages() instead
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki


The kerneldoc got messed up by a rebase. Will fix it up.


^ permalink raw reply

* [PATCH net] net: ena: PHC: Fix potential use-after-free in get_timestamp
From: Arthur Kiyanovski @ 2026-05-08  6:21 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski, netdev
  Cc: Arthur Kiyanovski, Richard Cochran, Eric Dumazet, Paolo Abeni,
	David Woodhouse, Thomas Gleixner, Miroslav Lichvar, Andrew Lunn,
	Wen Gu, Xuan Zhuo, David Woodhouse, Yonatan Sarna,
	Zorik Machulsky, Alexander Matushevsky, Saeed Bshara, Matt Wilson,
	Anthony Liguori, Nafea Bshara, Evgeny Schmeilin, Netanel Belgazal,
	Ali Saidi, Benjamin Herrenschmidt, Noam Dagan, David Arinzon,
	Evgeny Ostrovsky, Ofir Tabachnik, Amit Bernstein, stable

Move the phc->active check and resp pointer assignment to after
acquiring the spinlock. Previously, phc->active was checked without
holding the lock, and resp was cached from ena_dev->phc.virt_addr
before the lock was acquired.

If ena_com_phc_destroy() runs between the lockless active check and
the lock acquisition, it sets active=false, releases the lock, frees
the DMA memory, and sets virt_addr=NULL. The get_timestamp path would
then read a NULL virt_addr and dereference it.

With both the active check and the pointer read under the lock,
destroy cannot free the memory while get_timestamp is using it.

Fixes: e0ea34158ee8 ("net: ena: Add PHC support in the ENA driver")
Cc: stable@vger.kernel.org
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index e67b592..8c86789 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -1782,20 +1782,23 @@ void ena_com_phc_destroy(struct ena_com_dev *ena_dev)
 
 int ena_com_phc_get_timestamp(struct ena_com_dev *ena_dev, u64 *timestamp)
 {
-	volatile struct ena_admin_phc_resp *resp = ena_dev->phc.virt_addr;
 	const ktime_t zero_system_time = ktime_set(0, 0);
 	struct ena_com_phc_info *phc = &ena_dev->phc;
+	volatile struct ena_admin_phc_resp *resp;
 	ktime_t expire_time;
 	ktime_t block_time;
 	unsigned long flags = 0;
 	int ret = 0;
 
+	spin_lock_irqsave(&phc->lock, flags);
+
 	if (!phc->active) {
+		spin_unlock_irqrestore(&phc->lock, flags);
 		netdev_err(ena_dev->net_device, "PHC feature is not active in the device\n");
 		return -EOPNOTSUPP;
 	}
 
-	spin_lock_irqsave(&phc->lock, flags);
+	resp = ena_dev->phc.virt_addr;
 
 	/* Check if PHC is in blocked state */
 	if (unlikely(ktime_compare(phc->system_time, zero_system_time))) {
-- 
2.47.3
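
The locking order described in the commit message can be sketched in user
space. This is only a model under stated assumptions: struct phc_model and
the phc_model_* functions are hypothetical, a pthread mutex stands in for
the spinlock, and a heap buffer stands in for the DMA response memory. It
follows the sequence given in the message: destroy clears the active flag
under the lock and frees afterwards, while the reader checks the flag and
loads the pointer only while holding the lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct phc_model {
	pthread_mutex_t lock;
	bool active;
	uint64_t *virt_addr; /* stands in for the DMA response buffer */
};

static int phc_model_init(struct phc_model *phc, uint64_t initial)
{
	phc->virt_addr = malloc(sizeof(*phc->virt_addr));
	if (!phc->virt_addr)
		return -ENOMEM;
	*phc->virt_addr = initial;
	pthread_mutex_init(&phc->lock, NULL);
	phc->active = true;
	return 0;
}

/* Teardown: clear the flag under the lock, then free the buffer. */
static void phc_model_destroy(struct phc_model *phc)
{
	pthread_mutex_lock(&phc->lock);
	phc->active = false;
	pthread_mutex_unlock(&phc->lock);

	free(phc->virt_addr);
	phc->virt_addr = NULL;
}

/* Reader: both the active check and the pointer load happen under the lock. */
static int phc_model_get_timestamp(struct phc_model *phc, uint64_t *ts)
{
	int ret = 0;

	assert(ts != NULL);
	pthread_mutex_lock(&phc->lock);
	if (!phc->active) {
		ret = -EOPNOTSUPP;
	} else {
		const uint64_t *resp = phc->virt_addr;

		*ts = *resp;
	}
	pthread_mutex_unlock(&phc->lock);
	return ret;
}
```

Because the reader holds the lock from the active check until it is done
with the buffer, the destroy path cannot clear the flag and free the memory
in between.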


^ permalink raw reply related

* Re: Re: [PATCH net v1 2/2] net: stmmac: eic7700: fix delay step calculation and ensure safe register initialization
From: 李志 @ 2026-05-08  6:25 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, robh, krzk+dt,
	conor+dt, netdev, devicetree, linux-kernel, mcoquelin.stm32,
	alexandre.torgue, rmk+kernel, linux-stm32, linux-arm-kernel,
	ningyu, linmin, pinkesh.vaghela, pritesh.patel, weishangjuan
In-Reply-To: <92e8a3dd-a46a-499f-b5f6-99f7b99f45f5@bootlin.com>

> -----Original Messages-----
> From: "Maxime Chevallier" <maxime.chevallier@bootlin.com>
> Send time:Thursday, 07/05/2026 19:21:41
> To: lizhi2@eswincomputing.com, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org, netdev@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, mcoquelin.stm32@gmail.com, alexandre.torgue@foss.st.com, rmk+kernel@armlinux.org.uk, linux-stm32@st-md-mailman.stormreply.com, linux-arm-kernel@lists.infradead.org
> Cc: ningyu@eswincomputing.com, linmin@eswincomputing.com, pinkesh.vaghela@einfochips.com, pritesh.patel@einfochips.com, weishangjuan@eswincomputing.com
> Subject: Re: [PATCH net v1 2/2] net: stmmac: eic7700: fix delay step calculation and ensure safe register initialization
> 
> Hi,
> 
> On 07/05/2026 10:32, lizhi2@eswincomputing.com wrote:
> > From: Zhi Li <lizhi2@eswincomputing.com>
> > 
> > Fix several issues in the EIC7700 DWMAC glue driver related to delay
> > configuration and register initialization.
> > 
> > The hardware implements TX/RX delay with a granularity of 20 ps per
> > step, but the driver previously assumed a 100 ps step. Update the
> > definitions to match the actual hardware behaviour and align with
> > the binding constraints.
> > 
> > Introduce explicit definitions for the maximum programmable delay
> > range based on the hardware limits.
> > 
> > Move HSP CSR configuration into the initialization path after clocks
> > are enabled. This ensures that all register accesses occur with the
> > required clocks active, avoiding undefined behaviour.
> > 
> > Clear the TXD and RXD delay control registers during initialization
> > to override any residual configuration left by the bootloader. This
> > ensures deterministic RGMII timing and prevents unintended delay
> > being applied.
> > 
> > The MAC RGMII delay programming is only required for 100Mbps and
> > 1000Mbps modes, where precise clock-to-data alignment is necessary for
> > reliable sampling.
> > 
> > For 10Mbps operation, timing margins are sufficiently relaxed and no
> > additional delay compensation is required. In this case, the driver
> > falls back to a safe default configuration with delay disabled.
> > 
> > For unsupported or unexpected link speeds, the driver avoids
> > programming invalid delay values and falls back to a safe default
> > state by explicitly clearing the delay configuration.
> > 
> > Explicitly programming zero ensures that no residual delay settings
> > from previous configurations or bootloader state remain active.
> > 
> > These changes fix incorrect delay programming and initialization
> > ordering for existing users.
> > 
> > This also aligns the driver implementation with the updated device
> > tree binding.
> 
> There's a lot going on in this patch. Can you split it into patches
> that each solve one of these individual issues?
> 
> It's a mix of fixes (the reg access moved after clk config for example)
> and non-fixes (the RGMII timings, you're improving the granularity of
> the delays, is this required to fix existing setups, or is it a generic
> improvement ?), splitting this would make it both easier to review, and
> easier to bisect should problems arise in the future.
> 

Thanks for the detailed review and suggestions.

You're right that the current patch mixes several logically independent
changes, and splitting them will make the series easier to review and
bisect. I will follow your suggestion and split the current patch into
multiple smaller patches within the same series.

All five changes below are correctness fixes addressing hardware or driver
issues, not improvements or new features.

Based on the current change set, the individual fixes are:

1. TX/RX delay granularity correction (100 ps -> 20 ps step)
   This corrects the driver's incorrect model of the hardware capability.
   The driver previously assumed a 100 ps step, while the hardware actually
   implements 20 ps granularity.
   This fixes incorrect delay programming that could occur when fine-grained
   delay values are used, ensuring correct representation of the hardware
   capability.

2. Introduce explicit maximum delay range definitions
   This fixes missing enforcement of hardware constraints, preventing invalid
   delay values from being accepted or programmed.

3. Move HSP CSR configuration after clock enable
   This fixes a register access ordering issue where accessing HSP CSR before
   clocks are enabled may result in undefined behavior during initialization.

4. Clear TXD/RXD delay control registers during initialization
   This clears residual configuration left by the bootloader, ensuring
   deterministic behavior across reboot and driver reload.

5. Delay handling for 10Mbps and invalid link speeds
   This fixes incorrect application of RGMII delay programming outside valid
   operating modes, preventing invalid configuration from being applied.
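
As a rough illustration of items 1, 2 and 5, the conversion and the speed
fallback could look like the user-space sketch below. Only the 20 ps step
and the "delay for 100/1000 Mbps, zero otherwise" rule come from this
thread. The MODEL_* names, the function names and in particular the maximum
step count are hypothetical placeholders, not values from the driver or the
binding.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_DELAY_STEP_PS   20u  /* step size stated in this thread */
#define MODEL_DELAY_MAX_STEPS 127u /* hypothetical limit, NOT from the thread */

static_assert(MODEL_DELAY_STEP_PS != 0, "step size must be non-zero");

/* Convert a delay in picoseconds to register steps; -1 if not representable. */
static int model_delay_ps_to_steps(uint32_t delay_ps)
{
	uint32_t steps;

	if (delay_ps % MODEL_DELAY_STEP_PS)
		return -1; /* not a multiple of the step size */
	steps = delay_ps / MODEL_DELAY_STEP_PS;
	if (steps > MODEL_DELAY_MAX_STEPS)
		return -1; /* beyond the assumed hardware range */
	return (int)steps;
}

/* Delay to program for a link speed in Mbps: zero unless 100 or 1000. */
static uint32_t model_delay_for_speed(int speed_mbps, uint32_t configured_ps)
{
	switch (speed_mbps) {
	case 100:
	case 1000:
		return configured_ps;
	default:
		return 0; /* 10 Mbps and unexpected speeds: clear the delay */
	}
}
```

Under the old 100 ps assumption a request of 200 ps would have been
programmed as 2 steps; with the 20 ps step it is 10 steps, which is the kind
of mismatch item 1 describes.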

I will split these into separate patches in the next revision, while keeping
them within the same series.

For the DT binding side, would you also recommend splitting the binding
changes to match the driver-level granularity, or would it be better to keep
them consolidated in a single binding patch?

If you have any further suggestions on the split or classification, please
let me know.

Thanks,
Zhi

^ permalink raw reply

* [PATCH net-next 0/2] net/smc: transition to RDMA core CQ pooling
From: D. Wythe @ 2026-05-08  6:37 UTC (permalink / raw)
  To: David S. Miller, Dust Li, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Sidraya Jayagond, Wenjia Zhang
  Cc: Mahanta Jambigi, Simon Horman, Tony Lu, Wen Gu, linux-kernel,
	linux-rdma, linux-s390, netdev, oliver.yang, pasic

This series transitions SMC-R completion handling to RDMA core CQ pooling
via the ib_cqe API. The new completion model improves scalability by
allowing per-link completion processing across multiple cores and enables
DIM-based interrupt moderation.

As a side effect, the increased concurrency can amplify contention for TX
slots on the shared wait queue. Patch 2 addresses this by switching TX slot
allocation from non-exclusive wait_event() to prepare_to_wait_exclusive(),
which avoids thundering-herd wakeups under contention.

Patch 1 replaces the global per-device CQ and manual tasklet polling model
with RDMA core CQ pooling.
Patch 2 reduces TX slot contention by using exclusive wait queue entries
during allocation.

Link: https://lore.kernel.org/netdev/20260305022323.96125-1-alibuda@linux.alibaba.com/

D. Wythe (2):
  net/smc: transition to RDMA core CQ pooling
  net/smc: reduce TX slot contention with exclusive wait

 net/smc/smc_core.c |   9 +-
 net/smc/smc_core.h |  28 ++--
 net/smc/smc_ib.c   | 113 +++++----------
 net/smc/smc_ib.h   |   7 -
 net/smc/smc_tx.c   |   1 -
 net/smc/smc_wr.c   | 344 ++++++++++++++++++++-------------------------
 net/smc/smc_wr.h   |  40 ++----
 7 files changed, 215 insertions(+), 327 deletions(-)

-- 
2.45.0

^ permalink raw reply

* [PATCH net-next 1/2] net/smc: transition to RDMA core CQ pooling
From: D. Wythe @ 2026-05-08  6:37 UTC (permalink / raw)
  To: David S. Miller, Dust Li, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Sidraya Jayagond, Wenjia Zhang
  Cc: Mahanta Jambigi, Simon Horman, Tony Lu, Wen Gu, linux-kernel,
	linux-rdma, linux-s390, netdev, oliver.yang, pasic,
	Leon Romanovsky
In-Reply-To: <20260508063718.101622-1-alibuda@linux.alibaba.com>

The current SMC-R implementation relies on global per-device CQs
and manual polling within tasklets, which introduces severe
scalability bottlenecks due to global lock contention and tasklet
scheduling overhead, resulting in poor performance as concurrency
increases.

Refactor the completion handling to utilize the ib_cqe API and
standard RDMA core CQ pooling. This transition provides several key
advantages:

1. Multi-CQ: Shift from a single shared per-device CQ to multiple
link-specific CQs via the CQ pool. This allows completion processing
to be parallelized across multiple CPU cores, effectively eliminating
the global CQ bottleneck.

2. Leverage DIM: Utilizing the standard CQ pool with IB_POLL_SOFTIRQ
enables Dynamic Interrupt Moderation from the RDMA core, optimizing
interrupt frequency and reducing CPU load under high pressure.

3. O(1) Context Retrieval: Replaces the expensive wr_id based lookup
logic (e.g., smc_wr_tx_find_pending_index) with direct context retrieval
using container_of() on the embedded ib_cqe.

4. Code Simplification: This refactoring results in a reduction of
~150 lines of code. It removes redundant sequence tracking, complex lookup
helpers, and manual CQ management, significantly improving maintainability.
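
Point 3 can be pictured with a small user-space model. The model_* types
below are hypothetical stand-ins that keep only the fields needed for the
illustration (the real ib_cqe done callback also receives the CQ), and
model_container_of() is a plain offsetof-based equivalent of the kernel
macro. The idea shown is that a completion carries a pointer to the
embedded cqe, so the enclosing request and its index are recovered by
pointer arithmetic instead of a scan over pending entries.

```c
#include <assert.h>
#include <stddef.h>

struct model_wc;

/* Stand-in for struct ib_cqe: just the completion callback. */
struct model_cqe {
	void (*done)(struct model_wc *wc);
};

/* Stand-in for struct ib_wc: the completed cqe plus a status. */
struct model_wc {
	struct model_cqe *wr_cqe;
	int status;
};

/* Request wrapper with the cqe embedded, as in the patch's send wrapper. */
struct model_send_wr {
	struct model_cqe cqe;
	int idx;
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int model_last_completed_idx = -1;

/* Completion handler: recover the wrapper in O(1) from the embedded cqe. */
static void model_send_done(struct model_wc *wc)
{
	struct model_send_wr *wr =
		model_container_of(wc->wr_cqe, struct model_send_wr, cqe);

	model_last_completed_idx = wr->idx;
}

/* Dispatch one completion the way a CQ poller would: through cqe->done. */
static void model_dispatch(struct model_wc *wc)
{
	assert(wc->wr_cqe && wc->wr_cqe->done);
	wc->wr_cqe->done(wc);
}
```

The cost of the lookup no longer depends on how many send slots exist,
which is the difference from a linear search keyed on wr_id.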

Performance Test: redis-benchmark with max 32 connections per QP
Data format: Requests Per Second (RPS), Percentage in brackets
represents the gain/loss compared to TCP.

| Clients | TCP      | SMC (original)      | SMC (cq_pool)       |
|---------|----------|---------------------|---------------------|
| c = 1   | 24449    | 31172  (+27%)       | 34039  (+39%)       |
| c = 2   | 46420    | 53216  (+14%)       | 64391  (+38%)       |
| c = 16  | 159673   | 83668  (-48%)  <--  | 216947 (+36%)       |
| c = 32  | 164956   | 97631  (-41%)  <--  | 249376 (+51%)       |
| c = 64  | 166322   | 118192 (-29%)  <--  | 249488 (+50%)       |
| c = 128 | 167700   | 121497 (-27%)  <--  | 249480 (+48%)       |
| c = 256 | 175021   | 146109 (-16%)  <--  | 240384 (+37%)       |
| c = 512 | 168987   | 101479 (-40%)  <--  | 226634 (+34%)       |

The results demonstrate that this optimization effectively resolves the
scalability bottleneck, with RPS increasing by over 110% at c=64
compared to the original implementation.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 net/smc/smc_core.c |   9 +-
 net/smc/smc_core.h |  28 ++--
 net/smc/smc_ib.c   | 113 +++++-----------
 net/smc/smc_ib.h   |   7 -
 net/smc/smc_tx.c   |   1 -
 net/smc/smc_wr.c   | 312 +++++++++++++++++++--------------------------
 net/smc/smc_wr.h   |  40 ++----
 7 files changed, 193 insertions(+), 317 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index cf6b620fef05..218a10e85361 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -815,17 +815,11 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 	lnk->lgr = lgr;
 	smc_lgr_hold(lgr); /* lgr_put in smcr_link_clear() */
 	lnk->link_idx = link_idx;
-	lnk->wr_rx_id_compl = 0;
 	smc_ibdev_cnt_inc(lnk);
 	smcr_copy_dev_info_to_link(lnk);
 	atomic_set(&lnk->conn_cnt, 0);
 	smc_llc_link_set_uid(lnk);
 	INIT_WORK(&lnk->link_down_wrk, smc_link_down_work);
-	if (!lnk->smcibdev->initialized) {
-		rc = (int)smc_ib_setup_per_ibdev(lnk->smcibdev);
-		if (rc)
-			goto out;
-	}
 	get_random_bytes(rndvec, sizeof(rndvec));
 	lnk->psn_initial = rndvec[0] + (rndvec[1] << 8) +
 		(rndvec[2] << 16);
@@ -863,6 +857,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 	if (rc)
 		goto free_link_mem;
 	lnk->state = SMC_LNK_ACTIVATING;
+	smc_wr_init_cqes(lnk);
 	return 0;
 
 free_link_mem:
@@ -1373,7 +1368,7 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
 	smc_llc_link_clear(lnk, log);
 	smcr_buf_unmap_lgr(lnk);
 	smcr_rtoken_clear_link(lnk);
-	smc_ib_modify_qp_error(lnk);
+	smc_wr_drain_rq(lnk);
 	smc_wr_free_link(lnk);
 	smc_ib_destroy_queue_pair(lnk);
 	smc_ib_dealloc_protection_domain(lnk);
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 5c18f08a4c8a..f98c0f0cb14b 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -89,8 +89,21 @@ struct smc_rdma_sges {				/* sges per message send */
 struct smc_rdma_wr {				/* work requests per message
 						 * send
 						 */
+	struct ib_cqe		cqe;
 	struct ib_rdma_wr	wr_tx_rdma[SMC_MAX_RDMA_WRITES];
-};
+} ____cacheline_aligned_in_smp;
+
+struct smc_ib_recv_wr {
+	struct ib_cqe		cqe;
+	struct ib_recv_wr	wr;
+	int idx;
+} ____cacheline_aligned_in_smp;
+
+struct smc_ib_send_wr {
+	struct ib_cqe		cqe;
+	struct ib_send_wr	wr;
+	int idx;
+} ____cacheline_aligned_in_smp;
 
 #define SMC_LGR_ID_SIZE		4
 
@@ -100,23 +113,24 @@ struct smc_link {
 	struct ib_pd		*roce_pd;	/* IB protection domain,
 						 * unique for every RoCE QP
 						 */
+	unsigned int		nr_cqe;		/* number of CQ entries */
+	struct ib_cq		*ib_cq;		/* IB completion queue */
 	struct ib_qp		*roce_qp;	/* IB queue pair */
 	struct ib_qp_attr	qp_attr;	/* IB queue pair attributes */
 
 	struct smc_wr_buf	*wr_tx_bufs;	/* WR send payload buffers */
-	struct ib_send_wr	*wr_tx_ibs;	/* WR send meta data */
+	struct smc_ib_send_wr	*wr_tx_ibs;	/* WR send meta data */
 	struct ib_sge		*wr_tx_sges;	/* WR send gather meta data */
 	struct smc_rdma_sges	*wr_tx_rdma_sges;/*RDMA WRITE gather meta data*/
 	struct smc_rdma_wr	*wr_tx_rdmas;	/* WR RDMA WRITE */
 	struct smc_wr_tx_pend	*wr_tx_pends;	/* WR send waiting for CQE */
 	struct completion	*wr_tx_compl;	/* WR send CQE completion */
 	/* above four vectors have wr_tx_cnt elements and use the same index */
-	struct ib_send_wr	*wr_tx_v2_ib;	/* WR send v2 meta data */
+	struct smc_ib_send_wr	*wr_tx_v2_ib;	/* WR send v2 meta data */
 	struct ib_sge		*wr_tx_v2_sge;	/* WR send v2 gather meta data*/
 	struct smc_wr_tx_pend	*wr_tx_v2_pend;	/* WR send v2 waiting for CQE */
 	dma_addr_t		wr_tx_dma_addr;	/* DMA address of wr_tx_bufs */
 	dma_addr_t		wr_tx_v2_dma_addr; /* DMA address of v2 tx buf*/
-	atomic_long_t		wr_tx_id;	/* seq # of last sent WR */
 	unsigned long		*wr_tx_mask;	/* bit mask of used indexes */
 	u32			wr_tx_cnt;	/* number of WR send buffers */
 	wait_queue_head_t	wr_tx_wait;	/* wait for free WR send buf */
@@ -126,7 +140,7 @@ struct smc_link {
 	struct completion	tx_ref_comp;
 
 	u8			*wr_rx_bufs;	/* WR recv payload buffers */
-	struct ib_recv_wr	*wr_rx_ibs;	/* WR recv meta data */
+	struct smc_ib_recv_wr	*wr_rx_ibs;	/* WR recv meta data */
 	struct ib_sge		*wr_rx_sges;	/* WR recv scatter meta data */
 	/* above three vectors have wr_rx_cnt elements and use the same index */
 	int			wr_rx_sge_cnt; /* rx sge, V1 is 1, V2 is either 2 or 1 */
@@ -135,13 +149,11 @@ struct smc_link {
 						 */
 	dma_addr_t		wr_rx_dma_addr;	/* DMA address of wr_rx_bufs */
 	dma_addr_t		wr_rx_v2_dma_addr; /* DMA address of v2 rx buf*/
-	u64			wr_rx_id;	/* seq # of last recv WR */
-	u64			wr_rx_id_compl; /* seq # of last completed WR */
 	u32			wr_rx_cnt;	/* number of WR recv buffers */
 	unsigned long		wr_rx_tstamp;	/* jiffies when last buf rx */
-	wait_queue_head_t       wr_rx_empty_wait; /* wait for RQ empty */
 
 	struct ib_reg_wr	wr_reg;		/* WR register memory region */
+	struct ib_cqe		wr_reg_cqe;	/* ib_cqe for wr_reg */
 	wait_queue_head_t	wr_reg_wait;	/* wait for wr_reg result */
 	struct {
 		struct percpu_ref	wr_reg_refs;
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 9bb495707445..eaeb3bacc613 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -111,15 +111,6 @@ int smc_ib_modify_qp_rts(struct smc_link *lnk)
 			    IB_QP_MAX_QP_RD_ATOMIC);
 }
 
-int smc_ib_modify_qp_error(struct smc_link *lnk)
-{
-	struct ib_qp_attr qp_attr;
-
-	memset(&qp_attr, 0, sizeof(qp_attr));
-	qp_attr.qp_state = IB_QPS_ERR;
-	return ib_modify_qp(lnk->roce_qp, &qp_attr, IB_QP_STATE);
-}
-
 int smc_ib_ready_link(struct smc_link *lnk)
 {
 	struct smc_link_group *lgr = smc_get_lgr(lnk);
@@ -133,10 +124,7 @@ int smc_ib_ready_link(struct smc_link *lnk)
 	if (rc)
 		goto out;
 	smc_wr_remember_qp_attr(lnk);
-	rc = ib_req_notify_cq(lnk->smcibdev->roce_cq_recv,
-			      IB_CQ_SOLICITED_MASK);
-	if (rc)
-		goto out;
+
 	rc = smc_wr_rx_post_init(lnk);
 	if (rc)
 		goto out;
@@ -657,38 +645,59 @@ void smc_ib_destroy_queue_pair(struct smc_link *lnk)
 	if (lnk->roce_qp)
 		ib_destroy_qp(lnk->roce_qp);
 	lnk->roce_qp = NULL;
+	if (lnk->ib_cq) {
+		ib_cq_pool_put(lnk->ib_cq, lnk->nr_cqe);
+		lnk->ib_cq = NULL;
+	}
 }
 
 /* create a queue pair within the protection domain for a link */
 int smc_ib_create_queue_pair(struct smc_link *lnk)
 {
+	int max_send_wr, max_recv_wr, rc;
+	struct ib_cq *cq;
+
+	/* include unsolicited rdma_writes as well,
+	 * there are max. 2 RDMA_WRITE per 1 WR_SEND.
+	 */
+	max_send_wr = 3 * lnk->lgr->max_send_wr;
+	max_recv_wr = lnk->lgr->max_recv_wr + 1;	/* +1 for ib_drain_rq() */
+
+	cq = ib_cq_pool_get(lnk->smcibdev->ibdev, max_send_wr + max_recv_wr, -1,
+			    IB_POLL_SOFTIRQ);
+
+	if (IS_ERR(cq)) {
+		rc = PTR_ERR(cq);
+		return rc;
+	}
+
 	struct ib_qp_init_attr qp_attr = {
 		.event_handler = smc_ib_qp_event_handler,
 		.qp_context = lnk,
-		.send_cq = lnk->smcibdev->roce_cq_send,
-		.recv_cq = lnk->smcibdev->roce_cq_recv,
+		.send_cq = cq,
+		.recv_cq = cq,
 		.srq = NULL,
 		.cap = {
 			.max_send_sge = SMC_IB_MAX_SEND_SGE,
 			.max_recv_sge = lnk->wr_rx_sge_cnt,
+			.max_send_wr = max_send_wr,
+			.max_recv_wr = max_recv_wr,
 			.max_inline_data = 0,
 		},
 		.sq_sig_type = IB_SIGNAL_REQ_WR,
 		.qp_type = IB_QPT_RC,
 	};
-	int rc;
 
-	/* include unsolicited rdma_writes as well,
-	 * there are max. 2 RDMA_WRITE per 1 WR_SEND
-	 */
-	qp_attr.cap.max_send_wr = 3 * lnk->lgr->max_send_wr;
-	qp_attr.cap.max_recv_wr = lnk->lgr->max_recv_wr;
 	lnk->roce_qp = ib_create_qp(lnk->roce_pd, &qp_attr);
 	rc = PTR_ERR_OR_ZERO(lnk->roce_qp);
-	if (IS_ERR(lnk->roce_qp))
+	if (IS_ERR(lnk->roce_qp)) {
 		lnk->roce_qp = NULL;
-	else
+		ib_cq_pool_put(cq, max_send_wr + max_recv_wr);
+	} else {
 		smc_wr_remember_qp_attr(lnk);
+		lnk->nr_cqe = max_send_wr + max_recv_wr;
+		lnk->ib_cq = cq;
+	}
 	return rc;
 }
 
@@ -838,62 +847,6 @@ void smc_ib_buf_unmap_sg(struct smc_link *lnk,
 	buf_slot->sgt[lnk->link_idx].sgl->dma_address = 0;
 }
 
-long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
-{
-	struct ib_cq_init_attr cqattr =	{
-		.cqe = SMC_MAX_CQE, .comp_vector = 0 };
-	int cqe_size_order, smc_order;
-	long rc;
-
-	mutex_lock(&smcibdev->mutex);
-	rc = 0;
-	if (smcibdev->initialized)
-		goto out;
-	/* the calculated number of cq entries fits to mlx5 cq allocation */
-	cqe_size_order = cache_line_size() == 128 ? 7 : 6;
-	smc_order = MAX_PAGE_ORDER - cqe_size_order;
-	if (SMC_MAX_CQE + 2 > (0x00000001 << smc_order) * PAGE_SIZE)
-		cqattr.cqe = (0x00000001 << smc_order) * PAGE_SIZE - 2;
-	smcibdev->roce_cq_send = ib_create_cq(smcibdev->ibdev,
-					      smc_wr_tx_cq_handler, NULL,
-					      smcibdev, &cqattr);
-	rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_send);
-	if (IS_ERR(smcibdev->roce_cq_send)) {
-		smcibdev->roce_cq_send = NULL;
-		goto out;
-	}
-	smcibdev->roce_cq_recv = ib_create_cq(smcibdev->ibdev,
-					      smc_wr_rx_cq_handler, NULL,
-					      smcibdev, &cqattr);
-	rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_recv);
-	if (IS_ERR(smcibdev->roce_cq_recv)) {
-		smcibdev->roce_cq_recv = NULL;
-		goto err;
-	}
-	smc_wr_add_dev(smcibdev);
-	smcibdev->initialized = 1;
-	goto out;
-
-err:
-	ib_destroy_cq(smcibdev->roce_cq_send);
-out:
-	mutex_unlock(&smcibdev->mutex);
-	return rc;
-}
-
-static void smc_ib_cleanup_per_ibdev(struct smc_ib_device *smcibdev)
-{
-	mutex_lock(&smcibdev->mutex);
-	if (!smcibdev->initialized)
-		goto out;
-	smcibdev->initialized = 0;
-	ib_destroy_cq(smcibdev->roce_cq_recv);
-	ib_destroy_cq(smcibdev->roce_cq_send);
-	smc_wr_remove_dev(smcibdev);
-out:
-	mutex_unlock(&smcibdev->mutex);
-}
-
 static struct ib_client smc_ib_client;
 
 static void smc_copy_netdev_ifindex(struct smc_ib_device *smcibdev, int port)
@@ -952,7 +905,6 @@ static int smc_ib_add_dev(struct ib_device *ibdev)
 	INIT_WORK(&smcibdev->port_event_work, smc_ib_port_event_work);
 	atomic_set(&smcibdev->lnk_cnt, 0);
 	init_waitqueue_head(&smcibdev->lnks_deleted);
-	mutex_init(&smcibdev->mutex);
 	mutex_lock(&smc_ib_devices.mutex);
 	list_add_tail(&smcibdev->list, &smc_ib_devices.list);
 	mutex_unlock(&smc_ib_devices.mutex);
@@ -1001,7 +953,6 @@ static void smc_ib_remove_dev(struct ib_device *ibdev, void *client_data)
 	pr_warn_ratelimited("smc: removing ib device %s\n",
 			    smcibdev->ibdev->name);
 	smc_smcr_terminate_all(smcibdev);
-	smc_ib_cleanup_per_ibdev(smcibdev);
 	ib_unregister_event_handler(&smcibdev->event_handler);
 	cancel_work_sync(&smcibdev->port_event_work);
 	kfree(smcibdev);
diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h
index ef8ac2b7546d..a75fe8bcef3a 100644
--- a/net/smc/smc_ib.h
+++ b/net/smc/smc_ib.h
@@ -37,17 +37,12 @@ struct smc_ib_device {				/* ib-device infos for smc */
 	struct ib_device	*ibdev;
 	struct ib_port_attr	pattr[SMC_MAX_PORTS];	/* ib dev. port attrs */
 	struct ib_event_handler	event_handler;	/* global ib_event handler */
-	struct ib_cq		*roce_cq_send;	/* send completion queue */
-	struct ib_cq		*roce_cq_recv;	/* recv completion queue */
-	struct tasklet_struct	send_tasklet;	/* called by send cq handler */
-	struct tasklet_struct	recv_tasklet;	/* called by recv cq handler */
 	char			mac[SMC_MAX_PORTS][ETH_ALEN];
 						/* mac address per port*/
 	u8			pnetid[SMC_MAX_PORTS][SMC_MAX_PNETID_LEN];
 						/* pnetid per port */
 	bool			pnetid_by_user[SMC_MAX_PORTS];
 						/* pnetid defined by user? */
-	u8			initialized : 1; /* ib dev CQ, evthdl done */
 	struct work_struct	port_event_work;
 	unsigned long		port_event_mask;
 	DECLARE_BITMAP(ports_going_away, SMC_MAX_PORTS);
@@ -96,8 +91,6 @@ void smc_ib_destroy_queue_pair(struct smc_link *lnk);
 int smc_ib_create_queue_pair(struct smc_link *lnk);
 int smc_ib_ready_link(struct smc_link *lnk);
 int smc_ib_modify_qp_rts(struct smc_link *lnk);
-int smc_ib_modify_qp_error(struct smc_link *lnk);
-long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev);
 int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags,
 			     struct smc_buf_desc *buf_slot, u8 link_idx);
 void smc_ib_put_memory_region(struct ib_mr *mr);
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 3144b4b1fe29..d301df9ed58b 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -321,7 +321,6 @@ static int smc_tx_rdma_write(struct smc_connection *conn, int peer_rmbe_offset,
 	struct smc_link *link = conn->lnk;
 	int rc;
 
-	rdma_wr->wr.wr_id = smc_wr_tx_get_next_wr_id(link);
 	rdma_wr->wr.num_sge = num_sges;
 	rdma_wr->remote_addr =
 		lgr->rtokens[conn->rtoken_idx][link->link_idx].dma_addr +
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 59c92b46945c..48037a3d97a3 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -31,14 +31,11 @@
 #include "smc.h"
 #include "smc_wr.h"
 
-#define SMC_WR_MAX_POLL_CQE 10	/* max. # of compl. queue elements in 1 poll */
-
 #define SMC_WR_RX_HASH_BITS 4
 static DEFINE_HASHTABLE(smc_wr_rx_hash, SMC_WR_RX_HASH_BITS);
 static DEFINE_SPINLOCK(smc_wr_rx_hash_lock);
 
 struct smc_wr_tx_pend {	/* control data for a pending send request */
-	u64			wr_id;		/* work request id sent */
 	smc_wr_tx_handler	handler;
 	enum ib_wc_status	wc_status;	/* CQE status */
 	struct smc_link		*link;
@@ -63,55 +60,52 @@ void smc_wr_tx_wait_no_pending_sends(struct smc_link *link)
 	wait_event(link->wr_tx_wait, !smc_wr_is_tx_pend(link));
 }
 
-static inline int smc_wr_tx_find_pending_index(struct smc_link *link, u64 wr_id)
+static void smc_wr_tx_rdma_process_cqe(struct ib_cq *cq, struct ib_wc *wc)
 {
-	u32 i;
+	struct smc_link *link = wc->qp->qp_context;
 
-	for (i = 0; i < link->wr_tx_cnt; i++) {
-		if (link->wr_tx_pends[i].wr_id == wr_id)
-			return i;
-	}
-	return link->wr_tx_cnt;
+	/* terminate link */
+	if (wc->status)
+		smcr_link_down_cond_sched(link);
+}
+
+static void smc_wr_reg_process_cqe(struct ib_cq *cq, struct ib_wc *wc)
+{
+	struct smc_link *link = wc->qp->qp_context;
+
+	if (wc->status)
+		link->wr_reg_state = FAILED;
+	else
+		link->wr_reg_state = CONFIRMED;
+	smc_wr_wakeup_reg_wait(link);
 }
 
-static inline void smc_wr_tx_process_cqe(struct ib_wc *wc)
+static void smc_wr_tx_process_cqe(struct ib_cq *cq, struct ib_wc *wc)
 {
-	struct smc_wr_tx_pend pnd_snd;
+	struct smc_wr_tx_pend *tx_pend, pnd_snd;
+	struct smc_ib_send_wr *send_wr;
 	struct smc_link *link;
 	u32 pnd_snd_idx;
 
 	link = wc->qp->qp_context;
 
-	if (wc->opcode == IB_WC_REG_MR) {
-		if (wc->status)
-			link->wr_reg_state = FAILED;
-		else
-			link->wr_reg_state = CONFIRMED;
-		smc_wr_wakeup_reg_wait(link);
-		return;
-	}
+	send_wr = container_of(wc->wr_cqe, struct smc_ib_send_wr, cqe);
+	pnd_snd_idx = send_wr->idx;
+
+	tx_pend = (pnd_snd_idx == link->wr_tx_cnt) ? link->wr_tx_v2_pend :
+		&link->wr_tx_pends[pnd_snd_idx];
+
+	tx_pend->wc_status = wc->status;
+	memcpy(&pnd_snd, tx_pend, sizeof(pnd_snd));
+	/* clear the full struct smc_wr_tx_pend including .priv */
+	memset(tx_pend, 0, sizeof(*tx_pend));
 
-	pnd_snd_idx = smc_wr_tx_find_pending_index(link, wc->wr_id);
 	if (pnd_snd_idx == link->wr_tx_cnt) {
-		if (link->lgr->smc_version != SMC_V2 ||
-		    link->wr_tx_v2_pend->wr_id != wc->wr_id)
-			return;
-		link->wr_tx_v2_pend->wc_status = wc->status;
-		memcpy(&pnd_snd, link->wr_tx_v2_pend, sizeof(pnd_snd));
-		/* clear the full struct smc_wr_tx_pend including .priv */
-		memset(link->wr_tx_v2_pend, 0,
-		       sizeof(*link->wr_tx_v2_pend));
 		memset(link->lgr->wr_tx_buf_v2, 0,
 		       sizeof(*link->lgr->wr_tx_buf_v2));
 	} else {
-		link->wr_tx_pends[pnd_snd_idx].wc_status = wc->status;
-		if (link->wr_tx_pends[pnd_snd_idx].compl_requested)
+		if (pnd_snd.compl_requested)
 			complete(&link->wr_tx_compl[pnd_snd_idx]);
-		memcpy(&pnd_snd, &link->wr_tx_pends[pnd_snd_idx],
-		       sizeof(pnd_snd));
-		/* clear the full struct smc_wr_tx_pend including .priv */
-		memset(&link->wr_tx_pends[pnd_snd_idx], 0,
-		       sizeof(link->wr_tx_pends[pnd_snd_idx]));
 		memset(&link->wr_tx_bufs[pnd_snd_idx], 0,
 		       sizeof(link->wr_tx_bufs[pnd_snd_idx]));
 		if (!test_and_clear_bit(pnd_snd_idx, link->wr_tx_mask))
@@ -133,39 +127,6 @@ static inline void smc_wr_tx_process_cqe(struct ib_wc *wc)
 	wake_up(&link->wr_tx_wait);
 }
 
-static void smc_wr_tx_tasklet_fn(struct tasklet_struct *t)
-{
-	struct smc_ib_device *dev = from_tasklet(dev, t, send_tasklet);
-	struct ib_wc wc[SMC_WR_MAX_POLL_CQE];
-	int i = 0, rc;
-	int polled = 0;
-
-again:
-	polled++;
-	do {
-		memset(&wc, 0, sizeof(wc));
-		rc = ib_poll_cq(dev->roce_cq_send, SMC_WR_MAX_POLL_CQE, wc);
-		if (polled == 1) {
-			ib_req_notify_cq(dev->roce_cq_send,
-					 IB_CQ_NEXT_COMP |
-					 IB_CQ_REPORT_MISSED_EVENTS);
-		}
-		if (!rc)
-			break;
-		for (i = 0; i < rc; i++)
-			smc_wr_tx_process_cqe(&wc[i]);
-	} while (rc > 0);
-	if (polled == 1)
-		goto again;
-}
-
-void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
-{
-	struct smc_ib_device *dev = (struct smc_ib_device *)cq_context;
-
-	tasklet_schedule(&dev->send_tasklet);
-}
-
 /*---------------------------- request submission ---------------------------*/
 
 static inline int smc_wr_tx_get_free_slot_index(struct smc_link *link, u32 *idx)
@@ -201,8 +162,6 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
 	struct smc_link_group *lgr = smc_get_lgr(link);
 	struct smc_wr_tx_pend *wr_pend;
 	u32 idx = link->wr_tx_cnt;
-	struct ib_send_wr *wr_ib;
-	u64 wr_id;
 	int rc;
 
 	*wr_buf = NULL;
@@ -226,14 +185,10 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
 		if (idx == link->wr_tx_cnt)
 			return -EPIPE;
 	}
-	wr_id = smc_wr_tx_get_next_wr_id(link);
 	wr_pend = &link->wr_tx_pends[idx];
-	wr_pend->wr_id = wr_id;
 	wr_pend->handler = handler;
 	wr_pend->link = link;
 	wr_pend->idx = idx;
-	wr_ib = &link->wr_tx_ibs[idx];
-	wr_ib->wr_id = wr_id;
 	*wr_buf = &link->wr_tx_bufs[idx];
 	if (wr_rdma_buf)
 		*wr_rdma_buf = &link->wr_tx_rdmas[idx];
@@ -247,22 +202,16 @@ int smc_wr_tx_get_v2_slot(struct smc_link *link,
 			  struct smc_wr_tx_pend_priv **wr_pend_priv)
 {
 	struct smc_wr_tx_pend *wr_pend;
-	struct ib_send_wr *wr_ib;
-	u64 wr_id;
 
 	if (link->wr_tx_v2_pend->idx == link->wr_tx_cnt)
 		return -EBUSY;
 
 	*wr_buf = NULL;
 	*wr_pend_priv = NULL;
-	wr_id = smc_wr_tx_get_next_wr_id(link);
 	wr_pend = link->wr_tx_v2_pend;
-	wr_pend->wr_id = wr_id;
 	wr_pend->handler = handler;
 	wr_pend->link = link;
 	wr_pend->idx = link->wr_tx_cnt;
-	wr_ib = link->wr_tx_v2_ib;
-	wr_ib->wr_id = wr_id;
 	*wr_buf = link->lgr->wr_tx_buf_v2;
 	*wr_pend_priv = &wr_pend->priv;
 	return 0;
@@ -306,10 +255,8 @@ int smc_wr_tx_send(struct smc_link *link, struct smc_wr_tx_pend_priv *priv)
 	struct smc_wr_tx_pend *pend;
 	int rc;
 
-	ib_req_notify_cq(link->smcibdev->roce_cq_send,
-			 IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
 	pend = container_of(priv, struct smc_wr_tx_pend, priv);
-	rc = ib_post_send(link->roce_qp, &link->wr_tx_ibs[pend->idx], NULL);
+	rc = ib_post_send(link->roce_qp, &link->wr_tx_ibs[pend->idx].wr, NULL);
 	if (rc) {
 		smc_wr_tx_put_slot(link, priv);
 		smcr_link_down_cond_sched(link);
@@ -322,10 +269,8 @@ int smc_wr_tx_v2_send(struct smc_link *link, struct smc_wr_tx_pend_priv *priv,
 {
 	int rc;
 
-	link->wr_tx_v2_ib->sg_list[0].length = len;
-	ib_req_notify_cq(link->smcibdev->roce_cq_send,
-			 IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
-	rc = ib_post_send(link->roce_qp, link->wr_tx_v2_ib, NULL);
+	link->wr_tx_v2_ib->wr.sg_list[0].length = len;
+	rc = ib_post_send(link->roce_qp, &link->wr_tx_v2_ib->wr, NULL);
 	if (rc) {
 		smc_wr_tx_put_slot(link, priv);
 		smcr_link_down_cond_sched(link);
@@ -367,10 +312,7 @@ int smc_wr_reg_send(struct smc_link *link, struct ib_mr *mr)
 {
 	int rc;
 
-	ib_req_notify_cq(link->smcibdev->roce_cq_send,
-			 IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
 	link->wr_reg_state = POSTED;
-	link->wr_reg.wr.wr_id = (u64)(uintptr_t)mr;
 	link->wr_reg.mr = mr;
 	link->wr_reg.key = mr->rkey;
 	rc = ib_post_send(link->roce_qp, &link->wr_reg.wr, NULL);
@@ -431,94 +373,74 @@ static inline void smc_wr_rx_demultiplex(struct ib_wc *wc)
 {
 	struct smc_link *link = (struct smc_link *)wc->qp->qp_context;
 	struct smc_wr_rx_handler *handler;
+	struct smc_ib_recv_wr *recv_wr;
 	struct smc_wr_rx_hdr *wr_rx;
-	u64 temp_wr_id;
-	u32 index;
 
 	if (wc->byte_len < sizeof(*wr_rx))
 		return; /* short message */
-	temp_wr_id = wc->wr_id;
-	index = do_div(temp_wr_id, link->wr_rx_cnt);
-	wr_rx = (struct smc_wr_rx_hdr *)(link->wr_rx_bufs + index * link->wr_rx_buflen);
+
+	recv_wr = container_of(wc->wr_cqe, struct smc_ib_recv_wr, cqe);
+
+	wr_rx = (struct smc_wr_rx_hdr *)(link->wr_rx_bufs + recv_wr->idx * link->wr_rx_buflen);
 	hash_for_each_possible(smc_wr_rx_hash, handler, list, wr_rx->type) {
 		if (handler->type == wr_rx->type)
 			handler->handler(wc, wr_rx);
 	}
 }
 
-static inline void smc_wr_rx_process_cqes(struct ib_wc wc[], int num)
+static void smc_wr_rx_process_cqe(struct ib_cq *cq, struct ib_wc *wc)
 {
-	struct smc_link *link;
-	int i;
+	struct smc_link *link = wc->qp->qp_context;
 
-	for (i = 0; i < num; i++) {
-		link = wc[i].qp->qp_context;
-		link->wr_rx_id_compl = wc[i].wr_id;
-		if (wc[i].status == IB_WC_SUCCESS) {
-			link->wr_rx_tstamp = jiffies;
-			smc_wr_rx_demultiplex(&wc[i]);
-			smc_wr_rx_post(link); /* refill WR RX */
-		} else {
-			/* handle status errors */
-			switch (wc[i].status) {
-			case IB_WC_RETRY_EXC_ERR:
-			case IB_WC_RNR_RETRY_EXC_ERR:
-			case IB_WC_WR_FLUSH_ERR:
-				smcr_link_down_cond_sched(link);
-				if (link->wr_rx_id_compl == link->wr_rx_id)
-					wake_up(&link->wr_rx_empty_wait);
-				break;
-			default:
-				smc_wr_rx_post(link); /* refill WR RX */
-				break;
-			}
+	if (wc->status == IB_WC_SUCCESS) {
+		link->wr_rx_tstamp = jiffies;
+		smc_wr_rx_demultiplex(wc);
+		smc_wr_rx_post(link, wc->wr_cqe); /* refill WR RX */
+	} else {
+		/* handle status errors */
+		switch (wc->status) {
+		case IB_WC_RETRY_EXC_ERR:
+		case IB_WC_RNR_RETRY_EXC_ERR:
+		case IB_WC_WR_FLUSH_ERR:
+			smcr_link_down_cond_sched(link);
+			break;
+		default:
+			smc_wr_rx_post(link, wc->wr_cqe); /* refill WR RX */
+			break;
 		}
 	}
 }
 
-static void smc_wr_rx_tasklet_fn(struct tasklet_struct *t)
+int smc_wr_rx_post_init(struct smc_link *link)
 {
-	struct smc_ib_device *dev = from_tasklet(dev, t, recv_tasklet);
-	struct ib_wc wc[SMC_WR_MAX_POLL_CQE];
-	int polled = 0;
-	int rc;
+	int i, rc = 0;
 
-again:
-	polled++;
-	do {
-		memset(&wc, 0, sizeof(wc));
-		rc = ib_poll_cq(dev->roce_cq_recv, SMC_WR_MAX_POLL_CQE, wc);
-		if (polled == 1) {
-			ib_req_notify_cq(dev->roce_cq_recv,
-					 IB_CQ_SOLICITED_MASK
-					 | IB_CQ_REPORT_MISSED_EVENTS);
-		}
-		if (!rc)
-			break;
-		smc_wr_rx_process_cqes(&wc[0], rc);
-	} while (rc > 0);
-	if (polled == 1)
-		goto again;
+	for (i = 0; i < link->wr_rx_cnt; i++)
+		rc = smc_wr_rx_post(link, &link->wr_rx_ibs[i].cqe);
+	return rc;
 }
 
-void smc_wr_rx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
-{
-	struct smc_ib_device *dev = (struct smc_ib_device *)cq_context;
+/***************************** init, exit, misc ******************************/
 
-	tasklet_schedule(&dev->recv_tasklet);
+static inline void smc_wr_reg_init_cqe(struct ib_cqe *cqe)
+{
+	cqe->done = smc_wr_reg_process_cqe;
 }
 
-int smc_wr_rx_post_init(struct smc_link *link)
+static inline void smc_wr_tx_init_cqe(struct ib_cqe *cqe)
 {
-	u32 i;
-	int rc = 0;
+	cqe->done = smc_wr_tx_process_cqe;
+}
 
-	for (i = 0; i < link->wr_rx_cnt; i++)
-		rc = smc_wr_rx_post(link);
-	return rc;
+static inline void smc_wr_rx_init_cqe(struct ib_cqe *cqe)
+{
+	cqe->done = smc_wr_rx_process_cqe;
 }
 
-/***************************** init, exit, misc ******************************/
+static inline void smc_wr_tx_rdma_init_cqe(struct ib_cqe *cqe)
+{
+	cqe->done = smc_wr_tx_rdma_process_cqe;
+}
 
 void smc_wr_remember_qp_attr(struct smc_link *lnk)
 {
@@ -550,7 +472,7 @@ void smc_wr_remember_qp_attr(struct smc_link *lnk)
 	lnk->wr_tx_cnt = min_t(size_t, lnk->max_send_wr,
 			       lnk->qp_attr.cap.max_send_wr);
 	lnk->wr_rx_cnt = min_t(size_t, lnk->max_recv_wr,
-			       lnk->qp_attr.cap.max_recv_wr);
+			       lnk->qp_attr.cap.max_recv_wr - 1);	/* -1 for ib_drain_rq() */
 }
 
 static void smc_wr_init_sge(struct smc_link *lnk)
@@ -571,14 +493,14 @@ static void smc_wr_init_sge(struct smc_link *lnk)
 			lnk->roce_pd->local_dma_lkey;
 		lnk->wr_tx_rdma_sges[i].tx_rdma_sge[1].wr_tx_rdma_sge[1].lkey =
 			lnk->roce_pd->local_dma_lkey;
-		lnk->wr_tx_ibs[i].next = NULL;
-		lnk->wr_tx_ibs[i].sg_list = &lnk->wr_tx_sges[i];
-		lnk->wr_tx_ibs[i].num_sge = 1;
-		lnk->wr_tx_ibs[i].opcode = IB_WR_SEND;
-		lnk->wr_tx_ibs[i].send_flags =
+		lnk->wr_tx_ibs[i].wr.next = NULL;
+		lnk->wr_tx_ibs[i].wr.sg_list = &lnk->wr_tx_sges[i];
+		lnk->wr_tx_ibs[i].wr.num_sge = 1;
+		lnk->wr_tx_ibs[i].wr.opcode = IB_WR_SEND;
+		lnk->wr_tx_ibs[i].wr.send_flags =
 			IB_SEND_SIGNALED | IB_SEND_SOLICITED;
 		if (send_inline)
-			lnk->wr_tx_ibs[i].send_flags |= IB_SEND_INLINE;
+			lnk->wr_tx_ibs[i].wr.send_flags |= IB_SEND_INLINE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[0].wr.opcode = IB_WR_RDMA_WRITE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[1].wr.opcode = IB_WR_RDMA_WRITE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[0].wr.sg_list =
@@ -592,11 +514,11 @@ static void smc_wr_init_sge(struct smc_link *lnk)
 		lnk->wr_tx_v2_sge->length = SMC_WR_BUF_V2_SIZE;
 		lnk->wr_tx_v2_sge->lkey = lnk->roce_pd->local_dma_lkey;
 
-		lnk->wr_tx_v2_ib->next = NULL;
-		lnk->wr_tx_v2_ib->sg_list = lnk->wr_tx_v2_sge;
-		lnk->wr_tx_v2_ib->num_sge = 1;
-		lnk->wr_tx_v2_ib->opcode = IB_WR_SEND;
-		lnk->wr_tx_v2_ib->send_flags =
+		lnk->wr_tx_v2_ib->wr.next = NULL;
+		lnk->wr_tx_v2_ib->wr.sg_list = lnk->wr_tx_v2_sge;
+		lnk->wr_tx_v2_ib->wr.num_sge = 1;
+		lnk->wr_tx_v2_ib->wr.opcode = IB_WR_SEND;
+		lnk->wr_tx_v2_ib->wr.send_flags =
 			IB_SEND_SIGNALED | IB_SEND_SOLICITED;
 	}
 
@@ -622,10 +544,11 @@ static void smc_wr_init_sge(struct smc_link *lnk)
 			lnk->wr_rx_sges[x + 1].lkey =
 					lnk->roce_pd->local_dma_lkey;
 		}
-		lnk->wr_rx_ibs[i].next = NULL;
-		lnk->wr_rx_ibs[i].sg_list = &lnk->wr_rx_sges[x];
-		lnk->wr_rx_ibs[i].num_sge = lnk->wr_rx_sge_cnt;
+		lnk->wr_rx_ibs[i].wr.next = NULL;
+		lnk->wr_rx_ibs[i].wr.sg_list = &lnk->wr_rx_sges[x];
+		lnk->wr_rx_ibs[i].wr.num_sge = lnk->wr_rx_sge_cnt;
 	}
+
 	lnk->wr_reg.wr.next = NULL;
 	lnk->wr_reg.wr.num_sge = 0;
 	lnk->wr_reg.wr.send_flags = IB_SEND_SIGNALED;
@@ -641,7 +564,6 @@ void smc_wr_free_link(struct smc_link *lnk)
 		return;
 	ibdev = lnk->smcibdev->ibdev;
 
-	smc_wr_drain_cq(lnk);
 	smc_wr_wakeup_reg_wait(lnk);
 	smc_wr_wakeup_tx_wait(lnk);
 
@@ -826,18 +748,6 @@ int smc_wr_alloc_link_mem(struct smc_link *link)
 	return -ENOMEM;
 }
 
-void smc_wr_remove_dev(struct smc_ib_device *smcibdev)
-{
-	tasklet_kill(&smcibdev->recv_tasklet);
-	tasklet_kill(&smcibdev->send_tasklet);
-}
-
-void smc_wr_add_dev(struct smc_ib_device *smcibdev)
-{
-	tasklet_setup(&smcibdev->recv_tasklet, smc_wr_rx_tasklet_fn);
-	tasklet_setup(&smcibdev->send_tasklet, smc_wr_tx_tasklet_fn);
-}
-
 static void smcr_wr_tx_refs_free(struct percpu_ref *ref)
 {
 	struct smc_link *lnk = container_of(ref, struct smc_link, wr_tx_refs);
@@ -857,8 +767,6 @@ int smc_wr_create_link(struct smc_link *lnk)
 	struct ib_device *ibdev = lnk->smcibdev->ibdev;
 	int rc = 0;
 
-	smc_wr_tx_set_wr_id(&lnk->wr_tx_id, 0);
-	lnk->wr_rx_id = 0;
 	lnk->wr_rx_dma_addr = ib_dma_map_single(
 		ibdev, lnk->wr_rx_bufs,	lnk->wr_rx_buflen * lnk->wr_rx_cnt,
 		DMA_FROM_DEVICE);
@@ -906,7 +814,6 @@ int smc_wr_create_link(struct smc_link *lnk)
 	if (rc)
 		goto cancel_ref;
 	init_completion(&lnk->reg_ref_comp);
-	init_waitqueue_head(&lnk->wr_rx_empty_wait);
 	return rc;
 
 cancel_ref:
@@ -931,3 +838,42 @@ int smc_wr_create_link(struct smc_link *lnk)
 out:
 	return rc;
 }
+
+void smc_wr_init_cqes(struct smc_link *lnk)
+{
+	int i;
+
+	/* init CQE for WR fast reg */
+	smc_wr_reg_init_cqe(&lnk->wr_reg_cqe);
+	lnk->wr_reg.wr.wr_cqe = &lnk->wr_reg_cqe;
+
+	/* init CQE for WR WRITE */
+	for (i = 0; i < lnk->wr_tx_cnt; i++) {
+		int n;
+
+		smc_wr_tx_rdma_init_cqe(&lnk->wr_tx_rdmas[i].cqe);
+		for (n = 0; n < SMC_MAX_RDMA_WRITES; n++)
+			lnk->wr_tx_rdmas[i].wr_tx_rdma[n].wr.wr_cqe = &lnk->wr_tx_rdmas[i].cqe;
+	}
+
+	/* init CQEs for WR RECV */
+	for (i = 0; i < lnk->wr_rx_cnt; i++) {
+		smc_wr_rx_init_cqe(&lnk->wr_rx_ibs[i].cqe);
+		lnk->wr_rx_ibs[i].wr.wr_cqe = &lnk->wr_rx_ibs[i].cqe;
+		lnk->wr_rx_ibs[i].idx = i;
+	}
+
+	/* init CQEs for WR SEND */
+	for (i = 0; i < lnk->wr_tx_cnt; i++) {
+		smc_wr_tx_init_cqe(&lnk->wr_tx_ibs[i].cqe);
+		lnk->wr_tx_ibs[i].wr.wr_cqe = &lnk->wr_tx_ibs[i].cqe;
+		lnk->wr_tx_ibs[i].idx = i;
+	}
+
+	/* init CQE for SMC-Rv2 WR SEND */
+	if (lnk->lgr->smc_version == SMC_V2) {
+		smc_wr_tx_init_cqe(&lnk->wr_tx_v2_ib->cqe);
+		lnk->wr_tx_v2_ib->wr.wr_cqe = &lnk->wr_tx_v2_ib->cqe;
+		lnk->wr_tx_v2_ib->idx = lnk->wr_tx_cnt;
+	}
+}
diff --git a/net/smc/smc_wr.h b/net/smc/smc_wr.h
index aa4533af9122..295575fb060a 100644
--- a/net/smc/smc_wr.h
+++ b/net/smc/smc_wr.h
@@ -44,19 +44,6 @@ struct smc_wr_rx_handler {
 	u8			type;
 };
 
-/* Only used by RDMA write WRs.
- * All other WRs (CDC/LLC) use smc_wr_tx_send handling WR_ID implicitly
- */
-static inline long smc_wr_tx_get_next_wr_id(struct smc_link *link)
-{
-	return atomic_long_inc_return(&link->wr_tx_id);
-}
-
-static inline void smc_wr_tx_set_wr_id(atomic_long_t *wr_tx_id, long val)
-{
-	atomic_long_set(wr_tx_id, val);
-}
-
 static inline bool smc_wr_tx_link_hold(struct smc_link *link)
 {
 	if (!smc_link_sendable(link))
@@ -70,9 +57,10 @@ static inline void smc_wr_tx_link_put(struct smc_link *link)
 	percpu_ref_put(&link->wr_tx_refs);
 }
 
-static inline void smc_wr_drain_cq(struct smc_link *lnk)
+static inline void smc_wr_drain_rq(struct smc_link *lnk)
 {
-	wait_event(lnk->wr_rx_empty_wait, lnk->wr_rx_id_compl == lnk->wr_rx_id);
+	if (lnk->qp_attr.cur_qp_state != IB_QPS_RESET)
+		ib_drain_rq(lnk->roce_qp);
 }
 
 static inline void smc_wr_wakeup_tx_wait(struct smc_link *lnk)
@@ -86,18 +74,12 @@ static inline void smc_wr_wakeup_reg_wait(struct smc_link *lnk)
 }
 
 /* post a new receive work request to fill a completed old work request entry */
-static inline int smc_wr_rx_post(struct smc_link *link)
+static inline int smc_wr_rx_post(struct smc_link *link, struct ib_cqe *cqe)
 {
-	int rc;
-	u64 wr_id, temp_wr_id;
-	u32 index;
-
-	wr_id = ++link->wr_rx_id; /* tasklet context, thus not atomic */
-	temp_wr_id = wr_id;
-	index = do_div(temp_wr_id, link->wr_rx_cnt);
-	link->wr_rx_ibs[index].wr_id = wr_id;
-	rc = ib_post_recv(link->roce_qp, &link->wr_rx_ibs[index], NULL);
-	return rc;
+	struct smc_ib_recv_wr *recv_wr;
+
+	recv_wr = container_of(cqe, struct smc_ib_recv_wr, cqe);
+	return ib_post_recv(link->roce_qp, &recv_wr->wr, NULL);
 }
 
 int smc_wr_create_link(struct smc_link *lnk);
@@ -107,8 +89,6 @@ void smc_wr_free_link(struct smc_link *lnk);
 void smc_wr_free_link_mem(struct smc_link *lnk);
 void smc_wr_free_lgr_mem(struct smc_link_group *lgr);
 void smc_wr_remember_qp_attr(struct smc_link *lnk);
-void smc_wr_remove_dev(struct smc_ib_device *smcibdev);
-void smc_wr_add_dev(struct smc_ib_device *smcibdev);
 
 int smc_wr_tx_get_free_slot(struct smc_link *link, smc_wr_tx_handler handler,
 			    struct smc_wr_buf **wr_buf,
@@ -126,12 +106,12 @@ int smc_wr_tx_v2_send(struct smc_link *link,
 		      struct smc_wr_tx_pend_priv *priv, int len);
 int smc_wr_tx_send_wait(struct smc_link *link, struct smc_wr_tx_pend_priv *priv,
 			unsigned long timeout);
-void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context);
 void smc_wr_tx_wait_no_pending_sends(struct smc_link *link);
 
 int smc_wr_rx_register_handler(struct smc_wr_rx_handler *handler);
 int smc_wr_rx_post_init(struct smc_link *link);
-void smc_wr_rx_cq_handler(struct ib_cq *ib_cq, void *cq_context);
 int smc_wr_reg_send(struct smc_link *link, struct ib_mr *mr);
 
+void smc_wr_init_cqes(struct smc_link *lnk);
+
 #endif /* SMC_WR_H */
-- 
2.45.0



* [PATCH net-next 2/2] net/smc: reduce TX slot contention with exclusive wait
From: D. Wythe @ 2026-05-08  6:37 UTC (permalink / raw)
  To: David S. Miller, Dust Li, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Sidraya Jayagond, Wenjia Zhang
  Cc: Mahanta Jambigi, Simon Horman, Tony Lu, Wen Gu, linux-kernel,
	linux-rdma, linux-s390, netdev, oliver.yang, pasic
In-Reply-To: <20260508063718.101622-1-alibuda@linux.alibaba.com>

smc_wr_tx_get_free_slot() waits for a free TX slot with
wait_event_interruptible_timeout(). Since the wait_event family
enqueues waiters as non-exclusive, wake_up() may wake multiple
waiters even though only one can use the slot, causing
thundering-herd contention when slots are scarce.

Use an exclusive wait loop with prepare_to_wait_exclusive() so
wake_up() wakes only one waiter per freed slot.
smc_wr_wakeup_tx_wait() still uses wake_up_all() during link
teardown, so teardown behavior is unchanged.
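
The wake-up rule described above can be illustrated with a small single-threaded model. This is only a sketch, not kernel code: struct toy_waiter and toy_wake_up() are hypothetical names, and the model assumes the usual wait-queue behaviour, where a wake-up walks the queue in order, wakes every waiter it visits, and stops once the requested number of exclusive waiters has been woken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one queued waiter; not the kernel's wait queue entry. */
struct toy_waiter {
	bool exclusive;	/* models a waiter queued with the exclusive flag */
	bool woken;	/* set once the waiter has been woken */
};

/*
 * Toy model of a wake-up pass: walk the queue in order, wake each waiter
 * that is still asleep, and stop after nr_exclusive exclusive waiters
 * have been woken. Returns the number of waiters woken by this pass.
 */
static size_t toy_wake_up(struct toy_waiter *q, size_t n, int nr_exclusive)
{
	size_t woken = 0;

	assert(nr_exclusive > 0);
	for (size_t i = 0; i < n; i++) {
		if (q[i].woken)
			continue;
		q[i].woken = true;
		woken++;
		if (q[i].exclusive && --nr_exclusive == 0)
			break;
	}
	return woken;
}
```

In this model, a queue of four non-exclusive waiters is emptied by a single toy_wake_up(q, 4, 1) call, while a queue of four exclusive waiters gives up exactly one waiter per call, which is the one-waiter-per-freed-slot behaviour the patch is after.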

Performance measured with netperf TCP_RR (63 flows, 200B write /
1000B read, 60s duration):

+-------------------------------+---------------+---------------+
| smcr_max_conns_per_lgr        | 32            | 255           |
|-------------------------------+---------------+---------------|
| before                        | 4.85 Gb/s     | 657.95 Mb/s   |
|-------------------------------+---------------+---------------|
| after                         | 5.01 Gb/s     | 2.2 Gb/s      |
+-------------------------------+---------------+---------------+

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_wr.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 48037a3d97a3..0a6f2befb0e2 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -159,9 +159,11 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
 			    struct smc_rdma_wr **wr_rdma_buf,
 			    struct smc_wr_tx_pend_priv **wr_pend_priv)
 {
+	unsigned long timeout = SMC_WR_TX_WAIT_FREE_SLOT_TIME;
 	struct smc_link_group *lgr = smc_get_lgr(link);
 	struct smc_wr_tx_pend *wr_pend;
 	u32 idx = link->wr_tx_cnt;
+	DEFINE_WAIT(wait);
 	int rc;
 
 	*wr_buf = NULL;
@@ -171,17 +173,27 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
 		if (rc)
 			return rc;
 	} else {
-		rc = wait_event_interruptible_timeout(
-			link->wr_tx_wait,
-			!smc_link_sendable(link) ||
-			lgr->terminating ||
-			(smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY),
-			SMC_WR_TX_WAIT_FREE_SLOT_TIME);
-		if (!rc) {
-			/* timeout - terminate link */
-			smcr_link_down_cond_sched(link);
-			return -EPIPE;
+		rc = 0;
+		for (;;) {
+			prepare_to_wait_exclusive(&link->wr_tx_wait, &wait,
+						  TASK_INTERRUPTIBLE);
+			if (!smc_link_sendable(link) || lgr->terminating ||
+			    smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY)
+				break;
+			timeout = schedule_timeout(timeout);
+			if (!timeout) {
+				/* timeout - terminate link */
+				smcr_link_down_cond_sched(link);
+				break;
+			}
+			if (signal_pending(current)) {
+				rc = -ERESTARTSYS;
+				break;
+			}
 		}
+		finish_wait(&link->wr_tx_wait, &wait);
+		if (rc)
+			return rc;
 		if (idx == link->wr_tx_cnt)
 			return -EPIPE;
 	}
-- 
2.45.0



* [PATCH net v2] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present
From: Hyunwoo Kim @ 2026-05-08  6:42 UTC (permalink / raw)
  To: dhowells, marc.dionne, davem, edumazet, kuba, pabeni, horms,
	qingfang.deng
  Cc: linux-afs, netdev, stable, imv4bel

The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
handler in rxrpc_verify_response() copy the skb to a linear one before
calling into the security ops only when skb_cloned() is true.  An skb
that is not cloned but still carries paged fragments (skb->data_len != 0)
falls through to the in-place decryption path, which binds the frag
pages directly into the AEAD/skcipher SGL via skb_to_sgvec().

Extend the gate so that any skb with non-linear data is also copied,
ensuring the security handler always operates on a fully linear skb.
The OOM/trace handling already in place is reused.
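
The difference between the old and the new gate can be sketched with a toy model. struct toy_skb and both helpers are hypothetical stand-ins, not the kernel's struct sk_buff API; the only assumption taken from the description above is that an skb counts as non-linear when its data_len is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the two skb properties the gate looks at. */
struct toy_skb {
	bool cloned;		/* what skb_cloned() would report */
	unsigned int data_len;	/* bytes held in paged fragments */
};

/* Gate before the change: copy only when the skb is cloned. */
static bool toy_needs_copy_old(const struct toy_skb *skb)
{
	assert(skb != NULL);
	return skb->cloned;
}

/* Gate after the change: also copy when paged fragments are present. */
static bool toy_needs_copy_new(const struct toy_skb *skb)
{
	assert(skb != NULL);
	return skb->cloned || skb->data_len != 0;
}
```

The case that matters is an skb with cloned == false and data_len != 0: the old gate lets it through to in-place decryption, the new gate sends it down the copy path.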

Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
Changes in v2:
- Use skb_is_nonlinear() instead of skb->data_len
- v1: https://lore.kernel.org/all/afKV2zGR6rrelPC7@v4bel/
---
 net/rxrpc/call_event.c | 2 +-
 net/rxrpc/conn_event.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index fdd683261226..a6ad5ff6ec5f 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -334,7 +334,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
 
 			if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
 			    sp->hdr.securityIndex != 0 &&
-			    skb_cloned(skb)) {
+			    (skb_cloned(skb) || skb_is_nonlinear(skb))) {
 				/* Unshare the packet so that it can be
 				 * modified by in-place decryption.
 				 */
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index a2130d25aaa9..632cbeff1f5d 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -245,7 +245,7 @@ static int rxrpc_verify_response(struct rxrpc_connection *conn,
 {
 	int ret;
 
-	if (skb_cloned(skb)) {
+	if (skb_cloned(skb) || skb_is_nonlinear(skb)) {
 		/* Copy the packet if shared so that we can do in-place
 		 * decryption.
 		 */
-- 
2.43.0



* Re: [PATCH net] xfrm: route MIGRATE notifications to caller's netns
From: Steffen Klassert @ 2026-05-08  6:46 UTC (permalink / raw)
  To: Maoyi Xie
  Cc: herbert, davem, kuba, pabeni, edumazet, horms, antony.antony,
	netdev, linux-kernel, stable
In-Reply-To: <20260504142736.1228425-1-maoyi.xie@ntu.edu.sg>

On Mon, May 04, 2026 at 10:27:36PM +0800, Maoyi Xie wrote:
> xfrm_send_migrate() in net/xfrm/xfrm_user.c and pfkey_send_migrate()
> in net/key/af_key.c both hardcode &init_net for the multicast that
> announces a successful XFRM_MSG_MIGRATE / SADB_X_MIGRATE.
> 
> XFRM_MSG_MIGRATE arrives on a per-netns NETLINK_XFRM socket, and the
> rest of the xfrm/af_key netlink path was made netns-aware in 2008.
> The other 14 multicast paths in xfrm_user.c route their event using
> xs_net(x), xp_net(xp) or sock_net(skb->sk); only the migrate path
> was missed.
> 
> Two consequences of the init_net hardcoding:
> 
>   1. The notification (selector, old/new endpoint addresses, and the
>      km_address) is delivered to listeners on init_net's
>      XFRMNLGRP_MIGRATE / pfkey BROADCAST_ALL groups rather than on
>      the issuing netns. An IKE daemon running in init_net therefore
>      receives migration notifications originating from any other
>      netns on the host.
> 
>   2. An IKE daemon running inside a non-init netns and subscribed
>      to its own XFRMNLGRP_MIGRATE / pfkey groups never receives the
>      notification of its own migration. IKEv2 MOBIKE / address-update
>      handling inside a netns is silently broken.
> 
> Thread struct net through km_migrate() and the xfrm_mgr.migrate
> function pointer, drop the &init_net override in xfrm_send_migrate()
> and pfkey_send_migrate(), and pass the caller's net (already in
> scope in xfrm_migrate() via sock_net(skb->sk)) all the way down.
> struct xfrm_mgr is in-tree only and not exported as a stable API,
> so the function-pointer signature change is internal.
> 
> pfkey_broadcast() is already netns-aware via net_generic(net,
> pfkey_net_id) since the pernet conversion. The five other
> pfkey_broadcast() callers in af_key.c already pass xs_net(x),
> sock_net(sk) or a per-netns net, so this only removes the
> &init_net outlier.
> 
> Fixes: 5c79de6e79cd ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE")
> Cc: stable@vger.kernel.org # v5.15+
> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>

Applied, thanks a lot!


* Re: [PATCH v2 17/22] dt-bindings: clock: Add StarFive JHB100 Peripheral-2 clock and reset generator
From: Rob Herring (Arm) @ 2026-05-08  6:46 UTC (permalink / raw)
  To: Changhuang Liang
  Cc: Krzysztof Kozlowski, linux-hardening, Conor Dooley, Hal Feng,
	Stephen Boyd, linux-riscv, Albert Ou, netdev, Palmer Dabbelt,
	linux-clk, Kees Cook, Alexandre Ghiti, Philipp Zabel,
	Michael Turquette, Emil Renner Berthing, devicetree, linux-kernel,
	Sia Jee Heng, Gustavo A . R . Silva, Richard Cochran,
	Paul Walmsley, Ley Foon Tan
In-Reply-To: <20260508053632.818548-18-changhuang.liang@starfivetech.com>


On Thu, 07 May 2026 22:36:27 -0700, Changhuang Liang wrote:
> Add bindings for the Peripheral-2 clock and reset generator (PER2CRG)
> on the JHB100 RISC-V SoC by StarFive Ltd.
> 
> Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
> ---
>  .../clock/starfive,jhb100-per2crg.yaml        | 76 +++++++++++++++++++
>  .../dt-bindings/clock/starfive,jhb100-crg.h   | 57 ++++++++++++++
>  .../dt-bindings/reset/starfive,jhb100-crg.h   | 17 +++++
>  3 files changed, 150 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/clock/starfive,jhb100-per2crg.yaml
> 

My bot found errors running 'make dt_binding_check' on your patch:

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/net/renesas,ether.example.dtb: ethernet-phy@1 (ethernet-phy-id0022.1537): compatible: ['ethernet-phy-id0022.1537', 'ethernet-phy-ieee802.3-c22'] is too long
	from schema $id: http://devicetree.org/schemas/net/micrel.yaml

doc reference errors (make refcheckdocs):

See https://patchwork.kernel.org/project/devicetree/patch/20260508053632.818548-18-changhuang.liang@starfivetech.com

The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.



* RE: [Intel-wired-lan] [PATCH iwl-next] ice: prevent integer overflow
From: Rinitha, SX @ 2026-05-08  6:54 UTC (permalink / raw)
  To: Loktionov, Aleksandr, intel-wired-lan@lists.osuosl.org,
	Nguyen, Anthony L, Loktionov, Aleksandr
  Cc: netdev@vger.kernel.org, Czapnik, Lukasz
In-Reply-To: <20260320050544.422640-1-aleksandr.loktionov@intel.com>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Aleksandr Loktionov
> Sent: 20 March 2026 10:36
> To: intel-wired-lan@lists.osuosl.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
> Cc: netdev@vger.kernel.org; Czapnik, Lukasz <lukasz.czapnik@intel.com>
> Subject: [Intel-wired-lan] [PATCH iwl-next] ice: prevent integer overflow
>
> From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>
> In ice_sched_bw_to_rl_profile(), the loop over 64 bits computes the scheduler timestamp rate as:
>
>  ts_rate = div64_long((s64)hw->psm_clk_freq,
>                       pow_result * ICE_RL_PROF_TS_MULTIPLIER);
>
> where pow_result = BIT_ULL(i). For large values of i, the product pow_result * ICE_RL_PROF_TS_MULTIPLIER overflows u64 before being used as the divisor, producing incorrect ts_rate values and potentially undefined behaviour.
>
> Fix this by pre-computing ts_freq = hw->psm_clk_freq / ICE_RL_PROF_TS_MULTIPLIER once before the loop and then dividing only by pow_result inside the loop. The division order avoids the overflow while preserving the same mathematical result. Declare ts_freq as s64 to match the type domain of the surrounding arithmetic and avoid a redundant cast at the use site.
>
> While at it, scope the loop variable i to the for statement itself.
>
> Fixes: 1ddef455f4a8 ("ice: Add NDO callback to set the maximum per-queue bitrate")
> Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_sched.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
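
The arithmetic in the quoted description can be checked with a small userspace sketch. It is not the driver code: the toy_ names are hypothetical, TOY_TS_MULTIPLIER = 32 is an assumed stand-in for ICE_RL_PROF_TS_MULTIPLIER, and the sketch uses unsigned 64-bit values throughout for simplicity, whereas the patch description declares ts_freq as s64.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value standing in for ICE_RL_PROF_TS_MULTIPLIER; check the driver. */
#define TOY_TS_MULTIPLIER 32ULL

/* Old divisor: 2^i times the multiplier, computed in 64 bits, so it can wrap. */
static uint64_t toy_divisor_old(unsigned int i)
{
	assert(i < 64);
	return (1ULL << i) * TOY_TS_MULTIPLIER;
}

/* Reordered form: divide by the multiplier once, then by 2^i only. */
static uint64_t toy_ts_rate_new(uint64_t clk_freq, unsigned int i)
{
	uint64_t ts_freq = clk_freq / TOY_TS_MULTIPLIER;

	assert(i < 64);
	return ts_freq / (1ULL << i);
}
```

With the assumed multiplier of 32, toy_divisor_old(58) is still 2^63, but toy_divisor_old(59) wraps to 0, so the old form would end up dividing by zero. For small i the two forms agree, since dividing by the multiplier first and by 2^i second floors to the same value as dividing by their product.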

Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)


* RE: [EXTERNAL] Re: [PATCH net-next 6/9] net: atlantic: implement AQC113 L2/L3/L4 RX filter management
From: Sukhdeep Soni [C] @ 2026-05-08  6:56 UTC (permalink / raw)
  To: Vadim Fedorenko, netdev@vger.kernel.org
  Cc: Igor Russkikh, Egor Pomozov, richardcochran@gmail.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, linux-kernel@vger.kernel.org
In-Reply-To: <2a15199c-d48f-4034-a907-6efa883d5d99@linux.dev>

On 06 May 2026, Vadim Fedorenko wrote:
> On 06/05/2026 14:57, sukhdeeps@marvell.com wrote:
> From: Sukhdeep Singh <sukhdeeps@marvell.com>
> 
> Implement complete RX filter management for AQC113 hardware:
> 
> - Add tag-based filter policy with reference-counted sharing, allowing
>    multiple filter rules to share the same L3 or L4 hardware filter
>    when their match criteria are identical.
> - Implement L3 (IPv4/IPv6 source/destination address and protocol)
>    filter find, get (program HW and increment refcount), and put
>    (decrement refcount and clear HW when last user releases).
> - Implement L4 (TCP/UDP/SCTP source/destination port) filter
>    management with the same find/get/put pattern.
> - Add combined L3L4 filter configuration that translates legacy
>    aq_rx_filter_l3l4 commands into AQC113 separate L3+L4 filter
>    programming with Action Resolver Table (ART) entries.
> - Add L2 ethertype filter set/clear with tag-based ART integration.
> - Add MAC address setup using firmware-provided L2 filter base index.
> 
> Update hardware initialization:
> - Use firmware-reported ART section base and count instead of
>    hardcoded 0xFFFF section enable.
> - Enable L3 v6/v4 select mode for simultaneous IPv4/IPv6 filtering.
> - Initialize L3L4 filter indices to -1 on reset.
> 
> Wire up hw_filter_l2_set, hw_filter_l2_clear, hw_filter_l3l4_set,
> hw_set_mac_address, hw_get_version, and hw_get_regs in hw_atl2_ops.
> 
> Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
> ---
>   .../net/ethernet/aquantia/atlantic/aq_hw.h    |   2 +
>   .../aquantia/atlantic/hw_atl2/hw_atl2.c       | 582 +++++++++++++++++-
>   2 files changed, 580 insertions(+), 4 deletions(-)

[...]

>   
> @@ -380,6 +422,9 @@ static void hw_atl2_hw_init_new_rx_filters(struct aq_hw_s *self)
>   {
>   	u8 *prio_tc_map = self->aq_nic_cfg->prio_tc_map;
>   	struct hw_atl2_priv *priv = self->priv;
> +	u32 art_first_sec, art_last_sec;
> +	u32 art_sections;
> +	u32 art_mask = 0;

> no need to init variable which is overwritten later ...

Agreed, will remove the art_mask initialization in v2.

>   	u16 action;
>   	u8 index;
>   	int i;
> @@ -394,9 +439,14 @@ static void hw_atl2_hw_init_new_rx_filters(struct aq_hw_s *self)
>   	 * REC entry is used for further processing. If multiple entries match,
>   	 * the lowest REC entry, Action field will be selected.
>   	 */
> -	hw_atl2_rpf_act_rslvr_section_en_set(self, 0xFFFF);
> +	art_last_sec = priv->art_base_index / 8 + priv->art_count / 8;
> +	art_first_sec = priv->art_base_index / 8;
> +	art_mask = (BIT(art_last_sec) - 1) - (BIT(art_first_sec) - 1);

> ... here

> +	art_sections = hw_atl2_rpf_act_rslvr_section_en_get(self) | art_mask;
> +	hw_atl2_rpf_act_rslvr_section_en_set(self, art_sections);
> +	hw_atl2_rpf_l3_v6_v4_select_set(self, 1);
>   	hw_atl2_rpfl2_uc_flr_tag_set(self, HW_ATL2_RPF_TAG_BASE_UC,
> -				     HW_ATL2_MAC_UC);
> +				     priv->l2_filters_base_index);
>   	hw_atl2_rpfl2_bc_flr_tag_set(self, HW_ATL2_RPF_TAG_BASE_UC);
>   
>   	/* FW reserves the beginning of ART, thus all driver entries must
> @@ -530,6 +580,35 @@ static int hw_atl2_hw_init_rx_path(struct aq_hw_s *self)
>   	return aq_hw_err_from_flags(self);
>   }
>   
> +static int hw_atl2_hw_mac_addr_set(struct aq_hw_s *self, const u8 *mac_addr)
> +{
> +	struct hw_atl2_priv *priv = self->priv;
> +	u32 location = priv->l2_filters_base_index;
> +	unsigned int h = 0U;
> +	unsigned int l = 0U;
> +	int err = 0;

> here again, h, l and err are not used with init values.

Will remove the initialization of h, l and err in v2. Thank you for the review.

> +
> +	if (!mac_addr) {
> +		err = -EINVAL;
> +		goto err_exit;
> +	}
> +	h = (mac_addr[0] << 8) | (mac_addr[1]);
> +	l = (mac_addr[2] << 24) | (mac_addr[3] << 16) |
> +		(mac_addr[4] << 8) | mac_addr[5];
> +
> +	hw_atl_rpfl2_uc_flr_en_set(self, 0U, location);
> +	hw_atl_rpfl2unicast_dest_addresslsw_set(self, l, location);
> +	hw_atl_rpfl2unicast_dest_addressmsw_set(self, h, location);
> +	hw_atl_rpfl2unicast_flr_act_set(self, 1U, location);
> +	hw_atl2_rpfl2_uc_flr_tag_set(self, HW_ATL2_RPF_TAG_BASE_UC, location);
> +	hw_atl_rpfl2_uc_flr_en_set(self, 1U, location);
> +
> +	err = aq_hw_err_from_flags(self);
> +
> +err_exit:
> +	return err;
> +}
> +
>   static int hw_atl2_hw_init(struct aq_hw_s *self, const u8 *mac_addr)
>   {
>   	static u32 aq_hw_atl2_igcr_table_[4][2] = {

[...]
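
For readers following the hunks quoted above, here is a minimal standalone
sketch of the two computations under review: the ART section mask and the
packing of a MAC address into the MSW/LSW register words. The function
names, the ENTRIES_PER_SECTION constant and the check_examples() helper are
illustrative assumptions and are not part of the driver. The sketch also
assumes the last section index stays below 32. Unlike the quoted patch, it
casts each byte to uint32_t before shifting, so a byte with its top bit set
is never shifted into the sign bit of a promoted int.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: one enable bit covers 8 ART entries, as the "/ 8" in the patch suggests. */
#define ENTRIES_PER_SECTION 8u

/* Build a mask with bits [first, last) set, mirroring
 * (BIT(last) - 1) - (BIT(first) - 1) from the quoted hunk.
 * Assumes last < 32 so that the shifts are well defined.
 */
uint32_t art_section_mask(uint32_t base_index, uint32_t count)
{
	uint32_t first = base_index / ENTRIES_PER_SECTION;
	uint32_t last = first + count / ENTRIES_PER_SECTION;

	return ((1u << last) - 1u) - ((1u << first) - 1u);
}

/* Split a 6-byte MAC address into a 16-bit high word and a 32-bit low word. */
void mac_to_msw_lsw(const uint8_t mac[6], uint32_t *msw, uint32_t *lsw)
{
	*msw = ((uint32_t)mac[0] << 8) | mac[1];
	*lsw = ((uint32_t)mac[2] << 24) | ((uint32_t)mac[3] << 16) |
	       ((uint32_t)mac[4] << 8) | mac[5];
}

/* Small self-check with hand-computed values. */
void check_examples(void)
{
	const uint8_t mac[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };
	uint32_t h, l;

	/* Entries 16..47: sections 2..5 inclusive, i.e. bits 2-5. */
	assert(art_section_mask(16, 32) == 0x3cu);

	mac_to_msw_lsw(mac, &h, &l);
	assert(h == 0x0011u);
	assert(l == 0x22334455u);
}
```

Because the mask is OR-ed into the value read back from the register, only
the driver's own sections are added and any sections already enabled are
left as they were.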

^ permalink raw reply

* [PATCH net-next v3 0/2] net: dsa: yt921x: Add port TBF support
From: David Yang @ 2026-05-08  6:57 UTC (permalink / raw)
  To: netdev
  Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
	Jiri Pirko, Simon Horman, linux-kernel

v2: https://lore.kernel.org/r/20260504101258.1608004-1-mmyangfl@gmail.com
  - drop changes on tc_tbf_qopt_offload_replace_params
  - drop excessive checks for tbf setup
  - react to TC_TBF_STATS correctly
v1: https://lore.kernel.org/r/20260502215314.917687-1-mmyangfl@gmail.com
  - remove queue related register definitions
  - add missing extack param during tbf setup
v0: https://lore.kernel.org/r/20260409171209.2575583-1-mmyangfl@gmail.com
  - picked from old series
  - add extack to the offload struct
  - add all params to the offload struct

David Yang (2):
  net/sched: tbf: add extack to offload params
  net: dsa: yt921x: Add port TBF support

 drivers/net/dsa/yt921x.c | 84 ++++++++++++++++++++++++++++++++++++++++
 drivers/net/dsa/yt921x.h | 18 +++++++++
 include/net/pkt_cls.h    |  1 +
 net/sched/sch_tbf.c      |  9 ++++-
 4 files changed, 110 insertions(+), 2 deletions(-)

-- 
2.53.0


^ permalink raw reply

* [PATCH net-next v3 1/2] net/sched: tbf: add extack to offload params
From: David Yang @ 2026-05-08  6:57 UTC (permalink / raw)
  To: netdev
  Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
	Jiri Pirko, Simon Horman, linux-kernel
In-Reply-To: <20260508065757.2566258-1-mmyangfl@gmail.com>

Drivers might have error messages to propagate to user space. Propagate
the netlink extack through the offload params so that they can report
their limitations to user space in human-readable form.

Signed-off-by: David Yang <mmyangfl@gmail.com>
---
 include/net/pkt_cls.h | 1 +
 net/sched/sch_tbf.c   | 9 +++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 99ac747b7906..3bd08d7f39c1 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -1046,6 +1046,7 @@ struct tc_tbf_qopt_offload_replace_params {
 };
 
 struct tc_tbf_qopt_offload {
+	struct netlink_ext_ack *extack;
 	enum tc_tbf_command command;
 	u32 handle;
 	u32 parent;
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index f2340164f579..4576111fe075 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -139,7 +139,8 @@ static u64 psched_ns_t2l(const struct psched_ratecfg *r,
 	return len;
 }
 
-static void tbf_offload_change(struct Qdisc *sch)
+static void tbf_offload_change(struct Qdisc *sch,
+			       struct netlink_ext_ack *extack)
 {
 	struct tbf_sched_data *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
@@ -148,6 +149,7 @@ static void tbf_offload_change(struct Qdisc *sch)
 	if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
 		return;
 
+	qopt.extack = extack;
 	qopt.command = TC_TBF_REPLACE;
 	qopt.handle = sch->handle;
 	qopt.parent = sch->parent;
@@ -166,6 +168,7 @@ static void tbf_offload_destroy(struct Qdisc *sch)
 	if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
 		return;
 
+	qopt.extack = NULL;
 	qopt.command = TC_TBF_DESTROY;
 	qopt.handle = sch->handle;
 	qopt.parent = sch->parent;
@@ -176,6 +179,7 @@ static int tbf_offload_dump(struct Qdisc *sch)
 {
 	struct tc_tbf_qopt_offload qopt;
 
+	qopt.extack = NULL;
 	qopt.command = TC_TBF_STATS;
 	qopt.handle = sch->handle;
 	qopt.parent = sch->parent;
@@ -193,6 +197,7 @@ static void tbf_offload_graft(struct Qdisc *sch, struct Qdisc *new,
 		.parent		= sch->parent,
 		.child_handle	= new->handle,
 		.command	= TC_TBF_GRAFT,
+		.extack		= extack,
 	};
 
 	qdisc_offload_graft_helper(qdisc_dev(sch), sch, new, old,
@@ -477,7 +482,7 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt,
 	qdisc_put(old);
 	err = 0;
 
-	tbf_offload_change(sch);
+	tbf_offload_change(sch, extack);
 done:
 	return err;
 }
-- 
2.53.0


^ permalink raw reply related

* [PATCH net-next v3 2/2] net: dsa: yt921x: Add port TBF support
From: David Yang @ 2026-05-08  6:57 UTC (permalink / raw)
  To: netdev
  Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
	Jiri Pirko, Simon Horman, linux-kernel
In-Reply-To: <20260508065757.2566258-1-mmyangfl@gmail.com>

React to TC_SETUP_QDISC_TBF and configure the egress shaper with the
maximum rate and burst size requested by the user. Per-queue shaping is
possible, but is not implemented in this commit.

Signed-off-by: David Yang <mmyangfl@gmail.com>
---
 drivers/net/dsa/yt921x.c | 84 ++++++++++++++++++++++++++++++++++++++++
 drivers/net/dsa/yt921x.h | 18 +++++++++
 2 files changed, 102 insertions(+)

diff --git a/drivers/net/dsa/yt921x.c b/drivers/net/dsa/yt921x.c
index fd1fdcd5f9a3..e5f547629bfd 100644
--- a/drivers/net/dsa/yt921x.c
+++ b/drivers/net/dsa/yt921x.c
@@ -24,6 +24,7 @@
 #include <net/dsa.h>
 #include <net/dscp.h>
 #include <net/ieee8021q.h>
+#include <net/pkt_cls.h>
 
 #include "yt921x.h"
 
@@ -1272,6 +1273,17 @@ yt921x_marker_tfm_police(struct yt921x_marker *marker,
 				 priv, port, extack);
 }
 
+static int
+yt921x_marker_tfm_shape(struct yt921x_marker *marker, u64 rate, u64 burst,
+			unsigned int flags, struct yt921x_priv *priv, int port,
+			struct netlink_ext_ack *extack)
+{
+	return yt921x_marker_tfm(marker, rate, burst, flags,
+				 priv->port_shape_slot_ns, YT921X_SHAPE_CIR_MAX,
+				 YT921X_SHAPE_CBS_MAX, YT921X_SHAPE_UNIT_MAX,
+				 priv, port, extack);
+}
+
 static int
 yt921x_police_validate(const struct flow_action_police *police,
 		       const struct flow_action *action,
@@ -1378,6 +1390,70 @@ yt921x_dsa_port_policer_add(struct dsa_switch *ds, int port,
 	return res;
 }
 
+static int
+yt921x_dsa_port_setup_tc_tbf_port(struct dsa_switch *ds, int port,
+				  const struct tc_tbf_qopt_offload *qopt)
+{
+	struct yt921x_priv *priv = to_yt921x_priv(ds);
+	struct netlink_ext_ack *extack = qopt->extack;
+	u32 ctrls[2];
+	int res;
+
+	if (qopt->parent != TC_H_ROOT)
+		return -EOPNOTSUPP;
+
+	switch (qopt->command) {
+	case TC_TBF_STATS:
+		return 0;
+	case TC_TBF_DESTROY:
+		ctrls[0] = 0;
+		ctrls[1] = 0;
+		break;
+	case TC_TBF_REPLACE: {
+		const struct tc_tbf_qopt_offload_replace_params *p;
+		struct yt921x_marker marker;
+
+		p = &qopt->replace_params;
+
+		res = yt921x_marker_tfm_shape(&marker, p->rate.rate_bytes_ps,
+					      p->max_size,
+					      YT921X_MARKER_SINGLE_BUCKET,
+					      priv, port, extack);
+		if (res)
+			return res;
+
+		ctrls[0] = YT921X_PORT_SHAPE_CTRLa_CIR(marker.cir) |
+			   YT921X_PORT_SHAPE_CTRLa_CBS(marker.cbs);
+		ctrls[1] = YT921X_PORT_SHAPE_CTRLb_UNIT(marker.unit) |
+			   YT921X_PORT_SHAPE_CTRLb_EN;
+		break;
+	}
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	mutex_lock(&priv->reg_lock);
+	res = yt921x_reg64_write(priv, YT921X_PORTn_SHAPE_CTRL(port), ctrls);
+	mutex_unlock(&priv->reg_lock);
+
+	return res;
+}
+
+static int
+yt921x_dsa_port_setup_tc(struct dsa_switch *ds, int port,
+			 enum tc_setup_type type, void *type_data)
+{
+	switch (type) {
+	case TC_SETUP_QDISC_TBF: {
+		const struct tc_tbf_qopt_offload *qopt = type_data;
+
+		return yt921x_dsa_port_setup_tc_tbf_port(ds, port, qopt);
+	}
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static int
 yt921x_mirror_del(struct yt921x_priv *priv, int port, bool ingress)
 {
@@ -3524,6 +3600,13 @@ static int yt921x_chip_setup_tc(struct yt921x_priv *priv)
 		return res;
 	priv->meter_slot_ns = ctrl * op_ns;
 
+	ctrl = max(priv->port_shape_slot_ns / op_ns,
+		   YT921X_PORT_SHAPE_SLOT_MIN);
+	res = yt921x_reg_write(priv, YT921X_PORT_SHAPE_SLOT, ctrl);
+	if (res)
+		return res;
+	priv->port_shape_slot_ns = ctrl * op_ns;
+
 	return 0;
 }
 
@@ -3680,6 +3763,7 @@ static const struct dsa_switch_ops yt921x_dsa_switch_ops = {
 	/* rate */
 	.port_policer_del	= yt921x_dsa_port_policer_del,
 	.port_policer_add	= yt921x_dsa_port_policer_add,
+	.port_setup_tc		= yt921x_dsa_port_setup_tc,
 	/* hsr */
 	.port_hsr_leave		= dsa_port_simple_hsr_leave,
 	.port_hsr_join		= dsa_port_simple_hsr_join,
diff --git a/drivers/net/dsa/yt921x.h b/drivers/net/dsa/yt921x.h
index 546b12a8994a..70fa780c337f 100644
--- a/drivers/net/dsa/yt921x.h
+++ b/drivers/net/dsa/yt921x.h
@@ -531,6 +531,19 @@ enum yt921x_app_selector {
 #define  YT921X_MIRROR_PORT_M			GENMASK(3, 0)
 #define   YT921X_MIRROR_PORT(x)				FIELD_PREP(YT921X_MIRROR_PORT_M, (x))
 
+#define YT921X_PORT_SHAPE_SLOT		0x34000c
+#define  YT921X_PORT_SHAPE_SLOT_SLOT_M		GENMASK(11, 0)
+#define YT921X_PORTn_SHAPE_CTRL(port)	(0x354000 + 8 * (port))
+#define  YT921X_PORT_SHAPE_CTRLb_EN		BIT(4)
+#define  YT921X_PORT_SHAPE_CTRLb_PKT_MODE	BIT(3)	/* 0: byte rate mode */
+#define  YT921X_PORT_SHAPE_CTRLb_UNIT_M		GENMASK(2, 0)
+#define   YT921X_PORT_SHAPE_CTRLb_UNIT(x)		FIELD_PREP(YT921X_PORT_SHAPE_CTRLb_UNIT_M, (x))
+#define  YT921X_PORT_SHAPE_CTRLa_CBS_M		GENMASK(31, 18)
+#define   YT921X_PORT_SHAPE_CTRLa_CBS(x)		FIELD_PREP(YT921X_PORT_SHAPE_CTRLa_CBS_M, (x))
+#define  YT921X_PORT_SHAPE_CTRLa_CIR_M		GENMASK(17, 0)
+#define   YT921X_PORT_SHAPE_CTRLa_CIR(x)		FIELD_PREP(YT921X_PORT_SHAPE_CTRLa_CIR_M, (x))
+#define YT921X_PORTn_SHAPE_STAT(port)	(0x356000 + 4 * (port))
+
 #define YT921X_EDATA_EXTMODE	0xfb
 #define YT921X_EDATA_LEN	0x100
 
@@ -556,6 +569,10 @@ enum yt921x_fdb_entry_status {
 #define YT921X_METER_UNIT_MAX	((1 << 3) - 1)
 #define YT921X_METER_CIR_MAX	((1 << 18) - 1)
 #define YT921X_METER_CBS_MAX	((1 << 16) - 1)
+#define YT921X_PORT_SHAPE_SLOT_MIN	80
+#define YT921X_SHAPE_UNIT_MAX	((1 << 3) - 1)
+#define YT921X_SHAPE_CIR_MAX	((1 << 18) - 1)
+#define YT921X_SHAPE_CBS_MAX	((1 << 14) - 1)
 
 #define YT921X_LAG_NUM		2
 #define YT921X_LAG_PORT_NUM	4
@@ -652,6 +669,7 @@ struct yt921x_priv {
 
 	const struct yt921x_info *info;
 	unsigned int meter_slot_ns;
+	unsigned int port_shape_slot_ns;
 	/* cache of dsa_cpu_ports(ds) */
 	u16 cpu_ports_mask;
 	unsigned char cycle_ns;
-- 
2.53.0
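
As a reading aid for the register layout added in yt921x.h, the following
standalone sketch shows how the two shaper control words and the slot
divider could be assembled outside the kernel. The mask values are derived
from the GENMASK() definitions in the patch (CIR in bits 17:0, CBS in bits
31:18, UNIT in bits 2:0, EN in bit 4); the function and macro names here
are hypothetical stand-ins for FIELD_PREP() and max(), and op_ns is assumed
to be non-zero. It is not the driver's code.

```c
#include <stdint.h>

/* Field layout taken from the patch's GENMASK() definitions. */
#define SHAPE_CIR_MASK   0x0003ffffu	/* CTRLa bits 17:0 */
#define SHAPE_CBS_SHIFT  18
#define SHAPE_CBS_MASK   0xfffc0000u	/* CTRLa bits 31:18 */
#define SHAPE_UNIT_MASK  0x00000007u	/* CTRLb bits 2:0 */
#define SHAPE_EN         (1u << 4)	/* CTRLb bit 4 */

/* Pack the committed rate (CIR) and burst size (CBS) into the first word. */
uint32_t shape_ctrl_a(uint32_t cir, uint32_t cbs)
{
	return (cir & SHAPE_CIR_MASK) |
	       ((cbs << SHAPE_CBS_SHIFT) & SHAPE_CBS_MASK);
}

/* Pack the rate unit and set the enable bit in the second word. */
uint32_t shape_ctrl_b(uint32_t unit)
{
	return (unit & SHAPE_UNIT_MASK) | SHAPE_EN;
}

/* Convert a requested slot time into a divider value, clamped to a
 * minimum, as in ctrl = max(slot_ns / op_ns, SLOT_MIN).
 * Assumes op_ns != 0.
 */
uint32_t shape_slot_ctrl(uint32_t slot_ns, uint32_t op_ns, uint32_t slot_min)
{
	uint32_t ctrl = slot_ns / op_ns;

	return ctrl > slot_min ? ctrl : slot_min;
}
```

Writing both control words as zero, as the TC_TBF_DESTROY case does, clears
the enable bit together with the rate parameters. The effective slot time
after clamping is the returned divider multiplied by op_ns, which is what
the patch stores back into port_shape_slot_ns.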


^ permalink raw reply related

* RE: [Intel-wired-lan] [PATCH net] ice: fix locking around wait_event_interruptible_locked_irq
From: Rinitha, SX @ 2026-05-08  7:01 UTC (permalink / raw)
  To: Loktionov, Aleksandr, intel-wired-lan@lists.osuosl.org,
	Nguyen, Anthony L, Loktionov, Aleksandr
  Cc: netdev@vger.kernel.org, Keller, Jacob E, Jakub Kicinski
In-Reply-To: <20260327072332.130320-2-aleksandr.loktionov@intel.com>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Aleksandr Loktionov
> Sent: 27 March 2026 12:53
> To: intel-wired-lan@lists.osuosl.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
> Cc: netdev@vger.kernel.org; Keller, Jacob E <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>
> Subject: [Intel-wired-lan] [PATCH net] ice: fix locking around wait_event_interruptible_locked_irq
>
> From: Jacob Keller <jacob.e.keller@intel.com>
>
> Commit 50327223a8bb ("ice: add lock to protect low latency interface") introduced a wait queue used to protect the low latency timer interface.
> The queue is used with the wait_event_interruptible_locked_irq macro, which unlocks the wait queue lock while sleeping. The irq variant uses spin_lock_irq and spin_unlock_irq to manage this. The wait queue lock was previously locked using spin_lock_irqsave. This difference in lock variants could lead to issues, since wait_event would unlock the wait queue and restore interrupts while sleeping.
>
> The ice_read_phy_tstamp_ll_e810() function is ultimately called through ice_read_phy_tstamp, which is called from ice_ptp_process_tx_tstamp or ice_ptp_clear_unexpected_tx_ready. The former is called through the miscellaneous IRQ thread function, while the latter is called from the service task work queue thread. Neither of these functions has interrupts disabled, so use spin_lock_irq instead of spin_lock_irqsave.
>
> Fixes: 50327223a8bb ("ice: add lock to protect low latency interface")
> Cc: stable@vger.kernel.org
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Closes: https://lore.kernel.org/netdev/20250109181823.77f44c69@kernel.org/
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
>
> drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>

Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)

^ permalink raw reply
