* [PATCH v2 17/22] dt-bindings: clock: Add StarFive JHB100 Peripheral-2 clock and reset generator
From: Changhuang Liang @ 2026-05-08 5:36 UTC (permalink / raw)
To: Michael Turquette, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Stephen Boyd, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, Philipp Zabel, Emil Renner Berthing, Kees Cook,
Gustavo A . R . Silva, Richard Cochran
Cc: linux-clk, linux-kernel, devicetree, linux-riscv, linux-hardening,
netdev, Sia Jee Heng, Hal Feng, Ley Foon Tan, Changhuang Liang
In-Reply-To: <20260508053632.818548-1-changhuang.liang@starfivetech.com>
Add bindings for the Peripheral-2 clock and reset generator (PER2CRG)
on the JHB100 RISC-V SoC by StarFive Ltd.
Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
---
.../clock/starfive,jhb100-per2crg.yaml | 76 +++++++++++++++++++
.../dt-bindings/clock/starfive,jhb100-crg.h | 57 ++++++++++++++
.../dt-bindings/reset/starfive,jhb100-crg.h | 17 +++++
3 files changed, 150 insertions(+)
create mode 100644 Documentation/devicetree/bindings/clock/starfive,jhb100-per2crg.yaml
diff --git a/Documentation/devicetree/bindings/clock/starfive,jhb100-per2crg.yaml b/Documentation/devicetree/bindings/clock/starfive,jhb100-per2crg.yaml
new file mode 100644
index 000000000000..3c266bc2eac2
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/starfive,jhb100-per2crg.yaml
@@ -0,0 +1,76 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/clock/starfive,jhb100-per2crg.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: StarFive JHB100 Peripheral-2 Clock and Reset Generator
+
+maintainers:
+ - Changhuang Liang <changhuang.liang@starfivetech.com>
+
+properties:
+ compatible:
+ const: starfive,jhb100-per2crg
+
+ reg:
+ maxItems: 1
+
+ clocks:
+ items:
+ - description: Non Coherent NOC Initiator
+ - description: Configure 400MHz
+ - description: Configure 125MHz
+ - description: GMAC2 RGMII RX
+ - description: GMAC2 RMII Reference
+ - description: GMAC3 SGMII TX
+ - description: GMAC3 SGMII RX
+ - description: Main Oscillator (25 MHz)
+
+ clock-names:
+ items:
+ - const: ncnoc_init
+ - const: cfg_400
+ - const: cfg_125
+ - const: gmac2_rgmii_rx
+ - const: gmac2_rmii_ref
+ - const: gmac3_sgmii_tx
+ - const: gmac3_sgmii_rx
+ - const: osc
+
+ '#clock-cells':
+ const: 1
+ description:
+ See <dt-bindings/clock/starfive,jhb100-crg.h> for valid indices.
+
+ '#reset-cells':
+ const: 1
+ description:
+ See <dt-bindings/reset/starfive-jhb100-crg.h> for valid indices.
+
+required:
+ - compatible
+ - reg
+ - clocks
+ - clock-names
+ - '#clock-cells'
+ - '#reset-cells'
+
+additionalProperties: false
+
+examples:
+ - |
+ clock-controller@11bc0000 {
+ compatible = "starfive,jhb100-per2crg";
+ reg = <0x11bc0000 0x1000>;
+ clocks = <&sys0crg 52>, <&sys0crg 54>, <&sys0crg 55>,
+ <&per2_gmac2_rgmii_rx>, <&per2_gmac2_rmii_ref>,
+ <&per2_gmac3_sgmii_tx>, <&per2_gmac3_sgmii_rx>,
+ <&osc>;
+ clock-names = "ncnoc_init", "cfg_400", "cfg_125",
+ "gmac2_rgmii_rx", "gmac2_rmii_ref",
+ "gmac3_sgmii_tx", "gmac3_sgmii_rx",
+ "osc";
+ #clock-cells = <1>;
+ #reset-cells = <1>;
+ };
diff --git a/include/dt-bindings/clock/starfive,jhb100-crg.h b/include/dt-bindings/clock/starfive,jhb100-crg.h
index 7f508574177c..2b2e148ce5ce 100644
--- a/include/dt-bindings/clock/starfive,jhb100-crg.h
+++ b/include/dt-bindings/clock/starfive,jhb100-crg.h
@@ -447,4 +447,61 @@
#define JHB100_PER1CLK_MAIN_ICG_EN_RAS 75
#define JHB100_PER1CLK_MAIN_ICG_EN_UFS 76
+/* PER2CRG clocks */
+#define JHB100_PER2CLK_300 0
+#define JHB100_PER2CLK_100 1
+#define JHB100_PER2CLK_50 2
+#define JHB100_PER2CLK_GMAC2_RMII_50 3
+#define JHB100_PER2CLK_CAN0_CORE_DIV 4
+#define JHB100_PER2CLK_CAN1_CORE_DIV 5
+#define JHB100_PER2CLK_CAN0_TIMER 6
+#define JHB100_PER2CLK_CAN1_TIMER 7
+
+#define JHB100_PER2CLK_RTC_CORE_DIV 11
+#define JHB100_PER2CLK_GMAC2_RMII_MUX_DLY 12
+#define JHB100_PER2CLK_GMAC2_RMII_DIV 13
+
+#define JHB100_PER2CLK_GMAC2_RGMII_125_MUX 15
+#define JHB100_PER2CLK_GMAC2_RGMII_DIV 16
+#define JHB100_PER2CLK_GMAC2_TX_MUX 17
+#define JHB100_PER2CLK_GMAC2_TX_180_BUF 18
+#define JHB100_PER2CLK_GMAC2_RX_MUX_DLY 19
+#define JHB100_PER2CLK_GMAC2_RX_180_BUF 20
+#define JHB100_PER2CLK_GMAC2_TXCK_MUX_DLY 21
+#define JHB100_PER2CLK_GMAC3_TX_125_MUX 22
+#define JHB100_PER2CLK_GMAC3_RX_125_MUX 23
+#define JHB100_PER2CLK_GMAC3_TX_DIV 24
+#define JHB100_PER2CLK_GMAC3_RX_DIV 25
+#define JHB100_PER2CLK_SENSORS_PERIPH2 26
+
+#define JHB100_PER2CLK_FAN_TACH_PCLK 33
+
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_TX_I 44
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_RX_I 45
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_TX_180_I 46
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_RX_180_I 47
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_PTP_REF_I 48
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_RMII_I 49
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_CSR_I 50
+#define JHB100_PER2CLK_ETHER0_RMIIANDRGMII_ACLK_I 51
+#define JHB100_PER2CLK_RMIIANDRGMII_IOMUX_GMAC2_TXCK 52
+#define JHB100_PER2CLK_ETHER1_SGMII_TX_I 53
+#define JHB100_PER2CLK_ETHER1_SGMII_RX_I 54
+#define JHB100_PER2CLK_ETHER1_SGMII_TX_125_I 55
+#define JHB100_PER2CLK_ETHER1_SGMII_RX_125_I 56
+#define JHB100_PER2CLK_ETHER1_SGMII_PTP_REF_I 57
+#define JHB100_PER2CLK_ETHER1_SGMII_CSR_I 58
+#define JHB100_PER2CLK_ETHER1_SGMII_ACLK_I 59
+#define JHB100_PER2CLK_ETHER1_SGMII_PHY_PCLK_I 60
+#define JHB100_PER2CLK_ETHER1_SGMII_REF_25_I 61
+#define JHB100_PER2CLK_MAIN_ICG_EN_CAN0 62
+#define JHB100_PER2CLK_MAIN_ICG_EN_CAN1 63
+
+#define JHB100_PER2CLK_MAIN_ICG_EN_DMAC_8CH 65
+#define JHB100_PER2CLK_MAIN_ICG_EN_RTC_SCAN 66
+#define JHB100_PER2CLK_MAIN_ICG_EN_ADC0 67
+#define JHB100_PER2CLK_MAIN_ICG_EN_ADC1 68
+#define JHB100_PER2CLK_MAIN_ICG_EN_GMAC2 69
+#define JHB100_PER2CLK_MAIN_ICG_EN_GMAC3 70
+
#endif /* __DT_BINDINGS_CLOCK_STARFIVE_JHB100_H__ */
diff --git a/include/dt-bindings/reset/starfive,jhb100-crg.h b/include/dt-bindings/reset/starfive,jhb100-crg.h
index cf933a1befbb..0965f3798397 100644
--- a/include/dt-bindings/reset/starfive,jhb100-crg.h
+++ b/include/dt-bindings/reset/starfive,jhb100-crg.h
@@ -157,4 +157,21 @@
#define JHB100_PER1RST_MAIN_RSTN_DMAC_SPI0 15
#define JHB100_PER1RST_MAIN_RSTN_PERIPH1_RAS 16
+/* PER2CRG resets */
+#define JHB100_PER2RST_IOMUX_PRESETN 0
+#define JHB100_PER2RST_POK_IOMUX_PRESETN 1
+#define JHB100_PER2RST_SYSREG_RSTN 2
+#define JHB100_PER2RST_MAIN_RSTN_CAN0 3
+#define JHB100_PER2RST_MAIN_RSTN_CAN1 4
+#define JHB100_PER2RST_FAN_TACH_PRESETN 5
+#define JHB100_PER2RST_MAIN_RSTN_GMAC2 6
+#define JHB100_PER2RST_MAIN_RSTN_GMAC3 7
+#define JHB100_PER2RST_MAIN_RSTN_DMAC_8CH 8
+#define JHB100_PER2RST_MAIN_RSTN_RTC 9
+#define JHB100_PER2RST_ADC0_PRESETN 10
+#define JHB100_PER2RST_ADC0_IOMUX_PRESETN 11
+#define JHB100_PER2RST_ADC1_PRESETN 12
+#define JHB100_PER2RST_ADC1_IOMUX_PRESETN 13
+#define JHB100_PER2RST_MAIN_RSTN_PERIPH2_SENSORS 14
+
#endif /* __DT_BINDINGS_RESET_STARFIVE_JHB100_CRG_H__ */
--
2.25.1
^ permalink raw reply related
* Re: [v3 PATCH] xfrm: ipcomp: Free destination pages on acomp errors
From: Yilin Zhu @ 2026-05-08 5:43 UTC (permalink / raw)
To: Herbert Xu
Cc: n05ec, netdev, steffen.klassert, davem, edumazet, kuba, pabeni,
horms, yuantan098, yifanwucs, tomapufckgml, bird, ronbogo
In-Reply-To: <aftA0Iwi2aX_4ORo@gondor.apana.org.au>
On Wed, 6 May 2026 at 06:24, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Orion Zhu <zylzyl2333@gmail.com> wrote:
> >
> > Thanks, but I think moving the label is not safe for all paths reaching
> > ipcomp_post_acomp().
>
> Thanks for checking!
>
> We could fix this by checking whether req is NULL and whether
> sg_page(dsg) is NULL.
>
> ---8<---
> Move the out_free_req label up by a couple of lines so that the
> allocated dst SG list gets freed on error as well as success.
>
> Fixes: eb2953d26971 ("xfrm: ipcomp: Use crypto_acomp interface")
> Cc: stable@kernel.org
> Reported-by: Yuan Tan <yuantan098@gmail.com>
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
> index 5f38dff16177..671d48f8c937 100644
> --- a/net/xfrm/xfrm_ipcomp.c
> +++ b/net/xfrm/xfrm_ipcomp.c
> @@ -51,11 +51,15 @@ static int ipcomp_post_acomp(struct sk_buff *skb, int err, int hlen)
> struct scatterlist *dsg;
> int len, dlen;
>
> - if (unlikely(err))
> - goto out_free_req;
> + if (unlikely(!req))
> + return err;
>
> extra = acomp_request_extra(req);
> dsg = extra->sg;
> +
> + if (unlikely(err))
> + goto out_free_req;
> +
> dlen = req->dlen;
>
> pskb_trim_unique(skb, 0);
> @@ -84,10 +88,10 @@ static int ipcomp_post_acomp(struct sk_buff *skb, int err, int hlen)
> skb_shinfo(skb)->nr_frags++;
> } while ((dlen -= len));
>
> - for (; dsg; dsg = sg_next(dsg))
> +out_free_req:
> + for (; dsg && sg_page(dsg); dsg = sg_next(dsg))
> __free_page(sg_page(dsg));
>
> -out_free_req:
> acomp_request_free(req);
> return err;
> }
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Hi Herbert,
Thanks for the update.
I think this version looks reasonable to me. Checking req before
dereferencing it and stopping the SG walk at a NULL sg_page() should
cover the setup failure paths I was concerned about, while still freeing
the allocated destination pages on the acomp error path.
Could you please also add my Reported-by tag?
Reported-by: Yilin Zhu <zylzyl2333@gmail.com>
Thanks,
Yilin
^ permalink raw reply
* Re: Re: [PATCH net v1 1/2] dt-bindings: ethernet: eswin: refine delay model and HSP register description
From: 李志 @ 2026-05-08 5:43 UTC (permalink / raw)
To: Conor Dooley
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, robh, krzk+dt,
conor+dt, netdev, devicetree, linux-kernel, mcoquelin.stm32,
alexandre.torgue, rmk+kernel, maxime.chevallier, linux-stm32,
linux-arm-kernel, ningyu, linmin, pinkesh.vaghela, pritesh.patel,
weishangjuan
In-Reply-To: <20260507-mural-moocher-ad6e07ef8ae0@spud>
> -----原始邮件-----
> 发件人: "Conor Dooley" <conor@kernel.org>
> 发送时间:2026-05-08 01:24:02 (星期五)
> 收件人: lizhi2@eswincomputing.com
> 抄送: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org, netdev@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, mcoquelin.stm32@gmail.com, alexandre.torgue@foss.st.com, rmk+kernel@armlinux.org.uk, maxime.chevallier@bootlin.com, linux-stm32@st-md-mailman.stormreply.com, linux-arm-kernel@lists.infradead.org, ningyu@eswincomputing.com, linmin@eswincomputing.com, pinkesh.vaghela@einfochips.com, pritesh.patel@einfochips.com, weishangjuan@eswincomputing.com
> 主题: Re: [PATCH net v1 1/2] dt-bindings: ethernet: eswin: refine delay model and HSP register description
>
> On Thu, May 07, 2026 at 04:31:36PM +0800, lizhi2@eswincomputing.com wrote:
> > From: Zhi Li <lizhi2@eswincomputing.com>
> >
> > Refine the EIC7700 Ethernet dt-binding based on observed hardware behavior
> > and clarify the original delay model for eth0.
> >
> > The previous binding used an enum-based definition for
> > rx-internal-delay-ps and tx-internal-delay-ps. Replace it with a
> > range-based model using:
> >
> > - minimum: 0
> > - maximum: 2540
> > - multipleOf: 20
> >
> > This better reflects the actual hardware implementation, which
> > supports 20ps granularity delay steps in the MAC RGMII interface.
> >
> > The tx/rx internal delay values are clarified as MAC-side programmable
> > delay components applied on the RGMII clock/data path, representing
> > the effective delay seen at the MAC interface.
> >
> > This does not change the intended hardware semantics, but aligns the
> > binding with the actual hardware implementation.
> >
> > These properties are optional and only required when MAC-side fine
> > tuning is needed; otherwise delay alignment is provided by PHY or
> > board design.
> >
> > Depending on the selected RGMII timing mode, delay alignment may be
> > provided by the PHY (e.g. rgmii-id) or by board/MAC-side configuration.
> > When PHY or board design already provides the required delay, these
> > MAC-side properties may be omitted. When MAC-side fine tuning is
> > required, they should be provided to describe the internal RGMII
> > timing adjustment.
> >
> > Additionally, extend the description of the HSP subsystem register
> > layout used by the MAC glue logic. This includes explicit TXD and RXD
> > delay control registers to ensure deterministic initialization and
> > to override any residual configuration potentially left by bootloaders.
> >
> > Add reference to the EIC7700X SoC Technical Reference Manual,
> > Chapter 10 ("High-Speed Interface"), Part 4 for background of the
> > HSP CSR block:
> > https://github.com/eswincomputing/EIC7700X-SoC-Technical-Reference-Manual/releases
> >
> > There are no in-tree users of this binding, so no ABI impact is
> > expected.
> >
> > Fixes: 888bd0eca93c ("dt-bindings: ethernet: eswin: Document for EIC7700 SoC")
> > Signed-off-by: Zhi Li <lizhi2@eswincomputing.com>
> > ---
>
> While this is v1, it's really v8 and there should therefore be a
> changelog that explains where my ack and the new compatible went.
>
Thanks for the review.
Based on Jakub's feedback on the previous v7 series, I plan to split the
changes into two separate series:
- a smaller fix series intended for net,
- and a separate eth1 feature series intended for net-next.
After the split, the scope and target trees of the two series will differ
from the original combined series, so I plan to restart the revision
numbering from v1 for both series.
The additional compatible string and the eth1-specific DT binding
extensions will be moved into the separate feature series, and I will
reflect this in the v2 cover letter.
The DT binding changes in this fix series v1 are simply extracted from the
previous v7 series as part of the split.
Since the series has been restructured, I will drop the previous
Acked-by tags.
I will also document the reason for doing so and the impact of the split
in the v2 cover letter.
If you think the binding changes are still effectively unchanged and the
previous Acked-by can still apply, I am happy to retain them or re-apply
them as appropriate. Otherwise I will assume a fresh review is preferred.
Please let me know your preference.
Thanks,
Zhi
^ permalink raw reply
* Re: Re: [PATCH net v1 1/2] dt-bindings: ethernet: eswin: refine delay model and HSP register description
From: 李志 @ 2026-05-08 5:47 UTC (permalink / raw)
To: Andrew Lunn
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, robh, krzk+dt,
conor+dt, netdev, devicetree, linux-kernel, mcoquelin.stm32,
alexandre.torgue, rmk+kernel, maxime.chevallier, linux-stm32,
linux-arm-kernel, ningyu, linmin, pinkesh.vaghela, pritesh.patel,
weishangjuan
In-Reply-To: <2436c6e9-4aad-4ffd-9fef-0cbbe38dc66d@lunn.ch>
> -----原始邮件-----
> 发件人: "Andrew Lunn" <andrew@lunn.ch>
> 发送时间:2026-05-07 20:29:10 (星期四)
> 收件人: lizhi2@eswincomputing.com
> 抄送: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org, netdev@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, mcoquelin.stm32@gmail.com, alexandre.torgue@foss.st.com, rmk+kernel@armlinux.org.uk, maxime.chevallier@bootlin.com, linux-stm32@st-md-mailman.stormreply.com, linux-arm-kernel@lists.infradead.org, ningyu@eswincomputing.com, linmin@eswincomputing.com, pinkesh.vaghela@einfochips.com, pritesh.patel@einfochips.com, weishangjuan@eswincomputing.com
> 主题: Re: [PATCH net v1 1/2] dt-bindings: ethernet: eswin: refine delay model and HSP register description
>
> > ethernet@50400000 {
> > compatible = "eswin,eic7700-qos-eth", "snps,dwmac-5.20";
> > reg = <0x50400000 0x10000>;
> > - clocks = <&d0_clock 186>, <&d0_clock 171>, <&d0_clock 40>,
> > - <&d0_clock 193>;
> > - clock-names = "axi", "cfg", "stmmaceth", "tx";
> > interrupt-parent = <&plic>;
> > interrupts = <61>;
> > interrupt-names = "macirq";
> > - phy-mode = "rgmii-id";
> > - phy-handle = <&phy0>;
> > + clocks = <&d0_clock 186>, <&d0_clock 171>, <&d0_clock 40>,
> > + <&d0_clock 193>;
> > + clock-names = "axi", "cfg", "stmmaceth", "tx";
>
> Please don't move the clocks around, since they have nothing to do
> with RGMII delays.
>
>
> > resets = <&reset 95>;
> > reset-names = "stmmaceth";
> > - rx-internal-delay-ps = <200>;
> > - tx-internal-delay-ps = <200>;
> > - eswin,hsp-sp-csr = <&hsp_sp_csr 0x100 0x108 0x118>;
> > - snps,axi-config = <&stmmac_axi_setup>;
> > + eswin,hsp-sp-csr = <&hsp_sp_csr 0x100 0x108 0x118 0x114 0x11c>;
> > + phy-handle = <&phy0>;
> > + phy-mode = "rgmii-id";
> > snps,aal;
> > snps,fixed-burst;
> > snps,tso;
> > - stmmac_axi_setup: stmmac-axi-config {
> > + snps,axi-config = <&stmmac_axi_setup_gmac0>;
> > +
> > + stmmac_axi_setup_gmac0: stmmac-axi-config {
>
> And what do these changes have to do with RGMII delays?
>
You're right, those unrelated example changes should not be mixed into the
fix-related binding update.
I will limit the binding changes to only what is required for the fixes,
such as the additional HSP CSR offsets needed for explicit TXD/RXD delay
register initialization, and drop the unrelated DTS example reordering or
cleanup changes from this series.
^ permalink raw reply
* Re: [PATCH net v2] net: mana: Optimize irq affinity for low vcpu configs
From: Shradha Gupta @ 2026-05-08 5:51 UTC (permalink / raw)
To: Yury Norov
Cc: Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Konstantin Taranov, Simon Horman, Erni Sri Satya Vennela,
Dipayaan Roy, Shiraz Saleem, Michael Kelley, Long Li, Yury Norov,
linux-hyperv, linux-kernel, netdev, Paul Rosswurm, Shradha Gupta,
Saurabh Singh Sengar, stable
In-Reply-To: <afoQHm28qj8JnKww@yury>
On Tue, May 05, 2026 at 11:43:26AM -0400, Yury Norov wrote:
> On Mon, May 04, 2026 at 11:15:03PM -0700, Shradha Gupta wrote:
> > On Sat, May 02, 2026 at 01:15:36PM -0400, Yury Norov wrote:
> > > On Sat, May 02, 2026 at 07:37:43AM -0700, Shradha Gupta wrote:
> > > > On Fri, May 01, 2026 at 12:22:20PM -0400, Yury Norov wrote:
> > > > > On Wed, Apr 29, 2026 at 02:06:37AM -0700, Shradha Gupta wrote:
> > > > > > In mana driver, the number of IRQs allocated is capped by the
> > > > > > min(num_cpu + 1, queue count). In cases, where the IRQ count is greater
> > > > > > than the vcpu count, we want to utilize all the vCPUs, irrespective of
> > > > > > their NUMA/core bindings.
> > > > > >
> > > > > > This is important, especially in the envs where number of vCPUs are so
> > > > > > few that the softIRQ handling overhead on two IRQs on the same vCPU is
> > > > > > much more than their overheads if they were spread across sibling vCPUs.
> > > > > >
> > > > > > This behaviour is more evident with dynamic IRQ allocation. Since MANA
> > > > > > IRQs are assigned at a later stage compared to static allocation, other
> > > > > > device IRQs may already be affinitized to the vCPUs. As a result, IRQ
> > > > > > weights become imbalanced, causing multiple MANA IRQs to land on the
> > > > > > same vCPU, while some vCPUs have none.
> > > > > >
> > > > > > In such cases when many parallel TCP connections are tested, the
> > > > > > throughput drops significantly.
> > > > > >
> > > > > > Test envs:
> > > > > > =======================================================
> > > > > > Case 1: without this patch
> > > > > > =======================================================
> > > > > > 4 vcpu(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)
> > > > > >
> > > > > > TYPE effective vCPU aff
> > > > > > =======================================================
> > > > > > IRQ0: HWC 0
> > > > > > IRQ1: mana_q1 0
> > > > > > IRQ2: mana_q2 2
> > > > > > IRQ3: mana_q3 0
> > > > > > IRQ4: mana_q4 3
> > > > > >
> > > > > > %soft on each vCPU(mpstat -P ALL 1) on receiver
> > > > > > vCPU 0 1 2 3
> > > > > > =======================================================
> > > > > > pass 1: 38.85 0.03 24.89 24.65
> > > > > > pass 2: 39.15 0.03 24.57 25.28
> > > > > > pass 3: 40.36 0.03 23.20 23.17
> > > > > >
> > > > > > =======================================================
> > > > > > Case 2: with this patch
> > > > > > =======================================================
> > > > > > 4 vcpu(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)
> > > > > >
> > > > > > TYPE effective vCPU aff
> > > > > > =======================================================
> > > > > > IRQ0: HWC 0
> > > > > > IRQ1: mana_q1 0
> > > > > > IRQ2: mana_q2 1
> > > > > > IRQ3: mana_q3 2
> > > > > > IRQ4: mana_q4 3
> > > > > >
> > > > > > %soft on each vCPU(mpstat -P ALL 1) on receiver
> > > > > > vCPU 0 1 2 3
> > > > > > =======================================================
> > > > > > pass 1: 15.42 15.85 14.99 14.51
> > > > > > pass 2: 15.53 15.94 15.81 15.93
> > > > > > pass 3: 16.41 16.35 16.40 16.36
> > > > > >
> > > > > > =======================================================
> > > > > > Throughput Impact(in Gbps, same env)
> > > > > > =======================================================
> > > > > > TCP conn with patch w/o patch
> > > > > > 20480 15.65 7.73
> > > > > > 10240 15.63 8.93
> > > > > > 8192 15.64 9.69
> > > > > > 6144 15.64 13.16
> > > > > > 4096 15.69 15.75
> > > > > > 2048 15.69 15.83
> > > > > > 1024 15.71 15.28
> > > > > >
> > > > > > Fixes: 755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Co-developed-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> > > > > > Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> > > > > > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> > > > > > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> > > > > > ---
> > > > > > Changes in v2
> > > > > > * Removed the unused skip_first_cpu variable
> > > > > > * fixed exit condition in irq_setup_linear() with len == 0
> > > > > > * changed return type of irq_setup_linear() as it will always be 0
> > > > > > * removed the unnecessary rcu_read_lock() in irq_setup_linear()
> > > > > > * added appropriate comments to indicate expected behaviour when
> > > > > > IRQs are more than or equal to num_online_cpus()
> > > > > > ---
> > > > > > .../net/ethernet/microsoft/mana/gdma_main.c | 47 ++++++++++++++++---
> > > > > > 1 file changed, 40 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > > > index 098fbda0d128..d740d1dc43da 100644
> > > > > > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > > > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > > > @@ -167,6 +167,8 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> > > > > > } else {
> > > > > > /* If dynamic allocation is enabled we have already allocated
> > > > > > * hwc msi
> > > > > > + * Also, we make sure in this case the following is always true
> > > > > > + * (num_msix_usable - 1 HWC) <= num_online_cpus()
> > > > > > */
> > > > > > gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1);
> > > > > > }
> > > > > > @@ -1672,11 +1674,24 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node,
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > > > +/* should be called with cpus_read_lock() held */
> > > > > > +static void irq_setup_linear(unsigned int *irqs, unsigned int len)
> > > > > > +{
> > > > > > + int cpu;
> > > > > > +
> > > > > > + for_each_online_cpu(cpu) {
> > > > > > + if (len == 0)
> > > > > > + break;
> > > > > > +
> > > > > > + irq_set_affinity_and_hint(*irqs++, cpumask_of(cpu));
> > > > > > + len--;
> > > > > > + }
> > > > > > +}
> > > > > > +
> > > > > > static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
> > > > > > {
> > > > > > struct gdma_context *gc = pci_get_drvdata(pdev);
> > > > > > struct gdma_irq_context *gic;
> > > > > > - bool skip_first_cpu = false;
> > > > > > int *irqs, irq, err, i;
> > > > > >
> > > > > > irqs = kmalloc_objs(int, nvec);
> > > > >
> > > > > So what about WARN_ON() and nvec adjustment before kmalloc?
> > > > Hey Yury,
> > > >
> > > > I am still a bit unsure about the WARN_ON() before kmalloc, as after
> > > > that also, in the same function till we take the cpus_read_lock() the
> > > > num_online_cpus() can change(or reduce). That's why I introduced the
> > > > dev_dbg() to capture hot-remove edge case.
> > >
> > > OK.
> > >
> > > > Do you still think it adds more value?
> > >
> > > It's your driver, so you know better. I just wonder because you said
> > > it's good to add WARN_ON(), and then didn't do that.
> > >
> > > > >
> > > > > > @@ -1722,13 +1737,31 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
> > > > > > * first CPU sibling group since they are already affinitized to HWC IRQ
> > > > > > */
> > > > > > cpus_read_lock();
> > > > > > - if (gc->num_msix_usable <= num_online_cpus())
> > > > > > - skip_first_cpu = true;
> > > > > > + if (gc->num_msix_usable <= num_online_cpus()) {
> > > > > > + err = irq_setup(irqs, nvec, gc->numa_node, true);
> > > > > > + if (err) {
> > > > > > + cpus_read_unlock();
> > > > > > + goto free_irq;
> > > > >
> > > > > One thing puzzles me: if you skip first CPU with this 'true', and the
> > > > > gc->num_msix_usable == num_online_cpus(), it's one more than you can
> > > > > distribute. What do I miss?
> > > > >
> > > >
> > > > Let me explain this case a bit better then,
> > > >
> > > > - num_msix_usable = HWC IRQ + Queue IRQ
> > > > - nvec in this functions is only Queue IRQ (HWC already setup)
> > > >
> > > > When num_online_cpus == num_msix_usable:
> > > > - nvec = num_online_cpus - 1
> > > > - first CPU is already assigned to HWC IRQ, so skip it
> > > > - Queue IRQs fit in the remaining CPUs
> > > >
> > > > please let me know if I did not get your question right
> > >
> > > Can you put that in a comment?
> >
> > Sure I will. thanks
> >
> > >
> > > > > > + }
> > > > > > + } else {
> > > > > > + /*
> > > > > > + * When num_msix_usable are more than num_online_cpus, we try to
> > > > > > + * make sure we are using all vcpus. In such a case NUMA or
> > > > > > + * CPU core affinity does not matter.
> > > > >
> > > > > If it doesn't matter, why don't you assign each IRQ to all CPUs then?
> > > > > In theory, the system would have most of flexibility to balance them.
> > > > >
> > > >
> > > > Okay, let me fix the comment and elaborate on this. It doesn't matter
> > > > because in such a case we want to anyway exhaust and distribute the
> > > > Queue IRQs to all vCPUs.
> > > > We don't want to rely on the system's balancer in this case as it could
> > > > be skewed by other devices' IRQ weights
> > >
> > > I don't understand this. If I want to reserve some CPUs to solely
> > > handle IRQs from my high-priority hardware, then I configure my system
> > > accordingly. For example, assign all non-networking IRQs on CPU0, and
> > > all networking IRQs to all CPUs.
> > >
> > > In your case, you distribute IRQs evenly, which means you've no
> > > preferred CPUs. So, assuming the system is only running your IRQ
> > > driver, it's at max is as good as all-CPU distribution. In case of
> > > heavy loading some particular CPU, your scheme could cause
> > > corresponding IRQs to starve.
> > >
> > > I recall, when we was working on irq_setup(), the original idea was to
> > > distribute IRQs one-to-one, but than I suggested the
> > >
> > > irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
> > >
> > > and after experiments, you agreed on that.
> > >
> > > Can you please run your throughput test for my suggested distribution
> > > too? Would be also nice to see how each distribution works when some
> > > CPUs are under stress.
> > >
> > > Thanks,
> > > Yury
> >
> > The design of irq_setup() works exactly how we want it for our IRQs for
> > almost all of our usecases, so we want to keep that as is. The only
> > scenarios where this is an issue in terms of significant throughput drop
> > is when we are working with low vCPU VMs (vCPU <= 4 with high TCP
> > connection counts) and where there are additional NVMe devices attached
> > to the VM.
> >
> > The current patch about utilizing all the vCPUs helps in that case and
> > doesn't cause any regression for other cases.
> >
> > This linear path is only taken when num_msix_usable > num_online_cpus(),
> > which is limited to low-vCPU VMs. Larger VMs continue using irq_setup()
> > as before.
> >
> > We can definately get our throughput run results on other suggestions
> > you have. And about that, I just needed a bit more clarity on what to
> > test against. Are you suggesting, with irq_setup() intact and in use, we
> > configure the non-mana IRQs to say CPU0 and capture the numbers?
>
> Can you try this:
>
> while(len--)
> // Or cpu_online_mask or cpu_all_mask?
> irq_set_affinity_and_hint(*irqs++, NULL);
>
> And compare it to the linear version under your vCPU scenario?
>
> Can you run your throughput test alone and on parallel with some
> IRQ torture test?
>
> stress-ng --timer 4 --timeout 60s
>
> And maybe pin the stress test to the default CPU. Assuming it's 0:
>
> taskset -c 0 stress-ng --timer 4 --timeout 60s
>
> Unless the 'linear' version is significantly faster, I'd stick to the
> above.
>
> Thanks,
> Yury
Hey Yury,
We tried a few tests with your suggestion, and throughput seems to be
the same compared to the linear distribution approach. We stressed out
CPU0 in both the cases and the results were similar. No IRQ migration
was observed in either case and no throughput drop.
But one observation I had was that " irq_set_affinity_and_hint(*irqs++,
NULL);" is essentially a no-op and we end up relying on the initial
placement from pci_alloc_irq_vectors(). Even though in these tests we
were not able to reproduce it, but with this distribution there is a
chance we end up clustering the mana queue IRQs, while other vCPUs are
not running any network load. It's because the placement depends on
system-wide IRQ state at allocation time.
The linear approach however gaurantees each queue IRQ lands on a
distinct vCPU regardless of system state. Even after stressing the cpus
using stress-ng, we did not observe any significant throughput drop.
regards,
Shradha.
^ permalink raw reply
* [PATCH v2 11/22] dt-bindings: clock: Add StarFive JHB100 Peripheral-0 clock and reset generator
From: Changhuang Liang @ 2026-05-08 5:36 UTC (permalink / raw)
To: Michael Turquette, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Stephen Boyd, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, Philipp Zabel, Emil Renner Berthing, Kees Cook,
Gustavo A . R . Silva, Richard Cochran
Cc: linux-clk, linux-kernel, devicetree, linux-riscv, linux-hardening,
netdev, Sia Jee Heng, Hal Feng, Ley Foon Tan, Changhuang Liang
In-Reply-To: <20260508053632.818548-1-changhuang.liang@starfivetech.com>
Add bindings for the Peripheral-0 clock and reset generator (PER0CRG)
on the JHB100 RISC-V SoC by StarFive Ltd.
Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
---
.../clock/starfive,jhb100-per0crg.yaml | 70 +++++
.../dt-bindings/clock/starfive,jhb100-crg.h | 281 ++++++++++++++++++
.../dt-bindings/reset/starfive,jhb100-crg.h | 77 +++++
3 files changed, 428 insertions(+)
create mode 100644 Documentation/devicetree/bindings/clock/starfive,jhb100-per0crg.yaml
diff --git a/Documentation/devicetree/bindings/clock/starfive,jhb100-per0crg.yaml b/Documentation/devicetree/bindings/clock/starfive,jhb100-per0crg.yaml
new file mode 100644
index 000000000000..d3d426a741cc
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/starfive,jhb100-per0crg.yaml
@@ -0,0 +1,70 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/clock/starfive,jhb100-per0crg.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: StarFive JHB100 Peripheral-0 Clock and Reset Generator
+
+maintainers:
+ - Changhuang Liang <changhuang.liang@starfivetech.com>
+
+properties:
+ compatible:
+ const: starfive,jhb100-per0crg
+
+ reg:
+ maxItems: 1
+
+ clocks:
+ items:
+ - description: Main Oscillator (25 MHz)
+ - description: PLL6
+ - description: Configure 400MHz
+ - description: Configure 800MHZ
+ - description: Non Coherent NOC Initiator
+ - description: Non Coherent NOC Target
+
+ clock-names:
+ items:
+ - const: osc
+ - const: pll6
+ - const: cfg_400
+ - const: cfg_800
+ - const: ncnoc_init
+ - const: ncnoc_targ
+
+ '#clock-cells':
+ const: 1
+ description:
+ See <dt-bindings/clock/starfive,jhb100-crg.h> for valid indices.
+
+ '#reset-cells':
+ const: 1
+ description:
+ See <dt-bindings/reset/starfive-jhb100-crg.h> for valid indices.
+
+required:
+ - compatible
+ - reg
+ - clocks
+ - clock-names
+ - '#clock-cells'
+ - '#reset-cells'
+
+additionalProperties: false
+
+examples:
+ - |
+ clock-controller@11a08000 {
+ compatible = "starfive,jhb100-per0crg";
+ reg = <0x11a08000 0x1000>;
+ clocks = <&osc>, <&pll6>, <&sys0crg 71>,
+ <&sys0crg 72>, <&sys0crg 70>,
+ <&sys2crg 23>;
+ clock-names = "osc", "pll6", "cfg_400",
+ "cfg_800", "ncnoc_init",
+ "ncnoc_targ";
+ #clock-cells = <1>;
+ #reset-cells = <1>;
+ };
diff --git a/include/dt-bindings/clock/starfive,jhb100-crg.h b/include/dt-bindings/clock/starfive,jhb100-crg.h
index d19618e2a846..add2cd093dbd 100644
--- a/include/dt-bindings/clock/starfive,jhb100-crg.h
+++ b/include/dt-bindings/clock/starfive,jhb100-crg.h
@@ -106,4 +106,285 @@
#define JHB100_SYS2CLK_MAIN_ICG_EN_JTAG0 32
#define JHB100_SYS2CLK_MAIN_ICG_EN_JTAG1 33
+/* PER0CRG clocks */
+#define JHB100_PER0CLK_CDR_I3C0 0
+#define JHB100_PER0CLK_CDR_I3C1 1
+#define JHB100_PER0CLK_CDR_I3C2 2
+#define JHB100_PER0CLK_CDR_I3C3 3
+#define JHB100_PER0CLK_CDR_I3C4 4
+#define JHB100_PER0CLK_CDR_I3C5 5
+#define JHB100_PER0CLK_CDR_I3C6 6
+#define JHB100_PER0CLK_CDR_I3C7 7
+#define JHB100_PER0CLK_CDR_I3C8 8
+#define JHB100_PER0CLK_CDR_I3C9 9
+#define JHB100_PER0CLK_CDR_I3C10 10
+#define JHB100_PER0CLK_CDR_I3C11 11
+#define JHB100_PER0CLK_CDR_I3C12 12
+#define JHB100_PER0CLK_CDR_I3C13 13
+#define JHB100_PER0CLK_CDR_I3C14 14
+#define JHB100_PER0CLK_CDR_I3C15 15
+#define JHB100_PER0CLK_200 16
+#define JHB100_PER0CLK_600_DIV6 17
+#define JHB100_PER0CLK_600_DIV6_DIV5 18
+#define JHB100_PER0CLK_TIMER0_DUALTIMER0 19
+#define JHB100_PER0CLK_TIMER1_DUALTIMER0 20
+#define JHB100_PER0CLK_TIMER0_DUALTIMER1 21
+#define JHB100_PER0CLK_TIMER1_DUALTIMER1 22
+#define JHB100_PER0CLK_TIMER0_DUALTIMER2 23
+#define JHB100_PER0CLK_TIMER1_DUALTIMER2 24
+#define JHB100_PER0CLK_1200_PH0_LVDS0 25
+#define JHB100_PER0CLK_1200_PH0_LVDS1 26
+#define JHB100_PER0CLK_1200_CORE0 27
+#define JHB100_PER0CLK_1200_CORE1 28
+#define JHB100_PER0CLK_1200_SHIFT90_LVDS0 29
+#define JHB100_PER0CLK_1200_SHIFT90_LVDS1 30
+#define JHB100_PER0CLK_1200_DIV5_CORE0 31
+#define JHB100_PER0CLK_1200_DIV5_CORE1 32
+#define JHB100_PER0CLK_PH0_LTPI0 33
+
+#define JHB100_PER0CLK_PH0_LTPI1 35
+
+#define JHB100_PER0CLK_PH90_LTPI0 37
+
+#define JHB100_PER0CLK_PH90_LTPI1 39
+
+#define JHB100_PER0CLK_240_CORE_LTPI0 41
+
+#define JHB100_PER0CLK_240_CORE_LTPI1 43
+
+#define JHB100_PER0CLK_AXI_DMA_I2C_INIT 45
+#define JHB100_PER0CLK_AXI_DMA_I3C_INIT 46
+#define JHB100_PER0CLK_AXI_DMA_UART_INIT 47
+#define JHB100_PER0CLK_CORE_DMAC0 48
+#define JHB100_PER0CLK_CORE_DMAC1 49
+#define JHB100_PER0CLK_CORE_DMAC2 50
+
+#define JHB100_PER0CLK_HDR_TX_I3C0 78
+#define JHB100_PER0CLK_HDR_TX_I3C1 79
+#define JHB100_PER0CLK_HDR_TX_I3C2 80
+#define JHB100_PER0CLK_HDR_TX_I3C3 81
+#define JHB100_PER0CLK_HDR_TX_I3C4 82
+#define JHB100_PER0CLK_HDR_TX_I3C5 83
+#define JHB100_PER0CLK_HDR_TX_I3C6 84
+#define JHB100_PER0CLK_HDR_TX_I3C7 85
+#define JHB100_PER0CLK_HDR_TX_I3C8 86
+#define JHB100_PER0CLK_HDR_TX_I3C9 87
+#define JHB100_PER0CLK_HDR_TX_I3C10 88
+#define JHB100_PER0CLK_HDR_TX_I3C11 89
+#define JHB100_PER0CLK_HDR_TX_I3C12 90
+#define JHB100_PER0CLK_HDR_TX_I3C13 91
+#define JHB100_PER0CLK_HDR_TX_I3C14 92
+#define JHB100_PER0CLK_HDR_TX_I3C15 93
+#define JHB100_PER0CLK_CORE_I2C0 94
+#define JHB100_PER0CLK_CORE_I2C1 95
+#define JHB100_PER0CLK_CORE_I2C2 96
+#define JHB100_PER0CLK_CORE_I2C3 97
+#define JHB100_PER0CLK_CORE_I2C4 98
+#define JHB100_PER0CLK_CORE_I2C5 99
+#define JHB100_PER0CLK_CORE_I2C6 100
+#define JHB100_PER0CLK_CORE_I2C7 101
+#define JHB100_PER0CLK_CORE_I2C8 102
+#define JHB100_PER0CLK_CORE_I2C9 103
+#define JHB100_PER0CLK_CORE_I2C10 104
+#define JHB100_PER0CLK_CORE_I2C11 105
+#define JHB100_PER0CLK_CORE_I2C12 106
+#define JHB100_PER0CLK_CORE_I2C13 107
+#define JHB100_PER0CLK_CORE_I2C14 108
+#define JHB100_PER0CLK_CORE_I2C15 109
+
+#define JHB100_PER0CLK_WDOGCLK_WDT0 126
+#define JHB100_PER0CLK_WDOGCLK_WDT1 127
+#define JHB100_PER0CLK_WDOGCLK_WDT2 128
+#define JHB100_PER0CLK_WDOGCLK_WDT3 129
+#define JHB100_PER0CLK_WDOGCLK_WDT_EXTERNAL 130
+#define JHB100_PER0CLK_SCLK_UART4 131
+#define JHB100_PER0CLK_SCLK_UART5 132
+#define JHB100_PER0CLK_SCLK_UART6 133
+#define JHB100_PER0CLK_SCLK_UART7 134
+#define JHB100_PER0CLK_SCLK_UART8 135
+#define JHB100_PER0CLK_SCLK_UART9 136
+#define JHB100_PER0CLK_SCLK_UART10 137
+#define JHB100_PER0CLK_SCLK_UART11 138
+#define JHB100_PER0CLK_SCLK_UART12 139
+#define JHB100_PER0CLK_SCLK_UART13 140
+#define JHB100_PER0CLK_SCLK_UART14 141
+
+#define JHB100_PER0CLK_PCLK_DMA_UART_CFG 148
+#define JHB100_PER0CLK_PCLK_DMA_I2C_CFG 149
+#define JHB100_PER0CLK_PCLK_DMA_I3C_CFG 150
+#define JHB100_PER0CLK_PCLK_DUALTIMER0 151
+#define JHB100_PER0CLK_PCLK_DUALTIMER1 152
+#define JHB100_PER0CLK_PCLK_DUALTIMER2 153
+
+#define JHB100_PER0CLK_HCLK_TRNG 156
+#define JHB100_PER0CLK_APB_I2C0 157
+#define JHB100_PER0CLK_APB_I2C1 158
+#define JHB100_PER0CLK_APB_I2C2 159
+#define JHB100_PER0CLK_APB_I2C3 160
+#define JHB100_PER0CLK_APB_I2C4 161
+#define JHB100_PER0CLK_APB_I2C5 162
+#define JHB100_PER0CLK_APB_I2C6 163
+#define JHB100_PER0CLK_APB_I2C7 164
+#define JHB100_PER0CLK_APB_I2C8 165
+#define JHB100_PER0CLK_APB_I2C9 166
+#define JHB100_PER0CLK_APB_I2C10 167
+#define JHB100_PER0CLK_APB_I2C11 168
+#define JHB100_PER0CLK_APB_I2C12 169
+#define JHB100_PER0CLK_APB_I2C13 170
+#define JHB100_PER0CLK_APB_I2C14 171
+#define JHB100_PER0CLK_APB_I2C15 172
+#define JHB100_PER0CLK_APB_I2CF0 173
+#define JHB100_PER0CLK_APB_I2CF1 174
+#define JHB100_PER0CLK_APB_I2CF2 175
+#define JHB100_PER0CLK_APB_I2CF3 176
+#define JHB100_PER0CLK_APB_I2CF4 177
+#define JHB100_PER0CLK_APB_I2CF5 178
+#define JHB100_PER0CLK_APB_I2CF6 179
+#define JHB100_PER0CLK_APB_I2CF7 180
+#define JHB100_PER0CLK_APB_I2CF8 181
+#define JHB100_PER0CLK_APB_I2CF9 182
+#define JHB100_PER0CLK_APB_I2CF10 183
+#define JHB100_PER0CLK_APB_I2CF11 184
+#define JHB100_PER0CLK_APB_I2CF12 185
+#define JHB100_PER0CLK_APB_I2CF13 186
+#define JHB100_PER0CLK_APB_I2CF14 187
+#define JHB100_PER0CLK_APB_I2CF15 188
+#define JHB100_PER0CLK_APB_I3C0 189
+#define JHB100_PER0CLK_APB_I3C1 190
+#define JHB100_PER0CLK_APB_I3C2 191
+#define JHB100_PER0CLK_APB_I3C3 192
+#define JHB100_PER0CLK_APB_I3C4 193
+#define JHB100_PER0CLK_APB_I3C5 194
+#define JHB100_PER0CLK_APB_I3C6 195
+#define JHB100_PER0CLK_APB_I3C7 196
+#define JHB100_PER0CLK_APB_I3C8 197
+#define JHB100_PER0CLK_APB_I3C9 198
+#define JHB100_PER0CLK_APB_I3C10 199
+#define JHB100_PER0CLK_APB_I3C11 200
+#define JHB100_PER0CLK_APB_I3C12 201
+#define JHB100_PER0CLK_APB_I3C13 202
+#define JHB100_PER0CLK_APB_I3C14 203
+#define JHB100_PER0CLK_APB_I3C15 204
+#define JHB100_PER0CLK_APB_UART0 205
+#define JHB100_PER0CLK_APB_UART1 206
+#define JHB100_PER0CLK_APB_UART2 207
+#define JHB100_PER0CLK_APB_UART3 208
+#define JHB100_PER0CLK_APB_UART4 209
+#define JHB100_PER0CLK_APB_UART5 210
+#define JHB100_PER0CLK_APB_UART6 211
+#define JHB100_PER0CLK_APB_UART7 212
+#define JHB100_PER0CLK_APB_UART8 213
+#define JHB100_PER0CLK_APB_UART9 214
+#define JHB100_PER0CLK_APB_UART10 215
+#define JHB100_PER0CLK_APB_UART11 216
+#define JHB100_PER0CLK_APB_UART12 217
+#define JHB100_PER0CLK_APB_UART13 218
+#define JHB100_PER0CLK_APB_UART14 219
+#define JHB100_PER0CLK_DMA_I3C0 220
+#define JHB100_PER0CLK_DMA_I3C1 221
+#define JHB100_PER0CLK_DMA_I3C2 222
+#define JHB100_PER0CLK_DMA_I3C3 223
+#define JHB100_PER0CLK_DMA_I3C4 224
+#define JHB100_PER0CLK_DMA_I3C5 225
+#define JHB100_PER0CLK_DMA_I3C6 226
+#define JHB100_PER0CLK_DMA_I3C7 227
+#define JHB100_PER0CLK_DMA_I3C8 228
+#define JHB100_PER0CLK_DMA_I3C9 229
+#define JHB100_PER0CLK_DMA_I3C10 230
+#define JHB100_PER0CLK_DMA_I3C11 231
+#define JHB100_PER0CLK_DMA_I3C12 232
+#define JHB100_PER0CLK_DMA_I3C13 233
+#define JHB100_PER0CLK_DMA_I3C14 234
+#define JHB100_PER0CLK_DMA_I3C15 235
+#define JHB100_PER0CLK_CORE_I3C0 236
+#define JHB100_PER0CLK_CORE_I3C1 237
+#define JHB100_PER0CLK_CORE_I3C2 238
+#define JHB100_PER0CLK_CORE_I3C3 239
+#define JHB100_PER0CLK_CORE_I3C4 240
+#define JHB100_PER0CLK_CORE_I3C5 241
+#define JHB100_PER0CLK_CORE_I3C6 242
+#define JHB100_PER0CLK_CORE_I3C7 243
+#define JHB100_PER0CLK_CORE_I3C8 244
+#define JHB100_PER0CLK_CORE_I3C9 245
+#define JHB100_PER0CLK_CORE_I3C10 246
+#define JHB100_PER0CLK_CORE_I3C11 247
+#define JHB100_PER0CLK_CORE_I3C12 248
+#define JHB100_PER0CLK_CORE_I3C13 249
+#define JHB100_PER0CLK_CORE_I3C14 250
+#define JHB100_PER0CLK_CORE_I3C15 251
+#define JHB100_PER0CLK_DMAC_AXI_PERIPH0_HS_CLK_I2C 252
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C0 253
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C1 254
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C2 255
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C3 256
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C4 257
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C5 258
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C6 259
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C7 260
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C8 261
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C9 262
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C10 263
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C11 264
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C12 265
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C13 266
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C14 267
+#define JHB100_PER0CLK_MAIN_ICG_EN_I3C15 268
+#define JHB100_PER0CLK_MAIN_ICG_EN_DUALTIMER0 269
+#define JHB100_PER0CLK_MAIN_ICG_EN_DUALTIMER1 270
+#define JHB100_PER0CLK_MAIN_ICG_EN_DUALTIMER2 271
+#define JHB100_PER0CLK_MAIN_ICG_EN_LTPI0 272
+#define JHB100_PER0CLK_MAIN_ICG_EN_LTPI1 273
+#define JHB100_PER0CLK_MAIN_ICG_EN_DMAC_I2C 274
+#define JHB100_PER0CLK_MAIN_ICG_EN_DMAC_I3C 275
+#define JHB100_PER0CLK_MAIN_ICG_EN_DMAC_UART 276
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL4 277
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL5 278
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL6 279
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL7 280
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL8 281
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL9 282
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL10 283
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL11 284
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL12 285
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL13 286
+#define JHB100_PER0CLK_MAIN_ICG_EN_SOL14 287
+
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C0 304
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C1 305
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C2 306
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C3 307
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C4 308
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C5 309
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C6 310
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C7 311
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C8 312
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C9 313
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C10 314
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C11 315
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C12 316
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C13 317
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C14 318
+#define JHB100_PER0CLK_MAIN_ICG_EN_I2C15 319
+#define JHB100_PER0CLK_MAIN_ICG_EN_WDT0 320
+#define JHB100_PER0CLK_MAIN_ICG_EN_WDT1 321
+#define JHB100_PER0CLK_MAIN_ICG_EN_WDT2 322
+#define JHB100_PER0CLK_MAIN_ICG_EN_WDT3 323
+#define JHB100_PER0CLK_MAIN_ICG_EN_WDT_EXTERNAL 324
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART4 325
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART5 326
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART6 327
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART7 328
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART8 329
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART9 330
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART10 331
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART11 332
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART12 333
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART13 334
+#define JHB100_PER0CLK_MAIN_ICG_EN_UART14 335
+#define JHB100_PER0CLK_MAIN_ICG_EN_LDO0 336
+#define JHB100_PER0CLK_MAIN_ICG_EN_LDO1 337
+#define JHB100_PER0CLK_MAIN_ICG_EN_SENSORS_PERIPH0 338
+#define JHB100_PER0CLK_MAIN_ICG_EN_SENSORS_DMAC 339
+#define JHB100_PER0CLK_MAIN_ICG_EN_TRNG 340
+
#endif /* __DT_BINDINGS_CLOCK_STARFIVE_JHB100_H__ */
diff --git a/include/dt-bindings/reset/starfive,jhb100-crg.h b/include/dt-bindings/reset/starfive,jhb100-crg.h
index fbc55f95e76c..ccfb7616e1a7 100644
--- a/include/dt-bindings/reset/starfive,jhb100-crg.h
+++ b/include/dt-bindings/reset/starfive,jhb100-crg.h
@@ -61,4 +61,81 @@
#define JHB100_SYS2RST_GPU1_RSTN_BUS 21
#define JHB100_SYS2RST_GPU1_HOST_PCIE_RST_N 22
+/* PER0CRG resets */
+#define JHB100_PER0RST_MAIN_RSTN_UART4 0
+#define JHB100_PER0RST_MAIN_RSTN_UART5 1
+#define JHB100_PER0RST_MAIN_RSTN_UART6 2
+#define JHB100_PER0RST_MAIN_RSTN_UART7 3
+#define JHB100_PER0RST_MAIN_RSTN_UART8 4
+#define JHB100_PER0RST_MAIN_RSTN_UART9 5
+#define JHB100_PER0RST_MAIN_RSTN_UART10 6
+#define JHB100_PER0RST_MAIN_RSTN_UART11 7
+#define JHB100_PER0RST_MAIN_RSTN_UART12 8
+#define JHB100_PER0RST_MAIN_RSTN_UART13 9
+#define JHB100_PER0RST_MAIN_RSTN_UART14 10
+#define JHB100_PER0RST_MAIN_RSTN_I2C0 11
+#define JHB100_PER0RST_MAIN_RSTN_I2C1 12
+#define JHB100_PER0RST_MAIN_RSTN_I2C2 13
+#define JHB100_PER0RST_MAIN_RSTN_I2C3 14
+#define JHB100_PER0RST_MAIN_RSTN_I2C4 15
+#define JHB100_PER0RST_MAIN_RSTN_I2C5 16
+#define JHB100_PER0RST_MAIN_RSTN_I2C6 17
+#define JHB100_PER0RST_MAIN_RSTN_I2C7 18
+#define JHB100_PER0RST_MAIN_RSTN_I2C8 19
+#define JHB100_PER0RST_MAIN_RSTN_I2C9 20
+#define JHB100_PER0RST_MAIN_RSTN_I2C10 21
+#define JHB100_PER0RST_MAIN_RSTN_I2C11 22
+#define JHB100_PER0RST_MAIN_RSTN_I2C12 23
+#define JHB100_PER0RST_MAIN_RSTN_I2C13 24
+#define JHB100_PER0RST_MAIN_RSTN_I2C14 25
+#define JHB100_PER0RST_MAIN_RSTN_I2C15 26
+#define JHB100_PER0RST_MAIN_RSTN_I3C0 27
+#define JHB100_PER0RST_MAIN_RSTN_I3C1 28
+#define JHB100_PER0RST_MAIN_RSTN_I3C2 29
+#define JHB100_PER0RST_MAIN_RSTN_I3C3 30
+#define JHB100_PER0RST_MAIN_RSTN_I3C4 31
+#define JHB100_PER0RST_MAIN_RSTN_I3C5 32
+#define JHB100_PER0RST_MAIN_RSTN_I3C6 33
+#define JHB100_PER0RST_MAIN_RSTN_I3C7 34
+#define JHB100_PER0RST_MAIN_RSTN_I3C8 35
+#define JHB100_PER0RST_MAIN_RSTN_I3C9 36
+#define JHB100_PER0RST_MAIN_RSTN_I3C10 37
+#define JHB100_PER0RST_MAIN_RSTN_I3C11 38
+#define JHB100_PER0RST_MAIN_RSTN_I3C12 39
+#define JHB100_PER0RST_MAIN_RSTN_I3C13 40
+#define JHB100_PER0RST_MAIN_RSTN_I3C14 41
+#define JHB100_PER0RST_MAIN_RSTN_I3C15 42
+#define JHB100_PER0RST_MAIN_RSTN_WDT0 43
+#define JHB100_PER0RST_MAIN_RSTN_WDT1 44
+#define JHB100_PER0RST_MAIN_RSTN_WDT2 45
+#define JHB100_PER0RST_MAIN_RSTN_WDT3 46
+#define JHB100_PER0RST_MAIN_RSTN_WDT4 47
+#define JHB100_PER0RST_MAIN_RSTN_DUALTIMER0 48
+#define JHB100_PER0RST_MAIN_RSTN_DUALTIMER1 49
+#define JHB100_PER0RST_MAIN_RSTN_DUALTIMER2 50
+#define JHB100_PER0RST_MAIN_RSTN_TRNG 51
+#define JHB100_PER0RST_MAIN_RSTN_DMAC0 52
+#define JHB100_PER0RST_MAIN_RSTN_DMAC1 53
+#define JHB100_PER0RST_MAIN_RSTN_DMAC2 54
+#define JHB100_PER0RST_MAIN_RSTN_LTPI0 55
+#define JHB100_PER0RST_MAIN_RSTN_LTPI1 56
+#define JHB100_PER0RST_MAIN_RSTN_SOL4 57
+#define JHB100_PER0RST_MAIN_RSTN_SOL5 58
+#define JHB100_PER0RST_MAIN_RSTN_SOL6 59
+#define JHB100_PER0RST_MAIN_RSTN_SOL7 60
+#define JHB100_PER0RST_MAIN_RSTN_SOL8 61
+#define JHB100_PER0RST_MAIN_RSTN_SOL9 62
+#define JHB100_PER0RST_MAIN_RSTN_SOL10 63
+#define JHB100_PER0RST_MAIN_RSTN_SOL11 64
+#define JHB100_PER0RST_MAIN_RSTN_SOL12 65
+#define JHB100_PER0RST_MAIN_RSTN_SOL13 66
+#define JHB100_PER0RST_MAIN_RSTN_SOL14 67
+#define JHB100_PER0RST_MAIN_RSTN_LDO0 68
+#define JHB100_PER0RST_MAIN_RSTN_LDO1 69
+#define JHB100_PER0RST_MAIN_RSTN_PERIPH0_SENSORS 70
+#define JHB100_PER0RST_MAIN_RSTN_DMAC0_SENSORS 71
+#define JHB100_PER0RST_SYSCON_PRESETN 72
+#define JHB100_PER0RST_GPIO_IOMUX_PRESETN 73
+#define JHB100_PER0RST_UART_MUX_REG_WRAP 74
+
#endif /* __DT_BINDINGS_RESET_STARFIVE_JHB100_CRG_H__ */
--
2.25.1
^ permalink raw reply related
* [PATCH v2 10/22] clk: starfive: Add JHB100 System-2 clock generator driver
From: Changhuang Liang @ 2026-05-08 5:36 UTC (permalink / raw)
To: Michael Turquette, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Stephen Boyd, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, Philipp Zabel, Emil Renner Berthing, Kees Cook,
Gustavo A . R . Silva, Richard Cochran
Cc: linux-clk, linux-kernel, devicetree, linux-riscv, linux-hardening,
netdev, Sia Jee Heng, Hal Feng, Ley Foon Tan, Changhuang Liang
In-Reply-To: <20260508053632.818548-1-changhuang.liang@starfivetech.com>
Add support for JHB100 System-2 clock generator (SYS2CRG).
Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
---
drivers/clk/starfive/Kconfig | 8 ++
drivers/clk/starfive/Makefile | 1 +
.../clk/starfive/clk-starfive-jhb100-sys2.c | 128 ++++++++++++++++++
3 files changed, 137 insertions(+)
create mode 100644 drivers/clk/starfive/clk-starfive-jhb100-sys2.c
diff --git a/drivers/clk/starfive/Kconfig b/drivers/clk/starfive/Kconfig
index b6042bcb5992..729bdfce7b8a 100644
--- a/drivers/clk/starfive/Kconfig
+++ b/drivers/clk/starfive/Kconfig
@@ -91,3 +91,11 @@ config CLK_STARFIVE_JHB100_SYS1
help
Say yes here to support the system-1 clock controller on the
StarFive JHB100 SoC.
+
+config CLK_STARFIVE_JHB100_SYS2
+ bool "StarFive JHB100 system-2 clock support"
+ depends on CLK_STARFIVE_JHB100_SYS0
+ default ARCH_STARFIVE
+ help
+ Say yes here to support the system-2 clock controller on the
+ StarFive JHB100 SoC.
diff --git a/drivers/clk/starfive/Makefile b/drivers/clk/starfive/Makefile
index b3571e2f0555..90b6390296bd 100644
--- a/drivers/clk/starfive/Makefile
+++ b/drivers/clk/starfive/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_CLK_STARFIVE_JH7110_VOUT) += clk-starfive-jh7110-vout.o
obj-$(CONFIG_CLK_STARFIVE_JHB100_SYS0) += clk-starfive-jhb100-sys0.o
obj-$(CONFIG_CLK_STARFIVE_JHB100_SYS1) += clk-starfive-jhb100-sys1.o
+obj-$(CONFIG_CLK_STARFIVE_JHB100_SYS2) += clk-starfive-jhb100-sys2.o
diff --git a/drivers/clk/starfive/clk-starfive-jhb100-sys2.c b/drivers/clk/starfive/clk-starfive-jhb100-sys2.c
new file mode 100644
index 000000000000..20ea5acf31ca
--- /dev/null
+++ b/drivers/clk/starfive/clk-starfive-jhb100-sys2.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * StarFive JHB100 System-2 Clock Driver
+ *
+ * Copyright (C) 2024 StarFive Technology Co., Ltd.
+ *
+ * Author: Changhuang Liang <changhuang.liang@starfivetech.com>
+ *
+ */
+
+#include <dt-bindings/clock/starfive,jhb100-crg.h>
+#include <linux/init.h>
+#include <linux/platform_device.h>
+
+#include "clk-starfive-common.h"
+
+#define JHB100_SYS2CLK_NUM_CLKS (JHB100_SYS2CLK_MAIN_ICG_EN_JTAG1 + 1)
+
+/* external clocks */
+#define JHB100_SYS2CLK_OSC (JHB100_SYS2CLK_NUM_CLKS + 0)
+#define JHB100_SYS2CLK_PLL1 (JHB100_SYS2CLK_NUM_CLKS + 1)
+#define JHB100_SYS2CLK_GPU0_NCNOC_INIT (JHB100_SYS2CLK_NUM_CLKS + 2)
+#define JHB100_SYS2CLK_GPU1_NCNOC_INIT (JHB100_SYS2CLK_NUM_CLKS + 3)
+
+char *jhb100_sys2_ext_clk[] = {
+ "osc",
+ "pll1",
+ "gpu0_ncnoc_init",
+ "gpu1_ncnoc_init",
+};
+
+static const struct starfive_clk_data jhb100_sys2crg_clk_data[] __initconst = {
+ /* jtag mst*/
+ STARFIVE__DIV(JHB100_SYS2CLK_JTAGM0_HCLK, "jtagm0_hclk", 6,
+ JHB100_SYS2CLK_PLL1),
+ STARFIVE__DIV(JHB100_SYS2CLK_JTAGM1_HCLK, "jtagm1_hclk", 6,
+ JHB100_SYS2CLK_PLL1),
+ STARFIVE__DIV(JHB100_SYS2CLK_JTAGM0_ATPG, "jtagm0_ATPG", 12,
+ JHB100_SYS2CLK_PLL1),
+ STARFIVE__DIV(JHB100_SYS2CLK_JTAGM1_ATPG, "jtagm1_ATPG", 12,
+ JHB100_SYS2CLK_PLL1),
+ STARFIVE__DIV(JHB100_SYS2CLK_JTAGM0_ATPG_TCLOCK, "jtagm0_atpg_tclock", 2,
+ JHB100_SYS2CLK_JTAGM0_ATPG),
+ STARFIVE__DIV(JHB100_SYS2CLK_JTAGM1_ATPG_TCLOCK, "jtagm1_atpg_tclock", 2,
+ JHB100_SYS2CLK_JTAGM1_ATPG),
+ STARFIVE_GATE(JHB100_SYS2CLK_JTAG0_MST_WRAP_HCLK, "jtag0_mst_wrap_hclk",
+ CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM0_HCLK),
+ STARFIVE_GATE(JHB100_SYS2CLK_JTAG0_MST_WRAP_CLK_JTAG, "jtag0_mst_wrap_clk_jtag",
+ CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM0_HCLK),
+ STARFIVE_GATE(JHB100_SYS2CLK_JTAG0_MST_WRAP_APB_PCLK, "jtag0_mst_wrap_apb_pclk",
+ CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM0_ATPG),
+ STARFIVE_GATE(JHB100_SYS2CLK_JTAG0_MST_WRAP_ATPG_TCLOCK, "jtag0_mst_wrap_atpg_tclock",
+ CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM0_ATPG),
+ STARFIVE_GATE(JHB100_SYS2CLK_JTAG1_MST_WRAP_HCLK, "jtag1_mst_wrap_hclk",
+ CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM1_HCLK),
+ STARFIVE_GATE(JHB100_SYS2CLK_JTAG1_MST_WRAP_CLK_JTAG, "jtag1_mst_wrap_clk_jtag",
+ CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM1_HCLK),
+ STARFIVE_GATE(JHB100_SYS2CLK_JTAG1_MST_WRAP_APB_PCLK, "jtag1_mst_wrap_apb_pclk",
+ CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM1_ATPG),
+ STARFIVE_GATE(JHB100_SYS2CLK_JTAG1_MST_WRAP_ATPG_TCLOCK, "jtag1_mst_wrap_atpg_tclock",
+ CLK_IGNORE_UNUSED, JHB100_SYS2CLK_JTAGM1_ATPG),
+ /* hostusbcmn */
+ STARFIVE__DIV(JHB100_SYS2CLK_HOSTUSB_NCNOC_TARG, "hostusb_ncnoc_targ", 12,
+ JHB100_SYS2CLK_PLL1),
+ STARFIVE__DIV(JHB100_SYS2CLK_HOSTUSBCMN_CFG_500, "hostusbcmn_cfg_500", 4,
+ JHB100_SYS2CLK_PLL1),
+ /* bmcperiph1 */
+ STARFIVE__DIV(JHB100_SYS2CLK_BMCPER1_NCNOC_TARG, "bmcper1_ncnoc_targ", 6,
+ JHB100_SYS2CLK_PLL1),
+ STARFIVE__DIV(JHB100_SYS2CLK_BMCPER1_CFG_250, "bmcper1_cfg_250", 5,
+ JHB100_SYS2CLK_PLL1),
+ STARFIVE__DIV(JHB100_SYS2CLK_BMCPER1_CFG_143_DFT, "bmcper1_cfg_143_dft", 8,
+ JHB100_SYS2CLK_PLL1),
+ STARFIVE_GATE(JHB100_SYS2CLK_BMCPER1_CFG_143, "bmcper1_cfg_143", CLK_IS_CRITICAL,
+ JHB100_SYS2CLK_BMCPER1_CFG_143_DFT),
+ /* bmcperiph0 */
+ STARFIVE__DIV(JHB100_SYS2CLK_BMCPER0_NCNOC_TARG, "bmcper0_ncnoc_targ", 6,
+ JHB100_SYS2CLK_PLL1),
+ /* gpu0 */
+ STARFIVE__DIV(JHB100_SYS2CLK_GPU0_NCNOC_TARG, "gpu0_ncnoc_targ", 12,
+ JHB100_SYS2CLK_PLL1),
+ STARFIVE_GATE(JHB100_SYS2CLK_GPU0_BUS_CLK, "gpu0_bus_clk", CLK_IS_CRITICAL,
+ JHB100_SYS2CLK_GPU0_NCNOC_INIT),
+ STARFIVE_GATE(JHB100_SYS2CLK_GPU0_APB_CLK, "gpu0_apb_clk", CLK_IS_CRITICAL,
+ JHB100_SYS2CLK_GPU0_NCNOC_TARG),
+ STARFIVE_GATE(JHB100_SYS2CLK_GPU0_OSC_CLK, "gpu0_osc_clk", CLK_IS_CRITICAL,
+ JHB100_SYS2CLK_OSC),
+ /* gpu1 */
+ STARFIVE__DIV(JHB100_SYS2CLK_GPU1_NCNOC_TARG, "gpu1_ncnoc_targ", 12,
+ JHB100_SYS2CLK_PLL1),
+ STARFIVE_GATE(JHB100_SYS2CLK_GPU1_BUS_CLK, "gpu1_bus_clk", CLK_IS_CRITICAL,
+ JHB100_SYS2CLK_GPU1_NCNOC_INIT),
+ STARFIVE_GATE(JHB100_SYS2CLK_GPU1_APB_CLK, "gpu1_apb_clk", CLK_IS_CRITICAL,
+ JHB100_SYS2CLK_GPU1_NCNOC_TARG),
+ STARFIVE_GATE(JHB100_SYS2CLK_GPU1_OSC_CLK, "gpu1_osc_clk", CLK_IS_CRITICAL,
+ JHB100_SYS2CLK_OSC),
+ /* main icg */
+ STARFIVE_GATE(JHB100_SYS2CLK_MAIN_ICG_EN_JTAG0, "main_icg_en_jtag0", 0,
+ JHB100_SYS2CLK_JTAGM0_HCLK),
+ STARFIVE_GATE(JHB100_SYS2CLK_MAIN_ICG_EN_JTAG1, "main_icg_en_jtag1", 0,
+ JHB100_SYS2CLK_JTAGM1_HCLK),
+};
+
+const struct jhb100_crg_domain_info jhb100_sys2crg_info = {
+ .clk_data = jhb100_sys2crg_clk_data,
+ .num_clk = ARRAY_SIZE(jhb100_sys2crg_clk_data),
+ .ext_clk = jhb100_sys2_ext_clk,
+ .num_ext_clk = ARRAY_SIZE(jhb100_sys2_ext_clk),
+ .rst_name = "jhb100-r-sys2",
+ .power_domain = false,
+};
+
+static const struct of_device_id jhb100_sys2crg_match[] = {
+ {
+ .compatible = "starfive,jhb100-sys2crg",
+ .data = &jhb100_sys2crg_info,
+ },
+ { /* sentinel */ }
+};
+
+static struct platform_driver jhb100_sys2crg_driver = {
+ .driver = {
+ .name = "clk-starfive-jhb100-sys2",
+ .of_match_table = jhb100_sys2crg_match,
+ .suppress_bind_attrs = true,
+ },
+};
+builtin_platform_driver_probe(jhb100_sys2crg_driver, starfive_crg_probe);
--
2.25.1
^ permalink raw reply related
* Re: [PATCH net] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present
From: Qingfang Deng @ 2026-05-08 5:57 UTC (permalink / raw)
To: Hyunwoo Kim
Cc: dhowells, marc.dionne, davem, edumazet, kuba, pabeni, horms,
linux-afs, netdev
In-Reply-To: <afKV2zGR6rrelPC7@v4bel>
On Thu, 30 Apr 2026 08:35:55 +0900, Hyunwoo Kim wrote:
>
> The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
> handler in rxrpc_verify_response() copy the skb to a linear one before
> calling into the security ops only when skb_cloned() is true. An skb
> that is not cloned but still carries paged fragments (skb->data_len != 0)
> falls through to the in-place decryption path, which binds the frag
> pages directly into the AEAD/skcipher SGL via skb_to_sgvec().
>
> Extend the gate so that any skb with non-linear data is also copied,
> ensuring the security handler always operates on a fully linear skb.
> The OOM/trace handling already in place is reused.
>
> Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> ---
> net/rxrpc/call_event.c | 2 +-
> net/rxrpc/conn_event.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
> index fdd683261226..6c924ef55208 100644
> --- a/net/rxrpc/call_event.c
> +++ b/net/rxrpc/call_event.c
> @@ -334,7 +334,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
>
> if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
> sp->hdr.securityIndex != 0 &&
> - skb_cloned(skb)) {
> + (skb_cloned(skb) || skb->data_len)) {
It's recommended to use skb_is_nonlinear() instead of open-coding
skb->data_len.
> /* Unshare the packet so that it can be
> * modified by in-place decryption.
> */
> diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
> index a2130d25aaa9..eab7c5f2517a 100644
> --- a/net/rxrpc/conn_event.c
> +++ b/net/rxrpc/conn_event.c
> @@ -245,7 +245,7 @@ static int rxrpc_verify_response(struct rxrpc_connection *conn,
> {
> int ret;
>
> - if (skb_cloned(skb)) {
> + if (skb_cloned(skb) || skb->data_len) {
Ditto.
> /* Copy the packet if shared so that we can do in-place
> * decryption.
> */
Regards,
Qingfang
^ permalink raw reply
* Re: [PATCH net] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present
From: Hyunwoo Kim @ 2026-05-08 6:07 UTC (permalink / raw)
To: Qingfang Deng
Cc: dhowells, marc.dionne, davem, edumazet, kuba, pabeni, horms,
linux-afs, netdev, imv4bel
In-Reply-To: <20260508055716.89380-1-qingfang.deng@linux.dev>
On Fri, May 08, 2026 at 01:57:15PM +0800, Qingfang Deng wrote:
> On Thu, 30 Apr 2026 08:35:55 +0900, Hyunwoo Kim wrote:
> >
> > The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
> > handler in rxrpc_verify_response() copy the skb to a linear one before
> > calling into the security ops only when skb_cloned() is true. An skb
> > that is not cloned but still carries paged fragments (skb->data_len != 0)
> > falls through to the in-place decryption path, which binds the frag
> > pages directly into the AEAD/skcipher SGL via skb_to_sgvec().
> >
> > Extend the gate so that any skb with non-linear data is also copied,
> > ensuring the security handler always operates on a fully linear skb.
> > The OOM/trace handling already in place is reused.
> >
> > Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
> > Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> > ---
> > net/rxrpc/call_event.c | 2 +-
> > net/rxrpc/conn_event.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
> > index fdd683261226..6c924ef55208 100644
> > --- a/net/rxrpc/call_event.c
> > +++ b/net/rxrpc/call_event.c
> > @@ -334,7 +334,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
> >
> > if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
> > sp->hdr.securityIndex != 0 &&
> > - skb_cloned(skb)) {
> > + (skb_cloned(skb) || skb->data_len)) {
>
> It's recommended to use skb_is_nonlinear() instead of open-coding
> skb->data_len.
>
> > /* Unshare the packet so that it can be
> > * modified by in-place decryption.
> > */
> > diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
> > index a2130d25aaa9..eab7c5f2517a 100644
> > --- a/net/rxrpc/conn_event.c
> > +++ b/net/rxrpc/conn_event.c
> > @@ -245,7 +245,7 @@ static int rxrpc_verify_response(struct rxrpc_connection *conn,
> > {
> > int ret;
> >
> > - if (skb_cloned(skb)) {
> > + if (skb_cloned(skb) || skb->data_len) {
>
> Ditto.
Thank you for the review.
I will submit a v2 patch.
Best regards,
Hyunwoo Kim
>
> > /* Copy the packet if shared so that we can do in-place
> > * decryption.
> > */
>
> Regards,
> Qingfang
^ permalink raw reply
* Re: [mst-vhost:balloon 6/30] Warning: mm/mempolicy.c:2444 function parameter 'user_addr' not described in '__alloc_pages_mpol'
From: Michael S. Tsirkin @ 2026-05-08 6:09 UTC (permalink / raw)
To: kernel test robot; +Cc: oe-kbuild-all, kvm, virtualization, netdev
In-Reply-To: <202605080515.6jRN5wN7-lkp@intel.com>
On Fri, May 08, 2026 at 05:26:30AM +0800, kernel test robot wrote:
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git balloon
> head: 9f56ee36fbf6a6d336dc6a9eaeb4f8a67cb42a31
> commit: c4289f5a4e563611a468b4b5379025a4aa4a7c12 [6/30] mm: thread user_addr through page allocator for cache-friendly zeroing
> config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20260508/202605080515.6jRN5wN7-lkp@intel.com/config)
> compiler: powerpc64-linux-gcc (GCC) 15.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260508/202605080515.6jRN5wN7-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202605080515.6jRN5wN7-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> >> Warning: mm/mempolicy.c:2444 function parameter 'user_addr' not described in '__alloc_pages_mpol'
> >> Warning: mm/mempolicy.c:2444 expecting prototype for alloc_pages_mpol(). Prototype was for __alloc_pages_mpol() instead
> Warning: mm/mempolicy.c:2547 expecting prototype for vma_alloc_folio(). Prototype was for alloc_frozen_pages() instead
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
kerneldoc messed up with a rebase. will fix in v6.
^ permalink raw reply
* Re: [mst-vhost:balloon 4/30] Warning: mm/mempolicy.c:2527 expecting prototype for vma_alloc_folio(). Prototype was for alloc_frozen_pages() instead
From: Michael S. Tsirkin @ 2026-05-08 6:16 UTC (permalink / raw)
To: kernel test robot; +Cc: oe-kbuild-all, kvm, virtualization, netdev
In-Reply-To: <202605080331.y1eIdVUC-lkp@intel.com>
On Fri, May 08, 2026 at 03:56:56AM +0800, kernel test robot wrote:
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git balloon
> head: 9f56ee36fbf6a6d336dc6a9eaeb4f8a67cb42a31
> commit: 95744e0e9c4df79c6bc8ec96306b29c7a8e8984e [4/30] mm: move vma_alloc_folio to page_alloc.c
> config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20260508/202605080331.y1eIdVUC-lkp@intel.com/config)
> compiler: powerpc64-linux-gcc (GCC) 15.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260508/202605080331.y1eIdVUC-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202605080331.y1eIdVUC-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> >> Warning: mm/mempolicy.c:2527 expecting prototype for vma_alloc_folio(). Prototype was for alloc_frozen_pages() instead
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
kerneldoc messed up by a rebase. will fix up.
^ permalink raw reply
* [PATCH net] net: ena: PHC: Fix potential use-after-free in get_timestamp
From: Arthur Kiyanovski @ 2026-05-08 6:21 UTC (permalink / raw)
To: David Miller, Jakub Kicinski, netdev
Cc: Arthur Kiyanovski, Richard Cochran, Eric Dumazet, Paolo Abeni,
David Woodhouse, Thomas Gleixner, Miroslav Lichvar, Andrew Lunn,
Wen Gu, Xuan Zhuo, David Woodhouse, Yonatan Sarna,
Zorik Machulsky, Alexander Matushevsky, Saeed Bshara, Matt Wilson,
Anthony Liguori, Nafea Bshara, Evgeny Schmeilin, Netanel Belgazal,
Ali Saidi, Benjamin Herrenschmidt, Noam Dagan, David Arinzon,
Evgeny Ostrovsky, Ofir Tabachnik, Amit Bernstein, stable
Move the phc->active check and resp pointer assignment to after
acquiring the spinlock. Previously, phc->active was checked without
holding the lock, and resp was cached from ena_dev->phc.virt_addr
before the lock was acquired.
If ena_com_phc_destroy() runs between the lockless active check and
the lock acquisition, it sets active=false, releases the lock, frees
the DMA memory, and sets virt_addr=NULL. The get_timestamp path would
then read a NULL virt_addr and dereference it.
With both the active check and the pointer read under the lock,
destroy cannot free the memory while get_timestamp is using it.
Fixes: e0ea34158ee8 ("net: ena: Add PHC support in the ENA driver")
Cc: stable@vger.kernel.org
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_com.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index e67b592..8c86789 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -1782,20 +1782,23 @@ void ena_com_phc_destroy(struct ena_com_dev *ena_dev)
int ena_com_phc_get_timestamp(struct ena_com_dev *ena_dev, u64 *timestamp)
{
- volatile struct ena_admin_phc_resp *resp = ena_dev->phc.virt_addr;
const ktime_t zero_system_time = ktime_set(0, 0);
struct ena_com_phc_info *phc = &ena_dev->phc;
+ volatile struct ena_admin_phc_resp *resp;
ktime_t expire_time;
ktime_t block_time;
unsigned long flags = 0;
int ret = 0;
+ spin_lock_irqsave(&phc->lock, flags);
+
if (!phc->active) {
+ spin_unlock_irqrestore(&phc->lock, flags);
netdev_err(ena_dev->net_device, "PHC feature is not active in the device\n");
return -EOPNOTSUPP;
}
- spin_lock_irqsave(&phc->lock, flags);
+ resp = ena_dev->phc.virt_addr;
/* Check if PHC is in blocked state */
if (unlikely(ktime_compare(phc->system_time, zero_system_time))) {
--
2.47.3
^ permalink raw reply related
* Re: Re: [PATCH net v1 2/2] net: stmmac: eic7700: fix delay step calculation and ensure safe register initialization
From: 李志 @ 2026-05-08 6:25 UTC (permalink / raw)
To: Maxime Chevallier
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, robh, krzk+dt,
conor+dt, netdev, devicetree, linux-kernel, mcoquelin.stm32,
alexandre.torgue, rmk+kernel, linux-stm32, linux-arm-kernel,
ningyu, linmin, pinkesh.vaghela, pritesh.patel, weishangjuan
In-Reply-To: <92e8a3dd-a46a-499f-b5f6-99f7b99f45f5@bootlin.com>
> -----Original Messages-----
> From: "Maxime Chevallier" <maxime.chevallier@bootlin.com>
> Send time:Thursday, 07/05/2026 19:21:41
> To: lizhi2@eswincomputing.com, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org, netdev@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, mcoquelin.stm32@gmail.com, alexandre.torgue@foss.st.com, rmk+kernel@armlinux.org.uk, linux-stm32@st-md-mailman.stormreply.com, linux-arm-kernel@lists.infradead.org
> Cc: ningyu@eswincomputing.com, linmin@eswincomputing.com, pinkesh.vaghela@einfochips.com, pritesh.patel@einfochips.com, weishangjuan@eswincomputing.com
> Subject: Re: [PATCH net v1 2/2] net: stmmac: eic7700: fix delay step calculation and ensure safe register initialization
>
> Hi,
>
> On 07/05/2026 10:32, lizhi2@eswincomputing.com wrote:
> > From: Zhi Li <lizhi2@eswincomputing.com>
> >
> > Fix several issues in the EIC7700 DWMAC glue driver related to delay
> > configuration and register initialization.
> >
> > The hardware implements TX/RX delay with a granularity of 20 ps per
> > step, but the driver previously assumed a 100 ps step. Update the
> > definitions to match the actual hardware behaviour and align with
> > the binding constraints.
> >
> > Introduce explicit definitions for the maximum programmable delay
> > range based on the hardware limits.
> >
> > Move HSP CSR configuration into the initialization path after clocks
> > are enabled. This ensures that all register accesses occur with the
> > required clocks active, avoiding undefined behaviour.
> >
> > Clear the TXD and RXD delay control registers during initialization
> > to override any residual configuration left by the bootloader. This
> > ensures deterministic RGMII timing and prevents unintended delay
> > being applied.
> >
> > The MAC RGMII delay programming is only required for 100Mbps and
> > 1000Mbps modes, where precise clock-to-data alignment is necessary for
> > reliable sampling.
> >
> > For 10Mbps operation, timing margins are sufficiently relaxed and no
> > additional delay compensation is required. In this case, the driver
> > falls back to a safe default configuration with delay disabled.
> >
> > For unsupported or unexpected link speeds, the driver avoids
> > programming invalid delay values and falls back to a safe default
> > state by explicitly clearing the delay configuration.
> >
> > Explicitly programming zero ensures that no residual delay settings
> > from previous configurations or bootloader state remain active.
> >
> > These changes fix incorrect delay programming and initialization
> > ordering for existing users.
> >
> > This also aligns the driver implementation with the updated device
> > tree binding.
>
> There's a lot going on in this patch, can you split this into patches
> that solves each of these individual issues ?
>
> It's a mix of fixes (the reg access moved after clk config for example)
> and non-fixes (the RGMII timings, you're improving the granularity of
> the delays, is this required to fix existing setups, or is it a generic
> improvement ?), splitting this would make it both easier to review, and
> easier to bisect should problems arise in the future.
>
Thanks for the detailed review and suggestions.
You're right that the current patch mixes several logically independent
changes, and splitting them will make the series easier to review and
bisect. I will follow your suggestion and split the current patch into
multiple smaller patches within the same series.
All five changes below are correctness fixes addressing hardware or driver
issues, not improvements or new features.
Based on the current change set, the individual fixes are:
1. TX/RX delay granularity correction (100 ps -> 20 ps step)
This corrects an incorrect hardware capability modeling in the driver.
The driver previously assumed a 100 ps step, while the hardware actually
implements 20 ps granularity.
This fixes incorrect delay programming that could occur when fine-grained
delay values are used, ensuring correct representation of the hardware
capability.
2. Introduce explicit maximum delay range definitions
This fixes missing enforcement of hardware constraints, preventing invalid
delay values from being accepted or programmed.
3. Move HSP CSR configuration after clock enable
This fixes a register access ordering issue where accessing HSP CSR before
clocks are enabled may result in undefined behavior during initialization.
4. Clear TXD/RXD delay control registers during initialization
This fixes residual configuration left by bootloader state, ensuring
deterministic behavior across reboot and driver reload.
5. Delay handling for 10Mbps and invalid link speeds
This fixes incorrect application of RGMII delay programming outside valid
operating modes, preventing invalid configuration from being applied.
I will split these into separate patches in the next revision, while keeping
them within the same series.
For the DT binding side, would you also recommend splitting the binding
changes to match the driver-level granularity, or would it be better to keep
them consolidated in a single binding patch?
If you have any further suggestions on the split or classification, please
let me know.
Thanks,
Zhi
^ permalink raw reply
* [PATCH net-next 0/2] net/smc: transition to RDMA core CQ pooling
From: D. Wythe @ 2026-05-08 6:37 UTC (permalink / raw)
To: David S. Miller, Dust Li, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Sidraya Jayagond, Wenjia Zhang
Cc: Mahanta Jambigi, Simon Horman, Tony Lu, Wen Gu, linux-kernel,
linux-rdma, linux-s390, netdev, oliver.yang, pasic
This series transitions SMC-R completion handling to RDMA core CQ pooling
via the ib_cqe API. The new completion model improves scalability by
allowing per-link completion processing across multiple cores and enables
DIM-based interrupt moderation.
As a side effect, the increased concurrency can amplify contention for TX
slots on the shared wait queue. Patch 2 addresses this by switching TX slot
allocation from non-exclusive wait_event() to prepare_to_wait_exclusive(),
which avoids thundering-herd wakeups under contention.
Patch 1 replaces the global per-device CQ and manual tasklet polling model
with RDMA core CQ pooling.
Patch 2 reduces TX slot contention by using exclusive wait queue entries
during allocation.
Link: https://lore.kernel.org/netdev/20260305022323.96125-1-alibuda@linux.alibaba.com/
D. Wythe (2):
net/smc: transition to RDMA core CQ pooling
net/smc: reduce TX slot contention with exclusive wait
net/smc/smc_core.c | 9 +-
net/smc/smc_core.h | 28 ++--
net/smc/smc_ib.c | 113 +++++----------
net/smc/smc_ib.h | 7 -
net/smc/smc_tx.c | 1 -
net/smc/smc_wr.c | 344 ++++++++++++++++++++-------------------------
net/smc/smc_wr.h | 40 ++----
7 files changed, 215 insertions(+), 327 deletions(-)
--
2.45.0
^ permalink raw reply
* [PATCH net-next 1/2] net/smc: transition to RDMA core CQ pooling
From: D. Wythe @ 2026-05-08 6:37 UTC (permalink / raw)
To: David S. Miller, Dust Li, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Sidraya Jayagond, Wenjia Zhang
Cc: Mahanta Jambigi, Simon Horman, Tony Lu, Wen Gu, linux-kernel,
linux-rdma, linux-s390, netdev, oliver.yang, pasic,
Leon Romanovsky
In-Reply-To: <20260508063718.101622-1-alibuda@linux.alibaba.com>
The current SMC-R implementation relies on global per-device CQs
and manual polling within tasklets, which introduces severe
scalability bottlenecks due to global lock contention and tasklet
scheduling overhead, resulting in poor performance as concurrency
increases.
Refactor the completion handling to utilize the ib_cqe API and
standard RDMA core CQ pooling. This transition provides several key
advantages:
1. Multi-CQ: Shift from a single shared per-device CQ to multiple
link-specific CQs via the CQ pool. This allows completion processing
to be parallelized across multiple CPU cores, effectively eliminating
the global CQ bottleneck.
2. Leverage DIM: Utilizing the standard CQ pool with IB_POLL_SOFTIRQ
enables Dynamic Interrupt Moderation from the RDMA core, optimizing
interrupt frequency and reducing CPU load under high pressure.
3. O(1) Context Retrieval: Replaces the expensive wr_id based lookup
logic (e.g., smc_wr_tx_find_pending_index) with direct context retrieval
using container_of() on the embedded ib_cqe.
4. Code Simplification: This refactoring results in a reduction of
~150 lines of code. It removes redundant sequence tracking, complex lookup
helpers, and manual CQ management, significantly improving maintainability.
Performance Test: redis-benchmark with max 32 connections per QP
Data format: Requests Per Second (RPS), Percentage in brackets
represents the gain/loss compared to TCP.
| Clients | TCP | SMC (original) | SMC (cq_pool) |
|---------|----------|---------------------|---------------------|
| c = 1 | 24449 | 31172 (+27%) | 34039 (+39%) |
| c = 2 | 46420 | 53216 (+14%) | 64391 (+38%) |
| c = 16 | 159673 | 83668 (-48%) <-- | 216947 (+36%) |
| c = 32 | 164956 | 97631 (-41%) <-- | 249376 (+51%) |
| c = 64 | 166322 | 118192 (-29%) <-- | 249488 (+50%) |
| c = 128 | 167700 | 121497 (-27%) <-- | 249480 (+48%) |
| c = 256 | 175021 | 146109 (-16%) <-- | 240384 (+37%) |
| c = 512 | 168987 | 101479 (-40%) <-- | 226634 (+34%) |
The results demonstrate that this optimization effectively resolves the
scalability bottleneck, with RPS increasing by over 110% at c=64
compared to the original implementation.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
net/smc/smc_core.c | 9 +-
net/smc/smc_core.h | 28 ++--
net/smc/smc_ib.c | 113 +++++-----------
net/smc/smc_ib.h | 7 -
net/smc/smc_tx.c | 1 -
net/smc/smc_wr.c | 312 +++++++++++++++++++--------------------------
net/smc/smc_wr.h | 40 ++----
7 files changed, 193 insertions(+), 317 deletions(-)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index cf6b620fef05..218a10e85361 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -815,17 +815,11 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
lnk->lgr = lgr;
smc_lgr_hold(lgr); /* lgr_put in smcr_link_clear() */
lnk->link_idx = link_idx;
- lnk->wr_rx_id_compl = 0;
smc_ibdev_cnt_inc(lnk);
smcr_copy_dev_info_to_link(lnk);
atomic_set(&lnk->conn_cnt, 0);
smc_llc_link_set_uid(lnk);
INIT_WORK(&lnk->link_down_wrk, smc_link_down_work);
- if (!lnk->smcibdev->initialized) {
- rc = (int)smc_ib_setup_per_ibdev(lnk->smcibdev);
- if (rc)
- goto out;
- }
get_random_bytes(rndvec, sizeof(rndvec));
lnk->psn_initial = rndvec[0] + (rndvec[1] << 8) +
(rndvec[2] << 16);
@@ -863,6 +857,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
if (rc)
goto free_link_mem;
lnk->state = SMC_LNK_ACTIVATING;
+ smc_wr_init_cqes(lnk);
return 0;
free_link_mem:
@@ -1373,7 +1368,7 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
smc_llc_link_clear(lnk, log);
smcr_buf_unmap_lgr(lnk);
smcr_rtoken_clear_link(lnk);
- smc_ib_modify_qp_error(lnk);
+ smc_wr_drain_rq(lnk);
smc_wr_free_link(lnk);
smc_ib_destroy_queue_pair(lnk);
smc_ib_dealloc_protection_domain(lnk);
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 5c18f08a4c8a..f98c0f0cb14b 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -89,8 +89,21 @@ struct smc_rdma_sges { /* sges per message send */
struct smc_rdma_wr { /* work requests per message
* send
*/
+ struct ib_cqe cqe;
struct ib_rdma_wr wr_tx_rdma[SMC_MAX_RDMA_WRITES];
-};
+} ____cacheline_aligned_in_smp;
+
+struct smc_ib_recv_wr {
+ struct ib_cqe cqe;
+ struct ib_recv_wr wr;
+ int idx;
+} ____cacheline_aligned_in_smp;
+
+struct smc_ib_send_wr {
+ struct ib_cqe cqe;
+ struct ib_send_wr wr;
+ int idx;
+} ____cacheline_aligned_in_smp;
#define SMC_LGR_ID_SIZE 4
@@ -100,23 +113,24 @@ struct smc_link {
struct ib_pd *roce_pd; /* IB protection domain,
* unique for every RoCE QP
*/
+ unsigned int nr_cqe; /* number of CQ entries */
+ struct ib_cq *ib_cq; /* IB completion queue */
struct ib_qp *roce_qp; /* IB queue pair */
struct ib_qp_attr qp_attr; /* IB queue pair attributes */
struct smc_wr_buf *wr_tx_bufs; /* WR send payload buffers */
- struct ib_send_wr *wr_tx_ibs; /* WR send meta data */
+ struct smc_ib_send_wr *wr_tx_ibs; /* WR send meta data */
struct ib_sge *wr_tx_sges; /* WR send gather meta data */
struct smc_rdma_sges *wr_tx_rdma_sges;/*RDMA WRITE gather meta data*/
struct smc_rdma_wr *wr_tx_rdmas; /* WR RDMA WRITE */
struct smc_wr_tx_pend *wr_tx_pends; /* WR send waiting for CQE */
struct completion *wr_tx_compl; /* WR send CQE completion */
/* above four vectors have wr_tx_cnt elements and use the same index */
- struct ib_send_wr *wr_tx_v2_ib; /* WR send v2 meta data */
+ struct smc_ib_send_wr *wr_tx_v2_ib; /* WR send v2 meta data */
struct ib_sge *wr_tx_v2_sge; /* WR send v2 gather meta data*/
struct smc_wr_tx_pend *wr_tx_v2_pend; /* WR send v2 waiting for CQE */
dma_addr_t wr_tx_dma_addr; /* DMA address of wr_tx_bufs */
dma_addr_t wr_tx_v2_dma_addr; /* DMA address of v2 tx buf*/
- atomic_long_t wr_tx_id; /* seq # of last sent WR */
unsigned long *wr_tx_mask; /* bit mask of used indexes */
u32 wr_tx_cnt; /* number of WR send buffers */
wait_queue_head_t wr_tx_wait; /* wait for free WR send buf */
@@ -126,7 +140,7 @@ struct smc_link {
struct completion tx_ref_comp;
u8 *wr_rx_bufs; /* WR recv payload buffers */
- struct ib_recv_wr *wr_rx_ibs; /* WR recv meta data */
+ struct smc_ib_recv_wr *wr_rx_ibs; /* WR recv meta data */
struct ib_sge *wr_rx_sges; /* WR recv scatter meta data */
/* above three vectors have wr_rx_cnt elements and use the same index */
int wr_rx_sge_cnt; /* rx sge, V1 is 1, V2 is either 2 or 1 */
@@ -135,13 +149,11 @@ struct smc_link {
*/
dma_addr_t wr_rx_dma_addr; /* DMA address of wr_rx_bufs */
dma_addr_t wr_rx_v2_dma_addr; /* DMA address of v2 rx buf*/
- u64 wr_rx_id; /* seq # of last recv WR */
- u64 wr_rx_id_compl; /* seq # of last completed WR */
u32 wr_rx_cnt; /* number of WR recv buffers */
unsigned long wr_rx_tstamp; /* jiffies when last buf rx */
- wait_queue_head_t wr_rx_empty_wait; /* wait for RQ empty */
struct ib_reg_wr wr_reg; /* WR register memory region */
+ struct ib_cqe wr_reg_cqe; /* ib_cqe for wr_reg */
wait_queue_head_t wr_reg_wait; /* wait for wr_reg result */
struct {
struct percpu_ref wr_reg_refs;
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 9bb495707445..eaeb3bacc613 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -111,15 +111,6 @@ int smc_ib_modify_qp_rts(struct smc_link *lnk)
IB_QP_MAX_QP_RD_ATOMIC);
}
-int smc_ib_modify_qp_error(struct smc_link *lnk)
-{
- struct ib_qp_attr qp_attr;
-
- memset(&qp_attr, 0, sizeof(qp_attr));
- qp_attr.qp_state = IB_QPS_ERR;
- return ib_modify_qp(lnk->roce_qp, &qp_attr, IB_QP_STATE);
-}
-
int smc_ib_ready_link(struct smc_link *lnk)
{
struct smc_link_group *lgr = smc_get_lgr(lnk);
@@ -133,10 +124,7 @@ int smc_ib_ready_link(struct smc_link *lnk)
if (rc)
goto out;
smc_wr_remember_qp_attr(lnk);
- rc = ib_req_notify_cq(lnk->smcibdev->roce_cq_recv,
- IB_CQ_SOLICITED_MASK);
- if (rc)
- goto out;
+
rc = smc_wr_rx_post_init(lnk);
if (rc)
goto out;
@@ -657,38 +645,59 @@ void smc_ib_destroy_queue_pair(struct smc_link *lnk)
if (lnk->roce_qp)
ib_destroy_qp(lnk->roce_qp);
lnk->roce_qp = NULL;
+ if (lnk->ib_cq) {
+ ib_cq_pool_put(lnk->ib_cq, lnk->nr_cqe);
+ lnk->ib_cq = NULL;
+ }
}
/* create a queue pair within the protection domain for a link */
int smc_ib_create_queue_pair(struct smc_link *lnk)
{
+ int max_send_wr, max_recv_wr, rc;
+ struct ib_cq *cq;
+
+ /* include unsolicited rdma_writes as well,
+ * there are max. 2 RDMA_WRITE per 1 WR_SEND.
+ */
+ max_send_wr = 3 * lnk->lgr->max_send_wr;
+ max_recv_wr = lnk->lgr->max_recv_wr + 1; /* +1 for ib_drain_rq() */
+
+ cq = ib_cq_pool_get(lnk->smcibdev->ibdev, max_send_wr + max_recv_wr, -1,
+ IB_POLL_SOFTIRQ);
+
+ if (IS_ERR(cq)) {
+ rc = PTR_ERR(cq);
+ return rc;
+ }
+
struct ib_qp_init_attr qp_attr = {
.event_handler = smc_ib_qp_event_handler,
.qp_context = lnk,
- .send_cq = lnk->smcibdev->roce_cq_send,
- .recv_cq = lnk->smcibdev->roce_cq_recv,
+ .send_cq = cq,
+ .recv_cq = cq,
.srq = NULL,
.cap = {
.max_send_sge = SMC_IB_MAX_SEND_SGE,
.max_recv_sge = lnk->wr_rx_sge_cnt,
+ .max_send_wr = max_send_wr,
+ .max_recv_wr = max_recv_wr,
.max_inline_data = 0,
},
.sq_sig_type = IB_SIGNAL_REQ_WR,
.qp_type = IB_QPT_RC,
};
- int rc;
- /* include unsolicited rdma_writes as well,
- * there are max. 2 RDMA_WRITE per 1 WR_SEND
- */
- qp_attr.cap.max_send_wr = 3 * lnk->lgr->max_send_wr;
- qp_attr.cap.max_recv_wr = lnk->lgr->max_recv_wr;
lnk->roce_qp = ib_create_qp(lnk->roce_pd, &qp_attr);
rc = PTR_ERR_OR_ZERO(lnk->roce_qp);
- if (IS_ERR(lnk->roce_qp))
+ if (IS_ERR(lnk->roce_qp)) {
lnk->roce_qp = NULL;
- else
+ ib_cq_pool_put(cq, max_send_wr + max_recv_wr);
+ } else {
smc_wr_remember_qp_attr(lnk);
+ lnk->nr_cqe = max_send_wr + max_recv_wr;
+ lnk->ib_cq = cq;
+ }
return rc;
}
@@ -838,62 +847,6 @@ void smc_ib_buf_unmap_sg(struct smc_link *lnk,
buf_slot->sgt[lnk->link_idx].sgl->dma_address = 0;
}
-long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
-{
- struct ib_cq_init_attr cqattr = {
- .cqe = SMC_MAX_CQE, .comp_vector = 0 };
- int cqe_size_order, smc_order;
- long rc;
-
- mutex_lock(&smcibdev->mutex);
- rc = 0;
- if (smcibdev->initialized)
- goto out;
- /* the calculated number of cq entries fits to mlx5 cq allocation */
- cqe_size_order = cache_line_size() == 128 ? 7 : 6;
- smc_order = MAX_PAGE_ORDER - cqe_size_order;
- if (SMC_MAX_CQE + 2 > (0x00000001 << smc_order) * PAGE_SIZE)
- cqattr.cqe = (0x00000001 << smc_order) * PAGE_SIZE - 2;
- smcibdev->roce_cq_send = ib_create_cq(smcibdev->ibdev,
- smc_wr_tx_cq_handler, NULL,
- smcibdev, &cqattr);
- rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_send);
- if (IS_ERR(smcibdev->roce_cq_send)) {
- smcibdev->roce_cq_send = NULL;
- goto out;
- }
- smcibdev->roce_cq_recv = ib_create_cq(smcibdev->ibdev,
- smc_wr_rx_cq_handler, NULL,
- smcibdev, &cqattr);
- rc = PTR_ERR_OR_ZERO(smcibdev->roce_cq_recv);
- if (IS_ERR(smcibdev->roce_cq_recv)) {
- smcibdev->roce_cq_recv = NULL;
- goto err;
- }
- smc_wr_add_dev(smcibdev);
- smcibdev->initialized = 1;
- goto out;
-
-err:
- ib_destroy_cq(smcibdev->roce_cq_send);
-out:
- mutex_unlock(&smcibdev->mutex);
- return rc;
-}
-
-static void smc_ib_cleanup_per_ibdev(struct smc_ib_device *smcibdev)
-{
- mutex_lock(&smcibdev->mutex);
- if (!smcibdev->initialized)
- goto out;
- smcibdev->initialized = 0;
- ib_destroy_cq(smcibdev->roce_cq_recv);
- ib_destroy_cq(smcibdev->roce_cq_send);
- smc_wr_remove_dev(smcibdev);
-out:
- mutex_unlock(&smcibdev->mutex);
-}
-
static struct ib_client smc_ib_client;
static void smc_copy_netdev_ifindex(struct smc_ib_device *smcibdev, int port)
@@ -952,7 +905,6 @@ static int smc_ib_add_dev(struct ib_device *ibdev)
INIT_WORK(&smcibdev->port_event_work, smc_ib_port_event_work);
atomic_set(&smcibdev->lnk_cnt, 0);
init_waitqueue_head(&smcibdev->lnks_deleted);
- mutex_init(&smcibdev->mutex);
mutex_lock(&smc_ib_devices.mutex);
list_add_tail(&smcibdev->list, &smc_ib_devices.list);
mutex_unlock(&smc_ib_devices.mutex);
@@ -1001,7 +953,6 @@ static void smc_ib_remove_dev(struct ib_device *ibdev, void *client_data)
pr_warn_ratelimited("smc: removing ib device %s\n",
smcibdev->ibdev->name);
smc_smcr_terminate_all(smcibdev);
- smc_ib_cleanup_per_ibdev(smcibdev);
ib_unregister_event_handler(&smcibdev->event_handler);
cancel_work_sync(&smcibdev->port_event_work);
kfree(smcibdev);
diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h
index ef8ac2b7546d..a75fe8bcef3a 100644
--- a/net/smc/smc_ib.h
+++ b/net/smc/smc_ib.h
@@ -37,17 +37,12 @@ struct smc_ib_device { /* ib-device infos for smc */
struct ib_device *ibdev;
struct ib_port_attr pattr[SMC_MAX_PORTS]; /* ib dev. port attrs */
struct ib_event_handler event_handler; /* global ib_event handler */
- struct ib_cq *roce_cq_send; /* send completion queue */
- struct ib_cq *roce_cq_recv; /* recv completion queue */
- struct tasklet_struct send_tasklet; /* called by send cq handler */
- struct tasklet_struct recv_tasklet; /* called by recv cq handler */
char mac[SMC_MAX_PORTS][ETH_ALEN];
/* mac address per port*/
u8 pnetid[SMC_MAX_PORTS][SMC_MAX_PNETID_LEN];
/* pnetid per port */
bool pnetid_by_user[SMC_MAX_PORTS];
/* pnetid defined by user? */
- u8 initialized : 1; /* ib dev CQ, evthdl done */
struct work_struct port_event_work;
unsigned long port_event_mask;
DECLARE_BITMAP(ports_going_away, SMC_MAX_PORTS);
@@ -96,8 +91,6 @@ void smc_ib_destroy_queue_pair(struct smc_link *lnk);
int smc_ib_create_queue_pair(struct smc_link *lnk);
int smc_ib_ready_link(struct smc_link *lnk);
int smc_ib_modify_qp_rts(struct smc_link *lnk);
-int smc_ib_modify_qp_error(struct smc_link *lnk);
-long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev);
int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags,
struct smc_buf_desc *buf_slot, u8 link_idx);
void smc_ib_put_memory_region(struct ib_mr *mr);
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 3144b4b1fe29..d301df9ed58b 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -321,7 +321,6 @@ static int smc_tx_rdma_write(struct smc_connection *conn, int peer_rmbe_offset,
struct smc_link *link = conn->lnk;
int rc;
- rdma_wr->wr.wr_id = smc_wr_tx_get_next_wr_id(link);
rdma_wr->wr.num_sge = num_sges;
rdma_wr->remote_addr =
lgr->rtokens[conn->rtoken_idx][link->link_idx].dma_addr +
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 59c92b46945c..48037a3d97a3 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -31,14 +31,11 @@
#include "smc.h"
#include "smc_wr.h"
-#define SMC_WR_MAX_POLL_CQE 10 /* max. # of compl. queue elements in 1 poll */
-
#define SMC_WR_RX_HASH_BITS 4
static DEFINE_HASHTABLE(smc_wr_rx_hash, SMC_WR_RX_HASH_BITS);
static DEFINE_SPINLOCK(smc_wr_rx_hash_lock);
struct smc_wr_tx_pend { /* control data for a pending send request */
- u64 wr_id; /* work request id sent */
smc_wr_tx_handler handler;
enum ib_wc_status wc_status; /* CQE status */
struct smc_link *link;
@@ -63,55 +60,52 @@ void smc_wr_tx_wait_no_pending_sends(struct smc_link *link)
wait_event(link->wr_tx_wait, !smc_wr_is_tx_pend(link));
}
-static inline int smc_wr_tx_find_pending_index(struct smc_link *link, u64 wr_id)
+static void smc_wr_tx_rdma_process_cqe(struct ib_cq *cq, struct ib_wc *wc)
{
- u32 i;
+ struct smc_link *link = wc->qp->qp_context;
- for (i = 0; i < link->wr_tx_cnt; i++) {
- if (link->wr_tx_pends[i].wr_id == wr_id)
- return i;
- }
- return link->wr_tx_cnt;
+ /* terminate link */
+ if (wc->status)
+ smcr_link_down_cond_sched(link);
+}
+
+static void smc_wr_reg_process_cqe(struct ib_cq *cq, struct ib_wc *wc)
+{
+ struct smc_link *link = wc->qp->qp_context;
+
+ if (wc->status)
+ link->wr_reg_state = FAILED;
+ else
+ link->wr_reg_state = CONFIRMED;
+ smc_wr_wakeup_reg_wait(link);
}
-static inline void smc_wr_tx_process_cqe(struct ib_wc *wc)
+static void smc_wr_tx_process_cqe(struct ib_cq *cq, struct ib_wc *wc)
{
- struct smc_wr_tx_pend pnd_snd;
+ struct smc_wr_tx_pend *tx_pend, pnd_snd;
+ struct smc_ib_send_wr *send_wr;
struct smc_link *link;
u32 pnd_snd_idx;
link = wc->qp->qp_context;
- if (wc->opcode == IB_WC_REG_MR) {
- if (wc->status)
- link->wr_reg_state = FAILED;
- else
- link->wr_reg_state = CONFIRMED;
- smc_wr_wakeup_reg_wait(link);
- return;
- }
+ send_wr = container_of(wc->wr_cqe, struct smc_ib_send_wr, cqe);
+ pnd_snd_idx = send_wr->idx;
+
+ tx_pend = (pnd_snd_idx == link->wr_tx_cnt) ? link->wr_tx_v2_pend :
+ &link->wr_tx_pends[pnd_snd_idx];
+
+ tx_pend->wc_status = wc->status;
+ memcpy(&pnd_snd, tx_pend, sizeof(pnd_snd));
+ /* clear the full struct smc_wr_tx_pend including .priv */
+ memset(tx_pend, 0, sizeof(*tx_pend));
- pnd_snd_idx = smc_wr_tx_find_pending_index(link, wc->wr_id);
if (pnd_snd_idx == link->wr_tx_cnt) {
- if (link->lgr->smc_version != SMC_V2 ||
- link->wr_tx_v2_pend->wr_id != wc->wr_id)
- return;
- link->wr_tx_v2_pend->wc_status = wc->status;
- memcpy(&pnd_snd, link->wr_tx_v2_pend, sizeof(pnd_snd));
- /* clear the full struct smc_wr_tx_pend including .priv */
- memset(link->wr_tx_v2_pend, 0,
- sizeof(*link->wr_tx_v2_pend));
memset(link->lgr->wr_tx_buf_v2, 0,
sizeof(*link->lgr->wr_tx_buf_v2));
} else {
- link->wr_tx_pends[pnd_snd_idx].wc_status = wc->status;
- if (link->wr_tx_pends[pnd_snd_idx].compl_requested)
+ if (pnd_snd.compl_requested)
complete(&link->wr_tx_compl[pnd_snd_idx]);
- memcpy(&pnd_snd, &link->wr_tx_pends[pnd_snd_idx],
- sizeof(pnd_snd));
- /* clear the full struct smc_wr_tx_pend including .priv */
- memset(&link->wr_tx_pends[pnd_snd_idx], 0,
- sizeof(link->wr_tx_pends[pnd_snd_idx]));
memset(&link->wr_tx_bufs[pnd_snd_idx], 0,
sizeof(link->wr_tx_bufs[pnd_snd_idx]));
if (!test_and_clear_bit(pnd_snd_idx, link->wr_tx_mask))
@@ -133,39 +127,6 @@ static inline void smc_wr_tx_process_cqe(struct ib_wc *wc)
wake_up(&link->wr_tx_wait);
}
-static void smc_wr_tx_tasklet_fn(struct tasklet_struct *t)
-{
- struct smc_ib_device *dev = from_tasklet(dev, t, send_tasklet);
- struct ib_wc wc[SMC_WR_MAX_POLL_CQE];
- int i = 0, rc;
- int polled = 0;
-
-again:
- polled++;
- do {
- memset(&wc, 0, sizeof(wc));
- rc = ib_poll_cq(dev->roce_cq_send, SMC_WR_MAX_POLL_CQE, wc);
- if (polled == 1) {
- ib_req_notify_cq(dev->roce_cq_send,
- IB_CQ_NEXT_COMP |
- IB_CQ_REPORT_MISSED_EVENTS);
- }
- if (!rc)
- break;
- for (i = 0; i < rc; i++)
- smc_wr_tx_process_cqe(&wc[i]);
- } while (rc > 0);
- if (polled == 1)
- goto again;
-}
-
-void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
-{
- struct smc_ib_device *dev = (struct smc_ib_device *)cq_context;
-
- tasklet_schedule(&dev->send_tasklet);
-}
-
/*---------------------------- request submission ---------------------------*/
static inline int smc_wr_tx_get_free_slot_index(struct smc_link *link, u32 *idx)
@@ -201,8 +162,6 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
struct smc_link_group *lgr = smc_get_lgr(link);
struct smc_wr_tx_pend *wr_pend;
u32 idx = link->wr_tx_cnt;
- struct ib_send_wr *wr_ib;
- u64 wr_id;
int rc;
*wr_buf = NULL;
@@ -226,14 +185,10 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
if (idx == link->wr_tx_cnt)
return -EPIPE;
}
- wr_id = smc_wr_tx_get_next_wr_id(link);
wr_pend = &link->wr_tx_pends[idx];
- wr_pend->wr_id = wr_id;
wr_pend->handler = handler;
wr_pend->link = link;
wr_pend->idx = idx;
- wr_ib = &link->wr_tx_ibs[idx];
- wr_ib->wr_id = wr_id;
*wr_buf = &link->wr_tx_bufs[idx];
if (wr_rdma_buf)
*wr_rdma_buf = &link->wr_tx_rdmas[idx];
@@ -247,22 +202,16 @@ int smc_wr_tx_get_v2_slot(struct smc_link *link,
struct smc_wr_tx_pend_priv **wr_pend_priv)
{
struct smc_wr_tx_pend *wr_pend;
- struct ib_send_wr *wr_ib;
- u64 wr_id;
if (link->wr_tx_v2_pend->idx == link->wr_tx_cnt)
return -EBUSY;
*wr_buf = NULL;
*wr_pend_priv = NULL;
- wr_id = smc_wr_tx_get_next_wr_id(link);
wr_pend = link->wr_tx_v2_pend;
- wr_pend->wr_id = wr_id;
wr_pend->handler = handler;
wr_pend->link = link;
wr_pend->idx = link->wr_tx_cnt;
- wr_ib = link->wr_tx_v2_ib;
- wr_ib->wr_id = wr_id;
*wr_buf = link->lgr->wr_tx_buf_v2;
*wr_pend_priv = &wr_pend->priv;
return 0;
@@ -306,10 +255,8 @@ int smc_wr_tx_send(struct smc_link *link, struct smc_wr_tx_pend_priv *priv)
struct smc_wr_tx_pend *pend;
int rc;
- ib_req_notify_cq(link->smcibdev->roce_cq_send,
- IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
pend = container_of(priv, struct smc_wr_tx_pend, priv);
- rc = ib_post_send(link->roce_qp, &link->wr_tx_ibs[pend->idx], NULL);
+ rc = ib_post_send(link->roce_qp, &link->wr_tx_ibs[pend->idx].wr, NULL);
if (rc) {
smc_wr_tx_put_slot(link, priv);
smcr_link_down_cond_sched(link);
@@ -322,10 +269,8 @@ int smc_wr_tx_v2_send(struct smc_link *link, struct smc_wr_tx_pend_priv *priv,
{
int rc;
- link->wr_tx_v2_ib->sg_list[0].length = len;
- ib_req_notify_cq(link->smcibdev->roce_cq_send,
- IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
- rc = ib_post_send(link->roce_qp, link->wr_tx_v2_ib, NULL);
+ link->wr_tx_v2_ib->wr.sg_list[0].length = len;
+ rc = ib_post_send(link->roce_qp, &link->wr_tx_v2_ib->wr, NULL);
if (rc) {
smc_wr_tx_put_slot(link, priv);
smcr_link_down_cond_sched(link);
@@ -367,10 +312,7 @@ int smc_wr_reg_send(struct smc_link *link, struct ib_mr *mr)
{
int rc;
- ib_req_notify_cq(link->smcibdev->roce_cq_send,
- IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
link->wr_reg_state = POSTED;
- link->wr_reg.wr.wr_id = (u64)(uintptr_t)mr;
link->wr_reg.mr = mr;
link->wr_reg.key = mr->rkey;
rc = ib_post_send(link->roce_qp, &link->wr_reg.wr, NULL);
@@ -431,94 +373,74 @@ static inline void smc_wr_rx_demultiplex(struct ib_wc *wc)
{
struct smc_link *link = (struct smc_link *)wc->qp->qp_context;
struct smc_wr_rx_handler *handler;
+ struct smc_ib_recv_wr *recv_wr;
struct smc_wr_rx_hdr *wr_rx;
- u64 temp_wr_id;
- u32 index;
if (wc->byte_len < sizeof(*wr_rx))
return; /* short message */
- temp_wr_id = wc->wr_id;
- index = do_div(temp_wr_id, link->wr_rx_cnt);
- wr_rx = (struct smc_wr_rx_hdr *)(link->wr_rx_bufs + index * link->wr_rx_buflen);
+
+ recv_wr = container_of(wc->wr_cqe, struct smc_ib_recv_wr, cqe);
+
+ wr_rx = (struct smc_wr_rx_hdr *)(link->wr_rx_bufs + recv_wr->idx * link->wr_rx_buflen);
hash_for_each_possible(smc_wr_rx_hash, handler, list, wr_rx->type) {
if (handler->type == wr_rx->type)
handler->handler(wc, wr_rx);
}
}
-static inline void smc_wr_rx_process_cqes(struct ib_wc wc[], int num)
+static void smc_wr_rx_process_cqe(struct ib_cq *cq, struct ib_wc *wc)
{
- struct smc_link *link;
- int i;
+ struct smc_link *link = wc->qp->qp_context;
- for (i = 0; i < num; i++) {
- link = wc[i].qp->qp_context;
- link->wr_rx_id_compl = wc[i].wr_id;
- if (wc[i].status == IB_WC_SUCCESS) {
- link->wr_rx_tstamp = jiffies;
- smc_wr_rx_demultiplex(&wc[i]);
- smc_wr_rx_post(link); /* refill WR RX */
- } else {
- /* handle status errors */
- switch (wc[i].status) {
- case IB_WC_RETRY_EXC_ERR:
- case IB_WC_RNR_RETRY_EXC_ERR:
- case IB_WC_WR_FLUSH_ERR:
- smcr_link_down_cond_sched(link);
- if (link->wr_rx_id_compl == link->wr_rx_id)
- wake_up(&link->wr_rx_empty_wait);
- break;
- default:
- smc_wr_rx_post(link); /* refill WR RX */
- break;
- }
+ if (wc->status == IB_WC_SUCCESS) {
+ link->wr_rx_tstamp = jiffies;
+ smc_wr_rx_demultiplex(wc);
+ smc_wr_rx_post(link, wc->wr_cqe); /* refill WR RX */
+ } else {
+ /* handle status errors */
+ switch (wc->status) {
+ case IB_WC_RETRY_EXC_ERR:
+ case IB_WC_RNR_RETRY_EXC_ERR:
+ case IB_WC_WR_FLUSH_ERR:
+ smcr_link_down_cond_sched(link);
+ break;
+ default:
+ smc_wr_rx_post(link, wc->wr_cqe); /* refill WR RX */
+ break;
}
}
}
-static void smc_wr_rx_tasklet_fn(struct tasklet_struct *t)
+int smc_wr_rx_post_init(struct smc_link *link)
{
- struct smc_ib_device *dev = from_tasklet(dev, t, recv_tasklet);
- struct ib_wc wc[SMC_WR_MAX_POLL_CQE];
- int polled = 0;
- int rc;
+ int i, rc = 0;
-again:
- polled++;
- do {
- memset(&wc, 0, sizeof(wc));
- rc = ib_poll_cq(dev->roce_cq_recv, SMC_WR_MAX_POLL_CQE, wc);
- if (polled == 1) {
- ib_req_notify_cq(dev->roce_cq_recv,
- IB_CQ_SOLICITED_MASK
- | IB_CQ_REPORT_MISSED_EVENTS);
- }
- if (!rc)
- break;
- smc_wr_rx_process_cqes(&wc[0], rc);
- } while (rc > 0);
- if (polled == 1)
- goto again;
+ for (i = 0; i < link->wr_rx_cnt; i++)
+ rc = smc_wr_rx_post(link, &link->wr_rx_ibs[i].cqe);
+ return rc;
}
-void smc_wr_rx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
-{
- struct smc_ib_device *dev = (struct smc_ib_device *)cq_context;
+/***************************** init, exit, misc ******************************/
- tasklet_schedule(&dev->recv_tasklet);
+static inline void smc_wr_reg_init_cqe(struct ib_cqe *cqe)
+{
+ cqe->done = smc_wr_reg_process_cqe;
}
-int smc_wr_rx_post_init(struct smc_link *link)
+static inline void smc_wr_tx_init_cqe(struct ib_cqe *cqe)
{
- u32 i;
- int rc = 0;
+ cqe->done = smc_wr_tx_process_cqe;
+}
- for (i = 0; i < link->wr_rx_cnt; i++)
- rc = smc_wr_rx_post(link);
- return rc;
+static inline void smc_wr_rx_init_cqe(struct ib_cqe *cqe)
+{
+ cqe->done = smc_wr_rx_process_cqe;
}
-/***************************** init, exit, misc ******************************/
+static inline void smc_wr_tx_rdma_init_cqe(struct ib_cqe *cqe)
+{
+ cqe->done = smc_wr_tx_rdma_process_cqe;
+}
void smc_wr_remember_qp_attr(struct smc_link *lnk)
{
@@ -550,7 +472,7 @@ void smc_wr_remember_qp_attr(struct smc_link *lnk)
lnk->wr_tx_cnt = min_t(size_t, lnk->max_send_wr,
lnk->qp_attr.cap.max_send_wr);
lnk->wr_rx_cnt = min_t(size_t, lnk->max_recv_wr,
- lnk->qp_attr.cap.max_recv_wr);
+ lnk->qp_attr.cap.max_recv_wr - 1); /* -1 for ib_drain_rq() */
}
static void smc_wr_init_sge(struct smc_link *lnk)
@@ -571,14 +493,14 @@ static void smc_wr_init_sge(struct smc_link *lnk)
lnk->roce_pd->local_dma_lkey;
lnk->wr_tx_rdma_sges[i].tx_rdma_sge[1].wr_tx_rdma_sge[1].lkey =
lnk->roce_pd->local_dma_lkey;
- lnk->wr_tx_ibs[i].next = NULL;
- lnk->wr_tx_ibs[i].sg_list = &lnk->wr_tx_sges[i];
- lnk->wr_tx_ibs[i].num_sge = 1;
- lnk->wr_tx_ibs[i].opcode = IB_WR_SEND;
- lnk->wr_tx_ibs[i].send_flags =
+ lnk->wr_tx_ibs[i].wr.next = NULL;
+ lnk->wr_tx_ibs[i].wr.sg_list = &lnk->wr_tx_sges[i];
+ lnk->wr_tx_ibs[i].wr.num_sge = 1;
+ lnk->wr_tx_ibs[i].wr.opcode = IB_WR_SEND;
+ lnk->wr_tx_ibs[i].wr.send_flags =
IB_SEND_SIGNALED | IB_SEND_SOLICITED;
if (send_inline)
- lnk->wr_tx_ibs[i].send_flags |= IB_SEND_INLINE;
+ lnk->wr_tx_ibs[i].wr.send_flags |= IB_SEND_INLINE;
lnk->wr_tx_rdmas[i].wr_tx_rdma[0].wr.opcode = IB_WR_RDMA_WRITE;
lnk->wr_tx_rdmas[i].wr_tx_rdma[1].wr.opcode = IB_WR_RDMA_WRITE;
lnk->wr_tx_rdmas[i].wr_tx_rdma[0].wr.sg_list =
@@ -592,11 +514,11 @@ static void smc_wr_init_sge(struct smc_link *lnk)
lnk->wr_tx_v2_sge->length = SMC_WR_BUF_V2_SIZE;
lnk->wr_tx_v2_sge->lkey = lnk->roce_pd->local_dma_lkey;
- lnk->wr_tx_v2_ib->next = NULL;
- lnk->wr_tx_v2_ib->sg_list = lnk->wr_tx_v2_sge;
- lnk->wr_tx_v2_ib->num_sge = 1;
- lnk->wr_tx_v2_ib->opcode = IB_WR_SEND;
- lnk->wr_tx_v2_ib->send_flags =
+ lnk->wr_tx_v2_ib->wr.next = NULL;
+ lnk->wr_tx_v2_ib->wr.sg_list = lnk->wr_tx_v2_sge;
+ lnk->wr_tx_v2_ib->wr.num_sge = 1;
+ lnk->wr_tx_v2_ib->wr.opcode = IB_WR_SEND;
+ lnk->wr_tx_v2_ib->wr.send_flags =
IB_SEND_SIGNALED | IB_SEND_SOLICITED;
}
@@ -622,10 +544,11 @@ static void smc_wr_init_sge(struct smc_link *lnk)
lnk->wr_rx_sges[x + 1].lkey =
lnk->roce_pd->local_dma_lkey;
}
- lnk->wr_rx_ibs[i].next = NULL;
- lnk->wr_rx_ibs[i].sg_list = &lnk->wr_rx_sges[x];
- lnk->wr_rx_ibs[i].num_sge = lnk->wr_rx_sge_cnt;
+ lnk->wr_rx_ibs[i].wr.next = NULL;
+ lnk->wr_rx_ibs[i].wr.sg_list = &lnk->wr_rx_sges[x];
+ lnk->wr_rx_ibs[i].wr.num_sge = lnk->wr_rx_sge_cnt;
}
+
lnk->wr_reg.wr.next = NULL;
lnk->wr_reg.wr.num_sge = 0;
lnk->wr_reg.wr.send_flags = IB_SEND_SIGNALED;
@@ -641,7 +564,6 @@ void smc_wr_free_link(struct smc_link *lnk)
return;
ibdev = lnk->smcibdev->ibdev;
- smc_wr_drain_cq(lnk);
smc_wr_wakeup_reg_wait(lnk);
smc_wr_wakeup_tx_wait(lnk);
@@ -826,18 +748,6 @@ int smc_wr_alloc_link_mem(struct smc_link *link)
return -ENOMEM;
}
-void smc_wr_remove_dev(struct smc_ib_device *smcibdev)
-{
- tasklet_kill(&smcibdev->recv_tasklet);
- tasklet_kill(&smcibdev->send_tasklet);
-}
-
-void smc_wr_add_dev(struct smc_ib_device *smcibdev)
-{
- tasklet_setup(&smcibdev->recv_tasklet, smc_wr_rx_tasklet_fn);
- tasklet_setup(&smcibdev->send_tasklet, smc_wr_tx_tasklet_fn);
-}
-
static void smcr_wr_tx_refs_free(struct percpu_ref *ref)
{
struct smc_link *lnk = container_of(ref, struct smc_link, wr_tx_refs);
@@ -857,8 +767,6 @@ int smc_wr_create_link(struct smc_link *lnk)
struct ib_device *ibdev = lnk->smcibdev->ibdev;
int rc = 0;
- smc_wr_tx_set_wr_id(&lnk->wr_tx_id, 0);
- lnk->wr_rx_id = 0;
lnk->wr_rx_dma_addr = ib_dma_map_single(
ibdev, lnk->wr_rx_bufs, lnk->wr_rx_buflen * lnk->wr_rx_cnt,
DMA_FROM_DEVICE);
@@ -906,7 +814,6 @@ int smc_wr_create_link(struct smc_link *lnk)
if (rc)
goto cancel_ref;
init_completion(&lnk->reg_ref_comp);
- init_waitqueue_head(&lnk->wr_rx_empty_wait);
return rc;
cancel_ref:
@@ -931,3 +838,42 @@ int smc_wr_create_link(struct smc_link *lnk)
out:
return rc;
}
+
+void smc_wr_init_cqes(struct smc_link *lnk)
+{
+ int i;
+
+ /* init CQE for WR fast reg */
+ smc_wr_reg_init_cqe(&lnk->wr_reg_cqe);
+ lnk->wr_reg.wr.wr_cqe = &lnk->wr_reg_cqe;
+
+ /* init CQE for WR WRITE */
+ for (i = 0; i < lnk->wr_tx_cnt; i++) {
+ int n;
+
+ smc_wr_tx_rdma_init_cqe(&lnk->wr_tx_rdmas[i].cqe);
+ for (n = 0; n < SMC_MAX_RDMA_WRITES; n++)
+ lnk->wr_tx_rdmas[i].wr_tx_rdma[n].wr.wr_cqe = &lnk->wr_tx_rdmas[i].cqe;
+ }
+
+ /* init CQEs for WR RECV */
+ for (i = 0; i < lnk->wr_rx_cnt; i++) {
+ smc_wr_rx_init_cqe(&lnk->wr_rx_ibs[i].cqe);
+ lnk->wr_rx_ibs[i].wr.wr_cqe = &lnk->wr_rx_ibs[i].cqe;
+ lnk->wr_rx_ibs[i].idx = i;
+ }
+
+ /* init CQEs for WR SEND */
+ for (i = 0; i < lnk->wr_tx_cnt; i++) {
+ smc_wr_tx_init_cqe(&lnk->wr_tx_ibs[i].cqe);
+ lnk->wr_tx_ibs[i].wr.wr_cqe = &lnk->wr_tx_ibs[i].cqe;
+ lnk->wr_tx_ibs[i].idx = i;
+ }
+
+ /* init CQE for SMC-Rv2 WR SEND */
+ if (lnk->lgr->smc_version == SMC_V2) {
+ smc_wr_tx_init_cqe(&lnk->wr_tx_v2_ib->cqe);
+ lnk->wr_tx_v2_ib->wr.wr_cqe = &lnk->wr_tx_v2_ib->cqe;
+ lnk->wr_tx_v2_ib->idx = lnk->wr_tx_cnt;
+ }
+}
diff --git a/net/smc/smc_wr.h b/net/smc/smc_wr.h
index aa4533af9122..295575fb060a 100644
--- a/net/smc/smc_wr.h
+++ b/net/smc/smc_wr.h
@@ -44,19 +44,6 @@ struct smc_wr_rx_handler {
u8 type;
};
-/* Only used by RDMA write WRs.
- * All other WRs (CDC/LLC) use smc_wr_tx_send handling WR_ID implicitly
- */
-static inline long smc_wr_tx_get_next_wr_id(struct smc_link *link)
-{
- return atomic_long_inc_return(&link->wr_tx_id);
-}
-
-static inline void smc_wr_tx_set_wr_id(atomic_long_t *wr_tx_id, long val)
-{
- atomic_long_set(wr_tx_id, val);
-}
-
static inline bool smc_wr_tx_link_hold(struct smc_link *link)
{
if (!smc_link_sendable(link))
@@ -70,9 +57,10 @@ static inline void smc_wr_tx_link_put(struct smc_link *link)
percpu_ref_put(&link->wr_tx_refs);
}
-static inline void smc_wr_drain_cq(struct smc_link *lnk)
+static inline void smc_wr_drain_rq(struct smc_link *lnk)
{
- wait_event(lnk->wr_rx_empty_wait, lnk->wr_rx_id_compl == lnk->wr_rx_id);
+ if (lnk->qp_attr.cur_qp_state != IB_QPS_RESET)
+ ib_drain_rq(lnk->roce_qp);
}
static inline void smc_wr_wakeup_tx_wait(struct smc_link *lnk)
@@ -86,18 +74,12 @@ static inline void smc_wr_wakeup_reg_wait(struct smc_link *lnk)
}
/* post a new receive work request to fill a completed old work request entry */
-static inline int smc_wr_rx_post(struct smc_link *link)
+static inline int smc_wr_rx_post(struct smc_link *link, struct ib_cqe *cqe)
{
- int rc;
- u64 wr_id, temp_wr_id;
- u32 index;
-
- wr_id = ++link->wr_rx_id; /* tasklet context, thus not atomic */
- temp_wr_id = wr_id;
- index = do_div(temp_wr_id, link->wr_rx_cnt);
- link->wr_rx_ibs[index].wr_id = wr_id;
- rc = ib_post_recv(link->roce_qp, &link->wr_rx_ibs[index], NULL);
- return rc;
+ struct smc_ib_recv_wr *recv_wr;
+
+ recv_wr = container_of(cqe, struct smc_ib_recv_wr, cqe);
+ return ib_post_recv(link->roce_qp, &recv_wr->wr, NULL);
}
int smc_wr_create_link(struct smc_link *lnk);
@@ -107,8 +89,6 @@ void smc_wr_free_link(struct smc_link *lnk);
void smc_wr_free_link_mem(struct smc_link *lnk);
void smc_wr_free_lgr_mem(struct smc_link_group *lgr);
void smc_wr_remember_qp_attr(struct smc_link *lnk);
-void smc_wr_remove_dev(struct smc_ib_device *smcibdev);
-void smc_wr_add_dev(struct smc_ib_device *smcibdev);
int smc_wr_tx_get_free_slot(struct smc_link *link, smc_wr_tx_handler handler,
struct smc_wr_buf **wr_buf,
@@ -126,12 +106,12 @@ int smc_wr_tx_v2_send(struct smc_link *link,
struct smc_wr_tx_pend_priv *priv, int len);
int smc_wr_tx_send_wait(struct smc_link *link, struct smc_wr_tx_pend_priv *priv,
unsigned long timeout);
-void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context);
void smc_wr_tx_wait_no_pending_sends(struct smc_link *link);
int smc_wr_rx_register_handler(struct smc_wr_rx_handler *handler);
int smc_wr_rx_post_init(struct smc_link *link);
-void smc_wr_rx_cq_handler(struct ib_cq *ib_cq, void *cq_context);
int smc_wr_reg_send(struct smc_link *link, struct ib_mr *mr);
+void smc_wr_init_cqes(struct smc_link *lnk);
+
#endif /* SMC_WR_H */
--
2.45.0
^ permalink raw reply related
* [PATCH net-next 2/2] net/smc: reduce TX slot contention with exclusive wait
From: D. Wythe @ 2026-05-08 6:37 UTC (permalink / raw)
To: David S. Miller, Dust Li, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Sidraya Jayagond, Wenjia Zhang
Cc: Mahanta Jambigi, Simon Horman, Tony Lu, Wen Gu, linux-kernel,
linux-rdma, linux-s390, netdev, oliver.yang, pasic
In-Reply-To: <20260508063718.101622-1-alibuda@linux.alibaba.com>
smc_wr_tx_get_free_slot() waits for a free TX slot with
wait_event_interruptible_timeout(). Since the wait_event family
enqueues waiters as non-exclusive, wake_up() may wake multiple
waiters even though only one can use the slot, causing
thundering-herd contention when slots are scarce.
Use an exclusive wait loop with prepare_to_wait_exclusive() so
wake_up() wakes only one waiter per freed slot.
smc_wr_wakeup_tx_wait() still uses wake_up_all() during link
teardown, so teardown behavior is unchanged.
Performance measured with netperf TCP_RR (63 flows, 200B write /
1000B read, 60s duration):
+-------------------------------+---------------+---------------+
| smcr_max_conns_per_lgr | 32 | 255 |
|-------------------------------+---------------+---------------|
| before | 4.85 Gb/s | 657.95 Mb/s |
|-------------------------------+---------------+---------------|
| after | 5.01 Gb/s | 2.2 Gb/s |
+-------------------------------+---------------+---------------+
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/smc_wr.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 48037a3d97a3..0a6f2befb0e2 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -159,9 +159,11 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
struct smc_rdma_wr **wr_rdma_buf,
struct smc_wr_tx_pend_priv **wr_pend_priv)
{
+ unsigned long timeout = SMC_WR_TX_WAIT_FREE_SLOT_TIME;
struct smc_link_group *lgr = smc_get_lgr(link);
struct smc_wr_tx_pend *wr_pend;
u32 idx = link->wr_tx_cnt;
+ DEFINE_WAIT(wait);
int rc;
*wr_buf = NULL;
@@ -171,17 +173,27 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
if (rc)
return rc;
} else {
- rc = wait_event_interruptible_timeout(
- link->wr_tx_wait,
- !smc_link_sendable(link) ||
- lgr->terminating ||
- (smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY),
- SMC_WR_TX_WAIT_FREE_SLOT_TIME);
- if (!rc) {
- /* timeout - terminate link */
- smcr_link_down_cond_sched(link);
- return -EPIPE;
+ rc = 0;
+ for (;;) {
+ prepare_to_wait_exclusive(&link->wr_tx_wait, &wait,
+ TASK_INTERRUPTIBLE);
+ if (!smc_link_sendable(link) || lgr->terminating ||
+ smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY)
+ break;
+ timeout = schedule_timeout(timeout);
+ if (!timeout) {
+ /* timeout - terminate link */
+ smcr_link_down_cond_sched(link);
+ break;
+ }
+ if (signal_pending(current)) {
+ rc = -ERESTARTSYS;
+ break;
+ }
}
+ finish_wait(&link->wr_tx_wait, &wait);
+ if (rc)
+ return rc;
if (idx == link->wr_tx_cnt)
return -EPIPE;
}
--
2.45.0
^ permalink raw reply related
* [PATCH net v2] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present
From: Hyunwoo Kim @ 2026-05-08 6:42 UTC (permalink / raw)
To: dhowells, marc.dionne, davem, edumazet, kuba, pabeni, horms,
qingfang.deng
Cc: linux-afs, netdev, stable, imv4bel
The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
handler in rxrpc_verify_response() copy the skb to a linear one before
calling into the security ops only when skb_cloned() is true. An skb
that is not cloned but still carries paged fragments (skb->data_len != 0)
falls through to the in-place decryption path, which binds the frag
pages directly into the AEAD/skcipher SGL via skb_to_sgvec().
Extend the gate so that any skb with non-linear data is also copied,
ensuring the security handler always operates on a fully linear skb.
The OOM/trace handling already in place is reused.
Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
Changes in v2:
- Use skb_is_nonlinear() instead of skb->data_len
- v1: https://lore.kernel.org/all/afKV2zGR6rrelPC7@v4bel/
---
net/rxrpc/call_event.c | 2 +-
net/rxrpc/conn_event.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index fdd683261226..a6ad5ff6ec5f 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -334,7 +334,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
sp->hdr.securityIndex != 0 &&
- skb_cloned(skb)) {
+ (skb_cloned(skb) || skb_is_nonlinear(skb))) {
/* Unshare the packet so that it can be
* modified by in-place decryption.
*/
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index a2130d25aaa9..632cbeff1f5d 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -245,7 +245,7 @@ static int rxrpc_verify_response(struct rxrpc_connection *conn,
{
int ret;
- if (skb_cloned(skb)) {
+ if (skb_cloned(skb) || skb_is_nonlinear(skb)) {
/* Copy the packet if shared so that we can do in-place
* decryption.
*/
--
2.43.0
^ permalink raw reply related
* Re: [PATCH net] xfrm: route MIGRATE notifications to caller's netns
From: Steffen Klassert @ 2026-05-08 6:46 UTC (permalink / raw)
To: Maoyi Xie
Cc: herbert, davem, kuba, pabeni, edumazet, horms, antony.antony,
netdev, linux-kernel, stable
In-Reply-To: <20260504142736.1228425-1-maoyi.xie@ntu.edu.sg>
On Mon, May 04, 2026 at 10:27:36PM +0800, Maoyi Xie wrote:
> xfrm_send_migrate() in net/xfrm/xfrm_user.c and pfkey_send_migrate()
> in net/key/af_key.c both hardcode &init_net for the multicast that
> announces a successful XFRM_MSG_MIGRATE / SADB_X_MIGRATE.
>
> XFRM_MSG_MIGRATE arrives on a per-netns NETLINK_XFRM socket, and the
> rest of the xfrm/af_key netlink path was made netns-aware in 2008.
> The other 14 multicast paths in xfrm_user.c route their event using
> xs_net(x), xp_net(xp) or sock_net(skb->sk); only the migrate path
> was missed.
>
> Two consequences of the init_net hardcoding:
>
> 1. The notification (selector, old/new endpoint addresses, and the
> km_address) is delivered to listeners on init_net's
> XFRMNLGRP_MIGRATE / pfkey BROADCAST_ALL groups rather than on
> the issuing netns. An IKE daemon running in init_net therefore
> receives migration notifications originating from any other
> netns on the host.
>
> 2. An IKE daemon running inside a non-init netns and subscribed
> to its own XFRMNLGRP_MIGRATE / pfkey groups never receives the
> notification of its own migration. IKEv2 MOBIKE / address-update
> handling inside a netns is silently broken.
>
> Thread struct net through km_migrate() and the xfrm_mgr.migrate
> function pointer, drop the &init_net override in xfrm_send_migrate()
> and pfkey_send_migrate(), and pass the caller's net (already in
> scope in xfrm_migrate() via sock_net(skb->sk)) all the way down.
> struct xfrm_mgr is in-tree only and not exported as a stable API,
> so the function-pointer signature change is internal.
>
> pfkey_broadcast() is already netns-aware via net_generic(net,
> pfkey_net_id) since the pernet conversion. The five other
> pfkey_broadcast() callers in af_key.c already pass xs_net(x),
> sock_net(sk) or a per-netns net, so this only removes the
> &init_net outlier.
>
> Fixes: 5c79de6e79cd ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE")
> Cc: stable@vger.kernel.org # v5.15+
> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Applied, thanks a lot!
^ permalink raw reply
* Re: [PATCH v2 17/22] dt-bindings: clock: Add StarFive JHB100 Peripheral-2 clock and reset generator
From: Rob Herring (Arm) @ 2026-05-08 6:46 UTC (permalink / raw)
To: Changhuang Liang
Cc: Krzysztof Kozlowski, linux-hardening, Conor Dooley, Hal Feng,
Stephen Boyd, linux-riscv, Albert Ou, netdev, Palmer Dabbelt,
linux-clk, Kees Cook, Alexandre Ghiti, Philipp Zabel,
Michael Turquette, Emil Renner Berthing, devicetree, linux-kernel,
Sia Jee Heng, Gustavo A . R . Silva, Richard Cochran,
Paul Walmsley, Ley Foon Tan
In-Reply-To: <20260508053632.818548-18-changhuang.liang@starfivetech.com>
On Thu, 07 May 2026 22:36:27 -0700, Changhuang Liang wrote:
> Add bindings for the Peripheral-2 clock and reset generator (PER2CRG)
> on the JHB100 RISC-V SoC by StarFive Ltd.
>
> Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
> ---
> .../clock/starfive,jhb100-per2crg.yaml | 76 +++++++++++++++++++
> .../dt-bindings/clock/starfive,jhb100-crg.h | 57 ++++++++++++++
> .../dt-bindings/reset/starfive,jhb100-crg.h | 17 +++++
> 3 files changed, 150 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/clock/starfive,jhb100-per2crg.yaml
>
My bot found errors running 'make dt_binding_check' on your patch:
yamllint warnings/errors:
dtschema/dtc warnings/errors:
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/net/renesas,ether.example.dtb: ethernet-phy@1 (ethernet-phy-id0022.1537): compatible: ['ethernet-phy-id0022.1537', 'ethernet-phy-ieee802.3-c22'] is too long
from schema $id: http://devicetree.org/schemas/net/micrel.yaml
doc reference errors (make refcheckdocs):
See https://patchwork.kernel.org/project/devicetree/patch/20260508053632.818548-18-changhuang.liang@starfivetech.com
The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.
^ permalink raw reply
* RE: [Intel-wired-lan] [PATCH iwl-next] ice: prevent integer overflow
From: Rinitha, SX @ 2026-05-08 6:54 UTC (permalink / raw)
To: Loktionov, Aleksandr, intel-wired-lan@lists.osuosl.org,
Nguyen, Anthony L, Loktionov, Aleksandr
Cc: netdev@vger.kernel.org, Czapnik, Lukasz
In-Reply-To: <20260320050544.422640-1-aleksandr.loktionov@intel.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Aleksandr Loktionov
> Sent: 20 March 2026 10:36
> To: intel-wired-lan@lists.osuosl.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
> Cc: netdev@vger.kernel.org; Czapnik, Lukasz <lukasz.czapnik@intel.com>
> Subject: [Intel-wired-lan] [PATCH iwl-next] ice: prevent integer overflow
>
> From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>
> In ice_sched_bw_to_rl_profile(), the loop over 64 bits computes the scheduler timestamp rate as:
>
> ts_rate = div64_long((s64)hw->psm_clk_freq,
> pow_result * ICE_RL_PROF_TS_MULTIPLIER);
>
> where pow_result = BIT_ULL(i). For large values of i, the product pow_result * ICE_RL_PROF_TS_MULTIPLIER overflows u64 before being used as the divisor, producing incorrect ts_rate values and potentially undefined behaviour.
>
> Fix this by pre-computing ts_freq = hw->psm_clk_freq / ICE_RL_PROF_TS_MULTIPLIER once before the loop and then dividing only by pow_result inside the loop. The division order avoids the overflow while preserving the same mathematical result. Declare ts_freq as s64 to match the type domain of the surrounding arithmetic and avoid a redundant cast at the use site.
>
> While at it, scope the loop variable i to the for statement itself.
>
> Fixes: 1ddef455f4a8 ("ice: Add NDO callback to set the maximum per-queue bitrate")
> Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_sched.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
^ permalink raw reply
* RE: [EXTERNAL] Re: [PATCH net-next 6/9] net: atlantic: implement AQC113 L2/L3/L4 RX filter management filter management management
From: Sukhdeep Soni [C] @ 2026-05-08 6:56 UTC (permalink / raw)
To: Vadim Fedorenko, netdev@vger.kernel.org
Cc: Igor Russkikh, Egor Pomozov, richardcochran@gmail.com,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, linux-kernel@vger.kernel.org
In-Reply-To: <2a15199c-d48f-4034-a907-6efa883d5d99@linux.dev>
On 06 May 2026, Vadim Fedorenko wrote:
> On 06/05/2026 14:57, sukhdeeps@marvell.com wrote:
> From: Sukhdeep Singh <sukhdeeps@marvell.com>
>
> Implement complete RX filter management for AQC113 hardware:
>
> - Add tag-based filter policy with reference-counted sharing, allowing
> multiple filter rules to share the same L3 or L4 hardware filter
> when their match criteria are identical.
> - Implement L3 (IPv4/IPv6 source/destination address and protocol)
> filter find, get (program HW and increment refcount), and put
> (decrement refcount and clear HW when last user releases).
> - Implement L4 (TCP/UDP/SCTP source/destination port) filter
> management with the same find/get/put pattern.
> - Add combined L3L4 filter configuration that translates legacy
> aq_rx_filter_l3l4 commands into AQC113 separate L3+L4 filter
> programming with Action Resolver Table (ART) entries.
> - Add L2 ethertype filter set/clear with tag-based ART integration.
> - Add MAC address setup using firmware-provided L2 filter base index.
>
> Update hardware initialization:
> - Use firmware-reported ART section base and count instead of
> hardcoded 0xFFFF section enable.
> - Enable L3 v6/v4 select mode for simultaneous IPv4/IPv6 filtering.
> - Initialize L3L4 filter indices to -1 on reset.
>
> Wire up hw_filter_l2_set, hw_filter_l2_clear, hw_filter_l3l4_set,
> hw_set_mac_address, hw_get_version, and hw_get_regs in hw_atl2_ops.
>
> Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
> ---
> .../net/ethernet/aquantia/atlantic/aq_hw.h | 2 +
> .../aquantia/atlantic/hw_atl2/hw_atl2.c | 582 +++++++++++++++++-
> 2 files changed, 580 insertions(+), 4 deletions(-)
[...]
>
> @@ -380,6 +422,9 @@ static void hw_atl2_hw_init_new_rx_filters(struct aq_hw_s *self)
> {
> u8 *prio_tc_map = self->aq_nic_cfg->prio_tc_map;
> struct hw_atl2_priv *priv = self->priv;
> + u32 art_first_sec, art_last_sec;
> + u32 art_sections;
> + u32 art_mask = 0;
> no need to init variable which is overwritten later ...
Agreed, will remove art_mask initialization in v2
> u16 action;
> u8 index;
> int i;
> @@ -394,9 +439,14 @@ static void hw_atl2_hw_init_new_rx_filters(struct aq_hw_s *self)
> * REC entry is used for further processing. If multiple entries match,
> * the lowest REC entry, Action field will be selected.
> */
> - hw_atl2_rpf_act_rslvr_section_en_set(self, 0xFFFF);
> + art_last_sec = priv->art_base_index / 8 + priv->art_count / 8;
> + art_first_sec = priv->art_base_index / 8;
> + art_mask = (BIT(art_last_sec) - 1) - (BIT(art_first_sec) - 1);
> ... here
> + art_sections = hw_atl2_rpf_act_rslvr_section_en_get(self) | art_mask;
> + hw_atl2_rpf_act_rslvr_section_en_set(self, art_sections);
> + hw_atl2_rpf_l3_v6_v4_select_set(self, 1);
> hw_atl2_rpfl2_uc_flr_tag_set(self, HW_ATL2_RPF_TAG_BASE_UC,
> - HW_ATL2_MAC_UC);
> + priv->l2_filters_base_index);
> hw_atl2_rpfl2_bc_flr_tag_set(self, HW_ATL2_RPF_TAG_BASE_UC);
>
> /* FW reserves the beginning of ART, thus all driver entries must
> @@ -530,6 +580,35 @@ static int hw_atl2_hw_init_rx_path(struct aq_hw_s *self)
> return aq_hw_err_from_flags(self);
> }
>
> +static int hw_atl2_hw_mac_addr_set(struct aq_hw_s *self, const u8 *mac_addr)
> +{
> + struct hw_atl2_priv *priv = self->priv;
> + u32 location = priv->l2_filters_base_index;
> + unsigned int h = 0U;
> + unsigned int l = 0U;
> + int err = 0;
> here again, h, l and err are not used with init values.
will remove initialization for h, l and err in v2. Thank you for the review.
> +
> + if (!mac_addr) {
> + err = -EINVAL;
> + goto err_exit;
> + }
> + h = (mac_addr[0] << 8) | (mac_addr[1]);
> + l = (mac_addr[2] << 24) | (mac_addr[3] << 16) |
> + (mac_addr[4] << 8) | mac_addr[5];
> +
> + hw_atl_rpfl2_uc_flr_en_set(self, 0U, location);
> + hw_atl_rpfl2unicast_dest_addresslsw_set(self, l, location);
> + hw_atl_rpfl2unicast_dest_addressmsw_set(self, h, location);
> + hw_atl_rpfl2unicast_flr_act_set(self, 1U, location);
> + hw_atl2_rpfl2_uc_flr_tag_set(self, HW_ATL2_RPF_TAG_BASE_UC, location);
> + hw_atl_rpfl2_uc_flr_en_set(self, 1U, location);
> +
> + err = aq_hw_err_from_flags(self);
> +
> +err_exit:
> + return err;
> +}
> +
> static int hw_atl2_hw_init(struct aq_hw_s *self, const u8 *mac_addr)
> {
> static u32 aq_hw_atl2_igcr_table_[4][2] = {
[...]
^ permalink raw reply
* [PATCH net-next v3 0/2] net: dsa: yt921x: Add port TBF support
From: David Yang @ 2026-05-08 6:57 UTC (permalink / raw)
To: netdev
Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
Jiri Pirko, Simon Horman, linux-kernel
v2: https://lore.kernel.org/r/20260504101258.1608004-1-mmyangfl@gmail.com
- drop changes on tc_tbf_qopt_offload_replace_params
- drop excessive checks for tbf setup
- react to TC_TBF_STATS correctly
v1: https://lore.kernel.org/r/20260502215314.917687-1-mmyangfl@gmail.com
- remove queue related register definiations
- add missing extack param during tbf setup
v0: https://lore.kernel.org/r/20260409171209.2575583-1-mmyangfl@gmail.com
- picked from old series
- add extack to the offload struct
- add all params to the offload struct
David Yang (2):
net/sched: tbf: add extack to offload params
net: dsa: yt921x: Add port TBF support
drivers/net/dsa/yt921x.c | 84 ++++++++++++++++++++++++++++++++++++++++
drivers/net/dsa/yt921x.h | 18 +++++++++
include/net/pkt_cls.h | 1 +
net/sched/sch_tbf.c | 9 ++++-
4 files changed, 110 insertions(+), 2 deletions(-)
--
2.53.0
^ permalink raw reply
* [PATCH net-next v3 1/2] net/sched: tbf: add extack to offload params
From: David Yang @ 2026-05-08 6:57 UTC (permalink / raw)
To: netdev
Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
Jiri Pirko, Simon Horman, linux-kernel
In-Reply-To: <20260508065757.2566258-1-mmyangfl@gmail.com>
Drivers might have error messages to propagate to user space. Propagate
the netlink extack so that they can inform user space in a verbal way of
their limitations.
Signed-off-by: David Yang <mmyangfl@gmail.com>
---
include/net/pkt_cls.h | 1 +
net/sched/sch_tbf.c | 9 +++++++--
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 99ac747b7906..3bd08d7f39c1 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -1046,6 +1046,7 @@ struct tc_tbf_qopt_offload_replace_params {
};
struct tc_tbf_qopt_offload {
+ struct netlink_ext_ack *extack;
enum tc_tbf_command command;
u32 handle;
u32 parent;
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index f2340164f579..4576111fe075 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -139,7 +139,8 @@ static u64 psched_ns_t2l(const struct psched_ratecfg *r,
return len;
}
-static void tbf_offload_change(struct Qdisc *sch)
+static void tbf_offload_change(struct Qdisc *sch,
+ struct netlink_ext_ack *extack)
{
struct tbf_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
@@ -148,6 +149,7 @@ static void tbf_offload_change(struct Qdisc *sch)
if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
return;
+ qopt.extack = extack;
qopt.command = TC_TBF_REPLACE;
qopt.handle = sch->handle;
qopt.parent = sch->parent;
@@ -166,6 +168,7 @@ static void tbf_offload_destroy(struct Qdisc *sch)
if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
return;
+ qopt.extack = NULL;
qopt.command = TC_TBF_DESTROY;
qopt.handle = sch->handle;
qopt.parent = sch->parent;
@@ -176,6 +179,7 @@ static int tbf_offload_dump(struct Qdisc *sch)
{
struct tc_tbf_qopt_offload qopt;
+ qopt.extack = NULL;
qopt.command = TC_TBF_STATS;
qopt.handle = sch->handle;
qopt.parent = sch->parent;
@@ -193,6 +197,7 @@ static void tbf_offload_graft(struct Qdisc *sch, struct Qdisc *new,
.parent = sch->parent,
.child_handle = new->handle,
.command = TC_TBF_GRAFT,
+ .extack = extack,
};
qdisc_offload_graft_helper(qdisc_dev(sch), sch, new, old,
@@ -477,7 +482,7 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt,
qdisc_put(old);
err = 0;
- tbf_offload_change(sch);
+ tbf_offload_change(sch, extack);
done:
return err;
}
--
2.53.0
^ permalink raw reply related
* [PATCH net-next v3 2/2] net: dsa: yt921x: Add port TBF support
From: David Yang @ 2026-05-08 6:57 UTC (permalink / raw)
To: netdev
Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
Jiri Pirko, Simon Horman, linux-kernel
In-Reply-To: <20260508065757.2566258-1-mmyangfl@gmail.com>
React to TC_SETUP_QDISC_TBF and configure the egress shaper as
appropriate with the maximum rate and burst size requested by the user.
Per queue shaper is possible, though not touched in this commit.
Signed-off-by: David Yang <mmyangfl@gmail.com>
---
drivers/net/dsa/yt921x.c | 84 ++++++++++++++++++++++++++++++++++++++++
drivers/net/dsa/yt921x.h | 18 +++++++++
2 files changed, 102 insertions(+)
diff --git a/drivers/net/dsa/yt921x.c b/drivers/net/dsa/yt921x.c
index fd1fdcd5f9a3..e5f547629bfd 100644
--- a/drivers/net/dsa/yt921x.c
+++ b/drivers/net/dsa/yt921x.c
@@ -24,6 +24,7 @@
#include <net/dsa.h>
#include <net/dscp.h>
#include <net/ieee8021q.h>
+#include <net/pkt_cls.h>
#include "yt921x.h"
@@ -1272,6 +1273,17 @@ yt921x_marker_tfm_police(struct yt921x_marker *marker,
priv, port, extack);
}
+static int
+yt921x_marker_tfm_shape(struct yt921x_marker *marker, u64 rate, u64 burst,
+ unsigned int flags, struct yt921x_priv *priv, int port,
+ struct netlink_ext_ack *extack)
+{
+ return yt921x_marker_tfm(marker, rate, burst, flags,
+ priv->port_shape_slot_ns, YT921X_SHAPE_CIR_MAX,
+ YT921X_SHAPE_CBS_MAX, YT921X_SHAPE_UNIT_MAX,
+ priv, port, extack);
+}
+
static int
yt921x_police_validate(const struct flow_action_police *police,
const struct flow_action *action,
@@ -1378,6 +1390,70 @@ yt921x_dsa_port_policer_add(struct dsa_switch *ds, int port,
return res;
}
+static int
+yt921x_dsa_port_setup_tc_tbf_port(struct dsa_switch *ds, int port,
+ const struct tc_tbf_qopt_offload *qopt)
+{
+ struct yt921x_priv *priv = to_yt921x_priv(ds);
+ struct netlink_ext_ack *extack = qopt->extack;
+ u32 ctrls[2];
+ int res;
+
+ if (qopt->parent != TC_H_ROOT)
+ return -EOPNOTSUPP;
+
+ switch (qopt->command) {
+ case TC_TBF_STATS:
+ return 0;
+ case TC_TBF_DESTROY:
+ ctrls[0] = 0;
+ ctrls[1] = 0;
+ break;
+ case TC_TBF_REPLACE: {
+ const struct tc_tbf_qopt_offload_replace_params *p;
+ struct yt921x_marker marker;
+
+ p = &qopt->replace_params;
+
+ res = yt921x_marker_tfm_shape(&marker, p->rate.rate_bytes_ps,
+ p->max_size,
+ YT921X_MARKER_SINGLE_BUCKET,
+ priv, port, extack);
+ if (res)
+ return res;
+
+ ctrls[0] = YT921X_PORT_SHAPE_CTRLa_CIR(marker.cir) |
+ YT921X_PORT_SHAPE_CTRLa_CBS(marker.cbs);
+ ctrls[1] = YT921X_PORT_SHAPE_CTRLb_UNIT(marker.unit) |
+ YT921X_PORT_SHAPE_CTRLb_EN;
+ break;
+ }
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ mutex_lock(&priv->reg_lock);
+ res = yt921x_reg64_write(priv, YT921X_PORTn_SHAPE_CTRL(port), ctrls);
+ mutex_unlock(&priv->reg_lock);
+
+ return res;
+}
+
+static int
+yt921x_dsa_port_setup_tc(struct dsa_switch *ds, int port,
+ enum tc_setup_type type, void *type_data)
+{
+ switch (type) {
+ case TC_SETUP_QDISC_TBF: {
+ const struct tc_tbf_qopt_offload *qopt = type_data;
+
+ return yt921x_dsa_port_setup_tc_tbf_port(ds, port, qopt);
+ }
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static int
yt921x_mirror_del(struct yt921x_priv *priv, int port, bool ingress)
{
@@ -3524,6 +3600,13 @@ static int yt921x_chip_setup_tc(struct yt921x_priv *priv)
return res;
priv->meter_slot_ns = ctrl * op_ns;
+ ctrl = max(priv->port_shape_slot_ns / op_ns,
+ YT921X_PORT_SHAPE_SLOT_MIN);
+ res = yt921x_reg_write(priv, YT921X_PORT_SHAPE_SLOT, ctrl);
+ if (res)
+ return res;
+ priv->port_shape_slot_ns = ctrl * op_ns;
+
return 0;
}
@@ -3680,6 +3763,7 @@ static const struct dsa_switch_ops yt921x_dsa_switch_ops = {
/* rate */
.port_policer_del = yt921x_dsa_port_policer_del,
.port_policer_add = yt921x_dsa_port_policer_add,
+ .port_setup_tc = yt921x_dsa_port_setup_tc,
/* hsr */
.port_hsr_leave = dsa_port_simple_hsr_leave,
.port_hsr_join = dsa_port_simple_hsr_join,
diff --git a/drivers/net/dsa/yt921x.h b/drivers/net/dsa/yt921x.h
index 546b12a8994a..70fa780c337f 100644
--- a/drivers/net/dsa/yt921x.h
+++ b/drivers/net/dsa/yt921x.h
@@ -531,6 +531,19 @@ enum yt921x_app_selector {
#define YT921X_MIRROR_PORT_M GENMASK(3, 0)
#define YT921X_MIRROR_PORT(x) FIELD_PREP(YT921X_MIRROR_PORT_M, (x))
+#define YT921X_PORT_SHAPE_SLOT 0x34000c
+#define YT921X_PORT_SHAPE_SLOT_SLOT_M GENMASK(11, 0)
+#define YT921X_PORTn_SHAPE_CTRL(port) (0x354000 + 8 * (port))
+#define YT921X_PORT_SHAPE_CTRLb_EN BIT(4)
+#define YT921X_PORT_SHAPE_CTRLb_PKT_MODE BIT(3) /* 0: byte rate mode */
+#define YT921X_PORT_SHAPE_CTRLb_UNIT_M GENMASK(2, 0)
+#define YT921X_PORT_SHAPE_CTRLb_UNIT(x) FIELD_PREP(YT921X_PORT_SHAPE_CTRLb_UNIT_M, (x))
+#define YT921X_PORT_SHAPE_CTRLa_CBS_M GENMASK(31, 18)
+#define YT921X_PORT_SHAPE_CTRLa_CBS(x) FIELD_PREP(YT921X_PORT_SHAPE_CTRLa_CBS_M, (x))
+#define YT921X_PORT_SHAPE_CTRLa_CIR_M GENMASK(17, 0)
+#define YT921X_PORT_SHAPE_CTRLa_CIR(x) FIELD_PREP(YT921X_PORT_SHAPE_CTRLa_CIR_M, (x))
+#define YT921X_PORTn_SHAPE_STAT(port) (0x356000 + 4 * (port))
+
#define YT921X_EDATA_EXTMODE 0xfb
#define YT921X_EDATA_LEN 0x100
@@ -556,6 +569,10 @@ enum yt921x_fdb_entry_status {
#define YT921X_METER_UNIT_MAX ((1 << 3) - 1)
#define YT921X_METER_CIR_MAX ((1 << 18) - 1)
#define YT921X_METER_CBS_MAX ((1 << 16) - 1)
+#define YT921X_PORT_SHAPE_SLOT_MIN 80
+#define YT921X_SHAPE_UNIT_MAX ((1 << 3) - 1)
+#define YT921X_SHAPE_CIR_MAX ((1 << 18) - 1)
+#define YT921X_SHAPE_CBS_MAX ((1 << 14) - 1)
#define YT921X_LAG_NUM 2
#define YT921X_LAG_PORT_NUM 4
@@ -652,6 +669,7 @@ struct yt921x_priv {
const struct yt921x_info *info;
unsigned int meter_slot_ns;
+ unsigned int port_shape_slot_ns;
/* cache of dsa_cpu_ports(ds) */
u16 cpu_ports_mask;
unsigned char cycle_ns;
--
2.53.0
^ permalink raw reply related
* RE: [Intel-wired-lan] [PATCH net] ice: fix locking around wait_event_interruptible_locked_irq
From: Rinitha, SX @ 2026-05-08 7:01 UTC (permalink / raw)
To: Loktionov, Aleksandr, intel-wired-lan@lists.osuosl.org,
Nguyen, Anthony L, Loktionov, Aleksandr
Cc: netdev@vger.kernel.org, Keller, Jacob E, Jakub Kicinski
In-Reply-To: <20260327072332.130320-2-aleksandr.loktionov@intel.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Aleksandr Loktionov
> Sent: 27 March 2026 12:53
> To: intel-wired-lan@lists.osuosl.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
> Cc: netdev@vger.kernel.org; Keller, Jacob E <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>
> Subject: [Intel-wired-lan] [PATCH net] ice: fix locking around wait_event_interruptible_locked_irq
>
> From: Jacob Keller <jacob.e.keller@intel.com>
>
> Commit 50327223a8bb ("ice: add lock to protect low latency interface") introduced a wait queue used to protect the low latency timer interface.
> The queue is used with the wait_event_interruptible_locked_irq macro, which unlocks the wait queue lock while sleeping. The irq variant uses spin_lock_irq and spin_unlock_irq to manage this. The wait queue lock was previously locked using spin_lock_irqsave. This difference in lock variants could lead to issues, since wait_event would unlock the wait queue and restore interrupts while sleeping.
>
> The ice_read_phy_tstamp_ll_e810() function is ultimately called through ice_read_phy_tstamp, which is called from ice_ptp_process_tx_tstamp or ice_ptp_clear_unexpected_tx_ready. The former is called through the miscellaneous IRQ thread function, while the latter is called from the service task work queue thread. Neither of these functions has interrupts disabled, so use spin_lock_irq instead of spin_lock_irqsave.
>
> Fixes: 50327223a8bb ("ice: add lock to protect low latency interface")
> Cc: stable@vger.kernel.org
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Closes: https://lore.kernel.org/netdev/20250109181823.77f44c69@kernel.org/
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
>
> drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox