Linux-ARM-Kernel Archive on lore.kernel.org
* Re: [PATCH v5 4/4] arm64: dts: cix: sky1: add audss cru
From: Krzysztof Kozlowski @ 2026-06-22  9:02 UTC (permalink / raw)
  To: joakim.zhang
  Cc: mturquette, sboyd, bmasney, robh, krzk+dt, conor+dt, p.zabel,
	gary.yang, cix-kernel-upstream, linux-clk, devicetree,
	linux-kernel, linux-arm-kernel
In-Reply-To: <20260622022520.3127103-5-joakim.zhang@cixtech.com>

On Mon, Jun 22, 2026 at 10:25:20AM +0800, joakim.zhang@cixtech.com wrote:
>  
> +		audss_cru: clock-controller@7110000 {
> +			compatible = "cix,sky1-audss-cru";
> +			reg = <0x0 0x07110000 0x0 0x10000>;
> +			#clock-cells = <1>;
> +			#reset-cells = <1>;
> +			clocks = <&scmi_clk CLK_TREE_AUDIO_CLK0>,
> +				 <&scmi_clk CLK_TREE_AUDIO_CLK2>,
> +				 <&scmi_clk CLK_TREE_AUDIO_CLK4>,
> +				 <&scmi_clk CLK_TREE_AUDIO_CLK5>;
> +			clock-names = "x8k", "x11k", "sys", "48m";
> +			power-domains = <&smc_devpd SKY1_PD_AUDIO>;
> +			resets = <&s5_syscon SKY1_AUDIO_HIFI5_NOC_RESET_N>;

> +			status = "okay";

Drop.

> +		};
> +

Best regards,
Krzysztof




* Re: [PATCH v5 1/4] dt-bindings: soc: cix: add sky1 audss cru controller
From: Krzysztof Kozlowski @ 2026-06-22  9:01 UTC (permalink / raw)
  To: joakim.zhang
  Cc: mturquette, sboyd, bmasney, robh, krzk+dt, conor+dt, p.zabel,
	gary.yang, cix-kernel-upstream, linux-clk, devicetree,
	linux-kernel, linux-arm-kernel
In-Reply-To: <20260622022520.3127103-2-joakim.zhang@cixtech.com>

On Mon, Jun 22, 2026 at 10:25:17AM +0800, joakim.zhang@cixtech.com wrote:
> From: Joakim Zhang <joakim.zhang@cixtech.com>
> 
> The Cix Sky1 Audio Subsystem (AUDSS) Clock and Reset Unit (CRU)
> groups clock muxing, gating and block-level software reset control
> in a single register block.
> 
> Signed-off-by: Joakim Zhang <joakim.zhang@cixtech.com>
> ---
>  .../bindings/soc/cix/cix,sky1-audss-cru.yaml  | 92 +++++++++++++++++++
>  .../dt-bindings/clock/cix,sky1-audss-clock.h  | 60 ++++++++++++
>  .../dt-bindings/reset/cix,sky1-audss-reset.h  | 25 +++++
>  3 files changed, 177 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/soc/cix/cix,sky1-audss-cru.yaml
>  create mode 100644 include/dt-bindings/clock/cix,sky1-audss-clock.h
>  create mode 100644 include/dt-bindings/reset/cix,sky1-audss-reset.h

Both headers should have the same name as the compatible. I already
requested this some time ago, I think.

Best regards,
Krzysztof




* Re: [RFC] arm64: early_ioremap fails to map ACPI MADT on 64K pages
From: Hanjun Guo @ 2026-06-22  8:55 UTC (permalink / raw)
  To: Will Deacon, Yu Peng
  Cc: Catalin Marinas, linux-arm-kernel, Rafael J. Wysocki, Len Brown,
	linux-acpi, Andrew Morton, linux-mm, linux-kernel, lpieralisi,
	sudeep.holla
In-Reply-To: <ajVVfnGIq_6zt2jC@willie-the-truck>

On 2026/6/19 22:43, Will Deacon wrote:
> +arm64 ACPI maintainers
> 
> On Wed, Jun 17, 2026 at 02:01:10PM +0800, Yu Peng wrote:
>> I hit an early boot failure on an arm64 system built with 64K pages while
>> parsing the ACPI MADT.
>>
>> The failing system reports:
>>
>>    PAGE_SIZE: 64K
>>    MADT physical address: 0x5a7ae018
>>    MADT length: 0x32094
> 
> The MADT isn't even 4k aligned, so why does the page size matter in this
> case?
> 
>> The failure happens when acpi_table_parse_madt() calls into early_memremap()
>> via __acpi_map_table(). The MADT itself is smaller than 256K, but its
>> placement causes the early mapping to require 5 64K pages:
>>
>>    offset within 64K page = 0x5a7ae018 & 0xffff = 0xe018
>>    mapped range           = PAGE_ALIGN(0xe018 + 0x32094)
>>                           = PAGE_ALIGN(0x400ac)
>>                           = 0x50000
>>    nrpages                = 0x50000 / 0x10000 = 5
>>
>> On arm64, NR_FIX_BTMAPS is currently derived from a 256K per-slot budget:
>>
>>    #define NR_FIX_BTMAPS        (SZ_256K / PAGE_SIZE)
>>
>> So for 64K pages, NR_FIX_BTMAPS is 4. The mapping therefore fails the
>> early_ioremap() check:
>>
>>    if (WARN_ON(nrpages > NR_FIX_BTMAPS))
>>            return NULL;
>>
>> After that, MADT parsing fails and the boot continues with symptoms such as:
>>
>>    ACPI: APIC not present
>>    missing boot CPU MPIDR, not enabling secondaries
>>    Kernel panic - not syncing: No interrupt controller found.
>>
>> A firmware change can avoid this by placing MADT so that:
>>
>>    (madt_phys & 0xffff) + madt_length <= SZ_256K
>>
>> However, I do not think ACPI requires such placement, so this looks like a
>> kernel-side robustness issue as well, especially on large arm64 systems where
>> MADT can grow with CPU topology.
>>
>> One possible kernel-side change is to increase the boot-time mapping budget for
>> CONFIG_ARM64_64K_PAGES, for example using a 512K per-slot budget only in that
>> configuration. I do not think this should be applied unconditionally to all
>> page sizes, since the arm64 early fixmap code expects the boot-ioremap range
>> to stay within one PMD.
>>
>> Has anyone seen similar failures on arm64 64K systems?
>>
>> Would maintainers prefer treating this as a firmware layout issue, or would
>> increasing the early_ioremap budget for 64K pages be acceptable?
> 
> I think it boils down to what ACPI says about the alignment of the MADT.

I checked the ACPI spec and it doesn't require any alignment for ACPI
tables, but the UEFI spec says (for AArch64):

ACPI Tables loaded at boot time can be contained in memory of type
EfiACPIReclaimMemory (recommended) or EfiACPIMemoryNVS.

EFI memory descriptors of type EfiACPIReclaimMemory and EfiACPIMemoryNVS
must be aligned on a 4 KiB boundary and must be a multiple of 4 KiB in
size.

So it only requires memory regions of type EfiACPIReclaimMemory to be
4K aligned, not each ACPI table, because several ACPI tables can be
packed into one allocated EfiACPIReclaimMemory region. Correct me if
I'm wrong!

Thanks
Hanjun
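
The page arithmetic quoted above can be checked with a small standalone
sketch. This is not kernel code: early_map_nrpages(), nr_fix_btmaps() and
early_map_fits() are hypothetical helper names, and the rounding is assumed
to match the calculation in the quoted report, with the budget modelled on
the quoted NR_FIX_BTMAPS definition (SZ_256K / PAGE_SIZE).

```c
#include <stdint.h>

#define SZ_256K 0x40000ULL

/* Pages an early mapping must cover for a table at phys with len bytes. */
static uint64_t early_map_nrpages(uint64_t phys, uint64_t len,
				  uint64_t page_size)
{
	/* Offset of the table inside its first page. */
	uint64_t offset = phys & (page_size - 1);
	/* Round offset + length up to a whole number of pages. */
	uint64_t span = (offset + len + page_size - 1) & ~(page_size - 1);

	return span / page_size;
}

/* Per-slot page budget, modelled on NR_FIX_BTMAPS = SZ_256K / PAGE_SIZE. */
static uint64_t nr_fix_btmaps(uint64_t page_size)
{
	return SZ_256K / page_size;
}

/* Nonzero when the mapping fits into one early_ioremap slot. */
static int early_map_fits(uint64_t phys, uint64_t len, uint64_t page_size)
{
	return early_map_nrpages(phys, len, page_size) <=
	       nr_fix_btmaps(page_size);
}
```

With the reported values (phys 0x5a7ae018, length 0x32094), a 64K page size
needs 5 pages against a budget of 4, while a 4K page size needs 51 pages
against a budget of 64. Under these assumptions the same table fits with 4K
pages and fails with 64K pages, which is why the page size matters even
though the MADT is not 4K aligned.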



* Re: [PATCH v5 8/8] arm64: dts: imx8qxp-mek: add parallel ov5640 camera support
From: guoniu.zhou @ 2026-06-22  9:01 UTC (permalink / raw)
  To: Frank.Li
  Cc: Sakari Ailus, Mauro Carvalho Chehab, Michael Riesch,
	Laurent Pinchart, Frank Li, Martin Kepplinger-Novakovic,
	Rui Miguel Silva, Purism Kernel Team, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, linux-media, linux-kernel,
	imx, devicetree, linux-arm-kernel
In-Reply-To: <20260617-imx8qxp_pcam-v5-8-7fa6c8e7fba7@nxp.com>

> Add parallel ov5640 nodes in imx8qxp-mek and create an overlay file to enable
> them, because the camera can work in two modes: MIPI CSI and parallel.
> 
> Signed-off-by: Frank Li <Frank.Li@nxp.com>
>
> diff --git a/arch/arm64/boot/dts/freescale/Makefile b/arch/arm64/boot/dts/freescale/Makefile
> index 711e36cc2c99..f54fd4cdd926 100644
> --- a/arch/arm64/boot/dts/freescale/Makefile
> +++ b/arch/arm64/boot/dts/freescale/Makefile
> @@ -434,6 +434,9 @@ dtb-$(CONFIG_ARCH_MXC) += imx8qxp-mek-pcie-ep.dtb
>  imx8qxp-mek-ov5640-csi-dtbs := imx8qxp-mek.dtb imx8qxp-mek-ov5640-csi.dtbo
>  dtb-${CONFIG_ARCH_MXC} += imx8qxp-mek-ov5640-csi.dtb
>  
> +imx8qxp-mek-ov5640-cpi-dtbs := imx8qxp-mek.dtb imx8qxp-mek-ov5640-cpi.dtbo
> +dtb-${CONFIG_ARCH_MXC} += imx8qxp-mek-ov5640-cpi.dtb
> +
>  dtb-$(CONFIG_ARCH_MXC) += imx8qxp-tqma8xqp-mba8xx.dtb
>  dtb-$(CONFIG_ARCH_MXC) += imx8qxp-tqma8xqps-mb-smarc-2.dtb
>  dtb-$(CONFIG_ARCH_MXC) += imx8ulp-9x9-evk.dtb
> diff --git a/arch/arm64/boot/dts/freescale/imx8qxp-mek-ov5640-cpi.dtso b/arch/arm64/boot/dts/freescale/imx8qxp-mek-ov5640-cpi.dtso
> new file mode 100644
> index 000000000000..9fbdd798f17d
> --- /dev/null
> +++ b/arch/arm64/boot/dts/freescale/imx8qxp-mek-ov5640-cpi.dtso
> @@ -0,0 +1,83 @@
> +// SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> +/*
> + * Copyright 2025 NXP
> + */
> +
> +/dts-v1/;
> +/plugin/;
> +
> +#include <dt-bindings/clock/imx8-lpcg.h>
> +#include <dt-bindings/gpio/gpio.h>
> +#include <dt-bindings/media/video-interfaces.h>
> +#include <dt-bindings/pinctrl/pads-imx8qxp.h>
> +
> +&cm40_i2c {
> +	#address-cells = <1>;
> +	#size-cells = <0>;
> +
> +	ov5640_pi: camera@3c {
> +		compatible = "ovti,ov5640";
> +		reg = <0x3c>;
> +		clocks = <&pi0_misc_lpcg IMX_LPCG_CLK_0>;
> +		clock-names = "xclk";
> +		assigned-clocks = <&pi0_misc_lpcg IMX_LPCG_CLK_0>;
> +		assigned-clock-rates = <24000000>;
> +		AVDD-supply = <&reg_2v8>;
> +		DOVDD-supply = <&reg_1v8>;
> +		DVDD-supply = <&reg_1v5>;
> +		pinctrl-0 = <&pinctrl_parallel_cpi>;
> +		pinctrl-names = "default";
> +		powerdown-gpios = <&lsio_gpio3 2 GPIO_ACTIVE_HIGH>;
> +		reset-gpios = <&lsio_gpio3 3 GPIO_ACTIVE_LOW>;
> +
> +		port {
> +			ov5640_pi_ep: endpoint {
> +				bus-type = <MEDIA_BUS_TYPE_PARALLEL>;
> +				bus-width = <8>;
> +				hsync-active = <1>;
> +				pclk-sample = <1>;
> +				remote-endpoint = <&parallel_cpi_in>;
> +				vsync-active = <0>;
> +			};
> +		};
> +	};
> +};
> +
> +&iomuxc {
> +	pinctrl_parallel_cpi: parallelcpigrp {
> +		fsl,pins = <
> +			IMX8QXP_CSI_D00_CI_PI_D02		0xc0000041
> +			IMX8QXP_CSI_D01_CI_PI_D03		0xc0000041
> +			IMX8QXP_CSI_D02_CI_PI_D04		0xc0000041
> +			IMX8QXP_CSI_D03_CI_PI_D05		0xc0000041
> +			IMX8QXP_CSI_D04_CI_PI_D06		0xc0000041
> +			IMX8QXP_CSI_D05_CI_PI_D07		0xc0000041
> +			IMX8QXP_CSI_D06_CI_PI_D08		0xc0000041
> +			IMX8QXP_CSI_D07_CI_PI_D09		0xc0000041
> +
> +			IMX8QXP_CSI_MCLK_CI_PI_MCLK		0xc0000041
> +			IMX8QXP_CSI_PCLK_CI_PI_PCLK		0xc0000041
> +			IMX8QXP_CSI_HSYNC_CI_PI_HSYNC		0xc0000041
> +			IMX8QXP_CSI_VSYNC_CI_PI_VSYNC		0xc0000041
> +			IMX8QXP_CSI_EN_LSIO_GPIO3_IO02		0xc0000041
> +			IMX8QXP_CSI_RESET_LSIO_GPIO3_IO03	0xc0000041
> +		>;
> +	};
> +};
> +
> +&isi {
> +	status = "okay";
> +};
> +
> +&parallel_cpi {
> +	status = "okay";
> +
> +	ports {
> +		port@0 {
> +			parallel_cpi_in: endpoint {
> +				hsync-active = <1>;
> +				remote-endpoint = <&ov5640_pi_ep>;
> +			};
> +		};
> +	};
> +};

Reviewed-by: Guoniu Zhou <guoniu.zhou@nxp.com>

-- 
Guoniu Zhou <guoniu.zhou@oss.nxp.com>



* Re: [PATCH v5 7/8] arm64: dts: imx8: add camera parallel interface (CPI) node
From: guoniu.zhou @ 2026-06-22  9:01 UTC (permalink / raw)
  To: Frank.Li
  Cc: Sakari Ailus, Mauro Carvalho Chehab, Michael Riesch,
	Laurent Pinchart, Frank Li, Martin Kepplinger-Novakovic,
	Rui Miguel Silva, Purism Kernel Team, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, linux-media, linux-kernel,
	imx, devicetree, linux-arm-kernel
In-Reply-To: <20260617-imx8qxp_pcam-v5-7-7fa6c8e7fba7@nxp.com>

> Add camera parallel interface (CPI) node.
> 
> Signed-off-by: Frank Li <Frank.Li@nxp.com>
>
> diff --git a/arch/arm64/boot/dts/freescale/imx8-ss-img.dtsi b/arch/arm64/boot/dts/freescale/imx8-ss-img.dtsi
> index a72b2f1c4a1b..b504f99f6acd 100644
> --- a/arch/arm64/boot/dts/freescale/imx8-ss-img.dtsi
> +++ b/arch/arm64/boot/dts/freescale/imx8-ss-img.dtsi
> @@ -222,6 +222,19 @@ irqsteer_parallel: irqsteer@58260000 {
>  		status = "disabled";
>  	};
>  
> +	parallel_cpi: cpi@58261000 {
> +		compatible = "fsl,imx8qxp-pcif";
> +		reg = <0x58261000 0x1000>;
> +		clocks = <&pi0_pxl_lpcg IMX_LPCG_CLK_0>,
> +			 <&pi0_ipg_lpcg IMX_LPCG_CLK_4>;
> +		clock-names = "pixel", "ipg";
> +		assigned-clocks = <&clk IMX_SC_R_PI_0 IMX_SC_PM_CLK_PER>;
> +		assigned-clock-parents = <&clk IMX_SC_R_PI_0_PLL IMX_SC_PM_CLK_PLL>;
> +		assigned-clock-rates = <160000000>;
> +		power-domains = <&pd IMX_SC_R_PI_0>;
> +		status = "disabled";
> +	};
> +
>  	pi0_ipg_lpcg: clock-controller@58263004 {
>  		compatible = "fsl,imx8qxp-lpcg";
>  		reg = <0x58263004 0x4>;
> diff --git a/arch/arm64/boot/dts/freescale/imx8qxp-ss-img.dtsi b/arch/arm64/boot/dts/freescale/imx8qxp-ss-img.dtsi
> index 232cf25dadfc..5aae15540d6c 100644
> --- a/arch/arm64/boot/dts/freescale/imx8qxp-ss-img.dtsi
> +++ b/arch/arm64/boot/dts/freescale/imx8qxp-ss-img.dtsi
> @@ -62,6 +62,14 @@ isi_in_2: endpoint {
>  				remote-endpoint = <&mipi_csi0_out>;
>  			};
>  		};
> +
> +		port@4 {
> +			reg = <4>;
> +
> +			isi_in_4: endpoint {
> +				remote-endpoint = <&parallel_cpi_out>;
> +			};
> +		};
>  	};
>  };
>  
> @@ -95,3 +103,22 @@ &jpegenc {
>  &mipi_csi_1 {
>  	status = "disabled";
>  };
> +
> +&parallel_cpi {
> +	ports {
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +
> +		port@0 {
> +			reg = <0>;
> +		};
> +
> +		port@1 {
> +			reg = <1>;
> +
> +			parallel_cpi_out: endpoint {
> +				remote-endpoint = <&isi_in_4>;
> +			};
> +		};
> +	};
> +};

Reviewed-by: Guoniu Zhou <guoniu.zhou@nxp.com>

-- 
Guoniu Zhou <guoniu.zhou@oss.nxp.com>



* Re: [PATCH v5 3/8] media: synopsys: Use v4l2_subdev_get_frame_desc_passthrough()
From: guoniu.zhou @ 2026-06-22  9:01 UTC (permalink / raw)
  To: Frank.Li
  Cc: Sakari Ailus, Mauro Carvalho Chehab, Michael Riesch,
	Laurent Pinchart, Frank Li, Martin Kepplinger-Novakovic,
	Rui Miguel Silva, Purism Kernel Team, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, linux-media, linux-kernel,
	imx, devicetree, linux-arm-kernel
In-Reply-To: <20260617-imx8qxp_pcam-v5-3-7fa6c8e7fba7@nxp.com>

> Replace the local frame descriptor callback implementation with
> v4l2_subdev_get_frame_desc_passthrough().
> 
> This helper provides the same functionality while avoiding duplicate
> code and simplifying the driver implementation.
> 
> Signed-off-by: Frank Li <Frank.Li@nxp.com>
>
> diff --git a/drivers/media/platform/synopsys/dw-mipi-csi2rx.c b/drivers/media/platform/synopsys/dw-mipi-csi2rx.c
> index 41e48365167e..f51367409ff4 100644
> --- a/drivers/media/platform/synopsys/dw-mipi-csi2rx.c
> +++ b/drivers/media/platform/synopsys/dw-mipi-csi2rx.c
> @@ -630,31 +630,11 @@ static int dw_mipi_csi2rx_disable_streams(struct v4l2_subdev *sd,
>  	return ret;
>  }
>  
> -static int
> -dw_mipi_csi2rx_get_frame_desc(struct v4l2_subdev *sd, unsigned int pad,
> -			      struct v4l2_mbus_frame_desc *fd)
> -{
> -	struct dw_mipi_csi2rx_device *csi2 = to_csi2(sd);
> -	struct v4l2_subdev *remote_sd;
> -	struct media_pad *remote_pad;
> -
> -	remote_pad = media_pad_remote_pad_unique(&csi2->pads[DW_MIPI_CSI2RX_PAD_SINK]);
> -	if (IS_ERR(remote_pad)) {
> -		dev_err(csi2->dev, "can't get remote source pad\n");
> -		return PTR_ERR(remote_pad);
> -	}
> -
> -	remote_sd = media_entity_to_v4l2_subdev(remote_pad->entity);
> -
> -	return v4l2_subdev_call(remote_sd, pad, get_frame_desc,
> -				remote_pad->index, fd);
> -}
> -
>  static const struct v4l2_subdev_pad_ops dw_mipi_csi2rx_pad_ops = {
>  	.enum_mbus_code = dw_mipi_csi2rx_enum_mbus_code,
>  	.get_fmt = v4l2_subdev_get_fmt,
>  	.set_fmt = dw_mipi_csi2rx_set_fmt,
> -	.get_frame_desc = dw_mipi_csi2rx_get_frame_desc,
> +	.get_frame_desc = v4l2_subdev_get_frame_desc_passthrough,
>  	.set_routing = dw_mipi_csi2rx_set_routing,
>  	.enable_streams = dw_mipi_csi2rx_enable_streams,
>  	.disable_streams = dw_mipi_csi2rx_disable_streams,

Reviewed-by: Guoniu Zhou <guoniu.zhou@nxp.com>

-- 
Guoniu Zhou <guoniu.zhou@oss.nxp.com>



* Re: [PATCH v5 6/8] media: nxp: add V4L2 subdev driver for camera parallel interface (CPI)
From: guoniu.zhou @ 2026-06-22  9:01 UTC (permalink / raw)
  To: Frank.Li
  Cc: Sakari Ailus, Mauro Carvalho Chehab, Michael Riesch,
	Laurent Pinchart, Frank Li, Martin Kepplinger-Novakovic,
	Rui Miguel Silva, Purism Kernel Team, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, linux-media, linux-kernel,
	imx, devicetree, linux-arm-kernel, Alice Yuan, Robert Chiras,
	Zhipeng Wang
In-Reply-To: <20260617-imx8qxp_pcam-v5-6-7fa6c8e7fba7@nxp.com>

On Wed, 17 Jun 2026 15:50:16 -0400, Frank.Li@oss.nxp.com <Frank.Li@oss.nxp.com> wrote:
> diff --git a/drivers/media/platform/nxp/imx-parallel-cpi.c b/drivers/media/platform/nxp/imx-parallel-cpi.c
> new file mode 100644
> index 000000000000..00f5d5f47644
> --- /dev/null
> +++ b/drivers/media/platform/nxp/imx-parallel-cpi.c
> @@ -0,0 +1,614 @@
> [ ... skip 245 lines ... ]
> +	}
> +
> +	val = CPI_CTRL_REG1_PIXEL_WIDTH(pixel_width) |
> +	      CPI_CTRL_REG1_VSYNC_PULSE(vsync_pulse);
> +	writel(val, pcpidev->regs + pdata->interface_ctrl_reg1);
> +}

The switch statement result is overwritten.

-- 
Guoniu Zhou <guoniu.zhou@oss.nxp.com>



* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Steven Rostedt @ 2026-06-22  8:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Julia Lawall, Yury Norov, linux-doc, linux-kbuild, linuxppc-dev,
	dri-devel, linux-stm32, linux-arm-kernel, linux-rdma, linux-usb,
	linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260622083440.GX49951@noisy.programming.kicks-ass.net>

On Mon, 22 Jun 2026 10:34:40 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Did you forget your C 101 class? If you use a function, you gotta
> include the relevant header.

If this was the way it was back in 2009, yeah sure. But the header
wasn't needed for 17 years. Now it suddenly will be.

-- Steve



* Re: [PATCH v2 8/8] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
From: Vincent Donnefort @ 2026-06-22  8:49 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Oliver Upton, kvmarm, linux-arm-kernel,
	linux-kernel, Catalin Marinas, Will Deacon, Joey Gouly,
	Steffen Eiden, Suzuki K Poulose, Zenghui Yu, Quentin Perret,
	Sebastian Ene, Hyunwoo Kim
In-Reply-To: <CA+EHjTz13obYHAZYCW+zpH1RB953FseP9koXydeoLqmn6UONHQ@mail.gmail.com>

[...]

> > > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > > index 54aedf93c78b..8963621bcdd1 100644
> > > --- a/arch/arm64/kvm/handle_exit.c
> > > +++ b/arch/arm64/kvm/handle_exit.c
> > > @@ -422,6 +422,20 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
> > >  {
> > >       int handled;
> > >
> > > +     /*
> > > +      * If we run a non-protected VM when protection is enabled
> > > +      * system-wide, resync the state from the hypervisor and mark
> > > +      * it as dirty on the host side if it wasn't dirty already
> > > +      * (which could happen if preemption has taken place).
> > > +      */
> > > +     if (is_protected_kvm_enabled() && !kvm_vm_is_protected(vcpu->kvm)) {
> > > +             guard(preempt)();
> > > +             if (!(vcpu_get_flag(vcpu, PKVM_HOST_STATE_DIRTY))) {
> > > +                     kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
> > > +                     vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> > > +             }
> > > +     }
> > > +
> >
> > Could we remove this update here and let handle_exit_early() do the sync
> > regardless of the SError injection? One of the main points of handle_exit_early()
> > is to do things under !preempt().
> 
> Agreed on the move: handle_exit_early() is already preempt-off, so the
> guard() goes away. Not on every exit though. handle_exit_early() runs
> on every exit, and sync_hyp_vcpu() only copies PC/PSTATE/fault back
> for a non-protected guest; the GPRs and sysregs cross solely via
> __pkvm_vcpu_sync_state. Syncing unconditionally would pull the full
> context back on plain IRQ exits, which is the copy this patch avoids.
> So I will gate it on trap-or-SError and drop the
> handle_trap_exceptions() block.
> 
> >
> >
> > >       /*
> > >        * See ARM ARM B1.14.1: "Hyp traps on instructions
> > >        * that fail their condition code check"
> > > @@ -489,6 +503,22 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
> > >  /* For exit types that need handling before we can be preempted */
> > >  void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
> > >  {
> > > +     bool inject_serror = ARM_SERROR_PENDING(exception_index) ||
> > > +             ARM_EXCEPTION_CODE(exception_index) == ARM_EXCEPTION_EL1_SERROR;
> > > +
> > > +     /*
> > > +      * An SError injected below writes the host ctxt; for a non-protected
> > > +      * guest, sync from the hyp vCPU and keep it dirty so it isn't dropped.
> > > +      */
> > > +     if (is_protected_kvm_enabled()) {
> >
> > Should we test !kvm_vm_is_protected(vcpu->kvm) here, as the
> > PKVM_HOST_STATE_DIRTY is only updated for p-guests everywhere else?
> 
> Yes. The flag is only ever set for non-protected guests, so clearing it
> for a protected one is a no-op, but gating it matches the invariant.
> 
> Both fold into one block in handle_exit_early():
> 
>       if (is_protected_kvm_enabled() && !kvm_vm_is_protected(vcpu->kvm)) {
>               if (inject_serror ||
>                   ARM_EXCEPTION_CODE(exception_index) == ARM_EXCEPTION_TRAP) {
>                       kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
>                       vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
>               } else {
>                       vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
>               }
>       }
> 
> I will fold this into the next respin.

Ah yes of course, I was hoping we could just have a switch here, just like
handle_exit() does, but that's not possible because of ARM_SERROR_PENDING().

Perhaps it would look cleaner if done in a separate function
handle_exit_pkvm_state()?
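
A rough userspace model of what such a helper might look like, based only on
the folded block quoted above. Everything here is an assumption for
illustration: struct vcpu_model, the EXC_* values and the nr_syncs counter
are stand-ins for the real kernel state, flags and the
__pkvm_vcpu_sync_state hypercall, not the actual interfaces.

```c
#include <stdbool.h>

/* Stand-ins for the kernel exception codes; values are illustrative only. */
enum { EXC_IRQ, EXC_EL1_SERROR, EXC_TRAP };

struct vcpu_model {
	bool pkvm_enabled;	/* models is_protected_kvm_enabled() */
	bool vm_protected;	/* models kvm_vm_is_protected(vcpu->kvm) */
	bool host_state_dirty;	/* models PKVM_HOST_STATE_DIRTY */
	unsigned int nr_syncs;	/* counts modelled __pkvm_vcpu_sync_state calls */
};

/* Hypothetical helper: sync on trap or SError, otherwise clear the flag. */
static void handle_exit_pkvm_state(struct vcpu_model *vcpu, int code,
				   bool serror_pending)
{
	bool inject_serror = serror_pending || code == EXC_EL1_SERROR;

	/* Only non-protected guests under pKVM track the dirty flag. */
	if (!vcpu->pkvm_enabled || vcpu->vm_protected)
		return;

	if (inject_serror || code == EXC_TRAP) {
		vcpu->nr_syncs++;		/* pull full state from hyp */
		vcpu->host_state_dirty = true;
	} else {
		vcpu->host_state_dirty = false;	/* e.g. plain IRQ exit */
	}
}
```

In this model a plain IRQ exit never triggers the full state copy, a trap or
a pending SError does, and a protected guest is left untouched.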


> 
> Thanks for the reviews!
> /fuad
> 
> >
> > > +             vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> > > +
> > > +             if (inject_serror && !kvm_vm_is_protected(vcpu->kvm)) {
> > > +                     kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
> > > +                     vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> > > +             }
> > > +     }
> > > +
> > >       if (ARM_SERROR_PENDING(exception_index)) {
> > >               if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
> > >                       u64 disr = kvm_vcpu_get_disr(vcpu);
> >
> > [...]



* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Peter Zijlstra @ 2026-06-22  8:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Julia Lawall, Yury Norov, linux-doc, linux-kbuild, linuxppc-dev,
	dri-devel, linux-stm32, linux-arm-kernel, linux-rdma, linux-usb,
	linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260621093430.264983361@kernel.org>

On Sun, Jun 21, 2026 at 05:34:30AM -0400, Steven Rostedt wrote:
> There's been complaints about trace_printk() being defined in kernel.h as it
> can increase the compilation time. As it is only used by some developers for
> debugging purposes, it should not be in kernel.h causing lots of wasted CPU
> cycles for those that do not ever care about it.
> 
> Instead, add a CONFIG_TRACE_PRINTK_DEBUGGING option that developers that do
> use it can set and not have to always remember to add #include <linux/trace_printk.h>
> to the files they add trace_printk() while debugging. It also means that
> those that do not have that config set will not have to worry about wasted
> CPU cycles as it is only included in the CFLAGS when the option is set, and
> it's completely ignored otherwise.

Did you forget your C 101 class? If you use a function, you gotta
include the relevant header.

You don't see userspace saying: 'Hey, you know what, perhaps we should
add stdio.h to every other header, just in case someone wants to
printf()' either.

I really don't understand your argument. Yes, maybe someone will forget
and then either their editor (if they have a halfway modern setup with
LSP enabled) or their build will complain, but so what? This is all
trivial stuff, surely we have more pressing matters to concern ourselves
with?



* Re: [PATCH] KVM: arm64: account pKVM reclaim against the VM mm
From: Fuad Tabba @ 2026-06-22  8:32 UTC (permalink / raw)
  To: Bradley Morgan
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Steffen Eiden,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	linux-arm-kernel, kvmarm, linux-kernel
In-Reply-To: <20260621213155.6019-1-include@grrlz.net>

On Sun, 21 Jun 2026 at 22:32, Bradley Morgan <include@grrlz.net> wrote:
>
> Protected guest faults charge long term pins to the VM's mm. Teardown
> can run later from file release, where current->mm may be unrelated.
>
> Drop the charge from kvm->mm instead.
>
> Fixes: 4e6e03f9eadd ("KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()")
> Signed-off-by: Bradley Morgan <include@grrlz.net>

Reproduced by creating a protected VM, running the vCPU to fault in a
page, then forking and having the child close the last fd reference.
Without the fix, the parent's VmLck leaks (the reclaim decrements the
child's mm, which is freed on exit). With the fix the parent's VmLck
returns to zero.

One minor observation: account_locked_vm() also passes `current` as
the task pointer to __account_locked_vm(), but on the decrement path
that is only used in the pr_debug log line, so it is technically wrong
but functionally harmless.

Reviewed-by: Fuad Tabba <fuad.tabba@linux.dev>
Tested-by: Fuad Tabba <fuad.tabba@linux.dev>

Cheers,
/fuad

> ---
>  arch/arm64/kvm/pkvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> index 053e4f733e4b..428723b1b0f5 100644
> --- a/arch/arm64/kvm/pkvm.c
> +++ b/arch/arm64/kvm/pkvm.c
> @@ -352,7 +352,7 @@ static int __pkvm_pgtable_stage2_reclaim(struct kvm_pgtable *pgt, u64 start, u64
>                 page = pfn_to_page(mapping->pfn);
>                 WARN_ON_ONCE(mapping->nr_pages != 1);
>                 unpin_user_pages_dirty_lock(&page, 1, true);
> -               account_locked_vm(current->mm, 1, false);
> +               account_locked_vm(kvm->mm, 1, false);
>                 pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
>                 kfree(mapping);
>         }
> --
> 2.53.0
>



* Re: [PATCH RFC 3/3] arm64: Add HOTPLUG_PARALLEL support for secondary CPUs
From: Jinjie Ruan @ 2026-06-22  8:16 UTC (permalink / raw)
  To: Thomas Gleixner, catalin.marinas, will, tsbogend, pjw, palmer,
	aou, alex, mingo, bp, dave.hansen, hpa, peterz, kees, nathan,
	linusw, ojeda, david.kaplan, lukas.bulwahn, ryan.roberts, maz,
	timothy.hayes, lpieralisi, thuth, oupton, yeoreum.yun,
	miko.lenczewski, broonie, kevin.brodsky, james.clark, tabba,
	mrigendra.chaubey, arnd, anshuman.khandual, x86, linux-kernel,
	linux-arm-kernel, linux-mips, linux-riscv
In-Reply-To: <877bnvdf1a.ffs@fw13>



On 6/18/2026 11:49 PM, Thomas Gleixner wrote:
> On Thu, Jun 11 2026 at 21:38, Jinjie Ruan wrote:
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -113,6 +113,7 @@ config ARM64
>>  	select CPUMASK_OFFSTACK if NR_CPUS > 256
>>  	select DCACHE_WORD_ACCESS
>>  	select HAVE_EXTRA_IPI_TRACEPOINTS
>> +	select HOTPLUG_PARALLEL if SMP && HOTPLUG_CPU
> 
> Why do you tie that to HOTPLUG_CPU? HOTPLUG_CPU lets you unplug/plug
> CPUs at runtime, but if it's disabled then an SMP system still has to
> bring up the APs. So why should that fall back to the existing variant?

That's a very good point. Parallel bringup of APs during early boot
should indeed benefit SMP systems even if runtime CPU hotplug
(HOTPLUG_CPU) is disabled. I will decouple this optimization from
HOTPLUG_CPU and tie it strictly to SMP. Thanks for catching this!

> 
>> +#ifdef CONFIG_HOTPLUG_PARALLEL
>> +extern struct secondary_data cpu_boot_data[NR_CPUS];
>> +#endif
>> +
>>  extern struct secondary_data secondary_data;
>>  extern long __early_cpu_boot_status;
>>  extern void secondary_entry(void);
>> @@ -124,7 +128,11 @@ static inline void __noreturn cpu_park_loop(void)
>>  
>>  static inline void update_cpu_boot_status(unsigned int cpu, int val)
>>  {
>> +#ifdef CONFIG_HOTPLUG_PARALLEL
>> +	WRITE_ONCE(cpu_boot_data[cpu].status, val);
>> +#else
>>  	WRITE_ONCE(secondary_data.status, val);
>> +#endif
> 
> You're really a great fan of #ifdefs, right?
> 
> Just convert it over to the parallel mode unconditionally and get rid of
> the existing cruft.

Converting this unconditionally to use cpu_boot_data makes the code so
much cleaner. Thanks for the guidance!
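
For reference, a minimal standalone sketch of how the helper could look once
the #ifdef is gone. This is an illustration and not the v2 patch: NR_CPUS is
fixed to a small value, struct secondary_data is reduced to the status field,
and WRITE_ONCE() is a simplified stand-in for the kernel macro that relies on
the GCC/Clang __typeof__ extension.

```c
#define NR_CPUS 4	/* illustrative; the kernel takes this from Kconfig */

/* Simplified stand-in for the kernel's WRITE_ONCE(): one volatile store. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Reduced to the one field this example needs. */
struct secondary_data {
	long status;
};

/* One slot per logical CPU, as with the quoted cpu_boot_data[] array. */
static struct secondary_data cpu_boot_data[NR_CPUS];

static inline void update_cpu_boot_status(unsigned int cpu, int val)
{
	/* Always the per-CPU slot, no CONFIG_HOTPLUG_PARALLEL branch. */
	WRITE_ONCE(cpu_boot_data[cpu].status, val);
}
```

Each CPU writes only its own slot here, so concurrent bringup does not need
to serialise on a single global status word.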

> 
>>  	/*
>>  	 * TTBR0 is only used for the identity mapping at this stage. Make it
>>  	 * point to zero page to avoid speculatively fetching new entries.
>> @@ -254,7 +276,9 @@ asmlinkage notrace void secondary_start_kernel(void)
>>  					 read_cpuid_id());
>>  	update_cpu_boot_status(cpu, CPU_BOOT_SUCCESS);
>>  	set_cpu_online(cpu, true);
>> +#ifndef CONFIG_HOTPLUG_PARALLEL
>>  	complete(&cpu_running);
>> +#endif
> 
> Just for the record. You can get rid of this completion w/o PARALLEL
> hotplug by selecting HOTPLUG_SPLIT_STARTUP and implementing the
> kick/sync parts.

I will look into selecting HOTPLUG_SPLIT_STARTUP and cleaning up this
completion mechanism as a prerequisite cleanup patch. For now, I
will make sure to eliminate the ugly #ifndef as suggested earlier.

>   
> Thanks,
> 
>         tglx
> 




* Re: [PATCH] drm/mxsfb/lcdif: don't hide lcdif_attach_bridge() deferral messages
From: Liu Ying @ 2026-06-22  8:13 UTC (permalink / raw)
  To: Luca Ceresoli
  Cc: Marek Vasut, Stefan Agner, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Frank Li,
	Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam, Hui Pu,
	Ian Ray, Thomas Petazzoni, dri-devel, imx, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20260619-drm-lcdif-deferral-msg-v1-1-ce2392dca985@bootlin.com>

On Fri, Jun 19, 2026 at 09:02:13AM +0200, Luca Ceresoli wrote:
> lcdif_attach_bridge() uses dev_err_probe() on all its error returns to
> store a specific deferral message.
> 
> However its caller lcdif_load() calls dev_err_probe() again on error,
> overwriting the specific deferral messages with a unique, unavoidably
> generic, message.
> 
> Make the specific deferral message visible by using a plain 'return ret' on
> the caller.
> 
> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
> ---
>  drivers/gpu/drm/mxsfb/lcdif_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: Liu Ying <victor.liu@nxp.com>



* Re: [PATCH RFC 3/3] arm64: Add HOTPLUG_PARALLEL support for secondary CPUs
From: Jinjie Ruan @ 2026-06-22  8:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: Michael Kelley, catalin.marinas@arm.com,
	tsbogend@alpha.franken.de, pjw@kernel.org, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, alex@ghiti.fr, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, peterz@infradead.org, kees@kernel.org,
	nathan@kernel.org, linusw@kernel.org, ojeda@kernel.org,
	david.kaplan@amd.com, lukas.bulwahn@redhat.com,
	ryan.roberts@arm.com, maz@kernel.org, timothy.hayes@arm.com,
	lpieralisi@kernel.org, thuth@redhat.com, oupton@kernel.org,
	yeoreum.yun@arm.com, miko.lenczewski@arm.com, broonie@kernel.org,
	kevin.brodsky@arm.com, james.clark@linaro.org, tabba@google.com,
	mrigendra.chaubey@gmail.com, arnd@arndb.de,
	anshuman.khandual@arm.com, x86@kernel.org,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-mips@vger.kernel.org,
	linux-riscv@lists.infradead.org
In-Reply-To: <ajPitENEHWa8lDfC@willie-the-truck>



On 6/18/2026 8:21 PM, Will Deacon wrote:
> Hi Jinjie,
> 
> On Mon, Jun 15, 2026 at 04:51:48PM +0800, Jinjie Ruan wrote:
>> On 6/12/2026 11:45 PM, Michael Kelley wrote:
>>> From: Jinjie Ruan <ruanjinjie@huawei.com> Sent: Thursday, June 11, 2026 6:38 AM
>>>>
>>>> Support for parallel secondary CPU bringup is already utilized by x86,
>>>> MIPS, and RISC-V. This patch brings this capability to the arm64
>>>> architecture.
>>>>
>>>> Rework the global `secondary_data` accessed during early boot into
>>>> a per-CPU array. This array maps logical CPU IDs to MPIDR_EL1 values,
>>>> enabling the early boot code in head.S to resolve each secondary CPU's
>>>> logical ID concurrently.
>>>>
>>>> To fully enable HOTPLUG_PARALLEL, this patch implements:
>>>> 1) An arm64-specific arch_cpuhp_kick_ap_alive() handler.
>>>> 2) Callbacks to cpuhp_ap_sync_alive() inside secondary_start_kernel().
>>>>
>>>> Successfully tested on QEMU ARM64 virt machine (KVM on, 128 vCPUs).
>>>>
>>>> |     test kernel	   | secondary CPUs boot time |
>>>> |  ---------------------   |	--------------------  |
>>>> |   Without this patch     |		155.672	      |
>>>> |   cpuhp.parallel=0	   |		62.897	      |
>>>> |   cpuhp.parallel=1	   |		166.703	      |
>>>
>>> The last two rows seem mixed up. I would expect parallel=0 to
>>> result in a longer boot time.
>>
>> Hi, Michael,
>>
>> The results are correct and not mixed up.
>>
>> Compared to the original non‑HOTPLUG_PARALLEL approach, the advantage of
>> cpuhp.parallel=0 lies in its use of cpu_relax(`yield` on arm64) instead
>> of the wait_for_completion_timeout() mechanism (which may cause sleep
>> and context switching). This significantly reduces the overhead of VM
>> exits and context switches in a KVM guest, thereby cutting the secondary
>> CPU boot time by more than half.
> 
> I don't think that's a particularly compelling reason to enable this for
> arm64, in all honesty. The yield instruction typically doesn't do
> anything on actual arm64 silicon, so this probably means that you're
> introducing busy-loops which tend to be bad for power and scalability.

After updating the implementation in v2, the performance gains are
primarily observed on actual hardware.

> 
> I implemented this a while ago [1] but didn't manage to see much in terms
> of performance improvement and so I didn't bother to send the patches out

As shown in v2 below, on actual hardware, this results in a 40%–60%
reduction in boot time.

Bringup Time Comparison (ms, lower is better):

| Platform                | Baseline | P=0     | P=1    | Delta(%) |
| ----------------------- | -------- | ------- | ------ | -------- |
| 64-core ATF QEMU        | 2075.8   | 2080.7  | 1653.4 | 20.34%   |
| 192-core server (HIP12) | 14619.2  | 14619.1 | 8589.4 | 41.21%   |
| 32-core board           | 2776.5   | 2881.0  | 1045.0 | 62.36%   |

Link:
https://lore.kernel.org/all/20260618092444.1316336-5-ruanjinjie@huawei.com/

> after talking about it at KVM forum [2]. However, as mentioned at the end
> of that talk, it _is_ still useful for confidential VMs using PSCI so
> let me dust off my old series and send it out to see what you think.
> 
> It relies on PSCI v0.2, which means we don't need the NR_CPUS size array
> for secondary_data and I also have some support for error handling (it
> doesn't look like you handle __early_cpu_boot_status properly).

I need some time to look closely at your patch. Alternatively, I will
integrate your changes, re-test everything on actual hardware, and then
send out a revised version.

> 
> It looks like I could include your first patch, though!

Thank you very much.

> 
> Will
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=cpu-hotplug

It seems that the following patch, which removes
`rcutree_report_cpu_starting()`, will reintroduce the original issue that
commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier")
solved.

Link:
https://web.git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/commit/?h=cpu-hotplug&id=bba4b62f45f2614bf6085e6cd3f233528f85bf26

Indeed, I also noticed that the invocation order of
rcutree_report_cpu_starting() on arm64 is somewhat suboptimal. It
hinders the implementation of parallel bringup on arm64 and could
potentially lead to RCU stalls.

Link:
https://lore.kernel.org/all/20260618092444.1316336-4-ruanjinjie@huawei.com/

[    0.329017] smp: Bringing up secondary CPUs ...
[    0.343628] Detected VIPT I-cache on CPU1
[    0.343788]
[    0.343806] =============================
[    0.343816] WARNING: suspicious RCU usage
[    0.343966] 7.1.0-rc1-g27c1871848a2 #109 Not tainted
[    0.344087] -----------------------------
[    0.344098] kernel/locking/lockdep.c:3801 RCU-list traversed in non-reader section!!
[    0.344112]
[    0.344112] other info that might help us debug this:
[    0.344112]
[    0.344135]
[    0.344135] RCU used illegally from offline CPU!
[    0.344135] rcu_scheduler_active = 1, debug_locks = 1
[    0.344174] no locks held by swapper/1/0.
[    0.344204]
[    0.344204] stack backtrace:
[    0.344611] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 7.1.0-rc1-g27c1871848a2 #109 PREEMPT
[    0.344707] Hardware name: linux,dummy-virt (DT)
[    0.345267] Call trace:
[    0.345436]  show_stack+0x18/0x24 (C)
[    0.345593]  dump_stack_lvl+0x90/0xd0
[    0.345620]  dump_stack+0x18/0x24
[    0.345639]  lockdep_rcu_suspicious+0x170/0x234
[    0.345665]  __lock_acquire+0xdd4/0x2078
[    0.345688]  lock_acquire+0x1c4/0x3f0
[    0.345711]  _raw_spin_lock_irqsave+0x60/0x88
[    0.345736]  down_trylock+0x18/0x48
[    0.345758]  __down_trylock_console_sem+0x38/0xc4
[    0.345782]  vprintk_emit+0x23c/0x3d0
[    0.345802]  vprintk_default+0x38/0x44
[    0.345822]  vprintk+0x28/0x34
[    0.345841]  _printk+0x5c/0x84
[    0.345864]  cpuinfo_store_cpu+0x174/0x298
[    0.345884]  secondary_start_kernel+0xbc/0x150
[    0.345905]  __secondary_switched+0xc0/0xc4
[    0.350307] GICv3: CPU1: found redistributor 1 region 0:0x00000000080c0000
[    0.350523] GICv3: CPU1: using allocated LPI pending table @0x00000001042f0000
[    0.351303] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[    0.387425] Detected VIPT I-cache on CPU2


> [2] https://www.youtube.com/watch?v=Q6kOshnnQuE
> 



^ permalink raw reply

* [PATCH v7 22/22] TEST(do-not-upstream): fake qemu vendor JSON + mapfile entry for CounterIDMask path
From: Atish Patra @ 2026-06-22  8:04 UTC (permalink / raw)
  To: Jiri Olsa, James Clark, Mark Rutland, Will Deacon,
	Arnaldo Carvalho de Melo, Rob Herring, Ian Rogers,
	Krzysztof Kozlowski, Anup Patel, Paul Walmsley, Atish Patra,
	Namhyung Kim
  Cc: devicetree, linux-perf-users, Conor Dooley, linux-arm-kernel,
	linux-kernel, linux-riscv
In-Reply-To: <20260622-counter_delegation-v7-0-0ba2fd34614e@meta.com>

From: Atish Patra <atishp@meta.com>

arch/riscv/qemu/virt/events.json: fake-json-{any,ctr3,ctr34,ctr6} with EventCode
+ CounterIDMask; mapfile.csv: 0x0-0x0-0x0 -> qemu/virt. Exercises jevents
CounterIDMask -> counterid_mask= -> config2 -> cdeleg counter allocation.
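
As an illustration of how the CounterIDMask values in this test file read, here
is a minimal Python sketch that expands a mask into hpmcounter indices. The
helper name counters_from_mask is hypothetical and not part of the patch; the
mask values are copied from the events.json hunk below, and bit N is assumed to
stand for hpmcounterN, as the BriefDescription strings suggest.

```python
def counters_from_mask(mask: int) -> list[int]:
    """Return the counter indices whose bits are set in a 32-bit mask."""
    return [idx for idx in range(32) if mask & (1 << idx)]


# Masks copied from the fake qemu/virt events in this patch.
FAKE_EVENT_MASKS = {
    "fake-json-any": 0xFFFFFFF8,
    "fake-json-ctr3": 0x8,
    "fake-json-ctr34": 0x18,
    "fake-json-ctr6": 0x40,
}

if __name__ == "__main__":
    for name, mask in FAKE_EVENT_MASKS.items():
        print(f"{name}: {counters_from_mask(mask)}")
```

Under that assumption, 0x18 selects counters 3 and 4, and 0xFFFFFFF8 selects
every counter from 3 to 31.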

Signed-off-by: Atish Patra <atishp@meta.com>
---
 tools/perf/pmu-events/arch/riscv/mapfile.csv       |  1 +
 .../pmu-events/arch/riscv/qemu/virt/events.json    | 26 ++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/tools/perf/pmu-events/arch/riscv/mapfile.csv b/tools/perf/pmu-events/arch/riscv/mapfile.csv
index 87cfb0e0849f..3533a8c0253f 100644
--- a/tools/perf/pmu-events/arch/riscv/mapfile.csv
+++ b/tools/perf/pmu-events/arch/riscv/mapfile.csv
@@ -24,3 +24,4 @@
 0x602-0x3-0x0,v1,openhwgroup/cva6,core
 0x67e-0x80000000db0000[89]0-0x[[:xdigit:]]+,v1,starfive/dubhe-80,core
 0x31e-0x8000000000008a45-0x[[:xdigit:]]+,v1,andes/ax45,core
+0x0-0x0-0x0,v1,qemu/virt,core
diff --git a/tools/perf/pmu-events/arch/riscv/qemu/virt/events.json b/tools/perf/pmu-events/arch/riscv/qemu/virt/events.json
new file mode 100644
index 000000000000..294c4ed645f6
--- /dev/null
+++ b/tools/perf/pmu-events/arch/riscv/qemu/virt/events.json
@@ -0,0 +1,26 @@
+[
+  {
+    "EventName": "fake-json-any",
+    "EventCode": "0xF10",
+    "CounterIDMask": "0xFFFFFFF8",
+    "BriefDescription": "FAKE json event (any hpmcounter 3-31) - QEMU does not model 0xF10"
+  },
+  {
+    "EventName": "fake-json-ctr3",
+    "EventCode": "0xF11",
+    "CounterIDMask": "0x8",
+    "BriefDescription": "FAKE json event constrained to hpmcounter3"
+  },
+  {
+    "EventName": "fake-json-ctr34",
+    "EventCode": "0xF12",
+    "CounterIDMask": "0x18",
+    "BriefDescription": "FAKE json event constrained to hpmcounter3,4"
+  },
+  {
+    "EventName": "fake-json-ctr6",
+    "EventCode": "0xF13",
+    "CounterIDMask": "0x40",
+    "BriefDescription": "FAKE json event constrained to hpmcounter6 (out of a small pmu-mask)"
+  }
+]

-- 
2.53.0-Meta



^ permalink raw reply related

* [PATCH v7 21/22] TEST(do-not-upstream): fake qemu-virt PMU events for cdeleg counter-mask testing
From: Atish Patra @ 2026-06-22  8:04 UTC (permalink / raw)
  To: Jiri Olsa, James Clark, Mark Rutland, Will Deacon,
	Arnaldo Carvalho de Melo, Rob Herring, Ian Rogers,
	Krzysztof Kozlowski, Anup Patel, Paul Walmsley, Atish Patra,
	Namhyung Kim
  Cc: devicetree, linux-perf-users, Conor Dooley, linux-arm-kernel,
	linux-kernel, linux-riscv
In-Reply-To: <20260622-counter_delegation-v7-0-0ba2fd34614e@meta.com>

From: Atish Patra <atishp@meta.com>

Adds fake-any/fake-ctr3/fake-ctr34 (event codes 0xF0x QEMU doesn't model) with
counterid_masks, to exercise the counter-delegation allocation + counter-mask
constraint in QEMU (events read 0 = allocated/programmed, vs 'not supported').

Signed-off-by: Atish Patra <atishp@meta.com>
---
 drivers/perf/riscv_pmu_sbi.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 3cb7a1f4035e..13a9f1fe4293 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -492,6 +492,12 @@ RVPMU_EVENT_CMASK_ATTR(instructions, instructions, 0x02, 0xFFFFFFF8);
 RVPMU_EVENT_CMASK_ATTR(dTLB-load-misses, dTLB_load_miss, 0x10019, 0xFFFFFFF8);
 RVPMU_EVENT_CMASK_ATTR(dTLB-store-misses, dTLB_store_miss, 0x1001B, 0xFFFFFFF8);
 RVPMU_EVENT_CMASK_ATTR(iTLB-load-misses, iTLB_load_miss, 0x10021, 0xFFFFFFF8);
+/*
+ * FAKE events for cdeleg mechanism testing: event codes QEMU does NOT model.
+ */
+RVPMU_EVENT_CMASK_ATTR(fake-any, fake_any, 0xF00, 0xFFFFFFF8);
+RVPMU_EVENT_CMASK_ATTR(fake-ctr3, fake_ctr3, 0xF01, 0x8);
+RVPMU_EVENT_CMASK_ATTR(fake-ctr34, fake_ctr34, 0xF02, 0x18);
 
 static struct attribute *qemu_virt_event_group[] = {
 	RVPMU_EVENT_ATTR_PTR(cycles),
@@ -499,6 +505,9 @@ static struct attribute *qemu_virt_event_group[] = {
 	RVPMU_EVENT_ATTR_PTR(dTLB_load_miss),
 	RVPMU_EVENT_ATTR_PTR(dTLB_store_miss),
 	RVPMU_EVENT_ATTR_PTR(iTLB_load_miss),
+	RVPMU_EVENT_ATTR_PTR(fake_any),
+	RVPMU_EVENT_ATTR_PTR(fake_ctr3),
+	RVPMU_EVENT_ATTR_PTR(fake_ctr34),
 	NULL,
 };
 

-- 
2.53.0-Meta



^ permalink raw reply related

* [PATCH v7 19/22] tools/perf: Support event code for arch standard events
From: Atish Patra @ 2026-06-22  8:04 UTC (permalink / raw)
  To: Jiri Olsa, James Clark, Mark Rutland, Will Deacon,
	Arnaldo Carvalho de Melo, Rob Herring, Ian Rogers,
	Krzysztof Kozlowski, Anup Patel, Paul Walmsley, Atish Patra,
	Namhyung Kim
  Cc: devicetree, linux-perf-users, Conor Dooley, linux-arm-kernel,
	linux-kernel, linux-riscv
In-Reply-To: <20260622-counter_delegation-v7-0-0ba2fd34614e@meta.com>

From: Atish Patra <atishp@rivosinc.com>

RISC-V relies on the event encoding from the json file. That includes
arch standard events. If event code is present, event is already updated
with correct encoding. No need to update it again which results in losing
the event encoding.
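
The intended behaviour can be sketched in isolation as follows. This is a
simplified model, not the jevents.py code: ARCH_STD_EVENTS, its encoding string
and resolve_event() are hypothetical stand-ins for _arch_std_events and the
logic inside JsonEvent.

```python
from typing import Optional

# Hypothetical stand-in for jevents.py's table of arch standard events.
ARCH_STD_EVENTS = {"cycles": "event=0x1"}  # illustrative encoding only


def resolve_event(event: str, eventcode: Optional[str],
                  arch_std: Optional[str]) -> str:
    """Keep the JSON-provided encoding when an event code was given."""
    if arch_std and arch_std.lower() in ARCH_STD_EVENTS:
        # Only fall back to the arch standard encoding when the JSON
        # entry did not carry its own event code.
        if not eventcode:
            event = ARCH_STD_EVENTS[arch_std.lower()]
    return event


if __name__ == "__main__":
    print(resolve_event("event=0xf10", "0xF10", "cycles"))
    print(resolve_event("event=0", None, "cycles"))
```

With an event code present the first call keeps the JSON encoding; without one
the second call takes the arch standard entry.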

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 tools/perf/pmu-events/arch/riscv/arch-standard.json | 10 ++++++++++
 tools/perf/pmu-events/jevents.py                    |  6 +++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/arch/riscv/arch-standard.json b/tools/perf/pmu-events/arch/riscv/arch-standard.json
new file mode 100644
index 000000000000..96e21f088558
--- /dev/null
+++ b/tools/perf/pmu-events/arch/riscv/arch-standard.json
@@ -0,0 +1,10 @@
+[
+  {
+    "EventName": "cycles",
+    "BriefDescription": "cycle executed"
+  },
+  {
+    "EventName": "instructions",
+    "BriefDescription": "instruction retired"
+  }
+]
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 3a1bcdcdc685..457fce7a5982 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -413,7 +413,11 @@ class JsonEvent:
         self.long_desc = None
     if arch_std:
       if arch_std.lower() in _arch_std_events:
-        event = _arch_std_events[arch_std.lower()].event
+        # If the JSON event already specified an event code, the encoding has
+        # been set above; don't overwrite it with the arch standard event or
+        # the event encoding would be lost.
+        if not eventcode:
+          event = _arch_std_events[arch_std.lower()].event
         # Copy from the architecture standard event to self for undefined fields.
         for attr, value in _arch_std_events[arch_std.lower()].__dict__.items():
           if hasattr(self, attr) and not getattr(self, attr):

-- 
2.53.0-Meta



^ permalink raw reply related

* [PATCH v7 16/22] RISC-V: perf: Use config2/vendor table for event to counter mapping
From: Atish Patra @ 2026-06-22  8:04 UTC (permalink / raw)
  To: Jiri Olsa, James Clark, Mark Rutland, Will Deacon,
	Arnaldo Carvalho de Melo, Rob Herring, Ian Rogers,
	Krzysztof Kozlowski, Anup Patel, Paul Walmsley, Atish Patra,
	Namhyung Kim
  Cc: devicetree, linux-perf-users, Conor Dooley, linux-arm-kernel,
	linux-kernel, linux-riscv
In-Reply-To: <20260622-counter_delegation-v7-0-0ba2fd34614e@meta.com>

From: Atish Patra <atishp@rivosinc.com>

The counter restriction specified in the json file is passed to
the drivers via the config2 parameter in perf attributes. This allows
any platform vendor to define their custom mapping between event and
hpmcounters without any rules defined in the ISA.
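
A rough model of the resulting hpmcounter selection, written in Python for
brevity, is shown here. It mirrors the mask arithmetic in
get_deleg_next_hpm_hw_idx() from the diff below, but pick_hpm_counter() and its
parameters are illustrative names, and returning None stands in for the
driver's -EINVAL.

```python
from typing import Optional

FIXED_CTR_BITS = 0x7  # indices 0-2 are excluded, as in the driver's ~0x7


def pick_hpm_counter(cmask: int, used: int,
                     event_mask: int = 0) -> Optional[int]:
    """Return the lowest free, delegated hpmcounter allowed for the event."""
    candidates = cmask & ~FIXED_CTR_BITS & ~used
    # A zero event mask means "no restriction", matching the driver's
    # handling of an unset config2.
    if event_mask:
        candidates &= event_mask
    if not candidates:
        return None
    # Index of the lowest set bit, the equivalent of __ffs().
    return (candidates & -candidates).bit_length() - 1


if __name__ == "__main__":
    print(pick_hpm_counter(0xFFFFFFFF, used=0x0, event_mask=0x18))
    print(pick_hpm_counter(0xFFFFFFFF, used=0x8, event_mask=0x18))
```

With mask 0x18 the event lands on counter 3 when it is free and on counter 4
once counter 3 is in use.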

For legacy events, the platform vendor may define the mapping in
the driver in the vendor event table.
The cycle and instruction counters are fixed (0 and 2
respectively) by the ISA and map to the legacy events. The platform
vendor must specify this in the driver if intended to be used while
profiling. Otherwise, they can just specify the alternate hpmcounters
that may monitor and/or sample the cycle/instruction counts.
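
For the non-guest path, the fixed-counter choice can be sketched as below. The
two mask constants are taken from the riscv_pmu.h hunk in this patch;
pick_fixed_counter() itself is a hypothetical Python model of
get_deleg_fixed_hw_idx(), with None standing in for -EINVAL and the guest-event
branch left out.

```python
from typing import Optional

CYCLE_FIXED_CTR_MASK = 0x01        # RISCV_PMU_CYCLE_FIXED_CTR_MASK
INSTRUCTION_FIXED_CTR_MASK = 0x04  # RISCV_PMU_INSTRUCTION_FIXED_CTR_MASK


def pick_fixed_counter(config2: int, cmask: int, used: int) -> Optional[int]:
    """Return 0 (cycle) or 2 (instret) if requested, delegated and free."""
    if config2 & CYCLE_FIXED_CTR_MASK:
        idx = 0
    elif config2 & INSTRUCTION_FIXED_CTR_MASK:
        idx = 2
    else:
        return None
    # Take the fixed counter only if it is delegated and not already in use;
    # the caller then falls back to a programmable counter.
    if not (cmask & (1 << idx)) or (used & (1 << idx)):
        return None
    return idx


if __name__ == "__main__":
    print(pick_fixed_counter(0x04, cmask=0xFFFFFFFF, used=0x0))
    print(pick_fixed_counter(0x04, cmask=0xFFFFFFFF, used=0x4))
```

The second call returns None because the instret counter is already taken,
which is the case where the driver goes on to look for an hpmcounter.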

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 drivers/perf/riscv_pmu_sbi.c   | 95 +++++++++++++++++++++++++++++++++++-------
 include/linux/perf/riscv_pmu.h |  2 +
 2 files changed, 81 insertions(+), 16 deletions(-)

diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 4f3a30143db1..1c846cdc96cf 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -77,6 +77,7 @@ static ssize_t __maybe_unused rvpmu_format_show(struct device *dev, struct devic
 	RVPMU_ATTR_ENTRY(_name, rvpmu_format_show, (char *)_config)
 
 PMU_FORMAT_ATTR(firmware, "config:62-63");
+PMU_FORMAT_ATTR(counterid_mask, "config2:0-31");
 
 static bool sbi_v2_available;
 static bool sbi_v3_available;
@@ -121,6 +122,7 @@ static const struct attribute_group *riscv_sbi_pmu_attr_groups[] = {
 static struct attribute *riscv_cdeleg_pmu_formats_attr[] = {
 	RVPMU_FORMAT_ATTR_ENTRY(event, RVPMU_CDELEG_PMU_FORMAT_ATTR),
 	&format_attr_firmware.attr,
+	&format_attr_counterid_mask.attr,
 	NULL,
 };
 
@@ -1501,24 +1503,85 @@ static int rvpmu_deleg_find_ctrs(void)
 	return num_hw_ctr;
 }
 
+/*
+ * The json file must correctly specify counter 0 or counter 2 is available
+ * in the counter lists for cycle/instret events. Otherwise, the drivers have
+ * no way to figure out if a fixed counter must be used and pick a programmable
+ * counter if available.
+ */
 static int get_deleg_fixed_hw_idx(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
-	return -EINVAL;
+	bool guest_events = event->attr.config1 & RISCV_PMU_CONFIG1_GUEST_EVENTS;
+	int idx;
+
+	/* event_base is 0 on the delegation path; match via the original perf attrs. */
+	if (guest_events) {
+		if (event->attr.type != PERF_TYPE_HARDWARE)
+			return -EINVAL;
+		if (event->attr.config == PERF_COUNT_HW_CPU_CYCLES)
+			idx = 0; /* CY counter */
+		else if (event->attr.config == PERF_COUNT_HW_INSTRUCTIONS)
+			idx = 2; /* IR counter */
+		else
+			return -EINVAL;
+	} else if (event->attr.config2 & RISCV_PMU_CYCLE_FIXED_CTR_MASK) {
+		idx = 0; /* CY counter */
+	} else if (event->attr.config2 & RISCV_PMU_INSTRUCTION_FIXED_CTR_MASK) {
+		idx = 2; /* IR counter */
+	} else {
+		return -EINVAL;
+	}
+
+	/* Take the fixed counter only if delegated and free, else fall back. */
+	if (!(cmask & BIT(idx)) || test_bit(idx, cpuc->used_hw_ctrs))
+		return -EINVAL;
+
+	return idx;
 }
 
 static int get_deleg_next_hpm_hw_idx(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
-	unsigned long hw_ctr_mask = 0;
+	u32 hw_ctr_mask = 0, temp_mask = 0;
+	u32 type = event->attr.type;
+	u64 config = event->attr.config;
+	int ret;
 
-	/*
-	 * TODO: Treat every hpmcounter can monitor every event for now.
-	 * The event to counter mapping should come from the json file.
-	 * The mapping should also tell if sampling is supported or not.
-	 */
+	/* Select only available hpmcounters */
+	hw_ctr_mask = cmask & (~0x7) & ~(cpuc->used_hw_ctrs[0]);
+
+	switch (type) {
+	case PERF_TYPE_HARDWARE:
+		temp_mask = current_pmu_hw_event_map[config].counter_mask;
+		break;
+	case PERF_TYPE_HW_CACHE:
+		ret = cdeleg_pmu_event_find_cache(config, NULL, &temp_mask);
+		if (ret)
+			return ret;
+		break;
+	case PERF_TYPE_RAW:
+		/*
+		 * Mask off the counters that can't monitor this event (specified via json)
+		 * The counter mask for this event is set in config2 via the property 'Counter'
+		 * in the json file or manual configuration of config2. If the config2 is not set,
+		 * it is assumed all the available hpmcounters can monitor this event.
+		 * Note: This assumption may fail for virtualization use case where the hypervisor
+		 * (e.g. KVM) virtualizes the counter. Any event to counter mapping provided by the
+		 * guest is meaningless from a hypervisor perspective. Thus, the hypervisor doesn't
+		 * set config2 when creating kernel counter and relies on default host mapping.
+		 */
+		if (event->attr.config2)
+			temp_mask = event->attr.config2;
+		break;
+	default:
+		break;
+	}
+
+	if (temp_mask)
+		hw_ctr_mask &= temp_mask;
+
+	if (!hw_ctr_mask)
+		return -EINVAL;
 
-	/* Select only hpmcounters */
-	hw_ctr_mask = cmask & (~0x7);
-	hw_ctr_mask &= ~(cpuc->used_hw_ctrs[0]);
 	return __ffs(hw_ctr_mask);
 }
 
@@ -1547,10 +1610,6 @@ static int rvpmu_deleg_ctr_get_idx(struct perf_event *event)
 	u64 priv_filter;
 	int idx;
 
-	/*
-	 * TODO: We should not rely on SBI Perf encoding to check if the event
-	 * is a fixed one or not.
-	 */
 	if (!is_sampling_event(event)) {
 		idx = get_deleg_fixed_hw_idx(cpuc, event);
 		if (idx == 0 || idx == 2) {
@@ -1570,10 +1629,14 @@ static int rvpmu_deleg_ctr_get_idx(struct perf_event *event)
 		goto out_err;
 found_idx:
 	priv_filter = get_deleg_priv_filter_bits(event);
+	if (test_and_set_bit(idx, cpuc->used_hw_ctrs))
+		goto out_err;
 	update_deleg_hpmevent(idx, hwc->config, priv_filter);
+	return idx;
 skip_update:
-	if (!test_and_set_bit(idx, cpuc->used_hw_ctrs))
-		return idx;
+	if (test_and_set_bit(idx, cpuc->used_hw_ctrs))
+		goto out_err;
+	return idx;
 out_err:
 	return -ENOENT;
 }
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index 3c64151cb038..b23b71cb4e66 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -30,6 +30,8 @@
 #define RISCV_PMU_CONFIG1_GUEST_EVENTS 0x1
 
 #define RISCV_PMU_DELEG_RAW_EVENT_MASK GENMASK_ULL(55, 0)
+#define RISCV_PMU_CYCLE_FIXED_CTR_MASK 0x01
+#define RISCV_PMU_INSTRUCTION_FIXED_CTR_MASK 0x04
 
 struct cpu_hw_events {
 	/* currently enabled events */

-- 
2.53.0-Meta



^ permalink raw reply related

* Re: [PATCH net] net: ti: icssg-prueth: fix XDP_TX from the AF_XDP zero-copy RX path
From: Meghana Malladi @ 2026-06-22  8:05 UTC (permalink / raw)
  To: David Carlier, danishanwar, rogerq, andrew+netdev, netdev
  Cc: davem, edumazet, kuba, pabeni, horms, hawk, john.fastabend, sdf,
	ast, daniel, bpf, linux-arm-kernel, linux-kernel, stable
In-Reply-To: <20260620213756.87499-1-devnexen@gmail.com>

Hi David,

Thanks for the fix.

On 6/21/26 03:07, David Carlier wrote:
> On XDP_TX from the zero-copy RX path, emac_run_xdp() converts the xsk
> buffer via xdp_convert_zc_to_xdp_frame(), which clones the data into a
> fresh MEM_TYPE_PAGE_ORDER0 page that is not DMA mapped. Transmitting it
> as PRUETH_TX_BUFF_TYPE_XDP_TX derives the DMA address with
> page_pool_get_dma_addr(), reading an uninitialized page->dma_addr, so
> the device DMAs from a bogus address (corrupt TX, or an IOMMU fault).
> 
> Pick the TX buffer type from the frame's memory type: keep
> PRUETH_TX_BUFF_TYPE_XDP_TX for page_pool frames and use
> PRUETH_TX_BUFF_TYPE_XDP_NDO for the cloned zero-copy frame. The
> completion path already unmaps PRUETH_SWDATA_XDPF buffers.
> 

Is it safe to unconditionally unmap the buffer for the case where the
TX buffer type is PRUETH_TX_BUFF_TYPE_XDP_TX? In this case the DMA
mapping is done with rx_chn->dma_dev, whereas in the completion path we
are unmapping with tx_chn->dma_dev unconditionally.

> Fixes: 7a64bb388df3 ("net: ti: icssg-prueth: Add AF_XDP zero copy for RX")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
>   drivers/net/ethernet/ti/icssg/icssg_common.c | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
> index 82ddef9c17d5..302e700ea17d 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_common.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
> @@ -804,6 +804,7 @@ EXPORT_SYMBOL_GPL(emac_xmit_xdp_frame);
>    */
>   static u32 emac_run_xdp(struct prueth_emac *emac, struct xdp_buff *xdp, u32 *len)
>   {
> +	enum prueth_tx_buff_type tx_buff_type;
>   	struct net_device *ndev = emac->ndev;
>   	struct netdev_queue *netif_txq;
>   	int cpu = smp_processor_id();
> @@ -826,11 +827,21 @@ static u32 emac_run_xdp(struct prueth_emac *emac, struct xdp_buff *xdp, u32 *len
>   			goto drop;
>   		}
>   
> +		/* In AF_XDP zero-copy mode xdp_convert_buff_to_frame()
> +		 * clones the xsk buffer into a fresh MEM_TYPE_PAGE_ORDER0
> +		 * page that is not DMA mapped. Such a frame must be mapped
> +		 * via the NDO path; only a page pool-backed frame already
> +		 * carries a usable page_pool DMA address.
> +		 */
> +		tx_buff_type = xdpf->mem_type == MEM_TYPE_PAGE_POOL ?
> +				PRUETH_TX_BUFF_TYPE_XDP_TX :
> +				PRUETH_TX_BUFF_TYPE_XDP_NDO;
> +
>   		q_idx = cpu % emac->tx_ch_num;
>   		netif_txq = netdev_get_tx_queue(ndev, q_idx);
>   		__netif_tx_lock(netif_txq, cpu);
>   		result = emac_xmit_xdp_frame(emac, xdpf, q_idx,
> -					     PRUETH_TX_BUFF_TYPE_XDP_TX);
> +					     tx_buff_type);
>   		__netif_tx_unlock(netif_txq);
>   		if (result == ICSSG_XDP_CONSUMED) {
>   			ndev->stats.tx_dropped++;


^ permalink raw reply

* [PATCH v7 20/22] tools/perf: Add RISC-V CounterIDMask event field
From: Atish Patra @ 2026-06-22  8:04 UTC (permalink / raw)
  To: Jiri Olsa, James Clark, Mark Rutland, Will Deacon,
	Arnaldo Carvalho de Melo, Rob Herring, Ian Rogers,
	Krzysztof Kozlowski, Anup Patel, Paul Walmsley, Atish Patra,
	Namhyung Kim
  Cc: devicetree, linux-perf-users, Conor Dooley, linux-arm-kernel,
	linux-kernel, linux-riscv
In-Reply-To: <20260622-counter_delegation-v7-0-0ba2fd34614e@meta.com>

From: Atish Patra <atishp@rivosinc.com>

Counter delegation lets supervisor mode choose the hpmcounter for an event,
but the hardware may only allow a given event on a subset of counters. Add
a RISC-V specific "CounterIDMask" json event field, handled like the other
arch-specific entries in event_fields[], that carries the allowed-counter
bitmask through to the driver's existing counterid_mask (config2:0-31)
format.

The value is the bitmask directly so no counter-list to bitmask
conversion is needed, and because the field is RISC-V specific it is a
no-op for every other architecture's events (unlike the shared "Counter"
field).
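
To show the effect of the new table entry, here is a small self-contained
sketch of the key-to-term mapping. It is a simplification, not the real
jevents.py: EVENT_FIELDS lists only two illustrative entries,
build_event_string() is a hypothetical name, and is_zero() is a reduced
stand-in for the helper of the same name.

```python
def is_zero(val: str) -> bool:
    """Reduced stand-in: true when the string parses as the integer 0."""
    try:
        return int(val, 0) == 0
    except ValueError:
        return False


# Illustrative subset of (JSON key, perf term prefix) pairs.
EVENT_FIELDS = [
    ("EventCode", "event="),
    ("CounterIDMask", "counterid_mask="),
]


def build_event_string(jd: dict) -> str:
    """Join the non-zero JSON fields into a comma-separated perf term list."""
    terms = []
    for key, prefix in EVENT_FIELDS:
        if key in jd and not is_zero(jd[key]):
            terms.append(prefix + jd[key])
    return ",".join(terms)


if __name__ == "__main__":
    print(build_event_string({"EventCode": "0xF11", "CounterIDMask": "0x8"}))
```

An entry without CounterIDMask, or with a zero mask, produces no
counterid_mask= term, so events of other architectures are unaffected.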

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 tools/perf/pmu-events/jevents.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 457fce7a5982..c1ed8a05c9a4 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -396,6 +396,7 @@ class JsonEvent:
         ('EnAllSlices', 'enallslices='),
         ('SliceId', 'sliceid='),
         ('ThreadMask', 'threadmask='),
+        ('CounterIDMask', 'counterid_mask='),
     ]
     for key, value in event_fields:
       if key in jd and not is_zero(jd[key]):

-- 
2.53.0-Meta



^ permalink raw reply related

* [PATCH v7 14/22] RISC-V: perf: Implement supervisor counter delegation support
From: Atish Patra @ 2026-06-22  8:04 UTC (permalink / raw)
  To: Jiri Olsa, James Clark, Mark Rutland, Will Deacon,
	Arnaldo Carvalho de Melo, Rob Herring, Ian Rogers,
	Krzysztof Kozlowski, Anup Patel, Paul Walmsley, Atish Patra,
	Namhyung Kim
  Cc: devicetree, linux-perf-users, Conor Dooley, linux-arm-kernel,
	linux-kernel, linux-riscv
In-Reply-To: <20260622-counter_delegation-v7-0-0ba2fd34614e@meta.com>

From: Atish Patra <atishp@rivosinc.com>

There are a few new RISC-V ISA extensions (ssccfg, sscsrind, smcntrpmf) which
allow the hpmcounter/hpmevents to be programmed directly from S-mode. The
implementation detects the ISA extension at runtime and uses them if
available instead of SBI PMU extension. SBI PMU extension will still be
used for firmware counters if the user requests it.

The current Linux driver relies on event encoding defined by SBI PMU
specification for standard perf events. However, there is no standard
event encoding available in the ISA. In the future, we may want to
decouple the counter delegation and SBI PMU completely. In that case,
counter delegation supported platforms must rely on the event encoding
defined in the perf json file or in the pmu driver.

For firmware events, it will continue to use the SBI PMU encoding as
one can not support firmware event without SBI PMU.
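
The firmware-event test that keeps such events on the SBI path can be modelled
as follows. This is a Python rendering of the pmu_sbi_is_fw_event() check in
the diff below; the function name is_fw_event is illustrative, and
PERF_TYPE_RAW is given the value it has in the perf uapi header.

```python
PERF_TYPE_RAW = 4  # value of PERF_TYPE_RAW in the perf uapi header
U64_MASK = (1 << 64) - 1


def is_fw_event(attr_type: int, config: int) -> bool:
    """A raw event with bit 63 of config set is treated as a firmware event."""
    return attr_type == PERF_TYPE_RAW and ((config & U64_MASK) >> 63) == 1


if __name__ == "__main__":
    print(is_fw_event(PERF_TYPE_RAW, 1 << 63))
    print(is_fw_event(PERF_TYPE_RAW, 0xF10))
```

Only raw events with the top config bit set take the firmware route; everything
else is eligible for counter delegation when the extensions are present.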

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/include/asm/csr.h   |   1 +
 drivers/perf/riscv_pmu_sbi.c   | 578 +++++++++++++++++++++++++++++++++--------
 include/linux/perf/riscv_pmu.h |   3 +
 3 files changed, 478 insertions(+), 104 deletions(-)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 26cb78dee2fd..25ebf853bfef 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -266,6 +266,7 @@
 #endif
 
 #define SISELECT_SSCCFG_BASE		0x40
+#define HPMEVENT_MASK			GENMASK_ULL(63, 56)
 
 /* mseccfg bits */
 #define MSECCFG_PMM			ENVCFG_PMM
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 1c0961e09b15..6407e229c2c3 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -28,6 +28,8 @@
 #include <asm/cpufeature.h>
 #include <asm/vendor_extensions.h>
 #include <asm/vendor_extensions/andes.h>
+#include <asm/hwcap.h>
+#include <asm/csr_ind.h>
 
 #define ALT_SBI_PMU_OVERFLOW(__ovl)					\
 asm volatile(ALTERNATIVE_2(						\
@@ -60,7 +62,20 @@ asm volatile(ALTERNATIVE(						\
 #define PERF_EVENT_FLAG_USER_ACCESS	BIT(SYSCTL_USER_ACCESS)
 #define PERF_EVENT_FLAG_LEGACY		BIT(SYSCTL_LEGACY)
 
-PMU_FORMAT_ATTR(event, "config:0-55");
+#define RVPMU_SBI_PMU_FORMAT_ATTR	"config:0-47"
+#define RVPMU_CDELEG_PMU_FORMAT_ATTR	"config:0-55"
+
+static ssize_t __maybe_unused rvpmu_format_show(struct device *dev, struct device_attribute *attr,
+						char *buf);
+
+#define RVPMU_ATTR_ENTRY(_name, _func, _config)	(			\
+	&((struct dev_ext_attribute[]) {				\
+		{ __ATTR(_name, 0444, _func, NULL), (void *)_config }	\
+	})[0].attr.attr)
+
+#define RVPMU_FORMAT_ATTR_ENTRY(_name, _config) \
+	RVPMU_ATTR_ENTRY(_name, rvpmu_format_show, (char *)_config)
+
 PMU_FORMAT_ATTR(firmware, "config:62-63");
 
 static bool sbi_v2_available;
@@ -68,7 +83,11 @@ static bool sbi_v3_available;
 static DEFINE_STATIC_KEY_FALSE(sbi_pmu_snapshot_available);
 #define sbi_pmu_snapshot_available() \
 	static_branch_unlikely(&sbi_pmu_snapshot_available)
+
 static DEFINE_STATIC_KEY_FALSE(riscv_pmu_sbi_available);
+#define riscv_pmu_sbi_available() \
+		static_branch_likely(&riscv_pmu_sbi_available)
+
 static DEFINE_STATIC_KEY_FALSE(riscv_pmu_cdeleg_available);
 
 /* Avoid unnecessary code patching in the one time booting path*/
@@ -83,19 +102,35 @@ static DEFINE_STATIC_KEY_FALSE(riscv_pmu_cdeleg_available);
 #define riscv_pmu_sbi_available() \
 		static_branch_likely(&riscv_pmu_sbi_available)
 
-static struct attribute *riscv_arch_formats_attr[] = {
-	&format_attr_event.attr,
+static struct attribute *riscv_sbi_pmu_formats_attr[] = {
+	RVPMU_FORMAT_ATTR_ENTRY(event, RVPMU_SBI_PMU_FORMAT_ATTR),
 	&format_attr_firmware.attr,
 	NULL,
 };
 
-static struct attribute_group riscv_pmu_format_group = {
+static struct attribute_group riscv_sbi_pmu_format_group = {
 	.name = "format",
-	.attrs = riscv_arch_formats_attr,
+	.attrs = riscv_sbi_pmu_formats_attr,
 };
 
-static const struct attribute_group *riscv_pmu_attr_groups[] = {
-	&riscv_pmu_format_group,
+static const struct attribute_group *riscv_sbi_pmu_attr_groups[] = {
+	&riscv_sbi_pmu_format_group,
+	NULL,
+};
+
+static struct attribute *riscv_cdeleg_pmu_formats_attr[] = {
+	RVPMU_FORMAT_ATTR_ENTRY(event, RVPMU_CDELEG_PMU_FORMAT_ATTR),
+	&format_attr_firmware.attr,
+	NULL,
+};
+
+static struct attribute_group riscv_cdeleg_pmu_format_group = {
+	.name = "format",
+	.attrs = riscv_cdeleg_pmu_formats_attr,
+};
+
+static const struct attribute_group *riscv_cdeleg_pmu_attr_groups[] = {
+	&riscv_cdeleg_pmu_format_group,
 	NULL,
 };
 
@@ -482,6 +517,14 @@ static void rvpmu_sbi_check_std_events(struct work_struct *work)
 
 static DECLARE_WORK(check_std_events_work, rvpmu_sbi_check_std_events);
 
+static ssize_t rvpmu_format_show(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	struct dev_ext_attribute *eattr = container_of(attr,
+				struct dev_ext_attribute, attr);
+	return sysfs_emit(buf, "%s\n", (char *)eattr->var);
+}
+
 static int rvpmu_ctr_get_width(int idx)
 {
 	return pmu_ctr_list[idx].width;
@@ -599,6 +642,38 @@ static uint8_t rvpmu_csr_index(struct perf_event *event)
 	return pmu_ctr_list[event->hw.idx].csr - CSR_CYCLE;
 }
 
+static uint64_t get_deleg_priv_filter_bits(struct perf_event *event)
+{
+	u64 priv_filter_bits = 0;
+	bool guest_events = false;
+
+	if (event->attr.config1 & RISCV_PMU_CONFIG1_GUEST_EVENTS)
+		guest_events = true;
+	if (event->attr.exclude_kernel)
+		priv_filter_bits |= guest_events ? HPMEVENT_VSINH : HPMEVENT_SINH;
+	if (event->attr.exclude_user)
+		priv_filter_bits |= guest_events ? HPMEVENT_VUINH : HPMEVENT_UINH;
+	if (guest_events && event->attr.exclude_hv)
+		priv_filter_bits |= HPMEVENT_SINH;
+	if (event->attr.exclude_host)
+		priv_filter_bits |= HPMEVENT_UINH | HPMEVENT_SINH;
+	if (event->attr.exclude_guest)
+		priv_filter_bits |= HPMEVENT_VSINH | HPMEVENT_VUINH;
+
+	return priv_filter_bits;
+}
+
+static bool pmu_sbi_is_fw_event(struct perf_event *event)
+{
+	u32 type = event->attr.type;
+	u64 config = event->attr.config;
+
+	if (type == PERF_TYPE_RAW && ((config >> 63) == 1))
+		return true;
+	else
+		return false;
+}
+
 static unsigned long rvpmu_sbi_get_filter_flags(struct perf_event *event)
 {
 	unsigned long cflags = 0;
@@ -627,7 +702,8 @@ static int rvpmu_sbi_ctr_get_idx(struct perf_event *event)
 	struct cpu_hw_events *cpuc = this_cpu_ptr(rvpmu->hw_events);
 	struct sbiret ret;
 	int idx;
-	uint64_t cbase = 0, cmask = rvpmu->cmask;
+	u64 cbase = 0;
+	unsigned long ctr_mask = rvpmu->cmask;
 	unsigned long cflags = 0;
 
 	cflags = rvpmu_sbi_get_filter_flags(event);
@@ -640,21 +716,23 @@ static int rvpmu_sbi_ctr_get_idx(struct perf_event *event)
 	if ((hwc->flags & PERF_EVENT_FLAG_LEGACY) && (event->attr.type == PERF_TYPE_HARDWARE)) {
 		if (event->attr.config == PERF_COUNT_HW_CPU_CYCLES) {
 			cflags |= SBI_PMU_CFG_FLAG_SKIP_MATCH;
-			cmask = 1;
+			ctr_mask = 1;
 		} else if (event->attr.config == PERF_COUNT_HW_INSTRUCTIONS) {
 			cflags |= SBI_PMU_CFG_FLAG_SKIP_MATCH;
-			cmask = BIT(CSR_INSTRET - CSR_CYCLE);
+			ctr_mask = BIT(CSR_INSTRET - CSR_CYCLE);
 		}
+	} else if (pmu_sbi_is_fw_event(event)) {
+		ctr_mask = firmware_cmask;
 	}
 
 	/* retrieve the available counter index */
 #if defined(CONFIG_32BIT)
 	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase,
-			cmask, cflags, hwc->event_base, hwc->config,
+			ctr_mask, cflags, hwc->event_base, hwc->config,
 			hwc->config >> 32);
 #else
 	ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase,
-			cmask, cflags, hwc->event_base, hwc->config, 0);
+			ctr_mask, cflags, hwc->event_base, hwc->config, 0);
 #endif
 	if (ret.error) {
 		pr_debug("Not able to find a counter for event %lx config %llx\n",
@@ -663,7 +741,7 @@ static int rvpmu_sbi_ctr_get_idx(struct perf_event *event)
 	}
 
 	idx = ret.value;
-	if (!test_bit(idx, &rvpmu->cmask) || !pmu_ctr_list[idx].value)
+	if (!test_bit(idx, &ctr_mask) || !pmu_ctr_list[idx].value)
 		return -ENOENT;
 
 	/* Additional sanity check for the counter id */
@@ -713,29 +791,98 @@ static int sbi_pmu_event_find_cache(u64 config)
 	return ret;
 }
 
-static bool pmu_sbi_is_fw_event(struct perf_event *event)
+static int rvpmu_sbi_event_map(struct perf_event *event, u64 *econfig)
 {
 	u32 type = event->attr.type;
 	u64 config = event->attr.config;
 
-	if ((type == PERF_TYPE_RAW) && ((config >> 63) == 1))
-		return true;
-	else
-		return false;
+	/*
+	 * Ensure we are finished checking standard hardware events for
+	 * validity before allowing userspace to configure any events.
+	 */
+	flush_work(&check_std_events_work);
+
+	return riscv_pmu_get_event_info(type, config, econfig);
 }
 
-static int rvpmu_sbi_event_map(struct perf_event *event, u64 *econfig)
+static int cdeleg_pmu_event_find_cache(u64 config, u64 *eventid, uint32_t *counter_mask)
+{
+	unsigned int cache_type, cache_op, cache_result;
+
+	if (!current_pmu_cache_event_map)
+		return -ENOENT;
+
+	cache_type = (config >>  0) & 0xff;
+	if (cache_type >= PERF_COUNT_HW_CACHE_MAX)
+		return -EINVAL;
+
+	cache_op = (config >>  8) & 0xff;
+	if (cache_op >= PERF_COUNT_HW_CACHE_OP_MAX)
+		return -EINVAL;
+
+	cache_result = (config >> 16) & 0xff;
+	if (cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
+		return -EINVAL;
+
+	if (eventid)
+		*eventid = current_pmu_cache_event_map[cache_type][cache_op]
+						      [cache_result].event_id;
+	if (counter_mask)
+		*counter_mask = current_pmu_cache_event_map[cache_type][cache_op]
+							   [cache_result].counter_mask;
+
+	return 0;
+}
+
+static int rvpmu_cdeleg_event_map(struct perf_event *event, u64 *econfig)
 {
 	u32 type = event->attr.type;
 	u64 config = event->attr.config;
+	int ret = 0;
 
 	/*
-	 * Ensure we are finished checking standard hardware events for
-	 * validity before allowing userspace to configure any events.
+	 * There are two ways standard perf events can be mapped to platform specific
+	 * encoding.
+	 * 1. The vendor may specify the encodings in the driver.
+	 * 2. The Perf tool for RISC-V may remap the standard perf event to platform
+	 * specific encoding.
+	 *
+	 * The RISC-V ISA doesn't define any standard event encoding, so the perf tool
+	 * allows vendors to define it via a json file. The encoding defined in the json
+	 * overrides the perf legacy encoding. However, some users may want to run
+	 * performance monitoring without the perf tool as well, so vendors may also
+	 * specify the event encoding in the driver to support that use case.
+	 * If an encoding is defined in the json, it will be encoded as a raw event.
 	 */
-	flush_work(&check_std_events_work);
 
-	return riscv_pmu_get_event_info(type, config, econfig);
+	switch (type) {
+	case PERF_TYPE_HARDWARE:
+		if (config >= PERF_COUNT_HW_MAX)
+			return -EINVAL;
+		if (!current_pmu_hw_event_map)
+			return -ENOENT;
+
+		*econfig = current_pmu_hw_event_map[config].event_id;
+		if (*econfig == HW_OP_UNSUPPORTED)
+			ret = -ENOENT;
+		break;
+	case PERF_TYPE_HW_CACHE:
+		ret = cdeleg_pmu_event_find_cache(config, econfig, NULL);
+		if (ret)
+			break;
+		if (*econfig == CACHE_OP_UNSUPPORTED)
+			ret = -ENOENT;
+		break;
+	case PERF_TYPE_RAW:
+		*econfig = config & RISCV_PMU_DELEG_RAW_EVENT_MASK;
+		break;
+	default:
+		ret = -ENOENT;
+		break;
+	}
+
+	/* event_base is not used for counter delegation */
+	return ret;
 }
 
 static void pmu_sbi_snapshot_free(struct riscv_pmu *pmu)
@@ -821,7 +968,7 @@ static int pmu_sbi_snapshot_setup(struct riscv_pmu *pmu, int cpu)
 	return 0;
 }
 
-static u64 rvpmu_sbi_ctr_read(struct perf_event *event)
+static u64 rvpmu_ctr_read(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
@@ -898,10 +1045,6 @@ static void rvpmu_sbi_ctr_start(struct perf_event *event, u64 ival)
 	if (ret.error && (ret.error != SBI_ERR_ALREADY_STARTED))
 		pr_err("Starting counter idx %d failed with error %d\n",
 			hwc->idx, sbi_err_map_linux_errno(ret.error));
-
-	if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
-	    (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
-		rvpmu_set_scounteren((void *)event);
 }
 
 static void rvpmu_sbi_ctr_stop(struct perf_event *event, unsigned long flag)
@@ -912,10 +1055,6 @@ static void rvpmu_sbi_ctr_stop(struct perf_event *event, unsigned long flag)
 	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
 	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
 
-	if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
-	    (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
-		rvpmu_reset_scounteren((void *)event);
-
 	if (sbi_pmu_snapshot_available())
 		flag |= SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
 
@@ -951,12 +1090,6 @@ static int rvpmu_sbi_find_num_ctrs(void)
 		return sbi_err_map_linux_errno(ret.error);
 }
 
-static u32 rvpmu_deleg_find_ctrs(void)
-{
-	/* TODO */
-	return 0;
-}
-
 static int rvpmu_sbi_get_ctrinfo(u32 nsbi_ctr, u32 *num_fw_ctr, u32 *num_hw_ctr)
 {
 	struct sbiret ret;
@@ -1034,55 +1167,75 @@ static inline void rvpmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
 	}
 }
 
-/*
- * This function starts all the used counters in two step approach.
- * Any counter that did not overflow can be start in a single step
- * while the overflowed counters need to be started with updated initialization
- * value.
- */
-static inline void rvpmu_sbi_start_ovf_ctrs_sbi(struct cpu_hw_events *cpu_hw_evt,
-						u64 ctr_ovf_mask)
+static void rvpmu_deleg_ctr_start_mask(unsigned long mask)
 {
-	int idx = 0, i;
-	struct perf_event *event;
-	unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
-	unsigned long ctr_start_mask = 0;
-	uint64_t max_period;
-	struct hw_perf_event *hwc;
-	u64 init_val = 0;
+	unsigned long scountinhibit_val = 0;
 
-	for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
-		ctr_start_mask = cpu_hw_evt->used_hw_ctrs[i] & ~ctr_ovf_mask;
-		/* Start all the counters that did not overflow in a single shot */
-		if (ctr_start_mask) {
-			sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, i * BITS_PER_LONG,
-				  ctr_start_mask, 0, 0, 0, 0);
-		}
-	}
+	scountinhibit_val = csr_read(CSR_SCOUNTINHIBIT);
+	scountinhibit_val &= ~mask;
+
+	csr_write(CSR_SCOUNTINHIBIT, scountinhibit_val);
+}
+
+static void rvpmu_deleg_ctr_enable_irq(struct perf_event *event)
+{
+	unsigned long hpmevent_curr;
+	unsigned long of_mask;
+	struct hw_perf_event *hwc = &event->hw;
+	int counter_idx = hwc->idx;
+	unsigned long sip_val = csr_read(CSR_SIP);
+
+	if (!is_sampling_event(event) || (sip_val & SIP_LCOFIP))
+		return;
 
-	/* Reinitialize and start all the counter that overflowed */
-	while (ctr_ovf_mask) {
-		if (ctr_ovf_mask & 0x01) {
-			event = cpu_hw_evt->events[idx];
-			hwc = &event->hw;
-			max_period = riscv_pmu_ctr_get_width_mask(event);
-			init_val = local64_read(&hwc->prev_count) & max_period;
 #if defined(CONFIG_32BIT)
-			sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, idx, 1,
-				  flag, init_val, init_val >> 32, 0);
+	hpmevent_curr = csr_ind_read(CSR_SIREG5, SISELECT_SSCCFG_BASE, counter_idx);
+	of_mask = (u32)~HPMEVENTH_OF;
 #else
-			sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, idx, 1,
-				  flag, init_val, 0, 0);
+	hpmevent_curr = csr_ind_read(CSR_SIREG2, SISELECT_SSCCFG_BASE, counter_idx);
+	of_mask = ~HPMEVENT_OF;
 #endif
-			perf_event_update_userpage(event);
-		}
-		ctr_ovf_mask = ctr_ovf_mask >> 1;
-		idx++;
-	}
+
+	hpmevent_curr &= of_mask;
+#if defined(CONFIG_32BIT)
+	csr_ind_write(CSR_SIREG5, SISELECT_SSCCFG_BASE, counter_idx, hpmevent_curr);
+#else
+	csr_ind_write(CSR_SIREG2, SISELECT_SSCCFG_BASE, counter_idx, hpmevent_curr);
+#endif
+}
+
+static void rvpmu_deleg_ctr_start(struct perf_event *event, u64 ival)
+{
+	unsigned long scountinhibit_val = 0;
+	struct hw_perf_event *hwc = &event->hw;
+
+#if defined(CONFIG_32BIT)
+	csr_ind_write(CSR_SIREG, SISELECT_SSCCFG_BASE, hwc->idx, ival & 0xFFFFFFFF);
+	csr_ind_write(CSR_SIREG4, SISELECT_SSCCFG_BASE, hwc->idx, ival >> BITS_PER_LONG);
+#else
+	csr_ind_write(CSR_SIREG, SISELECT_SSCCFG_BASE, hwc->idx, ival);
+#endif
+
+	rvpmu_deleg_ctr_enable_irq(event);
+
+	scountinhibit_val = csr_read(CSR_SCOUNTINHIBIT);
+	scountinhibit_val &= ~BIT(hwc->idx);
+
+	csr_write(CSR_SCOUNTINHIBIT, scountinhibit_val);
+}
+
+static void rvpmu_deleg_ctr_stop_mask(unsigned long mask)
+{
+	unsigned long scountinhibit_val = 0;
+
+	scountinhibit_val = csr_read(CSR_SCOUNTINHIBIT);
+	scountinhibit_val |= mask;
+
+	csr_write(CSR_SCOUNTINHIBIT, scountinhibit_val);
 }
 
-static inline void rvpmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_evt,
-						     u64 ctr_ovf_mask)
+static void rvpmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_evt,
+					      u64 ctr_ovf_mask)
 {
 	int i, idx = 0;
 	struct perf_event *event;
@@ -1116,15 +1269,53 @@ static inline void rvpmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_h
 	}
 }
 
-static void rvpmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
-					  u64 ctr_ovf_mask)
+/*
+ * This function starts all the used counters in a two-step approach.
+ * Any counter that did not overflow can be started in a single step
+ * while the overflowed counters need to be started with updated initialization
+ * value.
+ */
+static void rvpmu_start_overflow_mask(struct riscv_pmu *pmu, u64 ctr_ovf_mask)
 {
+	int idx = 0, i;
+	struct perf_event *event;
+	unsigned long ctr_start_mask = 0;
+	u64 max_period, init_val = 0;
+	struct hw_perf_event *hwc;
 	struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
 
 	if (sbi_pmu_snapshot_available())
-		rvpmu_sbi_start_ovf_ctrs_snapshot(cpu_hw_evt, ctr_ovf_mask);
-	else
-		rvpmu_sbi_start_ovf_ctrs_sbi(cpu_hw_evt, ctr_ovf_mask);
+		return rvpmu_sbi_start_ovf_ctrs_snapshot(cpu_hw_evt, ctr_ovf_mask);
+
+	/* Start all the counters that did not overflow */
+	if (riscv_pmu_cdeleg_available()) {
+		ctr_start_mask = cpu_hw_evt->used_hw_ctrs[0] & ~ctr_ovf_mask;
+		rvpmu_deleg_ctr_start_mask(ctr_start_mask);
+	} else {
+		for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
+			ctr_start_mask = cpu_hw_evt->used_hw_ctrs[i] & ~ctr_ovf_mask;
+			/* Start all the counters that did not overflow in a single shot */
+			sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, i * BITS_PER_LONG,
+				  ctr_start_mask, 0, 0, 0, 0);
+		}
+	}
+
+	/* Reinitialize and start all the counters that overflowed */
+	while (ctr_ovf_mask) {
+		if (ctr_ovf_mask & 0x01) {
+			event = cpu_hw_evt->events[idx];
+			hwc = &event->hw;
+			max_period = riscv_pmu_ctr_get_width_mask(event);
+			init_val = local64_read(&hwc->prev_count) & max_period;
+			if (riscv_pmu_cdeleg_available())
+				rvpmu_deleg_ctr_start(event, init_val);
+			else
+				rvpmu_sbi_ctr_start(event, init_val);
+			perf_event_update_userpage(event);
+		}
+		ctr_ovf_mask = ctr_ovf_mask >> 1;
+		idx++;
+	}
 }
 
 static irqreturn_t rvpmu_ovf_handler(int irq, void *dev)
@@ -1159,10 +1350,18 @@ static irqreturn_t rvpmu_ovf_handler(int irq, void *dev)
 	}
 
 	pmu = to_riscv_pmu(event->pmu);
-	rvpmu_sbi_stop_hw_ctrs(pmu);
+	if (riscv_pmu_cdeleg_available())
+		rvpmu_deleg_ctr_stop_mask(cpu_hw_evt->used_hw_ctrs[0]);
+	else
+		rvpmu_sbi_stop_hw_ctrs(pmu);
 
-	/* Overflow status register should only be read after counter are stopped */
-	if (sbi_pmu_snapshot_available())
+	/*
+	 * Overflow status register should only be read after the counters are stopped.
+	 * In counter delegation mode the overflows are reported in scountovf, not
+	 * in the SBI snapshot area, so read the CSR directly even when an SBI PMU
+	 * snapshot is also available.
+	 */
+	if (sbi_pmu_snapshot_available() && !riscv_pmu_cdeleg_available())
 		overflow = sdata->ctr_overflow_mask;
 	else
 		ALT_SBI_PMU_OVERFLOW(overflow);
@@ -1228,22 +1427,183 @@ static irqreturn_t rvpmu_ovf_handler(int irq, void *dev)
 		hw_evt->state = 0;
 	}
 
-	rvpmu_sbi_start_overflow_mask(pmu, overflowed_ctrs);
+	rvpmu_start_overflow_mask(pmu, overflowed_ctrs);
 	perf_sample_event_took(sched_clock() - start_clock);
 
 	return IRQ_HANDLED;
 }
 
+static int get_deleg_hw_ctr_width(int counter_offset)
+{
+	unsigned long hpm_warl;
+	int num_bits;
+
+	if (counter_offset < 3 || counter_offset > 31)
+		return 0;
+
+	hpm_warl = csr_ind_warl(CSR_SIREG, SISELECT_SSCCFG_BASE, counter_offset, -1);
+	if (!hpm_warl)
+		return 0;
+	num_bits = __fls(hpm_warl);
+
+#if defined(CONFIG_32BIT)
+	/*
+	 * The low half contributes a full BITS_PER_LONG bits when the counter is
+	 * wider than 32 bits; the high half's __fls() gives the remaining width.
+	 */
+	hpm_warl = csr_ind_warl(CSR_SIREG4, SISELECT_SSCCFG_BASE, counter_offset, -1);
+	if (hpm_warl)
+		num_bits = BITS_PER_LONG + __fls(hpm_warl);
+#endif
+	return num_bits;
+}
+
+static int rvpmu_deleg_find_ctrs(void)
+{
+	int i, num_hw_ctr = 0;
+	union sbi_pmu_ctr_info cinfo;
+	unsigned long scountinhibit_old = 0;
+
+	/* Do a WARL write/read to detect which hpmcounters have been delegated */
+	scountinhibit_old = csr_read(CSR_SCOUNTINHIBIT);
+	csr_write(CSR_SCOUNTINHIBIT, -1);
+	cmask = csr_read(CSR_SCOUNTINHIBIT);
+
+	csr_write(CSR_SCOUNTINHIBIT, scountinhibit_old);
+
+	for_each_set_bit(i, &cmask, RISCV_MAX_HW_COUNTERS) {
+		if (unlikely(i == 1))
+			continue; /* This should never happen as TM is read only */
+		cinfo.value = 0;
+		cinfo.type = SBI_PMU_CTR_TYPE_HW;
+		/*
+		 * If counter delegation is enabled, the csr stored to the cinfo will
+		 * be a virtual counter that the delegation attempts to read.
+		 */
+		cinfo.csr = CSR_CYCLE + i;
+		if (i == 0 || i == 2)
+			cinfo.width = 63;
+		else
+			cinfo.width = get_deleg_hw_ctr_width(i);
+
+		num_hw_ctr++;
+		pmu_ctr_list[i].value = cinfo.value;
+	}
+
+	return num_hw_ctr;
+}
+
+static int get_deleg_fixed_hw_idx(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+	return -EINVAL;
+}
+
+static int get_deleg_next_hpm_hw_idx(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+	unsigned long hw_ctr_mask = 0;
+
+	/*
+	 * TODO: Assume every hpmcounter can monitor every event for now.
+	 * The event to counter mapping should come from the json file.
+	 * The mapping should also tell if sampling is supported or not.
+	 */
+
+	/* Select only hpmcounters */
+	hw_ctr_mask = cmask & (~0x7);
+	hw_ctr_mask &= ~(cpuc->used_hw_ctrs[0]);
+	return hw_ctr_mask ? __ffs(hw_ctr_mask) : -ENOENT;
+}
+
+static void update_deleg_hpmevent(int counter_idx, uint64_t event_value, uint64_t filter_bits)
+{
+	u64 hpmevent_value = 0;
+
+	/* OF bit should be enable during the start if sampling is requested */
+	hpmevent_value = (event_value & ~HPMEVENT_MASK) | filter_bits | HPMEVENT_OF;
+#if defined(CONFIG_32BIT)
+	csr_ind_write(CSR_SIREG2, SISELECT_SSCCFG_BASE, counter_idx, hpmevent_value & 0xFFFFFFFF);
+	if (riscv_isa_extension_available(NULL, SSCOFPMF))
+		csr_ind_write(CSR_SIREG5, SISELECT_SSCCFG_BASE, counter_idx,
+			      hpmevent_value >> BITS_PER_LONG);
+#else
+	csr_ind_write(CSR_SIREG2, SISELECT_SSCCFG_BASE, counter_idx, hpmevent_value);
+#endif
+}
+
+static int rvpmu_deleg_ctr_get_idx(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu);
+	struct cpu_hw_events *cpuc = this_cpu_ptr(rvpmu->hw_events);
+	unsigned long hw_ctr_max_id;
+	u64 priv_filter;
+	int idx;
+
+	/*
+	 * TODO: We should not rely on SBI Perf encoding to check if the event
+	 * is a fixed one or not.
+	 */
+	if (!is_sampling_event(event)) {
+		idx = get_deleg_fixed_hw_idx(cpuc, event);
+		if (idx == 0 || idx == 2) {
+			/* Priv mode filter bits are only available if smcntrpmf is present */
+			if (riscv_isa_extension_available(NULL, SMCNTRPMF))
+				goto found_idx;
+			else
+				goto skip_update;
+		}
+	}
+
+	if (!cmask)
+		goto out_err;
+	hw_ctr_max_id = __fls(cmask);
+	idx = get_deleg_next_hpm_hw_idx(cpuc, event);
+	if (idx < 3 || idx > hw_ctr_max_id)
+		goto out_err;
+found_idx:
+	priv_filter = get_deleg_priv_filter_bits(event);
+	update_deleg_hpmevent(idx, hwc->config, priv_filter);
+skip_update:
+	if (!test_and_set_bit(idx, cpuc->used_hw_ctrs))
+		return idx;
+out_err:
+	return -ENOENT;
+}
+
 static void rvpmu_ctr_start(struct perf_event *event, u64 ival)
 {
-	rvpmu_sbi_ctr_start(event, ival);
-	/* TODO: Counter delegation implementation */
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (riscv_pmu_cdeleg_available() && !pmu_sbi_is_fw_event(event))
+		rvpmu_deleg_ctr_start(event, ival);
+	else
+		rvpmu_sbi_ctr_start(event, ival);
+
+	if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
+	    (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
+		rvpmu_set_scounteren((void *)event);
 }
 
 static void rvpmu_ctr_stop(struct perf_event *event, unsigned long flag)
 {
-	rvpmu_sbi_ctr_stop(event, flag);
-	/* TODO: Counter delegation implementation */
+	struct hw_perf_event *hwc = &event->hw;
+
+	if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
+	    (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
+		rvpmu_reset_scounteren((void *)event);
+
+	if (riscv_pmu_cdeleg_available() && !pmu_sbi_is_fw_event(event)) {
+		/*
+		 * The counter is already stopped. No need to stop again. Counter
+		 * mapping will be reset in clear_idx function.
+		 */
+		if (flag != RISCV_PMU_STOP_FLAG_RESET)
+			rvpmu_deleg_ctr_stop_mask(BIT(hwc->idx));
+		else
+			update_deleg_hpmevent(hwc->idx, 0, 0);
+	} else {
+		rvpmu_sbi_ctr_stop(event, flag);
+	}
 }
 
 static int rvpmu_find_ctrs(void)
@@ -1292,20 +1652,18 @@ static int rvpmu_find_ctrs(void)
 
 static int rvpmu_event_map(struct perf_event *event, u64 *econfig)
 {
-	return rvpmu_sbi_event_map(event, econfig);
-	/* TODO: Counter delegation implementation */
+	if (riscv_pmu_cdeleg_available() && !pmu_sbi_is_fw_event(event))
+		return rvpmu_cdeleg_event_map(event, econfig);
+	else
+		return rvpmu_sbi_event_map(event, econfig);
 }
 
 static int rvpmu_ctr_get_idx(struct perf_event *event)
 {
-	return rvpmu_sbi_ctr_get_idx(event);
-	/* TODO: Counter delegation implementation */
-}
-
-static u64 rvpmu_ctr_read(struct perf_event *event)
-{
-	return rvpmu_sbi_ctr_read(event);
-	/* TODO: Counter delegation implementation */
+	if (riscv_pmu_cdeleg_available() && !pmu_sbi_is_fw_event(event))
+		return rvpmu_deleg_ctr_get_idx(event);
+	else
+		return rvpmu_sbi_ctr_get_idx(event);
 }
 
 static int rvpmu_starting_cpu(unsigned int cpu, struct hlist_node *node)
@@ -1323,7 +1681,16 @@ static int rvpmu_starting_cpu(unsigned int cpu, struct hlist_node *node)
 		csr_write(CSR_SCOUNTEREN, 0x2);
 
 	/* Stop all the counters so that they can be enabled from perf */
-	rvpmu_sbi_stop_all(pmu);
+	if (riscv_pmu_cdeleg_available()) {
+		rvpmu_deleg_ctr_stop_mask(cmask);
+		if (riscv_pmu_sbi_available()) {
+			/* Stop the firmware counters as well */
+			sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0, firmware_cmask,
+				  0, 0, 0, 0);
+		}
+	} else {
+		rvpmu_sbi_stop_all(pmu);
+	}
 
 	if (riscv_pmu_use_irq) {
 		cpu_hw_evt->irq = riscv_pmu_irq;
@@ -1625,8 +1992,11 @@ static int rvpmu_device_probe(struct platform_device *pdev)
 	}
 	irq_requested = (ret == 0);
 
-	pmu->pmu.attr_groups = riscv_pmu_attr_groups;
 	pmu->pmu.parent = &pdev->dev;
+	if (riscv_pmu_cdeleg_available_boot())
+		pmu->pmu.attr_groups = riscv_cdeleg_pmu_attr_groups;
+	else
+		pmu->pmu.attr_groups = riscv_sbi_pmu_attr_groups;
 	pmu->cmask = cmask;
 	pmu->ctr_start = rvpmu_ctr_start;
 	pmu->ctr_stop = rvpmu_ctr_stop;
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index f82a28040594..3c64151cb038 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -20,6 +20,7 @@
  */
 
 #define RISCV_MAX_COUNTERS	64
+#define RISCV_MAX_HW_COUNTERS	32
 #define RISCV_OP_UNSUPP		(-EOPNOTSUPP)
 #define RISCV_PMU_SBI_PDEV_NAME	"riscv-pmu-sbi"
 #define RISCV_PMU_LEGACY_PDEV_NAME	"riscv-pmu-legacy"
@@ -28,6 +29,8 @@
 
 #define RISCV_PMU_CONFIG1_GUEST_EVENTS 0x1
 
+#define RISCV_PMU_DELEG_RAW_EVENT_MASK GENMASK_ULL(55, 0)
+
 struct cpu_hw_events {
 	/* currently enabled events */
 	int			n_events;
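
For reference, the privilege-filter selection done by
get_deleg_priv_filter_bits() in this patch can be exercised stand-alone.
The sketch below is a user-space model, not driver code: struct filter_attr
and priv_filter_bits() are hypothetical stand-ins for the perf_event_attr
fields and the driver function, and the bit positions are the 64-bit
HPMEVENT_* values from the CSR definitions in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 64-bit hpmevent inhibit bits, as defined in the Ssccfg CSR patch */
#define HPMEVENT_SINH	(UINT64_C(1) << 61)
#define HPMEVENT_UINH	(UINT64_C(1) << 60)
#define HPMEVENT_VSINH	(UINT64_C(1) << 59)
#define HPMEVENT_VUINH	(UINT64_C(1) << 58)

/* Hypothetical stand-in for the perf_event_attr fields the driver reads */
struct filter_attr {
	bool guest_events;	/* models config1 & RISCV_PMU_CONFIG1_GUEST_EVENTS */
	bool exclude_kernel;
	bool exclude_user;
	bool exclude_hv;
	bool exclude_host;
	bool exclude_guest;
};

/* Same decision sequence as get_deleg_priv_filter_bits() in the patch */
static uint64_t priv_filter_bits(const struct filter_attr *a)
{
	uint64_t bits = 0;

	assert(a);
	if (a->exclude_kernel)
		bits |= a->guest_events ? HPMEVENT_VSINH : HPMEVENT_SINH;
	if (a->exclude_user)
		bits |= a->guest_events ? HPMEVENT_VUINH : HPMEVENT_UINH;
	if (a->guest_events && a->exclude_hv)
		bits |= HPMEVENT_SINH;
	if (a->exclude_host)
		bits |= HPMEVENT_UINH | HPMEVENT_SINH;
	if (a->exclude_guest)
		bits |= HPMEVENT_VSINH | HPMEVENT_VUINH;

	return bits;
}
```

Note that a guest event with exclude_kernel set inhibits VS-mode rather than
S-mode counting, while exclude_hv on a guest event inhibits S-mode.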

-- 
2.53.0-Meta




* [PATCH v7 18/22] RISC-V: perf: Add Qemu virt machine events
From: Atish Patra @ 2026-06-22  8:04 UTC (permalink / raw)
  To: Jiri Olsa, James Clark, Mark Rutland, Will Deacon,
	Arnaldo Carvalho de Melo, Rob Herring, Ian Rogers,
	Krzysztof Kozlowski, Anup Patel, Paul Walmsley, Atish Patra,
	Namhyung Kim
  Cc: devicetree, linux-perf-users, Conor Dooley, linux-arm-kernel,
	linux-kernel, linux-riscv
In-Reply-To: <20260622-counter_delegation-v7-0-0ba2fd34614e@meta.com>

From: Atish Patra <atishp@rivosinc.com>

The QEMU virt machine supports only a minimal set of legacy perf events.
Add them to the vendor table so that users can use them when
counter delegation is enabled.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/include/asm/vendorid_list.h |  4 ++++
 drivers/perf/riscv_pmu_sbi.c           | 36 ++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/arch/riscv/include/asm/vendorid_list.h b/arch/riscv/include/asm/vendorid_list.h
index 7f5030ee1fcf..603aa2b21c0b 100644
--- a/arch/riscv/include/asm/vendorid_list.h
+++ b/arch/riscv/include/asm/vendorid_list.h
@@ -11,4 +11,8 @@
 #define SIFIVE_VENDOR_ID	0x489
 #define THEAD_VENDOR_ID		0x5b7
 
+#define QEMU_VIRT_VENDOR_ID		0x000
+#define QEMU_VIRT_IMPL_ID		0x000
+#define QEMU_VIRT_ARCH_ID		0x000
+
 #endif
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 8aaf16e31fdf..3cb7a1f4035e 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -27,6 +27,7 @@
 #include <asm/sbi.h>
 #include <asm/cpufeature.h>
 #include <asm/vendor_extensions.h>
+#include <asm/vendorid_list.h>
 #include <asm/vendor_extensions/andes.h>
 #include <asm/hwcap.h>
 #include <asm/csr_ind.h>
@@ -469,7 +470,42 @@ struct riscv_vendor_pmu_events {
 	  .hw_event_map = _hw_event_map, .cache_event_map = _cache_event_map, \
 	  .attrs_events = _attrs },
 
+/* QEMU virt PMU events */
+static const struct riscv_pmu_event qemu_virt_hw_event_map[PERF_COUNT_HW_MAX] = {
+	PERF_MAP_ALL_UNSUPPORTED,
+	[PERF_COUNT_HW_CPU_CYCLES]		= {0x01, 0xFFFFFFF8},
+	[PERF_COUNT_HW_INSTRUCTIONS]		= {0x02, 0xFFFFFFF8}
+};
+
+static const struct riscv_pmu_event qemu_virt_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
+						[PERF_COUNT_HW_CACHE_OP_MAX]
+						[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
+	PERF_CACHE_MAP_ALL_UNSUPPORTED,
+	[C(DTLB)][C(OP_READ)][C(RESULT_MISS)]	= {0x10019, 0xFFFFFFF8},
+	[C(DTLB)][C(OP_WRITE)][C(RESULT_MISS)]	= {0x1001B, 0xFFFFFFF8},
+
+	[C(ITLB)][C(OP_READ)][C(RESULT_MISS)]	= {0x10021, 0xFFFFFFF8},
+};
+
+RVPMU_EVENT_CMASK_ATTR(cycles, cycles, 0x01, 0xFFFFFFF8);
+RVPMU_EVENT_CMASK_ATTR(instructions, instructions, 0x02, 0xFFFFFFF8);
+RVPMU_EVENT_CMASK_ATTR(dTLB-load-misses, dTLB_load_miss, 0x10019, 0xFFFFFFF8);
+RVPMU_EVENT_CMASK_ATTR(dTLB-store-misses, dTLB_store_miss, 0x1001B, 0xFFFFFFF8);
+RVPMU_EVENT_CMASK_ATTR(iTLB-load-misses, iTLB_load_miss, 0x10021, 0xFFFFFFF8);
+
+static struct attribute *qemu_virt_event_group[] = {
+	RVPMU_EVENT_ATTR_PTR(cycles),
+	RVPMU_EVENT_ATTR_PTR(instructions),
+	RVPMU_EVENT_ATTR_PTR(dTLB_load_miss),
+	RVPMU_EVENT_ATTR_PTR(dTLB_store_miss),
+	RVPMU_EVENT_ATTR_PTR(iTLB_load_miss),
+	NULL,
+};
+
 static struct riscv_vendor_pmu_events pmu_vendor_events_table[] = {
+	RISCV_VENDOR_PMU_EVENTS(QEMU_VIRT_VENDOR_ID, QEMU_VIRT_ARCH_ID, QEMU_VIRT_IMPL_ID,
+				qemu_virt_hw_event_map, qemu_virt_cache_event_map,
+				qemu_virt_event_group)
 };
 
 static const struct riscv_pmu_event *current_pmu_hw_event_map;

-- 
2.53.0-Meta




* [PATCH v7 17/22] RISC-V: perf: Add legacy event encodings via sysfs
From: Atish Patra @ 2026-06-22  8:04 UTC (permalink / raw)
  To: Jiri Olsa, James Clark, Mark Rutland, Will Deacon,
	Arnaldo Carvalho de Melo, Rob Herring, Ian Rogers,
	Krzysztof Kozlowski, Anup Patel, Paul Walmsley, Atish Patra,
	Namhyung Kim
  Cc: devicetree, linux-perf-users, Conor Dooley, linux-arm-kernel,
	linux-kernel, linux-riscv
In-Reply-To: <20260622-counter_delegation-v7-0-0ba2fd34614e@meta.com>

From: Atish Patra <atishp@rivosinc.com>

Define sysfs details for the legacy events so that any tool can
parse them to learn the minimum set of legacy events supported by
the platform. Each sysfs entry describes both the event encoding
and the corresponding counter map so that a perf event can be
programmed accordingly.
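
As an illustration of the resulting format, the sketch below reproduces the
string that the RVPMU_EVENT_CMASK_ATTR() stringification yields and shows
how a tool might parse it. EVENT_CMASK_STRING() and parse_event_attr() are
hypothetical names used only for this example; they are not part of the
patch or of any existing tool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stringify the argument as written, like RVPMU_EVENT_ATTR_RESOLVE() */
#define EVENT_ATTR_RESOLVE(m) #m
/* Hypothetical macro: builds only the string part of the sysfs attribute */
#define EVENT_CMASK_STRING(config, mask) \
	"event=" EVENT_ATTR_RESOLVE(config) \
	",counterid_mask=" EVENT_ATTR_RESOLVE(mask)

/*
 * Hypothetical user-space helper: parse one events/<name> string into the
 * event encoding and the counter mask. Returns 0 on success, -1 otherwise.
 */
static int parse_event_attr(const char *s, uint64_t *event, uint32_t *cmask)
{
	unsigned long long ev;
	unsigned int mask;

	assert(s && event && cmask);
	/* %llx and %x accept an optional 0x prefix, like strtoul() base 16 */
	if (sscanf(s, "event=%llx,counterid_mask=%x", &ev, &mask) != 2)
		return -1;

	*event = ev;
	*cmask = mask;
	return 0;
}
```

With the QEMU virt values used later in this series,
EVENT_CMASK_STRING(0x01, 0xFFFFFFF8) expands to the string literal
"event=0x01,counterid_mask=0xFFFFFFF8".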

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 drivers/perf/riscv_pmu_sbi.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 1c846cdc96cf..8aaf16e31fdf 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -131,7 +131,20 @@ static struct attribute_group riscv_cdeleg_pmu_format_group = {
 	.attrs = riscv_cdeleg_pmu_formats_attr,
 };
 
+#define RVPMU_EVENT_ATTR_RESOLVE(m) #m
+#define RVPMU_EVENT_CMASK_ATTR(_name, _var, config, mask) \
+	PMU_EVENT_ATTR_STRING(_name, rvpmu_event_attr_##_var, \
+			      "event=" RVPMU_EVENT_ATTR_RESOLVE(config) \
+			      ",counterid_mask=" RVPMU_EVENT_ATTR_RESOLVE(mask))
+
+#define RVPMU_EVENT_ATTR_PTR(name) (&rvpmu_event_attr_##name.attr.attr)
+
+static struct attribute_group riscv_cdeleg_pmu_event_group __ro_after_init = {
+	.name = "events",
+};
+
 static const struct attribute_group *riscv_cdeleg_pmu_attr_groups[] = {
+	&riscv_cdeleg_pmu_event_group,
 	&riscv_cdeleg_pmu_format_group,
 	NULL,
 };
@@ -447,11 +460,14 @@ struct riscv_vendor_pmu_events {
 	const struct riscv_pmu_event *hw_event_map;
 	const struct riscv_pmu_event (*cache_event_map)[PERF_COUNT_HW_CACHE_OP_MAX]
 						       [PERF_COUNT_HW_CACHE_RESULT_MAX];
+	struct attribute **attrs_events;
 };
 
-#define RISCV_VENDOR_PMU_EVENTS(_vendorid, _archid, _implid, _hw_event_map, _cache_event_map) \
+#define RISCV_VENDOR_PMU_EVENTS(_vendorid, _archid, _implid, _hw_event_map, \
+				_cache_event_map, _attrs) \
 	{ .vendorid = _vendorid, .archid = _archid, .implid = _implid, \
-	  .hw_event_map = _hw_event_map, .cache_event_map = _cache_event_map },
+	  .hw_event_map = _hw_event_map, .cache_event_map = _cache_event_map, \
+	  .attrs_events = _attrs },
 
 static struct riscv_vendor_pmu_events pmu_vendor_events_table[] = {
 };
@@ -473,6 +489,8 @@ static void __init rvpmu_vendor_register_events(void)
 		    pmu_vendor_events_table[i].archid == arch_id) {
 			current_pmu_hw_event_map = pmu_vendor_events_table[i].hw_event_map;
 			current_pmu_cache_event_map = pmu_vendor_events_table[i].cache_event_map;
+			riscv_cdeleg_pmu_event_group.attrs =
+							pmu_vendor_events_table[i].attrs_events;
 			break;
 		}
 	}

-- 
2.53.0-Meta




* [PATCH v7 15/22] RISC-V: perf: Skip PMU SBI extension when not implemented
From: Atish Patra @ 2026-06-22  8:04 UTC (permalink / raw)
  To: Jiri Olsa, James Clark, Mark Rutland, Will Deacon,
	Arnaldo Carvalho de Melo, Rob Herring, Ian Rogers,
	Krzysztof Kozlowski, Anup Patel, Paul Walmsley, Atish Patra,
	Namhyung Kim
  Cc: devicetree, linux-perf-users, Conor Dooley, linux-arm-kernel,
	linux-kernel, linux-riscv
In-Reply-To: <20260622-counter_delegation-v7-0-0ba2fd34614e@meta.com>

From: Charlie Jenkins <charlie@rivosinc.com>

When the PMU SBI extension is not implemented, sbi_v2_available should
not be set to true. The SBI implementation for counter config matching
and firmware counter read should also be skipped when the SBI extension
is not implemented.
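
The reordered probe in rvpmu_devinit() can be modelled in user space as
below. This is only a sketch: struct pmu_probe_result and probe_sbi_pmu()
are hypothetical names, and mk_version() is assumed to mirror the kernel's
sbi_mk_version() layout (major version in bits 30:24, minor in bits 23:0).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror sbi_mk_version(): major in bits 30:24, minor in 23:0 */
static unsigned long mk_version(unsigned long major, unsigned long minor)
{
	assert(major <= 0x7f);
	return ((major & 0x7f) << 24) | (minor & 0xffffff);
}

/* Hypothetical result type standing in for the driver's two flags */
struct pmu_probe_result {
	bool sbi_pmu_available;
	bool sbi_v2_available;
};

/* Neither flag is set unless the PMU extension itself is present */
static struct pmu_probe_result probe_sbi_pmu(bool pmu_ext_present,
					     unsigned long spec_version)
{
	struct pmu_probe_result r = { false, false };

	if (pmu_ext_present) {
		if (spec_version >= mk_version(0, 3))
			r.sbi_pmu_available = true;
		if (spec_version >= mk_version(2, 0))
			r.sbi_v2_available = true;
	}

	return r;
}
```

Before this change, an SBI v2.0 firmware without the PMU extension would
still have set sbi_v2_available; in the model above both flags stay false
in that case.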

Signed-off-by: Atish Patra <atishp@meta.com>
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 drivers/perf/riscv_pmu_sbi.c | 49 ++++++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 6407e229c2c3..4f3a30143db1 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -495,27 +495,32 @@ static void rvpmu_sbi_check_event(struct sbi_pmu_event_data *edata)
 	}
 }
 
-static void rvpmu_sbi_check_std_events(struct work_struct *work)
+static void rvpmu_check_std_events(struct work_struct *work)
 {
 	int ret;
 
-	if (sbi_v3_available) {
-		ret = pmu_sbi_check_event_info();
-		if (ret)
-			pr_err("pmu_sbi_check_event_info failed with error %d\n", ret);
-		return;
-	}
+	if (riscv_pmu_sbi_available()) {
+		if (sbi_v3_available) {
+			ret = pmu_sbi_check_event_info();
+			if (ret)
+				pr_err("pmu_sbi_check_event_info failed with error %d\n", ret);
+			return;
+		}
 
-	for (int i = 0; i < ARRAY_SIZE(pmu_hw_event_sbi_map); i++)
-		rvpmu_sbi_check_event(&pmu_hw_event_sbi_map[i]);
+		for (int i = 0; i < ARRAY_SIZE(pmu_hw_event_sbi_map); i++)
+			rvpmu_sbi_check_event(&pmu_hw_event_sbi_map[i]);
 
-	for (int i = 0; i < ARRAY_SIZE(pmu_cache_event_sbi_map); i++)
-		for (int j = 0; j < ARRAY_SIZE(pmu_cache_event_sbi_map[i]); j++)
-			for (int k = 0; k < ARRAY_SIZE(pmu_cache_event_sbi_map[i][j]); k++)
-				rvpmu_sbi_check_event(&pmu_cache_event_sbi_map[i][j][k]);
+		for (int i = 0; i < ARRAY_SIZE(pmu_cache_event_sbi_map); i++)
+			for (int j = 0; j < ARRAY_SIZE(pmu_cache_event_sbi_map[i]); j++)
+				for (int k = 0; k < ARRAY_SIZE(pmu_cache_event_sbi_map[i][j]); k++)
+					rvpmu_sbi_check_event(&pmu_cache_event_sbi_map[i][j][k]);
+	} else {
+		DO_ONCE_LITE_IF(1, pr_info,
+				"Boot time config matching not required for smcdeleg\n");
+	}
 }
 
-static DECLARE_WORK(check_std_events_work, rvpmu_sbi_check_std_events);
+static DECLARE_WORK(check_std_events_work, rvpmu_check_std_events);
 
 static ssize_t rvpmu_format_show(struct device *dev,
 				 struct device_attribute *attr, char *buf)
@@ -708,6 +713,9 @@ static int rvpmu_sbi_ctr_get_idx(struct perf_event *event)
 
 	cflags = rvpmu_sbi_get_filter_flags(event);
 
+	if (!riscv_pmu_sbi_available())
+		return -ENOENT;
+
 	/*
 	 * In legacy mode, we have to force the fixed counters for those events
 	 * but not in the user access mode as we want to use the other counters
@@ -985,7 +993,7 @@ static u64 rvpmu_ctr_read(struct perf_event *event)
 		return val;
 	}
 
-	if (pmu_sbi_is_fw_event(event)) {
+	if (pmu_sbi_is_fw_event(event) && riscv_pmu_sbi_available()) {
 		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ,
 				hwc->idx, 0, 0, 0, 0, 0);
 		if (ret.error)
@@ -2084,12 +2092,13 @@ static int __init rvpmu_devinit(void)
 	int ret;
 	struct platform_device *pdev;
 
-	if (sbi_spec_version >= sbi_mk_version(0, 3) &&
-	    sbi_probe_extension(SBI_EXT_PMU))
-		static_branch_enable(&riscv_pmu_sbi_available);
+	if (sbi_probe_extension(SBI_EXT_PMU)) {
+		if (sbi_spec_version >= sbi_mk_version(0, 3))
+			static_branch_enable(&riscv_pmu_sbi_available);
+		if (sbi_spec_version >= sbi_mk_version(2, 0))
+			sbi_v2_available = true;
+	}
 
-	if (sbi_spec_version >= sbi_mk_version(2, 0))
-		sbi_v2_available = true;
 	/*
 	 * We need all three extensions to be present to access the counters
 	 * in S-mode via Supervisor Counter delegation.

-- 
2.53.0-Meta




* [PATCH v7 08/22] RISC-V: Add Ssccfg extension CSR definition
From: Atish Patra @ 2026-06-22  8:04 UTC (permalink / raw)
  To: Jiri Olsa, James Clark, Mark Rutland, Will Deacon,
	Arnaldo Carvalho de Melo, Rob Herring, Ian Rogers,
	Krzysztof Kozlowski, Anup Patel, Paul Walmsley, Atish Patra,
	Namhyung Kim
  Cc: devicetree, linux-perf-users, Conor Dooley, linux-arm-kernel,
	linux-kernel, linux-riscv
In-Reply-To: <20260622-counter_delegation-v7-0-0ba2fd34614e@meta.com>

From: Kaiwen Xue <kaiwenx@rivosinc.com>

This adds the scountinhibit CSR definition and the S-mode accessible hpmevent
bits defined by Smcdeleg/Ssccfg. scountinhibit allows S-mode to start and stop
counters directly, without invoking SBI calls to M-mode. It is also used to
determine which counters M-mode has delegated to S-mode.

Signed-off-by: Kaiwen Xue <kaiwenx@rivosinc.com>
Reviewed-by: Clément Léger <cleger@rivosinc.com>
---
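Note (not part of the commit): the two uses of scountinhibit described
above can be sketched in plain C against a simulated register. This is only
an illustration under stated assumptions, not the driver code. The sim_*
names and the delegation mask are hypothetical, and the model assumes that
scountinhibit bits for counters that are not delegated read as zero and
ignore writes, which is my reading of Smcdeleg/Ssccfg.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated state; on real hardware these would be CSR accesses. */
static uint32_t sim_delegated_mask; /* hypothetical: counters M-mode delegated */
static uint32_t sim_scountinhibit;  /* bit n set => counter n is stopped */

/* Assumed behaviour: bits of non-delegated counters stay read-only zero. */
static void sim_write_scountinhibit(uint32_t val)
{
	sim_scountinhibit = val & sim_delegated_mask;
}

static uint32_t sim_read_scountinhibit(void)
{
	return sim_scountinhibit;
}

/* Stop counter idx by setting its inhibit bit; no call into M-mode. */
static void sim_counter_stop(unsigned int idx)
{
	assert(idx < 32);
	sim_write_scountinhibit(sim_read_scountinhibit() | (UINT32_C(1) << idx));
}

/* Start counter idx by clearing its inhibit bit. */
static void sim_counter_start(unsigned int idx)
{
	assert(idx < 32);
	sim_write_scountinhibit(sim_read_scountinhibit() & ~(UINT32_C(1) << idx));
}

/*
 * Probe the delegated counters: write all ones, read back, and keep the
 * bits that stuck. The previous inhibit state is restored afterwards.
 */
static uint32_t sim_probe_delegated(void)
{
	uint32_t saved = sim_read_scountinhibit();
	uint32_t mask;

	sim_write_scountinhibit(UINT32_MAX);
	mask = sim_read_scountinhibit();
	sim_write_scountinhibit(saved);
	return mask;
}
```

In this model, stopping a counter that was not delegated is a silent no-op,
which is why probing first is useful before handing out counter indices.
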
 arch/riscv/include/asm/csr.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index b4551a6cf7cb..26cb78dee2fd 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -241,6 +241,31 @@
 #define SMSTATEEN0_HSENVCFG		(_ULL(1) << SMSTATEEN0_HSENVCFG_SHIFT)
 #define SMSTATEEN0_SSTATEEN0_SHIFT	63
 #define SMSTATEEN0_SSTATEEN0		(_ULL(1) << SMSTATEEN0_SSTATEEN0_SHIFT)
+/* HPMEVENT bits. These are accessible in S-mode via Smcdeleg/Ssccfg */
+#ifdef CONFIG_64BIT
+#define HPMEVENT_OF			(BIT_ULL(63))
+#define HPMEVENT_MINH			(BIT_ULL(62))
+#define HPMEVENT_SINH			(BIT_ULL(61))
+#define HPMEVENT_UINH			(BIT_ULL(60))
+#define HPMEVENT_VSINH			(BIT_ULL(59))
+#define HPMEVENT_VUINH			(BIT_ULL(58))
+#else
+#define HPMEVENTH_OF			(BIT_ULL(31))
+#define HPMEVENTH_MINH			(BIT_ULL(30))
+#define HPMEVENTH_SINH			(BIT_ULL(29))
+#define HPMEVENTH_UINH			(BIT_ULL(28))
+#define HPMEVENTH_VSINH			(BIT_ULL(27))
+#define HPMEVENTH_VUINH			(BIT_ULL(26))
+
+#define HPMEVENT_OF			(HPMEVENTH_OF << 32)
+#define HPMEVENT_MINH			(HPMEVENTH_MINH << 32)
+#define HPMEVENT_SINH			(HPMEVENTH_SINH << 32)
+#define HPMEVENT_UINH			(HPMEVENTH_UINH << 32)
+#define HPMEVENT_VSINH			(HPMEVENTH_VSINH << 32)
+#define HPMEVENT_VUINH			(HPMEVENTH_VUINH << 32)
+#endif
+
+#define SISELECT_SSCCFG_BASE		0x40
 
 /* mseccfg bits */
 #define MSECCFG_PMM			ENVCFG_PMM
@@ -322,6 +347,7 @@
 #define CSR_SCOUNTEREN		0x106
 #define CSR_SENVCFG		0x10a
 #define CSR_SSTATEEN0		0x10c
+#define CSR_SCOUNTINHIBIT	0x120
 #define CSR_SSCRATCH		0x140
 #define CSR_SEPC		0x141
 #define CSR_SCAUSE		0x142

-- 
2.53.0-Meta




