Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* Re: [PATCH net-next v2 1/3] dpll: add frequency monitoring to netlink spec
From: Ivan Vecera @ 2026-04-01  6:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Vadim Fedorenko, Arkadiusz Kubalewski, Jiri Pirko,
	Jonathan Corbet, Shuah Khan, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Donald Hunter, Prathosh Satish,
	Petr Oros, linux-doc, linux-kernel
In-Reply-To: <20260331200535.6a73e940@kernel.org>

Hi Kuba,

On 4/1/26 5:05 AM, Jakub Kicinski wrote:
> On Mon, 30 Mar 2026 12:55:03 +0200 Ivan Vecera wrote:
>> Add DPLL_A_FREQUENCY_MONITOR device attribute to allow control over
>> the frequency monitor feature. The attribute uses the existing
>> dpll_feature_state enum (enable/disable) and is present in both
>> device-get reply and device-set request.
>>
>> Add DPLL_A_PIN_MEASURED_FREQUENCY pin attribute to expose the measured
>> input frequency in Hz. The attribute is present in the pin-get reply.
> 
> 
>> +      -
>> +        name: frequency-monitor
>> +        type: u32
>> +        enum: feature-state
>> +        doc: Receive or request state of frequency monitor feature.
> 
> reads a bit clunkily - how about:
> 
> 	Current or desired state of the frequency monitor feature.
> 
> ?

Agreed, will update.

> 
>> +          If enabled, dpll device shall measure all currently available
>> +          inputs for their actual input frequency.
>>     -
>>       name: pin
>>       enum-name: dpll_a_pin
>> @@ -456,6 +463,13 @@ attribute-sets:
>>             Value is in PPT (parts per trillion, 10^-12).
>>             Note: This attribute provides higher resolution than the standard
>>             fractional-frequency-offset (which is in PPM).
>> +      -
>> +        name: measured-frequency
>> +        type: u64
>> +        doc: |
>> +          The measured frequency of the input pin in Hz.
>> +          This is the actual frequency being received on the pin,
>> +          as measured by the dpll device hardware.
> 
> If we make this a u64 should it be fixed point? Seems dubious that we'd
> ever be able to measure >4Ghz frequencies, much more likely that we'd
> want sub-1 precision ? So let's say this is fixed point 34.30 ?
> 
Good point. I'm going with a decimal divider (1000) instead of 34.30, 
following the same approach as DPLL_PHASE_OFFSET_DIVIDER. The value
is now in millihertz (mHz) with a DPLL_PIN_MEASURED_FREQUENCY_DIVIDER
constant for userspace to extract integer and fractional parts.

Thanks,
Ivan


^ permalink raw reply

* Re: [PATCH net-next v2 2/3] dpll: add frequency monitoring callback ops
From: Ivan Vecera @ 2026-04-01  6:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Vadim Fedorenko, Arkadiusz Kubalewski, Jiri Pirko,
	Jonathan Corbet, Shuah Khan, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Donald Hunter, Prathosh Satish,
	Petr Oros, linux-doc, linux-kernel
In-Reply-To: <20260331201001.03339bab@kernel.org>


On 4/1/26 5:10 AM, Jakub Kicinski wrote:
>> +static int dpll_msg_add_measured_freq(struct sk_buff *msg, struct dpll_pin *pin,
>> +				      struct dpll_pin_ref *ref,
>> +				    struct netlink_ext_ack *extack)
>> +{
>> +	const struct dpll_device_ops *dev_ops = dpll_device_ops(ref->dpll);
>> +	const struct dpll_pin_ops *ops = dpll_pin_ops(ref);
>> +	struct dpll_device *dpll = ref->dpll;
>> +	enum dpll_feature_state state;
>> +	u64 measured_freq;
>> +	int ret;
>> +
>> +	if (!ops->measured_freq_get)
>> +		return 0;
>> +	if (dev_ops->freq_monitor_get) {
> what are you trying to cater to by making freq_monitor_get optional
> here? I thought maybe some devices would have it always enabled, but
> then dpll_msg_add_freq_monitor() should presumably report enabled
> if !freq_monitor_get && measured_freq_get ?
> 
> Maybe there's some precedent in surrounding code outside of the context
> but the intention of the patch reads a bit off.

You're right, making it optional was not well thought out. I'm going to 
change it so that .freq_monitor_get() will be required when
.measured_freq_get() is provided.

Thanks,
Ivan


^ permalink raw reply

* Re: [PATCH 3/5] compiler_attributes: Add overflow_behavior macros __ob_trap and __ob_wrap
From: Vincent Mailhol @ 2026-04-01  7:19 UTC (permalink / raw)
  To: Kees Cook, Peter Zijlstra
  Cc: Justin Stitt, Marco Elver, Andrey Konovalov, Andrey Ryabinin,
	Jonathan Corbet, Shuah Khan, Miguel Ojeda, Nathan Chancellor,
	kasan-dev, linux-doc, llvm, Linus Torvalds, Nicolas Schier,
	Arnd Bergmann, Greg Kroah-Hartman, Andrew Morton, linux-kernel,
	linux-hardening, linux-kbuild
In-Reply-To: <20260331163725.2765789-3-kees@kernel.org>

Hi Kees,

Many thanks for this series. Great work and I am ready it with a lot of
interest!

Le 31/03/2026 à 18:37, Kees Cook a écrit :
> From: Justin Stitt <justinstitt@google.com>
> 
> When CONFIG_OVERFLOW_BEHAVIOR_TYPES=y, Clang 23+'s Overflow Behavior
> Type[1] annotations are available (i.e. __ob_trap, __ob_wrap). When not
> enabled, these need to be empty macros. Document the new annotation and
> add links from sanitizer docs pointing to the arithmetic-overflow docs.
> 
> Link: https://clang.llvm.org/docs/OverflowBehaviorTypes.html [1]
> Signed-off-by: Justin Stitt <justinstitt@google.com>
> Co-developed-by: Kees Cook <kees@kernel.org>
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> Cc: Marco Elver <elver@google.com>
> Cc: Andrey Konovalov <andreyknvl@gmail.com>
> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Shuah Khan <skhan@linuxfoundation.org>
> Cc: Miguel Ojeda <ojeda@kernel.org>
> Cc: Nathan Chancellor <nathan@kernel.org>
> Cc: <kasan-dev@googlegroups.com>
> Cc: <linux-doc@vger.kernel.org>
> Cc: <llvm@lists.linux.dev>
> ---
>  Documentation/dev-tools/ubsan.rst             |  13 +
>  Documentation/process/arithmetic-overflow.rst | 323 ++++++++++++++++++
>  Documentation/process/deprecated.rst          |  39 +++
>  Documentation/process/index.rst               |   1 +
>  include/linux/compiler_attributes.h           |  12 +
>  MAINTAINERS                                   |   1 +
>  6 files changed, 389 insertions(+)
>  create mode 100644 Documentation/process/arithmetic-overflow.rst
> 
> diff --git a/Documentation/dev-tools/ubsan.rst b/Documentation/dev-tools/ubsan.rst
> index e3591f8e9d5b..9e0c0f048eef 100644
> --- a/Documentation/dev-tools/ubsan.rst
> +++ b/Documentation/dev-tools/ubsan.rst
> @@ -71,6 +71,19 @@ unaligned accesses (CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y). One could
>  still enable it in config, just note that it will produce a lot of UBSAN
>  reports.
>  
> +Additional sanitizer options include::
> +
> +  CONFIG_OVERFLOW_BEHAVIOR_TYPES=y
> +
> +This enables checking for integer arithmetic wrap-around (overflow/underflow).
> +It instruments signed and unsigned integer overflow, as well as implicit
> +truncation operations. This option is currently limited to specific types
> +via the ``__ob_trap`` and ``__ob_wrap`` annotations.
> +
> +For detailed information about arithmetic overflow handling, overflow behavior
> +annotations, and best practices, see:
> +Documentation/process/arithmetic-overflow.rst
> +
>  References
>  ----------
>  
> diff --git a/Documentation/process/arithmetic-overflow.rst b/Documentation/process/arithmetic-overflow.rst
> new file mode 100644
> index 000000000000..2f19990f189b
> --- /dev/null
> +++ b/Documentation/process/arithmetic-overflow.rst
> @@ -0,0 +1,323 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +.. _arithmetic_overflow:
> +
> +Arithmetic Overflow Resolutions for Linux
> +=========================================
> +
> +Background
> +----------
> +
> +When a calculation's result exceeds the involved storage ranges, several
> +strategies can be followed to handle such an overflow (or underflow),
> +including:
> +
> +  - Undefined (i.e. pretend it's impossible and the result depends on hardware)
> +  - Wrap around (this is what 2s-complement representation does by default)
> +  - Trap (create an exception so the problem can be handled in another way)
> +  - Saturate (explicitly hold the maximum or minimum representable value)

I just wanted to ask how much consideration was put into this last
"saturate" option.

When speaking of "safe" as in "functional safety" this seems a good
option to me. The best option is of course proper handling, but as
discussed, we are speaking of the scenario in which the code is already
buggy and which is the fallout option doing the least damage.

What I have in mind is a new __ob_saturate type qualifier. Something like:

	void foo(int num)
	{
		int __ob_saturate saturate_var = num;
	
		saturate_var += 42;
	}

would just print a warning and continue execution, thus solving the
trapping issue. The above code would generate something equivalent to that:

	void foo(int num)
	{
		int __ob_saturate saturate_var = num;
	
		if (check_add_overflow(saturate_var, increment,
				       &saturate_var) {
			WARN(true, "saturation occurred");
			saturate_var = type_max(saturate_var);
	}

People using those saturating integers could then later check that the
value is still in bound.

This is basically what your size_add() from overflow.h is already doing.
If an overflow occurred, the allocation the addition does not trap, it
just saturates and let the allocation functions properly handle the issue.

The saturation can neutralize many security attacks and can mitigate
some safety issues. Think of the Ariane 5 rocket launch: a saturation
could have prevented the unintended fireworks.

The caveat I can think of is that the old overflow check pattern becomes
invalid. Doing:

	if (saturate_var + increment < increment)

is now bogus and would need to be caught if possible by static analysis.
So those saturating integers will only be usable in newly written code
and could not be easily retrofitted.

> +In the C standard, three basic types can be involved in arithmetic, and each
> +has a default strategy for solving the overflow problem:
> +
> +  - Signed overflow is undefined
> +  - Unsigned overflow explicitly wraps around
> +  - Pointer overflow is undefined

Nitpick: the C standard uses different definitions than yours. In the
standard:

  - overflow is *always* undefined
  - unsigned integer wraparound
  - signed integer overflow

The nuance is that in the standard unsigned integers do not overflow,
they just wraparound.

I am not asking you to change your terminology, but it could be good to
state in your document that your definition of overflow differs from the
standard's definition. Maybe a terminology section could help.


Yours sincerely,
Vincent Mailhol


^ permalink raw reply

* [PATCH v2 1/4] PCI/DOE: Move common definitions to the header file
From: Aksh Garg @ 2026-04-01  7:30 UTC (permalink / raw)
  To: linux-pci, linux-doc, mani, kwilczynski, bhelgaas, corbet, kishon,
	skhan, lukas, cassel, alistair
  Cc: linux-arm-kernel, linux-kernel, s-vadapalli, danishanwar, srk,
	a-garg7
In-Reply-To: <20260401073022.215805-1-a-garg7@ti.com>

Move common macros and structures from drivers/pci/doe.c to
drivers/pci/pci.h to allow reuse across root complex and
endpoint DOE implementations.

PCI_DOE_MAX_LENGTH macro can be used outside the PCI core as well,
hence move the macro to include/linux/pci-doe.h.

These changes prepare the groundwork for the DOE endpoint implementation
that will reuse these common definitions.

Co-developed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Aksh Garg <a-garg7@ti.com>
---

Changes since v1:
- Moved the common macros that need not be visible outside the PCI core
  to drivers/pci/pci.h instead to include/linux/pci-doe.h as suggested
  by Lukas Wunner
- Removed the redundant empty inlines guarded with CONFIG_PCI_DOE in
  include/linux/pci-doe.h.

v1: https://lore.kernel.org/all/20260213123603.420941-3-a-garg7@ti.com/

 drivers/pci/doe.c       | 11 -----------
 drivers/pci/pci.h       |  9 +++++++++
 include/linux/pci-doe.h |  3 +++
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index 7b41da4ec11a..e8d9e95644b3 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -28,12 +28,6 @@
 #define PCI_DOE_TIMEOUT HZ
 #define PCI_DOE_POLL_INTERVAL	(PCI_DOE_TIMEOUT / 128)
 
-#define PCI_DOE_FLAG_CANCEL	0
-#define PCI_DOE_FLAG_DEAD	1
-
-/* Max data object length is 2^18 dwords */
-#define PCI_DOE_MAX_LENGTH	(1 << 18)
-
 /**
  * struct pci_doe_mb - State for a single DOE mailbox
  *
@@ -63,11 +57,6 @@ struct pci_doe_mb {
 #endif
 };
 
-struct pci_doe_feature {
-	u16 vid;
-	u8 type;
-};
-
 /**
  * struct pci_doe_task - represents a single query/response
  *
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 13d998fbacce..66b7ec80f46f 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -681,6 +681,15 @@ struct pci_sriov {
 	bool		drivers_autoprobe; /* Auto probing of VFs by driver */
 };
 
+/* DOE Mailbox state flags */
+#define PCI_DOE_FLAG_CANCEL	0
+#define PCI_DOE_FLAG_DEAD	1
+
+struct pci_doe_feature {
+	u16 vid;
+	u8 type;
+};
+
 #ifdef CONFIG_PCI_DOE
 void pci_doe_init(struct pci_dev *pdev);
 void pci_doe_destroy(struct pci_dev *pdev);
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index bd4346a7c4e7..abb9b7ae8029 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -19,6 +19,9 @@ struct pci_doe_mb;
 #define PCI_DOE_FEATURE_CMA 1
 #define PCI_DOE_FEATURE_SSESSION 2
 
+/* Max data object length is 2^18 dwords */
+#define PCI_DOE_MAX_LENGTH		(1 << 18)
+
 struct pci_doe_mb *pci_find_doe_mailbox(struct pci_dev *pdev, u16 vendor,
 					u8 type);
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH v2 0/4] PCI: Add DOE support for endpoint
From: Aksh Garg @ 2026-04-01  7:30 UTC (permalink / raw)
  To: linux-pci, linux-doc, mani, kwilczynski, bhelgaas, corbet, kishon,
	skhan, lukas, cassel, alistair
  Cc: linux-arm-kernel, linux-kernel, s-vadapalli, danishanwar, srk,
	a-garg7

This patch series introduces the framework for supporting the Data
Object Exchange (DOE) feature for PCIe endpoint devices. Please refer
to the documentation added in patch 4 for details on the feature and
implementation architecture.

The implementation provides a common framework for all PCIe endpoint
controllers, not specific to any particular SoC vendor.

This patch series is the non-RFC version of the RFC series at 
https://lore.kernel.org/all/20260213123603.420941-1-a-garg7@ti.com/

The changes since v1 are documented in the respective patch description.

A new patch (patch 3) have been introduced in the series. This patch
moves the burden of initializing and setting-up the DOE mailbox from
controller driver to EPC core driver, avoiding the duplication of code
across the controller drivers. This patch adds APIs to the EPC core
driver, which simply needs to be called by the controller driver during
probe/cleanup if it supports DOE capability.

Aksh Garg (4):
  PCI/DOE: Move common definitions to the header file
  PCI: endpoint: Add DOE mailbox support for endpoint functions
  PCI: endpoint: Add API for DOE initialization and setup in EPC core
  Documentation: PCI: Add documentation for DOE endpoint support

 Documentation/PCI/endpoint/index.rst          |   1 +
 .../PCI/endpoint/pci-endpoint-doe.rst         | 318 ++++++++++
 drivers/pci/doe.c                             |  11 -
 drivers/pci/endpoint/Kconfig                  |  14 +
 drivers/pci/endpoint/Makefile                 |   1 +
 drivers/pci/endpoint/pci-ep-doe.c             | 552 ++++++++++++++++++
 drivers/pci/endpoint/pci-epc-core.c           |  71 +++
 drivers/pci/pci.h                             |  47 ++
 include/linux/pci-doe.h                       |   8 +
 include/linux/pci-epc.h                       |  24 +
 10 files changed, 1036 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/PCI/endpoint/pci-endpoint-doe.rst
 create mode 100644 drivers/pci/endpoint/pci-ep-doe.c

-- 
2.34.1

^ permalink raw reply

* [PATCH v2 2/4] PCI: endpoint: Add DOE mailbox support for endpoint functions
From: Aksh Garg @ 2026-04-01  7:30 UTC (permalink / raw)
  To: linux-pci, linux-doc, mani, kwilczynski, bhelgaas, corbet, kishon,
	skhan, lukas, cassel, alistair
  Cc: linux-arm-kernel, linux-kernel, s-vadapalli, danishanwar, srk,
	a-garg7
In-Reply-To: <20260401073022.215805-1-a-garg7@ti.com>

DOE (Data Object Exchange) is a standard PCIe extended capability
feature introduced in the Data Object Exchange (DOE) ECN for
PCIe r5.0. It provides a communication mechanism primarily used for
implementing PCIe security features such as device authentication, and
secure link establishment. Think of DOE as a sophisticated mailbox
system built into PCIe. The root complex can send structured requests
to the endpoint device through DOE mailboxes, and the endpoint device
responds with appropriate data.

Add the DOE support for PCIe endpoint devices, enabling endpoint
functions to process the DOE requests from the host. The implementation
provides framework APIs for EPC core driver and controller drivers to
register mailboxes, and request processing with workqueues ensuring
sequential handling per mailbox, and parallel handling across mailboxes.
The Discovery protocol is handled internally by the DOE core.

This implementation complements the existing DOE implementation for
root complex in drivers/pci/doe.c.

Co-developed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Aksh Garg <a-garg7@ti.com>
---

Changes since v1:
- Moved the DOE-EP core file to drivers/pci/endpoint/pci-ep-doe.c, and
  corresponding Kconfig and Makefile to match the existing naming scheme,
  as suggested by Niklas Cassel.
- Renamed the config from PCI_DOE_EP to PCI_ENDPOINT_DOE
- Moved the function declarations that need not be visible outside the
  PCI core to drivers/pci/pci.h instead to include/linux/pci-doe.h as
  suggested by Lukas Wunner
- Converted from synchronous to asynchronous request processing:
  * Removed wait_for_completion() from pci_ep_doe_process_request()
  * Function returns immediately after queuing to workqueue, hence
    removed private data for completion in the task structure
  * Added completion callback as an additional argument to
    pci_ep_doe_process_request(), which takes the response and status
    parameters as arguments (along with other required arguments), hence
    removed task_status in the task structure
  * Created a typedef pci_ep_doe_complete_t for completion callback
  * Removed the pci_ep_doe_task_complete() function, as it would not be
    required anymore with these changes
  * Moved from INIT_WORK_ONSTACK() to INIT_WORK(), to initialize the work
    on heap instead of stack
  * signal_task_complete() now invokes the completion callback, once the
    protocol handler completes its task
- Changed from dynamic xarray-based protocol registration to static array:
  * Removed the register/unregister protocol APIs
  * Replaced the dynamic xarray with static array of struct pci_doe_protocol
  * Added discovery protocol to static array, instead of treating it specially,
    hence removed the special handling for Discovery protocol in
    doe_ep_task_work()
  * Updated pci_ep_doe_handle_discovery() and pci_ep_doe_find_protocol()
    accordingly.
- Memory Management:
  * DOE core frees request buffer in signal_task_complete()
    or during error handling
  * pci_ep_doe_process_request() defines response_pl and response_pl_sz
    as NULL and 0 respectively, whose pointer is passed to the protocol
    handler, hence removed the arguments void **response, size_t *response_sz
    to this function.
- Task structure refactoring:
  * Response buffer: void **response_pl to void *response_pl
  * Response size: size_t *response_pl_sz to size_t response_pl_sz
  * Changed the completion callback to type pci_ep_doe_complete_t
  * Removed void *private and int task_status
- Updated documentation comments of the functions according to the changes 

v1: https://lore.kernel.org/all/20260213123603.420941-4-a-garg7@ti.com/

 drivers/pci/endpoint/Kconfig      |  14 +
 drivers/pci/endpoint/Makefile     |   1 +
 drivers/pci/endpoint/pci-ep-doe.c | 552 ++++++++++++++++++++++++++++++
 drivers/pci/pci.h                 |  38 ++
 include/linux/pci-doe.h           |   5 +
 include/linux/pci-epc.h           |   3 +
 6 files changed, 613 insertions(+)
 create mode 100644 drivers/pci/endpoint/pci-ep-doe.c

diff --git a/drivers/pci/endpoint/Kconfig b/drivers/pci/endpoint/Kconfig
index 8dad291be8b8..15ae16aaa58f 100644
--- a/drivers/pci/endpoint/Kconfig
+++ b/drivers/pci/endpoint/Kconfig
@@ -36,6 +36,20 @@ config PCI_ENDPOINT_MSI_DOORBELL
 	  doorbell. The RC can trigger doorbell in EP by writing data to a
 	  dedicated BAR, which the EP maps to the controller's message address.
 
+config PCI_ENDPOINT_DOE
+	bool "PCI Endpoint Data Object Exchange (DOE) support"
+	depends on PCI_ENDPOINT
+	help
+	  This enables support for Data Object Exchange (DOE) protocol
+	  on PCI Endpoint controllers. It provides a communication
+	  mechanism through mailboxes, primarily used for PCIe security
+	  features.
+
+	  Say Y here if you want be able to communicate using PCIe DOE
+	  mailboxes.
+
+	  If unsure, say N.
+
 source "drivers/pci/endpoint/functions/Kconfig"
 
 endmenu
diff --git a/drivers/pci/endpoint/Makefile b/drivers/pci/endpoint/Makefile
index b4869d52053a..1fa176b6792b 100644
--- a/drivers/pci/endpoint/Makefile
+++ b/drivers/pci/endpoint/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_PCI_ENDPOINT_CONFIGFS)	+= pci-ep-cfs.o
 obj-$(CONFIG_PCI_ENDPOINT)		+= pci-epc-core.o pci-epf-core.o\
 					   pci-epc-mem.o functions/
 obj-$(CONFIG_PCI_ENDPOINT_MSI_DOORBELL)	+= pci-ep-msi.o
+obj-$(CONFIG_PCI_ENDPOINT_DOE)		+= pci-ep-doe.o
diff --git a/drivers/pci/endpoint/pci-ep-doe.c b/drivers/pci/endpoint/pci-ep-doe.c
new file mode 100644
index 000000000000..ded0290b15ed
--- /dev/null
+++ b/drivers/pci/endpoint/pci-ep-doe.c
@@ -0,0 +1,552 @@
+// SPDX-License-Identifier: GPL-2.0-only or MIT
+/*
+ * Data Object Exchange for PCIe Endpoint
+ *	PCIe r7.0, sec 6.30 DOE
+ *
+ * Copyright (C) 2026 Texas Instruments Incorporated - https://www.ti.com
+ *	Aksh Garg <a-garg7@ti.com>
+ *	Siddharth Vadapalli <s-vadapalli@ti.com>
+ */
+
+#define dev_fmt(fmt) "DOE EP: " fmt
+
+#include <linux/bitfield.h>
+#include <linux/device.h>
+#include <linux/pci.h>
+#include <linux/pci-epc.h>
+#include <linux/pci-doe.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/xarray.h>
+
+#include "../pci.h"
+
+/* Forward declaration of discovery protocol handler */
+static int pci_ep_doe_handle_discovery(const void *request, size_t request_sz,
+				       void **response, size_t *response_sz);
+
+/**
+ * struct pci_doe_protocol - DOE protocol handler entry
+ * @vid: Vendor ID
+ * @type: Protocol type
+ * @handler: Handler function pointer
+ */
+struct pci_doe_protocol {
+	u16 vid;
+	u8 type;
+	pci_doe_protocol_handler_t handler;
+};
+
+/**
+ * struct pci_ep_doe_mb - State for a single DOE mailbox on EP
+ *
+ * This state is used to manage a single DOE mailbox capability on the
+ * endpoint side.
+ *
+ * @epc: PCI endpoint controller this mailbox belongs to
+ * @func_no: Physical function number of the function this mailbox belongs to
+ * @cap_offset: Capability offset
+ * @work_queue: Queue of work items
+ * @flags: Bit array of PCI_DOE_FLAG_* flags
+ */
+struct pci_ep_doe_mb {
+	struct pci_epc *epc;
+	u8 func_no;
+	u16 cap_offset;
+	struct workqueue_struct *work_queue;
+	unsigned long flags;
+};
+
+/**
+ * struct pci_ep_doe_task - Represents a single DOE request/response task
+ *
+ * @feat: DOE feature (vendor ID and type)
+ * @request_pl: Request payload
+ * @request_pl_sz: Size of request payload in bytes
+ * @response_pl: Response buffer
+ * @response_pl_sz: Size of response buffer in bytes
+ * @complete: Completion callback
+ * @work: Work structure for workqueue
+ * @doe_mb: DOE mailbox handling this task
+ */
+struct pci_ep_doe_task {
+	struct pci_doe_feature feat;
+	const void *request_pl;
+	size_t request_pl_sz;
+	void *response_pl;
+	size_t response_pl_sz;
+	pci_ep_doe_complete_t complete;
+
+	/* Initialized by pci_ep_doe_submit_task() */
+	struct work_struct work;
+	struct pci_ep_doe_mb *doe_mb;
+};
+
+/*
+ * Global registry of protocol handlers.
+ * When a new DOE protocol, library is added, add an entry to this array.
+ */
+static const struct pci_doe_protocol pci_doe_protocols[] = {
+	{
+		.vid = PCI_VENDOR_ID_PCI_SIG,
+		.type = PCI_DOE_FEATURE_DISCOVERY,
+		.handler = pci_ep_doe_handle_discovery,
+	},
+};
+
+/*
+ * Combines function number and capability offset into a unique lookup key
+ * for storing/retrieving DOE mailboxes in an xarray.
+ */
+#define PCI_DOE_MB_KEY(func, offset) \
+	(((unsigned long)(func) << 16) | (offset))
+#define PCI_DOE_PROTOCOL_COUNT        ARRAY_SIZE(pci_doe_protocols)
+
+/**
+ * pci_ep_doe_init() - Initialize the DOE framework for a controller in EP mode
+ * @epc: PCI endpoint controller
+ *
+ * Initialize the DOE framework data structures. This only initializes
+ * the xarray that will hold the mailboxes.
+ *
+ * RETURNS: 0 on success, -errno on failure
+ */
+int pci_ep_doe_init(struct pci_epc *epc)
+{
+	if (!epc)
+		return -EINVAL;
+
+	xa_init(&epc->doe_mbs);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_ep_doe_init);
+
+/**
+ * pci_ep_doe_add_mailbox() - Add a DOE mailbox for a physical function
+ * @epc: PCI endpoint controller
+ * @func_no: Physical function number
+ * @cap_offset: Offset of the DOE capability
+ *
+ * Create and register a DOE mailbox for the specified physical function
+ * and capability offset.
+ *
+ * EPC core driver calls this for each DOE capability discovered in the config
+ * space of each endpoint function through an API. The API is invoked by the
+ * controller driver during initialization if DOE support is available.
+ *
+ * RETURNS: 0 on success, -errno on failure
+ */
+int pci_ep_doe_add_mailbox(struct pci_epc *epc, u8 func_no, u16 cap_offset)
+{
+	struct pci_ep_doe_mb *doe_mb;
+	unsigned long key;
+	int ret;
+
+	if (!epc)
+		return -EINVAL;
+
+	doe_mb = kzalloc_obj(*doe_mb, GFP_KERNEL);
+	if (!doe_mb)
+		return -ENOMEM;
+
+	doe_mb->epc = epc;
+	doe_mb->func_no = func_no;
+	doe_mb->cap_offset = cap_offset;
+
+	doe_mb->work_queue = alloc_ordered_workqueue("pci_ep_doe[%s:pf%d:offset%x]", 0,
+						     dev_name(&epc->dev),
+						     func_no, cap_offset);
+	if (!doe_mb->work_queue) {
+		dev_err(epc->dev.parent,
+			"[pf%d:offset%x] failed to allocate work queue\n",
+			func_no, cap_offset);
+		ret = -ENOMEM;
+		goto err_free;
+	}
+
+	/* Add to xarray with composite key */
+	key = PCI_DOE_MB_KEY(func_no, cap_offset);
+	ret = xa_insert(&epc->doe_mbs, key, doe_mb, GFP_KERNEL);
+	if (ret) {
+		dev_err(epc->dev.parent,
+			"[pf%d:offset%x] failed to insert mailbox: %d\n",
+			func_no, cap_offset, ret);
+		goto err_destroy;
+	}
+
+	dev_dbg(epc->dev.parent,
+		"DOE mailbox added: pf%d offset 0x%x\n",
+		func_no, cap_offset);
+
+	return 0;
+
+err_destroy:
+	destroy_workqueue(doe_mb->work_queue);
+err_free:
+	kfree(doe_mb);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_ep_doe_add_mailbox);
+
+/**
+ * pci_ep_doe_cancel_tasks() - Cancel all pending tasks
+ * @doe_mb: DOE mailbox
+ *
+ * Cancel all pending tasks in the mailbox. Mark the mailbox as dead
+ * so no new tasks can be submitted.
+ */
+static void pci_ep_doe_cancel_tasks(struct pci_ep_doe_mb *doe_mb)
+{
+	if (!doe_mb)
+		return;
+
+	/* Mark the mailbox as dead */
+	set_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags);
+
+	/* Stop all pending work items from starting */
+	set_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags);
+}
+
+/**
+ * pci_ep_doe_get_mailbox() - Get DOE mailbox by function and offset
+ * @epc: PCI endpoint controller
+ * @func_no: Physical function number
+ * @cap_offset: Offset of the DOE capability
+ *
+ * Internal helper to look up a DOE mailbox by its function number and
+ * capability offset.
+ *
+ * RETURNS: Pointer to the mailbox or NULL if not found
+ */
+static struct pci_ep_doe_mb *pci_ep_doe_get_mailbox(struct pci_epc *epc,
+						    u8 func_no, u16 cap_offset)
+{
+	unsigned long key;
+
+	if (!epc)
+		return NULL;
+
+	key = PCI_DOE_MB_KEY(func_no, cap_offset);
+	return xa_load(&epc->doe_mbs, key);
+}
+
+/**
+ * pci_ep_doe_find_protocol() - Find protocol handler in static array
+ * @vendor: Vendor ID
+ * @type: Protocol type
+ *
+ * Look up a protocol handler in the static protocol array by matching vendor ID
+ * and protocol type.
+ *
+ * RETURNS: Handler function pointer or NULL if not found
+ */
+static pci_doe_protocol_handler_t pci_ep_doe_find_protocol(u16 vendor, u8 type)
+{
+	int i;
+
+	/* Search static protocol array */
+	for (i = 0; i < PCI_DOE_PROTOCOL_COUNT; i++) {
+		if (pci_doe_protocols[i].vid == vendor &&
+		    pci_doe_protocols[i].type == type)
+			return pci_doe_protocols[i].handler;
+	}
+
+	return NULL;
+}
+
+/**
+ * pci_ep_doe_handle_discovery() - Handle Discovery protocol request
+ * @request: Request payload
+ * @request_sz: Request size
+ * @response: Output pointer for response buffer
+ * @response_sz: Output pointer for response size
+ *
+ * Handle the DOE Discovery protocol. The request contains an index specifying
+ * which protocol to query. This function creates a response containing the
+ * vendor ID and protocol type for the requested index, along with the next
+ * index value for further discovery:
+ *
+ * - next_index = 0: Signals this is the last protocol supported
+ * - next_index = n (non-zero): Signals more protocols available,
+ *   query index n next
+ *
+ * RETURNS: 0 on success, -errno on failure
+ */
+static int pci_ep_doe_handle_discovery(const void *request, size_t request_sz,
+				       void **response, size_t *response_sz)
+{
+	struct pci_doe_protocol protocol;
+	u8 requested_index, next_index;
+	u32 *response_pl;
+	u32 request_pl;
+	u16 vendor;
+	u8 type;
+
+	if (request_sz != sizeof(u32))
+		return -EINVAL;
+
+	request_pl = *(u32 *)request;
+	requested_index = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX, request_pl);
+
+	if (requested_index >= PCI_DOE_PROTOCOL_COUNT)
+		return -EINVAL;
+
+	/* Get protocol from array at requested_index */
+	protocol = pci_doe_protocols[requested_index];
+	vendor = protocol.vid;
+	type = protocol.type;
+
+	/* Calculate next index */
+	next_index = (requested_index + 1 < PCI_DOE_PROTOCOL_COUNT) ? requested_index + 1 : 0;
+
+	response_pl = kzalloc_obj(*response_pl, GFP_KERNEL);
+	if (!response_pl)
+		return -ENOMEM;
+
+	/* Build response */
+	*response_pl = FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, vendor) |
+		       FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_RSP_3_TYPE, type) |
+		       FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX, next_index);
+
+	*response = response_pl;
+	*response_sz = sizeof(*response_pl);
+
+	return 0;
+}
+
+static void signal_task_complete(struct pci_ep_doe_task *task, int status)
+{
+	kfree(task->request_pl);
+	task->complete(task->doe_mb->func_no, task->doe_mb->cap_offset, status,
+		       task->feat.vid, task->feat.type,
+		       task->response_pl, task->response_pl_sz);
+	kfree(task);
+}
+
+/**
+ * doe_ep_task_work() - Work function for processing DOE EP tasks
+ * @work: Work structure
+ *
+ * Process a DOE request by calling the appropriate protocol handler.
+ */
+static void doe_ep_task_work(struct work_struct *work)
+{
+	struct pci_ep_doe_task *task = container_of(work, struct pci_ep_doe_task,
+						    work);
+	struct pci_ep_doe_mb *doe_mb = task->doe_mb;
+	pci_doe_protocol_handler_t handler;
+	int rc;
+
+	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags)) {
+		signal_task_complete(task, -EIO);
+		return;
+	}
+
+	/* Check if request was aborted */
+	if (test_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags)) {
+		signal_task_complete(task, -ECANCELED);
+		return;
+	}
+
+	/* Find protocol handler in the array */
+	handler = pci_ep_doe_find_protocol(task->feat.vid, task->feat.type);
+	if (!handler) {
+		dev_warn(doe_mb->epc->dev.parent,
+			 "[%d:%x] Unsupported protocol VID=%04x TYPE=%02x\n",
+			 doe_mb->func_no, doe_mb->cap_offset,
+			 task->feat.vid, task->feat.type);
+		signal_task_complete(task, -EOPNOTSUPP);
+		return;
+	}
+
+	/* Call protocol handler */
+	rc = handler(task->request_pl, task->request_pl_sz,
+		     &task->response_pl, &task->response_pl_sz);
+
+	signal_task_complete(task, rc);
+}
+
+/**
+ * pci_ep_doe_submit_task() - Submit a task to be processed
+ * @doe_mb: DOE mailbox
+ * @task: Task to submit
+ *
+ * Submit a DOE task to the workqueue for asynchronous processing.
+ *
+ * RETURNS: 0 on success, -errno on failure
+ */
+static int pci_ep_doe_submit_task(struct pci_ep_doe_mb *doe_mb,
+				  struct pci_ep_doe_task *task)
+{
+	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags))
+		return -EIO;
+
+	task->doe_mb = doe_mb;
+	INIT_WORK(&task->work, doe_ep_task_work);
+	queue_work(doe_mb->work_queue, &task->work);
+	return 0;
+}
+
+/**
+ * pci_ep_doe_process_request() - Process DOE request on endpoint
+ * @epc: PCI endpoint controller
+ * @func_no: Physical function number
+ * @cap_offset: DOE capability offset
+ * @vendor: Vendor ID from request header
+ * @type: Protocol type from request header
+ * @request: Request payload in CPU-native format
+ * @request_sz: Size of request payload (bytes)
+ * @complete: Callback to invoke upon completion
+ *
+ * Asynchronously process a DOE request received on the endpoint. The request
+ * payload should not include the DOE header (vendor/type/length). The protocol
+ * handler will allocate the response buffer, which the caller (controller driver)
+ * must free after use.
+ *
+ * This function returns immediately after queuing the request. The completion
+ * callback will be invoked asynchronously from workqueue context once the
+ * request is processed. The callback receives the function number and capability
+ * offset to identify the mailbox, along with a status code (0 on success, -errno
+ * on failure), and other required arguments.
+ *
+ * As per DOE specification, a mailbox processes one request at a time.
+ * Therefore, this function will never be called concurrently for the same
+ * mailbox by different callers.
+ *
+ * The caller is responsible for the conversion of the received DOE request
+ * with le32_to_cpu() before calling this function.
+ * Similarly, it is responsible for converting the response payload with
+ * cpu_to_le32() before sending it back over the DOE mailbox.
+ *
+ * The caller is also responsible for ensuring that the request size
+ * is within the limits defined by PCI_DOE_MAX_LENGTH.
+ *
+ * RETURNS: 0 if the request was successfully queued, -errno on failure
+ */
+int pci_ep_doe_process_request(struct pci_epc *epc, u8 func_no, u16 cap_offset,
+			       u16 vendor, u8 type, const void *request, size_t request_sz,
+			       pci_ep_doe_complete_t complete)
+{
+	struct pci_ep_doe_mb *doe_mb;
+	struct pci_ep_doe_task *task;
+	int rc;
+
+	doe_mb = pci_ep_doe_get_mailbox(epc, func_no, cap_offset);
+	if (!doe_mb) {
+		kfree(request);
+		return -ENODEV;
+	}
+
+	task = kzalloc_obj(*task, GFP_KERNEL);
+	if (!task) {
+		kfree(request);
+		return -ENOMEM;
+	}
+
+	task->feat.vid = vendor;
+	task->feat.type = type;
+	task->request_pl = request;
+	task->request_pl_sz = request_sz;
+	task->response_pl = NULL;
+	task->response_pl_sz = 0;
+	task->complete = complete;
+
+	rc = pci_ep_doe_submit_task(doe_mb, task);
+	if (rc) {
+		kfree(request);
+		kfree(task);
+		return rc;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_ep_doe_process_request);
+
+/**
+ * pci_ep_doe_abort() - Abort DOE operations on a mailbox
+ * @epc: PCI endpoint controller
+ * @func_no: Physical function number
+ * @cap_offset: DOE capability offset
+ *
+ * Abort all queued and wait for in-flight DOE operations to complete for the
+ * specified mailbox. This function is called by the EP controller driver
+ * when the RC sets the ABORT bit in the DOE Control register.
+ *
+ * The function will:
+ *
+ * - Set CANCEL flag to prevent new requests in the queue from starting
+ * - Wait for the currently executing handler to complete (cannot interrupt)
+ * - Flush the workqueue to wait for all requests to be handled appropriately
+ * - Clear CANCEL flag to prepare for new requests
+ *
+ * RETURNS: 0 on success, -errno on failure
+ */
+int pci_ep_doe_abort(struct pci_epc *epc, u8 func_no, u16 cap_offset)
+{
+	struct pci_ep_doe_mb *doe_mb;
+
+	if (!epc)
+		return -EINVAL;
+
+	doe_mb = pci_ep_doe_get_mailbox(epc, func_no, cap_offset);
+	if (!doe_mb)
+		return -ENODEV;
+
+	/* Set CANCEL flag - worker will abort queued requests */
+	set_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags);
+	flush_workqueue(doe_mb->work_queue);
+
+	/* Clear CANCEL flag - mailbox ready for new requests */
+	clear_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags);
+
+	dev_dbg(epc->dev.parent,
+		"DOE mailbox aborted: PF%d offset 0x%x\n",
+		func_no, cap_offset);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_ep_doe_abort);
+
+/**
+ * pci_ep_doe_destroy_mb() - Destroy a single DOE mailbox
+ * @doe_mb: DOE mailbox to destroy
+ *
+ * Internal function to destroy a mailbox and free its resources.
+ */
+static void pci_ep_doe_destroy_mb(struct pci_ep_doe_mb *doe_mb)
+{
+	if (!doe_mb)
+		return;
+
+	pci_ep_doe_cancel_tasks(doe_mb);
+
+	if (doe_mb->work_queue)
+		destroy_workqueue(doe_mb->work_queue);
+
+	kfree(doe_mb);
+}
+
+/**
+ * pci_ep_doe_destroy() - Destroy all DOE mailboxes
+ * @epc: PCI endpoint controller
+ *
+ * Destroy all DOE mailboxes and free associated resources.
+ *
+ * The EPC core driver calls this through an API, invoked by the controller
+ * driver during controller cleanup to free all DOE resources,
+ * if DOE support is available.
+ */
+void pci_ep_doe_destroy(struct pci_epc *epc)
+{
+	struct pci_ep_doe_mb *doe_mb;
+	unsigned long index;
+
+	if (!epc)
+		return;
+
+	xa_for_each(&epc->doe_mbs, index, doe_mb)
+		pci_ep_doe_destroy_mb(doe_mb);
+
+	xa_destroy(&epc->doe_mbs);
+}
+EXPORT_SYMBOL_GPL(pci_ep_doe_destroy);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 66b7ec80f46f..456e0717dd54 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -690,6 +690,12 @@ struct pci_doe_feature {
 	u8 type;
 };
 
+struct pci_epc;
+
+typedef void (*pci_ep_doe_complete_t)(u8 func_no, u16 cap_offset, int status,
+				      u16 vendor, u8 type, void *response_pl,
+				      size_t response_pl_sz);
+
 #ifdef CONFIG_PCI_DOE
 void pci_doe_init(struct pci_dev *pdev);
 void pci_doe_destroy(struct pci_dev *pdev);
@@ -700,6 +706,38 @@ static inline void pci_doe_destroy(struct pci_dev *pdev) { }
 static inline void pci_doe_disconnected(struct pci_dev *pdev) { }
 #endif
 
+#ifdef CONFIG_PCI_ENDPOINT_DOE
+int pci_ep_doe_init(struct pci_epc *epc);
+int pci_ep_doe_add_mailbox(struct pci_epc *epc, u8 func_no, u16 cap_offset);
+int pci_ep_doe_process_request(struct pci_epc *epc, u8 func_no, u16 cap_offset,
+			       u16 vendor, u8 type, const void *request,
+			       size_t request_sz, pci_ep_doe_complete_t complete);
+int pci_ep_doe_abort(struct pci_epc *epc, u8 func_no, u16 cap_offset);
+void pci_ep_doe_destroy(struct pci_epc *epc);
+#else
+static inline int pci_ep_doe_init(struct pci_epc *epc) { return -EOPNOTSUPP; }
+static inline int pci_ep_doe_add_mailbox(struct pci_epc *epc, u8 func_no,
+					 u16 cap_offset)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int pci_ep_doe_process_request(struct pci_epc *epc, u8 func_no,
+					     u16 cap_offset, u16 vendor, u8 type,
+					     const void *request, size_t request_sz,
+					     pci_ep_doe_complete_t complete)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int pci_ep_doe_abort(struct pci_epc *epc, u8 func_no, u16 cap_offset)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void pci_ep_doe_destroy(struct pci_epc *epc) { }
+#endif
+
 #ifdef CONFIG_PCI_NPEM
 void pci_npem_create(struct pci_dev *dev);
 void pci_npem_remove(struct pci_dev *dev);
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index abb9b7ae8029..c46e42f3ce78 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -22,6 +22,11 @@ struct pci_doe_mb;
 /* Max data object length is 2^18 dwords */
 #define PCI_DOE_MAX_LENGTH		(1 << 18)
 
+typedef int (*pci_doe_protocol_handler_t)(const void *request,
+					  size_t request_sz,
+					  void **response,
+					  size_t *response_sz);
+
 struct pci_doe_mb *pci_find_doe_mailbox(struct pci_dev *pdev, u16 vendor,
 					u8 type);
 
diff --git a/include/linux/pci-epc.h b/include/linux/pci-epc.h
index c021c7af175f..cfe74585be4c 100644
--- a/include/linux/pci-epc.h
+++ b/include/linux/pci-epc.h
@@ -182,6 +182,9 @@ struct pci_epc {
 	unsigned long			function_num_map;
 	int				domain_nr;
 	bool				init_complete;
+#ifdef CONFIG_PCI_ENDPOINT_DOE
+	struct xarray			doe_mbs;
+#endif
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related

* [PATCH v2 4/4] Documentation: PCI: Add documentation for DOE endpoint support
From: Aksh Garg @ 2026-04-01  7:30 UTC (permalink / raw)
  To: linux-pci, linux-doc, mani, kwilczynski, bhelgaas, corbet, kishon,
	skhan, lukas, cassel, alistair
  Cc: linux-arm-kernel, linux-kernel, s-vadapalli, danishanwar, srk,
	a-garg7
In-Reply-To: <20260401073022.215805-1-a-garg7@ti.com>

Document the architecture and implementation details for the Data Object
Exchange (DOE) framework for PCIe Endpoint devices.

Co-developed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Aksh Garg <a-garg7@ti.com>
---

Changes since v1:
- Squashed the patches [1] and [2], and moved the documentation file
  to Documentation/PCI/endpoint/pci-endpoint-doe.rst to match the existing
  naming scheme, as suggested by Niklas Cassel
- Updated the documentation as per the design and implementaion changes
  made to previous patches in this series:
  * Updated for static protocol array instead of dynamic registration
  * Documented asynchronous callback model
  * Updated request/response flow with new callback signature
  * Updated memory ownership: DOE core frees request, driver frees response
  * Updated initialization and cleanup sections for new APIs

v1: [1] https://lore.kernel.org/all/20260213123603.420941-2-a-garg7@ti.com/
    [2] https://lore.kernel.org/all/20260213123603.420941-5-a-garg7@ti.com/

 Documentation/PCI/endpoint/index.rst          |   1 +
 .../PCI/endpoint/pci-endpoint-doe.rst         | 318 ++++++++++++++++++
 2 files changed, 319 insertions(+)
 create mode 100644 Documentation/PCI/endpoint/pci-endpoint-doe.rst

diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst
index dd1f62e731c9..7c03d5abd2ef 100644
--- a/Documentation/PCI/endpoint/index.rst
+++ b/Documentation/PCI/endpoint/index.rst
@@ -9,6 +9,7 @@ PCI Endpoint Framework
 
    pci-endpoint
    pci-endpoint-cfs
+   pci-endpoint-doe
    pci-test-function
    pci-test-howto
    pci-ntb-function
diff --git a/Documentation/PCI/endpoint/pci-endpoint-doe.rst b/Documentation/PCI/endpoint/pci-endpoint-doe.rst
new file mode 100644
index 000000000000..03b7a69516f3
--- /dev/null
+++ b/Documentation/PCI/endpoint/pci-endpoint-doe.rst
@@ -0,0 +1,318 @@
+.. SPDX-License-Identifier: GPL-2.0-only or MIT
+
+.. include:: <isonum.txt>
+
+=============================================
+Data Object Exchange (DOE) for PCIe Endpoint
+=============================================
+
+:Copyright: |copy| 2026 Texas Instruments Incorporated
+:Author: Aksh Garg <a-garg7@ti.com>
+:Co-Author: Siddharth Vadapalli <s-vadapalli@ti.com>
+
+Overview
+========
+
+DOE (Data Object Exchange) is a standard PCIe extended capability feature
+introduced in the Data Object Exchange (DOE) ECN for PCIe r5.0. It is an optional
+mechanism for system firmware/software running on root complex (host) to perform
+:ref:`data object <data-object-term>` exchanges with an endpoint function. Each
+data object is uniquely identified by the Vendor ID of the vendor publishing the
+data object definition and a Data Object Type value assigned by that vendor.
+
+Think of DOE as a sophisticated mailbox system built into PCIe. The root complex
+can send structured requests to the endpoint device through DOE mailboxes, and
+the endpoint device responds with appropriate data. DOE mailboxes are implemented
+as PCIe Extended Capabilities in endpoint devices, allowing multiple mailboxes
+per function, each potentially supporting different data object protocols.
+
+The DOE support for root complex devices has already been implemented in
+``drivers/pci/doe.c``.
+
+How DOE Works
+=============
+
+The DOE mailbox operates through a simple request-response model:
+
+1. **Host sends request**: The root complex writes a data object (vendor ID, type,
+   and payload) to the DOE write mailbox register (one DWORD at a time) of the
+   endpoint function's config space and sets the GO bit in the DOE Status register
+   to indicate that a request is ready for processing.
+2. **Endpoint processes**: The endpoint function reads the request from DOE write
+   mailbox register, sets the BUSY bit in the DOE Status register, identifies the
+   protocol of the data object, and executes the appropriate handler.
+3. **Endpoint responds**: The endpoint function writes the response data object to the
+   DOE read mailbox register (one DWORD at a time), and sets the READY bit in the DOE
+   Status register to indicate that the response is ready. If an error occurs during
+   request processing (such as unsupported protocol or handler failure), the endpoint
+   sets the ERROR bit in the DOE Status register instead of the READY bit.
+4. **Host reads response**: The root complex retrieves the response data from the DOE read
+   mailbox register once the READY bit is set in the DOE Status register, and then writes
+   any value to this register to indicate a successful read. If the ERROR bit was set,
+   the root complex discards the response and performs error handling as needed.
+
+Each mailbox operates independently and can handle one transaction at a time. The
+DOE specification supports data objects of size up to 256KB (2\ :sup:`18` dwords).
+
+For complete DOE capability details, refer to `PCI Express Base Specification Revision 7.0,
+Section 6.30 - Data Object Exchange (DOE)`.
+
+Key Terminologies
+=================
+
+.. _data-object-term:
+
+**Data Object**
+  A structured, vendor-defined, or standard-defined message exchanged between
+  root complex and endpoint function via DOE capability registers in configuration
+  space of the function.
+
+**Mailbox**
+  A DOE capability on the endpoint device, where each physical function can have
+  multiple mailboxes.
+
+**Protocol**
+  A specific type of DOE communication data object identified by a Vendor ID and Type.
+
+**Handler**
+  A function that processes DOE requests of a specific protocol and generates responses.
+
+Architecture of DOE Implementation for Endpoint
+===============================================
+
+.. code-block:: text
+
+       +------------------+
+       |                  |
+       |   Root Complex   |
+       |                  |
+       +--------^---------+
+                |
+                | Config space access
+                |   over PCIe link
+                |
+     +----------v-----------+
+     |                      |
+     |    PCIe Controller   |
+     |      as Endpoint     |
+     |                      |
+     |  +-----------------+ |
+     |  |   DOE Mailbox   | |
+     |  +-------^---------+ |
+     +----------|-----------+
+    +-----------|---------------------------------------------------------------+
+    |           |                                       +--------------------+  |
+    | +---------v--------+           Allocate           |  +--------------+  |  |
+    | |                  |-------------------------------->|   Request    |  |  |
+    | |   EP Controller  |                            +--->|    Buffer    |  |  |
+    | |      Driver      |             Free           | |  +--------------+  |  |
+    | |                  |--------------------------+ | |                    |  |
+    | +--------^---------+                          | | |                    |  |
+    |          |                                    | | |                    |  |
+    |          |                                    | | |                    |  |
+    |          | pci_ep_doe_process_request()       | | |                    |  |
+    |          |                                    | | |                    |  |
+    | +--------v---------+             Free         | | |                    |  |
+    | |                  |----------------------------+ |         DDR        |  |
+    | |    DOE EP Core   |<----+                    |   |                    |  |
+    | |    (doe-ep.c)    |     |     Discovery      |   |                    |  |
+    | |                  |-----+  Protocol Handler  |   |                    |  |
+    | +--------^---------+                          |   |                    |  |
+    |          |                                    |   |                    |  |
+    |          | protocol_handler()                 |   |                    |  |
+    |          |                                    |   |                    |  |
+    | +--------v---------+                          |   |                    |  |
+    | |                  |                          |   |  +--------------+  |  |
+    | | Protocol Handler |                          +----->|   Response   |  |  |
+    | |      Module      |-------------------------------->|    Buffer    |  |  |
+    | | (CMA/SPDM/Other) |           Allocate           |  +--------------+  |  |
+    | |                  |                              |                    |  |
+    | +------------------+                              |                    |  |
+    |                                                   +--------------------+  |
+    +---------------------------------------------------------------------------+
+
+Initialization and Cleanup
+--------------------------
+
+**Framework Initialization and DOE Setup**
+
+The EPC core provides the ``pci_epc_doe_setup(epc)`` API for centralized DOE
+mailbox discovery and registration. The controller driver calls this API during
+its probe sequence if DOE is supported.
+
+This API performs the following steps:
+
+1. Calls ``pci_ep_doe_init(epc)``, which initializes the xarray data structure
+   (a resizable array data structure defined in linux) named ``doe_mbs`` that
+   stores metadata of DOE mailboxes for the controller in ``struct pci_epc``.
+2. Discovers all DOE capabilities in the endpoint function's configuration space
+   for each function. For each discovered DOE capability, calls
+   ``pci_ep_doe_add_mailbox(epc, func_no, cap_offset)`` to register the mailbox.
+
+Each DOE mailbox structure created by ``pci_ep_doe_add_mailbox()`` gets an
+ordered workqueue allocated for processing DOE requests sequentially for that
+mailbox, enabling concurrent request handling across different mailboxes. Each
+mailbox is uniquely identified by the combination of physical function number
+and capability offset for that controller.
+
+**Cleanup**
+
+The EPC core provides the ``pci_epc_doe_destroy(epc)`` API for centralized DOE
+cleanup. The controller driver calls this API during its remove sequence
+if DOE is supported.
+
+This API calls ``pci_ep_doe_destroy(epc)``, which destroys all registered
+mailboxes, cancels any pending tasks, flushes and destroys the workqueues,
+and frees all memory allocated to the mailboxes.
+
+Protocol Handler Support
+------------------------
+
+Protocol implementations (such as CMA, SPDM, or vendor-specific protocols) are
+supported through a static array of protocol handlers.
+
+When a new DOE protocol library is introduced, its handler function is added to
+the static ``pci_doe_protocols`` array in ``drivers/pci/endpoint/pci-ep-doe.c``.
+The discovery protocol (VID = 0x0001 (PCI-SIG vendor ID), Type = 0x00 (discovery
+protocol)) is included in this static array and handled internally by the
+DOE EP core.
+
+Request Handling
+----------------
+
+The complete flow of a DOE request from the root complex to the response:
+
+**Step 1: Root Complex → EP Controller Driver**
+
+The root complex writes a DOE request (Vendor ID, Type, and Payload) to the
+DOE write mailbox register in the endpoint function's configuration space and sets
+the GO bit in the DOE Control register, indicating that the request is ready for
+processing.
+
+**Step 2: EP Controller Driver → DOE EP Core**
+
+The controller driver reads the request header to determine the data object
+length. Based on this length field, it allocates a request buffer in memory
+(DDR) of the appropriate size. The driver then reads the complete request
+payload from the DOE write mailbox register and converts the data from
+little-endian format (the format followed in the PCIe transactions over the
+link) to CPU-native format using ``le32_to_cpu()``. The driver defines a
+completion callback function with signature ``void (*complete)(u8 func_no,
+u16 cap_offset, int status, u16 vendor, u8 type, void *response_pl,
+size_t response_pl_sz)`` to be invoked when the request processing completes.
+The driver then calls ``pci_ep_doe_process_request(epc, func_no, cap_offset,
+vendor, type, request, request_sz, complete)`` to hand off the request to the
+DOE EP core. This function returns immediately after queuing the work
+(without blocking), and the driver sets the BUSY bit in the DOE Status register.
+
+**Step 3: DOE EP Core Processing**
+
+The DOE EP core creates a task structure and submits it to the mailbox's ordered
+workqueue. This ensures that requests for each mailbox are processed
+sequentially, one at a time, as required by the DOE specification. It looks up
+the protocol handler based on the Vendor ID and Type from the request header,
+and executes the handler function.
+
+**Step 4: Protocol Handler Execution**
+
+The workqueue executes the task by calling the registered protocol handler:
+``handler(request, request_sz, &response, &response_sz)``. The handler processes
+the request, allocates a response buffer in memory (DDR), builds the response
+data, and returns the response pointer and size. For the discovery protocol,
+the DOE EP core handles this directly without invoking an external handler.
+
+**Step 5: DOE EP Core → EP Controller Driver**
+
+After the protocol handler completes, the DOE EP core frees the request buffer,
+and invokes the completion callback provided by the controller driver asynchronously.
+The callback receives the function number, capability offset (to identify the mailbox),
+status code indicating the result of request processing, vendor ID and type of the data
+object, the response buffer, and its size.
+
+**Step 6: EP Controller Driver → Root Complex**
+
+The controller driver converts the response from CPU-native format to
+little-endian format using ``cpu_to_le32()``, writes the response to DOE read
+mailbox register, and sets the READY bit in the DOE Status register. The root
+complex then reads the response from the read mailbox register. Finally, the controller
+driver frees the response buffer (which the handler allocated).
+
+Asynchronous Request Processing
+-------------------------------
+
+The DOE-EP framework implements asynchronous request processing because an
+endpoint function can have multiple instances of DOE mailboxes, and requests may
+be interleaved across these mailboxes. Request processing of one mailbox should
+not result in blocking request processing of other mailboxes. Hence, requests
+on each mailbox need to be handled in parallel for optimization.
+
+For the EP controller driver to handle requests on multiple mailboxes in
+parallel, ``pci_ep_doe_process_request()`` must be asynchronous. The function
+returns immediately after submitting the request to the mailbox's workqueue,
+without waiting for the request to complete. A completion callback provided by
+the controller driver is invoked asynchronously when request processing
+finishes. This asynchronous design enables concurrent processing of requests
+across different mailboxes.
+
+Abort Handling
+--------------
+
+The DOE specification allows the root complex to abort ongoing DOE operations
+by setting the ABORT bit in the DOE Control register.
+
+**Trigger**
+
+When the root complex sets the ABORT bit, the EP controller driver detects this
+condition (typically in an interrupt handler or register polling routine). The
+action taken depends on the timing of the abort:
+
+- **ABORT during request transfer**: If the ABORT bit is set while the root complex
+  is still transferring the request to the mailbox registers, the controller driver
+  discards the request and no call to ``pci_ep_doe_abort()`` is needed.
+
+- **ABORT after request submission**: If the ABORT bit is set after the request
+  has been fully received and submitted to the DOE EP core via
+  ``pci_ep_doe_process_request()``, the controller driver must call
+  ``pci_ep_doe_abort(epc, func_no, cap_offset)`` for the affected mailbox to
+  perform abort sequence in the DOE EP core.
+
+**Abort Sequence**
+
+The abort function performs the following actions:
+
+1. Sets the CANCEL flag on the mailbox to prevent queued requests from starting
+2. Flushes the workqueue to wait for any currently executing handler to complete
+   (handlers cannot be interrupted mid-execution)
+3. Clears the CANCEL flag to allow the mailbox to accept new requests
+
+Queued requests that have not started execution will be aborted with an error
+status. The currently executing request will complete normally, and the controller
+will reject the response if it arrives after the abort sequence has been triggered.
+
+.. note::
+   Independent of when the ABORT bit is triggered, the controller driver must
+   clear the ERROR, BUSY, and READY bits in the DOE Status register after
+   completing the abort operation to reset the mailbox to an idle state.
+
+Error Handling
+--------------
+
+Errors can occur during DOE request processing for various reasons, such as
+unsupported protocols, handler failures, or memory allocation failures.
+
+**Error Detection**
+
+When an error occurs during DOE request processing, the DOE EP core propagates this error
+back to the controller driver either through the ``pci_ep_doe_process_request()`` return value,
+or the status code passed to the completion callback.
+
+**Error Response**
+
+When the controller driver receives an error code, it sets the ERROR bit in the DOE Status
+register instead of writing a response to the read mailbox register, and frees the buffers.
+
+API Reference
+=============
+
+.. kernel-doc:: drivers/pci/endpoint/pci-ep-doe.c
+   :export:
-- 
2.34.1


^ permalink raw reply related

* [PATCH v2 3/4] PCI: endpoint: Add API for DOE initialization and setup in EPC core
From: Aksh Garg @ 2026-04-01  7:30 UTC (permalink / raw)
  To: linux-pci, linux-doc, mani, kwilczynski, bhelgaas, corbet, kishon,
	skhan, lukas, cassel, alistair
  Cc: linux-arm-kernel, linux-kernel, s-vadapalli, danishanwar, srk,
	a-garg7
In-Reply-To: <20260401073022.215805-1-a-garg7@ti.com>

Add pci_epc_setup_doe() API in EPC core driver to initialize and setup
the DOE framework for an endpoint controller. The API discovers the DOE
capabilities (extended capability ID 0x2E), and registers each discovered
DOE mailbox for all the functions in the endpoint controller. This API
should be invoked by the controller driver during probe based on the
doe_capable feature.

Add pci_epc_destroy_doe() API in EPC core driver for cleanup of DOE
resources, which should be invoked by the controller driver during
controller cleanup based on the doe_capable feature.

Co-developed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Aksh Garg <a-garg7@ti.com>
---

Changes since v1:
- New patch added to v2 (not in v1)

This patch is introduced based on the feedback provided by Manivannan
Sadhasivam at [1].

[1]: https://lore.kernel.org/all/p57x6jleaim5w7t2k3v7tioujnaxuovfpj5euop5ogefvw23se@y5fw3che5p5d/

 drivers/pci/endpoint/pci-epc-core.c | 71 +++++++++++++++++++++++++++++
 include/linux/pci-epc.h             | 21 +++++++++
 2 files changed, 92 insertions(+)

diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
index e546b3dbb240..14f77dd0877b 100644
--- a/drivers/pci/endpoint/pci-epc-core.c
+++ b/drivers/pci/endpoint/pci-epc-core.c
@@ -14,6 +14,8 @@
 #include <linux/pci-epf.h>
 #include <linux/pci-ep-cfs.h>
 
+#include "../pci.h"
+
 static const struct class pci_epc_class = {
 	.name = "pci_epc",
 };
@@ -547,6 +549,75 @@ void pci_epc_mem_unmap(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
 }
 EXPORT_SYMBOL_GPL(pci_epc_mem_unmap);
 
+/**
+ * pci_epc_doe_setup() - Setup and discover DOE mailboxes for all functions
+ * @epc: the EPC device on which DOE mailboxes has to be setup
+ *
+ * Discover DOE (Data Object Exchange) capabilities for all physical functions
+ * in the endpoint controller and register DOE mailboxes.
+ *
+ * This API should be called by the controller driver during initialization
+ * if DOE support is available (indicated by doe_capable in pci_epc_features).
+ *
+ * RETURNS: 0 on success, -errno on failure
+ */
+int pci_epc_doe_setup(struct pci_epc *epc)
+{
+	u16 cap_offset = 0;
+	u8 func_no;
+	int ret;
+
+	if (!epc || !epc->ops || !epc->ops->find_ext_capability)
+		return -EINVAL;
+
+	/* Initialize DOE framework for this controller */
+	ret = pci_ep_doe_init(epc);
+	if (ret)
+		return ret;
+
+	/* Discover DOE capabilities for all functions */
+	for (func_no = 0; func_no < epc->max_functions; func_no++) {
+		while ((cap_offset = epc->ops->find_ext_capability(epc, func_no, 0,
+								   cap_offset,
+								   PCI_EXT_CAP_ID_DOE))) {
+			/* Register this DOE mailbox */
+			ret = pci_ep_doe_add_mailbox(epc, func_no, cap_offset);
+			if (ret) {
+				dev_err(&epc->dev,
+					"[pf%d:offset %x] failed to add DOE mailbox\n",
+					func_no, cap_offset);
+			}
+		}
+	}
+
+	dev_dbg(&epc->dev, "DOE mailboxes setup complete\n");
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_epc_doe_setup);
+
+/**
+ * pci_epc_doe_destroy() - Destroy and cleanup DOE mailboxes
+ * @epc: the EPC device on which DOE mailboxes has to be destroyed
+ *
+ * Destroy all DOE mailboxes registered on this endpoint controller and
+ * free associated resources.
+ *
+ * This API should be called by the controller driver during controller cleanup
+ * if DOE support is available (indicated by doe_capable in pci_epc_features).
+ *
+ * RETURNS: 0 on success, -errno on failure
+ */
+int pci_epc_doe_destroy(struct pci_epc *epc)
+{
+	if (!epc)
+		return -EINVAL;
+
+	pci_ep_doe_destroy(epc);
+	dev_dbg(&epc->dev, "DOE mailboxes destroyed\n");
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_epc_doe_destroy);
+
 /**
  * pci_epc_clear_bar() - reset the BAR
  * @epc: the EPC device for which the BAR has to be cleared
diff --git a/include/linux/pci-epc.h b/include/linux/pci-epc.h
index cfe74585be4c..07fa41a8e466 100644
--- a/include/linux/pci-epc.h
+++ b/include/linux/pci-epc.h
@@ -84,6 +84,8 @@ struct pci_epc_map {
  * @start: ops to start the PCI link
  * @stop: ops to stop the PCI link
  * @get_features: ops to get the features supported by the EPC
+ * @find_ext_capability: ops to find extended capability offset for a function
+ *			 in endpoint controller
  * @owner: the module owner containing the ops
  */
 struct pci_epc_ops {
@@ -115,6 +117,8 @@ struct pci_epc_ops {
 	void	(*stop)(struct pci_epc *epc);
 	const struct pci_epc_features* (*get_features)(struct pci_epc *epc,
 						       u8 func_no, u8 vfunc_no);
+	u16	(*find_ext_capability)(struct pci_epc *epc, u8 func_no,
+				       u8 vfunc_no, u16 start, u8 cap);
 	struct module *owner;
 };
 
@@ -236,6 +240,7 @@ struct pci_epc_bar_desc {
  * @msi_capable: indicate if the endpoint function has MSI capability
  * @msix_capable: indicate if the endpoint function has MSI-X capability
  * @intx_capable: indicate if the endpoint can raise INTx interrupts
+ * @doe_capable: indicate if the endpoint function has DOE capability
  * @bar: array specifying the hardware description for each BAR
  * @align: alignment size required for BAR buffer allocation
  */
@@ -246,6 +251,7 @@ struct pci_epc_features {
 	unsigned int	msi_capable : 1;
 	unsigned int	msix_capable : 1;
 	unsigned int	intx_capable : 1;
+	unsigned int	doe_capable : 1;
 	struct	pci_epc_bar_desc bar[PCI_STD_NUM_BARS];
 	size_t	align;
 };
@@ -334,6 +340,21 @@ int pci_epc_mem_map(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
 void pci_epc_mem_unmap(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
 		       struct pci_epc_map *map);
 
+#ifdef CONFIG_PCI_ENDPOINT_DOE
+int pci_epc_doe_setup(struct pci_epc *epc);
+int pci_epc_doe_destroy(struct pci_epc *epc);
+#else
+static inline int pci_epc_doe_setup(struct pci_epc *epc)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int pci_epc_doe_destroy(struct pci_epc *epc)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 #else
 static inline void pci_epc_init_notify(struct pci_epc *epc)
 {
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH net-next v03 1/6] hinic3: Add ethtool queue ops
From: Mohsin Bashir @ 2026-04-01  7:52 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <17b3bbb0c917ee09ddd7a164cdefca6eb63793ec.1774940117.git.zhuyikai1@h-partners.com>


> +static int hinic3_set_ringparam(struct net_device *netdev,
> +				struct ethtool_ringparam *ring,
> +				struct kernel_ethtool_ringparam *kernel_ring,
> +				struct netlink_ext_ack *extack)
> +{
> +	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> +	struct hinic3_dyna_txrxq_params q_params = {};
> +	u32 new_sq_depth, new_rq_depth;
> +	int err;
> +
> +	err = hinic3_check_ringparam_valid(netdev, ring);
> +	if (err)
> +		return err;
> +
> +	new_sq_depth = 1U << ilog2(ring->tx_pending);
> +	new_rq_depth = 1U << ilog2(ring->rx_pending);
> +	if (new_sq_depth == nic_dev->q_params.sq_depth &&
> +	    new_rq_depth == nic_dev->q_params.rq_depth)
> +		return 0;
> +

So non power of 2 values would trimmed to the nearest power of 2 even 
when the driver support these values. If such values are not supported, 
should these be rejected here? or if we want to force the nearest power 
of 2, should we use NL_SET_ERR_MSG to inform the user about rounding?

> +int
> +hinic3_change_channel_settings(struct net_device *netdev,
> +			       struct hinic3_dyna_txrxq_params *trxq_params)
> +{
> +	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> +	struct hinic3_dyna_qp_params new_qp_params = {};
> +	struct hinic3_dyna_qp_params cur_qp_params = {};
> +	bool need_teardown = false;
> +	unsigned long flags;
> +	int err;
> +
> +	mutex_lock(&nic_dev->channel_cfg_lock);
> +
> +	hinic3_config_num_qps(netdev, trxq_params);
> +
> +	err = hinic3_alloc_channel_resources(netdev, &new_qp_params,
> +					     trxq_params);
> +	if (err) {
> +		netdev_err(netdev, "Failed to alloc channel resources\n");
> +		mutex_unlock(&nic_dev->channel_cfg_lock);
> +		return err;
> +	}
> +
> +	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> +	if (!test_and_set_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags))
> +		need_teardown = true;
> +	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> +	if (need_teardown) {
> +		hinic3_vport_down(netdev);
> +		hinic3_close_channel(netdev);
> +		hinic3_uninit_qps(nic_dev, &cur_qp_params);
> +		hinic3_free_channel_resources(netdev, &cur_qp_params,
> +					      &nic_dev->q_params);
> +	}
> +
> +	if (nic_dev->num_qp_irq > trxq_params->num_qps)
> +		hinic3_qp_irq_change(netdev, trxq_params->num_qps);
> +
> +	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> +	nic_dev->q_params = *trxq_params;

[1] Here we are overwriting the old parameters
> +	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> +	hinic3_init_qps(nic_dev, &new_qp_params);
> +
> +	err = hinic3_open_channel(netdev);
> +	if (err)
> +		goto err_uninit_qps;
> +
> +	err = hinic3_vport_up(netdev);
> +	if (err)
> +		goto err_close_channel;
> +
> +	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> +	clear_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags);
> +	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> +	mutex_unlock(&nic_dev->channel_cfg_lock);
> +
> +	return 0;
> +
> +err_close_channel:
> +	hinic3_close_channel(netdev);
> +err_uninit_qps:
> +	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> +	memset(&nic_dev->q_params, 0, sizeof(nic_dev->q_params));

Looks like we are asking for trouble here. The old parameters were 
already overwritten at [1] and now we are destroying the new ones?

> +	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> +	hinic3_uninit_qps(nic_dev, &new_qp_params);
> +	hinic3_free_channel_resources(netdev, &new_qp_params, trxq_params);
> +
> +	mutex_unlock(&nic_dev->channel_cfg_lock);
> +
> +	return err;
> +}
> +

^ permalink raw reply

* Re: [PATCH net-next v03 2/6] hinic3: Add ethtool statistic ops
From: Mohsin Bashir @ 2026-04-01  7:52 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <d3256c3049445bfa9c6b98832d25eaa712422ae6.1774940117.git.zhuyikai1@h-partners.com>


> +
> +static void hinic3_get_qp_stats_strings(const struct net_device *netdev,

Any strong reason to add const here? netdev_priv() would just strip it 
anyway. No?

> +					char *p)
> +{
> +	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> +	u8 *data = p;
> +	u16 i, j;
> +
> +	for (i = 0; i < nic_dev->q_params.num_qps; i++) {
> +		for (j = 0; j < ARRAY_SIZE(hinic3_tx_queue_stats); j++)
> +			ethtool_sprintf(&data,
> +					hinic3_tx_queue_stats[j].name, i);
> +	}
> +
> +	for (i = 0; i < nic_dev->q_params.num_qps; i++) {
> +		for (j = 0; j < ARRAY_SIZE(hinic3_rx_queue_stats); j++)
> +			ethtool_sprintf(&data,
> +					hinic3_rx_queue_stats[j].name, i);
> +	}
> +}
> +


^ permalink raw reply

* Re: [PATCH net-next v03 3/6] hinic3: Add ethtool coalesce ops
From: Mohsin Bashir @ 2026-04-01  7:53 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <ffc357656b2412abf6e8de4200d289e761f3e6ea.1774940117.git.zhuyikai1@h-partners.com>


> +static int is_coalesce_exceed_limit(struct net_device *netdev,
> +				    const struct ethtool_coalesce *coal)
> +{
> +	const struct {
> +		const char *name;
> +		u32 value;
> +		u32 limit;
> +	} coalesce_limits[] = {
> +		{"rx_coalesce_usecs",
> +		 coal->rx_coalesce_usecs,
> +		 COALESCE_MAX_TIMER_CFG},
> +		{"rx_max_coalesced_frames",
> +		 coal->rx_max_coalesced_frames,
> +		 COALESCE_MAX_PENDING_LIMIT},
> +		{"rx_max_coalesced_frames_low",
> +		 coal->rx_max_coalesced_frames_low,
> +		 COALESCE_MAX_PENDING_LIMIT},
> +		{"rx_max_coalesced_frames_high",
> +		 coal->rx_max_coalesced_frames_high,
> +		 COALESCE_MAX_PENDING_LIMIT},
> +	};
> +
> +	for (int i = 0; i < ARRAY_SIZE(coalesce_limits); i++) {
> +		if (coalesce_limits[i].value > coalesce_limits[i].limit) {
> +			netdev_err(netdev, "%s out of range %d-%d\n",
> +				   coalesce_limits[i].name, 0,
> +				   coalesce_limits[i].limit);
> +			return -EOPNOTSUPP;

Since we are failing a range check, maybe -ERANGE or -EINVAL would be 
more appropriate here.

> +		}
> +	}
> +	return 0;
> +}
> +
> +static int is_coalesce_legal(struct net_device *netdev,
> +			     const struct ethtool_coalesce *coal)
> +{
> +	int err;
> +
> +	err = is_coalesce_exceed_limit(netdev, coal);
> +	if (err)
> +		return err;
> +
> +	if (coal->rx_max_coalesced_frames_low >=
> +	    coal->rx_max_coalesced_frames_high &&
> +	    coal->rx_max_coalesced_frames_high > 0) {

So this would allow non-zero low and zero high. For example, low = 10, 
high = 0. Is this expected?

> +		netdev_err(netdev, "invalid coalesce frame high %u, low %u, unit %d\n",
> +			   coal->rx_max_coalesced_frames_high,
> +			   coal->rx_max_coalesced_frames_low,
> +			   COALESCE_PENDING_LIMIT_UNIT);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	return 0;
> +}
> +
> +static void check_coalesce_align(struct net_device *netdev,
> +				 u32 item, u32 unit, const char *str)
> +{
> +	if (item % unit)
> +		netdev_warn(netdev, "%s in %d units, change to %u\n",
> +			    str, unit, item - item % unit);
> +}
> +
> +#define CHECK_COALESCE_ALIGN(member, unit) \
> +	check_coalesce_align(netdev, member, unit, #member)
> +
> +static void check_coalesce_changed(struct net_device *netdev,
> +				   u32 item, u32 unit, u32 ori_val,
> +				   const char *obj_str, const char *str)
> +{
> +	if ((item / unit) != ori_val)
> +		netdev_dbg(netdev, "Change %s from %d to %u %s\n",
> +			   str, ori_val * unit, item - item % unit, obj_str);
> +}
> +
> +#define CHECK_COALESCE_CHANGED(member, unit, ori_val, obj_str) \
> +	check_coalesce_changed(netdev, member, unit, ori_val, obj_str, #member)
> +
> +static int hinic3_set_hw_coal_param(struct net_device *netdev,
> +				    struct hinic3_intr_coal_info *intr_coal)
> +{
> +	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> +	int err;
> +	u16 i;
> +
> +	for (i = 0; i < nic_dev->max_qps; i++) {
> +		err = hinic3_set_queue_coalesce(netdev, i, intr_coal);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static int hinic3_get_coalesce(struct net_device *netdev,
> +			       struct ethtool_coalesce *coal,
> +			       struct kernel_ethtool_coalesce *kernel_coal,
> +			       struct netlink_ext_ack *extack)
> +{
> +	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> +	struct hinic3_intr_coal_info *interrupt_info;
> +
> +	interrupt_info = &nic_dev->intr_coalesce[0];
> +
> +	/* TX/RX uses the same interrupt.
> +	 * So we only declare RX ethtool_coalesce parameters.
> +	 */
> +	coal->rx_coalesce_usecs = interrupt_info->coalesce_timer_cfg *
> +				  COALESCE_TIMER_CFG_UNIT;
> +	coal->rx_max_coalesced_frames = interrupt_info->pending_limit *
> +					COALESCE_PENDING_LIMIT_UNIT;
> +
> +	coal->use_adaptive_rx_coalesce = nic_dev->adaptive_rx_coal;
> +
> +	coal->rx_max_coalesced_frames_high =
> +		interrupt_info->rx_pending_limit_high *
> +		COALESCE_PENDING_LIMIT_UNIT;
> +
> +	coal->rx_max_coalesced_frames_low =
> +		interrupt_info->rx_pending_limit_low *
> +		COALESCE_PENDING_LIMIT_UNIT;
> +
> +	return 0;
> +}
> +
> +static int hinic3_set_coalesce(struct net_device *netdev,
> +			       struct ethtool_coalesce *coal,
> +			       struct kernel_ethtool_coalesce *kernel_coal,
> +			       struct netlink_ext_ack *extack)
> +{
> +	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> +	struct hinic3_intr_coal_info *ori_intr_coal;
> +	struct hinic3_intr_coal_info intr_coal = {};
> +	char obj_str[32];
> +	int err;
> +
> +	err = is_coalesce_legal(netdev, coal);
> +	if (err)
> +		return err;
> +
> +	CHECK_COALESCE_ALIGN(coal->rx_coalesce_usecs, COALESCE_TIMER_CFG_UNIT);
> +	CHECK_COALESCE_ALIGN(coal->rx_max_coalesced_frames,
> +			     COALESCE_PENDING_LIMIT_UNIT);
> +	CHECK_COALESCE_ALIGN(coal->rx_max_coalesced_frames_high,
> +			     COALESCE_PENDING_LIMIT_UNIT);
> +	CHECK_COALESCE_ALIGN(coal->rx_max_coalesced_frames_low,
> +			     COALESCE_PENDING_LIMIT_UNIT);
> +
> +	ori_intr_coal = &nic_dev->intr_coalesce[0];
> +	snprintf(obj_str, sizeof(obj_str), "for netdev");
> +
> +	CHECK_COALESCE_CHANGED(coal->rx_coalesce_usecs, COALESCE_TIMER_CFG_UNIT,
> +			       ori_intr_coal->coalesce_timer_cfg, obj_str);
> +	CHECK_COALESCE_CHANGED(coal->rx_max_coalesced_frames,
> +			       COALESCE_PENDING_LIMIT_UNIT,
> +			       ori_intr_coal->pending_limit, obj_str);
> +	CHECK_COALESCE_CHANGED(coal->rx_max_coalesced_frames_high,
> +			       COALESCE_PENDING_LIMIT_UNIT,
> +			       ori_intr_coal->rx_pending_limit_high, obj_str);
> +	CHECK_COALESCE_CHANGED(coal->rx_max_coalesced_frames_low,
> +			       COALESCE_PENDING_LIMIT_UNIT,
> +			       ori_intr_coal->rx_pending_limit_low, obj_str);
> +
> +	intr_coal.coalesce_timer_cfg =
> +		(u8)(coal->rx_coalesce_usecs / COALESCE_TIMER_CFG_UNIT);
> +	intr_coal.pending_limit = (u8)(coal->rx_max_coalesced_frames /
> +				      COALESCE_PENDING_LIMIT_UNIT);
> +
> +	nic_dev->adaptive_rx_coal = coal->use_adaptive_rx_coalesce;
> +
> +	intr_coal.rx_pending_limit_high =
> +		(u8)(coal->rx_max_coalesced_frames_high /
> +		     COALESCE_PENDING_LIMIT_UNIT);
> +
> +	intr_coal.rx_pending_limit_low =
> +		(u8)(coal->rx_max_coalesced_frames_low /
> +		     COALESCE_PENDING_LIMIT_UNIT);
> +
> +	/* coalesce timer or pending set to zero will disable coalesce */
> +	if (!nic_dev->adaptive_rx_coal &&
> +	    (!intr_coal.coalesce_timer_cfg || !intr_coal.pending_limit))
> +		netdev_warn(netdev, "Coalesce will be disabled\n");
> +
> +	return hinic3_set_hw_coal_param(netdev, &intr_coal);
> +}
> +
>   static const struct ethtool_ops hinic3_ethtool_ops = {
> -	.supported_coalesce_params      = ETHTOOL_COALESCE_USECS |
> -					  ETHTOOL_COALESCE_PKT_RATE_RX_USECS,
> +	.supported_coalesce_params      = ETHTOOL_COALESCE_RX_USECS |
> +					  ETHTOOL_COALESCE_RX_MAX_FRAMES |
> +					  ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
> +					  ETHTOOL_COALESCE_RX_MAX_FRAMES_LOW |
> +					  ETHTOOL_COALESCE_RX_MAX_FRAMES_HIGH,

Looks like ETHTOOL_COALESCE_TX_USECS support got dropped. is it intentional?

>   	.get_link_ksettings             = hinic3_get_link_ksettings,
>   	.get_drvinfo                    = hinic3_get_drvinfo,
>   	.get_msglevel                   = hinic3_get_msglevel,
> @@ -1004,6 +1231,8 @@ static const struct ethtool_ops hinic3_ethtool_ops = {
>   	.get_eth_ctrl_stats             = hinic3_get_eth_ctrl_stats,
>   	.get_rmon_stats                 = hinic3_get_rmon_stats,
>   	.get_pause_stats                = hinic3_get_pause_stats,
> +	.get_coalesce                   = hinic3_get_coalesce,
> +	.set_coalesce                   = hinic3_set_coalesce,
>   };
>   
>   void hinic3_set_ethtool_ops(struct net_device *netdev)


^ permalink raw reply

* Re: [PATCH net-next v03 4/6] hinic3: Add ethtool rss ops
From: Mohsin Bashir @ 2026-04-01  7:53 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <a8347921a7ac11ca7e0db52381be70689b830005.1774940117.git.zhuyikai1@h-partners.com>

>   /* hilink mac group command */
> diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_rss.c b/drivers/net/ethernet/huawei/hinic3/hinic3_rss.c
> index 25db74d8c7dd..1c8aea9d8887 100644
> --- a/drivers/net/ethernet/huawei/hinic3/hinic3_rss.c
> +++ b/drivers/net/ethernet/huawei/hinic3/hinic3_rss.c
> @@ -155,7 +155,7 @@ static int hinic3_set_rss_type(struct hinic3_hwdev *hwdev,
>   				       L2NIC_CMD_SET_RSS_CTX_TBL, &msg_params);
>   
>   	if (ctx_tbl.msg_head.status == MGMT_STATUS_CMD_UNSUPPORTED) {
> -		return MGMT_STATUS_CMD_UNSUPPORTED;
> +		return -EOPNOTSUPP;

Looks like an unrelated change?

>   	} else if (err || ctx_tbl.msg_head.status) {
>   		dev_err(hwdev->dev, "mgmt Failed to set rss context offload, err: %d, status: 0x%x\n",
>   			err, ctx_tbl.msg_head.status);
> @@ -165,6 +165,39 @@ static int hinic3_set_rss_type(struct hinic3_hwdev *hwdev,
>   	return 0;
>   }
>   



> +static int hinic3_set_rss_hash_opts(struct net_device *netdev,
> +				    struct ethtool_rxnfc *cmd)
> +{
> +	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> +	struct hinic3_rss_type *rss_type;
> +	int err;
> +
> +	rss_type = &nic_dev->rss_type;
> +
> +	if (!test_bit(HINIC3_RSS_ENABLE, &nic_dev->flags)) {
> +		cmd->data = 0;
> +		netdev_err(netdev, "RSS is disable, not support to set flow-hash\n");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	/* RSS only supports hashing of IP addresses and L4 ports */
> +	if (cmd->data & ~(RXH_IP_SRC | RXH_IP_DST |
> +			  RXH_L4_B_0_1 | RXH_L4_B_2_3))
> +		return -EINVAL;
> +
> +	/* Both IP addresses must be part of the hash tuple */
> +	if (!(cmd->data & RXH_IP_SRC) || !(cmd->data & RXH_IP_DST))
> +		return -EINVAL;
> +
> +	err = hinic3_get_rss_type(nic_dev->hwdev, rss_type);
> +	if (err) {
> +		netdev_err(netdev, "Failed to get rss type\n");
> +		return err;
> +	}
> +
> +	err = hinic3_update_rss_hash_opts(netdev, cmd, rss_type);
> +	if (err)
> +		return err;
> +
> +	err = hinic3_set_rss_type(nic_dev->hwdev, *rss_type);

So if we fail here, we have already modified the rss_type in-place. From 
this on-wards, the HW state would diverge from in-memory state. How 
about use a local copy and only update if no error?

> +	if (err) {
> +		netdev_err(netdev, "Failed to set rss type\n");
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +

^ permalink raw reply

* Re: [PATCH v2 1/2] drm: Rename drm_atomic_state
From: Maxime Ripard @ 2026-04-01  8:06 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: David Airlie, Simona Vetter, Maarten Lankhorst, Jonathan Corbet,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Alex Deucher, Christian König, Rob Clark, Dmitry Baryshkov,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Dave Stevenson,
	Laurent Pinchart, dri-devel, linux-doc, Simona Vetter
In-Reply-To: <316d8ab8-78d0-4169-9264-e4da5424b5d6@suse.de>

[-- Attachment #1: Type: text/plain, Size: 1818 bytes --]

Hi Thomas,

On Wed, Apr 01, 2026 at 08:05:12AM +0200, Thomas Zimmermann wrote:
> Am 31.03.26 um 16:41 schrieb Maxime Ripard:
> > The KMS framework uses two slightly different definitions for the state
> > concept. For a given object (plane, CRTC, encoder, etc., so
> > drm_$OBJECT_state), the state is the entire state of that object.
> > However, at the device level, drm_atomic_state refers to a state update
> > for a limited number of objects.
> > 
> > Thus, drm_atomic_state isn't the entire device state, but only the full
> > state of some objects in that device. This has been an endless source of
> > confusion and thus bugs.
> > 
> > We can rename drm_atomic_state to drm_atomic_commit to make it less
> > confusing.
> 
> Nit: The subject should already spell out the new name.
> 
> > 
> > This patch was created using:
> > 
> > rg -l drm_atomic_state | \
> > 	xargs sed -i 's/drm_atomic_state/drm_atomic_commit/g; s/drm_atomic_commit_helper/drm_atomic_state_helper/g'
> > mv drivers/gpu/drm/tests/drm_atomic_state_test.c drivers/gpu/drm/tests/drm_atomic_commit_test.c
> 
> We now have many places that read like "struct drm_atomic_commit *state',
> which mixes up terminology. Is there a way of transforming this
> automatically to use 'commit' for the variable's name?

I know what you're saying, but it would be much more intrusive and I'm
not sure I feel comfortable doing it in one go. I had a try this morning
to come up with a coccinelle script and it looks like it chokes up a bit
on it too.

I'm sure we could blame my coccinelle skills, but how about we do this
driver by driver later on? I can do it if you want me to, and we don't
really need to have that one big commit, it can be split into smaller
units that would be easier to test and merge.

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 273 bytes --]

^ permalink raw reply

* Re: [PATCH v2 1/2] drm: Rename drm_atomic_state
From: Thomas Zimmermann @ 2026-04-01  8:12 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: David Airlie, Simona Vetter, Maarten Lankhorst, Jonathan Corbet,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Alex Deucher, Christian König, Rob Clark, Dmitry Baryshkov,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Dave Stevenson,
	Laurent Pinchart, dri-devel, linux-doc, Simona Vetter
In-Reply-To: <20260401-imperial-tan-emu-0a1ab4@houat>

Hi

Am 01.04.26 um 10:06 schrieb Maxime Ripard:
> Hi Thomas,
>
> On Wed, Apr 01, 2026 at 08:05:12AM +0200, Thomas Zimmermann wrote:
>> Am 31.03.26 um 16:41 schrieb Maxime Ripard:
>>> The KMS framework uses two slightly different definitions for the state
>>> concept. For a given object (plane, CRTC, encoder, etc., so
>>> drm_$OBJECT_state), the state is the entire state of that object.
>>> However, at the device level, drm_atomic_state refers to a state update
>>> for a limited number of objects.
>>>
>>> Thus, drm_atomic_state isn't the entire device state, but only the full
>>> state of some objects in that device. This has been an endless source of
>>> confusion and thus bugs.
>>>
>>> We can rename drm_atomic_state to drm_atomic_commit to make it less
>>> confusing.
>> Nit: The subject should already spell out the new name.
>>
>>> This patch was created using:
>>>
>>> rg -l drm_atomic_state | \
>>> 	xargs sed -i 's/drm_atomic_state/drm_atomic_commit/g; s/drm_atomic_commit_helper/drm_atomic_state_helper/g'
>>> mv drivers/gpu/drm/tests/drm_atomic_state_test.c drivers/gpu/drm/tests/drm_atomic_commit_test.c
>> We now have many places that read like "struct drm_atomic_commit *state',
>> which mixes up terminology. Is there a way of transforming this
>> automatically to use 'commit' for the variable's name?
> I know what you're saying, but it would be much more intrusive and I'm
> not sure I feel comfortable doing it in one go. I had a try this morning
> to come up with a coccinelle script and it looks like it chokes up a bit
> on it too.
>
> I'm sure we could blame my coccinelle skills, but how about we do this
> driver by driver later on? I can do it if you want me to, and we don't
> really need to have that one big commit, it can be split into smaller
> units that would be easier to test and merge.

No problem, there's no hurry. It was just a question.

Best regards
Thomas

>
> Maxime

-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)



^ permalink raw reply

* Re: [PATCH v8 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Pawan Gupta @ 2026-04-01  8:12 UTC (permalink / raw)
  To: David Laight
  Cc: Borislav Petkov, x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, David Kaplan, Sean Christopherson, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	Andy Lutomirski, Thomas Gleixner, Ingo Molnar, David Ahern,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Paolo Bonzini,
	Jonathan Corbet, linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf,
	netdev, linux-doc
In-Reply-To: <20260328100837.7e6dc7fe@pumpkin>

On Sat, Mar 28, 2026 at 10:08:37AM +0000, David Laight wrote:
> On Fri, 27 Mar 2026 17:42:56 -0700
> Pawan Gupta <pawan.kumar.gupta@linux.intel.com> wrote:
> 
> > On Thu, Mar 26, 2026 at 01:29:31PM -0700, Pawan Gupta wrote:
> > > On Thu, Mar 26, 2026 at 10:45:57AM +0000, David Laight wrote:  
> > > > On Thu, 26 Mar 2026 11:01:20 +0100
> > > > Borislav Petkov <bp@alien8.de> wrote:
> > > >   
> > > > > On Thu, Mar 26, 2026 at 01:39:34AM -0700, Pawan Gupta wrote:  
> > > > > > I believe the equivalent for cpu_feature_enabled() in asm is the
> > > > > > ALTERNATIVE. Please let me know if I am missing something.    
> > > > > 
> > > > > Yes, you are.
> > > > > 
> > > > > The point is that you don't want to stick those alternative calls inside some
> > > > > magic bhb_loop function but hand them in from the outside, as function
> > > > > arguments.
> > > > > 
> > > > > Basically what I did.
> > > > > 
> > > > > Then you were worried about this being C code and it had to be noinstr... So
> > > > > that outer function can be rewritten in asm, I think, and still keep it well
> > > > > separate.
> > > > > 
> > > > > I'll try to rewrite it once I get a free minute, and see how it looks.
> > > > >   
> > > > 
> > > > I think someone tried getting C code to write the values to global data
> > > > and getting the asm to read them.
> > > > That got discounted because it spilt things between two largely unrelated files.  
> > > 
> > > 
> > > The implementation with global variables wasn't that bad, let me revive it.
> > > 
> > > This part which ties sequence to BHI mitigation, which is not ideal,
> > > (because VMSCAPE also uses it) it does seems a cleaner option.
> > > 
> > > --- a/arch/x86/kernel/cpu/bugs.c
> > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > @@ -2095,6 +2095,11 @@ static void __init bhi_select_mitigation(void)
> > > 
> > >  static void __init bhi_update_mitigation(void)
> > >  {
> > > +   if (!cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > > +       bhi_seq_outer_loop = 5;
> > > +       bhi_seq_inner_loop = 5;
> > > +   }
> > > +
> > > 
> > > I believe this can be moved to somewhere common to all mitigations.
> > >   
> > > > I think the BPF code would need significant refactoring to call a C function.  
> > > 
> > > Ya, true. Will use globals and keep clear_bhb_loop() in asm.  
> > 
> > While testing this approach, I noticed that syscalls were suffering an 8%
> > regression on ICX for Native BHI mitigation:
> > 
> >   $ perf bench syscall basic -l 100000000
> > 
> > Bisection pointed to the change for using 8-bit registers (al/ah replacing
> > eax/ecx) as the main contributor to the regression. (Global variables added
> > a bit, but within noise).
> > 
> > Further digging revealed a strange behavior, using %ah for the inner loop
> > was causing the regression, interchanging %al and %ah in the loops
> > (for movb and sub) eliminated the regression.
> > 
> > <clear_bhb_loop_nofence>:
> > 
> > 	movb	bhb_seq_outer_loop(%rip), %al
> > 
> > 	call	1f
> > 	jmp	5f
> > 1:	call	2f
> > .Lret1:	RET
> > 2:	movb	bhb_seq_inner_loop(%rip), %ah
> > 3:	jmp	4f
> > 	nop
> > 4:	sub	$1, %ah <---- No regression with %al here
> > 	jnz	3b
> > 	sub	$1, %al
> > 	jnz	1b
> > 
> > My guess is, "sub $1, %al" is faster than "sub $1, %ah". Using %al in the
> > inner loop, which is executed more number of times is likely making the
> > difference. A perf profile is needed to confirm this.
> 
> I bet it is also CPU dependant - it is quite likely that there isn't
> any special hardware to support partial writes of %ah so it ends up taking
> a slow path (possibly even a microcoded one to get an 8% regression).

Strangely, %ah in the inner loop incurs less uops and has fewer branch
misses, yet takes more cycles. Below is the perf data for the sequence on a
Rocket Lake (similar observation on ICX and EMR):

  Event                     %al inner      %ah inner       Delta
  ----------------------  -------------  -------------  ----------
  cycles                    776,775,020    972,322,384    +25.2%
  instructions/cycle               1.23           0.98    -20.3%
  branch-misses               4,792,502        560,449    -88.3%
  uops_issued.any           768,019,010    696,888,357     -9.3%
  time elapsed                 0.1627s        0.2048s     +25.9%

Time elapsed directly correlates with the increase in cycles.

> As well as swapping %al <-> %ah try changing the outer loop decrement to
> 	sub $0x100, %ax
> since %al is zero that will set the z flag the same.

Unfortunately, using "sub $0x100, %ax"(with %al as inner loop) isn't better
than just using "sub $1, %ah" in the outer loop:

  Event                     %al inner      + sub %ax       Delta
  ----------------------  -------------  -------------  ----------
  cycles                    776,775,020    813,372,036     +4.7%
  instructions/cycle               1.23           1.17     -4.5%
  branch-misses               4,792,502      7,610,323    +58.8%
  uops_issued.any           768,019,010    827,465,137     +7.7%
  time elapsed                 0.1627s        0.1707s      +4.9%

> I've just hacked a test into some test code I've got.
> I'm not seeing an unexpected costs on either zen-5 or haswell.
> So it may be more subtle.

This is puzzling, but atleast it is evident that using %al for the inner
loop seems to be the best option. In summary:

  Variant   Cycles     Uops Issued  Branch Misses
  -------  ----------  -----------  -------------
  %al       776M        768M           4.8M         (fastest)
  %ah       972M (+25%) 697M (-9%)     560K (-88%)  (fewer uops + misses, yet slowest)
  sub %ax   813M (+5%)  827M (+8%)     7.6M (+59%)  (most uops + misses)

^ permalink raw reply

* Re: [PATCH v8 04/10] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user
From: Pawan Gupta @ 2026-04-01  8:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Borislav Petkov, Dave Hansen, Peter Zijlstra,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh,
	Jiri Olsa, David S. Miller, David Laight, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, David Ahern, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Paolo Bonzini, Jonathan Corbet,
	linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <acwJVUeW9KoLft4d@google.com>

On Tue, Mar 31, 2026 at 10:50:13AM -0700, Sean Christopherson wrote:
> On Tue, Mar 24, 2026, Pawan Gupta wrote:
> > With the upcoming changes x86_ibpb_exit_to_user will also be used when BHB
> > clearing sequence is used. Rename it cover both the cases.
> > 
> > No functional change.
> > 
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > ---
> 
> Acked-by: Sean Christopherson <seanjc@google.com>

Thanks.

^ permalink raw reply

* [PATCH RESEND v10 2/8] LoongArch: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
From: Chengwen Feng @ 2026-04-01  8:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Catalin Marinas, Will Deacon, Rafael J . Wysocki
  Cc: Jonathan Corbet, WANG Xuerui, Thomas Gleixner, Dave Hansen,
	H . Peter Anvin, Juergen Gross, Boris Ostrovsky, Len Brown,
	Sunil V L, Mark Rutland, Jonathan Cameron, Kees Cook, Yanteng Si,
	Sean Christopherson, Kai Huang, Tom Lendacky, Thomas Huth,
	Thorsten Blum, Kevin Loughlin, Zheyun Shen, Peter Zijlstra,
	Pawan Gupta, Xin Li, Ahmed S . Darwish, Sohil Mehta,
	Ilkka Koskinen, Robin Murphy, James Clark, Besar Wicaksono, Ma Ke,
	Wei Huang, Andy Gospodarek, Somnath Kotur, punit.agrawal,
	guohanjun, suzuki.poulose, ryan.roberts, chenl311, masahiroy,
	wangyuquan1236, anshuman.khandual, heinrich.schuchardt,
	Eric.VanTassell, wangzhou1, wanghuiqiang, liuyonglong,
	fengchengwen, linux-pci, linux-doc, linux-kernel,
	linux-arm-kernel, loongarch, linux-riscv, xen-devel, linux-acpi,
	linux-perf-users, stable, x86
In-Reply-To: <20260401081640.26875-1-fengchengwen@huawei.com>

As a step towards unifying the interface for retrieving ACPI CPU UID
across architectures, introduce a new function acpi_get_cpu_uid() for
loongarch. While at it, add input validation to make the code more
robust.

Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 arch/loongarch/include/asm/acpi.h | 1 +
 arch/loongarch/kernel/acpi.c      | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/loongarch/include/asm/acpi.h b/arch/loongarch/include/asm/acpi.h
index 7376840fa9f7..8bb101b4557e 100644
--- a/arch/loongarch/include/asm/acpi.h
+++ b/arch/loongarch/include/asm/acpi.h
@@ -44,6 +44,7 @@ static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
 {
 	return acpi_core_pic[cpu_logical_map(cpu)].processor_id;
 }
+int acpi_get_cpu_uid(unsigned int cpu, u32 *uid);
 
 #endif /* !CONFIG_ACPI */
 
diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
index 1367ca759468..058f0dbe8e8f 100644
--- a/arch/loongarch/kernel/acpi.c
+++ b/arch/loongarch/kernel/acpi.c
@@ -385,3 +385,12 @@ int acpi_unmap_cpu(int cpu)
 EXPORT_SYMBOL(acpi_unmap_cpu);
 
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
+
+int acpi_get_cpu_uid(unsigned int cpu, u32 *uid)
+{
+	if (cpu >= nr_cpu_ids)
+		return -EINVAL;
+	*uid = acpi_core_pic[cpu_logical_map(cpu)].processor_id;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(acpi_get_cpu_uid);
-- 
2.17.1


^ permalink raw reply related

* [PATCH RESEND v10 4/8] x86/acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
From: Chengwen Feng @ 2026-04-01  8:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Catalin Marinas, Will Deacon, Rafael J . Wysocki
  Cc: Jonathan Corbet, WANG Xuerui, Thomas Gleixner, Dave Hansen,
	H . Peter Anvin, Juergen Gross, Boris Ostrovsky, Len Brown,
	Sunil V L, Mark Rutland, Jonathan Cameron, Kees Cook, Yanteng Si,
	Sean Christopherson, Kai Huang, Tom Lendacky, Thomas Huth,
	Thorsten Blum, Kevin Loughlin, Zheyun Shen, Peter Zijlstra,
	Pawan Gupta, Xin Li, Ahmed S . Darwish, Sohil Mehta,
	Ilkka Koskinen, Robin Murphy, James Clark, Besar Wicaksono, Ma Ke,
	Wei Huang, Andy Gospodarek, Somnath Kotur, punit.agrawal,
	guohanjun, suzuki.poulose, ryan.roberts, chenl311, masahiroy,
	wangyuquan1236, anshuman.khandual, heinrich.schuchardt,
	Eric.VanTassell, wangzhou1, wanghuiqiang, liuyonglong,
	fengchengwen, linux-pci, linux-doc, linux-kernel,
	linux-arm-kernel, loongarch, linux-riscv, xen-devel, linux-acpi,
	linux-perf-users, stable, x86
In-Reply-To: <20260401081640.26875-1-fengchengwen@huawei.com>

As a step towards unifying the interface for retrieving ACPI CPU UID
across architectures, introduce a new function acpi_get_cpu_uid() for
x86. While at it, add input validation to make the code more robust.

Update Xen-related code to use acpi_get_cpu_uid() instead of the legacy
cpu_acpi_id() function, and remove the now-unused cpu_acpi_id() to clean
up redundant code.

Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/acpi.h  |  2 ++
 arch/x86/include/asm/cpu.h   |  1 -
 arch/x86/include/asm/smp.h   |  1 -
 arch/x86/kernel/acpi/boot.c  | 20 ++++++++++++++++++++
 arch/x86/xen/enlighten_hvm.c |  5 +++--
 5 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index a03aa6f999d1..92b5c27c4fea 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -157,6 +157,8 @@ static inline bool acpi_has_cpu_in_madt(void)
 	return !!acpi_lapic;
 }
 
+int acpi_get_cpu_uid(unsigned int cpu, u32 *uid);
+
 #define ACPI_HAVE_ARCH_SET_ROOT_POINTER
 static __always_inline void acpi_arch_set_root_pointer(u64 addr)
 {
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index ad235dda1ded..57a0786dfd75 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -11,7 +11,6 @@
 
 #ifndef CONFIG_SMP
 #define cpu_physical_id(cpu)			boot_cpu_physical_apicid
-#define cpu_acpi_id(cpu)			0
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 84951572ab81..05d1d479b4cf 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -130,7 +130,6 @@ __visible void smp_call_function_interrupt(struct pt_regs *regs);
 __visible void smp_call_function_single_interrupt(struct pt_regs *r);
 
 #define cpu_physical_id(cpu)	per_cpu(x86_cpu_to_apicid, cpu)
-#define cpu_acpi_id(cpu)	per_cpu(x86_cpu_to_acpiid, cpu)
 
 /*
  * This function is needed by all SMP systems. It must _always_ be valid
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index a3f2fb1fea1b..ceba24f65ae3 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1848,3 +1848,23 @@ void __iomem * (*acpi_os_ioremap)(acpi_physical_address phys, acpi_size size) =
 	x86_acpi_os_ioremap;
 EXPORT_SYMBOL_GPL(acpi_os_ioremap);
 #endif
+
+int acpi_get_cpu_uid(unsigned int cpu, u32 *uid)
+{
+	u32 acpi_id;
+
+	if (cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+#ifdef CONFIG_SMP
+	acpi_id = per_cpu(x86_cpu_to_acpiid, cpu);
+	if (acpi_id == CPU_ACPIID_INVALID)
+		return -ENODEV;
+#else
+	acpi_id = 0;
+#endif
+
+	*uid = acpi_id;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(acpi_get_cpu_uid);
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index fe57ff85d004..2f9fa27e5a3c 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -151,6 +151,7 @@ static void xen_hvm_crash_shutdown(struct pt_regs *regs)
 
 static int xen_cpu_up_prepare_hvm(unsigned int cpu)
 {
+	u32 cpu_uid;
 	int rc = 0;
 
 	/*
@@ -161,8 +162,8 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
 	 */
 	xen_uninit_lock_cpu(cpu);
 
-	if (cpu_acpi_id(cpu) != CPU_ACPIID_INVALID)
-		per_cpu(xen_vcpu_id, cpu) = cpu_acpi_id(cpu);
+	if (acpi_get_cpu_uid(cpu, &cpu_uid) == 0)
+		per_cpu(xen_vcpu_id, cpu) = cpu_uid;
 	else
 		per_cpu(xen_vcpu_id, cpu) = cpu;
 	xen_vcpu_setup(cpu);
-- 
2.17.1


^ permalink raw reply related

* [PATCH RESEND v10 0/8] ACPI: Unify CPU UID interface and fix ARM64 TPH steer-tag issue
From: Chengwen Feng @ 2026-04-01  8:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Catalin Marinas, Will Deacon, Rafael J . Wysocki
  Cc: Jonathan Corbet, WANG Xuerui, Thomas Gleixner, Dave Hansen,
	H . Peter Anvin, Juergen Gross, Boris Ostrovsky, Len Brown,
	Sunil V L, Mark Rutland, Jonathan Cameron, Kees Cook, Yanteng Si,
	Sean Christopherson, Kai Huang, Tom Lendacky, Thomas Huth,
	Thorsten Blum, Kevin Loughlin, Zheyun Shen, Peter Zijlstra,
	Pawan Gupta, Xin Li, Ahmed S . Darwish, Sohil Mehta,
	Ilkka Koskinen, Robin Murphy, James Clark, Besar Wicaksono, Ma Ke,
	Wei Huang, Andy Gospodarek, Somnath Kotur, punit.agrawal,
	guohanjun, suzuki.poulose, ryan.roberts, chenl311, masahiroy,
	wangyuquan1236, anshuman.khandual, heinrich.schuchardt,
	Eric.VanTassell, wangzhou1, wanghuiqiang, liuyonglong,
	fengchengwen, linux-pci, linux-doc, linux-kernel,
	linux-arm-kernel, loongarch, linux-riscv, xen-devel, linux-acpi,
	linux-perf-users, stable, x86

This patchset unifies ACPI Processor UID retrieval across
arm64/loongarch/riscv/x86 via acpi_get_cpu_uid() (with input validation)
and fixes ARM64 CPU steer-tag retrieval failure in PCI/TPH:

1-4: Add acpi_get_cpu_uid() for arm64/loongarch/riscv/x86 (update
     respective users)
5: Centralize acpi_get_cpu_uid() declaration in include/linux/acpi.h
6: Clean up perf/arm_cspmu
7: Clean up ACPI/PPTT and remove unused get_acpi_id_for_cpu()
8: Pass ACPI Processor UID to Cache Locality _DSM

The interface refactor ensures consistent CPU UID retrieval across
architectures (no functional changes for valid inputs) and provides the
unified interface required for the ARM64 TPH fix

---
Changes in v10-resend:
- Add Catalin's ack-by for arm64 commit
- Add CC to x86@kernel.org

Changes in v10:
- Refine commit header&log according to Punit's and Bjorn's review
- Split perf/arm_cspmu as a separate commit which address Punit's
  review

Changes in v9:
- Address Bjorn's review: split commits to each platform so that make
  them easy to review

Changes in v8:
- Moving arm64's get_cpu_for_acpi_id() to kernel/acpi.c which address
  Jeremy's review

Chengwen Feng (8):
  arm64: acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
  LoongArch: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
  RISC-V: ACPI: Add acpi_get_cpu_uid() for unified ACPI CPU UID
    retrieval
  x86/acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
  ACPI: Centralize acpi_get_cpu_uid() declaration in
    include/linux/acpi.h
  perf: arm_cspmu: Switch to acpi_get_cpu_uid() from
    get_acpi_id_for_cpu()
  ACPI: PPTT: Use acpi_get_cpu_uid() and remove get_acpi_id_for_cpu()
  PCI/TPH: Pass ACPI Processor UID to Cache Locality _DSM

 Documentation/PCI/tph.rst          |  4 +--
 arch/arm64/include/asm/acpi.h      | 17 +---------
 arch/arm64/kernel/acpi.c           | 30 ++++++++++++++++++
 arch/loongarch/include/asm/acpi.h  |  5 ---
 arch/loongarch/kernel/acpi.c       |  9 ++++++
 arch/riscv/include/asm/acpi.h      |  4 ---
 arch/riscv/kernel/acpi.c           | 16 ++++++++++
 arch/riscv/kernel/acpi_numa.c      |  9 ++++--
 arch/x86/include/asm/cpu.h         |  1 -
 arch/x86/include/asm/smp.h         |  1 -
 arch/x86/kernel/acpi/boot.c        | 20 ++++++++++++
 arch/x86/xen/enlighten_hvm.c       |  5 +--
 drivers/acpi/pptt.c                | 50 ++++++++++++++++++++++--------
 drivers/acpi/riscv/rhct.c          |  7 ++++-
 drivers/pci/tph.c                  | 16 +++++++---
 drivers/perf/arm_cspmu/arm_cspmu.c |  6 ++--
 include/linux/acpi.h               | 11 +++++++
 include/linux/pci-tph.h            |  4 +--
 18 files changed, 158 insertions(+), 57 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH RESEND v10 5/8] ACPI: Centralize acpi_get_cpu_uid() declaration in include/linux/acpi.h
From: Chengwen Feng @ 2026-04-01  8:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Catalin Marinas, Will Deacon, Rafael J . Wysocki
  Cc: Jonathan Corbet, WANG Xuerui, Thomas Gleixner, Dave Hansen,
	H . Peter Anvin, Juergen Gross, Boris Ostrovsky, Len Brown,
	Sunil V L, Mark Rutland, Jonathan Cameron, Kees Cook, Yanteng Si,
	Sean Christopherson, Kai Huang, Tom Lendacky, Thomas Huth,
	Thorsten Blum, Kevin Loughlin, Zheyun Shen, Peter Zijlstra,
	Pawan Gupta, Xin Li, Ahmed S . Darwish, Sohil Mehta,
	Ilkka Koskinen, Robin Murphy, James Clark, Besar Wicaksono, Ma Ke,
	Wei Huang, Andy Gospodarek, Somnath Kotur, punit.agrawal,
	guohanjun, suzuki.poulose, ryan.roberts, chenl311, masahiroy,
	wangyuquan1236, anshuman.khandual, heinrich.schuchardt,
	Eric.VanTassell, wangzhou1, wanghuiqiang, liuyonglong,
	fengchengwen, linux-pci, linux-doc, linux-kernel,
	linux-arm-kernel, loongarch, linux-riscv, xen-devel, linux-acpi,
	linux-perf-users, stable, x86
In-Reply-To: <20260401081640.26875-1-fengchengwen@huawei.com>

Centralize acpi_get_cpu_uid() in include/linux/acpi.h (global scope) and
remove arch-specific declarations from arm64/loongarch/riscv/x86
asm/acpi.h. This unifies the interface across architectures and
simplifies maintenance by eliminating duplicate prototypes.

Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 arch/arm64/include/asm/acpi.h     |  1 -
 arch/loongarch/include/asm/acpi.h |  1 -
 arch/riscv/include/asm/acpi.h     |  1 -
 arch/x86/include/asm/acpi.h       |  2 --
 include/linux/acpi.h              | 11 +++++++++++
 5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 2219a3301e72..bdb0ecf95b5c 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -118,7 +118,6 @@ static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
 {
 	return	acpi_cpu_get_madt_gicc(cpu)->uid;
 }
-int acpi_get_cpu_uid(unsigned int cpu, u32 *uid);
 int get_cpu_for_acpi_id(u32 uid);
 
 static inline void arch_fix_phys_package_id(int num, u32 slot) { }
diff --git a/arch/loongarch/include/asm/acpi.h b/arch/loongarch/include/asm/acpi.h
index 8bb101b4557e..7376840fa9f7 100644
--- a/arch/loongarch/include/asm/acpi.h
+++ b/arch/loongarch/include/asm/acpi.h
@@ -44,7 +44,6 @@ static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
 {
 	return acpi_core_pic[cpu_logical_map(cpu)].processor_id;
 }
-int acpi_get_cpu_uid(unsigned int cpu, u32 *uid);
 
 #endif /* !CONFIG_ACPI */
 
diff --git a/arch/riscv/include/asm/acpi.h b/arch/riscv/include/asm/acpi.h
index f3520cc85af3..6e13695120bc 100644
--- a/arch/riscv/include/asm/acpi.h
+++ b/arch/riscv/include/asm/acpi.h
@@ -65,7 +65,6 @@ static inline u32 get_acpi_id_for_cpu(int cpu)
 {
 	return acpi_cpu_get_madt_rintc(cpu)->uid;
 }
-int acpi_get_cpu_uid(unsigned int cpu, u32 *uid);
 
 int acpi_get_riscv_isa(struct acpi_table_header *table,
 		       unsigned int cpu, const char **isa);
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 92b5c27c4fea..a03aa6f999d1 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -157,8 +157,6 @@ static inline bool acpi_has_cpu_in_madt(void)
 	return !!acpi_lapic;
 }
 
-int acpi_get_cpu_uid(unsigned int cpu, u32 *uid);
-
 #define ACPI_HAVE_ARCH_SET_ROOT_POINTER
 static __always_inline void acpi_arch_set_root_pointer(u64 addr)
 {
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4d2f0bed7a06..74a73f0e5944 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -324,6 +324,17 @@ int acpi_unmap_cpu(int cpu);
 
 acpi_handle acpi_get_processor_handle(int cpu);
 
+/**
+ * acpi_get_cpu_uid() - Get ACPI Processor UID of from MADT table
+ * @cpu: Logical CPU number (0-based)
+ * @uid: Pointer to store ACPI Processor UID
+ *
+ * Return: 0 on success (ACPI Processor ID stored in *uid);
+ *         -EINVAL if CPU number is invalid or out of range;
+ *         -ENODEV if ACPI Processor UID for the CPU is not found.
+ */
+int acpi_get_cpu_uid(unsigned int cpu, u32 *uid);
+
 #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
 int acpi_get_ioapic_id(acpi_handle handle, u32 gsi_base, u64 *phys_addr);
 #endif
-- 
2.17.1


^ permalink raw reply related

* [PATCH RESEND v10 7/8] ACPI: PPTT: Use acpi_get_cpu_uid() and remove get_acpi_id_for_cpu()
From: Chengwen Feng @ 2026-04-01  8:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Catalin Marinas, Will Deacon, Rafael J . Wysocki
  Cc: Jonathan Corbet, WANG Xuerui, Thomas Gleixner, Dave Hansen,
	H . Peter Anvin, Juergen Gross, Boris Ostrovsky, Len Brown,
	Sunil V L, Mark Rutland, Jonathan Cameron, Kees Cook, Yanteng Si,
	Sean Christopherson, Kai Huang, Tom Lendacky, Thomas Huth,
	Thorsten Blum, Kevin Loughlin, Zheyun Shen, Peter Zijlstra,
	Pawan Gupta, Xin Li, Ahmed S . Darwish, Sohil Mehta,
	Ilkka Koskinen, Robin Murphy, James Clark, Besar Wicaksono, Ma Ke,
	Wei Huang, Andy Gospodarek, Somnath Kotur, punit.agrawal,
	guohanjun, suzuki.poulose, ryan.roberts, chenl311, masahiroy,
	wangyuquan1236, anshuman.khandual, heinrich.schuchardt,
	Eric.VanTassell, wangzhou1, wanghuiqiang, liuyonglong,
	fengchengwen, linux-pci, linux-doc, linux-kernel,
	linux-arm-kernel, loongarch, linux-riscv, xen-devel, linux-acpi,
	linux-perf-users, stable, x86
In-Reply-To: <20260401081640.26875-1-fengchengwen@huawei.com>

Update acpi/pptt.c to use acpi_get_cpu_uid() and remove unused
get_acpi_id_for_cpu() from arm64/loongarch/riscv, completing PPTT's
migration to the unified ACPI CPU UID interface

Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 arch/arm64/include/asm/acpi.h     |  4 ---
 arch/loongarch/include/asm/acpi.h |  5 ----
 arch/riscv/include/asm/acpi.h     |  4 ---
 drivers/acpi/pptt.c               | 50 +++++++++++++++++++++++--------
 4 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index bdb0ecf95b5c..8a54ca6ba602 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -114,10 +114,6 @@ static inline bool acpi_has_cpu_in_madt(void)
 }
 
 struct acpi_madt_generic_interrupt *acpi_cpu_get_madt_gicc(int cpu);
-static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
-{
-	return	acpi_cpu_get_madt_gicc(cpu)->uid;
-}
 int get_cpu_for_acpi_id(u32 uid);
 
 static inline void arch_fix_phys_package_id(int num, u32 slot) { }
diff --git a/arch/loongarch/include/asm/acpi.h b/arch/loongarch/include/asm/acpi.h
index 7376840fa9f7..eda9d4d0a493 100644
--- a/arch/loongarch/include/asm/acpi.h
+++ b/arch/loongarch/include/asm/acpi.h
@@ -40,11 +40,6 @@ extern struct acpi_madt_core_pic acpi_core_pic[MAX_CORE_PIC];
 
 extern int __init parse_acpi_topology(void);
 
-static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
-{
-	return acpi_core_pic[cpu_logical_map(cpu)].processor_id;
-}
-
 #endif /* !CONFIG_ACPI */
 
 #define ACPI_TABLE_UPGRADE_MAX_PHYS ARCH_LOW_ADDRESS_LIMIT
diff --git a/arch/riscv/include/asm/acpi.h b/arch/riscv/include/asm/acpi.h
index 6e13695120bc..26ab37c171bc 100644
--- a/arch/riscv/include/asm/acpi.h
+++ b/arch/riscv/include/asm/acpi.h
@@ -61,10 +61,6 @@ static inline void arch_fix_phys_package_id(int num, u32 slot) { }
 
 void acpi_init_rintc_map(void);
 struct acpi_madt_rintc *acpi_cpu_get_madt_rintc(int cpu);
-static inline u32 get_acpi_id_for_cpu(int cpu)
-{
-	return acpi_cpu_get_madt_rintc(cpu)->uid;
-}
 
 int acpi_get_riscv_isa(struct acpi_table_header *table,
 		       unsigned int cpu, const char **isa);
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index de5f8c018333..7bd5bc1f225a 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -459,11 +459,14 @@ static void cache_setup_acpi_cpu(struct acpi_table_header *table,
 {
 	struct acpi_pptt_cache *found_cache;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
-	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+	u32 acpi_cpu_id;
 	struct cacheinfo *this_leaf;
 	unsigned int index = 0;
 	struct acpi_pptt_processor *cpu_node = NULL;
 
+	if (acpi_get_cpu_uid(cpu, &acpi_cpu_id) != 0)
+		return;
+
 	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
 		this_leaf = this_cpu_ci->info_list + index;
 		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
@@ -546,7 +549,10 @@ static int topology_get_acpi_cpu_tag(struct acpi_table_header *table,
 				     unsigned int cpu, int level, int flag)
 {
 	struct acpi_pptt_processor *cpu_node;
-	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+	u32 acpi_cpu_id;
+
+	if (acpi_get_cpu_uid(cpu, &acpi_cpu_id) != 0)
+		return -ENOENT;
 
 	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
 	if (cpu_node) {
@@ -614,18 +620,22 @@ static int find_acpi_cpu_topology_tag(unsigned int cpu, int level, int flag)
  *
  * Check the node representing a CPU for a given flag.
  *
- * Return: -ENOENT if the PPTT doesn't exist, the CPU cannot be found or
- *	   the table revision isn't new enough.
+ * Return: -ENOENT if can't get CPU's ACPI Processor UID, the PPTT doesn't
+ *	   exist, the CPU cannot be found or the table revision isn't new
+ *	   enough.
  *	   1, any passed flag set
  *	   0, flag unset
  */
 static int check_acpi_cpu_flag(unsigned int cpu, int rev, u32 flag)
 {
 	struct acpi_table_header *table;
-	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+	u32 acpi_cpu_id;
 	struct acpi_pptt_processor *cpu_node = NULL;
 	int ret = -ENOENT;
 
+	if (acpi_get_cpu_uid(cpu, &acpi_cpu_id) != 0)
+		return -ENOENT;
+
 	table = acpi_get_pptt();
 	if (!table)
 		return -ENOENT;
@@ -651,7 +661,8 @@ static int check_acpi_cpu_flag(unsigned int cpu, int rev, u32 flag)
  * in the PPTT. Errors caused by lack of a PPTT table, or otherwise, return 0
  * indicating we didn't find any cache levels.
  *
- * Return: -ENOENT if no PPTT table or no PPTT processor struct found.
+ * Return: -ENOENT if no PPTT table, can't get CPU's ACPI Process UID or no PPTT
+ *	   processor struct found.
  *	   0 on success.
  */
 int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
@@ -671,7 +682,9 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
 
 	pr_debug("Cache Setup: find cache levels for CPU=%d\n", cpu);
 
-	acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+	if (acpi_get_cpu_uid(cpu, &acpi_cpu_id))
+		return -ENOENT;
+
 	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
 	if (!cpu_node)
 		return -ENOENT;
@@ -780,8 +793,9 @@ int find_acpi_cpu_topology_package(unsigned int cpu)
  * It may not exist in single CPU systems. In simple multi-CPU systems,
  * it may be equal to the package topology level.
  *
- * Return: -ENOENT if the PPTT doesn't exist, the CPU cannot be found
- * or there is no toplogy level above the CPU..
+ * Return: -ENOENT if the PPTT doesn't exist, can't get CPU's ACPI
+ * Processor UID, the CPU cannot be found or there is no toplogy level
+ * above the CPU.
  * Otherwise returns a value which represents the package for this CPU.
  */
 
@@ -797,7 +811,9 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu)
 	if (!table)
 		return -ENOENT;
 
-	acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+	if (acpi_get_cpu_uid(cpu, &acpi_cpu_id) != 0)
+		return -ENOENT;
+
 	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
 	if (!cpu_node || !cpu_node->parent)
 		return -ENOENT;
@@ -872,7 +888,9 @@ static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
 	cpumask_clear(cpus);
 
 	for_each_possible_cpu(cpu) {
-		acpi_id = get_acpi_id_for_cpu(cpu);
+		if (acpi_get_cpu_uid(cpu, &acpi_id) != 0)
+			continue;
+
 		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
 
 		while (cpu_node) {
@@ -966,10 +984,13 @@ int find_acpi_cache_level_from_id(u32 cache_id)
 	for_each_possible_cpu(cpu) {
 		bool empty;
 		int level = 1;
-		u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		u32 acpi_cpu_id;
 		struct acpi_pptt_cache *cache;
 		struct acpi_pptt_processor *cpu_node;
 
+		if (acpi_get_cpu_uid(cpu, &acpi_cpu_id) != 0)
+			continue;
+
 		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
 		if (!cpu_node)
 			continue;
@@ -1030,10 +1051,13 @@ int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
 	for_each_possible_cpu(cpu) {
 		bool empty;
 		int level = 1;
-		u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		u32 acpi_cpu_id;
 		struct acpi_pptt_cache *cache;
 		struct acpi_pptt_processor *cpu_node;
 
+		if (acpi_get_cpu_uid(cpu, &acpi_cpu_id) != 0)
+			continue;
+
 		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
 		if (!cpu_node)
 			continue;
-- 
2.17.1


^ permalink raw reply related

* [PATCH RESEND v10 3/8] RISC-V: ACPI: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
From: Chengwen Feng @ 2026-04-01  8:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Catalin Marinas, Will Deacon, Rafael J . Wysocki
  Cc: Jonathan Corbet, WANG Xuerui, Thomas Gleixner, Dave Hansen,
	H . Peter Anvin, Juergen Gross, Boris Ostrovsky, Len Brown,
	Sunil V L, Mark Rutland, Jonathan Cameron, Kees Cook, Yanteng Si,
	Sean Christopherson, Kai Huang, Tom Lendacky, Thomas Huth,
	Thorsten Blum, Kevin Loughlin, Zheyun Shen, Peter Zijlstra,
	Pawan Gupta, Xin Li, Ahmed S . Darwish, Sohil Mehta,
	Ilkka Koskinen, Robin Murphy, James Clark, Besar Wicaksono, Ma Ke,
	Wei Huang, Andy Gospodarek, Somnath Kotur, punit.agrawal,
	guohanjun, suzuki.poulose, ryan.roberts, chenl311, masahiroy,
	wangyuquan1236, anshuman.khandual, heinrich.schuchardt,
	Eric.VanTassell, wangzhou1, wanghuiqiang, liuyonglong,
	fengchengwen, linux-pci, linux-doc, linux-kernel,
	linux-arm-kernel, loongarch, linux-riscv, xen-devel, linux-acpi,
	linux-perf-users, stable, x86
In-Reply-To: <20260401081640.26875-1-fengchengwen@huawei.com>

As a step towards unifying the interface for retrieving ACPI CPU UID
across architectures, introduce a new function acpi_get_cpu_uid() for
riscv. While at it, add input validation to make the code more robust.

And also update acpi_numa.c and rhct.c to use the new interface instead
of the legacy get_acpi_id_for_cpu().

Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 arch/riscv/include/asm/acpi.h |  1 +
 arch/riscv/kernel/acpi.c      | 16 ++++++++++++++++
 arch/riscv/kernel/acpi_numa.c |  9 ++++++---
 drivers/acpi/riscv/rhct.c     |  7 ++++++-
 4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/asm/acpi.h b/arch/riscv/include/asm/acpi.h
index 6e13695120bc..f3520cc85af3 100644
--- a/arch/riscv/include/asm/acpi.h
+++ b/arch/riscv/include/asm/acpi.h
@@ -65,6 +65,7 @@ static inline u32 get_acpi_id_for_cpu(int cpu)
 {
 	return acpi_cpu_get_madt_rintc(cpu)->uid;
 }
+int acpi_get_cpu_uid(unsigned int cpu, u32 *uid);
 
 int acpi_get_riscv_isa(struct acpi_table_header *table,
 		       unsigned int cpu, const char **isa);
diff --git a/arch/riscv/kernel/acpi.c b/arch/riscv/kernel/acpi.c
index 71698ee11621..322ea92aa39f 100644
--- a/arch/riscv/kernel/acpi.c
+++ b/arch/riscv/kernel/acpi.c
@@ -337,3 +337,19 @@ int raw_pci_write(unsigned int domain, unsigned int bus,
 }
 
 #endif	/* CONFIG_PCI */
+
+int acpi_get_cpu_uid(unsigned int cpu, u32 *uid)
+{
+	struct acpi_madt_rintc *rintc;
+
+	if (cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	rintc = acpi_cpu_get_madt_rintc(cpu);
+	if (!rintc)
+		return -ENODEV;
+
+	*uid = rintc->uid;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(acpi_get_cpu_uid);
diff --git a/arch/riscv/kernel/acpi_numa.c b/arch/riscv/kernel/acpi_numa.c
index 130769e3a99c..6a2d4289f806 100644
--- a/arch/riscv/kernel/acpi_numa.c
+++ b/arch/riscv/kernel/acpi_numa.c
@@ -37,11 +37,14 @@ static int __init acpi_numa_get_nid(unsigned int cpu)
 
 static inline int get_cpu_for_acpi_id(u32 uid)
 {
-	int cpu;
+	u32 cpu_uid;
+	int ret;
 
-	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
-		if (uid == get_acpi_id_for_cpu(cpu))
+	for (int cpu = 0; cpu < nr_cpu_ids; cpu++) {
+		ret = acpi_get_cpu_uid(cpu, &cpu_uid);
+		if (ret == 0 && uid == cpu_uid)
 			return cpu;
+	}
 
 	return -EINVAL;
 }
diff --git a/drivers/acpi/riscv/rhct.c b/drivers/acpi/riscv/rhct.c
index caa2c16e1697..8f3f38c64a88 100644
--- a/drivers/acpi/riscv/rhct.c
+++ b/drivers/acpi/riscv/rhct.c
@@ -44,10 +44,15 @@ int acpi_get_riscv_isa(struct acpi_table_header *table, unsigned int cpu, const
 	struct acpi_rhct_isa_string *isa_node;
 	struct acpi_table_rhct *rhct;
 	u32 *hart_info_node_offset;
-	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+	u32 acpi_cpu_id;
+	int ret;
 
 	BUG_ON(acpi_disabled);
 
+	ret = acpi_get_cpu_uid(cpu, &acpi_cpu_id);
+	if (ret != 0)
+		return ret;
+
 	if (!table) {
 		rhct = acpi_get_rhct();
 		if (!rhct)
-- 
2.17.1


^ permalink raw reply related

* [PATCH RESEND v10 8/8] PCI/TPH: Pass ACPI Processor UID to Cache Locality _DSM
From: Chengwen Feng @ 2026-04-01  8:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Catalin Marinas, Will Deacon, Rafael J . Wysocki
  Cc: Jonathan Corbet, WANG Xuerui, Thomas Gleixner, Dave Hansen,
	H . Peter Anvin, Juergen Gross, Boris Ostrovsky, Len Brown,
	Sunil V L, Mark Rutland, Jonathan Cameron, Kees Cook, Yanteng Si,
	Sean Christopherson, Kai Huang, Tom Lendacky, Thomas Huth,
	Thorsten Blum, Kevin Loughlin, Zheyun Shen, Peter Zijlstra,
	Pawan Gupta, Xin Li, Ahmed S . Darwish, Sohil Mehta,
	Ilkka Koskinen, Robin Murphy, James Clark, Besar Wicaksono, Ma Ke,
	Wei Huang, Andy Gospodarek, Somnath Kotur, punit.agrawal,
	guohanjun, suzuki.poulose, ryan.roberts, chenl311, masahiroy,
	wangyuquan1236, anshuman.khandual, heinrich.schuchardt,
	Eric.VanTassell, wangzhou1, wanghuiqiang, liuyonglong,
	fengchengwen, linux-pci, linux-doc, linux-kernel,
	linux-arm-kernel, loongarch, linux-riscv, xen-devel, linux-acpi,
	linux-perf-users, stable, x86
In-Reply-To: <20260401081640.26875-1-fengchengwen@huawei.com>

pcie_tph_get_cpu_st() uses the Query Cache Locality Features _DSM [1]
to retrieve the TPH Steering Tag for memory associated with the CPU
identified by its "cpu_uid" parameter, a Linux logical CPU ID.

The _DSM requires an ACPI Processor UID, which pcie_tph_get_cpu_st()
previously assumed was the same as the Linux logical CPU ID. This is
true on x86 but not on arm64, so pcie_tph_get_cpu_st() returned the
wrong Steering Tag, resulting in incorrect TPH functionality on arm64.

Convert the Linux logical CPU ID to the ACPI Processor UID with
acpi_get_cpu_uid() before passing it to the _DSM. Additionally, rename
the pcie_tph_get_cpu_st() parameter from "cpu_uid" to "cpu" to reflect
that it represents a logical CPU ID (not an ACPI Processor UID).

[1] According to ECN_TPH-ST_Revision_20200924
    (https://members.pcisig.com/wg/PCI-SIG/document/15470), the input
    is defined as: "If the target is a processor, then this field
    represents the ACPI Processor UID of the processor as specified in
    the MADT. If the target is a processor container, then this field
    represents the ACPI Processor UID of the processor container as
    specified in the PPTT."

Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/PCI/tph.rst |  4 ++--
 drivers/pci/tph.c         | 16 +++++++++++-----
 include/linux/pci-tph.h   |  4 ++--
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/Documentation/PCI/tph.rst b/Documentation/PCI/tph.rst
index e8993be64fd6..b6cf22b9bd90 100644
--- a/Documentation/PCI/tph.rst
+++ b/Documentation/PCI/tph.rst
@@ -79,10 +79,10 @@ To retrieve a Steering Tag for a target memory associated with a specific
 CPU, use the following function::
 
   int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type type,
-                          unsigned int cpu_uid, u16 *tag);
+                          unsigned int cpu, u16 *tag);
 
 The `type` argument is used to specify the memory type, either volatile
-or persistent, of the target memory. The `cpu_uid` argument specifies the
+or persistent, of the target memory. The `cpu` argument specifies the
 CPU where the memory is associated to.
 
 After the ST value is retrieved, the device driver can use the following
diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index ca4f97be7538..b67c9ad14bda 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -236,21 +236,27 @@ static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag)
  * with a specific CPU
  * @pdev: PCI device
  * @mem_type: target memory type (volatile or persistent RAM)
- * @cpu_uid: associated CPU id
+ * @cpu: associated CPU id
  * @tag: Steering Tag to be returned
  *
  * Return the Steering Tag for a target memory that is associated with a
- * specific CPU as indicated by cpu_uid.
+ * specific CPU as indicated by cpu.
  *
  * Return: 0 if success, otherwise negative value (-errno)
  */
 int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
-			unsigned int cpu_uid, u16 *tag)
+			unsigned int cpu, u16 *tag)
 {
 #ifdef CONFIG_ACPI
 	struct pci_dev *rp;
 	acpi_handle rp_acpi_handle;
 	union st_info info;
+	u32 cpu_uid;
+	int ret;
+
+	ret = acpi_get_cpu_uid(cpu, &cpu_uid);
+	if (ret != 0)
+		return ret;
 
 	rp = pcie_find_root_port(pdev);
 	if (!rp || !rp->bus || !rp->bus->bridge)
@@ -265,9 +271,9 @@ int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type,
 
 	*tag = tph_extract_tag(mem_type, pdev->tph_req_type, &info);
 
-	pci_dbg(pdev, "get steering tag: mem_type=%s, cpu_uid=%d, tag=%#04x\n",
+	pci_dbg(pdev, "get steering tag: mem_type=%s, cpu=%d, tag=%#04x\n",
 		(mem_type == TPH_MEM_TYPE_VM) ? "volatile" : "persistent",
-		cpu_uid, *tag);
+		cpu, *tag);
 
 	return 0;
 #else
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index ba28140ce670..be68cd17f2f8 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -25,7 +25,7 @@ int pcie_tph_set_st_entry(struct pci_dev *pdev,
 			  unsigned int index, u16 tag);
 int pcie_tph_get_cpu_st(struct pci_dev *dev,
 			enum tph_mem_type mem_type,
-			unsigned int cpu_uid, u16 *tag);
+			unsigned int cpu, u16 *tag);
 void pcie_disable_tph(struct pci_dev *pdev);
 int pcie_enable_tph(struct pci_dev *pdev, int mode);
 u16 pcie_tph_get_st_table_size(struct pci_dev *pdev);
@@ -36,7 +36,7 @@ static inline int pcie_tph_set_st_entry(struct pci_dev *pdev,
 { return -EINVAL; }
 static inline int pcie_tph_get_cpu_st(struct pci_dev *dev,
 				      enum tph_mem_type mem_type,
-				      unsigned int cpu_uid, u16 *tag)
+				      unsigned int cpu, u16 *tag)
 { return -EINVAL; }
 static inline void pcie_disable_tph(struct pci_dev *pdev) { }
 static inline int pcie_enable_tph(struct pci_dev *pdev, int mode)
-- 
2.17.1


^ permalink raw reply related

* [PATCH RESEND v10 1/8] arm64: acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
From: Chengwen Feng @ 2026-04-01  8:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Catalin Marinas, Will Deacon, Rafael J . Wysocki
  Cc: Jonathan Corbet, WANG Xuerui, Thomas Gleixner, Dave Hansen,
	H . Peter Anvin, Juergen Gross, Boris Ostrovsky, Len Brown,
	Sunil V L, Mark Rutland, Jonathan Cameron, Kees Cook, Yanteng Si,
	Sean Christopherson, Kai Huang, Tom Lendacky, Thomas Huth,
	Thorsten Blum, Kevin Loughlin, Zheyun Shen, Peter Zijlstra,
	Pawan Gupta, Xin Li, Ahmed S . Darwish, Sohil Mehta,
	Ilkka Koskinen, Robin Murphy, James Clark, Besar Wicaksono, Ma Ke,
	Wei Huang, Andy Gospodarek, Somnath Kotur, punit.agrawal,
	guohanjun, suzuki.poulose, ryan.roberts, chenl311, masahiroy,
	wangyuquan1236, anshuman.khandual, heinrich.schuchardt,
	Eric.VanTassell, wangzhou1, wanghuiqiang, liuyonglong,
	fengchengwen, linux-pci, linux-doc, linux-kernel,
	linux-arm-kernel, loongarch, linux-riscv, xen-devel, linux-acpi,
	linux-perf-users, stable, x86
In-Reply-To: <20260401081640.26875-1-fengchengwen@huawei.com>

As a step towards unifying the interface for retrieving ACPI CPU UID
across architectures, introduce a new function acpi_get_cpu_uid() for
arm64. While at it, add input validation to make the code more robust.

Reimplement get_cpu_for_acpi_id() based on acpi_get_cpu_uid() for
consistency, and move its implementation next to the new function for
code coherence.

Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/acpi.h | 14 ++------------
 arch/arm64/kernel/acpi.c      | 30 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index c07a58b96329..2219a3301e72 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -118,18 +118,8 @@ static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
 {
 	return	acpi_cpu_get_madt_gicc(cpu)->uid;
 }
-
-static inline int get_cpu_for_acpi_id(u32 uid)
-{
-	int cpu;
-
-	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
-		if (acpi_cpu_get_madt_gicc(cpu) &&
-		    uid == get_acpi_id_for_cpu(cpu))
-			return cpu;
-
-	return -EINVAL;
-}
+int acpi_get_cpu_uid(unsigned int cpu, u32 *uid);
+int get_cpu_for_acpi_id(u32 uid);
 
 static inline void arch_fix_phys_package_id(int num, u32 slot) { }
 void __init acpi_init_cpus(void);
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index af90128cfed5..24b9d934be54 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -458,3 +458,33 @@ int acpi_unmap_cpu(int cpu)
 }
 EXPORT_SYMBOL(acpi_unmap_cpu);
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
+
+int acpi_get_cpu_uid(unsigned int cpu, u32 *uid)
+{
+	struct acpi_madt_generic_interrupt *gicc;
+
+	if (cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	gicc = acpi_cpu_get_madt_gicc(cpu);
+	if (!gicc)
+		return -ENODEV;
+
+	*uid = gicc->uid;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(acpi_get_cpu_uid);
+
+int get_cpu_for_acpi_id(u32 uid)
+{
+	u32 cpu_uid;
+	int ret;
+
+	for (int cpu = 0; cpu < nr_cpu_ids; cpu++) {
+		ret = acpi_get_cpu_uid(cpu, &cpu_uid);
+		if (ret == 0 && uid == cpu_uid)
+			return cpu;
+	}
+
+	return -EINVAL;
+}
-- 
2.17.1


^ permalink raw reply related

* [PATCH RESEND v10 6/8] perf: arm_cspmu: Switch to acpi_get_cpu_uid() from get_acpi_id_for_cpu()
From: Chengwen Feng @ 2026-04-01  8:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Catalin Marinas, Will Deacon, Rafael J . Wysocki
  Cc: Jonathan Corbet, WANG Xuerui, Thomas Gleixner, Dave Hansen,
	H . Peter Anvin, Juergen Gross, Boris Ostrovsky, Len Brown,
	Sunil V L, Mark Rutland, Jonathan Cameron, Kees Cook, Yanteng Si,
	Sean Christopherson, Kai Huang, Tom Lendacky, Thomas Huth,
	Thorsten Blum, Kevin Loughlin, Zheyun Shen, Peter Zijlstra,
	Pawan Gupta, Xin Li, Ahmed S . Darwish, Sohil Mehta,
	Ilkka Koskinen, Robin Murphy, James Clark, Besar Wicaksono, Ma Ke,
	Wei Huang, Andy Gospodarek, Somnath Kotur, punit.agrawal,
	guohanjun, suzuki.poulose, ryan.roberts, chenl311, masahiroy,
	wangyuquan1236, anshuman.khandual, heinrich.schuchardt,
	Eric.VanTassell, wangzhou1, wanghuiqiang, liuyonglong,
	fengchengwen, linux-pci, linux-doc, linux-kernel,
	linux-arm-kernel, loongarch, linux-riscv, xen-devel, linux-acpi,
	linux-perf-users, stable, x86
In-Reply-To: <20260401081640.26875-1-fengchengwen@huawei.com>

Update arm_cspmu to use acpi_get_cpu_uid() instead of
get_acpi_id_for_cpu(), aligning with unified ACPI CPU UID interface.

No functional changes are introduced by this switch (valid inputs retain
original behavior).

Cc: stable@vger.kernel.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 drivers/perf/arm_cspmu/arm_cspmu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
index 34430b68f602..ed72c3d1f796 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.c
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -1107,15 +1107,17 @@ static int arm_cspmu_acpi_get_cpus(struct arm_cspmu *cspmu)
 {
 	struct acpi_apmt_node *apmt_node;
 	int affinity_flag;
+	u32 cpu_uid;
 	int cpu;
+	int ret;
 
 	apmt_node = arm_cspmu_apmt_node(cspmu->dev);
 	affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
 
 	if (affinity_flag == ACPI_APMT_FLAGS_AFFINITY_PROC) {
 		for_each_possible_cpu(cpu) {
-			if (apmt_node->proc_affinity ==
-			    get_acpi_id_for_cpu(cpu)) {
+			ret = acpi_get_cpu_uid(cpu, &cpu_uid);
+			if (ret == 0 && apmt_node->proc_affinity == cpu_uid) {
 				cpumask_set_cpu(cpu, &cspmu->associated_cpus);
 				break;
 			}
-- 
2.17.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox