linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 00/10] Introducing firmware late binding
@ 2025-06-25 17:00 Badal Nilawar
  2025-06-25 17:00 ` [PATCH v4 01/10] mei: bus: add mei_cldev_mtu interface Badal Nilawar
                   ` (9 more replies)
  0 siblings, 10 replies; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

Introduce the firmware late binding feature, which enables firmware loading
for devices such as the fan controller and voltage regulator during
driver probe.
Typically, firmware for these devices is part of the IFWI flash image but
can be replaced at probe after OEM tuning.

v2:
 - Dropped voltage regulator specific code as binaries for it will not
   be available for upstreaming as of now.
 - Address review comments
v3:
 - Dropped fwctl patch for now
 - Added new patch to extract binary version
 - Address v2 review comments
v4:
 - Address v3 review comments

Alexander Usyskin (2):
  mei: bus: add mei_cldev_mtu interface
  mei: late_bind: add late binding component driver

Badal Nilawar (8):
  drm/xe/xe_late_bind_fw: Introducing xe_late_bind_fw
  drm/xe/xe_late_bind_fw: Initialize late binding firmware
  drm/xe/xe_late_bind_fw: Load late binding firmware
  drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume
  drm/xe/xe_late_bind_fw: Reload late binding fw during system resume
  drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late
    binding
  drm/xe/xe_late_bind_fw: Extract and print version info
  drm/xe/xe_late_bind_fw: Select INTEL_MEI_LATE_BIND for CI

 drivers/gpu/drm/xe/Kconfig                  |   1 +
 drivers/gpu/drm/xe/Makefile                 |   1 +
 drivers/gpu/drm/xe/xe_debugfs.c             |  41 ++
 drivers/gpu/drm/xe/xe_device.c              |   5 +
 drivers/gpu/drm/xe/xe_device_types.h        |   6 +
 drivers/gpu/drm/xe/xe_late_bind_fw.c        | 465 ++++++++++++++++++++
 drivers/gpu/drm/xe/xe_late_bind_fw.h        |  17 +
 drivers/gpu/drm/xe/xe_late_bind_fw_types.h  |  81 ++++
 drivers/gpu/drm/xe/xe_pci.c                 |   3 +
 drivers/gpu/drm/xe/xe_pm.c                  |  10 +
 drivers/gpu/drm/xe/xe_uc_fw_abi.h           |  66 +++
 drivers/misc/mei/Kconfig                    |   1 +
 drivers/misc/mei/Makefile                   |   1 +
 drivers/misc/mei/bus.c                      |  13 +
 drivers/misc/mei/late_bind/Kconfig          |  13 +
 drivers/misc/mei/late_bind/Makefile         |   9 +
 drivers/misc/mei/late_bind/mei_late_bind.c  | 281 ++++++++++++
 include/drm/intel/i915_component.h          |   1 +
 include/drm/intel/late_bind_mei_interface.h |  64 +++
 include/linux/mei_cl_bus.h                  |   1 +
 20 files changed, 1080 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw.c
 create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw.h
 create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw_types.h
 create mode 100644 drivers/misc/mei/late_bind/Kconfig
 create mode 100644 drivers/misc/mei/late_bind/Makefile
 create mode 100644 drivers/misc/mei/late_bind/mei_late_bind.c
 create mode 100644 include/drm/intel/late_bind_mei_interface.h

-- 
2.34.1



* [PATCH v4 01/10] mei: bus: add mei_cldev_mtu interface
  2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
@ 2025-06-25 17:00 ` Badal Nilawar
  2025-06-25 17:00 ` [PATCH v4 02/10] mei: late_bind: add late binding component driver Badal Nilawar
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

From: Alexander Usyskin <alexander.usyskin@intel.com>

Allow a bus client to obtain the client MTU.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
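For illustration, a minimal userspace sketch of how a bus client might use
the value returned by mei_cldev_mtu() to reject a message that cannot fit in
one transfer. The names demo_req_hdr and demo_fits_mtu() are hypothetical,
and the MTU is passed in as a plain number instead of being queried from a
real device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size request header; stands in for a real MEI message header. */
struct demo_req_hdr {
	uint32_t type;
	uint32_t payload_size;
};

/*
 * Return true when a header plus payload_size bytes fits within mtu.
 * mtu models the value a client would obtain from mei_cldev_mtu(), which is
 * documented to be 0 when the client is not connected, so nothing fits then.
 */
static bool demo_fits_mtu(size_t mtu, size_t payload_size)
{
	/* Guard the addition so a huge payload_size cannot wrap around. */
	if (payload_size > SIZE_MAX - sizeof(struct demo_req_hdr))
		return false;

	return sizeof(struct demo_req_hdr) + payload_size <= mtu;
}
```

With an 8-byte header and an assumed MTU of 4096, a 4088-byte payload is
accepted and a 4089-byte payload is rejected.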
 drivers/misc/mei/bus.c     | 13 +++++++++++++
 include/linux/mei_cl_bus.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/drivers/misc/mei/bus.c b/drivers/misc/mei/bus.c
index 67176caf5416..f860b1b6eda0 100644
--- a/drivers/misc/mei/bus.c
+++ b/drivers/misc/mei/bus.c
@@ -614,6 +614,19 @@ u8 mei_cldev_ver(const struct mei_cl_device *cldev)
 }
 EXPORT_SYMBOL_GPL(mei_cldev_ver);
 
+/**
+ * mei_cldev_mtu - max message size that client can send and receive
+ *
+ * @cldev: mei client device
+ *
+ * Return: mtu or 0 if client is not connected
+ */
+size_t mei_cldev_mtu(const struct mei_cl_device *cldev)
+{
+	return mei_cl_mtu(cldev->cl);
+}
+EXPORT_SYMBOL_GPL(mei_cldev_mtu);
+
 /**
  * mei_cldev_enabled - check whether the device is enabled
  *
diff --git a/include/linux/mei_cl_bus.h b/include/linux/mei_cl_bus.h
index 725fd7727422..a82755e1fc40 100644
--- a/include/linux/mei_cl_bus.h
+++ b/include/linux/mei_cl_bus.h
@@ -113,6 +113,7 @@ int mei_cldev_register_notif_cb(struct mei_cl_device *cldev,
 				mei_cldev_cb_t notif_cb);
 
 u8 mei_cldev_ver(const struct mei_cl_device *cldev);
+size_t mei_cldev_mtu(const struct mei_cl_device *cldev);
 
 void *mei_cldev_get_drvdata(const struct mei_cl_device *cldev);
 void mei_cldev_set_drvdata(struct mei_cl_device *cldev, void *data);
-- 
2.34.1



* [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
  2025-06-25 17:00 ` [PATCH v4 01/10] mei: bus: add mei_cldev_mtu interface Badal Nilawar
@ 2025-06-25 17:00 ` Badal Nilawar
  2025-06-26  3:50   ` Gupta, Anshuman
                     ` (2 more replies)
  2025-06-25 17:00 ` [PATCH v4 03/10] drm/xe/xe_late_bind_fw: Introducing xe_late_bind_fw Badal Nilawar
                   ` (7 subsequent siblings)
  9 siblings, 3 replies; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

From: Alexander Usyskin <alexander.usyskin@intel.com>

Add late binding component driver.
It allows pushing the late binding configuration from, for example,
the Xe graphics driver to the Intel discrete graphics card's CSE device.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
v2:
 - Use generic naming (Jani)
 - Drop xe_late_bind_component struct to move to xe code (Daniele/Sasha)
v3:
 - Updated kconfig description
 - Move CSC late binding specific flags/defines to late_bind_mei_interface.h (Daniele)
v4:
 - Add match for PCI_CLASS_DISPLAY_OTHER to support headless cards (Anshuman)
v5:
 - Add fixes in push_config (Sasha)
 - Use INTEL_ prefix for component, refine doc,
   add status enum to header late_bind_mei_interface.h (Anshuman)
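
For illustration, the matching rule used by mei_late_bind_component_match()
can be modelled in userspace roughly as follows. This is only a sketch:
struct demo_dev is a hypothetical stand-in for struct device plus the PCI
fields that are consulted, the subcomponent check is left out, and the class
constants are written out by value (0x0300 for VGA, 0x0380 for "other"
display controllers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_VENDOR_INTEL        0x8086
#define DEMO_CLASS_DISPLAY_VGA   0x0300
#define DEMO_CLASS_DISPLAY_OTHER 0x0380

/* Hypothetical, simplified device node: parent link plus the PCI fields used. */
struct demo_dev {
	struct demo_dev *parent;
	bool is_pci;
	uint16_t vendor;
	uint32_t class_code;	/* 24-bit value: base class, subclass, prog-if */
};

/*
 * Return true when requester is an Intel display-class PCI device and is
 * also the grandparent of the MEI client device (client -> mei -> pci).
 */
static bool demo_late_bind_match(const struct demo_dev *requester,
				 const struct demo_dev *mei_cl)
{
	const struct demo_dev *base;

	if (!requester || !mei_cl || !requester->is_pci)
		return false;
	if (requester->vendor != DEMO_VENDOR_INTEL)
		return false;
	if (requester->class_code != (DEMO_CLASS_DISPLAY_VGA << 8) &&
	    requester->class_code != (DEMO_CLASS_DISPLAY_OTHER << 8))
		return false;

	base = mei_cl->parent;		/* mei device */
	if (!base)
		return false;
	base = base->parent;		/* pci device */

	return base && base == requester;
}
```

The grandparent comparison is what ties a late binding client to the one
graphics card that hosts it, so a second Intel card in the same system does
not match.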
---
 drivers/misc/mei/Kconfig                    |   1 +
 drivers/misc/mei/Makefile                   |   1 +
 drivers/misc/mei/late_bind/Kconfig          |  13 +
 drivers/misc/mei/late_bind/Makefile         |   9 +
 drivers/misc/mei/late_bind/mei_late_bind.c  | 281 ++++++++++++++++++++
 include/drm/intel/i915_component.h          |   1 +
 include/drm/intel/late_bind_mei_interface.h |  64 +++++
 7 files changed, 370 insertions(+)
 create mode 100644 drivers/misc/mei/late_bind/Kconfig
 create mode 100644 drivers/misc/mei/late_bind/Makefile
 create mode 100644 drivers/misc/mei/late_bind/mei_late_bind.c
 create mode 100644 include/drm/intel/late_bind_mei_interface.h

diff --git a/drivers/misc/mei/Kconfig b/drivers/misc/mei/Kconfig
index 7575fee96cc6..771becc68095 100644
--- a/drivers/misc/mei/Kconfig
+++ b/drivers/misc/mei/Kconfig
@@ -84,5 +84,6 @@ config INTEL_MEI_VSC
 source "drivers/misc/mei/hdcp/Kconfig"
 source "drivers/misc/mei/pxp/Kconfig"
 source "drivers/misc/mei/gsc_proxy/Kconfig"
+source "drivers/misc/mei/late_bind/Kconfig"
 
 endif
diff --git a/drivers/misc/mei/Makefile b/drivers/misc/mei/Makefile
index 6f9fdbf1a495..84bfde888d81 100644
--- a/drivers/misc/mei/Makefile
+++ b/drivers/misc/mei/Makefile
@@ -31,6 +31,7 @@ CFLAGS_mei-trace.o = -I$(src)
 obj-$(CONFIG_INTEL_MEI_HDCP) += hdcp/
 obj-$(CONFIG_INTEL_MEI_PXP) += pxp/
 obj-$(CONFIG_INTEL_MEI_GSC_PROXY) += gsc_proxy/
+obj-$(CONFIG_INTEL_MEI_LATE_BIND) += late_bind/
 
 obj-$(CONFIG_INTEL_MEI_VSC_HW) += mei-vsc-hw.o
 mei-vsc-hw-y := vsc-tp.o
diff --git a/drivers/misc/mei/late_bind/Kconfig b/drivers/misc/mei/late_bind/Kconfig
new file mode 100644
index 000000000000..65c7180c5678
--- /dev/null
+++ b/drivers/misc/mei/late_bind/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025, Intel Corporation. All rights reserved.
+#
+config INTEL_MEI_LATE_BIND
+	tristate "Intel late binding support on ME Interface"
+	select INTEL_MEI_ME
+	depends on DRM_XE
+	help
+	  MEI Support for Late Binding for Intel graphics card.
+
+	  Enables the ME FW interfaces for the Late Binding feature,
+	  allowing the Intel Xe driver to load firmware for devices
+	  such as the fan controller.
diff --git a/drivers/misc/mei/late_bind/Makefile b/drivers/misc/mei/late_bind/Makefile
new file mode 100644
index 000000000000..a0aeda5853f0
--- /dev/null
+++ b/drivers/misc/mei/late_bind/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2025, Intel Corporation. All rights reserved.
+#
+# Makefile - Late Binding client driver for Intel MEI Bus Driver.
+
+subdir-ccflags-y += -I$(srctree)/drivers/misc/mei/
+
+obj-$(CONFIG_INTEL_MEI_LATE_BIND) += mei_late_bind.o
diff --git a/drivers/misc/mei/late_bind/mei_late_bind.c b/drivers/misc/mei/late_bind/mei_late_bind.c
new file mode 100644
index 000000000000..ffb89ccdfbb1
--- /dev/null
+++ b/drivers/misc/mei/late_bind/mei_late_bind.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2025 Intel Corporation
+ */
+#include <drm/intel/i915_component.h>
+#include <drm/intel/late_bind_mei_interface.h>
+#include <linux/component.h>
+#include <linux/pci.h>
+#include <linux/mei_cl_bus.h>
+#include <linux/module.h>
+#include <linux/overflow.h>
+#include <linux/slab.h>
+#include <linux/uuid.h>
+
+#include "mkhi.h"
+
+#define GFX_SRV_MKHI_LATE_BINDING_CMD 0x12
+#define GFX_SRV_MKHI_LATE_BINDING_RSP (GFX_SRV_MKHI_LATE_BINDING_CMD | 0x80)
+
+#define LATE_BIND_SEND_TIMEOUT_MSEC 3000
+#define LATE_BIND_RECV_TIMEOUT_MSEC 3000
+
+/**
+ * struct csc_heci_late_bind_req - late binding request
+ * @header: @ref mkhi_msg_hdr
+ * @type: type of the late binding payload
+ * @flags: flags to be passed to the firmware
+ * @reserved: reserved field
+ * @payload_size: size of the payload data in bytes
+ * @payload: data to be sent to the firmware
+ */
+struct csc_heci_late_bind_req {
+	struct mkhi_msg_hdr header;
+	u32 type;
+	u32 flags;
+	u32 reserved[2];
+	u32 payload_size;
+	u8  payload[] __counted_by(payload_size);
+} __packed;
+
+/**
+ * struct csc_heci_late_bind_rsp - late binding response
+ * @header: @ref mkhi_msg_hdr
+ * @type: type of the late binding payload
+ * @reserved: reserved field
+ * @status: status of the late binding command execution by firmware
+ */
+struct csc_heci_late_bind_rsp {
+	struct mkhi_msg_hdr header;
+	u32 type;
+	u32 reserved[2];
+	u32 status;
+} __packed;
+
+static int mei_late_bind_check_response(const struct device *dev, const struct mkhi_msg_hdr *hdr)
+{
+	if (hdr->group_id != MKHI_GROUP_ID_GFX) {
+		dev_err(dev, "Mismatch group id: 0x%x instead of 0x%x\n",
+			hdr->group_id, MKHI_GROUP_ID_GFX);
+		return -EINVAL;
+	}
+
+	if (hdr->command != GFX_SRV_MKHI_LATE_BINDING_RSP) {
+		dev_err(dev, "Mismatch command: 0x%x instead of 0x%x\n",
+			hdr->command, GFX_SRV_MKHI_LATE_BINDING_RSP);
+		return -EINVAL;
+	}
+
+	if (hdr->result) {
+		dev_err(dev, "Error in result: 0x%x\n", hdr->result);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
+ * mei_late_bind_push_config - Sends a config to the firmware.
+ * @dev: device struct corresponding to the mei device
+ * @type: payload type
+ * @flags: payload flags
+ * @payload: payload buffer
+ * @payload_size: payload buffer size
+ *
+ * Return: 0 success, negative errno value on transport failure,
+ *         positive status returned by FW
+ */
+static int mei_late_bind_push_config(struct device *dev, u32 type, u32 flags,
+				     const void *payload, size_t payload_size)
+{
+	struct mei_cl_device *cldev;
+	struct csc_heci_late_bind_req *req = NULL;
+	struct csc_heci_late_bind_rsp rsp;
+	size_t req_size;
+	ssize_t ret;
+
+	if (!dev || !payload || !payload_size)
+		return -EINVAL;
+
+	cldev = to_mei_cl_device(dev);
+
+	ret = mei_cldev_enable(cldev);
+	if (ret < 0) {
+		dev_dbg(dev, "mei_cldev_enable failed. %zd\n", ret);
+		return ret;
+	}
+
+	req_size = struct_size(req, payload, payload_size);
+	if (req_size > mei_cldev_mtu(cldev)) {
+		dev_err(dev, "Payload is too big %zu\n", payload_size);
+		ret = -EMSGSIZE;
+		goto end;
+	}
+
+	req = kmalloc(req_size, GFP_KERNEL);
+	if (!req) {
+		ret = -ENOMEM;
+		goto end;
+	}
+
+	req->header.group_id = MKHI_GROUP_ID_GFX;
+	req->header.command = GFX_SRV_MKHI_LATE_BINDING_CMD;
+	req->type = type;
+	req->flags = flags;
+	req->reserved[0] = 0;
+	req->reserved[1] = 0;
+	req->payload_size = payload_size;
+	memcpy(req->payload, payload, payload_size);
+
+	ret = mei_cldev_send_timeout(cldev, (void *)req, req_size, LATE_BIND_SEND_TIMEOUT_MSEC);
+	if (ret < 0) {
+		dev_err(dev, "mei_cldev_send failed. %zd\n", ret);
+		goto end;
+	}
+
+	ret = mei_cldev_recv_timeout(cldev, (void *)&rsp, sizeof(rsp), LATE_BIND_RECV_TIMEOUT_MSEC);
+	if (ret < 0) {
+		dev_err(dev, "mei_cldev_recv failed. %zd\n", ret);
+		goto end;
+	}
+	if (ret < sizeof(rsp.header)) {
+		dev_err(dev, "bad response header: size %zd < %zu\n", ret, sizeof(rsp.header));
+		ret = -EPROTO;
+		goto end;
+	}
+	if (ret < sizeof(rsp)) {
+		dev_err(dev, "bad response: size %zd < %zu\n", ret, sizeof(rsp));
+		ret = -EPROTO;
+		goto end;
+	}
+
+	ret = mei_late_bind_check_response(dev, &rsp.header);
+	if (ret) {
+		dev_err(dev, "bad result response from the firmware: 0x%x\n",
+			*(uint32_t *)&rsp.header);
+		goto end;
+	}
+
+	ret = (int)rsp.status;
+	dev_dbg(dev, "%s status = %zd\n", __func__, ret);
+
+end:
+	mei_cldev_disable(cldev);
+	kfree(req);
+	return ret;
+}
+
+static const struct late_bind_component_ops mei_late_bind_ops = {
+	.owner = THIS_MODULE,
+	.push_config = mei_late_bind_push_config,
+};
+
+static int mei_component_master_bind(struct device *dev)
+{
+	return component_bind_all(dev, (void *)&mei_late_bind_ops);
+}
+
+static void mei_component_master_unbind(struct device *dev)
+{
+	component_unbind_all(dev, (void *)&mei_late_bind_ops);
+}
+
+static const struct component_master_ops mei_component_master_ops = {
+	.bind = mei_component_master_bind,
+	.unbind = mei_component_master_unbind,
+};
+
+/**
+ * mei_late_bind_component_match - compare function for matching mei late bind.
+ *
+ *    The function checks if requester is Intel PCI_CLASS_DISPLAY_VGA or
+ *    PCI_CLASS_DISPLAY_OTHER device, and checks if the parent of requester
+ *    and the grandparent of mei_if are the same device
+ *
+ * @dev: master device
+ * @subcomponent: subcomponent to match (INTEL_COMPONENT_LATE_BIND)
+ * @data: compare data (late_bind mei device on mei bus)
+ *
+ * Return:
+ * * 1 - if components match
+ * * 0 - otherwise
+ */
+static int mei_late_bind_component_match(struct device *dev, int subcomponent,
+					 void *data)
+{
+	struct device *base = data;
+	struct pci_dev *pdev;
+
+	if (!dev)
+		return 0;
+
+	if (!dev_is_pci(dev))
+		return 0;
+
+	pdev = to_pci_dev(dev);
+
+	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
+		return 0;
+
+	if (pdev->class != (PCI_CLASS_DISPLAY_VGA << 8) &&
+	    pdev->class != (PCI_CLASS_DISPLAY_OTHER << 8))
+		return 0;
+
+	if (subcomponent != INTEL_COMPONENT_LATE_BIND)
+		return 0;
+
+	base = base->parent;
+	if (!base) /* mei device */
+		return 0;
+
+	base = base->parent; /* pci device */
+
+	return !!base && dev == base;
+}
+
+static int mei_late_bind_probe(struct mei_cl_device *cldev,
+			       const struct mei_cl_device_id *id)
+{
+	struct component_match *master_match = NULL;
+	int ret;
+
+	component_match_add_typed(&cldev->dev, &master_match,
+				  mei_late_bind_component_match, &cldev->dev);
+	if (IS_ERR_OR_NULL(master_match))
+		return -ENOMEM;
+
+	ret = component_master_add_with_match(&cldev->dev,
+					      &mei_component_master_ops,
+					      master_match);
+	if (ret < 0)
+		dev_err(&cldev->dev, "Master comp add failed %d\n", ret);
+
+	return ret;
+}
+
+static void mei_late_bind_remove(struct mei_cl_device *cldev)
+{
+	component_master_del(&cldev->dev, &mei_component_master_ops);
+}
+
+#define MEI_GUID_MKHI UUID_LE(0xe2c2afa2, 0x3817, 0x4d19, \
+			      0x9d, 0x95, 0x6, 0xb1, 0x6b, 0x58, 0x8a, 0x5d)
+
+static struct mei_cl_device_id mei_late_bind_tbl[] = {
+	{ .uuid = MEI_GUID_MKHI, .version = MEI_CL_VERSION_ANY },
+	{ }
+};
+MODULE_DEVICE_TABLE(mei, mei_late_bind_tbl);
+
+static struct mei_cl_driver mei_late_bind_driver = {
+	.id_table = mei_late_bind_tbl,
+	.name = KBUILD_MODNAME,
+	.probe = mei_late_bind_probe,
+	.remove	= mei_late_bind_remove,
+};
+
+module_mei_cl_driver(mei_late_bind_driver);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("MEI Late Binding");
diff --git a/include/drm/intel/i915_component.h b/include/drm/intel/i915_component.h
index 4ea3b17aa143..456849a97d75 100644
--- a/include/drm/intel/i915_component.h
+++ b/include/drm/intel/i915_component.h
@@ -31,6 +31,7 @@ enum i915_component_type {
 	I915_COMPONENT_HDCP,
 	I915_COMPONENT_PXP,
 	I915_COMPONENT_GSC_PROXY,
+	INTEL_COMPONENT_LATE_BIND,
 };
 
 /* MAX_PORT is the number of port
diff --git a/include/drm/intel/late_bind_mei_interface.h b/include/drm/intel/late_bind_mei_interface.h
new file mode 100644
index 000000000000..ec58ef1ab4e8
--- /dev/null
+++ b/include/drm/intel/late_bind_mei_interface.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright (c) 2025 Intel Corporation
+ */
+
+#ifndef _LATE_BIND_MEI_INTERFACE_H_
+#define _LATE_BIND_MEI_INTERFACE_H_
+
+#include <linux/types.h>
+
+struct device;
+struct module;
+
+/**
+ * Late Binding flags
+ * Persistent across warm reset
+ */
+#define CSC_LATE_BINDING_FLAGS_IS_PERSISTENT	BIT(0)
+
+/**
+ * enum late_bind_type - enum to determine late binding fw type
+ */
+enum late_bind_type {
+	CSC_LATE_BINDING_TYPE_FAN_CONTROL = 1,
+};
+
+/**
+ * Late Binding payload status
+ */
+enum csc_late_binding_status {
+	CSC_LATE_BINDING_STATUS_SUCCESS           = 0,
+	CSC_LATE_BINDING_STATUS_4ID_MISMATCH      = 1,
+	CSC_LATE_BINDING_STATUS_ARB_FAILURE       = 2,
+	CSC_LATE_BINDING_STATUS_GENERAL_ERROR     = 3,
+	CSC_LATE_BINDING_STATUS_INVALID_PARAMS    = 4,
+	CSC_LATE_BINDING_STATUS_INVALID_SIGNATURE = 5,
+	CSC_LATE_BINDING_STATUS_INVALID_PAYLOAD   = 6,
+	CSC_LATE_BINDING_STATUS_TIMEOUT           = 7,
+};
+
+/**
+ * struct late_bind_component_ops - ops for Late Binding services.
+ * @owner: Module providing the ops
+ * @push_config: Sends a config to FW.
+ */
+struct late_bind_component_ops {
+	struct module *owner;
+
+	/**
+	 * @push_config: Sends a config to FW.
+	 * @dev: device struct corresponding to the mei device
+	 * @type: payload type
+	 * @flags: payload flags
+	 * @payload: payload buffer
+	 * @payload_size: payload buffer size
+	 *
+	 * Return: 0 success, negative errno value on transport failure,
+	 *         positive status returned by FW
+	 */
+	int (*push_config)(struct device *dev, u32 type, u32 flags,
+			   const void *payload, size_t payload_size);
+};
+
+#endif /* _LATE_BIND_MEI_INTERFACE_H_ */
-- 
2.34.1



* [PATCH v4 03/10] drm/xe/xe_late_bind_fw: Introducing xe_late_bind_fw
  2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
  2025-06-25 17:00 ` [PATCH v4 01/10] mei: bus: add mei_cldev_mtu interface Badal Nilawar
  2025-06-25 17:00 ` [PATCH v4 02/10] mei: late_bind: add late binding component driver Badal Nilawar
@ 2025-06-25 17:00 ` Badal Nilawar
  2025-06-27 21:04   ` Rodrigo Vivi
  2025-06-25 17:00 ` [PATCH v4 04/10] drm/xe/xe_late_bind_fw: Initialize late binding firmware Badal Nilawar
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

Introduce xe_late_bind_fw to enable firmware loading for devices
such as the fan controller during driver probe. Typically,
firmware for such devices is part of the IFWI flash image but can be
replaced at probe after OEM tuning.
This patch binds the MEI late binding component to enable firmware loading.

v2:
 - Add devm_add_action_or_reset to remove the component (Daniele)
 - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele)
v3:
 - Fail driver probe if late bind initialization fails,
   add has_late_bind flag (Daniele)
v4:
 - s/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
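For illustration, a userspace sketch of how a consumer of the bound
component might call push_config through the ops pointer and interpret the
result, following the return contract documented in
late_bind_mei_interface.h (0 on success, negative errno on transport
failure, positive status from FW). All demo_* names are hypothetical, the
device is a void pointer, and the mutex that the real code holds around the
ops pointer is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the ops table: a single push_config callback. */
struct demo_late_bind_ops {
	int (*push_config)(void *mei_dev, uint32_t type, uint32_t flags,
			   const void *payload, size_t payload_size);
};

struct demo_late_bind {
	void *mei_dev;
	const struct demo_late_bind_ops *ops;	/* NULL until the component is bound */
};

enum demo_push_result {
	DEMO_PUSH_OK,
	DEMO_PUSH_NOT_BOUND,
	DEMO_PUSH_TRANSPORT_ERROR,
	DEMO_PUSH_FW_REJECTED,
};

/* Stub callback for exercising the sketch: returns the int that mei_dev points to. */
static int demo_stub_push_config(void *mei_dev, uint32_t type, uint32_t flags,
				 const void *payload, size_t payload_size)
{
	(void)type; (void)flags; (void)payload; (void)payload_size;
	return *(int *)mei_dev;
}

static enum demo_push_result
demo_push(const struct demo_late_bind *lb, uint32_t type, uint32_t flags,
	  const void *payload, size_t payload_size)
{
	int ret;

	assert(lb);
	if (!lb->ops || !lb->ops->push_config)
		return DEMO_PUSH_NOT_BOUND;

	ret = lb->ops->push_config(lb->mei_dev, type, flags, payload, payload_size);
	if (ret < 0)
		return DEMO_PUSH_TRANSPORT_ERROR;	/* negative errno */
	if (ret > 0)
		return DEMO_PUSH_FW_REJECTED;		/* positive firmware status */
	return DEMO_PUSH_OK;
}
```

Keeping the three outcomes distinct lets a caller decide separately whether
to retry a transport failure or to give up on a payload the firmware
rejected.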
 drivers/gpu/drm/xe/Makefile                |  1 +
 drivers/gpu/drm/xe/xe_device.c             |  5 ++
 drivers/gpu/drm/xe/xe_device_types.h       |  6 ++
 drivers/gpu/drm/xe/xe_late_bind_fw.c       | 90 ++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_late_bind_fw.h       | 15 ++++
 drivers/gpu/drm/xe/xe_late_bind_fw_types.h | 37 +++++++++
 drivers/gpu/drm/xe/xe_pci.c                |  3 +
 7 files changed, 157 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw.c
 create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw.h
 create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw_types.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 7c039caefd00..521547d78fd2 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -76,6 +76,7 @@ xe-y += xe_bb.o \
 	xe_hw_fence.o \
 	xe_irq.o \
 	xe_lrc.o \
+	xe_late_bind_fw.o \
 	xe_migrate.o \
 	xe_mmio.o \
 	xe_mocs.o \
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index cd17c1354ab3..584acd63b0d9 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -44,6 +44,7 @@
 #include "xe_hw_engine_group.h"
 #include "xe_hwmon.h"
 #include "xe_irq.h"
+#include "xe_late_bind_fw.h"
 #include "xe_memirq.h"
 #include "xe_mmio.h"
 #include "xe_module.h"
@@ -889,6 +890,10 @@ int xe_device_probe(struct xe_device *xe)
 	if (err)
 		return err;
 
+	err = xe_late_bind_init(&xe->late_bind);
+	if (err && err != -ENODEV)
+		return err;
+
 	err = xe_oa_init(xe);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 6aca4b1a2824..321f9e9a94f6 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -16,6 +16,7 @@
 #include "xe_devcoredump_types.h"
 #include "xe_heci_gsc.h"
 #include "xe_lmtt_types.h"
+#include "xe_late_bind_fw_types.h"
 #include "xe_memirq_types.h"
 #include "xe_oa_types.h"
 #include "xe_platform_types.h"
@@ -323,6 +324,8 @@ struct xe_device {
 		u8 has_heci_cscfi:1;
 		/** @info.has_heci_gscfi: device has heci gscfi */
 		u8 has_heci_gscfi:1;
+		/** @info.has_late_bind: Device has firmware late binding support */
+		u8 has_late_bind:1;
 		/** @info.has_llc: Device has a shared CPU+GPU last level cache */
 		u8 has_llc:1;
 		/** @info.has_mbx_power_limits: Device has support to manage power limits using
@@ -555,6 +558,9 @@ struct xe_device {
 	/** @nvm: discrete graphics non-volatile memory */
 	struct intel_dg_nvm_dev *nvm;
 
+	/** @late_bind: xe mei late bind interface */
+	struct xe_late_bind late_bind;
+
 	/** @oa: oa observation subsystem */
 	struct xe_oa oa;
 
diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
new file mode 100644
index 000000000000..eaf12cfec848
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include <linux/component.h>
+#include <linux/delay.h>
+
+#include <drm/drm_managed.h>
+#include <drm/intel/i915_component.h>
+#include <drm/intel/late_bind_mei_interface.h>
+#include <drm/drm_print.h>
+
+#include "xe_device.h"
+#include "xe_late_bind_fw.h"
+
+static struct xe_device *
+late_bind_to_xe(struct xe_late_bind *late_bind)
+{
+	return container_of(late_bind, struct xe_device, late_bind);
+}
+
+static int xe_late_bind_component_bind(struct device *xe_kdev,
+				       struct device *mei_kdev, void *data)
+{
+	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
+	struct xe_late_bind *late_bind = &xe->late_bind;
+
+	mutex_lock(&late_bind->mutex);
+	late_bind->component.ops = data;
+	late_bind->component.mei_dev = mei_kdev;
+	mutex_unlock(&late_bind->mutex);
+
+	return 0;
+}
+
+static void xe_late_bind_component_unbind(struct device *xe_kdev,
+					  struct device *mei_kdev, void *data)
+{
+	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
+	struct xe_late_bind *late_bind = &xe->late_bind;
+
+	mutex_lock(&late_bind->mutex);
+	late_bind->component.ops = NULL;
+	mutex_unlock(&late_bind->mutex);
+}
+
+static const struct component_ops xe_late_bind_component_ops = {
+	.bind   = xe_late_bind_component_bind,
+	.unbind = xe_late_bind_component_unbind,
+};
+
+static void xe_late_bind_remove(void *arg)
+{
+	struct xe_late_bind *late_bind = arg;
+	struct xe_device *xe = late_bind_to_xe(late_bind);
+
+	component_del(xe->drm.dev, &xe_late_bind_component_ops);
+	mutex_destroy(&late_bind->mutex);
+}
+
+/**
+ * xe_late_bind_init() - add xe mei late binding component
+ *
+ * Return: 0 if the initialization was successful, a negative errno otherwise.
+ */
+int xe_late_bind_init(struct xe_late_bind *late_bind)
+{
+	struct xe_device *xe = late_bind_to_xe(late_bind);
+	int err;
+
+	if (!xe->info.has_late_bind)
+		return 0;
+
+	mutex_init(&late_bind->mutex);
+
+	if (!IS_ENABLED(CONFIG_INTEL_MEI_LATE_BIND) || !IS_ENABLED(CONFIG_INTEL_MEI_GSC)) {
+		drm_info(&xe->drm, "Can't init xe mei late bind missing mei component\n");
+		return -ENODEV;
+	}
+
+	err = component_add_typed(xe->drm.dev, &xe_late_bind_component_ops,
+				  INTEL_COMPONENT_LATE_BIND);
+	if (err < 0) {
+		drm_info(&xe->drm, "Failed to add mei late bind component (%pe)\n", ERR_PTR(err));
+		return err;
+	}
+
+	return devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
+}
diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.h b/drivers/gpu/drm/xe/xe_late_bind_fw.h
new file mode 100644
index 000000000000..4c73571c3e62
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_LATE_BIND_FW_H_
+#define _XE_LATE_BIND_FW_H_
+
+#include <linux/types.h>
+
+struct xe_late_bind;
+
+int xe_late_bind_init(struct xe_late_bind *late_bind);
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
new file mode 100644
index 000000000000..1156ef94f0d5
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_LATE_BIND_TYPES_H_
+#define _XE_LATE_BIND_TYPES_H_
+
+#include <linux/iosys-map.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+/**
+ * struct xe_late_bind_component - Late Binding services component
+ * @mei_dev: device that provides Late Binding service.
+ * @ops: Ops implemented by Late Binding driver, used by Xe driver.
+ *
+ * Communication between Xe and MEI drivers for Late Binding services
+ */
+struct xe_late_bind_component {
+	/** @late_bind_component.mei_dev: mei device */
+	struct device *mei_dev;
+	/** @late_bind_component.ops: late binding ops */
+	const struct late_bind_component_ops *ops;
+};
+
+/**
+ * struct xe_late_bind
+ */
+struct xe_late_bind {
+	/** @late_bind.component: struct for communication with mei component */
+	struct xe_late_bind_component component;
+	/** @late_bind.mutex: protects the component binding and usage */
+	struct mutex mutex;
+};
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 08e21d4099e0..e5018d3ae74f 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -66,6 +66,7 @@ struct xe_device_desc {
 	u8 has_gsc_nvm:1;
 	u8 has_heci_gscfi:1;
 	u8 has_heci_cscfi:1;
+	u8 has_late_bind:1;
 	u8 has_llc:1;
 	u8 has_mbx_power_limits:1;
 	u8 has_pxp:1;
@@ -355,6 +356,7 @@ static const struct xe_device_desc bmg_desc = {
 	.has_mbx_power_limits = true,
 	.has_gsc_nvm = 1,
 	.has_heci_cscfi = 1,
+	.has_late_bind = true,
 	.needs_scratch = true,
 };
 
@@ -600,6 +602,7 @@ static int xe_info_init_early(struct xe_device *xe,
 	xe->info.has_gsc_nvm = desc->has_gsc_nvm;
 	xe->info.has_heci_gscfi = desc->has_heci_gscfi;
 	xe->info.has_heci_cscfi = desc->has_heci_cscfi;
+	xe->info.has_late_bind = desc->has_late_bind;
 	xe->info.has_llc = desc->has_llc;
 	xe->info.has_pxp = desc->has_pxp;
 	xe->info.has_sriov = desc->has_sriov;
-- 
2.34.1



* [PATCH v4 04/10] drm/xe/xe_late_bind_fw: Initialize late binding firmware
  2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
                   ` (2 preceding siblings ...)
  2025-06-25 17:00 ` [PATCH v4 03/10] drm/xe/xe_late_bind_fw: Introducing xe_late_bind_fw Badal Nilawar
@ 2025-06-25 17:00 ` Badal Nilawar
  2025-06-26 21:06   ` Daniele Ceraolo Spurio
  2025-06-25 17:00 ` [PATCH v4 05/10] drm/xe/xe_late_bind_fw: Load " Badal Nilawar
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

Search for late binding firmware binaries and populate the metadata of
the firmware structures.

v2 (Daniele):
 - drm_err if firmware size is more than max payload size
 - s/request_firmware/firmware_request_nowarn/ as firmware will
   not be available for all possible cards
v3 (Daniele):
 - init firmware from within xe_late_bind_init, propagate error
 - switch late_bind_fw to array to handle multiple firmware types
v4 (Daniele):
 - Alloc payload dynamically, fix nits

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
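For illustration, the firmware file name requested by this patch is built
from the firmware name and the PCI IDs of the card using the format string
"xe/%s_8086_%04x_%04x_%04x.bin". A userspace sketch of that formatting
follows; demo_blob_path() is a hypothetical helper, the IDs in the usage
note are arbitrary example values, and the truncation check is an addition
of this sketch and not something the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the late binding blob path into buf.
 * Returns 0 on success, -1 if the result did not fit in len bytes.
 */
static int demo_blob_path(char *buf, size_t len, const char *fw_name,
			  uint16_t device, uint16_t sub_vendor, uint16_t sub_device)
{
	int n;

	assert(buf && fw_name);
	n = snprintf(buf, len, "xe/%s_8086_%04x_%04x_%04x.bin",
		     fw_name, (unsigned int)device,
		     (unsigned int)sub_vendor, (unsigned int)sub_device);
	/* snprintf reports the untruncated length; treat truncation as failure. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Called with "fan_control", device 0xe20b, subsystem vendor 0x8086 and
subsystem device 0x1100, this yields
"xe/fan_control_8086_e20b_8086_1100.bin". Because the subsystem IDs are part
of the name, each board variant can ship its own tuned binary.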
 drivers/gpu/drm/xe/xe_late_bind_fw.c       | 103 ++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_late_bind_fw_types.h |  32 +++++++
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
index eaf12cfec848..32d1436e7191 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
@@ -5,6 +5,7 @@
 
 #include <linux/component.h>
 #include <linux/delay.h>
+#include <linux/firmware.h>
 
 #include <drm/drm_managed.h>
 #include <drm/intel/i915_component.h>
@@ -13,6 +14,16 @@
 
 #include "xe_device.h"
 #include "xe_late_bind_fw.h"
+#include "xe_pcode.h"
+#include "xe_pcode_api.h"
+
+static const u32 fw_id_to_type[] = {
+	[XE_LB_FW_FAN_CONTROL] = CSC_LATE_BINDING_TYPE_FAN_CONTROL,
+};
+
+static const char * const fw_id_to_name[] = {
+	[XE_LB_FW_FAN_CONTROL] = "fan_control",
+};
 
 static struct xe_device *
 late_bind_to_xe(struct xe_late_bind *late_bind)
@@ -20,6 +31,92 @@ late_bind_to_xe(struct xe_late_bind *late_bind)
 	return container_of(late_bind, struct xe_device, late_bind);
 }
 
+static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
+{
+	struct xe_device *xe = late_bind_to_xe(late_bind);
+	struct xe_tile *root_tile = xe_device_get_root_tile(xe);
+	u32 uval;
+
+	if (!xe_pcode_read(root_tile,
+			   PCODE_MBOX(FAN_SPEED_CONTROL, FSC_READ_NUM_FANS, 0), &uval, NULL))
+		return uval;
+	else
+		return 0;
+}
+
+static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
+{
+	struct xe_device *xe = late_bind_to_xe(late_bind);
+	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+	struct xe_late_bind_fw *lb_fw;
+	const struct firmware *fw;
+	u32 num_fans;
+	int ret;
+
+	if (fw_id >= XE_LB_FW_MAX_ID)
+		return -EINVAL;
+
+	lb_fw = &late_bind->late_bind_fw[fw_id];
+
+	lb_fw->valid = false;
+	lb_fw->id = fw_id;
+	lb_fw->type = fw_id_to_type[lb_fw->id];
+	lb_fw->flags &= ~CSC_LATE_BINDING_FLAGS_IS_PERSISTENT;
+
+	if (lb_fw->type == CSC_LATE_BINDING_TYPE_FAN_CONTROL) {
+		num_fans = xe_late_bind_fw_num_fans(late_bind);
+		drm_dbg(&xe->drm, "Number of Fans: %d\n", num_fans);
+		if (!num_fans)
+			return 0;
+	}
+
+	snprintf(lb_fw->blob_path, sizeof(lb_fw->blob_path), "xe/%s_8086_%04x_%04x_%04x.bin",
+		 fw_id_to_name[lb_fw->id], pdev->device,
+		 pdev->subsystem_vendor, pdev->subsystem_device);
+
+	drm_dbg(&xe->drm, "Request late binding firmware %s\n", lb_fw->blob_path);
+	ret = firmware_request_nowarn(&fw, lb_fw->blob_path, xe->drm.dev);
+	if (ret) {
+		drm_dbg(&xe->drm, "%s late binding fw not available for current device\n",
+			fw_id_to_name[lb_fw->id]);
+		return 0;
+	}
+
+	if (fw->size > MAX_PAYLOAD_SIZE) {
+		drm_err(&xe->drm, "Firmware %s size %zu is larger than max payload size %u\n",
+			lb_fw->blob_path, fw->size, MAX_PAYLOAD_SIZE);
+		release_firmware(fw);
+		return -ENODATA;
+	}
+
+	lb_fw->payload = drmm_kzalloc(&xe->drm, lb_fw->payload_size, GFP_KERNEL);
+	if (!lb_fw->payload) {
+		release_firmware(fw);
+		return -ENOMEM;
+	}
+
+	lb_fw->payload_size = fw->size;
+
+	memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);
+	release_firmware(fw);
+	lb_fw->valid = true;
+
+	return 0;
+}
+
+static int xe_late_bind_fw_init(struct xe_late_bind *late_bind)
+{
+	int ret;
+	int fw_id;
+
+	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
+		ret = __xe_late_bind_fw_init(late_bind, fw_id);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static int xe_late_bind_component_bind(struct device *xe_kdev,
 				       struct device *mei_kdev, void *data)
 {
@@ -86,5 +183,9 @@ int xe_late_bind_init(struct xe_late_bind *late_bind)
 		return err;
 	}
 
-	return devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
+	err = devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
+	if (err)
+		return err;
+
+	return xe_late_bind_fw_init(late_bind);
 }
diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
index 1156ef94f0d5..93abf4c51789 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
@@ -10,6 +10,36 @@
 #include <linux/mutex.h>
 #include <linux/types.h>
 
+#define MAX_PAYLOAD_SIZE SZ_4K
+
+/**
+ * xe_late_bind_fw_id - enum to determine late binding fw index
+ */
+enum xe_late_bind_fw_id {
+	XE_LB_FW_FAN_CONTROL = 0,
+	XE_LB_FW_MAX_ID
+};
+
+/**
+ * struct xe_late_bind_fw
+ */
+struct xe_late_bind_fw {
+	/** @late_bind_fw.valid: to check if fw is valid */
+	bool valid;
+	/** @late_bind_fw.id: firmware index */
+	u32 id;
+	/** @late_bind_fw.blob_path: firmware binary path */
+	char blob_path[PATH_MAX];
+	/** @late_bind_fw.type: firmware type */
+	u32  type;
+	/** @late_bind_fw.flags: firmware flags */
+	u32  flags;
+	/** @late_bind_fw.payload: to store the late binding blob */
+	u8  *payload;
+	/** @late_bind_fw.payload_size: late binding blob payload_size */
+	size_t payload_size;
+};
+
 /**
  * struct xe_late_bind_component - Late Binding services component
  * @mei_dev: device that provide Late Binding service.
@@ -32,6 +62,8 @@ struct xe_late_bind {
 	struct xe_late_bind_component component;
 	/** @late_bind.mutex: protects the component binding and usage */
 	struct mutex mutex;
+	/** @late_bind.late_bind_fw: late binding firmware array */
+	struct xe_late_bind_fw late_bind_fw[XE_LB_FW_MAX_ID];
 };
 
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread
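
To illustrate the blob naming scheme used by the snprintf() call in the
patch above, here is a minimal user-space sketch. The helper name
lb_fw_blob_path() and the IDs in the usage note are hypothetical; only the
format string is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical user-space helper (not part of the patch) that builds the
 * firmware file name with the same format string as __xe_late_bind_fw_init().
 * Returns 0 on success, -1 if the result did not fit in the buffer.
 */
static int lb_fw_blob_path(char *buf, size_t len, const char *name,
			   unsigned int device, unsigned int sub_vendor,
			   unsigned int sub_device)
{
	int n;

	assert(buf && name);

	/* "8086" is the fixed Intel vendor ID; the other IDs come from PCI config */
	n = snprintf(buf, len, "xe/%s_8086_%04x_%04x_%04x.bin",
		     name, device, sub_vendor, sub_device);

	/* snprintf() returns the untruncated length, so detect truncation here */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With made-up IDs (device 0xe20b, subsystem vendor 0x8086, subsystem device
0x1100) and the name "fan_control", the helper produces
"xe/fan_control_8086_e20b_8086_1100.bin".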

* [PATCH v4 05/10] drm/xe/xe_late_bind_fw: Load late binding firmware
  2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
                   ` (3 preceding siblings ...)
  2025-06-25 17:00 ` [PATCH v4 04/10] drm/xe/xe_late_bind_fw: Initialize late binding firmware Badal Nilawar
@ 2025-06-25 17:00 ` Badal Nilawar
  2025-06-26 17:24   ` Rodrigo Vivi
  2025-06-25 17:00 ` [PATCH v4 06/10] drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume Badal Nilawar
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

Load late binding firmware

v2:
 - s/EAGAIN/EBUSY/
 - Flush worker in suspend and driver unload (Daniele)
v3:
 - Use retry interval of 6s, in steps of 200ms, to allow
   other OS components to release the MEI CL handle (Sasha)
v4:
 - return -ENODEV if component not added (Daniele)
 - parse and print status returned by csc
 - Use xe_pm_runtime_get_if_active (Daniele)

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
 drivers/gpu/drm/xe/xe_late_bind_fw.c       | 149 ++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_late_bind_fw.h       |   1 +
 drivers/gpu/drm/xe/xe_late_bind_fw_types.h |   7 +
 3 files changed, 156 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
index 32d1436e7191..52243063d98a 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
@@ -16,6 +16,20 @@
 #include "xe_late_bind_fw.h"
 #include "xe_pcode.h"
 #include "xe_pcode_api.h"
+#include "xe_pm.h"
+
+/*
+ * The component should load quite quickly in most cases, but it could take
+ * a bit. Using a very big timeout just to cover the worst case scenario
+ */
+#define LB_INIT_TIMEOUT_MS 20000
+
+/*
+ * Retry interval set to 6 seconds, in steps of 200 ms, to allow time for
+ * other OS components to release the MEI CL handle
+ */
+#define LB_FW_LOAD_RETRY_MAXCOUNT 30
+#define LB_FW_LOAD_RETRY_PAUSE_MS 200
 
 static const u32 fw_id_to_type[] = {
 		[XE_LB_FW_FAN_CONTROL] = CSC_LATE_BINDING_TYPE_FAN_CONTROL,
@@ -31,6 +45,30 @@ late_bind_to_xe(struct xe_late_bind *late_bind)
 	return container_of(late_bind, struct xe_device, late_bind);
 }
 
+static const char *xe_late_bind_parse_status(uint32_t status)
+{
+	switch (status) {
+	case CSC_LATE_BINDING_STATUS_SUCCESS:
+		return "success";
+	case CSC_LATE_BINDING_STATUS_4ID_MISMATCH:
+		return "4Id Mismatch";
+	case CSC_LATE_BINDING_STATUS_ARB_FAILURE:
+		return "ARB Failure";
+	case CSC_LATE_BINDING_STATUS_GENERAL_ERROR:
+		return "General Error";
+	case CSC_LATE_BINDING_STATUS_INVALID_PARAMS:
+		return "Invalid Params";
+	case CSC_LATE_BINDING_STATUS_INVALID_SIGNATURE:
+		return "Invalid Signature";
+	case CSC_LATE_BINDING_STATUS_INVALID_PAYLOAD:
+		return "Invalid Payload";
+	case CSC_LATE_BINDING_STATUS_TIMEOUT:
+		return "Timeout";
+	default:
+		return "Unknown error";
+	}
+}
+
 static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
 {
 	struct xe_device *xe = late_bind_to_xe(late_bind);
@@ -44,6 +82,93 @@ static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
 		return 0;
 }
 
+static void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind)
+{
+	struct xe_device *xe = late_bind_to_xe(late_bind);
+	struct xe_late_bind_fw *lbfw;
+	int fw_id;
+
+	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
+		lbfw = &late_bind->late_bind_fw[fw_id];
+		if (lbfw->valid && late_bind->wq) {
+			drm_dbg(&xe->drm, "Flush work: load %s firmware\n",
+				fw_id_to_name[lbfw->id]);
+			flush_work(&lbfw->work);
+		}
+	}
+}
+
+static void xe_late_bind_work(struct work_struct *work)
+{
+	struct xe_late_bind_fw *lbfw = container_of(work, struct xe_late_bind_fw, work);
+	struct xe_late_bind *late_bind = container_of(lbfw, struct xe_late_bind,
+						      late_bind_fw[lbfw->id]);
+	struct xe_device *xe = late_bind_to_xe(late_bind);
+	int retry = LB_FW_LOAD_RETRY_MAXCOUNT;
+	int ret;
+	int slept;
+
+	/* we can queue this before the component is bound */
+	for (slept = 0; slept < LB_INIT_TIMEOUT_MS; slept += 100) {
+		if (late_bind->component.ops)
+			break;
+		msleep(100);
+	}
+
+	if (!xe_pm_runtime_get_if_active(xe))
+		return;
+
+	mutex_lock(&late_bind->mutex);
+
+	if (!late_bind->component.ops) {
+		drm_err(&xe->drm, "Late bind component not bound\n");
+		goto out;
+	}
+
+	drm_dbg(&xe->drm, "Load %s firmware\n", fw_id_to_name[lbfw->id]);
+
+	do {
+		ret = late_bind->component.ops->push_config(late_bind->component.mei_dev,
+							    lbfw->type, lbfw->flags,
+							    lbfw->payload, lbfw->payload_size);
+		if (!ret)
+			break;
+		msleep(LB_FW_LOAD_RETRY_PAUSE_MS);
+	} while (--retry && ret == -EBUSY);
+
+	if (!ret) {
+		drm_dbg(&xe->drm, "Load %s firmware successful\n",
+			fw_id_to_name[lbfw->id]);
+		goto out;
+	}
+
+	if (ret > 0)
+		drm_err(&xe->drm, "Load %s firmware failed with err %d, %s\n",
+			fw_id_to_name[lbfw->id], ret, xe_late_bind_parse_status(ret));
+	else
+		drm_err(&xe->drm, "Load %s firmware failed with err %d\n",
+			fw_id_to_name[lbfw->id], ret);
+out:
+	mutex_unlock(&late_bind->mutex);
+	xe_pm_runtime_put(xe);
+}
+
+int xe_late_bind_fw_load(struct xe_late_bind *late_bind)
+{
+	struct xe_late_bind_fw *lbfw;
+	int fw_id;
+
+	if (!late_bind->component_added)
+		return -ENODEV;
+
+	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
+		lbfw = &late_bind->late_bind_fw[fw_id];
+		if (lbfw->valid)
+			queue_work(late_bind->wq, &lbfw->work);
+	}
+	return 0;
+}
+
 static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
 {
 	struct xe_device *xe = late_bind_to_xe(late_bind);
@@ -99,6 +224,7 @@ static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
 
 	memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);
 	release_firmware(fw);
+	INIT_WORK(&lb_fw->work, xe_late_bind_work);
 	lb_fw->valid = true;
 
 	return 0;
@@ -109,11 +235,16 @@ static int xe_late_bind_fw_init(struct xe_late_bind *late_bind)
 	int ret;
 	int fw_id;
 
+	late_bind->wq = alloc_ordered_workqueue("late-bind-ordered-wq", 0);
+	if (!late_bind->wq)
+		return -ENOMEM;
+
 	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
 		ret = __xe_late_bind_fw_init(late_bind, fw_id);
 		if (ret)
 			return ret;
 	}
+
 	return 0;
 }
 
@@ -137,6 +268,8 @@ static void xe_late_bind_component_unbind(struct device *xe_kdev,
 	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
 	struct xe_late_bind *late_bind = &xe->late_bind;
 
+	xe_late_bind_wait_for_worker_completion(late_bind);
+
 	mutex_lock(&late_bind->mutex);
 	late_bind->component.ops = NULL;
 	mutex_unlock(&late_bind->mutex);
@@ -152,7 +285,15 @@ static void xe_late_bind_remove(void *arg)
 	struct xe_late_bind *late_bind = arg;
 	struct xe_device *xe = late_bind_to_xe(late_bind);
 
+	xe_late_bind_wait_for_worker_completion(late_bind);
+
+	late_bind->component_added = false;
+
 	component_del(xe->drm.dev, &xe_late_bind_component_ops);
+	if (late_bind->wq) {
+		destroy_workqueue(late_bind->wq);
+		late_bind->wq = NULL;
+	}
 	mutex_destroy(&late_bind->mutex);
 }
 
@@ -183,9 +324,15 @@ int xe_late_bind_init(struct xe_late_bind *late_bind)
 		return err;
 	}
 
+	late_bind->component_added = true;
+
 	err = devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
 	if (err)
 		return err;
 
-	return xe_late_bind_fw_init(late_bind);
+	err = xe_late_bind_fw_init(late_bind);
+	if (err)
+		return err;
+
+	return xe_late_bind_fw_load(late_bind);
 }
diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.h b/drivers/gpu/drm/xe/xe_late_bind_fw.h
index 4c73571c3e62..28d56ed2bfdc 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw.h
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw.h
@@ -11,5 +11,6 @@
 struct xe_late_bind;
 
 int xe_late_bind_init(struct xe_late_bind *late_bind);
+int xe_late_bind_fw_load(struct xe_late_bind *late_bind);
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
index 93abf4c51789..f119a75f4c9c 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
@@ -9,6 +9,7 @@
 #include <linux/iosys-map.h>
 #include <linux/mutex.h>
 #include <linux/types.h>
+#include <linux/workqueue.h>
 
 #define MAX_PAYLOAD_SIZE SZ_4K
 
@@ -38,6 +39,8 @@ struct xe_late_bind_fw {
 	u8  *payload;
 	/** @late_bind_fw.payload_size: late binding blob payload_size */
 	size_t payload_size;
+	/** @late_bind_fw.work: worker to upload latebind blob */
+	struct work_struct work;
 };
 
 /**
@@ -64,6 +67,10 @@ struct xe_late_bind {
 	struct mutex mutex;
 	/** @late_bind.late_bind_fw: late binding firmware array */
 	struct xe_late_bind_fw late_bind_fw[XE_LB_FW_MAX_ID];
+	/** @late_bind.wq: workqueue to submit request to download late bind blob */
+	struct workqueue_struct *wq;
+	/** @late_bind.component_added: whether the component has been added */
+	bool component_added;
 };
 
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread
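
The bounded retry in xe_late_bind_work() above (30 attempts, 200 ms apart,
6 s in total, continuing only while the result is -EBUSY) can be sketched
in user space as follows. This is only an illustration: push_fn, pause_fn
and the fake_* helpers are hypothetical stand-ins for push_config() and
msleep(), not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>

#define RETRY_MAXCOUNT 30	/* LB_FW_LOAD_RETRY_MAXCOUNT in the patch */
#define RETRY_PAUSE_MS 200	/* LB_FW_LOAD_RETRY_PAUSE_MS in the patch */

typedef int (*push_fn)(void *ctx);		/* stand-in for push_config() */
typedef void (*pause_fn)(unsigned int ms);	/* stand-in for msleep() */

/* Same loop shape as the patch: retry only while the callee reports -EBUSY. */
static int push_with_retry(push_fn push, pause_fn pause, void *ctx)
{
	int retry = RETRY_MAXCOUNT;
	int ret;

	assert(push && pause);

	do {
		ret = push(ctx);
		if (!ret)
			break;
		pause(RETRY_PAUSE_MS);
	} while (--retry && ret == -EBUSY);

	return ret;
}

/* Demo helpers: a fake device that reports busy for the first *ctx calls. */
static unsigned int total_pause_ms;

static int fake_push(void *ctx)
{
	int *busy_left = ctx;

	if (*busy_left > 0) {
		(*busy_left)--;
		return -EBUSY;
	}
	return 0;
}

static void fake_pause(unsigned int ms)
{
	total_pause_ms += ms;	/* record the delay instead of sleeping */
}
```

Note that, as in the patch, the pause also runs after a non-EBUSY failure
before the loop exits; only a successful push skips it.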

* [PATCH v4 06/10] drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume
  2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
                   ` (4 preceding siblings ...)
  2025-06-25 17:00 ` [PATCH v4 05/10] drm/xe/xe_late_bind_fw: Load " Badal Nilawar
@ 2025-06-25 17:00 ` Badal Nilawar
  2025-06-25 17:00 ` [PATCH v4 07/10] drm/xe/xe_late_bind_fw: Reload late binding fw during system resume Badal Nilawar
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

Reload late binding fw during runtime resume.

v2: Flush worker during runtime suspend

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
 drivers/gpu/drm/xe/xe_late_bind_fw.c | 2 +-
 drivers/gpu/drm/xe/xe_late_bind_fw.h | 1 +
 drivers/gpu/drm/xe/xe_pm.c           | 6 ++++++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
index 52243063d98a..737780336000 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
@@ -82,7 +82,7 @@ static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
 		return 0;
 }
 
-static void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind)
+void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind)
 {
 	struct xe_device *xe = late_bind_to_xe(late_bind);
 	struct xe_late_bind_fw *lbfw;
diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.h b/drivers/gpu/drm/xe/xe_late_bind_fw.h
index 28d56ed2bfdc..07e437390539 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw.h
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw.h
@@ -12,5 +12,6 @@ struct xe_late_bind;
 
 int xe_late_bind_init(struct xe_late_bind *late_bind);
 int xe_late_bind_fw_load(struct xe_late_bind *late_bind);
+void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind);
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index ff749edc005b..91923fd4af80 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -20,6 +20,7 @@
 #include "xe_gt.h"
 #include "xe_guc.h"
 #include "xe_irq.h"
+#include "xe_late_bind_fw.h"
 #include "xe_pcode.h"
 #include "xe_pxp.h"
 #include "xe_trace.h"
@@ -460,6 +461,8 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
 	if (err)
 		goto out;
 
+	xe_late_bind_wait_for_worker_completion(&xe->late_bind);
+
 	/*
 	 * Applying lock for entire list op as xe_ttm_bo_destroy and xe_bo_move_notify
 	 * also checks and deletes bo entry from user fault list.
@@ -550,6 +553,9 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 
 	xe_pxp_pm_resume(xe->pxp);
 
+	if (xe->d3cold.allowed)
+		xe_late_bind_fw_load(&xe->late_bind);
+
 out:
 	xe_rpm_lockmap_release(xe);
 	xe_pm_write_callback_task(xe, NULL);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 07/10] drm/xe/xe_late_bind_fw: Reload late binding fw during system resume
  2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
                   ` (5 preceding siblings ...)
  2025-06-25 17:00 ` [PATCH v4 06/10] drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume Badal Nilawar
@ 2025-06-25 17:00 ` Badal Nilawar
  2025-06-27  7:53   ` Nilawar, Badal
  2025-06-25 17:00 ` [PATCH v4 08/10] drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late binding Badal Nilawar
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

Reload late binding fw during resume from system suspend

v2:
  - Unconditionally reload late binding fw (Rodrigo)
  - Flush worker during system suspend

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
 drivers/gpu/drm/xe/xe_pm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 91923fd4af80..f49b7b6eab97 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -127,6 +127,8 @@ int xe_pm_suspend(struct xe_device *xe)
 	if (err)
 		goto err;
 
+	xe_late_bind_wait_for_worker_completion(&xe->late_bind);
+
 	for_each_gt(gt, xe, id)
 		xe_gt_suspend_prepare(gt);
 
@@ -205,6 +207,8 @@ int xe_pm_resume(struct xe_device *xe)
 
 	xe_pxp_pm_resume(xe->pxp);
 
+	xe_late_bind_fw_load(&xe->late_bind);
+
 	drm_dbg(&xe->drm, "Device resumed\n");
 	return 0;
 err:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 08/10] drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late binding
  2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
                   ` (6 preceding siblings ...)
  2025-06-25 17:00 ` [PATCH v4 07/10] drm/xe/xe_late_bind_fw: Reload late binding fw during system resume Badal Nilawar
@ 2025-06-25 17:00 ` Badal Nilawar
  2025-06-25 17:00 ` [PATCH v4 09/10] drm/xe/xe_late_bind_fw: Extract and print version info Badal Nilawar
  2025-06-25 17:00 ` [PATCH v4 10/10] drm/xe/xe_late_bind_fw: Select INTEL_MEI_LATE_BIND for CI Badal Nilawar
  9 siblings, 0 replies; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

Introduce a debugfs node to disable late binding fw reload during
system or runtime resume. This is intended for situations where the
late binding fw needs to be loaded from user mode, particularly for
validation purposes.
Note that the xe kmd doesn't participate in the late binding flow from
user space. A binary loaded from user space will be lost upon entering
D3cold, so the user space app needs to handle this situation.

v2:
  - s/(uval == 1) ? true : false/!!uval/ (Daniele)
v3:
  - Refine the commit message (Daniele)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
 drivers/gpu/drm/xe/xe_debugfs.c            | 41 ++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_late_bind_fw.c       |  3 ++
 drivers/gpu/drm/xe/xe_late_bind_fw_types.h |  2 ++
 3 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index d83cd6ed3fa8..d1f6f556efa2 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -226,6 +226,44 @@ static const struct file_operations atomic_svm_timeslice_ms_fops = {
 	.write = atomic_svm_timeslice_ms_set,
 };
 
+static ssize_t disable_late_binding_show(struct file *f, char __user *ubuf,
+					 size_t size, loff_t *pos)
+{
+	struct xe_device *xe = file_inode(f)->i_private;
+	struct xe_late_bind *late_bind = &xe->late_bind;
+	char buf[32];
+	int len;
+
+	len = scnprintf(buf, sizeof(buf), "%d\n", late_bind->disable);
+
+	return simple_read_from_buffer(ubuf, size, pos, buf, len);
+}
+
+static ssize_t disable_late_binding_set(struct file *f, const char __user *ubuf,
+					size_t size, loff_t *pos)
+{
+	struct xe_device *xe = file_inode(f)->i_private;
+	struct xe_late_bind *late_bind = &xe->late_bind;
+	u32 uval;
+	ssize_t ret;
+
+	ret = kstrtouint_from_user(ubuf, size, sizeof(uval), &uval);
+	if (ret)
+		return ret;
+
+	if (uval > 1)
+		return -EINVAL;
+
+	late_bind->disable = !!uval;
+	return size;
+}
+
+static const struct file_operations disable_late_binding_fops = {
+	.owner = THIS_MODULE,
+	.read = disable_late_binding_show,
+	.write = disable_late_binding_set,
+};
+
 void xe_debugfs_register(struct xe_device *xe)
 {
 	struct ttm_device *bdev = &xe->ttm;
@@ -249,6 +287,9 @@ void xe_debugfs_register(struct xe_device *xe)
 	debugfs_create_file("atomic_svm_timeslice_ms", 0600, root, xe,
 			    &atomic_svm_timeslice_ms_fops);
 
+	debugfs_create_file("disable_late_binding", 0600, root, xe,
+			    &disable_late_binding_fops);
+
 	for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) {
 		man = ttm_manager_type(bdev, mem_type);
 
diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
index 737780336000..777f66692d7f 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
@@ -161,6 +161,9 @@ int xe_late_bind_fw_load(struct xe_late_bind *late_bind)
 	if (!late_bind->component_added)
 		return -ENODEV;
 
+	if (late_bind->disable)
+		return 0;
+
 	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
 		lbfw = &late_bind->late_bind_fw[fw_id];
 		if (lbfw->valid)
diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
index f119a75f4c9c..16f2bd6bbdf1 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
@@ -71,6 +71,8 @@ struct xe_late_bind {
 	struct workqueue_struct *wq;
 	/** @late_bind.component_added: whether the component has been added */
 	bool component_added;
+	/** @late_bind.disable: block late binding reload during pm resume flow */
+	bool disable;
 };
 
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread
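
The write handler above accepts only 0 or 1 and stores the result as a
bool. A minimal user-space sketch of that validation follows;
parse_disable_flag() is a hypothetical helper built on strtoul(), whereas
the patch itself relies on kstrtouint_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "0" or "1" (optionally followed by a single
 * newline, as written by `echo`) into a bool. Returns 0 on success and
 * -EINVAL for anything else, including values greater than 1.
 */
static int parse_disable_flag(const char *s, bool *out)
{
	unsigned long val;
	char *end;

	assert(s && out);

	errno = 0;
	val = strtoul(s, &end, 0);
	if (end == s || errno)
		return -EINVAL;		/* no digits, or out of range */

	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	if (val > 1)
		return -EINVAL;		/* same range check as the patch */

	*out = !!val;
	return 0;
}
```

Rejecting values above 1 rather than collapsing them to true keeps the
node's contract explicit, which matches the `uval > 1` check in the patch.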

* [PATCH v4 09/10] drm/xe/xe_late_bind_fw: Extract and print version info
  2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
                   ` (7 preceding siblings ...)
  2025-06-25 17:00 ` [PATCH v4 08/10] drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late binding Badal Nilawar
@ 2025-06-25 17:00 ` Badal Nilawar
  2025-06-26 21:32   ` Daniele Ceraolo Spurio
  2025-06-25 17:00 ` [PATCH v4 10/10] drm/xe/xe_late_bind_fw: Select INTEL_MEI_LATE_BIND for CI Badal Nilawar
  9 siblings, 1 reply; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

Extract and print version info of the late binding binary.

v2: Some refinements (Daniele)

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
 drivers/gpu/drm/xe/xe_late_bind_fw.c       | 124 +++++++++++++++++++++
 drivers/gpu/drm/xe/xe_late_bind_fw_types.h |   3 +
 drivers/gpu/drm/xe/xe_uc_fw_abi.h          |  66 +++++++++++
 3 files changed, 193 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
index 777f66692d7f..253908794d4a 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
@@ -45,6 +45,121 @@ late_bind_to_xe(struct xe_late_bind *late_bind)
 	return container_of(late_bind, struct xe_device, late_bind);
 }
 
+static struct xe_device *
+late_bind_fw_to_xe(struct xe_late_bind_fw *lb_fw)
+{
+	return container_of(lb_fw, struct xe_device, late_bind.late_bind_fw[lb_fw->id]);
+}
+
+/* Refer to the "Late Bind based Firmware Layout" documentation entry for details */
+static int parse_cpd_header(struct xe_late_bind_fw *lb_fw,
+			    const void *data, size_t size, const char *manifest_entry)
+{
+	struct xe_device *xe = late_bind_fw_to_xe(lb_fw);
+	const struct gsc_cpd_header_v2 *header = data;
+	const struct gsc_manifest_header *manifest;
+	const struct gsc_cpd_entry *entry;
+	size_t min_size = sizeof(*header);
+	u32 offset = 0;
+	int i;
+
+	/* manifest_entry is mandatory */
+	xe_assert(xe, manifest_entry);
+
+	if (size < min_size || header->header_marker != GSC_CPD_HEADER_MARKER)
+		return -ENOENT;
+
+	if (header->header_length < sizeof(struct gsc_cpd_header_v2)) {
+		drm_err(&xe->drm, "%s late binding fw: Invalid CPD header length %u!\n",
+			fw_id_to_name[lb_fw->id], header->header_length);
+		return -EINVAL;
+	}
+
+	min_size = header->header_length + sizeof(struct gsc_cpd_entry) * header->num_of_entries;
+	if (size < min_size) {
+		drm_err(&xe->drm, "%s late binding fw: too small! %zu < %zu\n",
+			fw_id_to_name[lb_fw->id], size, min_size);
+		return -ENODATA;
+	}
+
+	/* Look for the manifest first */
+	entry = (void *)header + header->header_length;
+	for (i = 0; i < header->num_of_entries; i++, entry++)
+		if (strcmp(entry->name, manifest_entry) == 0)
+			offset = entry->offset & GSC_CPD_ENTRY_OFFSET_MASK;
+
+	if (!offset) {
+		drm_err(&xe->drm, "%s late binding fw: Failed to find manifest_entry\n",
+			fw_id_to_name[lb_fw->id]);
+		return -ENODATA;
+	}
+
+	min_size = offset + sizeof(struct gsc_manifest_header);
+	if (size < min_size) {
+		drm_err(&xe->drm, "%s late binding fw: too small! %zu < %zu\n",
+			fw_id_to_name[lb_fw->id], size, min_size);
+		return -ENODATA;
+	}
+
+	manifest = data + offset;
+
+	lb_fw->version = manifest->fw_version;
+
+	return 0;
+}
+
+/* Refer to the "Late Bind based Firmware Layout" documentation entry for details */
+static int parse_lb_layout(struct xe_late_bind_fw *lb_fw,
+			   const void *data, size_t size, const char *fpt_entry)
+{
+	struct xe_device *xe = late_bind_fw_to_xe(lb_fw);
+	const struct csc_fpt_header *header = data;
+	const struct csc_fpt_entry *entry;
+	size_t min_size = sizeof(*header);
+	u32 offset = 0;
+	int i;
+
+	/* fpt_entry is mandatory */
+	xe_assert(xe, fpt_entry);
+
+	if (size < min_size || header->header_marker != CSC_FPT_HEADER_MARKER)
+		return -ENOENT;
+
+	if (header->header_length < sizeof(struct csc_fpt_header)) {
+		drm_err(&xe->drm, "%s late binding fw: Invalid FPT header length %u!\n",
+			fw_id_to_name[lb_fw->id], header->header_length);
+		return -EINVAL;
+	}
+
+	min_size = header->header_length + sizeof(struct csc_fpt_entry) * header->num_of_entries;
+	if (size < min_size) {
+		drm_err(&xe->drm, "%s late binding fw: too small! %zu < %zu\n",
+			fw_id_to_name[lb_fw->id], size, min_size);
+		return -ENODATA;
+	}
+
+	/* Look for the cpd header first */
+	entry = (void *)header + header->header_length;
+	for (i = 0; i < header->num_of_entries; i++, entry++)
+		if (strcmp(entry->name, fpt_entry) == 0)
+			offset = entry->offset;
+
+	if (!offset) {
+		drm_err(&xe->drm, "%s late binding fw: Failed to find fpt_entry\n",
+			fw_id_to_name[lb_fw->id]);
+		return -ENODATA;
+	}
+
+	min_size = offset + sizeof(struct gsc_cpd_header_v2);
+	if (size < min_size) {
+		drm_err(&xe->drm, "%s late binding fw: too small! %zu < %zu\n",
+			fw_id_to_name[lb_fw->id], size, min_size);
+		return -ENODATA;
+	}
+
+	return parse_cpd_header(lb_fw, data + offset, size - offset, "LTES.man");
+}
+
 static const char *xe_late_bind_parse_status(uint32_t status)
 {
 	switch (status) {
@@ -217,6 +332,10 @@ static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
 		return -ENODATA;
 	}
 
+	ret = parse_lb_layout(lb_fw, fw->data, fw->size, "LTES");
+	if (ret)
+		return ret;
+
 	lb_fw->payload = drmm_kzalloc(&xe->drm, lb_fw->payload_size, GFP_KERNEL);
 	if (!lb_fw->payload) {
 		release_firmware(fw);
@@ -225,6 +344,11 @@ static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
 
 	lb_fw->payload_size = fw->size;
 
+	drm_info(&xe->drm, "Using %s firmware from %s version %u.%u.%u.%u\n",
+		 fw_id_to_name[lb_fw->id], lb_fw->blob_path,
+		 lb_fw->version.major, lb_fw->version.minor,
+		 lb_fw->version.hotfix, lb_fw->version.build);
+
 	memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);
 	release_firmware(fw);
 	INIT_WORK(&lb_fw->work, xe_late_bind_work);
diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
index 16f2bd6bbdf1..7f98a1380844 100644
--- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
+++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
@@ -10,6 +10,7 @@
 #include <linux/mutex.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
+#include "xe_uc_fw_abi.h"
 
 #define MAX_PAYLOAD_SIZE SZ_4K
 
@@ -41,6 +42,8 @@ struct xe_late_bind_fw {
 	size_t payload_size;
 	/** @late_bind_fw.work: worker to upload latebind blob */
 	struct work_struct work;
+	/** @late_bind_fw.version: late binding blob manifest version */
+	struct gsc_version version;
 };
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_uc_fw_abi.h b/drivers/gpu/drm/xe/xe_uc_fw_abi.h
index 87ade41209d0..78782d105fa9 100644
--- a/drivers/gpu/drm/xe/xe_uc_fw_abi.h
+++ b/drivers/gpu/drm/xe/xe_uc_fw_abi.h
@@ -318,4 +318,70 @@ struct gsc_manifest_header {
 	u32 exponent_size; /* in dwords */
 } __packed;
 
+/**
+ * DOC: Late binding Firmware Layout
+ *
+ * The late binding binary starts with the FPT header, which contains the
+ * locations of the various partitions of the binary. Here we're interested in
+ * finding the manifest version. To find it, we need to locate the CPD header;
+ * one of the entries in the CPD header points to the manifest header, which
+ * contains the version.
+ *
+ *      +================================================+
+ *      |  FPT Header                                    |
+ *      +================================================+
+ *      |  FPT entries[]                                 |
+ *      |      entry1                                    |
+ *      |      ...                                       |
+ *      |      entryX                                    |
+ *      |          "LTES"                                |
+ *      |          ...                                   |
+ *      |          offset  >-----------------------------|------o
+ *      +================================================+      |
+ *                                                              |
+ *      +================================================+      |
+ *      |  CPD Header                                    |<-----o
+ *      +================================================+
+ *      |  CPD entries[]                                 |
+ *      |      entry1                                    |
+ *      |      ...                                       |
+ *      |      entryX                                    |
+ *      |          "LTES.man"                            |
+ *      |           ...                                  |
+ *      |           offset  >----------------------------|------o
+ *      +================================================+      |
+ *                                                              |
+ *      +================================================+      |
+ *      |  Manifest Header                               |<-----o
+ *      |      ...                                       |
+ *      |      FW version                                |
+ *      |      ...                                       |
+ *      +================================================+
+ */
+
+/* FPT Headers */
+struct csc_fpt_header {
+	u32 header_marker;
+#define CSC_FPT_HEADER_MARKER 0x54504624
+	u32 num_of_entries;
+	u8 header_version;
+	u8 entry_version;
+	u8 header_length; /* in bytes */
+	u8 flags;
+	u16 ticks_to_add;
+	u16 tokens_to_add;
+	u32 uma_size;
+	u32 crc32;
+	struct gsc_version fitc_version;
+} __packed;
+
+struct csc_fpt_entry {
+	u8 name[4]; /* partition name */
+	u32 reserved1;
+	u32 offset; /* offset from beginning of CSE region */
+	u32 length; /* partition length in bytes */
+	u32 reserved2[3];
+	u32 partition_flags;
+} __packed;
+
 #endif
-- 
2.34.1
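
For readers following the layout diagram in the patch above, here is a minimal
userspace sketch of the first hop only (FPT header to the "LTES" entry). It is
not code from the series: fpt_find_partition() is a hypothetical helper, the
layout of struct gsc_version (four 16-bit fields) is an assumption, the entry
array is assumed to start right after the header, and fields are read in host
byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CSC_FPT_HEADER_MARKER 0x54504624u /* "$FPT" when stored little-endian */

/* Assumed layout; the real struct gsc_version is defined elsewhere in xe. */
struct gsc_version {
	uint16_t major, minor, hotfix, build;
} __attribute__((packed));

struct csc_fpt_header {
	uint32_t header_marker;
	uint32_t num_of_entries;
	uint8_t header_version;
	uint8_t entry_version;
	uint8_t header_length; /* in bytes */
	uint8_t flags;
	uint16_t ticks_to_add;
	uint16_t tokens_to_add;
	uint32_t uma_size;
	uint32_t crc32;
	struct gsc_version fitc_version;
} __attribute__((packed));

struct csc_fpt_entry {
	uint8_t name[4]; /* partition name */
	uint32_t reserved1;
	uint32_t offset; /* offset from beginning of CSE region */
	uint32_t length; /* partition length in bytes */
	uint32_t reserved2[3];
	uint32_t partition_flags;
} __attribute__((packed));

/*
 * Hypothetical helper: return the offset of the partition called @name,
 * or -1 if the blob is malformed or has no such partition.
 */
static long fpt_find_partition(const uint8_t *blob, size_t size, const char *name)
{
	struct csc_fpt_header hdr;
	struct csc_fpt_entry entry;
	size_t pos = sizeof(hdr);
	uint32_t i;

	if (size < sizeof(hdr))
		return -1;
	memcpy(&hdr, blob, sizeof(hdr));
	if (hdr.header_marker != CSC_FPT_HEADER_MARKER)
		return -1;

	for (i = 0; i < hdr.num_of_entries; i++, pos += sizeof(entry)) {
		/* Never read an entry that extends past the end of the blob. */
		if (size - pos < sizeof(entry))
			return -1;
		memcpy(&entry, blob + pos, sizeof(entry));
		if (memcmp(entry.name, name, 4))
			continue;
		/* The partition itself must also lie inside the blob. */
		if (entry.offset > size || entry.length > size - entry.offset)
			return -1;
		return (long)entry.offset;
	}
	return -1;
}
```

The same walk would repeat at the returned offset for the CPD header and its
"LTES.man" entry; those structures are not shown in this part of the patch, so
they are left out here.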


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 10/10] drm/xe/xe_late_bind_fw: Select INTEL_MEI_LATE_BIND for CI
  2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
                   ` (8 preceding siblings ...)
  2025-06-25 17:00 ` [PATCH v4 09/10] drm/xe/xe_late_bind_fw: Extract and print version info Badal Nilawar
@ 2025-06-25 17:00 ` Badal Nilawar
  9 siblings, 0 replies; 38+ messages in thread
From: Badal Nilawar @ 2025-06-25 17:00 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh,
	daniele.ceraolospurio

Do not review

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
---
 drivers/gpu/drm/xe/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
index 30ed74ad29ab..b161e1156c73 100644
--- a/drivers/gpu/drm/xe/Kconfig
+++ b/drivers/gpu/drm/xe/Kconfig
@@ -44,6 +44,7 @@ config DRM_XE
 	select WANT_DEV_COREDUMP
 	select AUXILIARY_BUS
 	select HMM_MIRROR
+	select INTEL_MEI_LATE_BIND
 	help
 	  Driver for Intel Xe2 series GPUs and later. Experimental support
 	  for Xe series is also available.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-06-25 17:00 ` [PATCH v4 02/10] mei: late_bind: add late binding component driver Badal Nilawar
@ 2025-06-26  3:50   ` Gupta, Anshuman
  2025-06-27 14:06     ` Nilawar, Badal
  2025-06-28 12:18   ` Greg KH
  2025-06-28 12:19   ` Greg KH
  2 siblings, 1 reply; 38+ messages in thread
From: Gupta, Anshuman @ 2025-06-26  3:50 UTC (permalink / raw)
  To: Nilawar, Badal, intel-xe@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
  Cc: Vivi, Rodrigo, Usyskin, Alexander, gregkh@linuxfoundation.org,
	Ceraolo Spurio, Daniele



> -----Original Message-----
> From: Nilawar, Badal <badal.nilawar@intel.com>
> Sent: Wednesday, June 25, 2025 10:30 PM
> To: intel-xe@lists.freedesktop.org; dri-devel@lists.freedesktop.org;
> linux-kernel@vger.kernel.org
> Cc: Gupta, Anshuman <anshuman.gupta@intel.com>; Vivi, Rodrigo
> <rodrigo.vivi@intel.com>; Usyskin, Alexander <alexander.usyskin@intel.com>;
> gregkh@linuxfoundation.org; Ceraolo Spurio, Daniele
> <daniele.ceraolospurio@intel.com>
> Subject: [PATCH v4 02/10] mei: late_bind: add late binding component driver
> 
> From: Alexander Usyskin <alexander.usyskin@intel.com>
> 
> Add late binding component driver.
> It allows pushing the late binding configuration from, for example, the Xe
> graphics driver to the Intel discrete graphics card's CSE device.
> 
> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> ---
> v2:
>  - Use generic naming (Jani)
>  - Drop xe_late_bind_component struct to move to xe code (Daniele/Sasha)
> v3:
>  - Updated kconfig description
>  - Move CSC late binding specific flags/defines to late_bind_mei_interface.h
> (Daniele)
> v4:
>  - Add match for PCI_CLASS_DISPLAY_OTHER to support headless cards
> (Anshuman)
> v5:
>  - Add fixes in push_config (Sasha)
>  - Use INTEL_ prefix for component, refine doc,
>    add status enum to header late_bind_mei_interface.h (Anshuman)
> ---
>  drivers/misc/mei/Kconfig                    |   1 +
>  drivers/misc/mei/Makefile                   |   1 +
>  drivers/misc/mei/late_bind/Kconfig          |  13 +
>  drivers/misc/mei/late_bind/Makefile         |   9 +
>  drivers/misc/mei/late_bind/mei_late_bind.c  | 281 ++++++++++++++++++++
>  include/drm/intel/i915_component.h          |   1 +
>  include/drm/intel/late_bind_mei_interface.h |  64 +++++
>  7 files changed, 370 insertions(+)
>  create mode 100644 drivers/misc/mei/late_bind/Kconfig
>  create mode 100644 drivers/misc/mei/late_bind/Makefile
>  create mode 100644 drivers/misc/mei/late_bind/mei_late_bind.c
>  create mode 100644 include/drm/intel/late_bind_mei_interface.h
> 
> diff --git a/drivers/misc/mei/Kconfig b/drivers/misc/mei/Kconfig
> index 7575fee96cc6..771becc68095 100644
> --- a/drivers/misc/mei/Kconfig
> +++ b/drivers/misc/mei/Kconfig
> @@ -84,5 +84,6 @@ config INTEL_MEI_VSC
>  source "drivers/misc/mei/hdcp/Kconfig"
>  source "drivers/misc/mei/pxp/Kconfig"
>  source "drivers/misc/mei/gsc_proxy/Kconfig"
> +source "drivers/misc/mei/late_bind/Kconfig"
> 
>  endif
> diff --git a/drivers/misc/mei/Makefile b/drivers/misc/mei/Makefile
> index 6f9fdbf1a495..84bfde888d81 100644
> --- a/drivers/misc/mei/Makefile
> +++ b/drivers/misc/mei/Makefile
> @@ -31,6 +31,7 @@ CFLAGS_mei-trace.o = -I$(src)
>  obj-$(CONFIG_INTEL_MEI_HDCP) += hdcp/
>  obj-$(CONFIG_INTEL_MEI_PXP) += pxp/
>  obj-$(CONFIG_INTEL_MEI_GSC_PROXY) += gsc_proxy/
> +obj-$(CONFIG_INTEL_MEI_LATE_BIND) += late_bind/
> 
>  obj-$(CONFIG_INTEL_MEI_VSC_HW) += mei-vsc-hw.o
>  mei-vsc-hw-y := vsc-tp.o
> diff --git a/drivers/misc/mei/late_bind/Kconfig
> b/drivers/misc/mei/late_bind/Kconfig
> new file mode 100644
> index 000000000000..65c7180c5678
> --- /dev/null
> +++ b/drivers/misc/mei/late_bind/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2025, Intel Corporation. All rights reserved.
> +#
> +config INTEL_MEI_LATE_BIND
> +	tristate "Intel late binding support on ME Interface"
> +	select INTEL_MEI_ME
> +	depends on DRM_XE
> +	help
> +	  MEI Support for Late Binding for Intel graphics card.
> +
> +	  Enables the ME FW interfaces for the Late Binding feature,
> +	  allowing the Intel Xe driver to load firmware for devices
> +	  such as the Fan Controller.
> diff --git a/drivers/misc/mei/late_bind/Makefile
> b/drivers/misc/mei/late_bind/Makefile
> new file mode 100644
> index 000000000000..a0aeda5853f0
> --- /dev/null
> +++ b/drivers/misc/mei/late_bind/Makefile
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (c) 2025, Intel Corporation. All rights reserved.
> +#
> +# Makefile - Late Binding client driver for Intel MEI Bus Driver.
> +
> +subdir-ccflags-y += -I$(srctree)/drivers/misc/mei/
> +
> +obj-$(CONFIG_INTEL_MEI_LATE_BIND) += mei_late_bind.o
> diff --git a/drivers/misc/mei/late_bind/mei_late_bind.c
> b/drivers/misc/mei/late_bind/mei_late_bind.c
> new file mode 100644
> index 000000000000..ffb89ccdfbb1
> --- /dev/null
> +++ b/drivers/misc/mei/late_bind/mei_late_bind.c
> @@ -0,0 +1,281 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2025 Intel Corporation
> + */
> +#include <drm/intel/i915_component.h>
> +#include <drm/intel/late_bind_mei_interface.h>
> +#include <linux/component.h>
> +#include <linux/pci.h>
> +#include <linux/mei_cl_bus.h>
> +#include <linux/module.h>
> +#include <linux/overflow.h>
> +#include <linux/slab.h>
> +#include <linux/uuid.h>
> +
> +#include "mkhi.h"
> +
> +#define GFX_SRV_MKHI_LATE_BINDING_CMD 0x12
> +#define GFX_SRV_MKHI_LATE_BINDING_RSP (GFX_SRV_MKHI_LATE_BINDING_CMD | 0x80)
> +
> +#define LATE_BIND_SEND_TIMEOUT_MSEC 3000
> +#define LATE_BIND_RECV_TIMEOUT_MSEC 3000
> +
> +/**
> + * struct csc_heci_late_bind_req - late binding request
> + * @header: @ref mkhi_msg_hdr
> + * @type: type of the late binding payload
> + * @flags: flags to be passed to the firmware
> + * @reserved: reserved field
> + * @payload_size: size of the payload data in bytes
> + * @payload: data to be sent to the firmware
> + */
> +struct csc_heci_late_bind_req {
> +	struct mkhi_msg_hdr header;
> +	u32 type;
> +	u32 flags;
> +	u32 reserved[2];
> +	u32 payload_size;
> +	u8  payload[] __counted_by(payload_size);
> +} __packed;
> +
> +/**
> + * struct csc_heci_late_bind_rsp - late binding response
> + * @header: @ref mkhi_msg_hdr
> + * @type: type of the late binding payload
> + * @reserved: reserved field
> + * @status: status of the late binding command execution by firmware
> + */
> +struct csc_heci_late_bind_rsp {
> +	struct mkhi_msg_hdr header;
> +	u32 type;
> +	u32 reserved[2];
> +	u32 status;
> +} __packed;
> +
> +static int mei_late_bind_check_response(const struct device *dev, const struct mkhi_msg_hdr *hdr)
> +{
> +	if (hdr->group_id != MKHI_GROUP_ID_GFX) {
> +		dev_err(dev, "Mismatch group id: 0x%x instead of 0x%x\n",
> +			hdr->group_id, MKHI_GROUP_ID_GFX);
> +		return -EINVAL;
> +	}
> +
> +	if (hdr->command != GFX_SRV_MKHI_LATE_BINDING_RSP) {
> +		dev_err(dev, "Mismatch command: 0x%x instead of 0x%x\n",
> +			hdr->command, GFX_SRV_MKHI_LATE_BINDING_RSP);
> +		return -EINVAL;
> +	}
> +
> +	if (hdr->result) {
> +		dev_err(dev, "Error in result: 0x%x\n", hdr->result);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * mei_late_bind_push_config - Sends a config to the firmware.
> + * @dev: device struct corresponding to the mei device
> + * @type: payload type
> + * @flags: payload flags
> + * @payload: payload buffer
> + * @payload_size: payload buffer size
> + *
> + * Return: 0 success, negative errno value on transport failure,
> + *         positive status returned by FW
> + */
> +static int mei_late_bind_push_config(struct device *dev, u32 type, u32 flags,
> +				     const void *payload, size_t payload_size)
> +{
> +	struct mei_cl_device *cldev;
> +	struct csc_heci_late_bind_req *req = NULL;
> +	struct csc_heci_late_bind_rsp rsp;
> +	size_t req_size;
> +	ssize_t ret;
> +
> +	if (!dev || !payload || !payload_size)
> +		return -EINVAL;
> +
> +	cldev = to_mei_cl_device(dev);
> +
> +	ret = mei_cldev_enable(cldev);
> +	if (ret < 0) {
> +		dev_dbg(dev, "mei_cldev_enable failed. %zd\n", ret);
> +		return ret;
> +	}
> +
> +	req_size = struct_size(req, payload, payload_size);
> +	if (req_size > mei_cldev_mtu(cldev)) {
> +		dev_err(dev, "Payload is too big %zu\n", payload_size);
> +		ret = -EMSGSIZE;
> +		goto end;
> +	}
> +
> +	req = kmalloc(req_size, GFP_KERNEL);
> +	if (!req) {
> +		ret = -ENOMEM;
> +		goto end;
> +	}
> +
> +	req->header.group_id = MKHI_GROUP_ID_GFX;
> +	req->header.command = GFX_SRV_MKHI_LATE_BINDING_CMD;
> +	req->type = type;
> +	req->flags = flags;
> +	req->reserved[0] = 0;
> +	req->reserved[1] = 0;
> +	req->payload_size = payload_size;
> +	memcpy(req->payload, payload, payload_size);
> +
> +	ret = mei_cldev_send_timeout(cldev, (void *)req, req_size, LATE_BIND_SEND_TIMEOUT_MSEC);
> +	if (ret < 0) {
> +		dev_err(dev, "mei_cldev_send failed. %zd\n", ret);
> +		goto end;
> +	}
> +
> +	ret = mei_cldev_recv_timeout(cldev, (void *)&rsp, sizeof(rsp), LATE_BIND_RECV_TIMEOUT_MSEC);
> +	if (ret < 0) {
> +		dev_err(dev, "mei_cldev_recv failed. %zd\n", ret);
> +		goto end;
> +	}
> +	if (ret < sizeof(rsp.header)) {
> +		dev_err(dev, "bad response header from the firmware: size %zd < %zu\n",
> +			ret, sizeof(rsp.header));
> +		goto end;
> +	}
> +	if (ret < sizeof(rsp)) {
> +		dev_err(dev, "bad response from the firmware: size %zd < %zu\n",
> +			ret, sizeof(rsp));
> +		goto end;
> +	}
> +
> +	ret = mei_late_bind_check_response(dev, &rsp.header);
> +	if (ret) {
> +		dev_err(dev, "bad result response from the firmware: 0x%x\n",
> +			*(uint32_t *)&rsp.header);
> +		goto end;
> +	}
> +
> +	ret = (int)rsp.status;
> +	dev_dbg(dev, "%s status = %zd\n", __func__, ret);
> +
> +end:
> +	mei_cldev_disable(cldev);
> +	kfree(req);
> +	return ret;
> +}
> +
> +static const struct late_bind_component_ops mei_late_bind_ops = {
> +	.owner = THIS_MODULE,
> +	.push_config = mei_late_bind_push_config,
> +};
> +
> +static int mei_component_master_bind(struct device *dev)
> +{
> +	return component_bind_all(dev, (void *)&mei_late_bind_ops);
> +}
> +
> +static void mei_component_master_unbind(struct device *dev)
> +{
> +	component_unbind_all(dev, (void *)&mei_late_bind_ops);
> +}
> +
> +static const struct component_master_ops mei_component_master_ops = {
> +	.bind = mei_component_master_bind,
> +	.unbind = mei_component_master_unbind,
> +};
> +
> +/**
> + * mei_late_bind_component_match - compare function for matching mei late bind.
> + *
> + *    The function checks if requester is Intel PCI_CLASS_DISPLAY_VGA or
> + *    PCI_CLASS_DISPLAY_OTHER device, and checks if the parent of requester
The doc is still wrong: dev is the requester here, and you are checking base == dev.
With that fixed:
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>

Thanks,
Anshuman 
> + *    and the grand parent of mei_if are the same device
> + *
> + * @dev: master device
> + * @subcomponent: subcomponent to match (INTEL_COMPONENT_LATE_BIND)
> + * @data: compare data (late_bind mei device on mei bus)
> + *
> + * Return:
> + * * 1 - if components match
> + * * 0 - otherwise
> + */
> +static int mei_late_bind_component_match(struct device *dev, int subcomponent,
> +					 void *data)
> +{
> +	struct device *base = data;
> +	struct pci_dev *pdev;
> +
> +	if (!dev)
> +		return 0;
> +
> +	if (!dev_is_pci(dev))
> +		return 0;
> +
> +	pdev = to_pci_dev(dev);
> +
> +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
> +		return 0;
> +
> +	if (pdev->class != (PCI_CLASS_DISPLAY_VGA << 8) &&
> +	    pdev->class != (PCI_CLASS_DISPLAY_OTHER << 8))
> +		return 0;
> +
> +	if (subcomponent != INTEL_COMPONENT_LATE_BIND)
> +		return 0;
> +
> +	base = base->parent;
> +	if (!base) /* mei device */
> +		return 0;
> +
> +	base = base->parent; /* pci device */
> +
> +	return !!base && dev == base;
> +}
> +
> +static int mei_late_bind_probe(struct mei_cl_device *cldev,
> +			       const struct mei_cl_device_id *id)
> +{
> +	struct component_match *master_match = NULL;
> +	int ret;
> +
> +	component_match_add_typed(&cldev->dev, &master_match,
> +				  mei_late_bind_component_match, &cldev->dev);
> +	if (IS_ERR_OR_NULL(master_match))
> +		return -ENOMEM;
> +
> +	ret = component_master_add_with_match(&cldev->dev,
> +					      &mei_component_master_ops,
> +					      master_match);
> +	if (ret < 0)
> +		dev_err(&cldev->dev, "Master comp add failed %d\n", ret);
> +
> +	return ret;
> +}
> +
> +static void mei_late_bind_remove(struct mei_cl_device *cldev)
> +{
> +	component_master_del(&cldev->dev, &mei_component_master_ops);
> +}
> +
> +#define MEI_GUID_MKHI UUID_LE(0xe2c2afa2, 0x3817, 0x4d19, \
> +			      0x9d, 0x95, 0x6, 0xb1, 0x6b, 0x58, 0x8a, 0x5d)
> +
> +static struct mei_cl_device_id mei_late_bind_tbl[] = {
> +	{ .uuid = MEI_GUID_MKHI, .version = MEI_CL_VERSION_ANY },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(mei, mei_late_bind_tbl);
> +
> +static struct mei_cl_driver mei_late_bind_driver = {
> +	.id_table = mei_late_bind_tbl,
> +	.name = KBUILD_MODNAME,
> +	.probe = mei_late_bind_probe,
> +	.remove	= mei_late_bind_remove,
> +};
> +
> +module_mei_cl_driver(mei_late_bind_driver);
> +
> +MODULE_AUTHOR("Intel Corporation");
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("MEI Late Binding");
> diff --git a/include/drm/intel/i915_component.h
> b/include/drm/intel/i915_component.h
> index 4ea3b17aa143..456849a97d75 100644
> --- a/include/drm/intel/i915_component.h
> +++ b/include/drm/intel/i915_component.h
> @@ -31,6 +31,7 @@ enum i915_component_type {
>  	I915_COMPONENT_HDCP,
>  	I915_COMPONENT_PXP,
>  	I915_COMPONENT_GSC_PROXY,
> +	INTEL_COMPONENT_LATE_BIND,
>  };
> 
>  /* MAX_PORT is the number of port
> diff --git a/include/drm/intel/late_bind_mei_interface.h
> b/include/drm/intel/late_bind_mei_interface.h
> new file mode 100644
> index 000000000000..ec58ef1ab4e8
> --- /dev/null
> +++ b/include/drm/intel/late_bind_mei_interface.h
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright (c) 2025 Intel Corporation
> + */
> +
> +#ifndef _LATE_BIND_MEI_INTERFACE_H_
> +#define _LATE_BIND_MEI_INTERFACE_H_
> +
> +#include <linux/types.h>
> +
> +struct device;
> +struct module;
> +
> +/**
> + * Late Binding flags
> + * Persistent across warm reset
> + */
> +#define CSC_LATE_BINDING_FLAGS_IS_PERSISTENT	BIT(0)
> +
> +/**
> + * enum late_bind_type - enum to determine late binding fw type
> + */
> +enum late_bind_type {
> +	CSC_LATE_BINDING_TYPE_FAN_CONTROL = 1,
> +};
> +
> +/**
> + * Late Binding payload status
> + */
> +enum csc_late_binding_status {
> +	CSC_LATE_BINDING_STATUS_SUCCESS           = 0,
> +	CSC_LATE_BINDING_STATUS_4ID_MISMATCH      = 1,
> +	CSC_LATE_BINDING_STATUS_ARB_FAILURE       = 2,
> +	CSC_LATE_BINDING_STATUS_GENERAL_ERROR     = 3,
> +	CSC_LATE_BINDING_STATUS_INVALID_PARAMS    = 4,
> +	CSC_LATE_BINDING_STATUS_INVALID_SIGNATURE = 5,
> +	CSC_LATE_BINDING_STATUS_INVALID_PAYLOAD   = 6,
> +	CSC_LATE_BINDING_STATUS_TIMEOUT           = 7,
> +};
> +
> +/**
> + * struct late_bind_component_ops - ops for Late Binding services.
> + * @owner: Module providing the ops
> + * @push_config: Sends a config to FW.
> + */
> +struct late_bind_component_ops {
> +	struct module *owner;
> +
> +	/**
> +	 * @push_config: Sends a config to FW.
> +	 * @dev: device struct corresponding to the mei device
> +	 * @type: payload type
> +	 * @flags: payload flags
> +	 * @payload: payload buffer
> +	 * @payload_size: payload buffer size
> +	 *
> +	 * Return: 0 success, negative errno value on transport failure,
> +	 *         positive status returned by FW
> +	 */
> +	int (*push_config)(struct device *dev, u32 type, u32 flags,
> +			   const void *payload, size_t payload_size);
> +};
> +
> +#endif /* _LATE_BIND_MEI_INTERFACE_H_ */
> --
> 2.34.1
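
As an aside on the request sizing in mei_late_bind_push_config() quoted above,
the following is a rough userspace sketch of the same idea: compute the size of
a header plus flexible payload, reject it if it exceeds the transport MTU, and
only then allocate and fill it. This is an illustration, not the driver code:
build_req() is a hypothetical helper, the 4-byte struct mkhi_msg_hdr layout and
the SKETCH_GROUP_ID_GFX value are assumptions, and the open-coded size check
merely stands in for the kernel's struct_size().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed 4-byte MKHI header; the real definition lives in mkhi.h. */
struct mkhi_msg_hdr {
	uint8_t group_id;
	uint8_t command;
	uint8_t reserved;
	uint8_t result;
} __attribute__((packed));

struct late_bind_req {
	struct mkhi_msg_hdr header;
	uint32_t type;
	uint32_t flags;
	uint32_t reserved[2];
	uint32_t payload_size;
	uint8_t payload[];
} __attribute__((packed));

#define SKETCH_GROUP_ID_GFX 0x30 /* assumed value, not taken from the patch */
#define SKETCH_LATE_BINDING_CMD 0x12

/*
 * Hypothetical analogue of the request setup: returns 0 and a heap-allocated
 * request in *out (caller frees), or a negative errno value.
 */
static int build_req(uint32_t type, uint32_t flags, const void *payload,
		     size_t payload_size, size_t mtu,
		     struct late_bind_req **out, size_t *out_size)
{
	struct late_bind_req *req;
	size_t req_size;

	if (!payload || !payload_size)
		return -EINVAL;

	/* Reject sizes that would wrap or not fit the 32-bit size field. */
	if (payload_size > UINT32_MAX - sizeof(*req))
		return -EMSGSIZE;
	req_size = sizeof(*req) + payload_size;
	if (req_size > mtu)
		return -EMSGSIZE;

	req = calloc(1, req_size); /* also zeroes the reserved fields */
	if (!req)
		return -ENOMEM;

	req->header.group_id = SKETCH_GROUP_ID_GFX;
	req->header.command = SKETCH_LATE_BINDING_CMD;
	req->type = type;
	req->flags = flags;
	req->payload_size = (uint32_t)payload_size;
	memcpy(req->payload, payload, payload_size);

	*out = req;
	*out_size = req_size;
	return 0;
}
```

With this layout the fixed part of the request is 24 bytes, so an 8-byte
payload yields a 32-byte message.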


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 05/10] drm/xe/xe_late_bind_fw: Load late binding firmware
  2025-06-25 17:00 ` [PATCH v4 05/10] drm/xe/xe_late_bind_fw: Load " Badal Nilawar
@ 2025-06-26 17:24   ` Rodrigo Vivi
  2025-06-26 21:27     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 38+ messages in thread
From: Rodrigo Vivi @ 2025-06-26 17:24 UTC (permalink / raw)
  To: Badal Nilawar
  Cc: intel-xe, dri-devel, linux-kernel, anshuman.gupta,
	alexander.usyskin, gregkh, daniele.ceraolospurio

On Wed, Jun 25, 2025 at 10:30:10PM +0530, Badal Nilawar wrote:
> Load late binding firmware
> 
> v2:
>  - s/EAGAIN/EBUSY/
>  - Flush worker in suspend and driver unload (Daniele)
> v3:
>  - Use retry interval of 6s, in steps of 200ms, to allow
>    other OS components release MEI CL handle (Sasha)
> v4:
>  - return -ENODEV if component not added (Daniele)
>  - parse and print status returned by csc
>  - Use xe_pm_runtime_get_if_active (Daniele)

The worker is considered an outer bound, so it is safe
to use xe_pm_runtime_get, which takes the reference
and resumes synchronously.

Otherwise, if using get_if_active you need to reschedule
the work or you lose your job.
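
To make the difference concrete, here is a toy userspace model of the two
patterns. It is only a sketch: struct toy_pm and the toy_* and worker_*
functions are hypothetical stand-ins, not the xe or runtime PM API, and the
"push firmware" step is reduced to a comment.

```c
#include <stdbool.h>

/* Toy stand-in for a device's runtime PM state; not the kernel API. */
struct toy_pm {
	bool active;
	int usage;
};

/* Models a "get if active" helper: takes a reference only when already awake. */
static bool toy_pm_get_if_active(struct toy_pm *pm)
{
	if (!pm->active)
		return false;
	pm->usage++;
	return true;
}

/* Models a synchronous get: resumes the device if needed, then takes a reference. */
static void toy_pm_get(struct toy_pm *pm)
{
	pm->active = true;
	pm->usage++;
}

static void toy_pm_put(struct toy_pm *pm)
{
	pm->usage--;
}

/* Variant A: bails out when suspended, so the load is skipped unless requeued. */
static bool worker_get_if_active(struct toy_pm *pm)
{
	if (!toy_pm_get_if_active(pm))
		return false; /* nothing reschedules the work in this model */
	/* ... push firmware ... */
	toy_pm_put(pm);
	return true;
}

/* Variant B: always wakes the device, so the load always runs. */
static bool worker_get(struct toy_pm *pm)
{
	toy_pm_get(pm);
	/* ... push firmware ... */
	toy_pm_put(pm);
	return true;
}
```

In this model, calling worker_get_if_active() on a suspended device returns
false without doing the load, while worker_get() resumes the device and
completes it.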


> 
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_late_bind_fw.c       | 149 ++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_late_bind_fw.h       |   1 +
>  drivers/gpu/drm/xe/xe_late_bind_fw_types.h |   7 +
>  3 files changed, 156 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
> index 32d1436e7191..52243063d98a 100644
> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
> @@ -16,6 +16,20 @@
>  #include "xe_late_bind_fw.h"
>  #include "xe_pcode.h"
>  #include "xe_pcode_api.h"
> +#include "xe_pm.h"
> +
> +/*
> + * The component should load quite quickly in most cases, but it could take
> + * a bit. Using a very big timeout just to cover the worst case scenario
> + */
> +#define LB_INIT_TIMEOUT_MS 20000
> +
> +/*
> + * Retry interval set to 6 seconds, in steps of 200 ms, to allow time for
> + * other OS components to release the MEI CL handle
> + */
> +#define LB_FW_LOAD_RETRY_MAXCOUNT 30
> +#define LB_FW_LOAD_RETRY_PAUSE_MS 200
>  
>  static const u32 fw_id_to_type[] = {
>  		[XE_LB_FW_FAN_CONTROL] = CSC_LATE_BINDING_TYPE_FAN_CONTROL,
> @@ -31,6 +45,30 @@ late_bind_to_xe(struct xe_late_bind *late_bind)
>  	return container_of(late_bind, struct xe_device, late_bind);
>  }
>  
> +static const char *xe_late_bind_parse_status(uint32_t status)
> +{
> +	switch (status) {
> +	case CSC_LATE_BINDING_STATUS_SUCCESS:
> +		return "success";
> +	case CSC_LATE_BINDING_STATUS_4ID_MISMATCH:
> +		return "4Id Mismatch";
> +	case CSC_LATE_BINDING_STATUS_ARB_FAILURE:
> +		return "ARB Failure";
> +	case CSC_LATE_BINDING_STATUS_GENERAL_ERROR:
> +		return "General Error";
> +	case CSC_LATE_BINDING_STATUS_INVALID_PARAMS:
> +		return "Invalid Params";
> +	case CSC_LATE_BINDING_STATUS_INVALID_SIGNATURE:
> +		return "Invalid Signature";
> +	case CSC_LATE_BINDING_STATUS_INVALID_PAYLOAD:
> +		return "Invalid Payload";
> +	case CSC_LATE_BINDING_STATUS_TIMEOUT:
> +		return "Timeout";
> +	default:
> +		return "Unknown error";
> +	}
> +}
> +
>  static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
>  {
>  	struct xe_device *xe = late_bind_to_xe(late_bind);
> @@ -44,6 +82,93 @@ static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
>  		return 0;
>  }
>  
> +static void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind)
> +{
> +	struct xe_device *xe = late_bind_to_xe(late_bind);
> +	struct xe_late_bind_fw *lbfw;
> +	int fw_id;
> +
> +	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
> +		lbfw = &late_bind->late_bind_fw[fw_id];
> +		if (lbfw->valid && late_bind->wq) {
> +			drm_dbg(&xe->drm, "Flush work: load %s firmware\n",
> +				fw_id_to_name[lbfw->id]);
> +			flush_work(&lbfw->work);
> +		}
> +	}
> +}
> +
> +static void xe_late_bind_work(struct work_struct *work)
> +{
> +	struct xe_late_bind_fw *lbfw = container_of(work, struct xe_late_bind_fw, work);
> +	struct xe_late_bind *late_bind = container_of(lbfw, struct xe_late_bind,
> +						      late_bind_fw[lbfw->id]);
> +	struct xe_device *xe = late_bind_to_xe(late_bind);
> +	int retry = LB_FW_LOAD_RETRY_MAXCOUNT;
> +	int ret;
> +	int slept;
> +
> +	/* we can queue this before the component is bound */
> +	for (slept = 0; slept < LB_INIT_TIMEOUT_MS; slept += 100) {
> +		if (late_bind->component.ops)
> +			break;
> +		msleep(100);
> +	}
> +
> +	if (!xe_pm_runtime_get_if_active(xe))
> +		return;
> +
> +	mutex_lock(&late_bind->mutex);
> +
> +	if (!late_bind->component.ops) {
> +		drm_err(&xe->drm, "Late bind component not bound\n");
> +		goto out;
> +	}
> +
> +	drm_dbg(&xe->drm, "Load %s firmware\n", fw_id_to_name[lbfw->id]);
> +
> +	do {
> +		ret = late_bind->component.ops->push_config(late_bind->component.mei_dev,
> +							    lbfw->type, lbfw->flags,
> +							    lbfw->payload, lbfw->payload_size);
> +		if (!ret)
> +			break;
> +		msleep(LB_FW_LOAD_RETRY_PAUSE_MS);
> +	} while (--retry && ret == -EBUSY);
> +
> +	if (!ret) {
> +		drm_dbg(&xe->drm, "Load %s firmware successful\n",
> +			fw_id_to_name[lbfw->id]);
> +		goto out;
> +	}
> +
> +	if (ret > 0)
> +		drm_err(&xe->drm, "Load %s firmware failed with err %d, %s\n",
> +			fw_id_to_name[lbfw->id], ret, xe_late_bind_parse_status(ret));
> +	else
> +		drm_err(&xe->drm, "Load %s firmware failed with err %d\n",
> +			fw_id_to_name[lbfw->id], ret);
> +out:
> +	mutex_unlock(&late_bind->mutex);
> +	xe_pm_runtime_put(xe);
> +}
> +
> +int xe_late_bind_fw_load(struct xe_late_bind *late_bind)
> +{
> +	struct xe_late_bind_fw *lbfw;
> +	int fw_id;
> +
> +	if (!late_bind->component_added)
> +		return -ENODEV;
> +
> +	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
> +		lbfw = &late_bind->late_bind_fw[fw_id];
> +		if (lbfw->valid)
> +			queue_work(late_bind->wq, &lbfw->work);
> +	}
> +	return 0;
> +}
> +
>  static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
>  {
>  	struct xe_device *xe = late_bind_to_xe(late_bind);
> @@ -99,6 +224,7 @@ static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
>  
>  	memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);
>  	release_firmware(fw);
> +	INIT_WORK(&lb_fw->work, xe_late_bind_work);
>  	lb_fw->valid = true;
>  
>  	return 0;
> @@ -109,11 +235,16 @@ static int xe_late_bind_fw_init(struct xe_late_bind *late_bind)
>  	int ret;
>  	int fw_id;
>  
> +	late_bind->wq = alloc_ordered_workqueue("late-bind-ordered-wq", 0);
> +	if (!late_bind->wq)
> +		return -ENOMEM;
> +
>  	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
>  		ret = __xe_late_bind_fw_init(late_bind, fw_id);
>  		if (ret)
>  			return ret;
>  	}
> +
>  	return 0;
>  }
>  
> @@ -137,6 +268,8 @@ static void xe_late_bind_component_unbind(struct device *xe_kdev,
>  	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
>  	struct xe_late_bind *late_bind = &xe->late_bind;
>  
> +	xe_late_bind_wait_for_worker_completion(late_bind);
> +
>  	mutex_lock(&late_bind->mutex);
>  	late_bind->component.ops = NULL;
>  	mutex_unlock(&late_bind->mutex);
> @@ -152,7 +285,15 @@ static void xe_late_bind_remove(void *arg)
>  	struct xe_late_bind *late_bind = arg;
>  	struct xe_device *xe = late_bind_to_xe(late_bind);
>  
> +	xe_late_bind_wait_for_worker_completion(late_bind);
> +
> +	late_bind->component_added = false;
> +
>  	component_del(xe->drm.dev, &xe_late_bind_component_ops);
> +	if (late_bind->wq) {
> +		destroy_workqueue(late_bind->wq);
> +		late_bind->wq = NULL;
> +	}
>  	mutex_destroy(&late_bind->mutex);
>  }
>  
> @@ -183,9 +324,15 @@ int xe_late_bind_init(struct xe_late_bind *late_bind)
>  		return err;
>  	}
>  
> +	late_bind->component_added = true;
> +
>  	err = devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
>  	if (err)
>  		return err;
>  
> -	return xe_late_bind_fw_init(late_bind);
> +	err = xe_late_bind_fw_init(late_bind);
> +	if (err)
> +		return err;
> +
> +	return xe_late_bind_fw_load(late_bind);
>  }
> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.h b/drivers/gpu/drm/xe/xe_late_bind_fw.h
> index 4c73571c3e62..28d56ed2bfdc 100644
> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.h
> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.h
> @@ -11,5 +11,6 @@
>  struct xe_late_bind;
>  
>  int xe_late_bind_init(struct xe_late_bind *late_bind);
> +int xe_late_bind_fw_load(struct xe_late_bind *late_bind);
>  
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> index 93abf4c51789..f119a75f4c9c 100644
> --- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> @@ -9,6 +9,7 @@
>  #include <linux/iosys-map.h>
>  #include <linux/mutex.h>
>  #include <linux/types.h>
> +#include <linux/workqueue.h>
>  
>  #define MAX_PAYLOAD_SIZE SZ_4K
>  
> @@ -38,6 +39,8 @@ struct xe_late_bind_fw {
>  	u8  *payload;
>  	/** @late_bind_fw.payload_size: late binding blob payload_size */
>  	size_t payload_size;
> +	/** @late_bind_fw.work: worker to upload latebind blob */
> +	struct work_struct work;
>  };
>  
>  /**
> @@ -64,6 +67,10 @@ struct xe_late_bind {
>  	struct mutex mutex;
>  	/** @late_bind.late_bind_fw: late binding firmware array */
>  	struct xe_late_bind_fw late_bind_fw[XE_LB_FW_MAX_ID];
> +	/** @late_bind.wq: workqueue to submit request to download late bind blob */
> +	struct workqueue_struct *wq;
> +	/** @late_bind.component_added: whether the component has been added */
> +	bool component_added;
>  };
>  
>  #endif
> -- 
> 2.34.1
> 
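
For reference, the retry loop in xe_late_bind_work() quoted above can be
exercised in isolation with a small userspace sketch. struct stub_fw,
stub_push() and stub_sleep() are hypothetical stand-ins for push_config() and
msleep(); only the loop shape is taken from the patch.

```c
#include <errno.h>

#define RETRY_MAXCOUNT 30

/* Hypothetical stub: reports -EBUSY for the first @busy_calls calls. */
struct stub_fw {
	int busy_calls;
	int final_ret; /* value returned once no longer busy */
	int calls;
	int sleeps;
};

static int stub_push(struct stub_fw *fw)
{
	fw->calls++;
	if (fw->calls <= fw->busy_calls)
		return -EBUSY;
	return fw->final_ret;
}

/* Counts pauses instead of actually sleeping. */
static void stub_sleep(struct stub_fw *fw)
{
	fw->sleeps++;
}

/* Same control flow as the do/while loop in the quoted worker. */
static int load_with_retry(struct stub_fw *fw)
{
	int retry = RETRY_MAXCOUNT;
	int ret;

	do {
		ret = stub_push(fw);
		if (!ret)
			break;
		stub_sleep(fw);
	} while (--retry && ret == -EBUSY);

	return ret;
}
```

One property this makes visible: with the loop written this way, a result
other than 0 or -EBUSY (for example a positive firmware status) still incurs
one pause before the loop exits.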

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 04/10] drm/xe/xe_late_bind_fw: Initialize late binding firmware
  2025-06-25 17:00 ` [PATCH v4 04/10] drm/xe/xe_late_bind_fw: Initialize late binding firmware Badal Nilawar
@ 2025-06-26 21:06   ` Daniele Ceraolo Spurio
  2025-06-27 12:48     ` Nilawar, Badal
  0 siblings, 1 reply; 38+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-06-26 21:06 UTC (permalink / raw)
  To: Badal Nilawar, intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh



On 6/25/2025 10:00 AM, Badal Nilawar wrote:
> Search for late binding firmware binaries and populate the metadata of
> the firmware structures.
>
> v2 (Daniele):
>   - drm_err if firmware size is more than max payload size
>   - s/request_firmware/firmware_request_nowarn/ as firmware will
>     not be available for all possible cards
> v3 (Daniele):
>   - init firmware from within xe_late_bind_init, propagate error
>   - switch late_bind_fw to array to handle multiple firmware types
> v4 (Daniele):
>   - Alloc payload dynamically, fix nits
>
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_late_bind_fw.c       | 103 ++++++++++++++++++++-
>   drivers/gpu/drm/xe/xe_late_bind_fw_types.h |  32 +++++++
>   2 files changed, 134 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
> index eaf12cfec848..32d1436e7191 100644
> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
> @@ -5,6 +5,7 @@
>   
>   #include <linux/component.h>
>   #include <linux/delay.h>
> +#include <linux/firmware.h>
>   
>   #include <drm/drm_managed.h>
>   #include <drm/intel/i915_component.h>
> @@ -13,6 +14,16 @@
>   
>   #include "xe_device.h"
>   #include "xe_late_bind_fw.h"
> +#include "xe_pcode.h"
> +#include "xe_pcode_api.h"
> +
> +static const u32 fw_id_to_type[] = {
> +		[XE_LB_FW_FAN_CONTROL] = CSC_LATE_BINDING_TYPE_FAN_CONTROL,
> +	};
> +
> +static const char * const fw_id_to_name[] = {
> +		[XE_LB_FW_FAN_CONTROL] = "fan_control",
> +	};
>   
>   static struct xe_device *
>   late_bind_to_xe(struct xe_late_bind *late_bind)
> @@ -20,6 +31,92 @@ late_bind_to_xe(struct xe_late_bind *late_bind)
>   	return container_of(late_bind, struct xe_device, late_bind);
>   }
>   
> +static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
> +{
> +	struct xe_device *xe = late_bind_to_xe(late_bind);
> +	struct xe_tile *root_tile = xe_device_get_root_tile(xe);
> +	u32 uval;
> +
> +	if (!xe_pcode_read(root_tile,
> +			   PCODE_MBOX(FAN_SPEED_CONTROL, FSC_READ_NUM_FANS, 0), &uval, NULL))
> +		return uval;
> +	else
> +		return 0;
> +}
> +
> +static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
> +{
> +	struct xe_device *xe = late_bind_to_xe(late_bind);
> +	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> +	struct xe_late_bind_fw *lb_fw;
> +	const struct firmware *fw;
> +	u32 num_fans;
> +	int ret;
> +
> +	if (fw_id >= XE_LB_FW_MAX_ID)
> +		return -EINVAL;
> +
> +	lb_fw = &late_bind->late_bind_fw[fw_id];
> +
> +	lb_fw->valid = false;
> +	lb_fw->id = fw_id;
> +	lb_fw->type = fw_id_to_type[lb_fw->id];
> +	lb_fw->flags &= ~CSC_LATE_BINDING_FLAGS_IS_PERSISTENT;
> +
> +	if (lb_fw->type == CSC_LATE_BINDING_TYPE_FAN_CONTROL) {
> +		num_fans = xe_late_bind_fw_num_fans(late_bind);
> +		drm_dbg(&xe->drm, "Number of Fans: %d\n", num_fans);
> +		if (!num_fans)
> +			return 0;
> +	}
> +
> +	snprintf(lb_fw->blob_path, sizeof(lb_fw->blob_path), "xe/%s_8086_%04x_%04x_%04x.bin",
> +		 fw_id_to_name[lb_fw->id], pdev->device,
> +		 pdev->subsystem_vendor, pdev->subsystem_device);
> +
> +	drm_dbg(&xe->drm, "Request late binding firmware %s\n", lb_fw->blob_path);
> +	ret = firmware_request_nowarn(&fw, lb_fw->blob_path, xe->drm.dev);
> +	if (ret) {
> +		drm_dbg(&xe->drm, "%s late binding fw not available for current device\n",
> +			fw_id_to_name[lb_fw->id]);
> +		return 0;
> +	}
> +
> +	if (fw->size > MAX_PAYLOAD_SIZE) {
> +		drm_err(&xe->drm, "Firmware %s size %zu is larger than max pay load size %u\n",
> +			lb_fw->blob_path, fw->size, MAX_PAYLOAD_SIZE);
> +		release_firmware(fw);
> +		return -ENODATA;
> +	}
> +
> +	lb_fw->payload = drmm_kzalloc(&xe->drm, lb_fw->payload_size, GFP_KERNEL);

here you're using lb_fw->payload_size before assigning it.
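
A minimal userspace sketch of the ordering I mean, assuming the size is simply
recorded before the allocation. struct toy_lb_fw and toy_store_payload() are
hypothetical names, calloc() stands in for drmm_kzalloc(), and the zero-length
guard is an addition for this sketch only.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAYLOAD_SIZE 4096

/* Hypothetical, trimmed-down stand-in for struct xe_late_bind_fw. */
struct toy_lb_fw {
	uint8_t *payload;
	size_t payload_size;
};

static int toy_store_payload(struct toy_lb_fw *lb_fw, const uint8_t *data, size_t size)
{
	/* Zero-length guard is an addition for this sketch. */
	if (!size || size > MAX_PAYLOAD_SIZE)
		return -ENODATA;

	/* Record the size first, so the allocation below uses a real value. */
	lb_fw->payload_size = size;
	lb_fw->payload = calloc(1, lb_fw->payload_size);
	if (!lb_fw->payload) {
		lb_fw->payload_size = 0;
		return -ENOMEM;
	}

	memcpy(lb_fw->payload, data, lb_fw->payload_size);
	return 0;
}
```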

> +	if (!lb_fw->payload) {
> +		release_firmware(fw);
> +		return -ENOMEM;
> +	}
> +
> +	lb_fw->payload_size = fw->size;
> +
> +	memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);
> +	release_firmware(fw);
> +	lb_fw->valid = true;

You can now use lb_fw->payload to check if the FW is valid, no need for 
a separate variable. Not a blocker.
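
For illustration only, the suggestion could look roughly like this in userspace C. The names lb_fw_min and lb_fw_is_valid are hypothetical, and the sketch assumes the payload pointer stays NULL until a blob has been copied in.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct xe_late_bind_fw. */
struct lb_fw_min {
	unsigned char *payload;   /* NULL until a blob has been copied in */
	size_t payload_size;
};

/* The firmware counts as usable exactly when a payload buffer exists. */
static bool lb_fw_is_valid(const struct lb_fw_min *lb_fw)
{
	return lb_fw->payload != NULL;
}
```

Deriving validity from the pointer avoids keeping a flag and a buffer in sync by hand.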

> +
> +	return 0;
> +}
> +
> +static int xe_late_bind_fw_init(struct xe_late_bind *late_bind)
> +{
> +	int ret;
> +	int fw_id;
> +
> +	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
> +		ret = __xe_late_bind_fw_init(late_bind, fw_id);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
>   static int xe_late_bind_component_bind(struct device *xe_kdev,
>   				       struct device *mei_kdev, void *data)
>   {
> @@ -86,5 +183,9 @@ int xe_late_bind_init(struct xe_late_bind *late_bind)
>   		return err;
>   	}
>   
> -	return devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
> +	err = devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
> +	if (err)
> +		return err;
> +
> +	return xe_late_bind_fw_init(late_bind);
>   }
> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> index 1156ef94f0d5..93abf4c51789 100644
> --- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> @@ -10,6 +10,36 @@
>   #include <linux/mutex.h>
>   #include <linux/types.h>
>   
> +#define MAX_PAYLOAD_SIZE SZ_4K
> +
> +/**
> + * xe_late_bind_fw_id - enum to determine late binding fw index
> + */
> +enum xe_late_bind_fw_id {
> +	XE_LB_FW_FAN_CONTROL = 0,
> +	XE_LB_FW_MAX_ID
> +};
> +
> +/**
> + * struct xe_late_bind_fw
> + */
> +struct xe_late_bind_fw {
> +	/** @late_bind_fw.valid: to check if fw is valid */
> +	bool valid;
> +	/** @late_bind_fw.id: firmware index */
> +	u32 id;
> +	/** @late_bind_fw.blob_path: firmware binary path */
> +	char blob_path[PATH_MAX];
> +	/** @late_bind_fw.type: firmware type */
> +	u32  type;
> +	/** @late_bind_fw.flags: firmware flags */
> +	u32  flags;
> +	/** @late_bind_fw.payload: to store the late binding blob */
> +	u8  *payload;

Why a u8 pointer and not a void one?

Daniele

> +	/** @late_bind_fw.payload_size: late binding blob payload_size */
> +	size_t payload_size;
> +};
> +
>   /**
>    * struct xe_late_bind_component - Late Binding services component
>    * @mei_dev: device that provide Late Binding service.
> @@ -32,6 +62,8 @@ struct xe_late_bind {
>   	struct xe_late_bind_component component;
>   	/** @late_bind.mutex: protects the component binding and usage */
>   	struct mutex mutex;
> +	/** @late_bind.late_bind_fw: late binding firmware array */
> +	struct xe_late_bind_fw late_bind_fw[XE_LB_FW_MAX_ID];
>   };
>   
>   #endif



* Re: [PATCH v4 05/10] drm/xe/xe_late_bind_fw: Load late binding firmware
  2025-06-26 17:24   ` Rodrigo Vivi
@ 2025-06-26 21:27     ` Daniele Ceraolo Spurio
  2025-06-26 21:49       ` Rodrigo Vivi
  0 siblings, 1 reply; 38+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-06-26 21:27 UTC (permalink / raw)
  To: Rodrigo Vivi, Badal Nilawar
  Cc: intel-xe, dri-devel, linux-kernel, anshuman.gupta,
	alexander.usyskin, gregkh



On 6/26/2025 10:24 AM, Rodrigo Vivi wrote:
> On Wed, Jun 25, 2025 at 10:30:10PM +0530, Badal Nilawar wrote:
>> Load late binding firmware
>>
>> v2:
>>   - s/EAGAIN/EBUSY/
>>   - Flush worker in suspend and driver unload (Daniele)
>> v3:
>>   - Use retry interval of 6s, in steps of 200ms, to allow
>>     other OS components release MEI CL handle (Sasha)
>> v4:
>>   - return -ENODEV if component not added (Daniele)
>>   - parse and print status returned by csc
>>   - Use xe_pm_get_if_in_active (Daniele)
> The worker is considered outer bound and it is safe
> to use xe_pm_runtime_get which takes the reference
> and resume synchronously.
>
> Otherwise, if using get_if_active you need to reschedule
> the work or you lose your job.

The issue is that the next patch adds code to re-queue the work from the 
rpm resume path, so if we do a sync resume here the worker will re-queue 
itself immediately when not needed. Also, when the re-queued work runs 
it might end up doing another sync resume and re-queuing itself once 
more. However, in the next patch we do also have a flush of the work in 
the rpm_suspend path, so maybe the worker running when we are rpm 
suspended is not actually a possible case?
Also, thinking about this more, that re-queuing on rpm resume only 
happens if d3cold is allowed, so when d3cold is not allowed we do want 
to proceed here, and we can actually reach here when rpm suspended.
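
The difference between the two reference-taking styles discussed here can be sketched with a toy userspace model. Everything below is an assumption made for illustration: rpm_model, rpm_get_sync and rpm_get_if_active are hypothetical names, not the xe_pm API, and the model assumes every resume re-queues the load work (in the series that only happens when d3cold is allowed).

```c
#include <stdbool.h>

enum rpm_state { RPM_ACTIVE, RPM_SUSPENDED };

/* Toy model of a device's runtime PM bookkeeping. */
struct rpm_model {
	enum rpm_state state;
	int usage;      /* outstanding references */
	int requeues;   /* load work re-queued from the resume path */
};

/* Models a synchronous get: wakes the device if needed, always takes a ref. */
static void rpm_get_sync(struct rpm_model *rpm)
{
	if (rpm->state == RPM_SUSPENDED) {
		rpm->state = RPM_ACTIVE;
		rpm->requeues++;  /* assumed: the resume path re-queues the work */
	}
	rpm->usage++;
}

/* Models get_if_active: takes a ref only when already active, never resumes. */
static bool rpm_get_if_active(struct rpm_model *rpm)
{
	if (rpm->state != RPM_ACTIVE)
		return false;
	rpm->usage++;
	return true;
}
```

In this model a worker that calls rpm_get_sync() while suspended triggers its own re-queue, whereas rpm_get_if_active() leaves the device alone and the caller has to decide what to do with the skipped work.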

>
>> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_late_bind_fw.c       | 149 ++++++++++++++++++++-
>>   drivers/gpu/drm/xe/xe_late_bind_fw.h       |   1 +
>>   drivers/gpu/drm/xe/xe_late_bind_fw_types.h |   7 +
>>   3 files changed, 156 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> index 32d1436e7191..52243063d98a 100644
>> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> @@ -16,6 +16,20 @@
>>   #include "xe_late_bind_fw.h"
>>   #include "xe_pcode.h"
>>   #include "xe_pcode_api.h"
>> +#include "xe_pm.h"
>> +
>> +/*
>> + * The component should load quite quickly in most cases, but it could take
>> + * a bit. Using a very big timeout just to cover the worst case scenario
>> + */
>> +#define LB_INIT_TIMEOUT_MS 20000
>> +
>> +/*
>> + * Retry interval set to 6 seconds, in steps of 200 ms, to allow time for
>> + * other OS components to release the MEI CL handle
>> + */
>> +#define LB_FW_LOAD_RETRY_MAXCOUNT 30
>> +#define LB_FW_LOAD_RETRY_PAUSE_MS 200
>>   
>>   static const u32 fw_id_to_type[] = {
>>   		[XE_LB_FW_FAN_CONTROL] = CSC_LATE_BINDING_TYPE_FAN_CONTROL,
>> @@ -31,6 +45,30 @@ late_bind_to_xe(struct xe_late_bind *late_bind)
>>   	return container_of(late_bind, struct xe_device, late_bind);
>>   }
>>   
>> +static const char *xe_late_bind_parse_status(uint32_t status)
>> +{
>> +	switch (status) {
>> +	case CSC_LATE_BINDING_STATUS_SUCCESS:
>> +		return "success";
>> +	case CSC_LATE_BINDING_STATUS_4ID_MISMATCH:
>> +		return "4Id Mismatch";
>> +	case CSC_LATE_BINDING_STATUS_ARB_FAILURE:
>> +		return "ARB Failure";
>> +	case CSC_LATE_BINDING_STATUS_GENERAL_ERROR:
>> +		return "General Error";
>> +	case CSC_LATE_BINDING_STATUS_INVALID_PARAMS:
>> +		return "Invalid Params";
>> +	case CSC_LATE_BINDING_STATUS_INVALID_SIGNATURE:
>> +		return "Invalid Signature";
>> +	case CSC_LATE_BINDING_STATUS_INVALID_PAYLOAD:
>> +		return "Invalid Payload";
>> +	case CSC_LATE_BINDING_STATUS_TIMEOUT:
>> +		return "Timeout";
>> +	default:
>> +		return "Unknown error";
>> +	}
>> +}
>> +
>>   static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
>>   {
>>   	struct xe_device *xe = late_bind_to_xe(late_bind);
>> @@ -44,6 +82,93 @@ static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
>>   		return 0;
>>   }
>>   
>> +static void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind)
>> +{
>> +	struct xe_device *xe = late_bind_to_xe(late_bind);
>> +	struct xe_late_bind_fw *lbfw;
>> +	int fw_id;
>> +
>> +	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
>> +		lbfw = &late_bind->late_bind_fw[fw_id];
>> +		if (lbfw->valid && late_bind->wq) {
>> +			drm_dbg(&xe->drm, "Flush work: load %s firmware\n",
>> +				fw_id_to_name[lbfw->id]);
>> +			flush_work(&lbfw->work);
>> +		}
>> +	}
>> +}
>> +
>> +static void xe_late_bind_work(struct work_struct *work)
>> +{
>> +	struct xe_late_bind_fw *lbfw = container_of(work, struct xe_late_bind_fw, work);
>> +	struct xe_late_bind *late_bind = container_of(lbfw, struct xe_late_bind,
>> +						      late_bind_fw[lbfw->id]);
>> +	struct xe_device *xe = late_bind_to_xe(late_bind);
>> +	int retry = LB_FW_LOAD_RETRY_MAXCOUNT;
>> +	int ret;
>> +	int slept;
>> +
>> +	/* we can queue this before the component is bound */
>> +	for (slept = 0; slept < LB_INIT_TIMEOUT_MS; slept += 100) {
>> +		if (late_bind->component.ops)
>> +			break;
>> +		msleep(100);
>> +	}
>> +
>> +	if (!xe_pm_runtime_get_if_active(xe))
>> +		return;
>> +
>> +	mutex_lock(&late_bind->mutex);
>> +
>> +	if (!late_bind->component.ops) {
>> +		drm_err(&xe->drm, "Late bind component not bound\n");
>> +		goto out;
>> +	}
>> +
>> +	drm_dbg(&xe->drm, "Load %s firmware\n", fw_id_to_name[lbfw->id]);
>> +
>> +	do {
>> +		ret = late_bind->component.ops->push_config(late_bind->component.mei_dev,
>> +							    lbfw->type, lbfw->flags,
>> +							    lbfw->payload, lbfw->payload_size);
>> +		if (!ret)
>> +			break;
>> +		msleep(LB_FW_LOAD_RETRY_PAUSE_MS);
>> +	} while (--retry && ret == -EBUSY);
>> +
>> +	if (!ret) {
>> +		drm_dbg(&xe->drm, "Load %s firmware successful\n",
>> +			fw_id_to_name[lbfw->id]);
>> +		goto out;
>> +	}
>> +
>> +	if (ret > 0)

nit: here you can just do "else if" and drop the goto.

Daniele
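
A rough userspace sketch of the if / else if / else shape suggested above, with the three outcomes in one chain and no goto. report_load_result is a hypothetical helper that formats into a buffer instead of calling drm_dbg()/drm_err(), and the message wording is only approximate.

```c
#include <stdio.h>

/*
 * Format the outcome of a load attempt into buf.
 * ret == 0: success; ret > 0: firmware status code; ret < 0: errno-style error.
 */
static int report_load_result(char *buf, size_t len, const char *name, int ret)
{
	if (!ret)
		return snprintf(buf, len, "Load %s firmware successful", name);
	else if (ret > 0)
		return snprintf(buf, len, "Load %s firmware failed with status %d",
				name, ret);
	else
		return snprintf(buf, len, "Load %s firmware failed with err %d",
				name, ret);
}
```

In the worker itself the chain would simply fall through to the unlock and runtime PM put that currently sit under the out: label.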

>> +		drm_err(&xe->drm, "Load %s firmware failed with err %d, %s\n",
>> +			fw_id_to_name[lbfw->id], ret, xe_late_bind_parse_status(ret));
>> +	else
>> +		drm_err(&xe->drm, "Load %s firmware failed with err %d",
>> +			fw_id_to_name[lbfw->id], ret);
>> +out:
>> +	mutex_unlock(&late_bind->mutex);
>> +	xe_pm_runtime_put(xe);
>> +}
>> +
>> +int xe_late_bind_fw_load(struct xe_late_bind *late_bind)
>> +{
>> +	struct xe_late_bind_fw *lbfw;
>> +	int fw_id;
>> +
>> +	if (!late_bind->component_added)
>> +		return -ENODEV;
>> +
>> +	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
>> +		lbfw = &late_bind->late_bind_fw[fw_id];
>> +		if (lbfw->valid)
>> +			queue_work(late_bind->wq, &lbfw->work);
>> +	}
>> +	return 0;
>> +}
>> +
>>   static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
>>   {
>>   	struct xe_device *xe = late_bind_to_xe(late_bind);
>> @@ -99,6 +224,7 @@ static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
>>   
>>   	memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);
>>   	release_firmware(fw);
>> +	INIT_WORK(&lb_fw->work, xe_late_bind_work);
>>   	lb_fw->valid = true;
>>   
>>   	return 0;
>> @@ -109,11 +235,16 @@ static int xe_late_bind_fw_init(struct xe_late_bind *late_bind)
>>   	int ret;
>>   	int fw_id;
>>   
>> +	late_bind->wq = alloc_ordered_workqueue("late-bind-ordered-wq", 0);
>> +	if (!late_bind->wq)
>> +		return -ENOMEM;
>> +
>>   	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
>>   		ret = __xe_late_bind_fw_init(late_bind, fw_id);
>>   		if (ret)
>>   			return ret;
>>   	}
>> +
>>   	return 0;
>>   }
>>   
>> @@ -137,6 +268,8 @@ static void xe_late_bind_component_unbind(struct device *xe_kdev,
>>   	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
>>   	struct xe_late_bind *late_bind = &xe->late_bind;
>>   
>> +	xe_late_bind_wait_for_worker_completion(late_bind);
>> +
>>   	mutex_lock(&late_bind->mutex);
>>   	late_bind->component.ops = NULL;
>>   	mutex_unlock(&late_bind->mutex);
>> @@ -152,7 +285,15 @@ static void xe_late_bind_remove(void *arg)
>>   	struct xe_late_bind *late_bind = arg;
>>   	struct xe_device *xe = late_bind_to_xe(late_bind);
>>   
>> +	xe_late_bind_wait_for_worker_completion(late_bind);
>> +
>> +	late_bind->component_added = false;
>> +
>>   	component_del(xe->drm.dev, &xe_late_bind_component_ops);
>> +	if (late_bind->wq) {
>> +		destroy_workqueue(late_bind->wq);
>> +		late_bind->wq = NULL;
>> +	}
>>   	mutex_destroy(&late_bind->mutex);
>>   }
>>   
>> @@ -183,9 +324,15 @@ int xe_late_bind_init(struct xe_late_bind *late_bind)
>>   		return err;
>>   	}
>>   
>> +	late_bind->component_added = true;
>> +
>>   	err = devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
>>   	if (err)
>>   		return err;
>>   
>> -	return xe_late_bind_fw_init(late_bind);
>> +	err = xe_late_bind_fw_init(late_bind);
>> +	if (err)
>> +		return err;
>> +
>> +	return xe_late_bind_fw_load(late_bind);
>>   }
>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.h b/drivers/gpu/drm/xe/xe_late_bind_fw.h
>> index 4c73571c3e62..28d56ed2bfdc 100644
>> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.h
>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.h
>> @@ -11,5 +11,6 @@
>>   struct xe_late_bind;
>>   
>>   int xe_late_bind_init(struct xe_late_bind *late_bind);
>> +int xe_late_bind_fw_load(struct xe_late_bind *late_bind);
>>   
>>   #endif
>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>> index 93abf4c51789..f119a75f4c9c 100644
>> --- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>> @@ -9,6 +9,7 @@
>>   #include <linux/iosys-map.h>
>>   #include <linux/mutex.h>
>>   #include <linux/types.h>
>> +#include <linux/workqueue.h>
>>   
>>   #define MAX_PAYLOAD_SIZE SZ_4K
>>   
>> @@ -38,6 +39,8 @@ struct xe_late_bind_fw {
>>   	u8  *payload;
>>   	/** @late_bind_fw.payload_size: late binding blob payload_size */
>>   	size_t payload_size;
>> +	/** @late_bind_fw.work: worker to upload latebind blob */
>> +	struct work_struct work;
>>   };
>>   
>>   /**
>> @@ -64,6 +67,10 @@ struct xe_late_bind {
>>   	struct mutex mutex;
>>   	/** @late_bind.late_bind_fw: late binding firmware array */
>>   	struct xe_late_bind_fw late_bind_fw[XE_LB_FW_MAX_ID];
>> +	/** @late_bind.wq: workqueue to submit request to download late bind blob */
>> +	struct workqueue_struct *wq;
>> +	/** @late_bind.component_added: whether the component has been added */
>> +	bool component_added;
>>   };
>>   
>>   #endif
>> -- 
>> 2.34.1
>>



* Re: [PATCH v4 09/10] drm/xe/xe_late_bind_fw: Extract and print version info
  2025-06-25 17:00 ` [PATCH v4 09/10] drm/xe/xe_late_bind_fw: Extract and print version info Badal Nilawar
@ 2025-06-26 21:32   ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 38+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-06-26 21:32 UTC (permalink / raw)
  To: Badal Nilawar, intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh



On 6/25/2025 10:00 AM, Badal Nilawar wrote:
> Extract and print version info of the late binding binary.
>
> v2: Some refinements (Daniele)
>
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Daniele
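
For readers following the layout walk in the quoted patch (FPT entry "LTES" -> CPD entry "LTES.man" -> manifest version), here is a simplified userspace sketch of the same two-level lookup. The structs, the marker values and the 32-bit version word are hypothetical simplifications, not the real ABI. Two choices in the sketch are assumptions rather than what the patch does: the offset is initialized to 0 so that "entry not found" is well defined, and names are compared with a bounded strncmp() because the name fields are fixed-size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder markers for the sketch; not taken from the real headers. */
#define MARK_OUTER 0x11111111u
#define MARK_INNER 0x22222222u

/* Simplified, hypothetical table layout: a header followed by entries. */
struct tbl_header {
	uint32_t marker;
	uint32_t num_entries;
};

struct tbl_entry {
	char name[12];     /* not necessarily NUL-terminated */
	uint32_t offset;   /* offset of the target from the table start */
};

/*
 * Return the offset stored in the entry called `name`, or 0 if the table
 * is malformed or has no such entry.
 */
static uint32_t tbl_find(const void *data, size_t size, uint32_t marker,
			 const char *name)
{
	const char *base = data;
	struct tbl_header hdr;
	struct tbl_entry entry;
	uint32_t offset = 0;
	uint32_t i;

	if (size < sizeof(hdr))
		return 0;
	memcpy(&hdr, base, sizeof(hdr));
	if (hdr.marker != marker)
		return 0;
	/* all entries must fit inside the buffer */
	if (hdr.num_entries > (size - sizeof(hdr)) / sizeof(entry))
		return 0;

	for (i = 0; i < hdr.num_entries; i++) {
		memcpy(&entry, base + sizeof(hdr) + i * sizeof(entry), sizeof(entry));
		/* bounded compare: never reads past the fixed-size name field */
		if (strncmp(entry.name, name, sizeof(entry.name)) == 0)
			offset = entry.offset;
	}
	return offset;
}

/*
 * Walk outer table -> inner table -> 32-bit version word.
 * Returns 0 and stores the version on success, -1 on any failure.
 */
static int find_version(const void *data, size_t size, uint32_t *version)
{
	const char *base = data;
	uint32_t inner, man;

	inner = tbl_find(base, size, MARK_OUTER, "LTES");
	if (!inner || inner >= size)
		return -1;

	man = tbl_find(base + inner, size - inner, MARK_INNER, "LTES.man");
	if (!man || man > size - inner || size - inner - man < sizeof(*version))
		return -1;

	memcpy(version, base + inner + man, sizeof(*version));
	return 0;
}
```

Each hop re-checks the remaining size before dereferencing, mirroring the min_size checks in the patch.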

> ---
>   drivers/gpu/drm/xe/xe_late_bind_fw.c       | 124 +++++++++++++++++++++
>   drivers/gpu/drm/xe/xe_late_bind_fw_types.h |   3 +
>   drivers/gpu/drm/xe/xe_uc_fw_abi.h          |  66 +++++++++++
>   3 files changed, 193 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
> index 777f66692d7f..253908794d4a 100644
> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
> @@ -45,6 +45,121 @@ late_bind_to_xe(struct xe_late_bind *late_bind)
>   	return container_of(late_bind, struct xe_device, late_bind);
>   }
>   
> +static struct xe_device *
> +late_bind_fw_to_xe(struct xe_late_bind_fw *lb_fw)
> +{
> +	return container_of(lb_fw, struct xe_device, late_bind.late_bind_fw[lb_fw->id]);
> +}
> +
> +/* Refer to the "Late Bind based Firmware Layout" documentation entry for details */
> +static int parse_cpd_header(struct xe_late_bind_fw *lb_fw,
> +			    const void *data, size_t size, const char *manifest_entry)
> +{
> +	struct xe_device *xe = late_bind_fw_to_xe(lb_fw);
> +	const struct gsc_cpd_header_v2 *header = data;
> +	const struct gsc_manifest_header *manifest;
> +	const struct gsc_cpd_entry *entry;
> +	size_t min_size = sizeof(*header);
> +	u32 offset;
> +	int i;
> +
> +	/* manifest_entry is mandatory */
> +	xe_assert(xe, manifest_entry);
> +
> +	if (size < min_size || header->header_marker != GSC_CPD_HEADER_MARKER)
> +		return -ENOENT;
> +
> +	if (header->header_length < sizeof(struct gsc_cpd_header_v2)) {
> +		drm_err(&xe->drm, "%s late binding fw: Invalid CPD header length %u!\n",
> +			fw_id_to_name[lb_fw->id], header->header_length);
> +		return -EINVAL;
> +	}
> +
> +	min_size = header->header_length + sizeof(struct gsc_cpd_entry) * header->num_of_entries;
> +	if (size < min_size) {
> +		drm_err(&xe->drm, "%s late binding fw: too small! %zu < %zu\n",
> +			fw_id_to_name[lb_fw->id], size, min_size);
> +		return -ENODATA;
> +	}
> +
> +	/* Look for the manifest first */
> +	entry = (void *)header + header->header_length;
> +	for (i = 0; i < header->num_of_entries; i++, entry++)
> +		if (strcmp(entry->name, manifest_entry) == 0)
> +			offset = entry->offset & GSC_CPD_ENTRY_OFFSET_MASK;
> +
> +	if (!offset) {
> +		drm_err(&xe->drm, "%s late binding fw: Failed to find manifest_entry\n",
> +			fw_id_to_name[lb_fw->id]);
> +		return -ENODATA;
> +	}
> +
> +	min_size = offset + sizeof(struct gsc_manifest_header);
> +	if (size < min_size) {
> +		drm_err(&xe->drm, "%s late binding fw: too small! %zu < %zu\n",
> +			fw_id_to_name[lb_fw->id], size, min_size);
> +		return -ENODATA;
> +	}
> +
> +	manifest = data + offset;
> +
> +	lb_fw->version = manifest->fw_version;
> +
> +	return 0;
> +}
> +
> +/* Refer to the "Late Bind based Firmware Layout" documentation entry for details */
> +static int parse_lb_layout(struct xe_late_bind_fw *lb_fw,
> +			   const void *data, size_t size, const char *fpt_entry)
> +{
> +	struct xe_device *xe = late_bind_fw_to_xe(lb_fw);
> +	const struct csc_fpt_header *header = data;
> +	const struct csc_fpt_entry *entry;
> +	size_t min_size = sizeof(*header);
> +	u32 offset;
> +	int i;
> +
> +	/* fpt_entry is mandatory */
> +	xe_assert(xe, fpt_entry);
> +
> +	if (size < min_size || header->header_marker != CSC_FPT_HEADER_MARKER)
> +		return -ENOENT;
> +
> +	if (header->header_length < sizeof(struct csc_fpt_header)) {
> +		drm_err(&xe->drm, "%s late binding fw: Invalid FPT header length %u!\n",
> +			fw_id_to_name[lb_fw->id], header->header_length);
> +		return -EINVAL;
> +	}
> +
> +	min_size = header->header_length + sizeof(struct csc_fpt_entry) * header->num_of_entries;
> +	if (size < min_size) {
> +		drm_err(&xe->drm, "%s late binding fw: too small! %zu < %zu\n",
> +			fw_id_to_name[lb_fw->id], size, min_size);
> +		return -ENODATA;
> +	}
> +
> +	/* Look for the cpd header first */
> +	entry = (void *)header + header->header_length;
> +	for (i = 0; i < header->num_of_entries; i++, entry++)
> +		if (strcmp(entry->name, fpt_entry) == 0)
> +			offset = entry->offset;
> +
> +	if (!offset) {
> +		drm_err(&xe->drm, "%s late binding fw: Failed to find fpt_entry\n",
> +			fw_id_to_name[lb_fw->id]);
> +		return -ENODATA;
> +	}
> +
> +	min_size = offset + sizeof(struct gsc_cpd_header_v2);
> +	if (size < min_size) {
> +		drm_err(&xe->drm, "%s late binding fw: too small! %zu < %zu\n",
> +			fw_id_to_name[lb_fw->id], size, min_size);
> +		return -ENODATA;
> +	}
> +
> +	return parse_cpd_header(lb_fw, data + offset, size - offset, "LTES.man");
> +}
> +
>   static const char *xe_late_bind_parse_status(uint32_t status)
>   {
>   	switch (status) {
> @@ -217,6 +332,10 @@ static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
>   		return -ENODATA;
>   	}
>   
> +	ret = parse_lb_layout(lb_fw, fw->data, fw->size, "LTES");
> +	if (ret)
> +		return ret;
> +
>   	lb_fw->payload = drmm_kzalloc(&xe->drm, lb_fw->payload_size, GFP_KERNEL);
>   	if (!lb_fw->payload) {
>   		release_firmware(fw);
> @@ -225,6 +344,11 @@ static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
>   
>   	lb_fw->payload_size = fw->size;
>   
> +	drm_info(&xe->drm, "Using %s firmware from %s version %u.%u.%u.%u\n",
> +		 fw_id_to_name[lb_fw->id], lb_fw->blob_path,
> +		 lb_fw->version.major, lb_fw->version.minor,
> +		 lb_fw->version.hotfix, lb_fw->version.build);
> +
>   	memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);
>   	release_firmware(fw);
>   	INIT_WORK(&lb_fw->work, xe_late_bind_work);
> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> index 16f2bd6bbdf1..7f98a1380844 100644
> --- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> @@ -10,6 +10,7 @@
>   #include <linux/mutex.h>
>   #include <linux/types.h>
>   #include <linux/workqueue.h>
> +#include "xe_uc_fw_abi.h"
>   
>   #define MAX_PAYLOAD_SIZE SZ_4K
>   
> @@ -41,6 +42,8 @@ struct xe_late_bind_fw {
>   	size_t payload_size;
>   	/** @late_bind_fw.work: worker to upload latebind blob */
>   	struct work_struct work;
> +	/** @late_bind_fw.version: late binding blob manifest version */
> +	struct gsc_version version;
>   };
>   
>   /**
> diff --git a/drivers/gpu/drm/xe/xe_uc_fw_abi.h b/drivers/gpu/drm/xe/xe_uc_fw_abi.h
> index 87ade41209d0..78782d105fa9 100644
> --- a/drivers/gpu/drm/xe/xe_uc_fw_abi.h
> +++ b/drivers/gpu/drm/xe/xe_uc_fw_abi.h
> @@ -318,4 +318,70 @@ struct gsc_manifest_header {
>   	u32 exponent_size; /* in dwords */
>   } __packed;
>   
> +/**
> + * DOC: Late binding Firmware Layout
> + *
> + * The Late binding binary starts with FPT header, which contains locations
> + * of various partitions of the binary. Here we're interested in finding out
> + * manifest version. To the manifest version, we need to locate CPD header
> + * one of the entry in CPD header points to manifest header. Manifest header
> + * contains the version.
> + *
> + *      +================================================+
> + *      |  FPT Header                                    |
> + *      +================================================+
> + *      |  FPT entries[]                                 |
> + *      |      entry1                                    |
> + *      |      ...                                       |
> + *      |      entryX                                    |
> + *      |          "LTES"                                |
> + *      |          ...                                   |
> + *      |          offset  >-----------------------------|------o
> + *      +================================================+      |
> + *                                                              |
> + *      +================================================+      |
> + *      |  CPD Header                                    |<-----o
> + *      +================================================+
> + *      |  CPD entries[]                                 |
> + *      |      entry1                                    |
> + *      |      ...                                       |
> + *      |      entryX                                    |
> + *      |          "LTES.man"                            |
> + *      |           ...                                  |
> + *      |           offset  >----------------------------|------o
> + *      +================================================+      |
> + *                                                              |
> + *      +================================================+      |
> + *      |  Manifest Header                               |<-----o
> + *      |      ...                                       |
> + *      |      FW version                                |
> + *      |      ...                                       |
> + *      +================================================+
> + */
> +
> +/* FPT Headers */
> +struct csc_fpt_header {
> +	u32 header_marker;
> +#define CSC_FPT_HEADER_MARKER 0x54504624
> +	u32 num_of_entries;
> +	u8 header_version;
> +	u8 entry_version;
> +	u8 header_length; /* in bytes */
> +	u8 flags;
> +	u16 ticks_to_add;
> +	u16 tokens_to_add;
> +	u32 uma_size;
> +	u32 crc32;
> +	struct gsc_version fitc_version;
> +} __packed;
> +
> +struct csc_fpt_entry {
> +	u8 name[4]; /* partition name */
> +	u32 reserved1;
> +	u32 offset; /* offset from beginning of CSE region */
> +	u32 length; /* partition length in bytes */
> +	u32 reserved2[3];
> +	u32 partition_flags;
> +} __packed;
> +
>   #endif



* Re: [PATCH v4 05/10] drm/xe/xe_late_bind_fw: Load late binding firmware
  2025-06-26 21:27     ` Daniele Ceraolo Spurio
@ 2025-06-26 21:49       ` Rodrigo Vivi
  2025-06-26 22:38         ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 38+ messages in thread
From: Rodrigo Vivi @ 2025-06-26 21:49 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio
  Cc: Badal Nilawar, intel-xe, dri-devel, linux-kernel, anshuman.gupta,
	alexander.usyskin, gregkh

On Thu, Jun 26, 2025 at 02:27:50PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 6/26/2025 10:24 AM, Rodrigo Vivi wrote:
> > On Wed, Jun 25, 2025 at 10:30:10PM +0530, Badal Nilawar wrote:
> > > Load late binding firmware
> > > 
> > > v2:
> > >   - s/EAGAIN/EBUSY/
> > >   - Flush worker in suspend and driver unload (Daniele)
> > > v3:
> > >   - Use retry interval of 6s, in steps of 200ms, to allow
> > >     other OS components release MEI CL handle (Sasha)
> > > v4:
> > >   - return -ENODEV if component not added (Daniele)
> > >   - parse and print status returned by csc
> > >   - Use xe_pm_get_if_in_active (Daniele)
> > The worker is considered outer bound and it is safe
> > to use xe_pm_runtime_get which takes the reference
> > and resume synchronously.
> > 
> > Otherwise, if using get_if_active you need to reschedule
> > the work or you lose your job.
> 
> The issue is that the next patch adds code to re-queue the work from the rpm
> resume path, so if we do a sync resume here the worker will re-queue itself
> immediately when not needed.

Oops, I had forgotten about that case, I'm sorry.

> Also, when the re-queued work runs it might end
> up doing another sync resume and re-queuing itself once more. 

I believe it might be worse than that and even hang. This is the right
case for the if_active indeed. But we need to ensure that we will
always have an outer bound for that.

> However, in
> the next patch we do also have a flush of the work in the rpm_suspend path,
> so maybe the worker running when we are rpm suspended is not actually a
> possible case?

That's the kaboom case!

> Also, thinking about this more, that re-queuing on rpm resume only happens
> if d3cold is allowed, so when d3cold is not allowed we do want to proceed
> here we can actually reach here when rpm suspended.

No, when d3cold is not allowed we don't want to re-flash the fw.
We just skip and move forward.

My bad, sorry for the noise and please keep the if_active variant in here.

> 
> > 
> > > Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/xe_late_bind_fw.c       | 149 ++++++++++++++++++++-
> > >   drivers/gpu/drm/xe/xe_late_bind_fw.h       |   1 +
> > >   drivers/gpu/drm/xe/xe_late_bind_fw_types.h |   7 +
> > >   3 files changed, 156 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
> > > index 32d1436e7191..52243063d98a 100644
> > > --- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
> > > +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
> > > @@ -16,6 +16,20 @@
> > >   #include "xe_late_bind_fw.h"
> > >   #include "xe_pcode.h"
> > >   #include "xe_pcode_api.h"
> > > +#include "xe_pm.h"
> > > +
> > > +/*
> > > + * The component should load quite quickly in most cases, but it could take
> > > + * a bit. Using a very big timeout just to cover the worst case scenario
> > > + */
> > > +#define LB_INIT_TIMEOUT_MS 20000
> > > +
> > > +/*
> > > + * Retry interval set to 6 seconds, in steps of 200 ms, to allow time for
> > > + * other OS components to release the MEI CL handle
> > > + */
> > > +#define LB_FW_LOAD_RETRY_MAXCOUNT 30
> > > +#define LB_FW_LOAD_RETRY_PAUSE_MS 200
> > >   static const u32 fw_id_to_type[] = {
> > >   		[XE_LB_FW_FAN_CONTROL] = CSC_LATE_BINDING_TYPE_FAN_CONTROL,
> > > @@ -31,6 +45,30 @@ late_bind_to_xe(struct xe_late_bind *late_bind)
> > >   	return container_of(late_bind, struct xe_device, late_bind);
> > >   }
> > > +static const char *xe_late_bind_parse_status(uint32_t status)
> > > +{
> > > +	switch (status) {
> > > +	case CSC_LATE_BINDING_STATUS_SUCCESS:
> > > +		return "success";
> > > +	case CSC_LATE_BINDING_STATUS_4ID_MISMATCH:
> > > +		return "4Id Mismatch";
> > > +	case CSC_LATE_BINDING_STATUS_ARB_FAILURE:
> > > +		return "ARB Failure";
> > > +	case CSC_LATE_BINDING_STATUS_GENERAL_ERROR:
> > > +		return "General Error";
> > > +	case CSC_LATE_BINDING_STATUS_INVALID_PARAMS:
> > > +		return "Invalid Params";
> > > +	case CSC_LATE_BINDING_STATUS_INVALID_SIGNATURE:
> > > +		return "Invalid Signature";
> > > +	case CSC_LATE_BINDING_STATUS_INVALID_PAYLOAD:
> > > +		return "Invalid Payload";
> > > +	case CSC_LATE_BINDING_STATUS_TIMEOUT:
> > > +		return "Timeout";
> > > +	default:
> > > +		return "Unknown error";
> > > +	}
> > > +}
> > > +
> > >   static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
> > >   {
> > >   	struct xe_device *xe = late_bind_to_xe(late_bind);
> > > @@ -44,6 +82,93 @@ static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
> > >   		return 0;
> > >   }
> > > +static void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind)
> > > +{
> > > +	struct xe_device *xe = late_bind_to_xe(late_bind);
> > > +	struct xe_late_bind_fw *lbfw;
> > > +	int fw_id;
> > > +
> > > +	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
> > > +		lbfw = &late_bind->late_bind_fw[fw_id];
> > > +		if (lbfw->valid && late_bind->wq) {
> > > +			drm_dbg(&xe->drm, "Flush work: load %s firmware\n",
> > > +				fw_id_to_name[lbfw->id]);
> > > +			flush_work(&lbfw->work);
> > > +		}
> > > +	}
> > > +}
> > > +
> > > +static void xe_late_bind_work(struct work_struct *work)
> > > +{
> > > +	struct xe_late_bind_fw *lbfw = container_of(work, struct xe_late_bind_fw, work);
> > > +	struct xe_late_bind *late_bind = container_of(lbfw, struct xe_late_bind,
> > > +						      late_bind_fw[lbfw->id]);
> > > +	struct xe_device *xe = late_bind_to_xe(late_bind);
> > > +	int retry = LB_FW_LOAD_RETRY_MAXCOUNT;
> > > +	int ret;
> > > +	int slept;
> > > +
> > > +	/* we can queue this before the component is bound */
> > > +	for (slept = 0; slept < LB_INIT_TIMEOUT_MS; slept += 100) {
> > > +		if (late_bind->component.ops)
> > > +			break;
> > > +		msleep(100);
> > > +	}
> > > +
> > > +	if (!xe_pm_runtime_get_if_active(xe))
> > > +		return;
> > > +
> > > +	mutex_lock(&late_bind->mutex);
> > > +
> > > +	if (!late_bind->component.ops) {
> > > +		drm_err(&xe->drm, "Late bind component not bound\n");
> > > +		goto out;
> > > +	}
> > > +
> > > +	drm_dbg(&xe->drm, "Load %s firmware\n", fw_id_to_name[lbfw->id]);
> > > +
> > > +	do {
> > > +		ret = late_bind->component.ops->push_config(late_bind->component.mei_dev,
> > > +							    lbfw->type, lbfw->flags,
> > > +							    lbfw->payload, lbfw->payload_size);
> > > +		if (!ret)
> > > +			break;
> > > +		msleep(LB_FW_LOAD_RETRY_PAUSE_MS);
> > > +	} while (--retry && ret == -EBUSY);
> > > +
> > > +	if (!ret) {
> > > +		drm_dbg(&xe->drm, "Load %s firmware successful\n",
> > > +			fw_id_to_name[lbfw->id]);
> > > +		goto out;
> > > +	}
> > > +
> > > +	if (ret > 0)
> 
> nit: here you can just do "else if" and drop the goto.
> 
> Daniele
> 
> > > +		drm_err(&xe->drm, "Load %s firmware failed with err %d, %s\n",
> > > +			fw_id_to_name[lbfw->id], ret, xe_late_bind_parse_status(ret));
> > > +	else
> > > +		drm_err(&xe->drm, "Load %s firmware failed with err %d",
> > > +			fw_id_to_name[lbfw->id], ret);
> > > +out:
> > > +	mutex_unlock(&late_bind->mutex);
> > > +	xe_pm_runtime_put(xe);
> > > +}
> > > +
> > > +int xe_late_bind_fw_load(struct xe_late_bind *late_bind)
> > > +{
> > > +	struct xe_late_bind_fw *lbfw;
> > > +	int fw_id;
> > > +
> > > +	if (!late_bind->component_added)
> > > +		return -ENODEV;
> > > +
> > > +	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
> > > +		lbfw = &late_bind->late_bind_fw[fw_id];
> > > +		if (lbfw->valid)
> > > +			queue_work(late_bind->wq, &lbfw->work);
> > > +	}
> > > +	return 0;
> > > +}
> > > +
> > >   static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
> > >   {
> > >   	struct xe_device *xe = late_bind_to_xe(late_bind);
> > > @@ -99,6 +224,7 @@ static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
> > >   	memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);
> > >   	release_firmware(fw);
> > > +	INIT_WORK(&lb_fw->work, xe_late_bind_work);
> > >   	lb_fw->valid = true;
> > >   	return 0;
> > > @@ -109,11 +235,16 @@ static int xe_late_bind_fw_init(struct xe_late_bind *late_bind)
> > >   	int ret;
> > >   	int fw_id;
> > > +	late_bind->wq = alloc_ordered_workqueue("late-bind-ordered-wq", 0);
> > > +	if (!late_bind->wq)
> > > +		return -ENOMEM;
> > > +
> > >   	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
> > >   		ret = __xe_late_bind_fw_init(late_bind, fw_id);
> > >   		if (ret)
> > >   			return ret;
> > >   	}
> > > +
> > >   	return 0;
> > >   }
> > > @@ -137,6 +268,8 @@ static void xe_late_bind_component_unbind(struct device *xe_kdev,
> > >   	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
> > >   	struct xe_late_bind *late_bind = &xe->late_bind;
> > > +	xe_late_bind_wait_for_worker_completion(late_bind);
> > > +
> > >   	mutex_lock(&late_bind->mutex);
> > >   	late_bind->component.ops = NULL;
> > >   	mutex_unlock(&late_bind->mutex);
> > > @@ -152,7 +285,15 @@ static void xe_late_bind_remove(void *arg)
> > >   	struct xe_late_bind *late_bind = arg;
> > >   	struct xe_device *xe = late_bind_to_xe(late_bind);
> > > +	xe_late_bind_wait_for_worker_completion(late_bind);
> > > +
> > > +	late_bind->component_added = false;
> > > +
> > >   	component_del(xe->drm.dev, &xe_late_bind_component_ops);
> > > +	if (late_bind->wq) {
> > > +		destroy_workqueue(late_bind->wq);
> > > +		late_bind->wq = NULL;
> > > +	}
> > >   	mutex_destroy(&late_bind->mutex);
> > >   }
> > > @@ -183,9 +324,15 @@ int xe_late_bind_init(struct xe_late_bind *late_bind)
> > >   		return err;
> > >   	}
> > > +	late_bind->component_added = true;
> > > +
> > >   	err = devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
> > >   	if (err)
> > >   		return err;
> > > -	return xe_late_bind_fw_init(late_bind);
> > > +	err = xe_late_bind_fw_init(late_bind);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	return xe_late_bind_fw_load(late_bind);
> > >   }
> > > diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.h b/drivers/gpu/drm/xe/xe_late_bind_fw.h
> > > index 4c73571c3e62..28d56ed2bfdc 100644
> > > --- a/drivers/gpu/drm/xe/xe_late_bind_fw.h
> > > +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.h
> > > @@ -11,5 +11,6 @@
> > >   struct xe_late_bind;
> > >   int xe_late_bind_init(struct xe_late_bind *late_bind);
> > > +int xe_late_bind_fw_load(struct xe_late_bind *late_bind);
> > >   #endif
> > > diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> > > index 93abf4c51789..f119a75f4c9c 100644
> > > --- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> > > @@ -9,6 +9,7 @@
> > >   #include <linux/iosys-map.h>
> > >   #include <linux/mutex.h>
> > >   #include <linux/types.h>
> > > +#include <linux/workqueue.h>
> > >   #define MAX_PAYLOAD_SIZE SZ_4K
> > > @@ -38,6 +39,8 @@ struct xe_late_bind_fw {
> > >   	u8  *payload;
> > >   	/** @late_bind_fw.payload_size: late binding blob payload_size */
> > >   	size_t payload_size;
> > > +	/** @late_bind_fw.work: worker to upload latebind blob */
> > > +	struct work_struct work;
> > >   };
> > >   /**
> > > @@ -64,6 +67,10 @@ struct xe_late_bind {
> > >   	struct mutex mutex;
> > >   	/** @late_bind.late_bind_fw: late binding firmware array */
> > >   	struct xe_late_bind_fw late_bind_fw[XE_LB_FW_MAX_ID];
> > > +	/** @late_bind.wq: workqueue to submit request to download late bind blob */
> > > +	struct workqueue_struct *wq;
> > > +	/** @late_bind.component_added: whether the component has been added */
> > > +	bool component_added;
> > >   };
> > >   #endif
> > > -- 
> > > 2.34.1
> > > 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 05/10] drm/xe/xe_late_bind_fw: Load late binding firmware
  2025-06-26 21:49       ` Rodrigo Vivi
@ 2025-06-26 22:38         ` Daniele Ceraolo Spurio
  2025-06-26 22:49           ` Rodrigo Vivi
  0 siblings, 1 reply; 38+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-06-26 22:38 UTC (permalink / raw)
  To: Rodrigo Vivi
  Cc: Badal Nilawar, intel-xe, dri-devel, linux-kernel, anshuman.gupta,
	alexander.usyskin, gregkh



On 6/26/2025 2:49 PM, Rodrigo Vivi wrote:
> On Thu, Jun 26, 2025 at 02:27:50PM -0700, Daniele Ceraolo Spurio wrote:
>>
>> On 6/26/2025 10:24 AM, Rodrigo Vivi wrote:
>>> On Wed, Jun 25, 2025 at 10:30:10PM +0530, Badal Nilawar wrote:
>>>> Load late binding firmware
>>>>
>>>> v2:
>>>>    - s/EAGAIN/EBUSY/
>>>>    - Flush worker in suspend and driver unload (Daniele)
>>>> v3:
>>>>    - Use retry interval of 6s, in steps of 200ms, to allow
>>>>      other OS components release MEI CL handle (Sasha)
>>>> v4:
>>>>    - return -ENODEV if component not added (Daniele)
>>>>    - parse and print status returned by csc
>>>>    - Use xe_pm_get_if_in_active (Daniele)
>>> The worker is considered outer bound and it is safe
>>> to use xe_pm_runtime_get which takes the reference
>>> and resume synchronously.
>>>
>>> Otherwise, if using get_if_active you need to reschedule
>>> the work or you lose your job.
>> The issue is that the next patch adds code to re-queue the work from the rpm
>> resume path, so if we do a sync resume here the worker will re-queue itself
>> immediately when not needed.
> ops, I had forgotten about that case, I'm sorry.
>
>> Also, when the re-queued work runs it might end
>> up doing another sync resume and re-queuing itself once more.
> I believe it might be worse than that and even hang. This is the right
> case for the if_active indeed. But we need to ensure that we will
> always have an outer bound for that.
>
>> However, in
>> the next patch we do also have a flush of the work in the rpm_suspend path,
>> so maybe the worker running when we are rpm suspended is not actually a
>> possible case?
> that's the kaboom case!
>
>> Also, thinking about this more, that re-queuing on rpm resume only happens
>> if d3cold is allowed, so when d3cold is not allowed we do want to proceed
>> here we can actually reach here when rpm suspended.
> no, when d3cold is not allowed we don't want to re-flash the fw.
> We just skip and move forward.

My concern was about the first time we attempt the load in the d3cold 
disabled scenario. If we've somehow managed to rpm suspend between 
queuing the work for the first time and the work actually running, 
skipping the flashing would mean the binary is not actually ever loaded. 
Not sure if that's a case we can hit though.
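
To make the window concrete, here is a minimal userspace sketch, not
kernel code. Every name in it (fake_dev, fake_get_if_active, fake_put,
fake_worker) is a hypothetical stand-in, and it assumes only the
behaviour described in this thread: the worker takes a reference only
if the device is already active and returns otherwise.

```c
/*
 * Userspace model of the "suspended between queue and run" window.
 * All names are hypothetical; this is not the xe driver code.
 */
#include <stdbool.h>

struct fake_dev {
	bool rpm_active;	/* models the runtime PM state */
	int usage;		/* models the runtime PM usage count */
	bool fw_loaded;		/* set once the blob has been pushed */
};

/* Models a get_if_active-style helper: take a reference only when active. */
static bool fake_get_if_active(struct fake_dev *dev)
{
	if (!dev->rpm_active)
		return false;
	dev->usage++;
	return true;
}

/* Drops the reference taken by fake_get_if_active(). */
static void fake_put(struct fake_dev *dev)
{
	dev->usage--;
}

/* Models the worker: bail out when suspended, otherwise "load" the firmware. */
static void fake_worker(struct fake_dev *dev)
{
	if (!fake_get_if_active(dev))
		return;		/* load skipped; nothing in this model re-queues it */

	dev->fw_loaded = true;
	fake_put(dev);
}
```

In this model, calling fake_worker() on a device with rpm_active set to
false leaves fw_loaded false. With d3cold disabled nothing would
re-queue the work from the resume path, which is the case I'm worried
about.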

Daniele

>
> My bad, sorry for the noise and please keep the if_active variant in here.
>
>>>> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/xe/xe_late_bind_fw.c       | 149 ++++++++++++++++++++-
>>>>    drivers/gpu/drm/xe/xe_late_bind_fw.h       |   1 +
>>>>    drivers/gpu/drm/xe/xe_late_bind_fw_types.h |   7 +
>>>>    3 files changed, 156 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
>>>> index 32d1436e7191..52243063d98a 100644
>>>> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
>>>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
>>>> @@ -16,6 +16,20 @@
>>>>    #include "xe_late_bind_fw.h"
>>>>    #include "xe_pcode.h"
>>>>    #include "xe_pcode_api.h"
>>>> +#include "xe_pm.h"
>>>> +
>>>> +/*
>>>> + * The component should load quite quickly in most cases, but it could take
>>>> + * a bit. Using a very big timeout just to cover the worst case scenario
>>>> + */
>>>> +#define LB_INIT_TIMEOUT_MS 20000
>>>> +
>>>> +/*
>>>> + * Retry interval set to 6 seconds, in steps of 200 ms, to allow time for
>>>> + * other OS components to release the MEI CL handle
>>>> + */
>>>> +#define LB_FW_LOAD_RETRY_MAXCOUNT 30
>>>> +#define LB_FW_LOAD_RETRY_PAUSE_MS 200
>>>>    static const u32 fw_id_to_type[] = {
>>>>    		[XE_LB_FW_FAN_CONTROL] = CSC_LATE_BINDING_TYPE_FAN_CONTROL,
>>>> @@ -31,6 +45,30 @@ late_bind_to_xe(struct xe_late_bind *late_bind)
>>>>    	return container_of(late_bind, struct xe_device, late_bind);
>>>>    }
>>>> +static const char *xe_late_bind_parse_status(uint32_t status)
>>>> +{
>>>> +	switch (status) {
>>>> +	case CSC_LATE_BINDING_STATUS_SUCCESS:
>>>> +		return "success";
>>>> +	case CSC_LATE_BINDING_STATUS_4ID_MISMATCH:
>>>> +		return "4Id Mismatch";
>>>> +	case CSC_LATE_BINDING_STATUS_ARB_FAILURE:
>>>> +		return "ARB Failure";
>>>> +	case CSC_LATE_BINDING_STATUS_GENERAL_ERROR:
>>>> +		return "General Error";
>>>> +	case CSC_LATE_BINDING_STATUS_INVALID_PARAMS:
>>>> +		return "Invalid Params";
>>>> +	case CSC_LATE_BINDING_STATUS_INVALID_SIGNATURE:
>>>> +		return "Invalid Signature";
>>>> +	case CSC_LATE_BINDING_STATUS_INVALID_PAYLOAD:
>>>> +		return "Invalid Payload";
>>>> +	case CSC_LATE_BINDING_STATUS_TIMEOUT:
>>>> +		return "Timeout";
>>>> +	default:
>>>> +		return "Unknown error";
>>>> +	}
>>>> +}
>>>> +
>>>>    static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
>>>>    {
>>>>    	struct xe_device *xe = late_bind_to_xe(late_bind);
>>>> @@ -44,6 +82,93 @@ static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
>>>>    		return 0;
>>>>    }
>>>> +static void xe_late_bind_wait_for_worker_completion(struct xe_late_bind *late_bind)
>>>> +{
>>>> +	struct xe_device *xe = late_bind_to_xe(late_bind);
>>>> +	struct xe_late_bind_fw *lbfw;
>>>> +	int fw_id;
>>>> +
>>>> +	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
>>>> +		lbfw = &late_bind->late_bind_fw[fw_id];
>>>> +		if (lbfw->valid && late_bind->wq) {
>>>> +			drm_dbg(&xe->drm, "Flush work: load %s firmware\n",
>>>> +				fw_id_to_name[lbfw->id]);
>>>> +			flush_work(&lbfw->work);
>>>> +		}
>>>> +	}
>>>> +}
>>>> +
>>>> +static void xe_late_bind_work(struct work_struct *work)
>>>> +{
>>>> +	struct xe_late_bind_fw *lbfw = container_of(work, struct xe_late_bind_fw, work);
>>>> +	struct xe_late_bind *late_bind = container_of(lbfw, struct xe_late_bind,
>>>> +						      late_bind_fw[lbfw->id]);
>>>> +	struct xe_device *xe = late_bind_to_xe(late_bind);
>>>> +	int retry = LB_FW_LOAD_RETRY_MAXCOUNT;
>>>> +	int ret;
>>>> +	int slept;
>>>> +
>>>> +	/* we can queue this before the component is bound */
>>>> +	for (slept = 0; slept < LB_INIT_TIMEOUT_MS; slept += 100) {
>>>> +		if (late_bind->component.ops)
>>>> +			break;
>>>> +		msleep(100);
>>>> +	}
>>>> +
>>>> +	if (!xe_pm_runtime_get_if_active(xe))
>>>> +		return;
>>>> +
>>>> +	mutex_lock(&late_bind->mutex);
>>>> +
>>>> +	if (!late_bind->component.ops) {
>>>> +		drm_err(&xe->drm, "Late bind component not bound\n");
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	drm_dbg(&xe->drm, "Load %s firmware\n", fw_id_to_name[lbfw->id]);
>>>> +
>>>> +	do {
>>>> +		ret = late_bind->component.ops->push_config(late_bind->component.mei_dev,
>>>> +							    lbfw->type, lbfw->flags,
>>>> +							    lbfw->payload, lbfw->payload_size);
>>>> +		if (!ret)
>>>> +			break;
>>>> +		msleep(LB_FW_LOAD_RETRY_PAUSE_MS);
>>>> +	} while (--retry && ret == -EBUSY);
>>>> +
>>>> +	if (!ret) {
>>>> +		drm_dbg(&xe->drm, "Load %s firmware successful\n",
>>>> +			fw_id_to_name[lbfw->id]);
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	if (ret > 0)
>> nit: here you can just do "else if" and drop the goto.
>>
>> Daniele
>>
>>>> +		drm_err(&xe->drm, "Load %s firmware failed with err %d, %s\n",
>>>> +			fw_id_to_name[lbfw->id], ret, xe_late_bind_parse_status(ret));
>>>> +	else
>>>> +		drm_err(&xe->drm, "Load %s firmware failed with err %d",
>>>> +			fw_id_to_name[lbfw->id], ret);
>>>> +out:
>>>> +	mutex_unlock(&late_bind->mutex);
>>>> +	xe_pm_runtime_put(xe);
>>>> +}
>>>> +
>>>> +int xe_late_bind_fw_load(struct xe_late_bind *late_bind)
>>>> +{
>>>> +	struct xe_late_bind_fw *lbfw;
>>>> +	int fw_id;
>>>> +
>>>> +	if (!late_bind->component_added)
>>>> +		return -ENODEV;
>>>> +
>>>> +	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
>>>> +		lbfw = &late_bind->late_bind_fw[fw_id];
>>>> +		if (lbfw->valid)
>>>> +			queue_work(late_bind->wq, &lbfw->work);
>>>> +	}
>>>> +	return 0;
>>>> +}
>>>> +
>>>>    static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
>>>>    {
>>>>    	struct xe_device *xe = late_bind_to_xe(late_bind);
>>>> @@ -99,6 +224,7 @@ static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, u32 fw_id)
>>>>    	memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);
>>>>    	release_firmware(fw);
>>>> +	INIT_WORK(&lb_fw->work, xe_late_bind_work);
>>>>    	lb_fw->valid = true;
>>>>    	return 0;
>>>> @@ -109,11 +235,16 @@ static int xe_late_bind_fw_init(struct xe_late_bind *late_bind)
>>>>    	int ret;
>>>>    	int fw_id;
>>>> +	late_bind->wq = alloc_ordered_workqueue("late-bind-ordered-wq", 0);
>>>> +	if (!late_bind->wq)
>>>> +		return -ENOMEM;
>>>> +
>>>>    	for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
>>>>    		ret = __xe_late_bind_fw_init(late_bind, fw_id);
>>>>    		if (ret)
>>>>    			return ret;
>>>>    	}
>>>> +
>>>>    	return 0;
>>>>    }
>>>> @@ -137,6 +268,8 @@ static void xe_late_bind_component_unbind(struct device *xe_kdev,
>>>>    	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
>>>>    	struct xe_late_bind *late_bind = &xe->late_bind;
>>>> +	xe_late_bind_wait_for_worker_completion(late_bind);
>>>> +
>>>>    	mutex_lock(&late_bind->mutex);
>>>>    	late_bind->component.ops = NULL;
>>>>    	mutex_unlock(&late_bind->mutex);
>>>> @@ -152,7 +285,15 @@ static void xe_late_bind_remove(void *arg)
>>>>    	struct xe_late_bind *late_bind = arg;
>>>>    	struct xe_device *xe = late_bind_to_xe(late_bind);
>>>> +	xe_late_bind_wait_for_worker_completion(late_bind);
>>>> +
>>>> +	late_bind->component_added = false;
>>>> +
>>>>    	component_del(xe->drm.dev, &xe_late_bind_component_ops);
>>>> +	if (late_bind->wq) {
>>>> +		destroy_workqueue(late_bind->wq);
>>>> +		late_bind->wq = NULL;
>>>> +	}
>>>>    	mutex_destroy(&late_bind->mutex);
>>>>    }
>>>> @@ -183,9 +324,15 @@ int xe_late_bind_init(struct xe_late_bind *late_bind)
>>>>    		return err;
>>>>    	}
>>>> +	late_bind->component_added = true;
>>>> +
>>>>    	err = devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
>>>>    	if (err)
>>>>    		return err;
>>>> -	return xe_late_bind_fw_init(late_bind);
>>>> +	err = xe_late_bind_fw_init(late_bind);
>>>> +	if (err)
>>>> +		return err;
>>>> +
>>>> +	return xe_late_bind_fw_load(late_bind);
>>>>    }
>>>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.h b/drivers/gpu/drm/xe/xe_late_bind_fw.h
>>>> index 4c73571c3e62..28d56ed2bfdc 100644
>>>> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.h
>>>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.h
>>>> @@ -11,5 +11,6 @@
>>>>    struct xe_late_bind;
>>>>    int xe_late_bind_init(struct xe_late_bind *late_bind);
>>>> +int xe_late_bind_fw_load(struct xe_late_bind *late_bind);
>>>>    #endif
>>>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>>>> index 93abf4c51789..f119a75f4c9c 100644
>>>> --- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>>>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>>>> @@ -9,6 +9,7 @@
>>>>    #include <linux/iosys-map.h>
>>>>    #include <linux/mutex.h>
>>>>    #include <linux/types.h>
>>>> +#include <linux/workqueue.h>
>>>>    #define MAX_PAYLOAD_SIZE SZ_4K
>>>> @@ -38,6 +39,8 @@ struct xe_late_bind_fw {
>>>>    	u8  *payload;
>>>>    	/** @late_bind_fw.payload_size: late binding blob payload_size */
>>>>    	size_t payload_size;
>>>> +	/** @late_bind_fw.work: worker to upload latebind blob */
>>>> +	struct work_struct work;
>>>>    };
>>>>    /**
>>>> @@ -64,6 +67,10 @@ struct xe_late_bind {
>>>>    	struct mutex mutex;
>>>>    	/** @late_bind.late_bind_fw: late binding firmware array */
>>>>    	struct xe_late_bind_fw late_bind_fw[XE_LB_FW_MAX_ID];
>>>> +	/** @late_bind.wq: workqueue to submit request to download late bind blob */
>>>> +	struct workqueue_struct *wq;
>>>> +	/** @late_bind.component_added: whether the component has been added */
>>>> +	bool component_added;
>>>>    };
>>>>    #endif
>>>> -- 
>>>> 2.34.1
>>>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 05/10] drm/xe/xe_late_bind_fw: Load late binding firmware
  2025-06-26 22:38         ` Daniele Ceraolo Spurio
@ 2025-06-26 22:49           ` Rodrigo Vivi
  0 siblings, 0 replies; 38+ messages in thread
From: Rodrigo Vivi @ 2025-06-26 22:49 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio
  Cc: Badal Nilawar, intel-xe, dri-devel, linux-kernel, anshuman.gupta,
	alexander.usyskin, gregkh

On Thu, Jun 26, 2025 at 03:38:18PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 6/26/2025 2:49 PM, Rodrigo Vivi wrote:
> > On Thu, Jun 26, 2025 at 02:27:50PM -0700, Daniele Ceraolo Spurio wrote:
> > > 
> > > On 6/26/2025 10:24 AM, Rodrigo Vivi wrote:
> > > > On Wed, Jun 25, 2025 at 10:30:10PM +0530, Badal Nilawar wrote:
> > > > > Load late binding firmware
> > > > > 
> > > > > v2:
> > > > >    - s/EAGAIN/EBUSY/
> > > > >    - Flush worker in suspend and driver unload (Daniele)
> > > > > v3:
> > > > >    - Use retry interval of 6s, in steps of 200ms, to allow
> > > > >      other OS components release MEI CL handle (Sasha)
> > > > > v4:
> > > > >    - return -ENODEV if component not added (Daniele)
> > > > >    - parse and print status returned by csc
> > > > >    - Use xe_pm_get_if_in_active (Daniele)
> > > > The worker is considered outer bound and it is safe
> > > > to use xe_pm_runtime_get which takes the reference
> > > > and resume synchronously.
> > > > 
> > > > Otherwise, if using get_if_active you need to reschedule
> > > > the work or you lose your job.
> > > The issue is that the next patch adds code to re-queue the work from the rpm
> > > resume path, so if we do a sync resume here the worker will re-queue itself
> > > immediately when not needed.
> > ops, I had forgotten about that case, I'm sorry.
> > 
> > > Also, when the re-queued work runs it might end
> > > up doing another sync resume and re-queuing itself once more.
> > I believe it might be worse than that and even hang. This is the right
> > case for the if_active indeed. But we need to ensure that we will
> > always have an outer bound for that.
> > 
> > > However, in
> > > the next patch we do also have a flush of the work in the rpm_suspend path,
> > > so maybe the worker running when we are rpm suspended is not actually a
> > > possible case?
> > that's the kaboom case!
> > 
> > > Also, thinking about this more, that re-queuing on rpm resume only happens
> > > if d3cold is allowed, so when d3cold is not allowed we do want to proceed
> > > here we can actually reach here when rpm suspended.
> > no, when d3cold is not allowed we don't want to re-flash the fw.
> > We just skip and move forward.
> 
> My concern was about the first time we attempt the load in the d3cold
> disabled scenario. If we've somehow managed to rpm suspend between queuing
> the work for the first time and the work actually running, skipping the
> flashing would mean the binary is not actually ever loaded. Not sure if
> that's a case we can hit though.

Well, the first load is triggered during probe, while the device is
active, but we need to trigger it without delay. We still need to think
about how to ensure the work starts running before probe finishes and
rpm is allowed.

Or we create 2 different workers, one for probe and one for suspend. :/

> 
> Daniele
> 
> > 
> > My bad, sorry for the noise and please keep the if_active variant in here.
> > 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 07/10] drm/xe/xe_late_bind_fw: Reload late binding fw during system resume
  2025-06-25 17:00 ` [PATCH v4 07/10] drm/xe/xe_late_bind_fw: Reload late binding fw during system resume Badal Nilawar
@ 2025-06-27  7:53   ` Nilawar, Badal
  0 siblings, 0 replies; 38+ messages in thread
From: Nilawar, Badal @ 2025-06-27  7:53 UTC (permalink / raw)
  To: intel-xe, dri-devel, linux-kernel, Daniele Ceraolo Spurio
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh

Hi Daniele,

On 25-06-2025 22:30, Badal Nilawar wrote:
> Reload late binding fw during resume from system suspend
>
> v2:
>    - Unconditionally reload late binding fw (Rodrigo)
>    - Flush worker during system suspend
>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_pm.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index 91923fd4af80..f49b7b6eab97 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -127,6 +127,8 @@ int xe_pm_suspend(struct xe_device *xe)
>   	if (err)
>   		goto err;
>   
> +	xe_late_bind_wait_for_worker_completion(&xe->late_bind);
> +

During system suspend, MEI will unbind the component, so this flush is
unnecessary: the worker is already flushed within
xe_late_bind_component_unbind(). I will remove this call.
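
A minimal userspace sketch of that reasoning, not kernel code. All
names here (fake_late_bind, fake_flush_work, fake_component_unbind,
fake_system_suspend) are hypothetical, and the model simply assumes
that system suspend leads to the component being unbound, as stated
above.

```c
/*
 * Userspace model: the unbind path already flushes the worker, so a
 * separate flush in the suspend path adds nothing.
 * All names are hypothetical; this is not the xe driver code.
 */
#include <stdbool.h>

struct fake_late_bind {
	bool work_pending;	/* a load was queued and has not finished */
	bool ops_bound;		/* models component.ops being non-NULL */
	int flush_calls;	/* how many times the work was flushed */
};

/* Models waiting for the queued load to finish. */
static void fake_flush_work(struct fake_late_bind *lb)
{
	lb->flush_calls++;
	lb->work_pending = false;
}

/* Models the unbind callback: flush first, then drop the ops. */
static void fake_component_unbind(struct fake_late_bind *lb)
{
	fake_flush_work(lb);
	lb->ops_bound = false;
}

/* Models system suspend with no explicit flush; only the unbind runs. */
static void fake_system_suspend(struct fake_late_bind *lb)
{
	fake_component_unbind(lb);
}
```

Starting with work_pending set, a single fake_system_suspend() call
leaves no pending work and the ops unbound, with exactly one flush.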

Badal

>   	for_each_gt(gt, xe, id)
>   		xe_gt_suspend_prepare(gt);
>   
> @@ -205,6 +207,8 @@ int xe_pm_resume(struct xe_device *xe)
>   
>   	xe_pxp_pm_resume(xe->pxp);
>   
> +	xe_late_bind_fw_load(&xe->late_bind);
> +
>   	drm_dbg(&xe->drm, "Device resumed\n");
>   	return 0;
>   err:

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 04/10] drm/xe/xe_late_bind_fw: Initialize late binding firmware
  2025-06-26 21:06   ` Daniele Ceraolo Spurio
@ 2025-06-27 12:48     ` Nilawar, Badal
  0 siblings, 0 replies; 38+ messages in thread
From: Nilawar, Badal @ 2025-06-27 12:48 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-xe, dri-devel, linux-kernel
  Cc: anshuman.gupta, rodrigo.vivi, alexander.usyskin, gregkh


On 27-06-2025 02:36, Daniele Ceraolo Spurio wrote:
>
>
> On 6/25/2025 10:00 AM, Badal Nilawar wrote:
>> Search for late binding firmware binaries and populate the meta data of
>> firmware structures.
>>
>> v2 (Daniele):
>>   - drm_err if firmware size is more than max pay load size
>>   - s/request_firmware/firmware_request_nowarn/ as firmware will
>>     not be available for all possible cards
>> v3 (Daniele):
>>   - init firmware from within xe_late_bind_init, propagate error
>>   - switch late_bind_fw to array to handle multiple firmware types
>> v4 (Daniele):
>>   - Alloc payload dynamically, fix nits
>>
>> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_late_bind_fw.c       | 103 ++++++++++++++++++++-
>>   drivers/gpu/drm/xe/xe_late_bind_fw_types.h |  32 +++++++
>>   2 files changed, 134 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c 
>> b/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> index eaf12cfec848..32d1436e7191 100644
>> --- a/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> @@ -5,6 +5,7 @@
>>     #include <linux/component.h>
>>   #include <linux/delay.h>
>> +#include <linux/firmware.h>
>>     #include <drm/drm_managed.h>
>>   #include <drm/intel/i915_component.h>
>> @@ -13,6 +14,16 @@
>>     #include "xe_device.h"
>>   #include "xe_late_bind_fw.h"
>> +#include "xe_pcode.h"
>> +#include "xe_pcode_api.h"
>> +
>> +static const u32 fw_id_to_type[] = {
>> +        [XE_LB_FW_FAN_CONTROL] = CSC_LATE_BINDING_TYPE_FAN_CONTROL,
>> +    };
>> +
>> +static const char * const fw_id_to_name[] = {
>> +        [XE_LB_FW_FAN_CONTROL] = "fan_control",
>> +    };
>>     static struct xe_device *
>>   late_bind_to_xe(struct xe_late_bind *late_bind)
>> @@ -20,6 +31,92 @@ late_bind_to_xe(struct xe_late_bind *late_bind)
>>       return container_of(late_bind, struct xe_device, late_bind);
>>   }
>>   +static int xe_late_bind_fw_num_fans(struct xe_late_bind *late_bind)
>> +{
>> +    struct xe_device *xe = late_bind_to_xe(late_bind);
>> +    struct xe_tile *root_tile = xe_device_get_root_tile(xe);
>> +    u32 uval;
>> +
>> +    if (!xe_pcode_read(root_tile,
>> +               PCODE_MBOX(FAN_SPEED_CONTROL, FSC_READ_NUM_FANS, 0), 
>> &uval, NULL))
>> +        return uval;
>> +    else
>> +        return 0;
>> +}
>> +
>> +static int __xe_late_bind_fw_init(struct xe_late_bind *late_bind, 
>> u32 fw_id)
>> +{
>> +    struct xe_device *xe = late_bind_to_xe(late_bind);
>> +    struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>> +    struct xe_late_bind_fw *lb_fw;
>> +    const struct firmware *fw;
>> +    u32 num_fans;
>> +    int ret;
>> +
>> +    if (fw_id >= XE_LB_FW_MAX_ID)
>> +        return -EINVAL;
>> +
>> +    lb_fw = &late_bind->late_bind_fw[fw_id];
>> +
>> +    lb_fw->valid = false;
>> +    lb_fw->id = fw_id;
>> +    lb_fw->type = fw_id_to_type[lb_fw->id];
>> +    lb_fw->flags &= ~CSC_LATE_BINDING_FLAGS_IS_PERSISTENT;
>> +
>> +    if (lb_fw->type == CSC_LATE_BINDING_TYPE_FAN_CONTROL) {
>> +        num_fans = xe_late_bind_fw_num_fans(late_bind);
>> +        drm_dbg(&xe->drm, "Number of Fans: %d\n", num_fans);
>> +        if (!num_fans)
>> +            return 0;
>> +    }
>> +
>> +    snprintf(lb_fw->blob_path, sizeof(lb_fw->blob_path), 
>> "xe/%s_8086_%04x_%04x_%04x.bin",
>> +         fw_id_to_name[lb_fw->id], pdev->device,
>> +         pdev->subsystem_vendor, pdev->subsystem_device);
>> +
>> +    drm_dbg(&xe->drm, "Request late binding firmware %s\n", 
>> lb_fw->blob_path);
>> +    ret = firmware_request_nowarn(&fw, lb_fw->blob_path, xe->drm.dev);
>> +    if (ret) {
>> +        drm_dbg(&xe->drm, "%s late binding fw not available for 
>> current device",
>> +            fw_id_to_name[lb_fw->id]);
>> +        return 0;
>> +    }
>> +
>> +    if (fw->size > MAX_PAYLOAD_SIZE) {
>> +        drm_err(&xe->drm, "Firmware %s size %zu is larger than max 
>> pay load size %u\n",
>> +            lb_fw->blob_path, fw->size, MAX_PAYLOAD_SIZE);
>> +        release_firmware(fw);
>> +        return -ENODATA;
>> +    }
>> +
>> +    lb_fw->payload = drmm_kzalloc(&xe->drm, lb_fw->payload_size, 
>> GFP_KERNEL);
>
> here you're using lb_fw->payload_size before assigning it.

My bad, I will fix it. But I'm curious why drmm_kzalloc, unlike 
kzalloc, doesn't perform a size=0 check. When size=0, kzalloc returns 
ZERO_SIZE_PTR, and drmm_kzalloc has no equivalent handling. Even if 
drmm_kzalloc had returned ZERO_SIZE_PTR, the check below wouldn't have 
caught it, since ZERO_SIZE_PTR is not NULL.
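
To make the ordering issue concrete, here is a minimal userspace sketch 
of the copy step, assuming the fix is simply to size the allocation from 
the source blob. struct blob, struct lb_fw and lb_fw_copy() are 
hypothetical stand-ins for struct firmware, struct xe_late_bind_fw and 
the tail of __xe_late_bind_fw_init(); calloc() stands in for 
drmm_kzalloc(). It is not the code that will go into the next revision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAYLOAD_SIZE 4096u	/* mirrors SZ_4K from the patch */

/* Hypothetical stand-in for struct firmware. */
struct blob {
	size_t size;
	const uint8_t *data;
};

/* Hypothetical stand-in for struct xe_late_bind_fw. */
struct lb_fw {
	uint8_t *payload;
	size_t payload_size;
};

/* Returns 0 on success, -1 on a bad size, -2 on allocation failure. */
static int lb_fw_copy(struct lb_fw *lb_fw, const struct blob *fw)
{
	/* Reject empty and oversized blobs before allocating anything. */
	if (fw->size == 0 || fw->size > MAX_PAYLOAD_SIZE)
		return -1;

	/* Size the buffer from the blob, not from the still-unset field. */
	lb_fw->payload = calloc(1, fw->size);
	if (!lb_fw->payload)
		return -2;

	lb_fw->payload_size = fw->size;
	memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);

	return 0;
}
```

With the explicit size check up front, a zero-length blob never reaches 
the allocator, so the NULL check after the allocation only has to cover 
a real out-of-memory case.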

>
>> +    if (!lb_fw->payload) {
>> +        release_firmware(fw);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    lb_fw->payload_size = fw->size;
>> +
>> +    memcpy(lb_fw->payload, fw->data, lb_fw->payload_size);
>> +    release_firmware(fw);
>> +    lb_fw->valid = true;
>
> You can now use lb_fw->payload to check if the FW is valid, no need 
> for a separate variable. not a blocker.
Sure.
>
>> +
>> +    return 0;
>> +}
>> +
>> +static int xe_late_bind_fw_init(struct xe_late_bind *late_bind)
>> +{
>> +    int ret;
>> +    int fw_id;
>> +
>> +    for (fw_id = 0; fw_id < XE_LB_FW_MAX_ID; fw_id++) {
>> +        ret = __xe_late_bind_fw_init(late_bind, fw_id);
>> +        if (ret)
>> +            return ret;
>> +    }
>> +    return 0;
>> +}
>> +
>>   static int xe_late_bind_component_bind(struct device *xe_kdev,
>>                          struct device *mei_kdev, void *data)
>>   {
>> @@ -86,5 +183,9 @@ int xe_late_bind_init(struct xe_late_bind *late_bind)
>>           return err;
>>       }
>>   -    return devm_add_action_or_reset(xe->drm.dev, 
>> xe_late_bind_remove, late_bind);
>> +    err = devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, 
>> late_bind);
>> +    if (err)
>> +        return err;
>> +
>> +    return xe_late_bind_fw_init(late_bind);
>>   }
>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h 
>> b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>> index 1156ef94f0d5..93abf4c51789 100644
>> --- a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>> @@ -10,6 +10,36 @@
>>   #include <linux/mutex.h>
>>   #include <linux/types.h>
>>   +#define MAX_PAYLOAD_SIZE SZ_4K
>> +
>> +/**
>> + * xe_late_bind_fw_id - enum to determine late binding fw index
>> + */
>> +enum xe_late_bind_fw_id {
>> +    XE_LB_FW_FAN_CONTROL = 0,
>> +    XE_LB_FW_MAX_ID
>> +};
>> +
>> +/**
>> + * struct xe_late_bind_fw
>> + */
>> +struct xe_late_bind_fw {
>> +    /** @late_bind_fw.valid: to check if fw is valid */
>> +    bool valid;
>> +    /** @late_bind_fw.id: firmware index */
>> +    u32 id;
>> +    /** @late_bind_fw.blob_path: firmware binary path */
>> +    char blob_path[PATH_MAX];
>> +    /** @late_bind_fw.type: firmware type */
>> +    u32  type;
>> +    /** @late_bind_fw.flags: firmware flags */
>> +    u32  flags;
>> +    /** @late_bind_fw.payload: to store the late binding blob */
>> +    u8  *payload;
>
> Why a u8 pointer and not a void one?

It should have been const u8 *, as the firmware structure uses const u8 *.

struct firmware {
         size_t size;
         const u8 *data;

         /* firmware loader private fields */
         void *priv;
};

Badal

> Daniele
>
>> +    /** @late_bind_fw.payload_size: late binding blob payload_size */
>> +    size_t payload_size;
>> +};
>> +
>>   /**
>>    * struct xe_late_bind_component - Late Binding services component
>>    * @mei_dev: device that provide Late Binding service.
>> @@ -32,6 +62,8 @@ struct xe_late_bind {
>>       struct xe_late_bind_component component;
>>       /** @late_bind.mutex: protects the component binding and usage */
>>       struct mutex mutex;
>> +    /** @late_bind.late_bind_fw: late binding firmware array */
>> +    struct xe_late_bind_fw late_bind_fw[XE_LB_FW_MAX_ID];
>>   };
>>     #endif
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-06-26  3:50   ` Gupta, Anshuman
@ 2025-06-27 14:06     ` Nilawar, Badal
  0 siblings, 0 replies; 38+ messages in thread
From: Nilawar, Badal @ 2025-06-27 14:06 UTC (permalink / raw)
  To: Gupta, Anshuman, intel-xe@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	Usyskin, Alexander
  Cc: Vivi, Rodrigo, gregkh@linuxfoundation.org,
	Ceraolo Spurio, Daniele


On 26-06-2025 09:20, Gupta, Anshuman wrote:
>
>> -----Original Message-----
>> From: Nilawar, Badal <badal.nilawar@intel.com>
>> Sent: Wednesday, June 25, 2025 10:30 PM
>> To: intel-xe@lists.freedesktop.org; dri-devel@lists.freedesktop.org; linux-
>> kernel@vger.kernel.org
>> Cc: Gupta, Anshuman <anshuman.gupta@intel.com>; Vivi, Rodrigo
>> <rodrigo.vivi@intel.com>; Usyskin, Alexander <alexander.usyskin@intel.com>;
>> gregkh@linuxfoundation.org; Ceraolo Spurio, Daniele
>> <daniele.ceraolospurio@intel.com>
>> Subject: [PATCH v4 02/10] mei: late_bind: add late binding component driver
>>
>> From: Alexander Usyskin <alexander.usyskin@intel.com>
>>
>> Add late binding component driver.
>> It allows pushing the late binding configuration from, for example, the Xe
>> graphics driver to the Intel discrete graphics card's CSE device.
>>
>> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
>> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
>> ---
>> v2:
>>   - Use generic naming (Jani)
>>   - Drop xe_late_bind_component struct to move to xe code (Daniele/Sasha)
>> v3:
>>   - Updated kconfig description
>>   - Move CSC late binding specific flags/defines to late_bind_mei_interface.h
>> (Daniele)
>> v4:
>>   - Add match for PCI_CLASS_DISPLAY_OTHER to support headless cards
>> (Anshuman)
>> v5:
>>   - Add fixes in push_config (Sasha)
>>   - Use INTEL_ prefix for component, refine doc,
>>     add status enum to headerlate_bind_mei_interface.h (Anshuman)
>> ---
>>   drivers/misc/mei/Kconfig                    |   1 +
>>   drivers/misc/mei/Makefile                   |   1 +
>>   drivers/misc/mei/late_bind/Kconfig          |  13 +
>>   drivers/misc/mei/late_bind/Makefile         |   9 +
>>   drivers/misc/mei/late_bind/mei_late_bind.c  | 281 ++++++++++++++++++++
>>   include/drm/intel/i915_component.h          |   1 +
>>   include/drm/intel/late_bind_mei_interface.h |  64 +++++
>>   7 files changed, 370 insertions(+)
>>   create mode 100644 drivers/misc/mei/late_bind/Kconfig
>>   create mode 100644 drivers/misc/mei/late_bind/Makefile
>>   create mode 100644 drivers/misc/mei/late_bind/mei_late_bind.c
>>   create mode 100644 include/drm/intel/late_bind_mei_interface.h
>>
>> diff --git a/drivers/misc/mei/Kconfig b/drivers/misc/mei/Kconfig index
>> 7575fee96cc6..771becc68095 100644
>> --- a/drivers/misc/mei/Kconfig
>> +++ b/drivers/misc/mei/Kconfig
>> @@ -84,5 +84,6 @@ config INTEL_MEI_VSC
>>   source "drivers/misc/mei/hdcp/Kconfig"
>>   source "drivers/misc/mei/pxp/Kconfig"
>>   source "drivers/misc/mei/gsc_proxy/Kconfig"
>> +source "drivers/misc/mei/late_bind/Kconfig"
>>
>>   endif
>> diff --git a/drivers/misc/mei/Makefile b/drivers/misc/mei/Makefile index
>> 6f9fdbf1a495..84bfde888d81 100644
>> --- a/drivers/misc/mei/Makefile
>> +++ b/drivers/misc/mei/Makefile
>> @@ -31,6 +31,7 @@ CFLAGS_mei-trace.o = -I$(src)
>>   obj-$(CONFIG_INTEL_MEI_HDCP) += hdcp/
>>   obj-$(CONFIG_INTEL_MEI_PXP) += pxp/
>>   obj-$(CONFIG_INTEL_MEI_GSC_PROXY) += gsc_proxy/
>> +obj-$(CONFIG_INTEL_MEI_LATE_BIND) += late_bind/
>>
>>   obj-$(CONFIG_INTEL_MEI_VSC_HW) += mei-vsc-hw.o  mei-vsc-hw-y := vsc-
>> tp.o diff --git a/drivers/misc/mei/late_bind/Kconfig
>> b/drivers/misc/mei/late_bind/Kconfig
>> new file mode 100644
>> index 000000000000..65c7180c5678
>> --- /dev/null
>> +++ b/drivers/misc/mei/late_bind/Kconfig
>> @@ -0,0 +1,13 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (c) 2025, Intel Corporation. All rights reserved.
>> +#
>> +config INTEL_MEI_LATE_BIND
>> +	tristate "Intel late binding support on ME Interface"
>> +	select INTEL_MEI_ME
>> +	depends on DRM_XE
>> +	help
>> +	  MEI Support for Late Binding for Intel graphics card.
>> +
>> +	  Enables the ME FW interfaces for Late Binding feature,
>> +	  allowing loading of firmware for the devices like Fan
>> +	  Controller during by Intel Xe driver.
>> diff --git a/drivers/misc/mei/late_bind/Makefile
>> b/drivers/misc/mei/late_bind/Makefile
>> new file mode 100644
>> index 000000000000..a0aeda5853f0
>> --- /dev/null
>> +++ b/drivers/misc/mei/late_bind/Makefile
>> @@ -0,0 +1,9 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +#
>> +# Copyright (c) 2025, Intel Corporation. All rights reserved.
>> +#
>> +# Makefile - Late Binding client driver for Intel MEI Bus Driver.
>> +
>> +subdir-ccflags-y += -I$(srctree)/drivers/misc/mei/
>> +
>> +obj-$(CONFIG_INTEL_MEI_LATE_BIND) += mei_late_bind.o
>> diff --git a/drivers/misc/mei/late_bind/mei_late_bind.c
>> b/drivers/misc/mei/late_bind/mei_late_bind.c
>> new file mode 100644
>> index 000000000000..ffb89ccdfbb1
>> --- /dev/null
>> +++ b/drivers/misc/mei/late_bind/mei_late_bind.c
>> @@ -0,0 +1,281 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (C) 2025 Intel Corporation  */ #include
>> +<drm/intel/i915_component.h> #include
>> +<drm/intel/late_bind_mei_interface.h>
>> +#include <linux/component.h>
>> +#include <linux/pci.h>
>> +#include <linux/mei_cl_bus.h>
>> +#include <linux/module.h>
>> +#include <linux/overflow.h>
>> +#include <linux/slab.h>
>> +#include <linux/uuid.h>
>> +
>> +#include "mkhi.h"
>> +
>> +#define GFX_SRV_MKHI_LATE_BINDING_CMD 0x12 #define
>> +GFX_SRV_MKHI_LATE_BINDING_RSP (GFX_SRV_MKHI_LATE_BINDING_CMD
>> | 0x80)
>> +
>> +#define LATE_BIND_SEND_TIMEOUT_MSEC 3000 #define
>> +LATE_BIND_RECV_TIMEOUT_MSEC 3000
>> +
>> +/**
>> + * struct csc_heci_late_bind_req - late binding request
>> + * @header: @ref mkhi_msg_hdr
>> + * @type: type of the late binding payload
>> + * @flags: flags to be passed to the firmware
>> + * @reserved: reserved field
>> + * @payload_size: size of the payload data in bytes
>> + * @payload: data to be sent to the firmware  */ struct
>> +csc_heci_late_bind_req {
>> +	struct mkhi_msg_hdr header;
>> +	u32 type;
>> +	u32 flags;
>> +	u32 reserved[2];
>> +	u32 payload_size;
>> +	u8  payload[] __counted_by(payload_size); } __packed;
>> +
>> +/**
>> + * struct csc_heci_late_bind_rsp - late binding response
>> + * @header: @ref mkhi_msg_hdr
>> + * @type: type of the late binding payload
>> + * @reserved: reserved field
>> + * @status: status of the late binding command execution by firmware
>> +*/ struct csc_heci_late_bind_rsp {
>> +	struct mkhi_msg_hdr header;
>> +	u32 type;
>> +	u32 reserved[2];
>> +	u32 status;
>> +} __packed;
>> +
>> +static int mei_late_bind_check_response(const struct device *dev, const
>> +struct mkhi_msg_hdr *hdr) {
>> +	if (hdr->group_id != MKHI_GROUP_ID_GFX) {
>> +		dev_err(dev, "Mismatch group id: 0x%x instead of 0x%x\n",
>> +			hdr->group_id, MKHI_GROUP_ID_GFX);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (hdr->command != GFX_SRV_MKHI_LATE_BINDING_RSP) {
>> +		dev_err(dev, "Mismatch command: 0x%x instead of 0x%x\n",
>> +			hdr->command,
>> GFX_SRV_MKHI_LATE_BINDING_RSP);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (hdr->result) {
>> +		dev_err(dev, "Error in result: 0x%x\n", hdr->result);
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * mei_late_bind_push_config - Sends a config to the firmware.
>> + * @dev: device struct corresponding to the mei device
>> + * @type: payload type
>> + * @flags: payload flags
>> + * @payload: payload buffer
>> + * @payload_size: payload buffer size
>> + *
>> + * Return: 0 success, negative errno value on transport failure,
>> + *         positive status returned by FW
>> + */
>> +static int mei_late_bind_push_config(struct device *dev, u32 type, u32 flags,
>> +				     const void *payload, size_t payload_size) {
>> +	struct mei_cl_device *cldev;
>> +	struct csc_heci_late_bind_req *req = NULL;
>> +	struct csc_heci_late_bind_rsp rsp;
>> +	size_t req_size;
>> +	ssize_t ret;
>> +
>> +	if (!dev || !payload || !payload_size)
>> +		return -EINVAL;
>> +
>> +	cldev = to_mei_cl_device(dev);
>> +
>> +	ret = mei_cldev_enable(cldev);
>> +	if (ret < 0) {
>> +		dev_dbg(dev, "mei_cldev_enable failed. %zd\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	req_size = struct_size(req, payload, payload_size);
>> +	if (req_size > mei_cldev_mtu(cldev)) {
>> +		dev_err(dev, "Payload is too big %zu\n", payload_size);
>> +		ret = -EMSGSIZE;
>> +		goto end;
>> +	}
>> +
>> +	req = kmalloc(req_size, GFP_KERNEL);
>> +	if (!req) {
>> +		ret = -ENOMEM;
>> +		goto end;
>> +	}
>> +
>> +	req->header.group_id = MKHI_GROUP_ID_GFX;
>> +	req->header.command = GFX_SRV_MKHI_LATE_BINDING_CMD;
>> +	req->type = type;
>> +	req->flags = flags;
>> +	req->reserved[0] = 0;
>> +	req->reserved[1] = 0;
>> +	req->payload_size = payload_size;
>> +	memcpy(req->payload, payload, payload_size);
>> +
>> +	ret = mei_cldev_send_timeout(cldev, (void *)req, req_size,
>> LATE_BIND_SEND_TIMEOUT_MSEC);
>> +	if (ret < 0) {
>> +		dev_err(dev, "mei_cldev_send failed. %zd\n", ret);
>> +		goto end;
>> +	}
>> +
>> +	ret = mei_cldev_recv_timeout(cldev, (void *)&rsp, sizeof(rsp),
>> LATE_BIND_RECV_TIMEOUT_MSEC);
>> +	if (ret < 0) {
>> +		dev_err(dev, "mei_cldev_recv failed. %zd\n", ret);
>> +		goto end;
>> +	}
>> +	if (ret < sizeof(rsp.header)) {
>> +		dev_err(dev, "bad response header from the firmware: size
>> %zd < %zu\n",
>> +			ret, sizeof(rsp.header));
>> +		goto end;
>> +	}
>> +	if (ret < sizeof(rsp)) {
>> +		dev_err(dev, "bad response from the firmware: size %zd <
>> %zu\n",
>> +			ret, sizeof(rsp));
>> +		goto end;
>> +	}
>> +
>> +	ret = mei_late_bind_check_response(dev, &rsp.header);
>> +	if (ret) {
>> +		dev_err(dev, "bad result response from the firmware:
>> 0x%x\n",
>> +			*(uint32_t *)&rsp.header);
>> +		goto end;
>> +	}
>> +
>> +	ret = (int)rsp.status;
>> +	dev_dbg(dev, "%s status = %zd\n", __func__, ret);
>> +
>> +end:
>> +	mei_cldev_disable(cldev);
>> +	kfree(req);
>> +	return ret;
>> +}
>> +
>> +static const struct late_bind_component_ops mei_late_bind_ops = {
>> +	.owner = THIS_MODULE,
>> +	.push_config = mei_late_bind_push_config, };
>> +
>> +static int mei_component_master_bind(struct device *dev) {
>> +	return component_bind_all(dev, (void *)&mei_late_bind_ops); }
>> +
>> +static void mei_component_master_unbind(struct device *dev) {
>> +	component_unbind_all(dev, (void *)&mei_late_bind_ops); }
>> +
>> +static const struct component_master_ops mei_component_master_ops = {
>> +	.bind = mei_component_master_bind,
>> +	.unbind = mei_component_master_unbind, };
>> +
>> +/**
>> + * mei_late_bind_component_match - compare function for matching mei
>> late bind.
>> + *
>> + *    The function checks if requester is Intel PCI_CLASS_DISPLAY_VGA or
>> + *    PCI_CLASS_DISPLAY_OTHER device, and checks if the parent of requester
> DOC is still wrong dev is requester here, you are checking base == dev.
> With fixing of that.
> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>

It should say that the requester is the grandparent of the late_bind mei device.
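
For reference, the relationship the match function checks can be 
modelled in a few lines of userspace C. This is only a sketch: struct 
dev_node and requester_is_grandparent() are hypothetical names, and the 
PCI vendor/class and subcomponent checks from the patch are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for struct device: only the parent link. */
struct dev_node {
	struct dev_node *parent;
};

/*
 * requester: the device asking for the component (the PCI graphics device).
 * lb_dev:    the late_bind mei client device passed as compare data.
 */
static bool requester_is_grandparent(const struct dev_node *requester,
				     const struct dev_node *lb_dev)
{
	const struct dev_node *base;

	if (!requester || !lb_dev)
		return false;

	base = lb_dev->parent;		/* mei device */
	if (!base)
		return false;

	base = base->parent;		/* pci device */

	return base && base == requester;
}
```

The walk goes up two levels from the late_bind mei device, so the 
requester only matches when it sits exactly at the grandparent position.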

Thanks,
Badal

>
> Thanks,
> Anshuman
>> + *    and the grand parent of mei_if are the same device
>> + *
>> + * @dev: master device
>> + * @subcomponent: subcomponent to match
>> (INTEL_COMPONENT_LATE_BIND)
>> + * @data: compare data (late_bind mei device on mei bus)
>> + *
>> + * Return:
>> + * * 1 - if components match
>> + * * 0 - otherwise
>> + */
>> +static int mei_late_bind_component_match(struct device *dev, int
>> subcomponent,
>> +					 void *data)
>> +{
>> +	struct device *base = data;
>> +	struct pci_dev *pdev;
>> +
>> +	if (!dev)
>> +		return 0;
>> +
>> +	if (!dev_is_pci(dev))
>> +		return 0;
>> +
>> +	pdev = to_pci_dev(dev);
>> +
>> +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
>> +		return 0;
>> +
>> +	if (pdev->class != (PCI_CLASS_DISPLAY_VGA << 8) &&
>> +	    pdev->class != (PCI_CLASS_DISPLAY_OTHER << 8))
>> +		return 0;
>> +
>> +	if (subcomponent != INTEL_COMPONENT_LATE_BIND)
>> +		return 0;
>> +
>> +	base = base->parent;
>> +	if (!base) /* mei device */
>> +		return 0;
>> +
>> +	base = base->parent; /* pci device */
>> +
>> +	return !!base && dev == base;
>> +}
>> +
>> +static int mei_late_bind_probe(struct mei_cl_device *cldev,
>> +			       const struct mei_cl_device_id *id) {
>> +	struct component_match *master_match = NULL;
>> +	int ret;
>> +
>> +	component_match_add_typed(&cldev->dev, &master_match,
>> +				  mei_late_bind_component_match, &cldev-
>>> dev);
>> +	if (IS_ERR_OR_NULL(master_match))
>> +		return -ENOMEM;
>> +
>> +	ret = component_master_add_with_match(&cldev->dev,
>> +					      &mei_component_master_ops,
>> +					      master_match);
>> +	if (ret < 0)
>> +		dev_err(&cldev->dev, "Master comp add failed %d\n", ret);
>> +
>> +	return ret;
>> +}
>> +
>> +static void mei_late_bind_remove(struct mei_cl_device *cldev) {
>> +	component_master_del(&cldev->dev,
>> &mei_component_master_ops); }
>> +
>> +#define MEI_GUID_MKHI UUID_LE(0xe2c2afa2, 0x3817, 0x4d19, \
>> +			      0x9d, 0x95, 0x6, 0xb1, 0x6b, 0x58, 0x8a, 0x5d)
>> +
>> +static struct mei_cl_device_id mei_late_bind_tbl[] = {
>> +	{ .uuid = MEI_GUID_MKHI, .version = MEI_CL_VERSION_ANY },
>> +	{ }
>> +};
>> +MODULE_DEVICE_TABLE(mei, mei_late_bind_tbl);
>> +
>> +static struct mei_cl_driver mei_late_bind_driver = {
>> +	.id_table = mei_late_bind_tbl,
>> +	.name = KBUILD_MODNAME,
>> +	.probe = mei_late_bind_probe,
>> +	.remove	= mei_late_bind_remove,
>> +};
>> +
>> +module_mei_cl_driver(mei_late_bind_driver);
>> +
>> +MODULE_AUTHOR("Intel Corporation");
>> +MODULE_LICENSE("GPL");
>> +MODULE_DESCRIPTION("MEI Late Binding");
>> diff --git a/include/drm/intel/i915_component.h
>> b/include/drm/intel/i915_component.h
>> index 4ea3b17aa143..456849a97d75 100644
>> --- a/include/drm/intel/i915_component.h
>> +++ b/include/drm/intel/i915_component.h
>> @@ -31,6 +31,7 @@ enum i915_component_type {
>>   	I915_COMPONENT_HDCP,
>>   	I915_COMPONENT_PXP,
>>   	I915_COMPONENT_GSC_PROXY,
>> +	INTEL_COMPONENT_LATE_BIND,
>>   };
>>
>>   /* MAX_PORT is the number of port
>> diff --git a/include/drm/intel/late_bind_mei_interface.h
>> b/include/drm/intel/late_bind_mei_interface.h
>> new file mode 100644
>> index 000000000000..ec58ef1ab4e8
>> --- /dev/null
>> +++ b/include/drm/intel/late_bind_mei_interface.h
>> @@ -0,0 +1,64 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright (c) 2025 Intel Corporation  */
>> +
>> +#ifndef _LATE_BIND_MEI_INTERFACE_H_
>> +#define _LATE_BIND_MEI_INTERFACE_H_
>> +
>> +#include <linux/types.h>
>> +
>> +struct device;
>> +struct module;
>> +
>> +/**
>> + * Late Binding flags
>> + * Persistent across warm reset
>> + */
>> +#define CSC_LATE_BINDING_FLAGS_IS_PERSISTENT	BIT(0)
>> +
>> +/**
>> + * xe_late_bind_fw_type - enum to determine late binding fw type  */
>> +enum late_bind_type {
>> +	CSC_LATE_BINDING_TYPE_FAN_CONTROL = 1, };
>> +
>> +/**
>> + * Late Binding payload status
>> + */
>> +enum csc_late_binding_status {
>> +	CSC_LATE_BINDING_STATUS_SUCCESS           = 0,
>> +	CSC_LATE_BINDING_STATUS_4ID_MISMATCH      = 1,
>> +	CSC_LATE_BINDING_STATUS_ARB_FAILURE       = 2,
>> +	CSC_LATE_BINDING_STATUS_GENERAL_ERROR     = 3,
>> +	CSC_LATE_BINDING_STATUS_INVALID_PARAMS    = 4,
>> +	CSC_LATE_BINDING_STATUS_INVALID_SIGNATURE = 5,
>> +	CSC_LATE_BINDING_STATUS_INVALID_PAYLOAD   = 6,
>> +	CSC_LATE_BINDING_STATUS_TIMEOUT           = 7,
>> +};
>> +
>> +/**
>> + * struct late_bind_component_ops - ops for Late Binding services.
>> + * @owner: Module providing the ops
>> + * @push_config: Sends a config to FW.
>> + */
>> +struct late_bind_component_ops {
>> +	struct module *owner;
>> +
>> +	/**
>> +	 * @push_config: Sends a config to FW.
>> +	 * @dev: device struct corresponding to the mei device
>> +	 * @type: payload type
>> +	 * @flags: payload flags
>> +	 * @payload: payload buffer
>> +	 * @payload_size: payload buffer size
>> +	 *
>> +	 * Return: 0 success, negative errno value on transport failure,
>> +	 *         positive status returned by FW
>> +	 */
>> +	int (*push_config)(struct device *dev, u32 type, u32 flags,
>> +			   const void *payload, size_t payload_size); };
>> +
>> +#endif /* _LATE_BIND_MEI_INTERFACE_H_ */
>> --
>> 2.34.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 03/10] drm/xe/xe_late_bind_fw: Introducing xe_late_bind_fw
  2025-06-25 17:00 ` [PATCH v4 03/10] drm/xe/xe_late_bind_fw: Introducing xe_late_bind_fw Badal Nilawar
@ 2025-06-27 21:04   ` Rodrigo Vivi
  2025-06-30 13:49     ` Nilawar, Badal
  0 siblings, 1 reply; 38+ messages in thread
From: Rodrigo Vivi @ 2025-06-27 21:04 UTC (permalink / raw)
  To: Badal Nilawar
  Cc: intel-xe, dri-devel, linux-kernel, anshuman.gupta,
	alexander.usyskin, gregkh, daniele.ceraolospurio

On Wed, Jun 25, 2025 at 10:30:08PM +0530, Badal Nilawar wrote:
> Introducing xe_late_bind_fw to enable firmware loading for the devices,
> such as the fan controller, during the driver probe. Typically,
> firmware for such devices are part of IFWI flash image but can be
> replaced at probe after OEM tuning.
> This patch binds mei late binding component to enable firmware loading.
> 
> v2:
>  - Add devm_add_action_or_reset to remove the component (Daniele)
>  - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele)
> v3:
>  - Fail driver probe if late bind initialization fails,
>    add has_late_bind flag (Daniele)
> v4:
>  - %S/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/
> 
> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile                |  1 +
>  drivers/gpu/drm/xe/xe_device.c             |  5 ++
>  drivers/gpu/drm/xe/xe_device_types.h       |  6 ++
>  drivers/gpu/drm/xe/xe_late_bind_fw.c       | 90 ++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_late_bind_fw.h       | 15 ++++
>  drivers/gpu/drm/xe/xe_late_bind_fw_types.h | 37 +++++++++
>  drivers/gpu/drm/xe/xe_pci.c                |  3 +
>  7 files changed, 157 insertions(+)
>  create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw.c
>  create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw.h
>  create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 7c039caefd00..521547d78fd2 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -76,6 +76,7 @@ xe-y += xe_bb.o \
>  	xe_hw_fence.o \
>  	xe_irq.o \
>  	xe_lrc.o \
> +	xe_late_bind_fw.o \
>  	xe_migrate.o \
>  	xe_mmio.o \
>  	xe_mocs.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index cd17c1354ab3..584acd63b0d9 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -44,6 +44,7 @@
>  #include "xe_hw_engine_group.h"
>  #include "xe_hwmon.h"
>  #include "xe_irq.h"
> +#include "xe_late_bind_fw.h"
>  #include "xe_memirq.h"
>  #include "xe_mmio.h"
>  #include "xe_module.h"
> @@ -889,6 +890,10 @@ int xe_device_probe(struct xe_device *xe)
>  	if (err)
>  		return err;
>  
> +	err = xe_late_bind_init(&xe->late_bind);
> +	if (err && err != -ENODEV)
> +		return err;
> +
>  	err = xe_oa_init(xe);
>  	if (err)
>  		return err;
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 6aca4b1a2824..321f9e9a94f6 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -16,6 +16,7 @@
>  #include "xe_devcoredump_types.h"
>  #include "xe_heci_gsc.h"
>  #include "xe_lmtt_types.h"
> +#include "xe_late_bind_fw_types.h"
>  #include "xe_memirq_types.h"
>  #include "xe_oa_types.h"
>  #include "xe_platform_types.h"
> @@ -323,6 +324,8 @@ struct xe_device {
>  		u8 has_heci_cscfi:1;
>  		/** @info.has_heci_gscfi: device has heci gscfi */
>  		u8 has_heci_gscfi:1;
> +		/** @info.has_late_bind: Device has firmware late binding support */
> +		u8 has_late_bind:1;
>  		/** @info.has_llc: Device has a shared CPU+GPU last level cache */
>  		u8 has_llc:1;
>  		/** @info.has_mbx_power_limits: Device has support to manage power limits using
> @@ -555,6 +558,9 @@ struct xe_device {
>  	/** @nvm: discrete graphics non-volatile memory */
>  	struct intel_dg_nvm_dev *nvm;
>  
> +	/** @late_bind: xe mei late bind interface */
> +	struct xe_late_bind late_bind;
> +
>  	/** @oa: oa observation subsystem */
>  	struct xe_oa oa;
>  
> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
> new file mode 100644
> index 000000000000..eaf12cfec848
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
> @@ -0,0 +1,90 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#include <linux/component.h>
> +#include <linux/delay.h>
> +
> +#include <drm/drm_managed.h>
> +#include <drm/intel/i915_component.h>
> +#include <drm/intel/late_bind_mei_interface.h>
> +#include <drm/drm_print.h>
> +
> +#include "xe_device.h"
> +#include "xe_late_bind_fw.h"
> +
> +static struct xe_device *
> +late_bind_to_xe(struct xe_late_bind *late_bind)
> +{
> +	return container_of(late_bind, struct xe_device, late_bind);
> +}
> +
> +static int xe_late_bind_component_bind(struct device *xe_kdev,
> +				       struct device *mei_kdev, void *data)
> +{
> +	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
> +	struct xe_late_bind *late_bind = &xe->late_bind;
> +
> +	mutex_lock(&late_bind->mutex);
> +	late_bind->component.ops = data;
> +	late_bind->component.mei_dev = mei_kdev;
> +	mutex_unlock(&late_bind->mutex);
> +
> +	return 0;
> +}
> +
> +static void xe_late_bind_component_unbind(struct device *xe_kdev,
> +					  struct device *mei_kdev, void *data)
> +{
> +	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
> +	struct xe_late_bind *late_bind = &xe->late_bind;
> +
> +	mutex_lock(&late_bind->mutex);
> +	late_bind->component.ops = NULL;
> +	mutex_unlock(&late_bind->mutex);
> +}
> +
> +static const struct component_ops xe_late_bind_component_ops = {
> +	.bind   = xe_late_bind_component_bind,
> +	.unbind = xe_late_bind_component_unbind,
> +};
> +
> +static void xe_late_bind_remove(void *arg)
> +{
> +	struct xe_late_bind *late_bind = arg;
> +	struct xe_device *xe = late_bind_to_xe(late_bind);
> +
> +	component_del(xe->drm.dev, &xe_late_bind_component_ops);
> +	mutex_destroy(&late_bind->mutex);
> +}
> +
> +/**
> + * xe_late_bind_init() - add xe mei late binding component
> + *
> + * Return: 0 if the initialization was successful, a negative errno otherwise.
> + */
> +int xe_late_bind_init(struct xe_late_bind *late_bind)
> +{
> +	struct xe_device *xe = late_bind_to_xe(late_bind);
> +	int err;
> +
> +	if (!xe->info.has_late_bind)
> +		return 0;
> +
> +	mutex_init(&late_bind->mutex);
> +
> +	if (!IS_ENABLED(CONFIG_INTEL_MEI_LATE_BIND) || !IS_ENABLED(CONFIG_INTEL_MEI_GSC)) {
> +		drm_info(&xe->drm, "Can't init xe mei late bind missing mei component\n");
> +		return -ENODEV;
> +	}
> +
> +	err = component_add_typed(xe->drm.dev, &xe_late_bind_component_ops,
> +				  INTEL_COMPONENT_LATE_BIND);
> +	if (err < 0) {
> +		drm_info(&xe->drm, "Failed to add mei late bind component (%pe)\n", ERR_PTR(err));
> +		return err;
> +	}
> +
> +	return devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.h b/drivers/gpu/drm/xe/xe_late_bind_fw.h
> new file mode 100644
> index 000000000000..4c73571c3e62
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_LATE_BIND_FW_H_
> +#define _XE_LATE_BIND_FW_H_
> +
> +#include <linux/types.h>
> +
> +struct xe_late_bind;
> +
> +int xe_late_bind_init(struct xe_late_bind *late_bind);
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> new file mode 100644
> index 000000000000..1156ef94f0d5
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_LATE_BIND_TYPES_H_
> +#define _XE_LATE_BIND_TYPES_H_
> +
> +#include <linux/iosys-map.h>
> +#include <linux/mutex.h>
> +#include <linux/types.h>
> +
> +/**
> + * struct xe_late_bind_component - Late Binding services component
> + * @mei_dev: device that provide Late Binding service.
> + * @ops: Ops implemented by Late Binding driver, used by Xe driver.
> + *
> + * Communication between Xe and MEI drivers for Late Binding services
> + */
> +struct xe_late_bind_component {
> +	/** @late_bind_component.mei_dev: mei device */
> +	struct device *mei_dev;
> +	/** @late_bind_component.ops: late binding ops */
> +	const struct late_bind_component_ops *ops;
> +};
> +
> +/**
> + * struct xe_late_bind
> + */
> +struct xe_late_bind {
> +	/** @late_bind.component: struct for communication with mei component */
> +	struct xe_late_bind_component component;
> +	/** @late_bind.mutex: protects the component binding and usage */

Please, before submitting another re-spin of this series, refactor
this mutex. This is absolutely not acceptable.

https://blog.ffwll.ch/2022/07/locking-engineering.html

This is protecting the code and not the data. If binding or usage
happens you need to have other ways of dealing with it.

The lock needs to be reduced to the data you are trying to protect.
Perhaps around the state/status or to certain register, but using
a big mutex like you use in the patch 5 of this series and stating
that it is to protect the code is not the right way.

Sorry for not having looked at this earlier.
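
To make the "lock the data, not the code" point concrete, here is a minimal
userspace sketch. It is not taken from the series: the demo_* names, the
three states, and the use of pthreads in place of a kernel mutex are all
assumptions made for illustration. The lock covers only the state field and
is held just for the state transition; the slow firmware push would run
outside it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical load states; the names are illustrative only. */
enum demo_lb_state { DEMO_LB_IDLE, DEMO_LB_LOADING, DEMO_LB_LOADED };

struct demo_late_bind {
	pthread_mutex_t lock;		/* protects @state and nothing else */
	enum demo_lb_state state;
};

static void demo_lb_init(struct demo_late_bind *lb)
{
	pthread_mutex_init(&lb->lock, NULL);
	lb->state = DEMO_LB_IDLE;
}

/* Claim the right to load; the lock is held only for the transition. */
static bool demo_lb_begin_load(struct demo_late_bind *lb)
{
	bool claimed = false;

	pthread_mutex_lock(&lb->lock);
	if (lb->state == DEMO_LB_IDLE) {
		lb->state = DEMO_LB_LOADING;
		claimed = true;
	}
	pthread_mutex_unlock(&lb->lock);

	return claimed;
}

/* Publish the outcome; the firmware push itself ran without the lock. */
static void demo_lb_end_load(struct demo_late_bind *lb, bool ok)
{
	pthread_mutex_lock(&lb->lock);
	assert(lb->state == DEMO_LB_LOADING);	/* end must follow begin */
	lb->state = ok ? DEMO_LB_LOADED : DEMO_LB_IDLE;
	pthread_mutex_unlock(&lb->lock);
}

/* Read the state under the lock so readers see a consistent value. */
static enum demo_lb_state demo_lb_get_state(struct demo_late_bind *lb)
{
	enum demo_lb_state state;

	pthread_mutex_lock(&lb->lock);
	state = lb->state;
	pthread_mutex_unlock(&lb->lock);

	return state;
}
```

A second caller that loses the race in demo_lb_begin_load() simply gets
false back instead of sleeping on a lock held across the whole operation.
Keeping the component alive during the push is a separate lifetime question
that this sketch does not try to answer.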

> +	struct mutex mutex;
> +};
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index 08e21d4099e0..e5018d3ae74f 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -66,6 +66,7 @@ struct xe_device_desc {
>  	u8 has_gsc_nvm:1;
>  	u8 has_heci_gscfi:1;
>  	u8 has_heci_cscfi:1;
> +	u8 has_late_bind:1;
>  	u8 has_llc:1;
>  	u8 has_mbx_power_limits:1;
>  	u8 has_pxp:1;
> @@ -355,6 +356,7 @@ static const struct xe_device_desc bmg_desc = {
>  	.has_mbx_power_limits = true,
>  	.has_gsc_nvm = 1,
>  	.has_heci_cscfi = 1,
> +	.has_late_bind = true,
>  	.needs_scratch = true,
>  };
>  
> @@ -600,6 +602,7 @@ static int xe_info_init_early(struct xe_device *xe,
>  	xe->info.has_gsc_nvm = desc->has_gsc_nvm;
>  	xe->info.has_heci_gscfi = desc->has_heci_gscfi;
>  	xe->info.has_heci_cscfi = desc->has_heci_cscfi;
> +	xe->info.has_late_bind = desc->has_late_bind;
>  	xe->info.has_llc = desc->has_llc;
>  	xe->info.has_pxp = desc->has_pxp;
>  	xe->info.has_sriov = desc->has_sriov;
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-06-25 17:00 ` [PATCH v4 02/10] mei: late_bind: add late binding component driver Badal Nilawar
  2025-06-26  3:50   ` Gupta, Anshuman
@ 2025-06-28 12:18   ` Greg KH
  2025-07-01  8:32     ` Nilawar, Badal
  2025-07-01 10:05     ` Usyskin, Alexander
  2025-06-28 12:19   ` Greg KH
  2 siblings, 2 replies; 38+ messages in thread
From: Greg KH @ 2025-06-28 12:18 UTC (permalink / raw)
  To: Badal Nilawar
  Cc: intel-xe, dri-devel, linux-kernel, anshuman.gupta, rodrigo.vivi,
	alexander.usyskin, daniele.ceraolospurio

On Wed, Jun 25, 2025 at 10:30:07PM +0530, Badal Nilawar wrote:
> --- /dev/null
> +++ b/drivers/misc/mei/late_bind/mei_late_bind.c
> @@ -0,0 +1,281 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2025 Intel Corporation
> + */
> +#include <drm/intel/i915_component.h>
> +#include <drm/intel/late_bind_mei_interface.h>
> +#include <linux/component.h>
> +#include <linux/pci.h>
> +#include <linux/mei_cl_bus.h>
> +#include <linux/module.h>
> +#include <linux/overflow.h>
> +#include <linux/slab.h>
> +#include <linux/uuid.h>
> +
> +#include "mkhi.h"
> +
> +#define GFX_SRV_MKHI_LATE_BINDING_CMD 0x12
> +#define GFX_SRV_MKHI_LATE_BINDING_RSP (GFX_SRV_MKHI_LATE_BINDING_CMD | 0x80)
> +
> +#define LATE_BIND_SEND_TIMEOUT_MSEC 3000
> +#define LATE_BIND_RECV_TIMEOUT_MSEC 3000
> +
> +/**
> + * struct csc_heci_late_bind_req - late binding request
> + * @header: @ref mkhi_msg_hdr
> + * @type: type of the late binding payload
> + * @flags: flags to be passed to the firmware
> + * @reserved: reserved field

Reserved for what?  All reserved fields need to be set to a default
value, please document that here.

> + * @payload_size: size of the payload data in bytes
> + * @payload: data to be sent to the firmware
> + */
> +struct csc_heci_late_bind_req {
> +	struct mkhi_msg_hdr header;
> +	u32 type;
> +	u32 flags;

What is the endianness of these fields?  And as this crosses the
kernel/hardware boundary, shouldn't these be __u32?

> +/**
> + * struct csc_heci_late_bind_rsp - late binding response
> + * @header: @ref mkhi_msg_hdr
> + * @type: type of the late binding payload
> + * @reserved: reserved field
> + * @status: status of the late binding command execution by firmware
> + */
> +struct csc_heci_late_bind_rsp {
> +	struct mkhi_msg_hdr header;
> +	u32 type;
> +	u32 reserved[2];
> +	u32 status;

Same questions as above.

> +} __packed;
> +/**
> + * mei_late_bind_push_config - Sends a config to the firmware.
> + * @dev: device struct corresponding to the mei device
> + * @type: payload type

Shouldn't type be an enum?

> + * @flags: payload flags
> + * @payload: payload buffer
> + * @payload_size: payload buffer size
> + *
> + * Return: 0 success, negative errno value on transport failure,
> + *         positive status returned by FW
> + */
> +static int mei_late_bind_push_config(struct device *dev, u32 type, u32 flags,
> +				     const void *payload, size_t payload_size)

Why do static functions need kerneldoc formatting?

> +{
> +	struct mei_cl_device *cldev;
> +	struct csc_heci_late_bind_req *req = NULL;
> +	struct csc_heci_late_bind_rsp rsp;
> +	size_t req_size;
> +	ssize_t ret;
> +
> +	if (!dev || !payload || !payload_size)
> +		return -EINVAL;

How can any of these ever happen as you control the callers of this
function?


> +
> +	cldev = to_mei_cl_device(dev);
> +
> +	ret = mei_cldev_enable(cldev);
> +	if (ret < 0) {

You mean:
	if (ret)
right?


> +		dev_dbg(dev, "mei_cldev_enable failed. %zd\n", ret);

Why display the error again if this failed?  The caller already did
that.

And the function returns an int, not a ssize_t, didn't the compiler
complain?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-06-25 17:00 ` [PATCH v4 02/10] mei: late_bind: add late binding component driver Badal Nilawar
  2025-06-26  3:50   ` Gupta, Anshuman
  2025-06-28 12:18   ` Greg KH
@ 2025-06-28 12:19   ` Greg KH
  2025-07-01  8:07     ` Nilawar, Badal
  2 siblings, 1 reply; 38+ messages in thread
From: Greg KH @ 2025-06-28 12:19 UTC (permalink / raw)
  To: Badal Nilawar
  Cc: intel-xe, dri-devel, linux-kernel, anshuman.gupta, rodrigo.vivi,
	alexander.usyskin, daniele.ceraolospurio

On Wed, Jun 25, 2025 at 10:30:07PM +0530, Badal Nilawar wrote:
> +/**
> + * struct late_bind_component_ops - ops for Late Binding services.
> + * @owner: Module providing the ops
> + * @push_config: Sends a config to FW.
> + */
> +struct late_bind_component_ops {
> +	struct module *owner;

I don't think you ever set this field, so why is it here?

Or did I miss it somewhere?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 03/10] drm/xe/xe_late_bind_fw: Introducing xe_late_bind_fw
  2025-06-27 21:04   ` Rodrigo Vivi
@ 2025-06-30 13:49     ` Nilawar, Badal
  0 siblings, 0 replies; 38+ messages in thread
From: Nilawar, Badal @ 2025-06-30 13:49 UTC (permalink / raw)
  To: Rodrigo Vivi
  Cc: intel-xe, dri-devel, linux-kernel, anshuman.gupta,
	alexander.usyskin, gregkh, daniele.ceraolospurio


On 28-06-2025 02:34, Rodrigo Vivi wrote:
> On Wed, Jun 25, 2025 at 10:30:08PM +0530, Badal Nilawar wrote:
>> Introducing xe_late_bind_fw to enable firmware loading for the devices,
>> such as the fan controller, during the driver probe. Typically,
>> firmware for such devices are part of IFWI flash image but can be
>> replaced at probe after OEM tuning.
>> This patch binds mei late binding component to enable firmware loading.
>>
>> v2:
>>   - Add devm_add_action_or_reset to remove the component (Daniele)
>>   - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele)
>> v3:
>>   - Fail driver probe if late bind initialization fails,
>>     add has_late_bind flag (Daniele)
>> v4:
>>   - %S/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/
>>
>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
>> ---
>>   drivers/gpu/drm/xe/Makefile                |  1 +
>>   drivers/gpu/drm/xe/xe_device.c             |  5 ++
>>   drivers/gpu/drm/xe/xe_device_types.h       |  6 ++
>>   drivers/gpu/drm/xe/xe_late_bind_fw.c       | 90 ++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_late_bind_fw.h       | 15 ++++
>>   drivers/gpu/drm/xe/xe_late_bind_fw_types.h | 37 +++++++++
>>   drivers/gpu/drm/xe/xe_pci.c                |  3 +
>>   7 files changed, 157 insertions(+)
>>   create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw.c
>>   create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw.h
>>   create mode 100644 drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index 7c039caefd00..521547d78fd2 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -76,6 +76,7 @@ xe-y += xe_bb.o \
>>   	xe_hw_fence.o \
>>   	xe_irq.o \
>>   	xe_lrc.o \
>> +	xe_late_bind_fw.o \
>>   	xe_migrate.o \
>>   	xe_mmio.o \
>>   	xe_mocs.o \
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index cd17c1354ab3..584acd63b0d9 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -44,6 +44,7 @@
>>   #include "xe_hw_engine_group.h"
>>   #include "xe_hwmon.h"
>>   #include "xe_irq.h"
>> +#include "xe_late_bind_fw.h"
>>   #include "xe_memirq.h"
>>   #include "xe_mmio.h"
>>   #include "xe_module.h"
>> @@ -889,6 +890,10 @@ int xe_device_probe(struct xe_device *xe)
>>   	if (err)
>>   		return err;
>>   
>> +	err = xe_late_bind_init(&xe->late_bind);
>> +	if (err && err != -ENODEV)
>> +		return err;
>> +
>>   	err = xe_oa_init(xe);
>>   	if (err)
>>   		return err;
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 6aca4b1a2824..321f9e9a94f6 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -16,6 +16,7 @@
>>   #include "xe_devcoredump_types.h"
>>   #include "xe_heci_gsc.h"
>>   #include "xe_lmtt_types.h"
>> +#include "xe_late_bind_fw_types.h"
>>   #include "xe_memirq_types.h"
>>   #include "xe_oa_types.h"
>>   #include "xe_platform_types.h"
>> @@ -323,6 +324,8 @@ struct xe_device {
>>   		u8 has_heci_cscfi:1;
>>   		/** @info.has_heci_gscfi: device has heci gscfi */
>>   		u8 has_heci_gscfi:1;
>> +		/** @info.has_late_bind: Device has firmware late binding support */
>> +		u8 has_late_bind:1;
>>   		/** @info.has_llc: Device has a shared CPU+GPU last level cache */
>>   		u8 has_llc:1;
>>   		/** @info.has_mbx_power_limits: Device has support to manage power limits using
>> @@ -555,6 +558,9 @@ struct xe_device {
>>   	/** @nvm: discrete graphics non-volatile memory */
>>   	struct intel_dg_nvm_dev *nvm;
>>   
>> +	/** @late_bind: xe mei late bind interface */
>> +	struct xe_late_bind late_bind;
>> +
>>   	/** @oa: oa observation subsystem */
>>   	struct xe_oa oa;
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.c b/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> new file mode 100644
>> index 000000000000..eaf12cfec848
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.c
>> @@ -0,0 +1,90 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2025 Intel Corporation
>> + */
>> +
>> +#include <linux/component.h>
>> +#include <linux/delay.h>
>> +
>> +#include <drm/drm_managed.h>
>> +#include <drm/intel/i915_component.h>
>> +#include <drm/intel/late_bind_mei_interface.h>
>> +#include <drm/drm_print.h>
>> +
>> +#include "xe_device.h"
>> +#include "xe_late_bind_fw.h"
>> +
>> +static struct xe_device *
>> +late_bind_to_xe(struct xe_late_bind *late_bind)
>> +{
>> +	return container_of(late_bind, struct xe_device, late_bind);
>> +}
>> +
>> +static int xe_late_bind_component_bind(struct device *xe_kdev,
>> +				       struct device *mei_kdev, void *data)
>> +{
>> +	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
>> +	struct xe_late_bind *late_bind = &xe->late_bind;
>> +
>> +	mutex_lock(&late_bind->mutex);
>> +	late_bind->component.ops = data;
>> +	late_bind->component.mei_dev = mei_kdev;
>> +	mutex_unlock(&late_bind->mutex);
>> +
>> +	return 0;
>> +}
>> +
>> +static void xe_late_bind_component_unbind(struct device *xe_kdev,
>> +					  struct device *mei_kdev, void *data)
>> +{
>> +	struct xe_device *xe = kdev_to_xe_device(xe_kdev);
>> +	struct xe_late_bind *late_bind = &xe->late_bind;
>> +
>> +	mutex_lock(&late_bind->mutex);
>> +	late_bind->component.ops = NULL;
>> +	mutex_unlock(&late_bind->mutex);
>> +}
>> +
>> +static const struct component_ops xe_late_bind_component_ops = {
>> +	.bind   = xe_late_bind_component_bind,
>> +	.unbind = xe_late_bind_component_unbind,
>> +};
>> +
>> +static void xe_late_bind_remove(void *arg)
>> +{
>> +	struct xe_late_bind *late_bind = arg;
>> +	struct xe_device *xe = late_bind_to_xe(late_bind);
>> +
>> +	component_del(xe->drm.dev, &xe_late_bind_component_ops);
>> +	mutex_destroy(&late_bind->mutex);
>> +}
>> +
>> +/**
>> + * xe_late_bind_init() - add xe mei late binding component
>> + *
>> + * Return: 0 if the initialization was successful, a negative errno otherwise.
>> + */
>> +int xe_late_bind_init(struct xe_late_bind *late_bind)
>> +{
>> +	struct xe_device *xe = late_bind_to_xe(late_bind);
>> +	int err;
>> +
>> +	if (!xe->info.has_late_bind)
>> +		return 0;
>> +
>> +	mutex_init(&late_bind->mutex);
>> +
>> +	if (!IS_ENABLED(CONFIG_INTEL_MEI_LATE_BIND) || !IS_ENABLED(CONFIG_INTEL_MEI_GSC)) {
>> +		drm_info(&xe->drm, "Can't init xe mei late bind missing mei component\n");
>> +		return -ENODEV;
>> +	}
>> +
>> +	err = component_add_typed(xe->drm.dev, &xe_late_bind_component_ops,
>> +				  INTEL_COMPONENT_LATE_BIND);
>> +	if (err < 0) {
>> +		drm_info(&xe->drm, "Failed to add mei late bind component (%pe)\n", ERR_PTR(err));
>> +		return err;
>> +	}
>> +
>> +	return devm_add_action_or_reset(xe->drm.dev, xe_late_bind_remove, late_bind);
>> +}
>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw.h b/drivers/gpu/drm/xe/xe_late_bind_fw.h
>> new file mode 100644
>> index 000000000000..4c73571c3e62
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw.h
>> @@ -0,0 +1,15 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2025 Intel Corporation
>> + */
>> +
>> +#ifndef _XE_LATE_BIND_FW_H_
>> +#define _XE_LATE_BIND_FW_H_
>> +
>> +#include <linux/types.h>
>> +
>> +struct xe_late_bind;
>> +
>> +int xe_late_bind_init(struct xe_late_bind *late_bind);
>> +
>> +#endif
>> diff --git a/drivers/gpu/drm/xe/xe_late_bind_fw_types.h b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>> new file mode 100644
>> index 000000000000..1156ef94f0d5
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_late_bind_fw_types.h
>> @@ -0,0 +1,37 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2025 Intel Corporation
>> + */
>> +
>> +#ifndef _XE_LATE_BIND_TYPES_H_
>> +#define _XE_LATE_BIND_TYPES_H_
>> +
>> +#include <linux/iosys-map.h>
>> +#include <linux/mutex.h>
>> +#include <linux/types.h>
>> +
>> +/**
>> + * struct xe_late_bind_component - Late Binding services component
>> + * @mei_dev: device that provide Late Binding service.
>> + * @ops: Ops implemented by Late Binding driver, used by Xe driver.
>> + *
>> + * Communication between Xe and MEI drivers for Late Binding services
>> + */
>> +struct xe_late_bind_component {
>> +	/** @late_bind_component.mei_dev: mei device */
>> +	struct device *mei_dev;
>> +	/** @late_bind_component.ops: late binding ops */
>> +	const struct late_bind_component_ops *ops;
>> +};
>> +
>> +/**
>> + * struct xe_late_bind
>> + */
>> +struct xe_late_bind {
>> +	/** @late_bind.component: struct for communication with mei component */
>> +	struct xe_late_bind_component component;
>> +	/** @late_bind.mutex: protects the component binding and usage */
> Please, before submitting another re-spin of this series, refactor
> this mutex. This is absolutely not acceptable.
>
> https://blog.ffwll.ch/2022/07/locking-engineering.html
>
> This is protecting the code and not the data. If binding or usage
> happens you need to have other ways of dealing with it.
>
> The lock needs to be reduced to the data you are trying to protect.
> Perhaps around the state/status or to certain register, but using
> a big mutex like you use in the patch 5 of this series and stating
> that it is to protect the code is not the right way.
>
> Sorry for not having looked at this earlier.

Sure Rodrigo, I will try to refactor the mutex.  The intention was to 
prevent component removal during the push_config process when a 
module unbind operation is performed.
This is how it is implemented in xe_gsc_proxy.c and in all the components 
in i915.

I was considering alternatives like wait_event_timeout or 
wait_for_completion. However, since we already flush the 
work in xe_late_bind_component_unbind, the above scenario is unlikely to 
occur.

Regards,
Badal

>
>> +	struct mutex mutex;
>> +};
>> +
>> +#endif
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index 08e21d4099e0..e5018d3ae74f 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -66,6 +66,7 @@ struct xe_device_desc {
>>   	u8 has_gsc_nvm:1;
>>   	u8 has_heci_gscfi:1;
>>   	u8 has_heci_cscfi:1;
>> +	u8 has_late_bind:1;
>>   	u8 has_llc:1;
>>   	u8 has_mbx_power_limits:1;
>>   	u8 has_pxp:1;
>> @@ -355,6 +356,7 @@ static const struct xe_device_desc bmg_desc = {
>>   	.has_mbx_power_limits = true,
>>   	.has_gsc_nvm = 1,
>>   	.has_heci_cscfi = 1,
>> +	.has_late_bind = true,
>>   	.needs_scratch = true,
>>   };
>>   
>> @@ -600,6 +602,7 @@ static int xe_info_init_early(struct xe_device *xe,
>>   	xe->info.has_gsc_nvm = desc->has_gsc_nvm;
>>   	xe->info.has_heci_gscfi = desc->has_heci_gscfi;
>>   	xe->info.has_heci_cscfi = desc->has_heci_cscfi;
>> +	xe->info.has_late_bind = desc->has_late_bind;
>>   	xe->info.has_llc = desc->has_llc;
>>   	xe->info.has_pxp = desc->has_pxp;
>>   	xe->info.has_sriov = desc->has_sriov;
>> -- 
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-06-28 12:19   ` Greg KH
@ 2025-07-01  8:07     ` Nilawar, Badal
  2025-07-01  8:17       ` Greg KH
  0 siblings, 1 reply; 38+ messages in thread
From: Nilawar, Badal @ 2025-07-01  8:07 UTC (permalink / raw)
  To: Greg KH
  Cc: intel-xe, dri-devel, linux-kernel, anshuman.gupta, rodrigo.vivi,
	alexander.usyskin, daniele.ceraolospurio


On 28-06-2025 17:49, Greg KH wrote:
> On Wed, Jun 25, 2025 at 10:30:07PM +0530, Badal Nilawar wrote:
>> +/**
>> + * struct late_bind_component_ops - ops for Late Binding services.
>> + * @owner: Module providing the ops
>> + * @push_config: Sends a config to FW.
>> + */
>> +struct late_bind_component_ops {
>> +	struct module *owner;
> I don't think you ever set this field, so why is it here?
>
> Or did I miss it somewhere?

It is set in drivers/misc/mei/late_bind/mei_late_bind.c

static const struct late_bind_component_ops mei_late_bind_ops = {
         .owner = THIS_MODULE,
         .push_config = mei_late_bind_push_config,
};

Thanks,
Badal

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-07-01  8:07     ` Nilawar, Badal
@ 2025-07-01  8:17       ` Greg KH
  2025-07-01  8:22         ` Nilawar, Badal
  0 siblings, 1 reply; 38+ messages in thread
From: Greg KH @ 2025-07-01  8:17 UTC (permalink / raw)
  To: Nilawar, Badal
  Cc: intel-xe, dri-devel, linux-kernel, anshuman.gupta, rodrigo.vivi,
	alexander.usyskin, daniele.ceraolospurio

On Tue, Jul 01, 2025 at 01:37:36PM +0530, Nilawar, Badal wrote:
> 
> On 28-06-2025 17:49, Greg KH wrote:
> > On Wed, Jun 25, 2025 at 10:30:07PM +0530, Badal Nilawar wrote:
> > > +/**
> > > + * struct late_bind_component_ops - ops for Late Binding services.
> > > + * @owner: Module providing the ops
> > > + * @push_config: Sends a config to FW.
> > > + */
> > > +struct late_bind_component_ops {
> > > +	struct module *owner;
> > I don't think you ever set this field, so why is it here?
> > 
> > Or did I miss it somewhere?
> 
> It is set in drivers/misc/mei/late_bind/mei_late_bind.c
> 
> static const struct late_bind_component_ops mei_late_bind_ops = {
>         .owner = THIS_MODULE,
>         .push_config = mei_late_bind_push_config,
> };

Ah.  But then who uses it?  And why?  Normally forcing callers to set
.owner is frowned upon, use a #define correctly to have it automatically
set for you in the registration function please.

And are you _sure_ you need it?
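
The "#define in the registration function" pattern mentioned above can be
sketched in userspace as follows. This is a rough illustration under stated
assumptions, not code from the series: every demo_* name is hypothetical,
and DEMO_THIS_MODULE stands in for the kernel's THIS_MODULE. In the kernel
the same idea appears in pci_register_driver(), which is a macro that passes
THIS_MODULE to __pci_register_driver() on the caller's behalf.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct module and THIS_MODULE; illustrative only. */
struct demo_module {
	const char *name;
};

static struct demo_module demo_this_module = { .name = "demo_late_bind" };
#define DEMO_THIS_MODULE (&demo_this_module)

/* The ops table carries callbacks only; there is no .owner to fill in. */
struct demo_ops {
	int (*push_config)(unsigned int type);
};

struct demo_registration {
	const struct demo_ops *ops;
	struct demo_module *owner;
};

static struct demo_registration demo_registered;

/* The real registration function takes the owner explicitly. */
static int demo_register_ops_owner(const struct demo_ops *ops,
				   struct demo_module *owner)
{
	/* The macro below always supplies an owner; this guards direct misuse. */
	assert(owner != NULL);

	demo_registered.ops = ops;
	demo_registered.owner = owner;
	return 0;
}

/* Callers use this wrapper, so the owner is recorded automatically. */
#define demo_register_ops(ops) \
	demo_register_ops_owner((ops), DEMO_THIS_MODULE)

static int demo_push_config(unsigned int type)
{
	return (int)type;
}

static const struct demo_ops demo_ops = {
	.push_config = demo_push_config,
};
```

With this shape a provider cannot forget to set the owner, because it is
never asked for one.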

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-07-01  8:17       ` Greg KH
@ 2025-07-01  8:22         ` Nilawar, Badal
  2025-07-01  8:32           ` Usyskin, Alexander
  0 siblings, 1 reply; 38+ messages in thread
From: Nilawar, Badal @ 2025-07-01  8:22 UTC (permalink / raw)
  To: Greg KH, Usyskin, Alexander
  Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, Gupta, Anshuman, Vivi, Rodrigo,
	Ceraolo Spurio, Daniele



> -----Original Message-----
> From: Greg KH <gregkh@linuxfoundation.org>
> Sent: 01 July 2025 13:48
> To: Nilawar, Badal <badal.nilawar@intel.com>
> Cc: intel-xe@lists.freedesktop.org; dri-devel@lists.freedesktop.org; linux-
> kernel@vger.kernel.org; Gupta, Anshuman <anshuman.gupta@intel.com>;
> Vivi, Rodrigo <rodrigo.vivi@intel.com>; Usyskin, Alexander
> <alexander.usyskin@intel.com>; Ceraolo Spurio, Daniele
> <daniele.ceraolospurio@intel.com>
> Subject: Re: [PATCH v4 02/10] mei: late_bind: add late binding component
> driver
> 
> On Tue, Jul 01, 2025 at 01:37:36PM +0530, Nilawar, Badal wrote:
> >
> > On 28-06-2025 17:49, Greg KH wrote:
> > > On Wed, Jun 25, 2025 at 10:30:07PM +0530, Badal Nilawar wrote:
> > > > +/**
> > > > + * struct late_bind_component_ops - ops for Late Binding services.
> > > > + * @owner: Module providing the ops
> > > > + * @push_config: Sends a config to FW.
> > > > + */
> > > > +struct late_bind_component_ops {
> > > > +	struct module *owner;
> > > I don't think you ever set this field, so why is it here?
> > >
> > > Or did I miss it somewhere?
> >
> > It is set in drivers/misc/mei/late_bind/mei_late_bind.c
> >
> > static const struct late_bind_component_ops mei_late_bind_ops = {
> >         .owner = THIS_MODULE,
> >         .push_config = mei_late_bind_push_config,
> > };
> 
> Ah.  But then who uses it?  And why?  Normally forcing callers to set .owner is
> frowned upon, use a #define correctly to have it automatically set for you in
> the registration function please.
> 
> And are you _sure_ you need it?

The xe KMD only uses .push_config, so .owner can be dropped. It looks like it was propagated from previously implemented mei components, but none of those components use .owner, so it's fine to drop it.
@Usyskin, Alexander please share your thoughts on this.

Badal
  
> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-06-28 12:18   ` Greg KH
@ 2025-07-01  8:32     ` Nilawar, Badal
  2025-07-01  9:45       ` Greg KH
  2025-07-01 10:05     ` Usyskin, Alexander
  1 sibling, 1 reply; 38+ messages in thread
From: Nilawar, Badal @ 2025-07-01  8:32 UTC (permalink / raw)
  To: Greg KH, Usyskin, Alexander
  Cc: intel-xe, dri-devel, linux-kernel, anshuman.gupta, rodrigo.vivi,
	alexander.usyskin, daniele.ceraolospurio


On 28-06-2025 17:48, Greg KH wrote:
> On Wed, Jun 25, 2025 at 10:30:07PM +0530, Badal Nilawar wrote:
>> --- /dev/null
>> +++ b/drivers/misc/mei/late_bind/mei_late_bind.c
>> @@ -0,0 +1,281 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (C) 2025 Intel Corporation
>> + */
>> +#include <drm/intel/i915_component.h>
>> +#include <drm/intel/late_bind_mei_interface.h>
>> +#include <linux/component.h>
>> +#include <linux/pci.h>
>> +#include <linux/mei_cl_bus.h>
>> +#include <linux/module.h>
>> +#include <linux/overflow.h>
>> +#include <linux/slab.h>
>> +#include <linux/uuid.h>
>> +
>> +#include "mkhi.h"
>> +
>> +#define GFX_SRV_MKHI_LATE_BINDING_CMD 0x12
>> +#define GFX_SRV_MKHI_LATE_BINDING_RSP (GFX_SRV_MKHI_LATE_BINDING_CMD | 0x80)
>> +
>> +#define LATE_BIND_SEND_TIMEOUT_MSEC 3000
>> +#define LATE_BIND_RECV_TIMEOUT_MSEC 3000
>> +
>> +/**
>> + * struct csc_heci_late_bind_req - late binding request
>> + * @header: @ref mkhi_msg_hdr
>> + * @type: type of the late binding payload
>> + * @flags: flags to be passed to the firmware
>> + * @reserved: reserved field
> Reserved for what?  All reserved fields need to be set to a default
> value, please document that here.
Reserved by the CSC firmware, probably for future use.  The default value
should be 0.
>
>> + * @payload_size: size of the payload data in bytes
>> + * @payload: data to be sent to the firmware
>> + */
>> +struct csc_heci_late_bind_req {
>> +	struct mkhi_msg_hdr header;
>> +	u32 type;
>> +	u32 flags;
> What is the endianness of these fields?  And as this crosses the
> kernel/hardware boundary, shouldn't these be __u32?

These fields are little endian; all the headers are little endian.  I will
add a comment at the top.
As for __u32, I doubt we need it, as the CSC send copies it to an internal
buffer.

Sasha can help to answer.

>
>> +/**
>> + * struct csc_heci_late_bind_rsp - late binding response
>> + * @header: @ref mkhi_msg_hdr
>> + * @type: type of the late binding payload
>> + * @reserved: reserved field
>> + * @status: status of the late binding command execution by firmware
>> + */
>> +struct csc_heci_late_bind_rsp {
>> +	struct mkhi_msg_hdr header;
>> +	u32 type;
>> +	u32 reserved[2];
>> +	u32 status;
> Same questions as above.
>
>> +} __packed;
>> +/**
>> + * mei_late_bind_push_config - Sends a config to the firmware.
>> + * @dev: device struct corresponding to the mei device
>> + * @type: payload type
> Shouldn't type be an enum?
Sure, I will make it an enum.
>
>> + * @flags: payload flags
>> + * @payload: payload buffer
>> + * @payload_size: payload buffer size
>> + *
>> + * Return: 0 success, negative errno value on transport failure,
>> + *         positive status returned by FW
>> + */
>> +static int mei_late_bind_push_config(struct device *dev, u32 type, u32 flags,
>> +				     const void *payload, size_t payload_size)
> Why do static functions need kerneldoc formatting?
Sasha can help to answer this.
>
>> +{
>> +	struct mei_cl_device *cldev;
>> +	struct csc_heci_late_bind_req *req = NULL;
>> +	struct csc_heci_late_bind_rsp rsp;
>> +	size_t req_size;
>> +	ssize_t ret;
>> +
>> +	if (!dev || !payload || !payload_size)
>> +		return -EINVAL;
> How can any of these ever happen as you control the callers of this
> function?
I will add WARN here.
>
>
>> +
>> +	cldev = to_mei_cl_device(dev);
>> +
>> +	ret = mei_cldev_enable(cldev);
>> +	if (ret < 0) {
> You mean:
> 	if (ret)
> right?
yes
>
>
>> +		dev_dbg(dev, "mei_cldev_enable failed. %zd\n", ret);
> Why display the error again if this failed?  The caller already did
> that.
>
> And the function returns an int, not a ssize_t, didn't the compiler
> complain?

It didn't. This is for debugging on the mei side; it can be removed, or I
will fix the format specifier.

Thanks,
Badal

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-07-01  8:22         ` Nilawar, Badal
@ 2025-07-01  8:32           ` Usyskin, Alexander
  0 siblings, 0 replies; 38+ messages in thread
From: Usyskin, Alexander @ 2025-07-01  8:32 UTC (permalink / raw)
  To: Nilawar, Badal, Greg KH
  Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, Gupta, Anshuman, Vivi, Rodrigo,
	Ceraolo Spurio, Daniele

> > Subject: Re: [PATCH v4 02/10] mei: late_bind: add late binding component
> > driver
> >
> > On Tue, Jul 01, 2025 at 01:37:36PM +0530, Nilawar, Badal wrote:
> > >
> > > On 28-06-2025 17:49, Greg KH wrote:
> > > > On Wed, Jun 25, 2025 at 10:30:07PM +0530, Badal Nilawar wrote:
> > > > > +/**
> > > > > + * struct late_bind_component_ops - ops for Late Binding services.
> > > > > + * @owner: Module providing the ops
> > > > > + * @push_config: Sends a config to FW.
> > > > > + */
> > > > > +struct late_bind_component_ops {
> > > > > +	struct module *owner;
> > > > I don't think you ever set this field, so why is it here?
> > > >
> > > > Or did I miss it somewhere?
> > >
> > > It is set in drivers/misc/mei/late_bind/mei_late_bind.c
> > >
> > > static const struct late_bind_component_ops mei_late_bind_ops = {
> > >         .owner = THIS_MODULE,
> > > >         .push_config = mei_late_bind_push_config,
> > > > };
> >
> > Ah.  But then who uses it?  And why?  Normally forcing callers to set .owner
> > is frowned upon, use a #define correctly to have it automatically set for you in
> > the registration function please.
> >
> > And are you _sure_ you need it?
> 
> In xe kmd only uses .push_config so .owner can be dropped. Looks like it got
> propagated from previously implemented mei components but for none of
> the component .owner is used.  So it's fine to drop it.
> @Usyskin, Alexander please share your thoughts on this.
> 

As the caller does not need this, it can be dropped.

- - 
Thanks,
Sasha



> Badal
> 
> >
> > thanks,
> >
> > greg k-h

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-07-01  8:32     ` Nilawar, Badal
@ 2025-07-01  9:45       ` Greg KH
  2025-07-01 12:34         ` Nilawar, Badal
  0 siblings, 1 reply; 38+ messages in thread
From: Greg KH @ 2025-07-01  9:45 UTC (permalink / raw)
  To: Nilawar, Badal
  Cc: Usyskin, Alexander, intel-xe, dri-devel, linux-kernel,
	anshuman.gupta, rodrigo.vivi, daniele.ceraolospurio

On Tue, Jul 01, 2025 at 02:02:15PM +0530, Nilawar, Badal wrote:
> On 28-06-2025 17:48, Greg KH wrote:
> > > + * @payload_size: size of the payload data in bytes
> > > + * @payload: data to be sent to the firmware
> > > + */
> > > +struct csc_heci_late_bind_req {
> > > +	struct mkhi_msg_hdr header;
> > > +	u32 type;
> > > +	u32 flags;
> > > What is the endianness of these fields?  And as this crosses the
> > > kernel/hardware boundary, shouldn't these be __u32?
> 
> endian of these fields is little endian, all the headers are little endian. 
> I will add comment at top.

No, use the proper types if this is little endian.  Don't rely on a
comment to catch things when it goes wrong.

> On __u32 I doubt we need to do it as csc send copy it to internal buffer.

If this crosses the kernel boundary, it needs to use the proper type.
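
As a rough userspace illustration of "use the proper types" for a
little-endian wire format: in the kernel this is done with __le32 fields and
cpu_to_le32()/le32_to_cpu(), which sparse can check. The sketch below is an
assumption-laden stand-in, not code from the patch. demo_le32 and the other
demo_* names are hypothetical, and the struct shows only the two fields
under discussion. Wrapping the value in a distinct type means a plain
integer cannot be stored without going through the conversion helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Distinct type for little-endian wire values; stand-in for __le32. */
typedef struct {
	uint32_t raw;
} demo_le32;

/* Convert a CPU-order value to little-endian wire order on any host. */
static demo_le32 demo_cpu_to_le32(uint32_t v)
{
	const uint8_t bytes[4] = {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	};
	demo_le32 out;

	memcpy(&out.raw, bytes, sizeof(bytes));
	return out;
}

/* Convert a little-endian wire value back to CPU order. */
static uint32_t demo_le32_to_cpu(demo_le32 v)
{
	uint8_t bytes[4];

	memcpy(bytes, &v.raw, sizeof(bytes));
	return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
	       ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}

/* Only the two fields discussed above; the real request has more. */
struct demo_late_bind_req_hdr {
	demo_le32 type;
	demo_le32 flags;
};

static_assert(sizeof(struct demo_late_bind_req_hdr) == 8,
	      "wire layout must stay 8 bytes");

/* Every store goes through the conversion, so byte order is explicit. */
static void demo_fill_req(struct demo_late_bind_req_hdr *req,
			  uint32_t type, uint32_t flags)
{
	req->type = demo_cpu_to_le32(type);
	req->flags = demo_cpu_to_le32(flags);
}
```

Writing req->type = 5 would fail to compile here, which is the property a
comment at the top of the file cannot provide.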

> > > +{
> > > +	struct mei_cl_device *cldev;
> > > +	struct csc_heci_late_bind_req *req = NULL;
> > > +	struct csc_heci_late_bind_rsp rsp;
> > > +	size_t req_size;
> > > +	ssize_t ret;
> > > +
> > > +	if (!dev || !payload || !payload_size)
> > > +		return -EINVAL;
> > How can any of these ever happen as you control the callers of this
> > function?
> I will add WARN here.

So you will end up crashing the machine and getting a CVE assigned for
it?

Please no.  If it can't happen, then don't check for it.  If it can
happen, great, handle it properly.  Don't just give up and cause a
system to reboot, that's a horrible way to write kernel code.

thanks,

greg k-h
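
A minimal user-space sketch of the "handle it properly" option: invalid arguments are reported as an ordinary error and the caller propagates it, with no WARN involved (on systems that enable panic_on_warn, a WARN takes the machine down, which is presumably the concern raised above). The function names and the opaque dev pointer are hypothetical and only mirror the argument check quoted from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback mirroring the argument check quoted from the patch. */
int demo_push_config(const void *dev, const void *payload, size_t payload_size)
{
	/* Bad input from an external caller is an ordinary, recoverable error. */
	if (!dev || !payload || !payload_size)
		return -EINVAL;

	/* A real implementation would build and send the request here. */
	return 0;
}

/* Hypothetical caller: it sees the error code and passes it up the stack. */
int demo_load_firmware(const void *dev, const void *fw, size_t fw_size)
{
	int ret = demo_push_config(dev, fw, fw_size);

	if (ret)
		return ret;
	return 0;
}

/* Usage: a missing payload is rejected and nothing else happens. */
void demo_validation_example(void)
{
	int dev = 0;

	assert(demo_load_firmware(&dev, NULL, 16) == -EINVAL);
	assert(demo_load_firmware(&dev, "fw", 2) == 0);
}
```

If the arguments can never be invalid because all callers are under the provider's control, the alternative raised in the review is to delete the check entirely rather than to decorate it.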


* RE: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-06-28 12:18   ` Greg KH
  2025-07-01  8:32     ` Nilawar, Badal
@ 2025-07-01 10:05     ` Usyskin, Alexander
  1 sibling, 0 replies; 38+ messages in thread
From: Usyskin, Alexander @ 2025-07-01 10:05 UTC (permalink / raw)
  To: Greg KH, Nilawar, Badal
  Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, Gupta, Anshuman, Vivi, Rodrigo,
	Ceraolo Spurio, Daniele



> -----Original Message-----
> From: Greg KH <gregkh@linuxfoundation.org>
> Sent: Saturday, June 28, 2025 3:18 PM
> To: Nilawar, Badal <badal.nilawar@intel.com>
> Cc: intel-xe@lists.freedesktop.org; dri-devel@lists.freedesktop.org; linux-
> kernel@vger.kernel.org; Gupta, Anshuman <anshuman.gupta@intel.com>;
> Vivi, Rodrigo <rodrigo.vivi@intel.com>; Usyskin, Alexander
> <alexander.usyskin@intel.com>; Ceraolo Spurio, Daniele
> <daniele.ceraolospurio@intel.com>
> Subject: Re: [PATCH v4 02/10] mei: late_bind: add late binding component
> driver
> 
> On Wed, Jun 25, 2025 at 10:30:07PM +0530, Badal Nilawar wrote:
> > --- /dev/null
> > +++ b/drivers/misc/mei/late_bind/mei_late_bind.c
> > @@ -0,0 +1,281 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2025 Intel Corporation
> > + */
> > +#include <drm/intel/i915_component.h>
> > +#include <drm/intel/late_bind_mei_interface.h>
> > +#include <linux/component.h>
> > +#include <linux/pci.h>
> > +#include <linux/mei_cl_bus.h>
> > +#include <linux/module.h>
> > +#include <linux/overflow.h>
> > +#include <linux/slab.h>
> > +#include <linux/uuid.h>
> > +
> > +#include "mkhi.h"
> > +
> > +#define GFX_SRV_MKHI_LATE_BINDING_CMD 0x12
> > +#define GFX_SRV_MKHI_LATE_BINDING_RSP (GFX_SRV_MKHI_LATE_BINDING_CMD | 0x80)
> > +
> > +#define LATE_BIND_SEND_TIMEOUT_MSEC 3000
> > +#define LATE_BIND_RECV_TIMEOUT_MSEC 3000
> > +
> > +/**
> > + * struct csc_heci_late_bind_req - late binding request
> > + * @header: @ref mkhi_msg_hdr
> > + * @type: type of the late binding payload
> > + * @flags: flags to be passed to the firmware
> > + * @reserved: reserved field
> 
> Reserved for what?  All reserved fields need to be set to a default
> value, please document that here.
> 
> > + * @payload_size: size of the payload data in bytes
> > + * @payload: data to be sent to the firmware
> > + */
> > +struct csc_heci_late_bind_req {
> > +	struct mkhi_msg_hdr header;
> > +	u32 type;
> > +	u32 flags;
> 
> What is the endian of these fields?  And as this crosses the
> kernel/hardware boundry, shouldn't these be __u32?
> 
> > +/**
> > + * struct csc_heci_late_bind_rsp - late binding response
> > + * @header: @ref mkhi_msg_hdr
> > + * @type: type of the late binding payload
> > + * @reserved: reserved field
> > + * @status: status of the late binding command execution by firmware
> > + */
> > +struct csc_heci_late_bind_rsp {
> > +	struct mkhi_msg_hdr header;
> > +	u32 type;
> > +	u32 reserved[2];
> > +	u32 status;
> 
> Same questions as above.
> 
> > +} __packed;
> > +/**
> > + * mei_late_bind_push_config - Sends a config to the firmware.
> > + * @dev: device struct corresponding to the mei device
> > + * @type: payload type
> 
> Shouldn't type be an enum?
> 
> > + * @flags: payload flags
> > + * @payload: payload buffer
> > + * @payload_size: payload buffer size
> > + *
> > + * Return: 0 success, negative errno value on transport failure,
> > + *         positive status returned by FW
> > + */
> > +static int mei_late_bind_push_config(struct device *dev, u32 type, u32 flags,
> > +				     const void *payload, size_t payload_size)
> 
> Why do static functions need kerneldoc formatting?
> 

The push_config function pointer is already documented in late_bind_component_ops.
We can drop the kerneldoc here.

> > +{
> > +	struct mei_cl_device *cldev;
> > +	struct csc_heci_late_bind_req *req = NULL;
> > +	struct csc_heci_late_bind_rsp rsp;
> > +	size_t req_size;
> > +	ssize_t ret;
> > +
> > +	if (!dev || !payload || !payload_size)
> > +		return -EINVAL;
> 
> How can any of these ever happen as you control the callers of this
> function?
> 

This is a callback provided to another driver via the component framework,
so we have no control over the caller.
Should we trust the caller here?

> 
> > +
> > +	cldev = to_mei_cl_device(dev);
> > +
> > +	ret = mei_cldev_enable(cldev);
> > +	if (ret < 0) {
> 
> You mean:
> 	if (ret)
> right?
> 
Yes, mei_cldev_enable should never return >0

> 
> > +		dev_dbg(dev, "mei_cldev_enable failed. %zd\n", ret);
> 
> Why display the error again if this failed?  The caller already did
> that.
> 

It is a separate module, and dynamic debug can be enabled for it separately.
I see it as a debug refinement, but it can be dropped if it seems unneeded.

> And the function returns an int, not a ssize_t, didn't the compiler
> complain?
> 

I have never seen that warning. Do you suggest adding "return (int)ret;",
since we know that at this stage the variable can only hold an error code?

> thanks,
> 
> greg k-h


-- 
Thanks,
Sasha
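
The ssize_t versus int question above can be sketched in user-space C. This is one possible shape, not the patch's code: demo_send() is a hypothetical transport call that returns the byte count or a negative errno, and demo_push() is a hypothetical callback that returns int, narrowing only on the error path where the value is known to be a small negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical transport: bytes sent on success, negative errno on failure. */
ssize_t demo_send(const void *buf, size_t len)
{
	if (!buf || !len)
		return -EINVAL;
	return (ssize_t)len;
}

/* Hypothetical callback: 0 on success, negative errno on failure. */
int demo_push(const void *buf, size_t len)
{
	ssize_t sent = demo_send(buf, len);

	/* Only error codes are narrowed; errno values always fit in an int. */
	if (sent < 0)
		return (int)sent;

	/* A byte count is never returned through the int; a short send is an error. */
	if ((size_t)sent != len)
		return -EIO;

	return 0;
}

/* Usage: byte counts stay inside demo_push(), callers only see 0 or -errno. */
void demo_return_type_example(void)
{
	assert(demo_push("abc", 3) == 0);
	assert(demo_push(NULL, 3) == -EINVAL);
}
```

Keeping a separate int for calls that return only 0 or a negative errno, and an ssize_t only for calls that return sizes, avoids the implicit conversion at the return statement altogether.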




* Re: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-07-01  9:45       ` Greg KH
@ 2025-07-01 12:34         ` Nilawar, Badal
  2025-07-01 16:41           ` Nilawar, Badal
  0 siblings, 1 reply; 38+ messages in thread
From: Nilawar, Badal @ 2025-07-01 12:34 UTC (permalink / raw)
  To: Greg KH
  Cc: Usyskin, Alexander, intel-xe, dri-devel, linux-kernel,
	anshuman.gupta, rodrigo.vivi, daniele.ceraolospurio


On 01-07-2025 15:15, Greg KH wrote:
> On Tue, Jul 01, 2025 at 02:02:15PM +0530, Nilawar, Badal wrote:
>> On 28-06-2025 17:48, Greg KH wrote:
>>>> + * @payload_size: size of the payload data in bytes
>>>> + * @payload: data to be sent to the firmware
>>>> + */
>>>> +struct csc_heci_late_bind_req {
>>>> +	struct mkhi_msg_hdr header;
>>>> +	u32 type;
>>>> +	u32 flags;
>>> What is the endian of these fields?  And as this crosses the
>>> kernel/hardware boundry, shouldn't these be __u32?
>> endian of these fields is little endian, all the headers are little endian.
>> I will add comment at top.
> No, use the proper types if this is little endian.  Don't rely on a
> comment to catch things when it goes wrong.
>
>> On __u32 I doubt we need to do it as csc send copy it to internal buffer.
> If this crosses the kernel boundry, it needs to use the proper type.

Understood. I will proceed with using __le32 in this context, provided 
that Sasha agrees.

>
>>>> +{
>>>> +	struct mei_cl_device *cldev;
>>>> +	struct csc_heci_late_bind_req *req = NULL;
>>>> +	struct csc_heci_late_bind_rsp rsp;
>>>> +	size_t req_size;
>>>> +	ssize_t ret;
>>>> +
>>>> +	if (!dev || !payload || !payload_size)
>>>> +		return -EINVAL;
>>> How can any of these ever happen as you control the callers of this
>>> function?
>> I will add WARN here.
> So you will end up crashing the machine and getting a CVE assigned for
> it?
>
> Please no.  If it can't happen, then don't check for it.  If it can
> happen, great, handle it properly.  Don't just give up and cause a
> system to reboot, that's a horrible way to write kernel code.

Fine, will drop the idea of WARN here.

Thanks,
Badal

>
> thanks,
>
> greg k-h


* Re: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-07-01 12:34         ` Nilawar, Badal
@ 2025-07-01 16:41           ` Nilawar, Badal
  2025-07-01 17:34             ` Rodrigo Vivi
  0 siblings, 1 reply; 38+ messages in thread
From: Nilawar, Badal @ 2025-07-01 16:41 UTC (permalink / raw)
  To: Greg KH
  Cc: Usyskin, Alexander, intel-xe, dri-devel, linux-kernel,
	anshuman.gupta, rodrigo.vivi, daniele.ceraolospurio


On 01-07-2025 18:04, Nilawar, Badal wrote:
>
> On 01-07-2025 15:15, Greg KH wrote:
>> On Tue, Jul 01, 2025 at 02:02:15PM +0530, Nilawar, Badal wrote:
>>> On 28-06-2025 17:48, Greg KH wrote:
>>>>> + * @payload_size: size of the payload data in bytes
>>>>> + * @payload: data to be sent to the firmware
>>>>> + */
>>>>> +struct csc_heci_late_bind_req {
>>>>> +    struct mkhi_msg_hdr header;
>>>>> +    u32 type;
>>>>> +    u32 flags;
>>>> What is the endian of these fields?  And as this crosses the
>>>> kernel/hardware boundry, shouldn't these be __u32?
>>> endian of these fields is little endian, all the headers are little 
>>> endian.
>>> I will add comment at top.
>> No, use the proper types if this is little endian.  Don't rely on a
>> comment to catch things when it goes wrong.
>>
>>> On __u32 I doubt we need to do it as csc send copy it to internal 
>>> buffer.
>> If this crosses the kernel boundry, it needs to use the proper type.
>
> Understood. I will proceed with using __le32 in this context, provided 
> that Sasha agrees.

I believe __le{32, 16} is used only when the byte order is fixed and 
matches the host system's native endianness. Since the CSC controller is 
little-endian, is it necessary to specify the endianness here?
If it is mandatory to use the __le{32, 16} endian types, is there a 
need to convert endianness using cpu_to_le and le_to_cpu?

>
>>
>>>>> +{
>>>>> +    struct mei_cl_device *cldev;
>>>>> +    struct csc_heci_late_bind_req *req = NULL;
>>>>> +    struct csc_heci_late_bind_rsp rsp;
>>>>> +    size_t req_size;
>>>>> +    ssize_t ret;
>>>>> +
>>>>> +    if (!dev || !payload || !payload_size)
>>>>> +        return -EINVAL;
>>>> How can any of these ever happen as you control the callers of this
>>>> function?
>>> I will add WARN here.
>> So you will end up crashing the machine and getting a CVE assigned for
>> it?
>>
>> Please no.  If it can't happen, then don't check for it.  If it can
>> happen, great, handle it properly.  Don't just give up and cause a
>> system to reboot, that's a horrible way to write kernel code.
>
> Fine, will drop the idea of WARN here.
>
> Thanks,
> Badal
>
>>
>> thanks,
>>
>> greg k-h
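
On the cpu_to_le/le_to_cpu question: in the kernel, __le32 is a sparse-checked type, and cpu_to_le32()/le32_to_cpu() are the way to move values in and out of it; on a little-endian host they compile to nothing. The user-space sketch below imitates that with hypothetical demo_* names (demo_le32, demo_cpu_to_le32, demo_le32_to_cpu, demo_late_bind_req), since the kernel helpers are not available outside the kernel. It shows that the bytes in memory come out little-endian on any host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __le32: storage that always holds LE bytes. */
typedef uint32_t demo_le32;

/* Stand-in for cpu_to_le32(): lay the value out low byte first. */
demo_le32 demo_cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	};
	demo_le32 out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Stand-in for le32_to_cpu(): rebuild the host value from LE bytes. */
uint32_t demo_le32_to_cpu(demo_le32 v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Hypothetical request with wire-format fields, loosely after the patch. */
struct demo_late_bind_req {
	demo_le32 type;
	demo_le32 flags;
};

void demo_fill_req(struct demo_late_bind_req *req, uint32_t type, uint32_t flags)
{
	req->type = demo_cpu_to_le32(type);
	req->flags = demo_cpu_to_le32(flags);
}

/* Usage: values round-trip, whatever the host byte order is. */
void demo_endian_example(void)
{
	struct demo_late_bind_req req;

	demo_fill_req(&req, 0x11223344u, 1u << 0);
	assert(demo_le32_to_cpu(req.type) == 0x11223344u);
	assert(demo_le32_to_cpu(req.flags) == 1u);
}
```

The typed field documents the wire format in the code itself, which is the point of the "don't rely on a comment" remark quoted above.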


* Re: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-07-01 16:41           ` Nilawar, Badal
@ 2025-07-01 17:34             ` Rodrigo Vivi
  2025-07-02  4:12               ` Gupta, Anshuman
  2025-07-02  6:24               ` Usyskin, Alexander
  0 siblings, 2 replies; 38+ messages in thread
From: Rodrigo Vivi @ 2025-07-01 17:34 UTC (permalink / raw)
  To: Nilawar, Badal
  Cc: Greg KH, Usyskin, Alexander, intel-xe, dri-devel, linux-kernel,
	anshuman.gupta, daniele.ceraolospurio

On Tue, Jul 01, 2025 at 10:11:54PM +0530, Nilawar, Badal wrote:
> 
> On 01-07-2025 18:04, Nilawar, Badal wrote:
> > 
> > On 01-07-2025 15:15, Greg KH wrote:
> > > On Tue, Jul 01, 2025 at 02:02:15PM +0530, Nilawar, Badal wrote:
> > > > On 28-06-2025 17:48, Greg KH wrote:
> > > > > > + * @payload_size: size of the payload data in bytes
> > > > > > + * @payload: data to be sent to the firmware
> > > > > > + */
> > > > > > +struct csc_heci_late_bind_req {
> > > > > > +    struct mkhi_msg_hdr header;
> > > > > > +    u32 type;
> > > > > > +    u32 flags;
> > > > > What is the endian of these fields?  And as this crosses the
> > > > > kernel/hardware boundry, shouldn't these be __u32?
> > > > endian of these fields is little endian, all the headers are
> > > > little endian.
> > > > I will add comment at top.
> > > No, use the proper types if this is little endian.  Don't rely on a
> > > comment to catch things when it goes wrong.
> > > 
> > > > On __u32 I doubt we need to do it as csc send copy it to
> > > > internal buffer.
> > > If this crosses the kernel boundry, it needs to use the proper type.
> > 
> > Understood. I will proceed with using __le32 in this context, provided
> > that Sasha agrees.
> 
> I believe __le{32, 16} is used only when the byte order is fixed and matches
> the host system's native endianness. Since the CSC controller is
> little-endian, is it necessary to specify the endianness here?
> If it is mandatory to use the __le{32, 16} endian type, then is there need
> to convert endianness using cpu_to_le and le_to_cpu?

I honestly don't believe that specifying endianness here is **needed**.
I mean, it might be future-safe to use __le32 and
flags = cpu_to_le32(1 << 0) just in case someone decides to port all the
GPU code to run on a big-endian CPU. Very unlikely I'd say, and there are many
more cases to resolve before we get to this GPU use case here, I'm afraid.

Well, unless this mei here can be used outside of GPU context?!

> 
> > 
> > > 
> > > > > > +{
> > > > > > +    struct mei_cl_device *cldev;
> > > > > > +    struct csc_heci_late_bind_req *req = NULL;
> > > > > > +    struct csc_heci_late_bind_rsp rsp;
> > > > > > +    size_t req_size;
> > > > > > +    ssize_t ret;
> > > > > > +
> > > > > > +    if (!dev || !payload || !payload_size)
> > > > > > +        return -EINVAL;
> > > > > How can any of these ever happen as you control the callers of this
> > > > > function?
> > > > I will add WARN here.
> > > So you will end up crashing the machine and getting a CVE assigned for
> > > it?
> > > 
> > > Please no.  If it can't happen, then don't check for it.  If it can
> > > happen, great, handle it properly.  Don't just give up and cause a
> > > system to reboot, that's a horrible way to write kernel code.

I agree here that the WARN is not a good way to handle that.
We either don't check (remove it) or handle properly (keep as is).

With the context of where this driver is used I'd say it can't happen.
Since xe is properly setting it right now and I don't believe we have
other usages of this mei driver here.

But if there's a chance of this getting used outside of xe, then
we need to keep the check...

But if you keep the check, then also use __le32 because we need
some consistency in the reasoning, one way or the other.

> > 
> > Fine, will drop the idea of WARN here.
> > 
> > Thanks,
> > Badal
> > 
> > > 
> > > thanks,
> > > 
> > > greg k-h


* RE: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-07-01 17:34             ` Rodrigo Vivi
@ 2025-07-02  4:12               ` Gupta, Anshuman
  2025-07-02  6:24               ` Usyskin, Alexander
  1 sibling, 0 replies; 38+ messages in thread
From: Gupta, Anshuman @ 2025-07-02  4:12 UTC (permalink / raw)
  To: Vivi, Rodrigo, Nilawar, Badal, Usyskin, Alexander
  Cc: Greg KH, intel-xe@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	Ceraolo Spurio, Daniele



> -----Original Message-----
> From: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> Sent: Tuesday, July 1, 2025 11:05 PM
> To: Nilawar, Badal <badal.nilawar@intel.com>
> Cc: Greg KH <gregkh@linuxfoundation.org>; Usyskin, Alexander
> <alexander.usyskin@intel.com>; intel-xe@lists.freedesktop.org; dri-
> devel@lists.freedesktop.org; linux-kernel@vger.kernel.org; Gupta, Anshuman
> <anshuman.gupta@intel.com>; Ceraolo Spurio, Daniele
> <daniele.ceraolospurio@intel.com>
> Subject: Re: [PATCH v4 02/10] mei: late_bind: add late binding component
> driver
> 
> On Tue, Jul 01, 2025 at 10:11:54PM +0530, Nilawar, Badal wrote:
> >
> > On 01-07-2025 18:04, Nilawar, Badal wrote:
> > >
> > > On 01-07-2025 15:15, Greg KH wrote:
> > > > On Tue, Jul 01, 2025 at 02:02:15PM +0530, Nilawar, Badal wrote:
> > > > > On 28-06-2025 17:48, Greg KH wrote:
> > > > > > > + * @payload_size: size of the payload data in bytes
> > > > > > > > + * @payload: data to be sent to the firmware
> > > > > > > > + */
> > > > > > > > +struct csc_heci_late_bind_req {
> > > > > > > +    struct mkhi_msg_hdr header;
> > > > > > > +    u32 type;
> > > > > > > +    u32 flags;
> > > > > > What is the endian of these fields?  And as this crosses the
> > > > > > kernel/hardware boundry, shouldn't these be __u32?
> > > > > endian of these fields is little endian, all the headers are
> > > > > little endian.
> > > > > I will add comment at top.
> > > > No, use the proper types if this is little endian.  Don't rely on
> > > > a comment to catch things when it goes wrong.
> > > >
> > > > > On __u32 I doubt we need to do it as csc send copy it to
> > > > > internal buffer.
> > > > If this crosses the kernel boundry, it needs to use the proper type.
> > >
> > > Understood. I will proceed with using __le32 in this context,
> > > provided that Sasha agrees.
> >
> > I believe __le{32, 16} is used only when the byte order is fixed and
> > matches the host system's native endianness. Since the CSC controller
> > is little-endian, is it necessary to specify the endianness here?
> > If it is mandatory to use the __le{32, 16} endian type, then is there
> > need to convert endianness using cpu_to_le and le_to_cpu?
> 
> I honestly don't believe that specifying endianness here is **needed**.
> I mean, it might be future safe to use the __le32 and flags = cpu_to_le32(1 <<
> 0) just in case someone decide to port all the GPU code to run in big-endian
> CPU. Very unlikely I'd say, and much more cases to resolve before we get to
> this gpu use case here I'm afraid.
> 
> Well, unless this mei here can be used outside of GPU context?!
MEI is the interface driver for the CSC firmware, which is also part of our GPU.
So it is completely unrealistic for the CSC to have a different endianness than the host and the GPU.
@Usyskin, Alexander what is your opinion?
Thanks,
Anshuman.


> 
> >
> > >
> > > >
> > > > > > > +{
> > > > > > > +    struct mei_cl_device *cldev;
> > > > > > > +    struct csc_heci_late_bind_req *req = NULL;
> > > > > > > +    struct csc_heci_late_bind_rsp rsp;
> > > > > > > +    size_t req_size;
> > > > > > > +    ssize_t ret;
> > > > > > > +
> > > > > > > +    if (!dev || !payload || !payload_size)
> > > > > > > +        return -EINVAL;
> > > > > > How can any of these ever happen as you control the callers of
> > > > > > this function?
> > > > > I will add WARN here.
> > > > So you will end up crashing the machine and getting a CVE assigned
> > > > for it?
> > > >
> > > > Please no.  If it can't happen, then don't check for it.  If it
> > > > can happen, great, handle it properly.  Don't just give up and
> > > > cause a system to reboot, that's a horrible way to write kernel code.
> 
> I agree here that the WARN is not a good way to handle that.
> We either don't check (remove it) or handle properly (keep as is).
> 
> With the context of where this driver is used I'd say it can't happen.
> Since xe is properly setting it right now and I don't believe we have other
> usages of this mei driver here.
> 
> But if there's a chance of this getting used outside of xe, then we need to keep
> the check...
> 
> But if you keep the check, then also use __le32 because we need some
> consistency in the reasoning, one way or the other.
> 
> > >
> > > Fine, will drop the idea of WARN here.
> > >
> > > Thanks,
> > > Badal
> > >
> > > >
> > > > thanks,
> > > >
> > > > greg k-h


* RE: [PATCH v4 02/10] mei: late_bind: add late binding component driver
  2025-07-01 17:34             ` Rodrigo Vivi
  2025-07-02  4:12               ` Gupta, Anshuman
@ 2025-07-02  6:24               ` Usyskin, Alexander
  1 sibling, 0 replies; 38+ messages in thread
From: Usyskin, Alexander @ 2025-07-02  6:24 UTC (permalink / raw)
  To: Vivi, Rodrigo, Nilawar, Badal
  Cc: Greg KH, intel-xe@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	Gupta, Anshuman, Ceraolo Spurio, Daniele

> Subject: Re: [PATCH v4 02/10] mei: late_bind: add late binding component
> driver
> 
> On Tue, Jul 01, 2025 at 10:11:54PM +0530, Nilawar, Badal wrote:
> >
> > On 01-07-2025 18:04, Nilawar, Badal wrote:
> > >
> > > On 01-07-2025 15:15, Greg KH wrote:
> > > > On Tue, Jul 01, 2025 at 02:02:15PM +0530, Nilawar, Badal wrote:
> > > > > On 28-06-2025 17:48, Greg KH wrote:
> > > > > > > + * @payload_size: size of the payload data in bytes
> > > > > > > + * @payload: data to be sent to the firmware
> > > > > > > + */
> > > > > > > +struct csc_heci_late_bind_req {
> > > > > > > +    struct mkhi_msg_hdr header;
> > > > > > > +    u32 type;
> > > > > > > +    u32 flags;
> > > > > > What is the endian of these fields?  And as this crosses the
> > > > > > kernel/hardware boundry, shouldn't these be __u32?
> > > > > endian of these fields is little endian, all the headers are
> > > > > little endian.
> > > > > I will add comment at top.
> > > > No, use the proper types if this is little endian.  Don't rely on a
> > > > comment to catch things when it goes wrong.
> > > >
> > > > > On __u32 I doubt we need to do it as csc send copy it to
> > > > > internal buffer.
> > > > If this crosses the kernel boundry, it needs to use the proper type.
> > >
> > > Understood. I will proceed with using __le32 in this context, provided
> > > that Sasha agrees.
> >
> > I believe __le{32, 16} is used only when the byte order is fixed and matches
> > the host system's native endianness. Since the CSC controller is
> > little-endian, is it necessary to specify the endianness here?
> > If it is mandatory to use the __le{32, 16} endian type, then is there need
> > to convert endianness using cpu_to_le and le_to_cpu?
> 
> I honestly don't believe that specifying endianness here is **needed**.
> I mean, it might be future safe to use the __le32 and
> flags = cpu_to_le32(1 << 0) just in case someone decide to port all the
> GPU code to run in big-endian CPU. Very unlikely I'd say, and much more cases
> to resolve before we get to this gpu use case here I'm afraid.
> 
> Well, unless this mei here can be used outside of GPU context?!
> 

There is nothing useful in this outside of GPU context.
This module is tailored for GPU use-case.
If Xe driver is bound to be little-endian, this one should be too.
Other similar modules use u32.

-- 
Thanks,
Sasha


> >
> > >
> > > >
> > > > > > > +{
> > > > > > > +    struct mei_cl_device *cldev;
> > > > > > > +    struct csc_heci_late_bind_req *req = NULL;
> > > > > > > +    struct csc_heci_late_bind_rsp rsp;
> > > > > > > +    size_t req_size;
> > > > > > > +    ssize_t ret;
> > > > > > > +
> > > > > > > +    if (!dev || !payload || !payload_size)
> > > > > > > +        return -EINVAL;
> > > > > > How can any of these ever happen as you control the callers of this
> > > > > > function?
> > > > > I will add WARN here.
> > > > So you will end up crashing the machine and getting a CVE assigned for
> > > > it?
> > > >
> > > > Please no.  If it can't happen, then don't check for it.  If it can
> > > > happen, great, handle it properly.  Don't just give up and cause a
> > > > system to reboot, that's a horrible way to write kernel code.
> 
> I agree here that the WARN is not a good way to handle that.
> We either don't check (remove it) or handle properly (keep as is).
> 
> With the context of where this driver is used I'd say it can't happen.
> Since xe is properly setting it right now and I don't believe we have
> other usages of this mei driver here.
> 
> But if there's a chance of this getting used outside of xe, then
> we need to keep the check...
> 
> But if you keep the check, then also use __le32 because we need
> some consistency in the reasoning, one way or the other.
> 
> > >
> > > Fine, will drop the idea of WARN here.
> > >
> > > Thanks,
> > > Badal
> > >
> > > >
> > > > thanks,
> > > >
> > > > greg k-h


end of thread, other threads:[~2025-07-02  6:24 UTC | newest]

Thread overview: 38+ messages
2025-06-25 17:00 [PATCH v4 00/10] Introducing firmware late binding Badal Nilawar
2025-06-25 17:00 ` [PATCH v4 01/10] mei: bus: add mei_cldev_mtu interface Badal Nilawar
2025-06-25 17:00 ` [PATCH v4 02/10] mei: late_bind: add late binding component driver Badal Nilawar
2025-06-26  3:50   ` Gupta, Anshuman
2025-06-27 14:06     ` Nilawar, Badal
2025-06-28 12:18   ` Greg KH
2025-07-01  8:32     ` Nilawar, Badal
2025-07-01  9:45       ` Greg KH
2025-07-01 12:34         ` Nilawar, Badal
2025-07-01 16:41           ` Nilawar, Badal
2025-07-01 17:34             ` Rodrigo Vivi
2025-07-02  4:12               ` Gupta, Anshuman
2025-07-02  6:24               ` Usyskin, Alexander
2025-07-01 10:05     ` Usyskin, Alexander
2025-06-28 12:19   ` Greg KH
2025-07-01  8:07     ` Nilawar, Badal
2025-07-01  8:17       ` Greg KH
2025-07-01  8:22         ` Nilawar, Badal
2025-07-01  8:32           ` Usyskin, Alexander
2025-06-25 17:00 ` [PATCH v4 03/10] drm/xe/xe_late_bind_fw: Introducing xe_late_bind_fw Badal Nilawar
2025-06-27 21:04   ` Rodrigo Vivi
2025-06-30 13:49     ` Nilawar, Badal
2025-06-25 17:00 ` [PATCH v4 04/10] drm/xe/xe_late_bind_fw: Initialize late binding firmware Badal Nilawar
2025-06-26 21:06   ` Daniele Ceraolo Spurio
2025-06-27 12:48     ` Nilawar, Badal
2025-06-25 17:00 ` [PATCH v4 05/10] drm/xe/xe_late_bind_fw: Load " Badal Nilawar
2025-06-26 17:24   ` Rodrigo Vivi
2025-06-26 21:27     ` Daniele Ceraolo Spurio
2025-06-26 21:49       ` Rodrigo Vivi
2025-06-26 22:38         ` Daniele Ceraolo Spurio
2025-06-26 22:49           ` Rodrigo Vivi
2025-06-25 17:00 ` [PATCH v4 06/10] drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume Badal Nilawar
2025-06-25 17:00 ` [PATCH v4 07/10] drm/xe/xe_late_bind_fw: Reload late binding fw during system resume Badal Nilawar
2025-06-27  7:53   ` Nilawar, Badal
2025-06-25 17:00 ` [PATCH v4 08/10] drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late binding Badal Nilawar
2025-06-25 17:00 ` [PATCH v4 09/10] drm/xe/xe_late_bind_fw: Extract and print version info Badal Nilawar
2025-06-26 21:32   ` Daniele Ceraolo Spurio
2025-06-25 17:00 ` [PATCH v4 10/10] drm/xe/xe_late_bind_fw: Select INTEL_MEI_LATE_BIND for CI Badal Nilawar
