* [PATCH 0/3] accel/qaic: Add Sub-system restart (SSR)
@ 2025-10-22 20:25 Youssef Samir
2025-10-22 20:25 ` [PATCH 1/3] accel/qaic: Add DMA Bridge Channel(DBC) sysfs and uevents Youssef Samir
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Youssef Samir @ 2025-10-22 20:25 UTC (permalink / raw)
To: jeff.hugo, carl.vanderlip, troy.hanson, zachary.mckevitt
Cc: ogabbay, lizhi.hou, karol.wachowski, linux-arm-msm, dri-devel
SSR is a feature that mitigates a crash in device sub-system. Usually,
after a workload (using a sub-system) has crashed on the device, the
entire device crashes affecting all the workloads on device.
SSR is used to limit the damage only to that particular workload and
releases the resources used by it, leaving the decision to the user.
Applications are informed when SSR starts and ends via udev notifications.
All ongoing requests for that particular workload will be lost.
During SSR the affected DBC changes state as follows:
DBC_STATE_BEFORE_SHUTDOWN
DBC_STATE_AFTER_SHUTDOWN
DBC_STATE_BEFORE_POWER_UP
DBC_STATE_AFTER_POWER_UP
In addition to supporting the sub-system to recover from a crash, the
device can optionally use SSR to send a crashdump.
Jeff Hugo (1):
accel/qaic: Implement basic SSR handling
Pranjal Ramajor Asha Kanojiya (2):
accel/qaic: Add DMA Bridge Channel(DBC) sysfs and uevents
accel/qaic: Collect crashdump from SSR channel
Documentation/ABI/stable/sysfs-driver-qaic | 16 +
Documentation/accel/qaic/aic100.rst | 24 +-
drivers/accel/qaic/Kconfig | 1 +
drivers/accel/qaic/Makefile | 2 +
drivers/accel/qaic/qaic.h | 36 +
drivers/accel/qaic/qaic_control.c | 2 +
drivers/accel/qaic/qaic_data.c | 61 +-
drivers/accel/qaic/qaic_drv.c | 25 +
drivers/accel/qaic/qaic_ssr.c | 819 +++++++++++++++++++++
drivers/accel/qaic/qaic_ssr.h | 16 +
drivers/accel/qaic/qaic_sysfs.c | 109 +++
11 files changed, 1102 insertions(+), 9 deletions(-)
create mode 100644 Documentation/ABI/stable/sysfs-driver-qaic
create mode 100644 drivers/accel/qaic/qaic_ssr.c
create mode 100644 drivers/accel/qaic/qaic_ssr.h
create mode 100644 drivers/accel/qaic/qaic_sysfs.c
--
2.43.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/3] accel/qaic: Add DMA Bridge Channel(DBC) sysfs and uevents
2025-10-22 20:25 [PATCH 0/3] accel/qaic: Add Sub-system restart (SSR) Youssef Samir
@ 2025-10-22 20:25 ` Youssef Samir
2025-10-23 17:41 ` kernel test robot
2025-10-22 20:25 ` [PATCH 2/3] accel/qaic: Implement basic SSR handling Youssef Samir
2025-10-22 20:25 ` [PATCH 3/3] accel/qaic: Collect crashdump from SSR channel Youssef Samir
2 siblings, 1 reply; 5+ messages in thread
From: Youssef Samir @ 2025-10-22 20:25 UTC (permalink / raw)
To: jeff.hugo, carl.vanderlip, troy.hanson, zachary.mckevitt
Cc: ogabbay, lizhi.hou, karol.wachowski, linux-arm-msm, dri-devel
From: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Expose sysfs files for each DBC representing the current state of that DBC.
For example, sysfs for DBC ID 0 and accel minor number 0 looks like this,
/sys/class/accel/accel0/dbc0_state
Following are the states and their corresponding values,
DBC_STATE_IDLE (0)
DBC_STATE_ASSIGNED (1)
DBC_STATE_BEFORE_SHUTDOWN (2)
DBC_STATE_AFTER_SHUTDOWN (3)
DBC_STATE_BEFORE_POWER_UP (4)
DBC_STATE_AFTER_POWER_UP (5)
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
---
Documentation/ABI/stable/sysfs-driver-qaic | 16 +++
drivers/accel/qaic/Makefile | 1 +
drivers/accel/qaic/qaic.h | 25 +++++
drivers/accel/qaic/qaic_control.c | 2 +
drivers/accel/qaic/qaic_drv.c | 8 ++
drivers/accel/qaic/qaic_sysfs.c | 109 +++++++++++++++++++++
6 files changed, 161 insertions(+)
create mode 100644 Documentation/ABI/stable/sysfs-driver-qaic
create mode 100644 drivers/accel/qaic/qaic_sysfs.c
diff --git a/Documentation/ABI/stable/sysfs-driver-qaic b/Documentation/ABI/stable/sysfs-driver-qaic
new file mode 100644
index 000000000000..d967e8a20c1c
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-driver-qaic
@@ -0,0 +1,16 @@
+What: /sys/bus/pci/drivers/qaic/XXXX:XX:XX.X/accel/accel<minor_nr>/dbc<N>_state
+Date: October 2025
+KernelVersion: 6.19
+Contact: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
+Description: Represents the current state of DMA Bridge channel (DBC). Below are the possible
+ states,
+ IDLE (0) - DBC is free and can be activated
+ ASSIGNED (1) - DBC is activated and a workload is running on device
+ BEFORE_SHUTDOWN (2) - Sub-system associated with this workload has crashed and
+ it will shutdown soon
+ AFTER_SHUTDOWN (3) - Sub-system associated with this workload has crashed and
+ it has shutdown
+ BEFORE_POWER_UP (4) - Sub-system associated with this workload is shutdown and
+ it will be powered up soon
+ AFTER_POWER_UP (5) - Sub-system associated with this workload is now powered up
+Users: Any userspace application or clients interested in DBC state.
diff --git a/drivers/accel/qaic/Makefile b/drivers/accel/qaic/Makefile
index 1106b876f737..8f6746e5f03a 100644
--- a/drivers/accel/qaic/Makefile
+++ b/drivers/accel/qaic/Makefile
@@ -11,6 +11,7 @@ qaic-y := \
qaic_data.o \
qaic_drv.o \
qaic_ras.o \
+ qaic_sysfs.o \
qaic_timesync.o \
sahara.o
diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
index 820d133236dd..4c2f25249e95 100644
--- a/drivers/accel/qaic/qaic.h
+++ b/drivers/accel/qaic/qaic.h
@@ -47,6 +47,22 @@ enum __packed dev_states {
QAIC_ONLINE,
};
+enum dbc_states {
+ /* DBC is free and can be activated */
+ DBC_STATE_IDLE,
+ /* DBC is activated and a workload is running on device */
+ DBC_STATE_ASSIGNED,
+ /* Sub-system associated with this workload has crashed and it will shutdown soon */
+ DBC_STATE_BEFORE_SHUTDOWN,
+ /* Sub-system associated with this workload has crashed and it has shutdown */
+ DBC_STATE_AFTER_SHUTDOWN,
+ /* Sub-system associated with this workload is shutdown and it will be powered up soon */
+ DBC_STATE_BEFORE_POWER_UP,
+ /* Sub-system associated with this workload is now powered up */
+ DBC_STATE_AFTER_POWER_UP,
+ DBC_STATE_MAX,
+};
+
extern bool datapath_polling;
struct qaic_user {
@@ -114,6 +130,8 @@ struct dma_bridge_chan {
unsigned int irq;
/* Polling work item to simulate interrupts */
struct work_struct poll_work;
+ /* Represents various states of this DBC from enum dbc_states */
+ unsigned int state;
};
struct qaic_device {
@@ -195,6 +213,8 @@ struct qaic_drm_device {
struct list_head users;
/* Synchronizes access to users list */
struct mutex users_mutex;
+ /* Pointer to array of DBC sysfs attributes */
+ void *sysfs_attrs;
};
struct qaic_bo {
@@ -319,4 +339,9 @@ int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data, struct drm_file
int qaic_detach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv);
void irq_polling_work(struct work_struct *work);
+/* qaic_sysfs.c */
+int qaic_sysfs_init(struct qaic_drm_device *qddev);
+void qaic_sysfs_remove(struct qaic_drm_device *qddev);
+void set_dbc_state(struct qaic_device *qdev, u32 dbc_id, unsigned int state);
+
#endif /* _QAIC_H_ */
diff --git a/drivers/accel/qaic/qaic_control.c b/drivers/accel/qaic/qaic_control.c
index b86a8e48e731..b1ab1282f9d3 100644
--- a/drivers/accel/qaic/qaic_control.c
+++ b/drivers/accel/qaic/qaic_control.c
@@ -309,6 +309,7 @@ static void save_dbc_buf(struct qaic_device *qdev, struct ioctl_resources *resou
enable_dbc(qdev, dbc_id, usr);
qdev->dbc[dbc_id].in_use = true;
resources->buf = NULL;
+ set_dbc_state(qdev, dbc_id, DBC_STATE_ASSIGNED);
}
}
@@ -921,6 +922,7 @@ static int decode_deactivate(struct qaic_device *qdev, void *trans, u32 *msg_len
}
release_dbc(qdev, dbc_id);
+ set_dbc_state(qdev, dbc_id, DBC_STATE_IDLE);
*msg_len += sizeof(*in_trans);
return 0;
diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
index e162f4b8a262..a8a16f20320f 100644
--- a/drivers/accel/qaic/qaic_drv.c
+++ b/drivers/accel/qaic/qaic_drv.c
@@ -270,6 +270,13 @@ static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id)
return ret;
}
+ ret = qaic_sysfs_init(qddev);
+ if (ret) {
+ drm_dev_unregister(drm);
+ pci_dbg(qdev->pdev, "qaic_sysfs_init failed %d\n", ret);
+ return ret;
+ }
+
qaic_debugfs_init(qddev);
return ret;
@@ -281,6 +288,7 @@ static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id)
struct drm_device *drm = to_drm(qddev);
struct qaic_user *usr;
+ qaic_sysfs_remove(qddev);
drm_dev_unregister(drm);
qddev->partition_id = 0;
/*
diff --git a/drivers/accel/qaic/qaic_sysfs.c b/drivers/accel/qaic/qaic_sysfs.c
new file mode 100644
index 000000000000..5d7aa1a5f4d1
--- /dev/null
+++ b/drivers/accel/qaic/qaic_sysfs.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2020-2025, The Linux Foundation. All rights reserved. */
+
+#include <drm/drm_file.h>
+#include <drm/drm_managed.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/kobject.h>
+#include <linux/mutex.h>
+#include <linux/sysfs.h>
+
+#include "qaic.h"
+
+#define NAME_LEN 14
+
+struct dbc_attribute {
+ struct device_attribute dev_attr;
+ u32 dbc_id;
+ char name[NAME_LEN];
+};
+
+static ssize_t dbc_state_show(struct device *dev, struct device_attribute *a, char *buf)
+{
+ struct dbc_attribute *dbc_attr = container_of(a, struct dbc_attribute, dev_attr);
+ struct drm_minor *minor = dev_get_drvdata(dev);
+ struct qaic_device *qdev;
+
+ qdev = to_qaic_device(minor->dev);
+ return sysfs_emit(buf, "%d\n", qdev->dbc[dbc_attr->dbc_id].state);
+}
+
+void set_dbc_state(struct qaic_device *qdev, u32 dbc_id, unsigned int state)
+{
+ struct device *kdev = to_accel_kdev(qdev->qddev);
+ char *envp[3] = {};
+ char state_str[16];
+ char id_str[12];
+
+ envp[0] = id_str;
+ envp[1] = state_str;
+
+ if (state >= DBC_STATE_MAX)
+ return;
+ if (dbc_id >= qdev->num_dbc)
+ return;
+ if (state == qdev->dbc[dbc_id].state)
+ return;
+
+ snprintf(id_str, ARRAY_SIZE(id_str), "DBC_ID=%d", dbc_id);
+ snprintf(state_str, ARRAY_SIZE(state_str), "DBC_STATE=%d", state);
+
+ qdev->dbc[dbc_id].state = state;
+ kobject_uevent_env(&kdev->kobj, KOBJ_CHANGE, envp);
+}
+
+int qaic_sysfs_init(struct qaic_drm_device *qddev)
+{
+ struct device *kdev = to_accel_kdev(qddev);
+ struct drm_device *drm = to_drm(qddev);
+ u32 num_dbc = qddev->qdev->num_dbc;
+ struct dbc_attribute *dbc_attrs;
+ int i, ret;
+
+ dbc_attrs = drmm_kcalloc(drm, num_dbc, sizeof(*dbc_attrs), GFP_KERNEL);
+ if (!dbc_attrs)
+ return -ENOMEM;
+
+ for (i = 0; i < num_dbc; ++i) {
+ struct dbc_attribute *dbc_attr = &dbc_attrs[i];
+
+ sysfs_attr_init(&dbc_attr->dev_attr.attr);
+ dbc_attr->dbc_id = i;
+ snprintf(dbc_attr->name, NAME_LEN, "dbc%d_state", i);
+ dbc_attr->dev_attr.attr.name = dbc_attr->name;
+ dbc_attr->dev_attr.attr.mode = 0444;
+ dbc_attr->dev_attr.show = dbc_state_show;
+ ret = sysfs_create_file(&kdev->kobj, &dbc_attr->dev_attr.attr);
+ if (ret) {
+ int j;
+
+ for (j = 0; j < i; ++j) {
+ dbc_attr = &dbc_attrs[j];
+ sysfs_remove_file(&kdev->kobj, &dbc_attr->dev_attr.attr);
+ }
+ drmm_kfree(drm, dbc_attrs);
+ return ret;
+ }
+ }
+
+ qddev->sysfs_attrs = dbc_attrs;
+ return 0;
+}
+
+void qaic_sysfs_remove(struct qaic_drm_device *qddev)
+{
+ struct dbc_attribute *dbc_attrs = qddev->sysfs_attrs;
+ struct device *kdev = to_accel_kdev(qddev);
+ u32 num_dbc = qddev->qdev->num_dbc;
+ int i;
+
+ if (!dbc_attrs)
+ return;
+
+ qddev->sysfs_attrs = NULL;
+ for (i = 0; i < num_dbc; ++i)
+ sysfs_remove_file(&kdev->kobj, &dbc_attrs[i].dev_attr.attr);
+ drmm_kfree(to_drm(qddev), dbc_attrs);
+}
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/3] accel/qaic: Implement basic SSR handling
2025-10-22 20:25 [PATCH 0/3] accel/qaic: Add Sub-system restart (SSR) Youssef Samir
2025-10-22 20:25 ` [PATCH 1/3] accel/qaic: Add DMA Bridge Channel(DBC) sysfs and uevents Youssef Samir
@ 2025-10-22 20:25 ` Youssef Samir
2025-10-22 20:25 ` [PATCH 3/3] accel/qaic: Collect crashdump from SSR channel Youssef Samir
2 siblings, 0 replies; 5+ messages in thread
From: Youssef Samir @ 2025-10-22 20:25 UTC (permalink / raw)
To: jeff.hugo, carl.vanderlip, troy.hanson, zachary.mckevitt
Cc: ogabbay, lizhi.hou, karol.wachowski, linux-arm-msm, dri-devel,
Jeffrey Hugo, Pranjal Ramajor Asha Kanojiya, Troy Hanson,
Aswin Venkatesan
From: Jeffrey Hugo <jhugo@codeaurora.org>
Subsystem restart (SSR) for a qaic device means that a NSP has crashed,
and will be restarted. However the restart process will lose any state
associated with activation, so the user will need to do some recovery.
While SSR has the provision to collect a crash dump, this patch does not
support it.
Co-developed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Co-developed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Co-developed-by: Troy Hanson <quic_thanson@quicinc.com>
Signed-off-by: Troy Hanson <quic_thanson@quicinc.com>
Co-developed-by: Aswin Venkatesan <aswivenk@qti.qualcomm.com>
Signed-off-by: Aswin Venkatesan <aswivenk@qti.qualcomm.com>
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
---
Documentation/accel/qaic/aic100.rst | 24 ++-
drivers/accel/qaic/Makefile | 1 +
drivers/accel/qaic/qaic.h | 9 +
drivers/accel/qaic/qaic_data.c | 61 +++++-
drivers/accel/qaic/qaic_drv.c | 17 ++
drivers/accel/qaic/qaic_ssr.c | 293 ++++++++++++++++++++++++++++
drivers/accel/qaic/qaic_ssr.h | 16 ++
7 files changed, 412 insertions(+), 9 deletions(-)
create mode 100644 drivers/accel/qaic/qaic_ssr.c
create mode 100644 drivers/accel/qaic/qaic_ssr.h
diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst
index 273da6192fb3..3b287c3987d2 100644
--- a/Documentation/accel/qaic/aic100.rst
+++ b/Documentation/accel/qaic/aic100.rst
@@ -487,8 +487,8 @@ one user crashes, the fallout of that should be limited to that workload and not
impact other workloads. SSR accomplishes this.
If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
-channel. This notification identifies the workload by it's assigned DBC. A
-multi-stage recovery process is then used to cleanup both sides, and get the
+channel. This notification identifies the workload by its assigned DBC. A
+multi-stage recovery process is then used to cleanup both sides, and gets the
DBC/NSPs into a working state.
When SSR occurs, any state in the workload is lost. Any inputs that were in
@@ -496,6 +496,26 @@ process, or queued by not yet serviced, are lost. The loaded artifacts will
remain in on-card DDR, but the host will need to re-activate the workload if
it desires to recover the workload.
+When SSR occurs for a specific NSP, the assigned DBC goes through the
+following state transactions in order:
+DBC_STATE_BEFORE_SHUTDOWN
+ Indicates that the affected NSP was found in an unrecoverable error
+ condition.
+DBC_STATE_AFTER_SHUTDOWN
+ Indicates that the NSP is under reset.
+DBC_STATE_BEFORE_POWER_UP
+ Indicates that the NSP's debug information has been collected, and is
+ ready to be collected by the host (if desired). At that stage the NSP
+ is restarted by QSM.
+DBC_STATE_AFTER_POWER_UP
+ Indicates that the NSP has been restarted, fully operational and is
+ in idle state.
+
+SSR also has an optional crashdump collection feature. If enabled, the host can
+collect the memory dump for the crashed NSP and dump it to the user space via
+the dev_coredump subsystem. The host can also decline the crashdump collection
+request from the device.
+
Reliability, Accessibility, Serviceability (RAS)
================================================
diff --git a/drivers/accel/qaic/Makefile b/drivers/accel/qaic/Makefile
index 8f6746e5f03a..71f727b74da3 100644
--- a/drivers/accel/qaic/Makefile
+++ b/drivers/accel/qaic/Makefile
@@ -11,6 +11,7 @@ qaic-y := \
qaic_data.o \
qaic_drv.o \
qaic_ras.o \
+ qaic_ssr.o \
qaic_sysfs.o \
qaic_timesync.o \
sahara.o
diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
index 4c2f25249e95..3bd37b494d49 100644
--- a/drivers/accel/qaic/qaic.h
+++ b/drivers/accel/qaic/qaic.h
@@ -21,6 +21,7 @@
#define QAIC_DBC_BASE SZ_128K
#define QAIC_DBC_SIZE SZ_4K
+#define SSR_DBC_SENTINEL U32_MAX /* No ongoing SSR sentinel */
#define QAIC_NO_PARTITION -1
@@ -195,6 +196,12 @@ struct qaic_device {
unsigned int ue_count;
/* Un-correctable non-fatal error count */
unsigned int ue_nf_count;
+ /* MHI SSR channel device */
+ struct mhi_device *ssr_ch;
+ /* Work queue for tasks related to MHI SSR device */
+ struct workqueue_struct *ssr_wq;
+ /* DBC which is under SSR. Sentinel U32_MAX would mean that no SSR in progress */
+ u32 ssr_dbc;
};
struct qaic_drm_device {
@@ -338,6 +345,8 @@ int qaic_wait_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file
int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv);
int qaic_detach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv);
void irq_polling_work(struct work_struct *work);
+void dbc_enter_ssr(struct qaic_device *qdev, u32 dbc_id);
+void dbc_exit_ssr(struct qaic_device *qdev);
/* qaic_sysfs.c */
int qaic_sysfs_init(struct qaic_drm_device *qddev);
diff --git a/drivers/accel/qaic/qaic_data.c b/drivers/accel/qaic/qaic_data.c
index c4f117edb266..018c97e7ab72 100644
--- a/drivers/accel/qaic/qaic_data.c
+++ b/drivers/accel/qaic/qaic_data.c
@@ -1023,6 +1023,11 @@ int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_fi
goto unlock_ch_srcu;
}
+ if (dbc->id == qdev->ssr_dbc) {
+ ret = -EPIPE;
+ goto unlock_ch_srcu;
+ }
+
ret = qaic_prepare_bo(qdev, bo, &args->hdr);
if (ret)
goto unlock_ch_srcu;
@@ -1356,6 +1361,11 @@ static int __qaic_execute_bo_ioctl(struct drm_device *dev, void *data, struct dr
goto release_ch_rcu;
}
+ if (dbc->id == qdev->ssr_dbc) {
+ ret = -EPIPE;
+ goto release_ch_rcu;
+ }
+
ret = mutex_lock_interruptible(&dbc->req_lock);
if (ret)
goto release_ch_rcu;
@@ -1709,6 +1719,11 @@ int qaic_wait_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file
goto unlock_ch_srcu;
}
+ if (dbc->id == qdev->ssr_dbc) {
+ ret = -EPIPE;
+ goto unlock_ch_srcu;
+ }
+
obj = drm_gem_object_lookup(file_priv, args->handle);
if (!obj) {
ret = -ENOENT;
@@ -1729,6 +1744,9 @@ int qaic_wait_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file
if (!dbc->usr)
ret = -EPERM;
+ if (dbc->id == qdev->ssr_dbc)
+ ret = -EPIPE;
+
put_obj:
drm_gem_object_put(obj);
unlock_ch_srcu:
@@ -1927,6 +1945,17 @@ static void empty_xfer_list(struct qaic_device *qdev, struct dma_bridge_chan *db
spin_unlock_irqrestore(&dbc->xfer_lock, flags);
}
+static void sync_empty_xfer_list(struct qaic_device *qdev, struct dma_bridge_chan *dbc)
+{
+ empty_xfer_list(qdev, dbc);
+ synchronize_srcu(&dbc->ch_lock);
+ /*
+ * Threads holding channel lock, may add more elements in the xfer_list.
+ * Flush out these elements from xfer_list.
+ */
+ empty_xfer_list(qdev, dbc);
+}
+
int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr)
{
if (!qdev->dbc[dbc_id].usr || qdev->dbc[dbc_id].usr->handle != usr->handle)
@@ -1955,13 +1984,7 @@ void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id)
struct dma_bridge_chan *dbc = &qdev->dbc[dbc_id];
dbc->usr = NULL;
- empty_xfer_list(qdev, dbc);
- synchronize_srcu(&dbc->ch_lock);
- /*
- * Threads holding channel lock, may add more elements in the xfer_list.
- * Flush out these elements from xfer_list.
- */
- empty_xfer_list(qdev, dbc);
+ sync_empty_xfer_list(qdev, dbc);
}
void release_dbc(struct qaic_device *qdev, u32 dbc_id)
@@ -2002,3 +2025,27 @@ void qaic_data_get_fifo_info(struct dma_bridge_chan *dbc, u32 *head, u32 *tail)
*head = readl(dbc->dbc_base + REQHP_OFF);
*tail = readl(dbc->dbc_base + REQTP_OFF);
}
+
+/*
+ * dbc_enter_ssr - Prepare to enter in sub system reset(SSR) for given DBC ID.
+ * The device will automatically deactivate the workload as not
+ * all errors can be silently recovered. The user will be
+ * notified and will need to decide the required recovery
+ * action to take.
+ * @qdev: qaic device handle
+ * @dbc_id: ID of the DBC which will enter SSR
+ */
+void dbc_enter_ssr(struct qaic_device *qdev, u32 dbc_id)
+{
+ qdev->ssr_dbc = dbc_id;
+ release_dbc(qdev, dbc_id);
+}
+
+/*
+ * dbc_exit_ssr - Prepare to exit from sub system reset(SSR) for given DBC ID
+ * @qdev: qaic device handle
+ */
+void dbc_exit_ssr(struct qaic_device *qdev)
+{
+ qdev->ssr_dbc = SSR_DBC_SENTINEL;
+}
diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
index a8a16f20320f..4aac8d1eba8c 100644
--- a/drivers/accel/qaic/qaic_drv.c
+++ b/drivers/accel/qaic/qaic_drv.c
@@ -30,6 +30,7 @@
#include "qaic.h"
#include "qaic_debugfs.h"
#include "qaic_ras.h"
+#include "qaic_ssr.h"
#include "qaic_timesync.h"
#include "sahara.h"
@@ -390,6 +391,7 @@ void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
qaic_notify_reset(qdev);
/* start tearing things down */
+ clean_up_ssr(qdev);
for (i = 0; i < qdev->num_dbc; ++i)
release_dbc(qdev, i);
}
@@ -439,11 +441,18 @@ static struct qaic_device *create_qdev(struct pci_dev *pdev,
qdev->qts_wq = qaicm_wq_init(drm, "qaic_ts");
if (IS_ERR(qdev->qts_wq))
return NULL;
+ qdev->ssr_wq = qaicm_wq_init(drm, "qaic_ssr");
+ if (IS_ERR(qdev->ssr_wq))
+ return NULL;
ret = qaicm_srcu_init(drm, &qdev->dev_lock);
if (ret)
return NULL;
+ ret = ssr_init(qdev, drm);
+ if (ret)
+ pci_info(pdev, "QAIC SSR crashdump collection not supported.\n");
+
qdev->qddev = qddev;
qdev->pdev = pdev;
qddev->qdev = qdev;
@@ -710,9 +719,16 @@ static int __init qaic_init(void)
ret = qaic_ras_register();
if (ret)
pr_debug("qaic: qaic_ras_register failed %d\n", ret);
+ ret = qaic_ssr_register();
+ if (ret) {
+ pr_debug("qaic: qaic_ssr_register failed %d\n", ret);
+ goto free_bootlog;
+ }
return 0;
+free_bootlog:
+ qaic_bootlog_unregister();
free_mhi:
mhi_driver_unregister(&qaic_mhi_driver);
free_pci:
@@ -738,6 +754,7 @@ static void __exit qaic_exit(void)
* reinitializing the link_up state after the cleanup is done.
*/
link_up = true;
+ qaic_ssr_unregister();
qaic_ras_unregister();
qaic_bootlog_unregister();
qaic_timesync_deinit();
diff --git a/drivers/accel/qaic/qaic_ssr.c b/drivers/accel/qaic/qaic_ssr.c
new file mode 100644
index 000000000000..1ffb44767b3d
--- /dev/null
+++ b/drivers/accel/qaic/qaic_ssr.c
@@ -0,0 +1,293 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2020-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2024 Qualcomm Innovation Center, Inc. All rights reserved. */
+
+#include <asm/byteorder.h>
+#include <drm/drm_file.h>
+#include <drm/drm_managed.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/mhi.h>
+#include <linux/workqueue.h>
+
+#include "qaic.h"
+#include "qaic_ssr.h"
+
+#define MSG_BUF_SZ 32
+
+enum ssr_cmds {
+ DEBUG_TRANSFER_INFO = BIT(0),
+ DEBUG_TRANSFER_INFO_RSP = BIT(1),
+ MEMORY_READ = BIT(2),
+ MEMORY_READ_RSP = BIT(3),
+ DEBUG_TRANSFER_DONE = BIT(4),
+ DEBUG_TRANSFER_DONE_RSP = BIT(5),
+ SSR_EVENT = BIT(8),
+ SSR_EVENT_RSP = BIT(9),
+};
+
+enum ssr_events {
+ SSR_EVENT_NACK = BIT(0),
+ BEFORE_SHUTDOWN = BIT(1),
+ AFTER_SHUTDOWN = BIT(2),
+ BEFORE_POWER_UP = BIT(3),
+ AFTER_POWER_UP = BIT(4),
+};
+
+struct _ssr_hdr {
+ __le32 cmd;
+ __le32 len;
+ __le32 dbc_id;
+};
+
+struct ssr_hdr {
+ u32 cmd;
+ u32 len;
+ u32 dbc_id;
+};
+
+struct ssr_debug_transfer_info {
+ struct ssr_hdr hdr;
+ u32 resv;
+ u64 tbl_addr;
+ u64 tbl_len;
+} __packed;
+
+struct ssr_debug_transfer_info_rsp {
+ struct _ssr_hdr hdr;
+ __le32 ret;
+} __packed;
+
+struct ssr_memory_read {
+ struct _ssr_hdr hdr;
+ __le32 resv;
+ __le64 addr;
+ __le64 len;
+} __packed;
+
+struct ssr_memory_read_rsp {
+ struct _ssr_hdr hdr;
+ __le32 resv;
+ u8 data[];
+} __packed;
+
+struct ssr_debug_transfer_done {
+ struct _ssr_hdr hdr;
+ __le32 resv;
+} __packed;
+
+struct ssr_debug_transfer_done_rsp {
+ struct _ssr_hdr hdr;
+ __le32 ret;
+} __packed;
+
+struct ssr_event {
+ struct ssr_hdr hdr;
+ u32 event;
+} __packed;
+
+struct ssr_event_rsp {
+ struct _ssr_hdr hdr;
+ __le32 event;
+} __packed;
+
+struct ssr_resp {
+ /* Work struct to schedule work coming on QAIC_SSR channel */
+ struct work_struct work;
+ /* Root struct of device, used to access device resources */
+ struct qaic_device *qdev;
+ /* Buffer used by MHI for transfer requests */
+ u8 data[] __aligned(8);
+};
+
+void clean_up_ssr(struct qaic_device *qdev)
+{
+ dbc_exit_ssr(qdev);
+}
+
+static void ssr_worker(struct work_struct *work)
+{
+ struct ssr_resp *resp = container_of(work, struct ssr_resp, work);
+ struct ssr_hdr *hdr = (struct ssr_hdr *)resp->data;
+ struct ssr_debug_transfer_info_rsp *debug_rsp;
+ struct qaic_device *qdev = resp->qdev;
+ struct ssr_event_rsp *event_rsp;
+ struct dma_bridge_chan *dbc;
+ struct ssr_event *event;
+ u32 ssr_event_ack;
+ int ret;
+
+ le32_to_cpus(&hdr->cmd);
+ le32_to_cpus(&hdr->len);
+ le32_to_cpus(&hdr->dbc_id);
+
+ if (hdr->len > MSG_BUF_SZ)
+ goto out;
+
+ if (hdr->dbc_id >= qdev->num_dbc)
+ goto out;
+
+ dbc = &qdev->dbc[hdr->dbc_id];
+
+ switch (hdr->cmd) {
+ case DEBUG_TRANSFER_INFO:
+ /* Decline crash dump request from the device */
+ debug_rsp = kmalloc(sizeof(*debug_rsp), GFP_KERNEL);
+ if (!debug_rsp)
+ break;
+
+ debug_rsp->hdr.cmd = cpu_to_le32(DEBUG_TRANSFER_INFO_RSP);
+ debug_rsp->hdr.len = cpu_to_le32(sizeof(*debug_rsp));
+ debug_rsp->hdr.dbc_id = cpu_to_le32(event->hdr.dbc_id);
+ debug_rsp->ret = cpu_to_le32(1);
+
+ ret = mhi_queue_buf(qdev->ssr_ch, DMA_TO_DEVICE,
+ debug_rsp, sizeof(*debug_rsp), MHI_EOT);
+ if (ret) {
+ pci_warn(qdev->pdev, "Could not send DEBUG_TRANSFER_INFO_RSP %d\n", ret);
+ kfree(debug_rsp);
+ }
+ return;
+ case SSR_EVENT:
+ event = (struct ssr_event *)hdr;
+ le32_to_cpus(&event->event);
+ ssr_event_ack = event->event;
+
+ switch (event->event) {
+ case BEFORE_SHUTDOWN:
+ set_dbc_state(qdev, hdr->dbc_id, DBC_STATE_BEFORE_SHUTDOWN);
+ dbc_enter_ssr(qdev, hdr->dbc_id);
+ break;
+ case AFTER_SHUTDOWN:
+ set_dbc_state(qdev, hdr->dbc_id, DBC_STATE_AFTER_SHUTDOWN);
+ break;
+ case BEFORE_POWER_UP:
+ set_dbc_state(qdev, hdr->dbc_id, DBC_STATE_BEFORE_POWER_UP);
+ break;
+ case AFTER_POWER_UP:
+ set_dbc_state(qdev, hdr->dbc_id, DBC_STATE_AFTER_POWER_UP);
+ break;
+ default:
+ break;
+ }
+
+ event_rsp = kmalloc(sizeof(*event_rsp), GFP_KERNEL);
+ if (!event_rsp)
+ break;
+
+ event_rsp->hdr.cmd = cpu_to_le32(SSR_EVENT_RSP);
+ event_rsp->hdr.len = cpu_to_le32(sizeof(*event_rsp));
+ event_rsp->hdr.dbc_id = cpu_to_le32(hdr->dbc_id);
+ event_rsp->event = cpu_to_le32(ssr_event_ack);
+
+ ret = mhi_queue_buf(qdev->ssr_ch, DMA_TO_DEVICE, event_rsp, sizeof(*event_rsp),
+ MHI_EOT);
+ if (ret)
+ kfree(event_rsp);
+
+ if (event->event == AFTER_POWER_UP) {
+ dbc_exit_ssr(qdev);
+ set_dbc_state(qdev, hdr->dbc_id, DBC_STATE_IDLE);
+ }
+
+ break;
+ default:
+ break;
+ }
+
+out:
+ ret = mhi_queue_buf(qdev->ssr_ch, DMA_FROM_DEVICE, resp->data, MSG_BUF_SZ, MHI_EOT);
+ if (ret)
+ kfree(resp);
+}
+
+static int qaic_ssr_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id *id)
+{
+ struct qaic_device *qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
+ struct ssr_resp *resp;
+ int ret;
+
+ ret = mhi_prepare_for_transfer(mhi_dev);
+ if (ret)
+ return ret;
+
+ resp = kzalloc(sizeof(*resp) + MSG_BUF_SZ, GFP_KERNEL);
+ if (!resp) {
+ mhi_unprepare_from_transfer(mhi_dev);
+ return -ENOMEM;
+ }
+
+ resp->qdev = qdev;
+ INIT_WORK(&resp->work, ssr_worker);
+
+ ret = mhi_queue_buf(mhi_dev, DMA_FROM_DEVICE, resp->data, MSG_BUF_SZ, MHI_EOT);
+ if (ret) {
+ kfree(resp);
+ mhi_unprepare_from_transfer(mhi_dev);
+ return ret;
+ }
+
+ dev_set_drvdata(&mhi_dev->dev, qdev);
+ qdev->ssr_ch = mhi_dev;
+
+ return 0;
+}
+
+static void qaic_ssr_mhi_remove(struct mhi_device *mhi_dev)
+{
+ struct qaic_device *qdev;
+
+ qdev = dev_get_drvdata(&mhi_dev->dev);
+ mhi_unprepare_from_transfer(qdev->ssr_ch);
+ qdev->ssr_ch = NULL;
+}
+
+static void qaic_ssr_mhi_ul_xfer_cb(struct mhi_device *mhi_dev, struct mhi_result *mhi_result)
+{
+ kfree(mhi_result->buf_addr);
+}
+
+static void qaic_ssr_mhi_dl_xfer_cb(struct mhi_device *mhi_dev, struct mhi_result *mhi_result)
+{
+ struct ssr_resp *resp = container_of(mhi_result->buf_addr, struct ssr_resp, data);
+ struct qaic_device *qdev = dev_get_drvdata(&mhi_dev->dev);
+
+ if (mhi_result->transaction_status) {
+ kfree(resp);
+ return;
+ }
+ queue_work(qdev->ssr_wq, &resp->work);
+}
+
+static const struct mhi_device_id qaic_ssr_mhi_match_table[] = {
+ { .chan = "QAIC_SSR", },
+ {},
+};
+
+static struct mhi_driver qaic_ssr_mhi_driver = {
+ .id_table = qaic_ssr_mhi_match_table,
+ .remove = qaic_ssr_mhi_remove,
+ .probe = qaic_ssr_mhi_probe,
+ .ul_xfer_cb = qaic_ssr_mhi_ul_xfer_cb,
+ .dl_xfer_cb = qaic_ssr_mhi_dl_xfer_cb,
+ .driver = {
+ .name = "qaic_ssr",
+ },
+};
+
+int ssr_init(struct qaic_device *qdev, struct drm_device *drm)
+{
+ qdev->ssr_dbc = SSR_DBC_SENTINEL;
+ return 0;
+}
+
+int qaic_ssr_register(void)
+{
+ return mhi_driver_register(&qaic_ssr_mhi_driver);
+}
+
+void qaic_ssr_unregister(void)
+{
+ mhi_driver_unregister(&qaic_ssr_mhi_driver);
+}
diff --git a/drivers/accel/qaic/qaic_ssr.h b/drivers/accel/qaic/qaic_ssr.h
new file mode 100644
index 000000000000..7de1eb4086cd
--- /dev/null
+++ b/drivers/accel/qaic/qaic_ssr.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2020, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2021, 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef __QAIC_SSR_H__
+#define __QAIC_SSR_H__
+
+#include <drm/drm_device.h>
+
+int qaic_ssr_register(void);
+void qaic_ssr_unregister(void);
+void clean_up_ssr(struct qaic_device *qdev);
+int ssr_init(struct qaic_device *qdev, struct drm_device *drm);
+#endif /* __QAIC_SSR_H__ */
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 3/3] accel/qaic: Collect crashdump from SSR channel
2025-10-22 20:25 [PATCH 0/3] accel/qaic: Add Sub-system restart (SSR) Youssef Samir
2025-10-22 20:25 ` [PATCH 1/3] accel/qaic: Add DMA Bridge Channel(DBC) sysfs and uevents Youssef Samir
2025-10-22 20:25 ` [PATCH 2/3] accel/qaic: Implement basic SSR handling Youssef Samir
@ 2025-10-22 20:25 ` Youssef Samir
2 siblings, 0 replies; 5+ messages in thread
From: Youssef Samir @ 2025-10-22 20:25 UTC (permalink / raw)
To: jeff.hugo, carl.vanderlip, troy.hanson, zachary.mckevitt
Cc: ogabbay, lizhi.hou, karol.wachowski, linux-arm-msm, dri-devel,
Jeffrey Hugo, Pranjal Ramajor Asha Kanojiya
From: Pranjal Ramajor Asha Kanojiya <pkanojiy@codeaurora.org>
After subsystem of the device has crashed it sends a message with
command DEBUG_TRANSFER_INFO to kernel(host). Send ACK for that message
and then prepare to collect the ramdump of the subsystem
Steps of crashdump collection is as follows,
1) Device sends DEBUG_TRANSFER_INFO message indicating that device wants
to send crashdump.
2) Send an acknowledgment to that message either ACK or NACK.
a) NACK will inform the device that host will not download the
crashdump
b) ACK will inform the device that host will download the crashdump
3) Along with the DEBUG_TRANSFER_INFO we receive a table base address and
its length, use that to download that table from device.
a) This table is meta data of the crashdump and not the actual
crashdump.
4) After we respond as ACK for message received on step 1) we start
downloading the table. Use series of MEMORY_READ/MEMORY_READ_RSP SSR
commands to download the entire table.
5) Each entry in the table represents a segment of crashdump. Once the
table downloading is complete, iterate through each entry of table
and download each crashdump segment(same as table itself). Table entry
contains the memory base address and length along with other info.
6) After the entire crashdump is downloaded send DEBUG_TRANSFER_DONE
which marks that host is terminating the crashdump transfer. This
message can be send in both success or error case.
7) After receiving DEBUG_TRANSFER_DONE_RSP hand over the crashdump to
dev_coredumpv() and free all the necessary memory.
Co-developed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Co-developed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Pranjal Ramajor Asha Kanojiya <pkanojiy@codeaurora.org>
Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
---
drivers/accel/qaic/Kconfig | 1 +
drivers/accel/qaic/qaic.h | 2 +
drivers/accel/qaic/qaic_ssr.c | 556 +++++++++++++++++++++++++++++++++-
3 files changed, 544 insertions(+), 15 deletions(-)
diff --git a/drivers/accel/qaic/Kconfig b/drivers/accel/qaic/Kconfig
index 5e405a19c157..116e42d152ca 100644
--- a/drivers/accel/qaic/Kconfig
+++ b/drivers/accel/qaic/Kconfig
@@ -9,6 +9,7 @@ config DRM_ACCEL_QAIC
depends on PCI && HAS_IOMEM
depends on MHI_BUS
select CRC32
+ select WANT_DEV_COREDUMP
help
Enables driver for Qualcomm's Cloud AI accelerator PCIe cards that are
designed to accelerate Deep Learning inference workloads.
diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
index 3bd37b494d49..b6612a086078 100644
--- a/drivers/accel/qaic/qaic.h
+++ b/drivers/accel/qaic/qaic.h
@@ -200,6 +200,8 @@ struct qaic_device {
struct mhi_device *ssr_ch;
/* Work queue for tasks related to MHI SSR device */
struct workqueue_struct *ssr_wq;
+ /* Buffer to collect SSR crashdump via SSR MHI channel */
+ void *ssr_mhi_buf;
/* DBC which is under SSR. Sentinel U32_MAX would mean that no SSR in progress */
u32 ssr_dbc;
};
diff --git a/drivers/accel/qaic/qaic_ssr.c b/drivers/accel/qaic/qaic_ssr.c
index 1ffb44767b3d..964c3034e6cf 100644
--- a/drivers/accel/qaic/qaic_ssr.c
+++ b/drivers/accel/qaic/qaic_ssr.c
@@ -6,6 +6,7 @@
#include <asm/byteorder.h>
#include <drm/drm_file.h>
#include <drm/drm_managed.h>
+#include <linux/devcoredump.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/mhi.h>
@@ -15,6 +16,9 @@
#include "qaic_ssr.h"
#define MSG_BUF_SZ 32
+#define SSR_MHI_BUF_SIZE SZ_64K
+#define SSR_MEM_READ_DATA_SIZE ((u64)SSR_MHI_BUF_SIZE - sizeof(struct ssr_crashdump))
+#define SSR_MEM_READ_CHUNK_SIZE ((u64)SSR_MEM_READ_DATA_SIZE - sizeof(struct ssr_memory_read_rsp))
enum ssr_cmds {
DEBUG_TRANSFER_INFO = BIT(0),
@@ -35,6 +39,19 @@ enum ssr_events {
AFTER_POWER_UP = BIT(4),
};
+struct debug_info_table {
+ /* Save preferences. Default is mandatory */
+ u64 save_perf;
+ /* Base address of the debug region */
+ u64 mem_base;
+ /* Size of debug region in bytes */
+ u64 len;
+ /* Description */
+ char desc[20];
+ /* Filename of debug region */
+ char filename[20];
+};
+
struct _ssr_hdr {
__le32 cmd;
__le32 len;
@@ -101,17 +118,453 @@ struct ssr_resp {
u8 data[] __aligned(8);
};
+/* SSR crashdump book keeping structure */
+struct ssr_dump_info {
+ /* DBC associated with this SSR crashdump */
+ struct dma_bridge_chan *dbc;
+ /*
+ * It will be used when we complete the crashdump download and switch
+ * to waiting on SSR events
+ */
+ struct ssr_resp *resp;
+ /* MEMORY READ request MHI buffer.*/
+ struct ssr_memory_read *read_buf_req;
+ /* TRUE: ->read_buf_req is queued for MHI transaction. FALSE: Otherwise */
+ bool read_buf_req_queued;
+ /* Address of table in host */
+ void *tbl_addr;
+ /* Total size of table */
+ u64 tbl_len;
+ /* Offset of table(->tbl_addr) where the new chunk will be dumped */
+ u64 tbl_off;
+ /* Address of table in device/target */
+ u64 tbl_addr_dev;
+ /* Ptr to the entire dump */
+ void *dump_addr;
+ /* Entire crashdump size */
+ u64 dump_sz;
+ /* Offset of crashdump(->dump_addr) where the new chunk will be dumped */
+ u64 dump_off;
+ /* Points to the table entry we are currently downloading */
+ struct debug_info_table *tbl_ent;
+ /* Offset in the current table entry(->tbl_ent) for next chuck */
+ u64 tbl_ent_off;
+};
+
+struct ssr_crashdump {
+ /*
+ * Points to a book keeping struct maintained by MHI SSR device while
+ * downloading a SSR crashdump. It is NULL when crashdump downloading
+ * not in progress.
+ */
+ struct ssr_dump_info *dump_info;
+ /* Work struct to schedule work coming on QAIC_SSR channel */
+ struct work_struct work;
+ /* Root struct of device, used to access device resources */
+ struct qaic_device *qdev;
+ /* Buffer used by MHI for transfer requests */
+ u8 data[];
+};
+
+#define QAIC_SSR_DUMP_V1_MAGIC 0x1234567890abcdef
+#define QAIC_SSR_DUMP_V1_VER 1
+struct dump_file_meta {
+ u64 magic;
+ u64 version;
+ u64 size; /* Total size of the entire dump */
+ u64 tbl_len; /* Length of the table in byte */
+};
+
+/*
+ * Layout of crashdump
+ * +------------------------------------------+
+ * | Crashdump Meta structure |
+ * | type: struct dump_file_meta |
+ * +------------------------------------------+
+ * | Crashdump Table |
+ * | type: array of struct debug_info_table |
+ * | |
+ * | |
+ * | |
+ * +------------------------------------------+
+ * | Crashdump |
+ * | |
+ * | |
+ * | |
+ * | |
+ * | |
+ * +------------------------------------------+
+ */
+
+static void free_ssr_dump_info(struct ssr_crashdump *ssr_crash)
+{
+ struct ssr_dump_info *dump_info = ssr_crash->dump_info;
+
+ ssr_crash->dump_info = NULL;
+ if (!dump_info)
+ return;
+ if (!dump_info->read_buf_req_queued)
+ kfree(dump_info->read_buf_req);
+ vfree(dump_info->tbl_addr);
+ vfree(dump_info->dump_addr);
+ kfree(dump_info);
+}
+
void clean_up_ssr(struct qaic_device *qdev)
{
+ struct ssr_crashdump *ssr_crash = qdev->ssr_mhi_buf;
+
+ if (!ssr_crash)
+ return;
+
dbc_exit_ssr(qdev);
+ free_ssr_dump_info(ssr_crash);
+}
+
+static int alloc_dump(struct ssr_dump_info *dump_info)
+{
+ struct debug_info_table *tbl_ent = dump_info->tbl_addr;
+ struct dump_file_meta *dump_meta;
+ u64 tbl_sz_lp = 0;
+ u64 dump_size = 0;
+
+ while (tbl_sz_lp < dump_info->tbl_len) {
+ le64_to_cpus(&tbl_ent->save_perf);
+ le64_to_cpus(&tbl_ent->mem_base);
+ le64_to_cpus(&tbl_ent->len);
+
+ if (tbl_ent->len == 0)
+ return -EINVAL;
+
+ dump_size += tbl_ent->len;
+ tbl_ent++;
+ tbl_sz_lp += sizeof(*tbl_ent);
+ }
+
+ dump_info->dump_sz = dump_size + dump_info->tbl_len + sizeof(*dump_meta);
+ dump_info->dump_addr = vzalloc(dump_info->dump_sz);
+ if (!dump_info->dump_addr)
+ return -ENOMEM;
+
+ /* Copy crashdump meta and table */
+ dump_meta = dump_info->dump_addr;
+ dump_meta->magic = QAIC_SSR_DUMP_V1_MAGIC;
+ dump_meta->version = QAIC_SSR_DUMP_V1_VER;
+ dump_meta->size = dump_info->dump_sz;
+ dump_meta->tbl_len = dump_info->tbl_len;
+ memcpy(dump_info->dump_addr + sizeof(*dump_meta), dump_info->tbl_addr, dump_info->tbl_len);
+ /* Offset by crashdump meta and table (copied above) */
+ dump_info->dump_off = dump_info->tbl_len + sizeof(*dump_meta);
+
+ return 0;
+}
+
+static int send_xfer_done(struct qaic_device *qdev, void *resp, u32 dbc_id)
+{
+ struct ssr_debug_transfer_done *xfer_done;
+ int ret;
+
+ xfer_done = kmalloc(sizeof(*xfer_done), GFP_KERNEL);
+ if (!xfer_done) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = mhi_queue_buf(qdev->ssr_ch, DMA_FROM_DEVICE, resp, MSG_BUF_SZ, MHI_EOT);
+ if (ret)
+ goto free_xfer_done;
+
+ xfer_done->hdr.cmd = cpu_to_le32(DEBUG_TRANSFER_DONE);
+ xfer_done->hdr.len = cpu_to_le32(sizeof(*xfer_done));
+ xfer_done->hdr.dbc_id = cpu_to_le32(dbc_id);
+
+ ret = mhi_queue_buf(qdev->ssr_ch, DMA_TO_DEVICE, xfer_done, sizeof(*xfer_done), MHI_EOT);
+ if (ret)
+ goto free_xfer_done;
+
+ return 0;
+
+free_xfer_done:
+ kfree(xfer_done);
+out:
+ return ret;
+}
+
+static int mem_read_req(struct qaic_device *qdev, u64 dest_addr, u64 dest_len)
+{
+ struct ssr_crashdump *ssr_crash = qdev->ssr_mhi_buf;
+ struct ssr_memory_read *read_buf_req;
+ struct ssr_dump_info *dump_info;
+ int ret;
+
+ dump_info = ssr_crash->dump_info;
+ ret = mhi_queue_buf(qdev->ssr_ch, DMA_FROM_DEVICE, ssr_crash->data, SSR_MEM_READ_DATA_SIZE,
+ MHI_EOT);
+ if (ret)
+ goto out;
+
+ read_buf_req = dump_info->read_buf_req;
+ read_buf_req->hdr.cmd = cpu_to_le32(MEMORY_READ);
+ read_buf_req->hdr.len = cpu_to_le32(sizeof(*read_buf_req));
+ read_buf_req->hdr.dbc_id = cpu_to_le32(qdev->ssr_dbc);
+ read_buf_req->addr = cpu_to_le64(dest_addr);
+ read_buf_req->len = cpu_to_le64(dest_len);
+
+ ret = mhi_queue_buf(qdev->ssr_ch, DMA_TO_DEVICE, read_buf_req, sizeof(*read_buf_req),
+ MHI_EOT);
+ if (!ret)
+ dump_info->read_buf_req_queued = true;
+
+out:
+ return ret;
+}
+
+static int ssr_copy_table(struct ssr_dump_info *dump_info, void *data, u64 len)
+{
+ if (len > dump_info->tbl_len - dump_info->tbl_off)
+ return -EINVAL;
+
+ memcpy(dump_info->tbl_addr + dump_info->tbl_off, data, len);
+ dump_info->tbl_off += len;
+
+ /* Entire table has been downloaded, alloc dump memory */
+ if (dump_info->tbl_off == dump_info->tbl_len) {
+ dump_info->tbl_ent = dump_info->tbl_addr;
+ return alloc_dump(dump_info);
+ }
+
+ return 0;
+}
+
+static int ssr_copy_dump(struct ssr_dump_info *dump_info, void *data, u64 len)
+{
+ struct debug_info_table *tbl_ent;
+
+ tbl_ent = dump_info->tbl_ent;
+
+ if (len > tbl_ent->len - dump_info->tbl_ent_off)
+ return -EINVAL;
+
+ memcpy(dump_info->dump_addr + dump_info->dump_off, data, len);
+ dump_info->dump_off += len;
+ dump_info->tbl_ent_off += len;
+
+ /*
+ * Current segment (a entry in table) of the crashdump is complete,
+ * move to next one
+ */
+ if (tbl_ent->len == dump_info->tbl_ent_off) {
+ dump_info->tbl_ent++;
+ dump_info->tbl_ent_off = 0;
+ }
+
+ return 0;
+}
+
+static void ssr_dump_worker(struct work_struct *work)
+{
+ struct ssr_crashdump *ssr_crash = container_of(work, struct ssr_crashdump, work);
+ struct qaic_device *qdev = ssr_crash->qdev;
+ struct ssr_memory_read_rsp *mem_rd_resp;
+ struct debug_info_table *tbl_ent;
+ struct ssr_dump_info *dump_info;
+ u64 dest_addr, dest_len;
+ struct _ssr_hdr *_hdr;
+ struct ssr_hdr hdr;
+ u64 data_len;
+ int ret;
+
+ mem_rd_resp = (struct ssr_memory_read_rsp *)ssr_crash->data;
+ _hdr = &mem_rd_resp->hdr;
+ hdr.cmd = le32_to_cpu(_hdr->cmd);
+ hdr.len = le32_to_cpu(_hdr->len);
+ hdr.dbc_id = le32_to_cpu(_hdr->dbc_id);
+
+ if (hdr.dbc_id != qdev->ssr_dbc)
+ goto reset_device;
+
+ dump_info = ssr_crash->dump_info;
+ if (!dump_info)
+ goto reset_device;
+
+ if (hdr.cmd != MEMORY_READ_RSP)
+ goto free_dump_info;
+
+ if (hdr.len > SSR_MEM_READ_DATA_SIZE)
+ goto free_dump_info;
+
+ data_len = hdr.len - sizeof(*mem_rd_resp);
+
+ if (dump_info->tbl_off < dump_info->tbl_len) /* Chunk belongs to table */
+ ret = ssr_copy_table(dump_info, mem_rd_resp->data, data_len);
+ else /* Chunk belongs to crashdump */
+ ret = ssr_copy_dump(dump_info, mem_rd_resp->data, data_len);
+
+ if (ret)
+ goto free_dump_info;
+
+ if (dump_info->tbl_off < dump_info->tbl_len) {
+ /* Continue downloading table */
+ dest_addr = dump_info->tbl_addr_dev + dump_info->tbl_off;
+ dest_len = min(SSR_MEM_READ_CHUNK_SIZE, dump_info->tbl_len - dump_info->tbl_off);
+ ret = mem_read_req(qdev, dest_addr, dest_len);
+ } else if (dump_info->dump_off < dump_info->dump_sz) {
+ /* Continue downloading crashdump */
+ tbl_ent = dump_info->tbl_ent;
+ dest_addr = tbl_ent->mem_base + dump_info->tbl_ent_off;
+ dest_len = min(SSR_MEM_READ_CHUNK_SIZE, tbl_ent->len - dump_info->tbl_ent_off);
+ ret = mem_read_req(qdev, dest_addr, dest_len);
+ } else {
+ /* Crashdump download complete */
+ ret = send_xfer_done(qdev, dump_info->resp->data, hdr.dbc_id);
+ }
+
+ /* Most likely a MHI xfer has failed */
+ if (ret)
+ goto free_dump_info;
+
+ return;
+
+free_dump_info:
+ /* Free the allocated memory */
+ free_ssr_dump_info(ssr_crash);
+reset_device:
+ /*
+ * After subsystem crashes in device crashdump collection begins but
+ * something went wrong while collecting crashdump, now instead of
+ * handling this error we just reset the device as the best effort has
+ * been made
+ */
+ mhi_soc_reset(qdev->mhi_cntrl);
+}
+
+static struct ssr_dump_info *alloc_dump_info(struct qaic_device *qdev,
+ struct ssr_debug_transfer_info *debug_info)
+{
+ struct ssr_dump_info *dump_info;
+ int ret;
+
+ le64_to_cpus(&debug_info->tbl_len);
+ le64_to_cpus(&debug_info->tbl_addr);
+
+ if (debug_info->tbl_len == 0 ||
+ debug_info->tbl_len % sizeof(struct debug_info_table) != 0) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /* Allocate SSR crashdump book keeping structure */
+ dump_info = kzalloc(sizeof(*dump_info), GFP_KERNEL);
+ if (!dump_info) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ /* Buffer used to send MEMORY READ request to device via MHI */
+ dump_info->read_buf_req = kzalloc(sizeof(*dump_info->read_buf_req), GFP_KERNEL);
+ if (!dump_info->read_buf_req) {
+ ret = -ENOMEM;
+ goto free_dump_info;
+ }
+
+ /* Crashdump meta table buffer */
+ dump_info->tbl_addr = vzalloc(debug_info->tbl_len);
+ if (!dump_info->tbl_addr) {
+ ret = -ENOMEM;
+ goto free_read_buf_req;
+ }
+
+ dump_info->tbl_addr_dev = debug_info->tbl_addr;
+ dump_info->tbl_len = debug_info->tbl_len;
+
+ return dump_info;
+
+free_read_buf_req:
+ kfree(dump_info->read_buf_req);
+free_dump_info:
+ kfree(dump_info);
+out:
+ return ERR_PTR(ret);
+}
+
+static int dbg_xfer_info_rsp(struct qaic_device *qdev, struct dma_bridge_chan *dbc,
+ struct ssr_debug_transfer_info *debug_info)
+{
+ struct ssr_debug_transfer_info_rsp *debug_rsp;
+ struct ssr_crashdump *ssr_crash = NULL;
+ int ret = 0, ret2;
+
+ debug_rsp = kmalloc(sizeof(*debug_rsp), GFP_KERNEL);
+ if (!debug_rsp)
+ return -ENOMEM;
+
+ if (!qdev->ssr_mhi_buf) {
+ ret = -ENOMEM;
+ goto send_rsp;
+ }
+
+ if (dbc->state != DBC_STATE_BEFORE_POWER_UP) {
+ ret = -EINVAL;
+ goto send_rsp;
+ }
+
+ ssr_crash = qdev->ssr_mhi_buf;
+ ssr_crash->dump_info = alloc_dump_info(qdev, debug_info);
+ if (IS_ERR(ssr_crash->dump_info)) {
+ ret = PTR_ERR(ssr_crash->dump_info);
+ ssr_crash->dump_info = NULL;
+ }
+
+send_rsp:
+ debug_rsp->hdr.cmd = cpu_to_le32(DEBUG_TRANSFER_INFO_RSP);
+ debug_rsp->hdr.len = cpu_to_le32(sizeof(*debug_rsp));
+ debug_rsp->hdr.dbc_id = cpu_to_le32(dbc->id);
+ /*
+ * 0 = Return an ACK confirming the host is ready to download crashdump
+ * 1 = Return an NACK confirming the host is not ready to download crashdump
+ */
+ debug_rsp->ret = cpu_to_le32(ret ? 1 : 0);
+
+ ret2 = mhi_queue_buf(qdev->ssr_ch, DMA_TO_DEVICE, debug_rsp, sizeof(*debug_rsp), MHI_EOT);
+ if (ret2) {
+ free_ssr_dump_info(ssr_crash);
+ kfree(debug_rsp);
+ return ret2;
+ }
+
+ return ret;
+}
+
+static void dbg_xfer_done_rsp(struct qaic_device *qdev, struct dma_bridge_chan *dbc,
+ struct ssr_debug_transfer_done_rsp *xfer_rsp)
+{
+ struct ssr_crashdump *ssr_crash = qdev->ssr_mhi_buf;
+ u32 status = le32_to_cpu(xfer_rsp->ret);
+ struct device *dev = &qdev->pdev->dev;
+ struct ssr_dump_info *dump_info;
+
+ dump_info = ssr_crash->dump_info;
+ if (!dump_info)
+ return;
+
+ if (status) {
+ free_ssr_dump_info(ssr_crash);
+ return;
+ }
+
+ dev_coredumpv(dev, dump_info->dump_addr, dump_info->dump_sz, GFP_KERNEL);
+ /* dev_coredumpv will free dump_info->dump_addr */
+ dump_info->dump_addr = NULL;
+ free_ssr_dump_info(ssr_crash);
}
static void ssr_worker(struct work_struct *work)
{
struct ssr_resp *resp = container_of(work, struct ssr_resp, work);
struct ssr_hdr *hdr = (struct ssr_hdr *)resp->data;
- struct ssr_debug_transfer_info_rsp *debug_rsp;
+ struct ssr_dump_info *dump_info = NULL;
struct qaic_device *qdev = resp->qdev;
+ struct ssr_crashdump *ssr_crash;
struct ssr_event_rsp *event_rsp;
struct dma_bridge_chan *dbc;
struct ssr_event *event;
@@ -132,27 +585,34 @@ static void ssr_worker(struct work_struct *work)
switch (hdr->cmd) {
case DEBUG_TRANSFER_INFO:
- /* Decline crash dump request from the device */
- debug_rsp = kmalloc(sizeof(*debug_rsp), GFP_KERNEL);
- if (!debug_rsp)
+ ret = dbg_xfer_info_rsp(qdev, dbc, (struct ssr_debug_transfer_info *)resp->data);
+ if (ret)
break;
- debug_rsp->hdr.cmd = cpu_to_le32(DEBUG_TRANSFER_INFO_RSP);
- debug_rsp->hdr.len = cpu_to_le32(sizeof(*debug_rsp));
- debug_rsp->hdr.dbc_id = cpu_to_le32(event->hdr.dbc_id);
- debug_rsp->ret = cpu_to_le32(1);
+ ssr_crash = qdev->ssr_mhi_buf;
+ dump_info = ssr_crash->dump_info;
+ dump_info->dbc = dbc;
+ dump_info->resp = resp;
- ret = mhi_queue_buf(qdev->ssr_ch, DMA_TO_DEVICE,
- debug_rsp, sizeof(*debug_rsp), MHI_EOT);
+ /* Start by downloading debug table */
+ ret = mem_read_req(qdev, dump_info->tbl_addr_dev,
+ min(dump_info->tbl_len, SSR_MEM_READ_CHUNK_SIZE));
if (ret) {
- pci_warn(qdev->pdev, "Could not send DEBUG_TRANSFER_INFO_RSP %d\n", ret);
- kfree(debug_rsp);
+ free_ssr_dump_info(ssr_crash);
+ break;
}
+
+ /*
+ * Till now everything went fine, which means that we will be
+ * collecting crashdump chunk by chunk. Do not queue a response
+ * buffer for SSR cmds till the crashdump is complete.
+ */
return;
case SSR_EVENT:
event = (struct ssr_event *)hdr;
le32_to_cpus(&event->event);
ssr_event_ack = event->event;
+ ssr_crash = qdev->ssr_mhi_buf;
switch (event->event) {
case BEFORE_SHUTDOWN:
@@ -166,6 +626,18 @@ static void ssr_worker(struct work_struct *work)
set_dbc_state(qdev, hdr->dbc_id, DBC_STATE_BEFORE_POWER_UP);
break;
case AFTER_POWER_UP:
+ /*
+ * If dump info is a non NULL value it means that we
+ * have received this SSR event while downloading a
+ * crashdump for this DBC is still in progress. NACK
+ * the SSR event
+ */
+ if (ssr_crash && ssr_crash->dump_info) {
+ free_ssr_dump_info(ssr_crash);
+ ssr_event_ack = SSR_EVENT_NACK;
+ break;
+ }
+
set_dbc_state(qdev, hdr->dbc_id, DBC_STATE_AFTER_POWER_UP);
break;
default:
@@ -186,11 +658,14 @@ static void ssr_worker(struct work_struct *work)
if (ret)
kfree(event_rsp);
- if (event->event == AFTER_POWER_UP) {
+ if (event->event == AFTER_POWER_UP && ssr_event_ack != SSR_EVENT_NACK) {
dbc_exit_ssr(qdev);
set_dbc_state(qdev, hdr->dbc_id, DBC_STATE_IDLE);
}
+ break;
+ case DEBUG_TRANSFER_DONE_RSP:
+ dbg_xfer_done_rsp(qdev, dbc, (struct ssr_debug_transfer_done_rsp *)hdr);
break;
default:
break;
@@ -245,6 +720,31 @@ static void qaic_ssr_mhi_remove(struct mhi_device *mhi_dev)
static void qaic_ssr_mhi_ul_xfer_cb(struct mhi_device *mhi_dev, struct mhi_result *mhi_result)
{
+ struct qaic_device *qdev = dev_get_drvdata(&mhi_dev->dev);
+ struct ssr_crashdump *ssr_crash = qdev->ssr_mhi_buf;
+ struct _ssr_hdr *hdr = mhi_result->buf_addr;
+ struct ssr_dump_info *dump_info;
+
+ if (mhi_result->transaction_status) {
+ kfree(mhi_result->buf_addr);
+ return;
+ }
+
+ /*
+ * MEMORY READ is used to download crashdump. And crashdump is
+ * downloaded chunk by chunk in a series of MEMORY READ SSR commands.
+ * Hence to avoid too many kmalloc() and kfree() of the same MEMORY READ
+ * request buffer, we allocate only one such buffer and free it only
+ * once.
+ */
+ if (le32_to_cpu(hdr->cmd) == MEMORY_READ) {
+ dump_info = ssr_crash->dump_info;
+ if (dump_info) {
+ dump_info->read_buf_req_queued = false;
+ return;
+ }
+ }
+
kfree(mhi_result->buf_addr);
}
@@ -252,12 +752,23 @@ static void qaic_ssr_mhi_dl_xfer_cb(struct mhi_device *mhi_dev, struct mhi_resul
{
struct ssr_resp *resp = container_of(mhi_result->buf_addr, struct ssr_resp, data);
struct qaic_device *qdev = dev_get_drvdata(&mhi_dev->dev);
+ struct ssr_crashdump *ssr_crash = qdev->ssr_mhi_buf;
+ bool memory_read_rsp = false;
+
+ if (ssr_crash && ssr_crash->data == mhi_result->buf_addr)
+ memory_read_rsp = true;
if (mhi_result->transaction_status) {
- kfree(resp);
+ /* Do not free SSR crashdump buffer as it allocated via managed APIs */
+ if (!memory_read_rsp)
+ kfree(resp);
return;
}
- queue_work(qdev->ssr_wq, &resp->work);
+
+ if (memory_read_rsp)
+ queue_work(qdev->ssr_wq, &ssr_crash->work);
+ else
+ queue_work(qdev->ssr_wq, &resp->work);
}
static const struct mhi_device_id qaic_ssr_mhi_match_table[] = {
@@ -278,7 +789,22 @@ static struct mhi_driver qaic_ssr_mhi_driver = {
int ssr_init(struct qaic_device *qdev, struct drm_device *drm)
{
+ struct ssr_crashdump *ssr_crash;
+
qdev->ssr_dbc = SSR_DBC_SENTINEL;
+
+ /*
+ * Device requests only one SSR at a time. So allocating only one
+ * buffer to download crashdump is good enough.
+ */
+ ssr_crash = drmm_kzalloc(drm, SSR_MHI_BUF_SIZE, GFP_KERNEL);
+ if (!ssr_crash)
+ return -ENOMEM;
+
+ ssr_crash->qdev = qdev;
+ INIT_WORK(&ssr_crash->work, ssr_dump_worker);
+ qdev->ssr_mhi_buf = ssr_crash;
+
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/3] accel/qaic: Add DMA Bridge Channel(DBC) sysfs and uevents
2025-10-22 20:25 ` [PATCH 1/3] accel/qaic: Add DMA Bridge Channel(DBC) sysfs and uevents Youssef Samir
@ 2025-10-23 17:41 ` kernel test robot
0 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2025-10-23 17:41 UTC (permalink / raw)
To: Youssef Samir, jeff.hugo, carl.vanderlip, troy.hanson,
zachary.mckevitt
Cc: oe-kbuild-all, ogabbay, lizhi.hou, karol.wachowski, linux-arm-msm,
dri-devel
Hi Youssef,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on v6.18-rc2 next-20251023]
[cannot apply to drm-misc/drm-misc-next drm-tip/drm-tip]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Youssef-Samir/accel-qaic-Add-DMA-Bridge-Channel-DBC-sysfs-and-uevents/20251023-042723
base: linus/master
patch link: https://lore.kernel.org/r/20251022202527.3873558-2-youssef.abdulrahman%40oss.qualcomm.com
patch subject: [PATCH 1/3] accel/qaic: Add DMA Bridge Channel(DBC) sysfs and uevents
config: openrisc-allyesconfig (https://download.01.org/0day-ci/archive/20251024/202510240118.dkdhMpfy-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251024/202510240118.dkdhMpfy-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510240118.dkdhMpfy-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/accel/qaic/qaic_sysfs.c: In function 'qaic_sysfs_init':
>> drivers/accel/qaic/qaic_sysfs.c:74:58: warning: '_state' directive output may be truncated writing 6 bytes into a region of size between 3 and 10 [-Wformat-truncation=]
74 | snprintf(dbc_attr->name, NAME_LEN, "dbc%d_state", i);
| ^~~~~~
drivers/accel/qaic/qaic_sysfs.c:74:17: note: 'snprintf' output between 11 and 18 bytes into a destination of size 14
74 | snprintf(dbc_attr->name, NAME_LEN, "dbc%d_state", i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vim +/_state +74 drivers/accel/qaic/qaic_sysfs.c
56
57 int qaic_sysfs_init(struct qaic_drm_device *qddev)
58 {
59 struct device *kdev = to_accel_kdev(qddev);
60 struct drm_device *drm = to_drm(qddev);
61 u32 num_dbc = qddev->qdev->num_dbc;
62 struct dbc_attribute *dbc_attrs;
63 int i, ret;
64
65 dbc_attrs = drmm_kcalloc(drm, num_dbc, sizeof(*dbc_attrs), GFP_KERNEL);
66 if (!dbc_attrs)
67 return -ENOMEM;
68
69 for (i = 0; i < num_dbc; ++i) {
70 struct dbc_attribute *dbc_attr = &dbc_attrs[i];
71
72 sysfs_attr_init(&dbc_attr->dev_attr.attr);
73 dbc_attr->dbc_id = i;
> 74 snprintf(dbc_attr->name, NAME_LEN, "dbc%d_state", i);
75 dbc_attr->dev_attr.attr.name = dbc_attr->name;
76 dbc_attr->dev_attr.attr.mode = 0444;
77 dbc_attr->dev_attr.show = dbc_state_show;
78 ret = sysfs_create_file(&kdev->kobj, &dbc_attr->dev_attr.attr);
79 if (ret) {
80 int j;
81
82 for (j = 0; j < i; ++j) {
83 dbc_attr = &dbc_attrs[j];
84 sysfs_remove_file(&kdev->kobj, &dbc_attr->dev_attr.attr);
85 }
86 drmm_kfree(drm, dbc_attrs);
87 return ret;
88 }
89 }
90
91 qddev->sysfs_attrs = dbc_attrs;
92 return 0;
93 }
94
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-10-23 17:43 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-22 20:25 [PATCH 0/3] accel/qaic: Add Sub-system restart (SSR) Youssef Samir
2025-10-22 20:25 ` [PATCH 1/3] accel/qaic: Add DMA Bridge Channel(DBC) sysfs and uevents Youssef Samir
2025-10-23 17:41 ` kernel test robot
2025-10-22 20:25 ` [PATCH 2/3] accel/qaic: Implement basic SSR handling Youssef Samir
2025-10-22 20:25 ` [PATCH 3/3] accel/qaic: Collect crashdump from SSR channel Youssef Samir
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox