Re: [PATCH v4 03/10] soc: qcom: Add QMI TMD support for remote thermal mitigation

From: sashiko-bot@kernel.org
To: "Gaurav Kohli" <gaurav.kohli@oss.qualcomm.com>
Cc: robh@kernel.org, devicetree@vger.kernel.org, conor+dt@kernel.org
Subject: Re: [PATCH v4 03/10] soc: qcom: Add QMI TMD support for remote thermal mitigation
Date: Fri, 03 Jul 2026 05:17:42 +0000
Message-ID: <20260703051742.A28901F000E9@smtp.kernel.org>
In-Reply-To: <20260703-qmi-tmd-v4-3-3882189c1f83@oss.qualcomm.com>

Thank you for your contribution! Sashiko AI review found 6 potential issues to consider:
- [High] Improper teardown sequence in qmi_tmd_exit() creates multiple Use-After-Free vulnerabilities.
- [High] Failure to clear tmd->cdev on registration error causes a kernel panic during unregistration.
- [Medium] Time-Of-Check to Time-Of-Use race in qmi_tmd_register() silently drops requested thermal mitigation state.
- [Medium] Size mismatch between mitigation_device_list_len field and QMI element descriptor causes incorrect decoding on big-endian systems.
- [Low] Missing NET dependency inheritance in Kconfig causes unmet dependency warnings and potential build failures.
- [Low] Mixing goto-based error handling with __free() and scoped_guard() cleanup helpers violates strict cleanup rules.
--

commit f78be8814546ffe0023b5b8c2d56a90a684efe2d
Author: Casey Connolly <casey.connolly@linaro.org>

soc: qcom: Add QMI TMD support for remote thermal mitigation

Add support for Qualcomm Messaging Interface (QMI) based Thermal Mitigation
Device (TMD) cooling devices provided by remote subsystems.

> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
> index 2caadbbcf8307..44c2b533b494f 100644
> --- a/drivers/soc/qcom/Kconfig
> +++ b/drivers/soc/qcom/Kconfig
[ ... ]
> @@ -128,6 +128,16 @@ config QCOM_QMI_HELPERS
>  	tristate
>  	depends on NET
>  
> +config QCOM_QMI_TMD
> +	bool "Qualcomm remote subsystem TMD" if COMPILE_TEST
> +	depends on ARCH_QCOM
> +	select QCOM_QMI_HELPERS

[Severity: Low]
Does this config need to inherit the NET dependency?

QCOM_QMI_HELPERS depends on NET, so selecting it without a NET dependency can
cause unmet dependency warnings and build failures if COMPILE_TEST is set and
NET is disabled.

> +	help
> +	  This enables Qualcomm Messaging Interface (QMI) based Thermal Mitigation
> +	  Device (TMD) support for Qualcomm remote subsystems. It manages

> diff --git a/drivers/soc/qcom/qmi_tmd.c b/drivers/soc/qcom/qmi_tmd.c
> new file mode 100644
> --- /dev/null
> +++ b/drivers/soc/qcom/qmi_tmd.c
[ ... ]
> +struct tmd_get_dev_list_resp {
> +	struct qmi_response_type_v01 resp;
> +	u8 mitigation_device_list_valid;
> +	u32 mitigation_device_list_len;
> +	struct tmd_dev_list
> +		mitigation_device_list[QMI_TMD_DEV_LIST_MAX];
> +};
> +
> +static const struct qmi_elem_info tmd_get_dev_list_resp_ei[] = {
[ ... ]
> +	{
> +		.data_type = QMI_DATA_LEN,
> +		.elem_len = 1,
> +		.elem_size = sizeof(uint8_t),
> +		.array_type = NO_ARRAY,
> +		.tlv_type = 0x10,
> +		.offset = offsetof(struct tmd_get_dev_list_resp,
> +				   mitigation_device_list_len),
> +	},

[Severity: Medium]
Is this element size correct for the target field?

The struct defines mitigation_device_list_len as a u32, but the QMI element
descriptor for it sets .elem_size = sizeof(uint8_t).

If the decoder stores only that one byte at the start of the u32 field, then
on big-endian systems it lands in the most significant byte and the length is
inflated by a factor of 2^24. The loop in qmi_tmd_svc_arrive() would then read
out of bounds.
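
The byte-placement concern above can be illustrated in user space. This is
only a sketch: store_len_byte() is a hypothetical stand-in that assumes the
decoder writes exactly one byte at the field offset, which is the premise of
the comment above and not something verified against the in-kernel QMI
decoder. read_le32() and read_be32() are likewise illustrative helpers.

```c
#include <stdint.h>
#include <string.h>

/* Read four bytes as a little-endian 32-bit value. */
static uint32_t read_le32(const unsigned char b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Read four bytes as a big-endian 32-bit value. */
static uint32_t read_be32(const unsigned char b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Hypothetical stand-in (assumption): a decoder that stores only
 * elem_size == 1 byte at the start of a zeroed 4-byte length field.
 */
static void store_len_byte(unsigned char field[4], uint8_t wire_len)
{
	memset(field, 0, 4);
	memcpy(field, &wire_len, sizeof(wire_len));
}
```

With a wire length of 3, the little-endian reading of the field is 3, while
the big-endian reading is 3 << 24. If the decoder always stores a full u32,
the two readings agree and the concern does not apply.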

[ ... ]
> +static int qmi_tmd_register(struct qmi_tmd_client *qmi_tmd_cli,
> +			    const char *label, u8 max_state)
> +{
[ ... ]
> +found:
> +	tmd->max_state = max_state;
> +
> +	/*
> +	 * If the cooling device already exists then the QMI service went away and
> +	 * came back. So just make sure the current cooling device state is
> +	 * reflected on the remote side and then return.
> +	 */
> +	if (tmd->cdev)
> +		return qmi_tmd_send_state_request(tmd, tmd->cur_state);

[Severity: Medium]
Can this race with qmi_tmd_set_cur_state() and send a stale state to the DSP?

qmi_tmd_register() locklessly reads tmd->cur_state here. If a thermal
governor concurrently calls qmi_tmd_set_cur_state() which updates the
requested state, qmi_tmd_register() might read the old state before acquiring
the mutex in qmi_tmd_send_state_request().

This could result in sending the stale state to the DSP, overriding the
active mitigation.
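
One way to close this window is to read cur_state only after the lock is
held. The following is a user-space sketch with pthreads, not driver code:
struct tmd_state_sketch and both functions are hypothetical, and it assumes
the governor path updates cur_state under the same lock.

```c
#include <pthread.h>

/* Hypothetical user-space stand-in for the per-device state. */
struct tmd_state_sketch {
	pthread_mutex_t lock;
	unsigned int cur_state;  /* last state requested by the governor */
	unsigned int sent_state; /* last state "sent" to the remote side */
};

/* Governor path: record and send the new state under the lock. */
static void sketch_set_cur_state(struct tmd_state_sketch *tmd,
				 unsigned int state)
{
	pthread_mutex_lock(&tmd->lock);
	tmd->cur_state = state;
	tmd->sent_state = state;
	pthread_mutex_unlock(&tmd->lock);
}

/*
 * Re-registration path: cur_state is read after the lock is taken, so a
 * concurrent sketch_set_cur_state() cannot be overwritten by an older value.
 */
static void sketch_resend_state(struct tmd_state_sketch *tmd)
{
	pthread_mutex_lock(&tmd->lock);
	tmd->sent_state = tmd->cur_state;
	pthread_mutex_unlock(&tmd->lock);
}
```

In the patch this would roughly correspond to having the send helper pick up
tmd->cur_state itself once it holds the mutex, instead of taking the value as
an argument read by the caller.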

> +
> +	tmd->cdev = thermal_of_cooling_device_register(dev->of_node, index,
> +						       label, tmd, &qmi_tmd_cooling_ops);
> +	if (IS_ERR(tmd->cdev))
> +		return PTR_ERR(tmd->cdev);

[Severity: High]
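Should tmd->cdev be set to NULL here on failure?

If thermal_of_cooling_device_register() fails, it returns an ERR_PTR.
Leaving the error pointer in tmd->cdev makes the field non-NULL, so the
if (tmd->cdev) check above and the if (!tmd->cdev) check in
qmi_tmd_unregister() both treat the device as registered.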
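
A common pattern is to hold the result in a local variable and assign
tmd->cdev only on success. The sketch below shows that pattern in user space;
the sketch_* helpers are hypothetical stand-ins that assume the usual shape of
the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and sketch_cdev_register() is
a made-up replacement for the real registration call.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's error-pointer helpers (assumed). */
#define SKETCH_MAX_ERRNO 4095

static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct tmd_cdev_sketch {
	void *cdev; /* NULL means "not registered" */
};

/* Hypothetical registration: returns an object or an error pointer. */
static void *sketch_cdev_register(int fail)
{
	static int dummy_cdev;

	return fail ? sketch_err_ptr(-ENODEV) : (void *)&dummy_cdev;
}

static int sketch_register(struct tmd_cdev_sketch *tmd, int fail)
{
	void *cdev = sketch_cdev_register(fail);

	if (sketch_is_err(cdev))
		return (int)sketch_ptr_err(cdev); /* tmd->cdev stays NULL */

	tmd->cdev = cdev;
	return 0;
}
```

With this shape a later NULL check on tmd->cdev is enough to skip devices
whose registration failed.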

> +
> +	return 0;
> +}
> +
> +static void qmi_tmd_unregister(struct qmi_tmd_client *qmi_tmd_cli)
> +{
> +	struct qmi_tmd *tmd;
> +	int index;
> +
> +	for (index = 0; index < qmi_tmd_cli->num_tmds; index++) {
> +		tmd = &qmi_tmd_cli->tmds[index];
> +
> +		if (!tmd->cdev)
> +			continue;
> +
> +		thermal_cooling_device_unregister(tmd->cdev);

[Severity: High]
Can this dereference an error pointer and crash?

Because qmi_tmd_register() leaves an ERR_PTR in tmd->cdev on failure, the
if (!tmd->cdev) check evaluates to false. thermal_cooling_device_unregister()
will then attempt to dereference the error pointer.

> +		tmd->cdev = NULL;
> +	}
> +}
> +
> +static void qmi_tmd_svc_arrive(struct work_struct *work)
> +{
> +	struct qmi_tmd_client *qmi_tmd_cli =
> +		container_of(work, struct qmi_tmd_client, svc_arrive_work);
> +
> +	struct tmd_get_dev_list_req req = { 0 };
> +	struct tmd_get_dev_list_resp *resp __free(kfree) = NULL;
> +	int ret, i;
> +	struct qmi_txn txn;
> +
> +	resp = kzalloc_obj(*resp, GFP_KERNEL);
> +	if (!resp) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	scoped_guard(mutex, &qmi_tmd_cli->mutex) {
> +		ret = qmi_txn_init(&qmi_tmd_cli->handle, &txn,
> +				   tmd_get_dev_list_resp_ei, resp);
> +		if (ret < 0)
> +			goto out;

[Severity: Low]
This isn't a bug, but does this function mix goto-based error handling with
__free() and scoped_guard() cleanup helpers?

According to the cleanup guidelines in include/linux/cleanup.h, mixing goto
and cleanup helpers in the same function can create confusing ownership
semantics. It is recommended to convert all resources to scope-based
cleanup, or convert none of them.

[ ... ]
> +void qmi_tmd_exit(struct qmi_tmd_client *qmi_tmd_cli)
> +{
> +	if (!qmi_tmd_cli)
> +		return;
> +
> +	cancel_work_sync(&qmi_tmd_cli->svc_arrive_work);
> +	qmi_handle_release(&qmi_tmd_cli->handle);
> +	qmi_tmd_unregister(qmi_tmd_cli);
> +
> +	scoped_guard(mutex, &qmi_tmd_cli->mutex)
> +		qmi_tmd_cli->connection_active = false;
> +}

[Severity: High]
Does this teardown sequence create multiple use-after-free conditions?

First, if svc_arrive_work is cancelled before qmi_handle_release() is called,
an incoming server event can immediately re-queue the work via
qmi_tmd_new_server() since the QMI socket is still active. When the device
unbinds, qmi_tmd_cli is freed, and the re-queued work then runs on the freed
structure.

Second, qmi_handle_release() destroys the QMI handle (including the txns IDR)
before the thermal cooling devices are unregistered and before
connection_active is set to false. During this window, a concurrent sysfs
write could invoke qmi_tmd_send_state_request(), see
connection_active == true, and attempt to use the destroyed IDR:

Sysfs thread:
qmi_tmd_send_state_request()
   if (!qmi_tmd_cli->connection_active) // Still true!
   ret = qmi_txn_init(&qmi_tmd_cli->handle, ...) // UAF on txns IDR

Should this sequence clear connection_active and unregister the thermal
devices first, then release the QMI handle, and finally cancel the work item?
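
The ordering asked about above can be written down as a small user-space
model. Everything here is hypothetical: struct client_sketch and the step
names only record the order of operations, standing in for the
connection_active update, qmi_tmd_unregister(), qmi_handle_release() and
cancel_work_sync() in the patch. It is not a tested replacement for
qmi_tmd_exit().

```c
#include <stdbool.h>
#include <stddef.h>

/* Teardown steps, named after the operations they stand in for. */
enum sketch_step {
	STEP_MARK_INACTIVE,
	STEP_UNREGISTER_CDEVS,
	STEP_RELEASE_HANDLE,
	STEP_CANCEL_WORK,
};

struct client_sketch {
	bool connection_active;
	enum sketch_step order[4];
	size_t nsteps;
};

/* Append one step to the recorded order, ignoring overflow. */
static void sketch_record(struct client_sketch *c, enum sketch_step s)
{
	if (c->nsteps < 4)
		c->order[c->nsteps++] = s;
}

static void sketch_exit(struct client_sketch *c)
{
	if (!c)
		return;

	/* 1. Make new state requests bail out before touching the handle. */
	c->connection_active = false;
	sketch_record(c, STEP_MARK_INACTIVE);

	/* 2. Remove the cooling devices so sysfs writes stop arriving. */
	sketch_record(c, STEP_UNREGISTER_CDEVS);

	/* 3. Release the handle so no server event can queue more work. */
	sketch_record(c, STEP_RELEASE_HANDLE);

	/* 4. Nothing can re-queue the work now, so cancelling is final. */
	sketch_record(c, STEP_CANCEL_WORK);
}
```

In the real driver the first step would still need to take the mutex, as the
existing scoped_guard() does.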

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260703-qmi-tmd-v4-0-3882189c1f83@oss.qualcomm.com?part=3
