Re: [PATCH v6 2/5] thermal: samsung: Add Exynos ACPM TMU driver GS101

Devicetree
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Tudor Ambarus" <tudor.ambarus@linaro.org>
Cc: conor+dt@kernel.org, robh@kernel.org, devicetree@vger.kernel.org
Subject: Re: [PATCH v6 2/5] thermal: samsung: Add Exynos ACPM TMU driver GS101
Date: Thu, 28 May 2026 12:55:20 +0000	[thread overview]
Message-ID: <20260528125521.335221F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260528-acpm-tmu-v6-2-b4d7ce57594b@linaro.org>

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [High] Thermal zone thresholds are never initialized because sensors are enabled too late during probe.
- [High] Missed thermal interrupts due to clearing the interrupt pending flag after updating the thermal zone.
- [High] A failure during system suspend permanently disables thermal mitigation.
- [Medium] Mixing goto-based error handling with guard() cleanup in the same function.
- [Medium] Failure to roll back partially enabled sensors if acpm_tmu_control(true) fails.
- [Medium] Race condition in acpm_tmu_get_temp() reading sensor->enabled outside of sensor->lock.
- [Medium] Asynchronous runtime PM suspend queued by acpm_tmu_set_trips can be cancelled by devm_pm_runtime_disable during driver removal.
--

commit 90afa895cf46b40d05a7597e89b586da2a473318
Author: Tudor Ambarus <tudor.ambarus@linaro.org>

thermal: samsung: Add Exynos ACPM TMU driver GS101

Add driver for the Thermal Management Unit (TMU) managed via the Alive
Clock and Power Manager (ACPM), found on Samsung Exynos SoCs such as
Google GS101 (and Exynos850, autov920, etc.).

> diff --git a/drivers/thermal/samsung/acpm-tmu.c b/drivers/thermal/samsung/acpm-tmu.c
> new file mode 100644
> index 0000000000000..b56869c9fc3b1
> --- /dev/null
> +++ b/drivers/thermal/samsung/acpm-tmu.c
[ ... ]
> +static int acpm_tmu_control(struct acpm_tmu_priv *priv, bool on)
> +{
[ ... ]
> +	for (i = 0; i < priv->num_sensors; i++) {
> +		struct acpm_tmu_sensor *sensor = &priv->sensors[i];
> +
> +		/* Skip sensors that weren't found in DT */
> +		if (!sensor->tzd)
> +			continue;
> +
> +		mutex_lock(&sensor->lock);
> +		ret = acpm_tmu_op_tz_control(sensor, on);
> +		mutex_unlock(&sensor->lock);
> +		if (ret) {
> +			if (!err)
> +				err = ret;
> +
> +			/* On enable, stop on first error. On disable, keep going */
> +			if (on)
> +				break;

[Severity: Medium]
If an error occurs when `on` is true, the loop breaks immediately. Does this
leave the previously enabled sensors active in the ACPM firmware?

If this happens during probe, will the lack of hardware state rollback
result in unhandled spurious interrupts?

> +		}
> +	}
> +
> +	pm_runtime_put_sync(dev);
> +	return err;
> +}
> +
> +static int acpm_tmu_get_temp(struct thermal_zone_device *tz, int *temp)
> +{
> +	struct acpm_tmu_sensor *sensor = thermal_zone_device_priv(tz);
> +	struct acpm_tmu_priv *priv = sensor->priv;
> +	struct acpm_handle *handle = priv->handle;
> +	const struct acpm_tmu_ops *ops = &handle->ops->tmu;
> +	struct device *dev = priv->dev;
> +	int acpm_temp = 0, ret;
> +
> +	if (!sensor->enabled)

[Severity: Medium]
Is there a race condition here since `sensor->enabled` is checked before
acquiring `sensor->lock`?

If `acpm_tmu_control(priv, false)` runs concurrently (e.g., via system
suspend), it could set `sensor->enabled = false` just before the lock is
acquired here. Could this result in calling `ops->read_temp()` on a disabled
sensor?

> +		return -EAGAIN;
> +
> +	ret = pm_runtime_resume_and_get(dev);
> +	if (ret < 0)
> +		return ret;
> +
> +	scoped_guard(mutex, &sensor->lock) {
> +		ret = ops->read_temp(handle, priv->mbox_chan_id,
> +				     sensor->group->id, &acpm_temp);
> +	}
[ ... ]
> +static int acpm_tmu_update_thresholds(struct acpm_tmu_sensor *sensor,
> +				      u8 thresholds[2], u8 inten)
> +{
> +	struct acpm_tmu_priv *priv = sensor->priv;
> +	struct acpm_handle *handle = priv->handle;
> +	const struct acpm_tmu_ops *ops = &handle->ops->tmu;
> +	unsigned int mbox_chan_id = priv->mbox_chan_id;
> +	u8 acpm_sensor_id = sensor->group->id;
> +	bool was_enabled;
> +	int ret, restore_ret;
> +
> +	guard(mutex)(&sensor->lock);

[Severity: Medium]
Is it ok to mix `guard()` with `goto`-based error handling in this function?
According to the cleanup guidelines, functions should either convert all
resources to scope-based cleanup or convert none of them.

> +
> +	was_enabled = sensor->enabled;
> +
> +	if (was_enabled) {
> +		ret = acpm_tmu_op_tz_control(sensor, false);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	ret = ops->set_threshold(handle, mbox_chan_id, acpm_sensor_id,
> +				 thresholds, 2);
> +	if (ret)
> +		goto err_restore;
[ ... ]
> +static irqreturn_t acpm_tmu_thread_fn(int irq, void *id)
> +{
[ ... ]
> +		handled = true;
> +
> +		thermal_zone_device_update(sensor->tzd,
> +					   THERMAL_EVENT_UNSPECIFIED);
> +
> +		scoped_guard(mutex, &sensor->lock) {
> +			ret = ops->clear_tz_irq(handle, priv->mbox_chan_id,
> +						sensor->group->id);

[Severity: High]
Could clearing the interrupt after updating the thermal zone cause missed
events?

If the temperature crosses a newly programmed threshold during
`thermal_zone_device_update()`, the hardware could assert a new interrupt.
Calling `clear_tz_irq()` afterwards would clear this new interrupt,
potentially missing critical thermal events.

> +			if (ret)
> +				dev_err(priv->dev, "Sensor %d: failed to clear IRQ (%d)\n",
> +					i, ret);
> +		}
> +	}
[ ... ]
> +static int acpm_tmu_probe(struct platform_device *pdev)
> +{
[ ... ]
> +	for (i = 0; i < priv->num_sensors; i++) {
> +		struct acpm_tmu_sensor *sensor = &priv->sensors[i];
> +
> +		mutex_init(&sensor->lock);
> +		sensor->group = &data->sensor_groups[i];
> +		sensor->priv = priv;
> +
> +		sensor->tzd = devm_thermal_of_zone_register(dev, i, sensor,
> +							    &acpm_tmu_sensor_ops);

[Severity: High]
Are the thermal zones registered before the sensors are fully enabled?

During `devm_thermal_of_zone_register()`, the thermal core attempts to
initialize thresholds, but since `sensor->enabled` is false,
`acpm_tmu_get_temp()` returns `-EAGAIN` and the update aborts.

Since `acpm_tmu_control()` below does not trigger a thermal zone update,
will the initial thresholds remain uninitialized in the hardware?

> +		if (IS_ERR(sensor->tzd)) {
[ ... ]
> +	ret = acpm_tmu_control(priv, true);
> +	if (ret) {
> +		ret = dev_err_probe(dev, ret, "Failed to enable TMU\n");
> +		goto err_pm_put;
> +	}
[ ... ]
> +static void acpm_tmu_remove(struct platform_device *pdev)
> +{
> +	struct acpm_tmu_priv *priv = platform_get_drvdata(pdev);
> +
> +	/* Stop IRQ first to prevent race with thread_fn */
> +	disable_irq(priv->irq);
> +
> +	acpm_tmu_control(priv, false);

[Severity: Medium]
If `acpm_tmu_set_trips()` queues an asynchronous suspend via
`pm_runtime_put_autosuspend()` shortly before driver removal, could the
implicit `pm_runtime_disable()` from `devm_pm_runtime_enable()` cleanup
cancel it?

Would this cause the hardware to remain powered on since the usage count
drops but the actual suspend is never executed? Should this use a
synchronous suspend before removal?

> +}
> +
> +static int acpm_tmu_suspend(struct device *dev)
> +{
> +	struct acpm_tmu_priv *priv = dev_get_drvdata(dev);
> +	struct acpm_handle *handle = priv->handle;
> +	const struct acpm_tmu_ops *ops = &handle->ops->tmu;
> +	int ret;
> +
> +	ret = acpm_tmu_control(priv, false);
> +	if (ret)
> +		return ret;
> +
> +	/* APB clock not required for this specific msg */
> +	return ops->suspend(handle, priv->mbox_chan_id);

[Severity: High]
If `ops->suspend()` returns an error, the system suspend is aborted. Since
the PM core won't invoke the resume callback on failure, and there is no
rollback here, will the TMU sensors remain permanently disabled while the
system continues running?

> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260528-acpm-tmu-v6-0-b4d7ce57594b@linaro.org?part=2

next prev parent reply	other threads:[~2026-05-28 12:55 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-28 11:36 [PATCH v6 0/5] thermal: samsung: Add support for Google GS101 TMU Tudor Ambarus
2026-05-28 11:36 ` [PATCH v6 1/5] dt-bindings: thermal: Add " Tudor Ambarus
2026-05-28 11:36 ` [PATCH v6 2/5] thermal: samsung: Add Exynos ACPM TMU driver GS101 Tudor Ambarus
2026-05-28 12:55   ` sashiko-bot [this message]
2026-05-28 11:36 ` [PATCH v6 3/5] MAINTAINERS: Add entry for Samsung Exynos ACPM thermal driver Tudor Ambarus
2026-05-28 11:36 ` [PATCH v6 4/5] arm64: dts: exynos: gs101: Add thermal management unit Tudor Ambarus
2026-05-28 11:36 ` [PATCH v6 5/5] arm64: defconfig: enable Exynos ACPM thermal support Tudor Ambarus
2026-05-31 11:58 ` [PATCH v6 0/5] thermal: samsung: Add support for Google GS101 TMU Krzysztof Kozlowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260528125521.335221F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=robh@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=tudor.ambarus@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox