From: sashiko-bot@kernel.org
To: "Sachin Gupta" <sachin.gupta@oss.qualcomm.com>
Cc: robh@kernel.org, devicetree@vger.kernel.org, conor+dt@kernel.org
Subject: Re: [PATCH 2/2] thermal: qcom: Add support for Qualcomm MBG thermal monitoring
Date: Mon, 01 Jun 2026 11:40:34 +0000 [thread overview]
Message-ID: <20260601114035.2D9A81F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260601-spmi-mbg-driver-v1-2-b4892b55a17f@oss.qualcomm.com>
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [High] Data race and TOCTOU window on `last_thres_crossed` leading to missed thermal violations.
- [High] Potential NULL pointer dereference accessing `node->name` on non-OF platforms.
- [High] Incorrect bitwise AND used to check a 4-bit fault status enumeration.
- [High] Silent disablement of supported thermal trips.
- [Medium] Hardware threshold interrupt is enabled before configuring the threshold value.
- [Low] Incompatible pointer type passed to `regmap_read`.
--
commit 59e6da95c48d5200129439570e254b1f9180acf4
Author: Satya Priya Kakitapalli <quic_skakitap@quicinc.com>
Subject: thermal: qcom: Add support for Qualcomm MBG thermal monitoring
This patch adds a driver for the Qualcomm MBG thermal monitoring device.
It monitors the die temperature and notifies the thermal framework when a
level 1 upper threshold violation occurs via an SPMI interrupt.
> diff --git a/drivers/thermal/qcom/qcom-spmi-mbg-tm.c b/drivers/thermal/qcom/qcom-spmi-mbg-tm.c
> new file mode 100644
> index 0000000000000..60190b341fc7c
> --- /dev/null
> +++ b/drivers/thermal/qcom/qcom-spmi-mbg-tm.c
[ ... ]
> +static int mbg_tm_get_temp(struct thermal_zone_device *tz, int *temp)
> +{
> + struct mbg_tm_chip *chip = thermal_zone_device_priv(tz);
> + int ret, milli_celsius;
> +
> + if (chip->last_thres_crossed) {
> + dev_dbg(chip->dev, "last_temp: %d\n", chip->last_temp);
> + chip->last_thres_crossed = false;
> + *temp = chip->last_temp;
> + return 0;
> + }
[Severity: High]
Could this lockless read and clear of chip->last_thres_crossed cause missed
thermal violations?
If a user-space sysfs read occurs concurrently with the thermal worker
execution (which is triggered by the ISR), the sysfs read might consume the
flag by setting it to false.
When the thermal worker subsequently calls mbg_tm_get_temp(), it would see
false and read the raw ADC value. Because ADC readings can fluctuate and
lag, it might read a value below the threshold, entirely missing the
violation.
[ ... ]
> +static int mbg_tm_set_trip_temp(struct thermal_zone_device *tz, int low_temp,
> + int temp)
> +{
> + struct mbg_tm_chip *chip = thermal_zone_device_priv(tz);
> + int ret = 0;
> +
> + guard(mutex)(&chip->lock);
> +
> + /* The HW has a limitation that the trip set must be above 25C */
> + if (temp > MBG_MIN_TRIP_TEMP && temp < MBG_MAX_SUPPORTED_TEMP) {
> + ret = regmap_set_bits(chip->map, chip->base + MBG_TEMP_MON2_MISC_CFG,
> + MON2_UP_THRESH_EN);
> + if (ret < 0)
> + return ret;
> +
> + ret = regmap_write(chip->map, chip->base + MON2_LVL1_UP_THRESH,
> + temp_to_vtemp_mv(temp));
[Severity: Medium]
Is it safe to enable the threshold comparator before programming the new
threshold value?
If the hardware retains a stale threshold from a prior state that is lower
than the current temperature, enabling the comparator first could
immediately trigger a spurious interrupt.
Would it be better to write the threshold value before enabling the
interrupt?
> + if (ret < 0)
> + return ret;
> + } else {
> + dev_dbg(chip->dev, "Set trip b/w 25C and 160C\n");
> + ret = regmap_clear_bits(chip->map, chip->base + MBG_TEMP_MON2_MISC_CFG,
> + MON2_UP_THRESH_EN);
> + return ret;
> + }
[Severity: High]
When the requested temperature falls outside the hardware's supported bounds,
the driver disables the threshold interrupt and returns 0.
By returning 0, does the driver deceptively signal to the thermal core that
the trip has been successfully configured?
Because the interrupt is disabled, the core will never be notified when the
temperature violates the trip point. Should this return an error code like
-ERANGE instead?
[ ... ]
> +static irqreturn_t mbg_tm_isr(int irq, void *data)
> +{
> + struct mbg_tm_chip *chip = data;
> + int ret, val;
[Severity: Low]
The regmap_read() function expects an unsigned int pointer for the value
parameter, but val is declared as a signed int here.
> +
> + scoped_guard(mutex, &chip->lock) {
> + ret = regmap_read(chip->map, chip->base + MBG_TEMP_MON2_FAULT_STATUS, &val);
> + if (ret < 0)
> + return IRQ_HANDLED;
> + }
> +
> + if (FIELD_GET(MON_FAULT_STATUS_MASK, val) & MON_FAULT_LVL1_UPR) {
[Severity: High]
Does this bitwise AND correctly check the 4-bit fault status enumeration?
MON_FAULT_LVL1_UPR is defined as 0x5 (binary 0101). A bitwise AND against
0x5 will evaluate to true if either bit 0 or bit 2 is set, matching
unrelated fault states like 0x1, 0x4, or 0x7.
Should this use an exact equality check (==) instead?
> + chip->last_thres_crossed = true;
[Severity: High]
This sets chip->last_thres_crossed to true outside of the mutex. As
mentioned earlier in mbg_tm_get_temp(), this flag is read and cleared
locklessly.
Could this lead to a data race if mbg_tm_get_temp() executes concurrently?
> + dev_dbg(chip->dev, "Notifying Thermal, fault status=%d\n", val);
> + thermal_zone_device_update(chip->tz_dev, THERMAL_TRIP_VIOLATED);
> + } else {
> + dev_dbg(chip->dev, "Lvl1 upper threshold not violated, ignoring interrupt\n");
> + }
> +
> + return IRQ_HANDLED;
> +}
> +
> +static int mbg_tm_probe(struct platform_device *pdev)
> +{
> + struct mbg_tm_chip *chip;
> + struct device_node *node = pdev->dev.of_node;
[ ... ]
> + chip->tz_dev = devm_thermal_of_zone_register(chip->dev, 0, chip, &mbg_tm_ops);
> + if (IS_ERR(chip->tz_dev))
> + return dev_err_probe(chip->dev, PTR_ERR(chip->tz_dev),
> + "Failed to register sensor\n");
> +
> + return devm_request_threaded_irq(&pdev->dev, chip->irq, NULL, mbg_tm_isr, IRQF_ONESHOT,
> + node->name, chip);
> +}
[Severity: High]
Will node->name cause a NULL pointer dereference on non-OF platforms?
Because the driver uses device_property_read_u32(), it can be instantiated
via ACPI or MFD without a device tree node, meaning pdev->dev.of_node
could be NULL.
Would it be safer to use dev_name(&pdev->dev) instead?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260601-spmi-mbg-driver-v1-0-b4892b55a17f@oss.qualcomm.com?part=2
prev parent reply other threads:[~2026-06-01 11:40 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-01 11:01 [PATCH 0/2] thermal: qcom: Add Qualcomm SPMI MBG thermal monitor support Sachin Gupta
2026-06-01 11:01 ` [PATCH 1/2] dt-bindings: thermal: Add Qualcomm " Sachin Gupta
2026-06-01 11:30 ` sashiko-bot
2026-06-01 11:01 ` [PATCH 2/2] thermal: qcom: Add support for Qualcomm MBG thermal monitoring Sachin Gupta
2026-06-01 11:40 ` sashiko-bot [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260601114035.2D9A81F00893@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=conor+dt@kernel.org \
--cc=devicetree@vger.kernel.org \
--cc=robh@kernel.org \
--cc=sachin.gupta@oss.qualcomm.com \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox