Linux Hardware Monitor development
 help / color / mirror / Atom feed
From: "Huan He" <hehuan1@eswincomputing.com>
To: sashiko@lists.linux.dev, linux@roeck-us.net
Cc: robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org,
	p.zabel@pengutronix.de, linux-hwmon@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	ningyu@eswincomputing.com, linmin@eswincomputing.com,
	pinkesh.vaghela@einfochips.com, luyulin@eswincomputing.com
Subject: Re: Re: Re: [PATCH v4 2/2] hwmon: Add Eswin EIC7700 PVT sensor driver
Date: Tue, 12 May 2026 17:51:08 +0800 (GMT+08:00)	[thread overview]
Message-ID: <1526048b.617a.19e1b98ef15.Coremail.hehuan1@eswincomputing.com> (raw)
In-Reply-To: <7728dc4b.6178.19e1b936a72.Coremail.hehuan1@eswincomputing.com>

Hi Sashiko, Guenter,

Thank you very much for your detailed review. We appreciate the feedback.

> 
> Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
> - [High] Hardware access with disabled clock when CONFIG_PM=n
> - [High] Resource leak and hardware powered on during driver unbind
> - [High] System suspend state desynchronization causes potential crashes
> - [High] Unclocked hardware access in the interrupt handler
> - [Medium] Missing synchronization on timeout path causes race condition and stale data
> - [Medium] Unvalidated sysfs input and hwmon ABI violation for temp_offset
> - [Low] Mismatch between commit message and implemented voltage channels
> --
> 
> commit 226d2b003d7fbabad9b8abaf9c71cf6fdf386a91
> Author: Huan He <hehuan1@eswincomputing.com>
> 
> hwmon: Add Eswin EIC7700 PVT sensor driver
>     
> Add support for ESWIN EIC7700 Voltage and Temperature sensor. The driver
> supports temperature and voltage monitoring with polynomial conversion,
> and provides sysfs interface for sensor data access.
>     
> The PVT IP contains one temperature sensor and four voltage sensors for
> process variation monitoring.
> 
> > The PVT IP contains one temperature sensor and four voltage sensors for
> > process variation monitoring.
> 
> This isn't a bug, but the commit message states there are four voltage
> sensors, while the driver implementation only exposes a single voltage channel.
> Could the commit message be updated to match the implementation?

I will update the commit message accordingly.

The current driver only exposes one temperature channel and one voltage
channel, so the commit text will be changed to match the implementation.

> 
> > diff --git a/drivers/hwmon/eic7700-pvt.c b/drivers/hwmon/eic7700-pvt.c
> > new file mode 100644
> > index 0000000000000..d9d6a035defeb
> > --- /dev/null
> > +++ b/drivers/hwmon/eic7700-pvt.c
> [ ... ]
> > +static irqreturn_t eic7700_pvt_hard_isr(int irq, void *data)
> > +{
> > +	struct pvt_hwmon *pvt = data;
> > +	u32 val;
> > +
> > +	eic7700_pvt_update(pvt->regs + PVT_INT, PVT_INT_CLR, PVT_INT_CLR);
> > +	/*
> > +	 * Read the data, update the cache and notify a waiter of this event.
> > +	 */
> > +	val = readl(pvt->regs + PVT_DATA);
> 
> Could this cause a bus fault if a spurious or delayed interrupt fires while
> the device is in the RPM_SUSPENDED state?
> 
> When the device is suspended, the hardware clock is gated. Accessing registers
> without checking the PM runtime state could lead to a crash. Should this use
> pm_runtime_get_if_active() to protect against accessing unclocked hardware?

Yes, this is a valid concern.

If a delayed or spurious interrupt fires while the device is
runtime-suspended, the handler may otherwise touch registers after the
clock has been gated.

I will update the ISR to guard register accesses with
pm_runtime_get_if_active() and only access the hardware when the device is
runtime active. The temporary PM reference will be dropped with
pm_runtime_put_noidle() on the handled path.

The ISR will be changed along these lines:
+   int active;
+   if (IS_ENABLED(CONFIG_PM)) {
+       active = pm_runtime_get_if_active(pvt->dev);
+       if (active <= 0)
+           return IRQ_NONE;
+   }

    ... access registers ...

+   if (IS_ENABLED(CONFIG_PM))
+       pm_runtime_put_noidle(pvt->dev);

> 
> [ ... ]
> > +static int eic7700_pvt_read_data(struct pvt_hwmon *pvt,
> > +				 enum pvt_sensor_type type, long *val)
> > +{
> > +	unsigned long timeout;
> > +	u32 data;
> > +	int ret;
> > +
> > +	/*
> > +	 * Wait for PVT conversion to complete and update the data cache. The
> > +	 * data read procedure is following: set the requested PVT sensor mode,
> > +	 * enable conversion, wait until conversion is finished, then disable
> > +	 * conversion and IRQ, and read the cached data.
> > +	 */
> > +	reinit_completion(&pvt->conversion);
> > +
> > +	eic7700_pvt_set_mode(pvt, pvt_info[type].mode);
> > +	eic7700_pvt_update(pvt->regs + PVT_ENA, PVT_ENA_EN, PVT_ENA_EN);
> > +
> > +	/*
> > +	 * Wait with timeout since in case if the sensor is suddenly powered
> > +	 * down the request won't be completed and the caller will hang up on
> > +	 * this procedure until the power is back up again. Multiply the
> > +	 * timeout by the factor of two to prevent a false timeout.
> > +	 */
> > +	timeout = 2 * usecs_to_jiffies(ktime_to_us(pvt->timeout));
> > +	ret = wait_for_completion_timeout(&pvt->conversion, timeout);
> > +
> > +	eic7700_pvt_update(pvt->regs + PVT_ENA, PVT_ENA_EN, 0);
> > +	eic7700_pvt_update(pvt->regs + PVT_INT, PVT_INT_CLR, PVT_INT_CLR);
> > +
> > +	data = READ_ONCE(pvt->data_cache);
> > +
> > +	if (!ret)
> > +		return -ETIMEDOUT;
> 
> If the timeout expires, could an interrupt still be in flight?
> 
> If so, a subsequent read request would call reinit_completion(), and the
> delayed interrupt handler could execute, complete the new request prematurely,
> and cause the read to return stale data.
> 
> Does the timeout path need to call synchronize_irq() to ensure no interrupt
> handler is running before returning?

If a timeout occurs, a delayed interrupt may still be in flight and could
complete the next conversion request after reinit_completion().

I will add synchronize_irq() on the timeout path so that any in-flight
handler has completed before returning.

The eic7700_pvt_read_data() function will be changed along these lines:

    eic7700_pvt_update(pvt->regs + PVT_INT, PVT_INT_CLR, PVT_INT_CLR);

+   if (!ret)
+       synchronize_irq(pvt->irq);
+
    data = READ_ONCE(pvt->data_cache);

> 
> [ ... ]
> > +static int eic7700_pvt_write_trim(struct pvt_hwmon *pvt, long val)
> > +{
> > +	/*
> > +	 * Update PVT trim register safely while the controller is temporarily
> > +	 * disabled.
> > +	 */
> > +	eic7700_pvt_set_trim(pvt, val);
> > +
> > +	return 0;
> > +}
> 
> Does this code validate the user input before writing it to the hardware?
> 
> Writing a negative value could truncate to 0xFFFFFFFF and overwrite reserved
> bits in the register. Additionally, the hwmon ABI specifies that temp_offset
> should be in millidegrees Celsius, but this writes the raw value directly to
> the register without scaling.

Agreed.

The current temp_offset implementation exposes the raw PVT trim register
directly, which does not match the hwmon ABI requirement for millidegree
Celsius units.

I will remove the temp_offset attribute in the next revision instead of
exposing an ABI-incompatible raw hardware value.

> 
> [ ... ]
> > +static void eic7700_pvt_disable_pm_runtime(void *data)
> > +{
> > +	struct pvt_hwmon *pvt = data;
> > +
> > +	pm_runtime_dont_use_autosuspend(pvt->dev);
> > +	pm_runtime_disable(pvt->dev);
> > +}
> 
> Can this leak the clock reference and leave the hardware powered on if the
> device is unbound?
> 
> Because pm_runtime_put_autosuspend() is used after reads, the device remains
> active for a 3-second window. If the module is removed during this window,
> pm_runtime_disable() will cancel the pending suspend but it does not suspend
> a device that is currently RPM_ACTIVE.
> 
> Should this cleanup action explicitly suspend the device if it is active?

pm_runtime_disable() alone does not guarantee that an RPM_ACTIVE device
is runtime-suspended during driver unbind, so the clock may remain enabled
if the driver is removed inside the autosuspend window.

I will fix this by forcing the device into runtime suspend before
disabling runtime PM in the cleanup path.

The eic7700_pvt_disable_pm_runtime() function will be changed along these
lines:

    pm_runtime_dont_use_autosuspend(pvt->dev);
+   pm_runtime_force_suspend(pvt->dev);
    pm_runtime_disable(pvt->dev);

> 
> [ ... ]
> > +static int eic7700_pvt_probe(struct platform_device *pdev)
> > +{
> [ ... ]
> > +	ret = eic7700_pvt_init_iface(pvt);
> > +	if (ret) {
> > +		clk_disable_unprepare(pvt->clk);
> > +		return ret;
> > +	}
> > +
> > +	clk_disable_unprepare(pvt->clk);
> 
> Is this driver intended to be built when CONFIG_PM is disabled?
> 
> If CONFIG_PM is disabled, pm_runtime_get_sync() in the hwmon read function is
> a no-op that returns 1. Because the clock is explicitly disabled at the end of
> probe() and relies entirely on PM runtime to re-enable it, a read operation
> will access the hardware registers while the clock is gated, leading to an
> external abort or system hang.

With CONFIG_PM disabled, pm_runtime_get_sync() becomes a stub and the
runtime PM callbacks are never invoked, while probe() currently gates the
clock at the end. As a result, later hwmon register accesses may touch the
hardware with the clock disabled.

I will fix this by keeping the clock enabled permanently when CONFIG_PM=n
and gating it only during driver cleanup.

The eic7700_pvt_probe() function will be changed along these lines:

ret = devm_add_action_or_reset(dev, eic7700_pvt_disable_pm_runtime, pvt);
if (ret)
    return ret;

pm_runtime_put_autosuspend(dev);

+   if (!IS_ENABLED(CONFIG_PM)) {
+       ret = clk_prepare_enable(pvt->clk);
+       if (ret)
+           return dev_err_probe(&pdev->dev, ret,
+                        "Failed to enable clock\n");
+   }

In eic7700_pvt_disable_pm_runtime:

    pm_runtime_disable(pvt->dev);
+   if (!IS_ENABLED(CONFIG_PM))
+       clk_disable_unprepare(pvt->clk);

> 
> Also, eic7700_pvt_suspend() checks !pm_runtime_status_suspended(dev), which
> is always false when CONFIG_PM=n, causing it to erroneously call
> clk_disable_unprepare() on an already disabled clock.

This part is not applicable with CONFIG_PM=n.

The driver uses pm_ptr(&eic7700_pvt_pm_ops), so with CONFIG_PM disabled
the PM callback pointer is NULL and eic7700_pvt_suspend() is not executed
by the PM core. Therefore there is no real double clk_disable_unprepare()
from that path.

> 
> [ ... ]
> > +static int __maybe_unused eic7700_pvt_suspend(struct device *dev)
> > +{
> > +	int ret = 0;
> > +
> > +	if (!pm_runtime_status_suspended(dev)) {
> > +		ret = eic7700_pvt_runtime_suspend(dev);
> > +		if (ret) {
> > +			dev_err(dev, "Failed to suspend: %d\n", ret);
> > +			return ret;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> 
> Does calling eic7700_pvt_runtime_suspend() directly leave the PM runtime
> state desynchronized?
> 
> This manually gates the clock if the device is active, but it fails to update
> the device's PM runtime status. The PM core still considers the device to be
> RPM_ACTIVE.
> 
> If a kernel thread calls pm_runtime_get_sync() before late suspend, it will
> see the state as active, return immediately without re-enabling the clock,
> and crash when accessing registers.
> 
> Should this use pm_runtime_force_suspend() and pm_runtime_force_resume() to
> safely maintain state consistency?

Calling the driver's runtime suspend/resume callbacks directly from the
system sleep callbacks does not update the PM core state, so the runtime
PM status can become inconsistent with the actual clock state.

I will switch the system sleep callbacks to pm_runtime_force_suspend() and
pm_runtime_force_resume() in the next revision.

eic7700_pvt_pm_ops will be changed along these lines:

-   SYSTEM_SLEEP_PM_OPS(eic7700_pvt_suspend, eic7700_pvt_resume)
+   SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)

Best regards,
Huan He

  reply	other threads:[~2026-05-12  9:51 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-30  6:41 [PATCH v4 0/2] Add driver support for ESWIN EIC7700 PVT controller hehuan1
2026-04-30  6:44 ` [PATCH v4 1/2] dt-bindings: hwmon: Add Eswin EIC7700 PVT sensor hehuan1
2026-04-30  7:00   ` sashiko-bot
2026-05-12  9:14     ` Huan He
2026-05-03 12:03   ` Krzysztof Kozlowski
2026-05-12  9:16     ` Huan He
2026-05-12 14:26       ` Guenter Roeck
2026-04-30  6:45 ` [PATCH v4 2/2] hwmon: Add Eswin EIC7700 PVT sensor driver hehuan1
2026-04-30  8:08   ` sashiko-bot
2026-05-12  9:45     ` Huan He
2026-05-12  9:51       ` Huan He [this message]
2026-04-30 20:24 ` [PATCH v4 0/2] Add driver support for ESWIN EIC7700 PVT controller Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1526048b.617a.19e1b98ef15.Coremail.hehuan1@eswincomputing.com \
    --to=hehuan1@eswincomputing.com \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=krzk+dt@kernel.org \
    --cc=linmin@eswincomputing.com \
    --cc=linux-hwmon@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=luyulin@eswincomputing.com \
    --cc=ningyu@eswincomputing.com \
    --cc=p.zabel@pengutronix.de \
    --cc=pinkesh.vaghela@einfochips.com \
    --cc=robh@kernel.org \
    --cc=sashiko@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox