Re: [PATCH] phy: renesas: rcar-gen3-usb2: Avoid long delay in atomic context - sashiko-bot

From: sashiko-bot@kernel.org
To: "Claudiu Beznea" <claudiu.beznea@kernel.org>
Cc: vkoul@kernel.org, neil.armstrong@linaro.org,
	wsa+renesas@sang-engineering.com, linux-phy@lists.infradead.org,
	olteanv@gmail.com
Subject: Re: [PATCH] phy: renesas: rcar-gen3-usb2: Avoid long delay in atomic context
Date: Thu, 14 May 2026 18:52:02 +0000	[thread overview]
Message-ID: <20260514185203.2154EC2BCB3@smtp.kernel.org> (raw)
In-Reply-To: <20260514111300.2152386-1-claudiu.beznea@kernel.org>

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [High] PM runtime reference leak in interrupt handler due to goto mixing.
- [High] Unconditional calls to `disable_irq_nosync()` with IRQ 0.
- [High] Disabling a shared IRQ line globally starves other devices.
- [High] Time-of-Check to Time-of-Use (TOCTOU) race condition in PHY power control.
- [Medium] TOCTOU race condition in `role_store` causes spurious `-EIO` errors.
- [Medium] Tight completion timeout causes spurious PHY initialization failures.
- [Medium] Spurious kernel warnings generated by valid shared interrupts.
--

commit 856d28e68a5bcfa08fa03aa685cb15772197dedc
Author: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
phy: renesas: rcar-gen3-usb2: Avoid long delay in atomic context

This commit releases the spin lock before sleeping for 20 ms during
OTG PHY initialization and introduces completion mechanisms to prevent
other threads from configuring the hardware concurrently. It also disables
IRQs while waiting to prevent interrupt handlers from applying settings.

Closes: https://lore.kernel.org/all/afhkX2Ys2BG1gnqy@duo.ucw.cz
Closes: https://lore.kernel.org/all/afhkX2Ys2BG1gnqy@duo.ucw.cz

> diff --git a/drivers/phy/renesas/phy-rcar-gen3-usb2.c b/drivers/phy/renesas/phy-rcar-gen3-usb2.c
> index 79e820e2fe55d..5a18bbd87c3b9 100644
> --- a/drivers/phy/renesas/phy-rcar-gen3-usb2.c
> +++ b/drivers/phy/renesas/phy-rcar-gen3-usb2.c
[ ... ]
> @@ -388,32 +392,68 @@ static bool rcar_gen3_are_all_rphys_power_off(struct rcar_gen3_chan *ch)
>  	return true;
>  }
>  
> +static int rcar_gen3_phy_wait_otg_init(struct rcar_gen3_chan *channel,
> +				       unsigned long *flags)
> +{
> +	unsigned long timeout = msecs_to_jiffies(25);

Could this completion timeout be too tight?

The initializing thread uses fsleep(20000) which guarantees a minimum sleep
of 20ms. Due to scheduler jitter or timer wheel granularity (especially if
HZ=100, where 25ms rounds to just 2 or 3 jiffies), the 20ms sleep can easily
exceed 25ms of wall-clock time, causing this to spuriously return -ETIMEDOUT.

> +	unsigned long ret = 1;
> +
> +	lockdep_assert_held(&channel->lock);
[ ... ]
> @@ -427,7 +467,10 @@ static ssize_t role_store(struct device *dev, struct device_attribute *attr,
>  			  const char *buf, size_t count)
>  {
>  	struct rcar_gen3_chan *ch = dev_get_drvdata(dev);
>  	bool is_b_device;
>  	enum phy_mode cur_mode, new_mode;
> +	unsigned long flags;
> +	int ret = -EIO;
>  
> -	guard(spinlock_irqsave)(&ch->lock);
> +	spin_lock_irqsave(&ch->lock, flags);
>  
>  	if (!ch->is_otg_channel || !rcar_gen3_is_any_otg_rphy_initialized(ch))
> -		return -EIO;
> +		goto unlock;
> +
> +	ret = rcar_gen3_phy_wait_otg_init(ch, &flags);

Does checking the initialized state before waiting for the lock to be
released introduce a race condition?

Because rcar_gen3_phy_usb2_init() only sets rphy->initialized = true after
it finishes its 20ms sleep and reacquires the lock, a concurrent sysfs write
to role_store() during this 20ms window will see initialized as false and
immediately return -EIO instead of waiting for initialization to complete.

> +	if (ret)
> +		goto unlock;
[ ... ]
> @@ -512,6 +556,11 @@ static irqreturn_t rcar_gen3_phy_usb2_irq(int irq, void *_ch)
>  		goto rpm_put;
>  
>  	scoped_guard(spinlock, &ch->lock) {
> +		if (READ_ONCE(ch->otg_initializing)) {
> +			dev_warn(dev, "%s: Got IRQ while waiting for OTG init!\n", __func__);
> +			return IRQ_NONE;
> +		}

Does returning early here leak the PM runtime reference?

The interrupt handler calls pm_runtime_get_noresume(dev) at the beginning of
the function, and releases it at the rpm_put label. By using return IRQ_NONE
inside the scoped_guard, the code bypasses pm_runtime_put_noidle() completely,
preventing the device from ever entering runtime suspend.

Additionally, since this interrupt is requested with IRQF_SHARED, could
another device sharing the line trigger a valid interrupt during the 20ms
OTG initialization window? If so, this would print a spurious warning to
the kernel log for a valid shared interrupt event.

> +
>  		status = readl(usb2_base + USB2_OBINTSTA);
>  		if (status & ch->phy_data->obint_enable_bits) {
>  			dev_vdbg(dev, "%s: %08x\n", __func__, status);
[ ... ]
> @@ -550,8 +605,22 @@ static int rcar_gen3_phy_usb2_init(struct phy *p)
>  	}
>  
>  	/* Initialize otg part (only if we initialize a PHY with IRQs). */
> -	if (rphy->int_enable_bits)
> -		rcar_gen3_init_otg(channel);
> +	if (rphy->int_enable_bits && channel->is_otg_channel &&
> +	    !rcar_gen3_is_any_otg_rphy_initialized(channel)) {
> +		rcar_gen3_init_otg_phase0(channel);
> +		disable_irq_nosync(channel->irq);

Could channel->irq be 0 here?

In rcar_gen3_phy_usb2_probe(), platform_get_irq_optional() is used. If it
returns -ENXIO, channel->irq remains 0. Since rphy->int_enable_bits is a
statically defined non-zero value for OTG PHYs, this would pass 0 to
disable_irq_nosync(), which manipulates critical system interrupts.

Also, does calling disable_irq_nosync() on a shared interrupt line starve
other devices?

Since the interrupt is requested with IRQF_SHARED, masking it at the
interrupt controller level means all other unrelated devices on the same
physical line will have their interrupts disabled for the full 20ms
duration. Should shared interrupts be masked locally at the device level
instead?

> +		reinit_completion(&channel->otg_init_done);
> +		WRITE_ONCE(channel->otg_initializing, true);
> +		spin_unlock_irqrestore(&channel->lock, flags);
[ ... ]
> @@ -598,6 +678,7 @@ static int rcar_gen3_phy_usb2_power_on(struct phy *p)
>  	struct rcar_gen3_phy *rphy = phy_get_drvdata(p);
>  	struct rcar_gen3_chan *channel = rphy->ch;
>  	void __iomem *usb2_base = channel->base;
> +	unsigned long flags;
>  	u32 val;
>  	int ret = 0;
>  
> @@ -607,11 +688,15 @@ static int rcar_gen3_phy_usb2_power_on(struct phy *p)
>  			return ret;
>  	}
>  
> -	guard(spinlock_irqsave)(&channel->lock);
> +	spin_lock_irqsave(&channel->lock, flags);
>  
>  	if (!rcar_gen3_are_all_rphys_power_off(channel))
>  		goto out;
>  
> +	ret = rcar_gen3_phy_wait_otg_init(channel, &flags);
> +	if (ret)
> +		goto unlock;
> +
>  	val = readl(usb2_base + USB2_USBCTR);
>  	val |= USB2_USBCTR_PLL_RST;

Does dropping the lock in rcar_gen3_phy_wait_otg_init() introduce a race
condition with the !rcar_gen3_are_all_rphys_power_off() check?

If two threads concurrently power on different PHYs, both will evaluate the
condition as false because neither has reached rphy->powered = true at the
end of the function. Both threads might then sleep in
rcar_gen3_phy_wait_otg_init().

Upon waking and reacquiring the lock, both will unconditionally apply
USB2_USBCTR_PLL_RST, which could catastrophically disrupt the PHY that was
just initialized by the first thread.

Should the power off condition be re-evaluated after the lock is reacquired?

>  	writel(val, usb2_base + USB2_USBCTR);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260514111300.2152386-1-claudiu.beznea@kernel.org?part=1

-- 
linux-phy mailing list
linux-phy@lists.infradead.org
https://lists.infradead.org/mailman/listinfo/linux-phy