Re: [PATCH v3] phy: renesas: rcar-gen3-usb2: Avoid long delay in atomic context - sashiko-bot

From: sashiko-bot@kernel.org
To: "Claudiu Beznea" <claudiu.beznea@kernel.org>
Cc: linux-phy@lists.infradead.org, vkoul@kernel.org,
	neil.armstrong@linaro.org, olteanv@gmail.com,
	wsa+renesas@sang-engineering.com
Subject: Re: [PATCH v3] phy: renesas: rcar-gen3-usb2: Avoid long delay in atomic context
Date: Thu, 28 May 2026 07:42:38 +0000	[thread overview]
Message-ID: <20260528074239.06B9A1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260528070826.478813-1-claudiu.beznea@kernel.org>

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:

New issues:
- [High] Threads waiting for the OTG initialization completion use an `if` statement instead of a `while` loop, causing a Time-of-Check to Time-of-Use (TOCTOU) vulnerability.
- [High] Returning `IRQ_NONE` while `otg_initializing` is true without correctly masking all interrupt sources causes an IRQ storm, permanently disabling the shared interrupt line.
- [Medium] A tight 40ms timeout is used to wait for an OTG initialization sequence that includes a 20ms sleep, which will lead to spurious timeouts under system load.

Pre-existing issues:
- [High] The IRQ handler uses `pm_runtime_get_noresume()` and `pm_runtime_suspended()` to guard hardware access, which is an anti-pattern and vulnerable to race conditions.
- [High] The driver schedules an asynchronous work item (`ch->work`) but fails to cancel it during device removal, leading to a potential Use-After-Free of the `channel` structure.
- [High] The sysfs `role_show` and `role_store` callbacks access hardware registers without PM runtime protection.
--

commit 9f26dd26c4ac57d5b07ac336f8309d15b8bde616
Author: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
phy: renesas: rcar-gen3-usb2: Avoid long delay in atomic context

This commit fixes a long delay in an atomic context by releasing the spin
lock before sleeping for 20 ms during OTG PHY initialization. It introduces
an otg_initializing variable and completion to prevent other threads from
configuring the hardware concurrently.

Link: https://lore.kernel.org/all/afhkX2Ys2BG1gnqy@duo.ucw.cz

> diff --git a/drivers/phy/renesas/phy-rcar-gen3-usb2.c b/drivers/phy/renesas/phy-rcar-gen3-usb2.c
> index 79e820e2fe55d..24f9ef908768d 100644
> --- a/drivers/phy/renesas/phy-rcar-gen3-usb2.c
> +++ b/drivers/phy/renesas/phy-rcar-gen3-usb2.c
> @@ -138,12 +138,20 @@ struct rcar_gen3_chan {
>  	struct rcar_gen3_phy rphys[NUM_OF_PHYS];
>  	struct regulator *vbus;
>  	struct work_struct work;

[Severity: High]
This is a pre-existing issue, but does this code properly clean up the
asynchronous work item during device removal?

The driver initializes work and schedules it to handle state changes, but
there doesn't appear to be a call to cancel_work_sync() during unbind.
Since the channel structure is allocated with devm_kzalloc, it will be
freed automatically after remove() returns.

Could the workqueue execute rcar_gen3_phy_usb2_work() after the memory has
been freed if the work is scheduled right before or during unbind?

> +	struct completion otg_init_done;
>  	spinlock_t lock;	/* protects access to hardware and driver data structure. */

[ ... ]

> @@ -394,26 +402,52 @@ static ssize_t role_store(struct device *dev, struct device_attribute *attr,
>  	struct rcar_gen3_chan *ch = dev_get_drvdata(dev);
>  	bool is_b_device;
>  	enum phy_mode cur_mode, new_mode;
> +	unsigned long flags;
> +	int ret = -EIO;

[Severity: High]
This is a pre-existing issue, but is it safe to access hardware registers
in role_show and role_store without PM runtime protection?

These functions are exposed via sysfs and read/write hardware registers.
If the device has been auto-suspended and its clocks are gated, accessing
these registers from userspace might result in a synchronous external abort.

Should these sysfs callbacks call pm_runtime_get_sync() before accessing
the hardware?

> -	guard(spinlock_irqsave)(&ch->lock);
> +	spin_lock_irqsave(&ch->lock, flags);
>  
> -	if (!ch->is_otg_channel || !rcar_gen3_is_any_otg_rphy_initialized(ch))
> -		return -EIO;
> +	if (!ch->is_otg_channel)
> +		goto unlock;
> +
> +	if (ch->otg_initializing) {
> +		unsigned long timeout = msecs_to_jiffies(40);

[Severity: Medium]
Will this 40ms timeout trigger spuriously under system load?

The initialization thread performs a 20ms sleep via fsleep(20000). On
systems with low timer resolution (like HZ=100) or under heavy CPU load,
the actual sleep duration and scheduler delays could exceed 40ms.

Would this cause PHY operations like init, power_on, or role_store to
falsely fail with -ETIMEDOUT and break USB functionality?

> +
> +		spin_unlock_irqrestore(&ch->lock, flags);
>  
> -	if (sysfs_streq(buf, "host"))
> +		ret = wait_for_completion_timeout(&ch->otg_init_done, timeout);
> +		ret = ret ? 0 : -ETIMEDOUT;
> +		if (ret)
> +			goto exit;
> +
> +		spin_lock_irqsave(&ch->lock, flags);
> +	}

[Severity: High]
Does using an if statement here instead of a while loop introduce a
Time-of-Check to Time-of-Use race condition?

When a thread detects otg_initializing is true, it drops the spinlock and
waits on the completion. Upon waking up, it re-acquires the spinlock and
proceeds without re-evaluating otg_initializing.

Because multiple PHYs can be sequentially initialized, could another thread
start a new initialization cycle and set otg_initializing back to true in
the window between the wakeup and lock re-acquisition?

[ ... ]

> @@ -504,6 +542,7 @@ static irqreturn_t rcar_gen3_phy_usb2_irq(int irq, void *_ch)
>  	void __iomem *usb2_base = ch->base;
>  	struct device *dev = ch->dev;
>  	irqreturn_t ret = IRQ_NONE;
> +	unsigned long flags;
>  	u32 status;
>  
>  	pm_runtime_get_noresume(dev);
> @@ -511,33 +550,85 @@ static irqreturn_t rcar_gen3_phy_usb2_irq(int irq, void *_ch)
>  	if (pm_runtime_suspended(dev))
>  		goto rpm_put;

[Severity: High]
This is a pre-existing issue, but does using pm_runtime_get_noresume() and
pm_runtime_suspended() to guard hardware access expose this handler to
race conditions?

According to the PM subsystem guidelines, pm_runtime_get_noresume() only
increments the usage counter without verifying the actual runtime status.
Checking pm_runtime_suspended() immediately after is racy because the device
might already be transitioning to a suspended state.

On a shared IRQ line, if the interrupt fires while this device is suspended,
will the handler proceed and access powered-off registers, causing a
synchronous external abort? Should the handler use
pm_runtime_get_if_active() instead?

> -	scoped_guard(spinlock, &ch->lock) {
> -		status = readl(usb2_base + USB2_OBINTSTA);
> -		if (status & ch->phy_data->obint_enable_bits) {
> -			dev_vdbg(dev, "%s: %08x\n", __func__, status);
> -			if (ch->phy_data->vblvl_ctrl)
> -				writel(USB2_OBINTSTA_CLEAR, usb2_base + USB2_OBINTSTA);
> -			else
> -				writel(ch->phy_data->obint_enable_bits, usb2_base + USB2_OBINTSTA);
> -			rcar_gen3_device_recognition(ch);
> -			rcar_gen3_configure_vblvl_ctrl(ch);
> -			ret = IRQ_HANDLED;
> -		}
> +	spin_lock_irqsave(&ch->lock, flags);
> +
> +	if (ch->otg_initializing) {
> +		ret = IRQ_NONE;
> +		goto unlock;
> +	}

[Severity: High]
Could returning IRQ_NONE here while otg_initializing is true cause an
IRQ storm?

To prevent IRQs during the 20ms sleep, rcar_gen3_phy_usb2_irqs_mask_all()
is called, but it only clears bits in USB2_INT_ENABLE and ignores the
OTG-specific interrupts in USB2_OBINTEN.

If USB2_OBINTEN remains enabled from a previous initialization cycle, and
an OTG event occurs during the 20ms sleep, the level-triggered IRQ will be
asserted. By repeatedly returning IRQ_NONE without clearing USB2_OBINTSTA,
will this cause an immediate IRQ storm that forces the kernel to disable
the shared IRQ line?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260528070826.478813-1-claudiu.beznea@kernel.org?part=1

-- 
linux-phy mailing list
linux-phy@lists.infradead.org
https://lists.infradead.org/mailman/listinfo/linux-phy